
Robust Deepfake Detection Leveraging EfficientNet-B3 Backbone with Binary Classification Techniques

Swayam Arora
School of Engineering
Manav Rachna University,
Faridabad, Haryana, India
[email protected]

Garima Singh
School of Engineering
Manav Rachna University,
Faridabad, Haryana, India
[email protected]

Varun Kumar
School of Engineering
Manav Rachna University,
Faridabad, Haryana, India
[email protected]

Dr. Ranjna Jain
School of Engineering
Manav Rachna University,
Faridabad, Haryana, India
[email protected]

Abstract - Artificial intelligence, neural networks, and the foundations of deep learning trace their origins to the 1940s and 1950s, and applications of human-like intelligence have been developed worldwide ever since. One such application is the 'Deepfake': the use of digital tools and AI to create a convincingly realistic video of a person. Deepfake is a technique for synthesizing human videos or images based on artificial intelligence. Deepfakes are created by combining and superimposing existing images, audio, or video onto reference media using deep learning, most commonly generative adversarial networks. In this work, we present a deepfake detection approach that utilizes EfficientNet as the core model, augmented with a custom classification head tailored for binary classification. The model is trained on a comprehensive dataset of both authentic and manipulated images and incorporates Test-Time Augmentation (TTA) to enhance prediction reliability. Training is carried out using Binary Cross-Entropy Loss, with optimization handled by the Adam optimizer and a learning rate scheduler. The model achieved a peak accuracy of 94.51%, demonstrating its ability to distinguish real from fake content effectively.

1. Introduction

The world is progressing at a rapid pace. Every individual has access to technology today, which seemed impossible decades back, and this easy accessibility has brought ease to everyday life. Media is a major part of the technology that is widely used for self-expression; videos, audio, ideas, and comments are the primary media used by almost every person. Visual media is a fascinating concept, and it brings with it tools and techniques that make it more useful, appropriate, and expressive. Deepfake is one such tool and concept, widely applied to visual media in the current era of technology.

1.1 Deepfake
A deepfake of a picture or video alters the original into another configuration that looks realistic, making it nearly impossible to tell with the naked eye whether the media has been altered. It has become easy to build high-quality deepfakes, yet almost impossible to detect them with the naked eye. The term 'Deepfake' was coined in 2017 for this kind of synthetic media, when a Reddit moderator created deepfakes and began posting videos that used face-switching techniques to fit celebrities' likenesses. Deepfakes are produced using different AI deep-learning algorithms, broadly divided into replication and detection algorithms.

1.2 Deepfake Detection
Deepfake detection is the process of detecting and analysing visual media that has been altered or modified from its original form, using deep learning.
As deepfake technology was introduced to the world, the misuse of deepfakes grew rapidly. To detect these alterations and recognize authentic media, deepfake detection techniques were developed.
Early deepfake detection techniques primarily focused on identifying irregularities in composited images and videos: irregular eye-blinking patterns, lighting, colour and texture differences, and instabilities in facial expressions. As machine learning advanced, detection techniques evolved to integrate Convolutional Neural Networks (CNNs) to automate the identification of deepfakes.

1.3 Impact of Deepfake

a) The Good:

A deepfake initiative involving soccer icon David Beckham depicted him communicating in nine different languages in a public service announcement aimed at raising awareness about malaria. This innovative application of deepfake technology enabled Beckham's message to connect with a worldwide audience in their own languages, enhancing both engagement and effectiveness.

In film, deepfake technology has facilitated the recreation of performances without the need for extensive re-shoots, conserving both time and resources. A prominent example was *Star Wars: The Rise of Skywalker*, where the late Carrie Fisher was digitally recreated to finish scenes she was unable to complete, paying tribute to her character in the narrative.

In *Furious 7*, filmmakers utilized deepfake technology to finish Paul Walker's scenes after his tragic passing during the film's production. The crew brought in Walker's brothers as stand-ins, then used deepfake and CGI methods to overlay Paul's face onto theirs, achieving a convincing resemblance. This approach allowed for a respectful conclusion to his character's arc, honouring his legacy and offering fans a heartfelt goodbye, and it showcased deepfake's ability to pay tribute to actors posthumously while ensuring continuity in the storyline.

b) The Bad:

During the 2020 Delhi Assembly elections, political parties utilized deepfake technology to modify speeches. The BJP crafted a video featuring party leader Manoj Tiwari, making it seem as if he was addressing audiences in both Haryanvi and English—languages he did not originally speak—to attract a wider range of voters. This sparked worries about political deception and the genuineness of campaign content.

Numerous women in India, including public figures, have fallen prey to non-consensual explicit deepfake videos. These fabricated videos, frequently circulated on social media, are used to bully and slander individuals, resulting in psychological distress and damage to their reputations.

Deepfake videos are also frequently shared on Indian social media platforms to promote misinformation or communal narratives, particularly during periods of political strife. Such deceptive videos can provoke public anger, generate confusion, and incite violence or disorder.

2. Related Works

2.1 Detection using Deep Learning

Comparative Analysis of Deep-Fake Algorithms
Comparison of Deepfake Detection Techniques through Deep Learning

These papers mainly examine the application of cutting-edge deep learning architectures such as CNNs (e.g., VGG19, MobileNetV2, InceptionV3, XceptionNet) and hybrid CNN frameworks (e.g., InceptionResNet v2 integrated with Xception). These models excel at evaluating high-dimensional visual features and maintaining temporal consistency in deepfake videos. They are effective in both individual frame assessments and overall evaluations of videos. While deep learning techniques show potential, they must continually adapt to keep pace with the growing complexity of deepfake content.

2.2 Detection using Machine Learning and Error-Level Analysis

Deepfake detection and classification using error-level analysis

This study utilizes conventional machine learning classifiers, including SVM and KNN, alongside image preprocessing methods such as Error-Level Analysis (ELA) to identify anomalies at the pixel level.
By applying ELA and utilizing deep feature extraction via CNNs such as ResNet18, the detection of digital alterations in images is enhanced, resulting in satisfactory accuracy rates. Nevertheless, these methods may face challenges when dealing with deepfake content that does not exhibit obvious visual irregularities.

2.3 Detection using Statistical and Computational Imaging Approaches

Deep Learning for Deepfakes Creation and Detection

This study explores statistical models such as the Expectation-Maximization (EM) algorithm and second-order attention networks (SAN) to improve feature clarity and diminish noise in facial forgeries. These approaches prioritize enhancing the quality of authentic images, which assists in distinguishing between genuine and artificially generated visuals. Though beneficial for image verification purposes, these methods may struggle with extremely realistic deepfake images, requiring incorporation with more sophisticated deep learning models for effective detection.

2.4 Detection using Hybrid Approaches Combining Multiple Techniques

Deepfake Detection Analyzing Hybrid Dataset Utilizing CNN and SVM

This study introduces a hybrid methodology that integrates CNNs with SVM for the purpose of identifying minor discrepancies between authentic and generated images. By employing a combination of real and synthetic datasets, this approach showcases significant effectiveness by merging the feature extraction strengths of deep learning with the classification accuracy of SVM. Such hybrid approaches effectively tackle the shortcomings of models that rely on a single method, resulting in improved accuracy, particularly when detecting subtle alterations in deepfake content.

2.5 Holistic and Comparative Surveys on Deepfake Detection

a. Deep Insights of Deepfake Technology: A Review
b. The Emergence of Deepfake Technology: A Review
c. Comparative Analysis of Deep-Fake Algorithms
d. Comparison of Deepfake Detection Techniques through Deep Learning

These survey articles explore a diverse array of methodologies, categorize detection methods, and analyse essential datasets for the progress of deepfake detection (e.g., FaceForensics++, Celeb-DF, FFHQ). They emphasize the advantages and disadvantages of techniques such as CNNs, GANs, ELA, and statistical models. The surveys also address the societal and ethical consequences of deepfake technology, highlighting the importance of an integrated approach that merges detection technologies with regulatory measures and public awareness campaigns. These surveys serve as vital resources, directing future research avenues to tackle deepfake issues from both technical and regulatory standpoints.

3. Background

3.1 Overall Model Architecture Flowchart

Integrating all components, the overall architecture for this deepfake detection model can be visualized as follows:

Figure 1. Architecture Flowchart

This flowchart captures the entire data flow, from input to the final classification, showing how each layer and function contributes to accurate deepfake detection. By leveraging EfficientNet-B3's efficiency, GELU's nuanced activation, binary classification's simplicity, and a well-calibrated loss and optimization strategy, this model achieves both accuracy and computational efficiency in identifying deepfakes.

3.2 EfficientNet-B3 Backbone:

EfficientNet-B3 is a convolutional neural network architecture known for its balance of accuracy and computational efficiency. It leverages a compound scaling technique, which adjusts the network's depth, width, and resolution in tandem to achieve optimal performance without overloading computational resources. This balance makes EfficientNet-B3 ideal for deepfake detection, where precision is crucial, but model efficiency is also essential.

Theory:
EfficientNet-B3 belongs to the EfficientNet family, which introduces a unique scaling formula:

depth = α^p, width = β^p, resolution = γ^p

where α, β and γ are constants, and p is a coefficient that dictates scaling. This approach ensures a consistent improvement in accuracy without an exponential increase in the computational load.

In practice, EfficientNet-B3 acts as a powerful feature extractor. Pre-trained on ImageNet, it is adept at identifying intricate image characteristics, capturing textures, edges, and finer details, which are crucial for distinguishing subtle artifacts in deepfake images.

Figure 2. EfficientNet-B3 Flowchart

EfficientNet-B3's pre-trained features lay a strong foundation for detecting fake elements in images, helping the model achieve high performance without sacrificing efficiency.
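
As a concrete illustration (not taken from the paper's code), a backbone of this kind can be instantiated with torchvision's pre-trained EfficientNet-B3 and used purely as a feature extractor; the library choice and input size below are assumptions:

```python
import torch
import torch.nn as nn
from torchvision import models

# Load EfficientNet-B3 pre-trained on ImageNet and keep only the
# convolutional feature extractor (the ImageNet classifier is dropped;
# a custom binary head is attached later).
backbone = models.efficientnet_b3(weights=models.EfficientNet_B3_Weights.IMAGENET1K_V1)
feature_extractor = nn.Sequential(backbone.features, backbone.avgpool, nn.Flatten())

# EfficientNet-B3 is typically fed ~300x300 RGB crops; a dummy batch shows
# the 1536-dimensional feature vector produced per image.
dummy = torch.randn(4, 3, 300, 300)
features = feature_extractor(dummy)
print(features.shape)  # torch.Size([4, 1536])
```
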
3.3 GELU Activation

The Gaussian Error Linear Unit (GELU) activation function plays a key role in the architecture by enabling smooth and probabilistic activation. GELU is defined as:

GELU(x) = x·P(X ≤ x) = (x/2)·[1 + erf(x/√2)]

where P(X ≤ x) represents the probability that a Gaussian random variable X is less than or equal to x. In contrast to the ReLU function, which discards all negative values, GELU is capable of processing negative inputs by adjusting their significance based on their magnitude, which makes it especially beneficial for deepfake detection, where subtle distinctions are crucial.

Theory:
GELU allows the network to account for a broader range of input values, smoothly adjusting activations in dense layers. This smooth transition is essential for tasks like deepfake detection, where distinguishing real and fake images requires a sensitivity to subtle differences in features. GELU's differentiable, smooth curve enables the model to learn complex patterns, enhancing its effectiveness in processing detailed features extracted by EfficientNet-B3.

Figure 3. GELU Activation Flowchart

By applying GELU after the initial feature extraction, the model can better refine complex, high-dimensional data, which is critical for accurate classification.
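
The definition above can be checked numerically; the following sketch (an illustration, assuming PyTorch) implements GELU via the error function and contrasts its treatment of negative inputs with ReLU:

```python
import torch
import torch.nn as nn

# Exact GELU: GELU(x) = x * P(X <= x) = 0.5 * x * (1 + erf(x / sqrt(2)))
def gelu_exact(x: torch.Tensor) -> torch.Tensor:
    return 0.5 * x * (1.0 + torch.erf(x / 2.0 ** 0.5))

x = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])
print(gelu_exact(x))   # small negative inputs are damped, not zeroed out
print(nn.GELU()(x))    # matches PyTorch's built-in (exact) GELU
print(nn.ReLU()(x))    # ReLU discards all negative values entirely
```
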
3.4 Binary Classification Layer:

The Binary Classification Layer functions as the decision-making part of the model, determining whether each image is real or fake by utilizing a sigmoid function. The sigmoid function is expressed as:

σ(x) = 1 / (1 + e^(−x))

and it generates a probability score between 0 and 1. Scores closer to 1 suggest a greater likelihood that the image is counterfeit, whereas scores nearing 0 indicate that it is probably genuine.

Theory:
The binary classification layer produces a single value representing the probability that the image is identified as "fake". By applying the sigmoid function, the model's output remains constrained and straightforward to interpret, enhancing training efficiency and promoting stability in backpropagation.

Figure 4. Binary Classification Flowchart

This concluding layer streamlines the problem into a binary choice, refining the architecture for focused deepfake identification.
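
A minimal sketch of such a head in PyTorch is shown below; the hidden width, dropout rate, and layer arrangement are illustrative assumptions rather than the paper's exact configuration:

```python
import torch
import torch.nn as nn

class DeepfakeHead(nn.Module):
    """Binary classification head placed on top of the 1536-d
    EfficientNet-B3 features (layer sizes here are illustrative)."""

    def __init__(self, in_features: int = 1536, hidden: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_features, hidden),
            nn.GELU(),            # smooth activation on the dense features
            nn.Dropout(0.3),
            nn.Linear(hidden, 1), # single logit for the "fake" class
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # Sigmoid maps the logit to a probability in (0, 1);
        # values near 1 indicate "fake", values near 0 indicate "real".
        return torch.sigmoid(self.net(feats))
```
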
3.5 Loss Function and Optimizer:

The model employs Binary Cross-Entropy Loss (BCELoss) as its loss function, along with the Adam optimizer, paired with a learning rate scheduler to enhance the training process.

Theory (BCELoss): Binary Cross-Entropy Loss quantifies the difference between the predicted probability and the actual label (0 for real and 1 for fake), penalizing the model more heavily for larger discrepancies. The formula is:

BCELoss = −(1/N) Σᵢ₌₁ᴺ [yᵢ log(y′ᵢ) + (1 − yᵢ) log(1 − y′ᵢ)]

where the true label is represented by yᵢ, and y′ᵢ represents the predicted probability. By penalizing incorrect predictions, BCELoss promotes high accuracy in distinguishing real images from fake ones.

Theory (Adam Optimizer): The Adam optimizer is popular for its adaptive learning rate, which expedites convergence and reduces training time. By adjusting the learning rate dynamically based on past gradients, Adam allows the model to make subtle adjustments to parameters, improving its effectiveness in complex tasks like deepfake detection.

Learning Rate Scheduler: The learning rate scheduler fine-tunes the learning rate over time, gradually reducing it if improvements slow down. This approach helps avoid overfitting and stabilizes the model in later stages of training, allowing it to achieve a well-optimized final solution.

Figure 5. Loss Function and Optimizer Flowchart

The combination of BCELoss, Adam, and a learning rate scheduler supports the model in achieving high accuracy while maintaining efficiency, fine-tuning its ability to differentiate real from fake images.
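
A compact PyTorch sketch of this training setup is shown below; the learning rate, scheduler settings, and the choice of ReduceLROnPlateau are illustrative assumptions, not values reported in the paper:

```python
import torch
import torch.nn as nn
from torchvision import models

# EfficientNet-B3 with its ImageNet classifier swapped for a single
# sigmoid-activated output (mirroring the head sketched in Section 3.4).
model = models.efficientnet_b3(weights="IMAGENET1K_V1")
model.classifier = nn.Sequential(nn.Dropout(0.3), nn.Linear(1536, 1), nn.Sigmoid())

criterion = nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
# Reduce the learning rate when the monitored validation loss stops improving.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=2
)

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    optimizer.zero_grad()
    probs = model(images).squeeze(1)          # probabilities in (0, 1)
    loss = criterion(probs, labels.float())   # labels: 0 = real, 1 = fake
    loss.backward()
    optimizer.step()
    return loss.item()

# After each validation epoch: scheduler.step(validation_loss)
```
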
4. Proposed Methodology

4.1 Model Architecture and Design

Our proposed deepfake detection framework is built around the EfficientNet-B3 model as a core backbone for feature extraction, chosen for its high efficiency and strong performance on image classification tasks. EfficientNet-B3's compound scaling method, which balances depth, width, and resolution, allows for optimal performance without compromising computational efficiency. Building on this foundation, we add a binary classification layer that distinguishes between authentic and manipulated media, making the model highly adaptable and precise in recognizing deepfake artifacts. To enhance the model's sensitivity to subtle manipulations, we incorporate attention mechanisms within the architecture. This allows the model to prioritize regions that commonly exhibit deepfake anomalies, such as facial landmarks, edges, and textures. Integrating attention improves the model's ability to focus on discriminative features that are crucial for detecting even advanced deepfake techniques.

4.2 Data Preparation and Augmentation

Given the variability in deepfake quality, the preprocessing stage is designed to standardize input media while preserving crucial detection cues. Input images and frames are resized to match EfficientNet-B3's input dimensions, followed by color normalization to standardize pixel values across the dataset. Additionally, data augmentation techniques—including rotation, flipping, brightness adjustment, and cropping—are employed to create a diverse training set, increasing the model's robustness against variations in lighting, orientation, and perspective.

Dataset: https://www.kaggle.com/datasets/peilwang/deepfake

In this setup, we introduce a multi-resolution training approach, where the model processes inputs at varying resolutions to improve resilience against adversarial attacks and enhance feature generalization. By training with multiple resolutions, the model can better learn both high-level contextual information and finer details that are indicative of deepfake manipulations.
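
The sketch below illustrates one way to express this preprocessing and augmentation pipeline, together with a simple test-time augmentation (TTA) routine of the kind mentioned in the abstract; all transform parameters, and the two-view TTA scheme, are illustrative assumptions:

```python
import torch
from torchvision import transforms

# Training-time augmentation: resize toward the B3 input size, then random
# rotation, flips, brightness jitter and cropping, followed by ImageNet
# channel normalization (parameter values are illustrative).
train_tf = transforms.Compose([
    transforms.Resize((320, 320)),
    transforms.RandomRotation(10),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.2),
    transforms.RandomCrop(300),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def predict_tta(model, image_tensor: torch.Tensor) -> float:
    """Simple TTA: average the model's fake-probability over a few
    deterministic views of the same (already normalized) image."""
    views = [image_tensor, torch.flip(image_tensor, dims=[-1])]  # original + h-flip
    with torch.no_grad():
        probs = [model(v.unsqueeze(0)).item() for v in views]
    return sum(probs) / len(probs)
```
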
4.3 Training Strategy and Optimization Techniques

To maximize the model's performance, we adopt a multi-phase training strategy involving transfer learning, fine-tuning, and iterative optimization. Initially, the EfficientNet-B3 model is pre-trained on a large-scale dataset (e.g., ImageNet), providing a robust feature extraction foundation. The final layers of the model are then customized for binary classification, specifically adapted to deepfake detection.

The training process utilizes binary cross-entropy loss optimized via the Adam optimizer, with an adaptive learning rate scheduler that adjusts learning rates based on validation performance. Furthermore, early stopping is employed to prevent overfitting, alongside dropout and batch normalization layers, which enhance stability and robustness.

To further refine model performance, we implement techniques such as focal loss, which adjusts the weighting of samples to prioritize challenging, hard-to-detect examples. Focal loss helps to balance class distributions and enhances the model's ability to learn from subtle, non-obvious features.
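
A common formulation of binary focal loss is sketched below for illustration; the alpha and gamma values are standard defaults, not figures taken from the paper:

```python
import torch

def binary_focal_loss(probs: torch.Tensor, targets: torch.Tensor,
                      alpha: float = 0.25, gamma: float = 2.0) -> torch.Tensor:
    """Binary focal loss on sigmoid probabilities: down-weights easy examples
    so training concentrates on hard, subtle manipulations."""
    targets = targets.float()
    probs = probs.clamp(1e-6, 1.0 - 1e-6)
    pt = targets * probs + (1.0 - targets) * (1.0 - probs)        # prob. of the true class
    alpha_t = targets * alpha + (1.0 - targets) * (1.0 - alpha)   # class-balance weight
    return (-alpha_t * (1.0 - pt) ** gamma * torch.log(pt)).mean()
```
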
4.4 Implementing Attention-Guided Binary Classification Layer

The binary classification layer is designed to classify inputs as either authentic or deepfake, equipped with a dense layer and sigmoid activation. Here, we integrate attention-guided mechanisms, which weight the input features by relevance, allowing the model to emphasize areas that are more likely to contain deepfake manipulations. The attention mechanism directs the classification layer's focus to potential regions of manipulation, enhancing detection precision.
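
One simple way to realize such an attention-guided head is a learned feature gate applied before the sigmoid classifier; the sketch below is an illustrative design under that assumption, not the paper's exact mechanism:

```python
import torch
import torch.nn as nn

class AttentionGuidedHead(nn.Module):
    """Feature-attention variant of the binary head: a learned gate weights
    each backbone feature by relevance before classification."""

    def __init__(self, in_features: int = 1536):
        super().__init__()
        self.attention = nn.Sequential(
            nn.Linear(in_features, in_features),
            nn.Sigmoid(),              # per-feature relevance weights in (0, 1)
        )
        self.classifier = nn.Sequential(
            nn.Linear(in_features, 1),
            nn.Sigmoid(),              # probability that the input is fake
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        weighted = feats * self.attention(feats)  # emphasize discriminative features
        return self.classifier(weighted)
```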

4.5 Cross-Modal Feature Fusion for Enhanced Detection

Our approach incorporates a cross-modal feature fusion step, where image and metadata features are combined to enhance deepfake detection accuracy. By including metadata cues such as frame timestamps, resolution, and other contextual information, we bolster the model's ability to capture deeper patterns associated with deepfake media. Cross-modal fusion offers a multi-dimensional perspective on the input, increasing the model's accuracy and adaptability across different manipulation styles.
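
A possible realization is late fusion by concatenation, sketched below; the metadata dimensionality and layer sizes are assumptions made for illustration:

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Concatenates image features with a small metadata vector (e.g. frame
    timestamp, resolution) before binary classification."""

    def __init__(self, img_dim: int = 1536, meta_dim: int = 8, hidden: int = 256):
        super().__init__()
        self.meta_encoder = nn.Sequential(nn.Linear(meta_dim, 32), nn.GELU())
        self.classifier = nn.Sequential(
            nn.Linear(img_dim + 32, hidden),
            nn.GELU(),
            nn.Linear(hidden, 1),
            nn.Sigmoid(),   # fused fake-probability
        )

    def forward(self, img_feats: torch.Tensor, metadata: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([img_feats, self.meta_encoder(metadata)], dim=1)
        return self.classifier(fused)
```
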
4.6 Multi-Level Ensemble Strategy for Robustness

To further increase the robustness of our detection framework, we integrate a multi-level ensemble approach, where predictions from multiple instances of the EfficientNet-B3 model trained on different data subsets are combined. This ensemble strategy aggregates predictions from each model instance, smoothing out inconsistencies and reducing the impact of outliers or noise in the data.

This multi-level ensemble strategy also aids in generalizing the model's performance, making it more resilient to variations across datasets and deepfake generation techniques. The ensemble model is calibrated using a validation dataset to achieve optimal weighting, ensuring consistent and reliable predictions.
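
In code, such an aggregation can be as simple as a weighted average of per-model fake probabilities, as in the sketch below; the uniform-weight default and the weighting scheme are assumptions:

```python
import torch

def ensemble_predict(models, image_batch: torch.Tensor, weights=None) -> torch.Tensor:
    """Combine fake-probabilities from several EfficientNet-B3 instances
    trained on different data subsets. In practice the weights would be
    calibrated on a held-out validation set."""
    if weights is None:
        weights = [1.0 / len(models)] * len(models)
    with torch.no_grad():
        probs = [w * m(image_batch) for w, m in zip(weights, models)]
    return torch.stack(probs).sum(dim=0)   # combined probability per image
```
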
5. Results

5.1 Model Performance Metrics

In this study, our proposed model—built on the EfficientNet-B3 architecture with added attention mechanisms—demonstrated strong validation results, achieving an accuracy of 94.51%. This high accuracy is a testament to the model's ability to differentiate genuine content from deepfakes, capturing intricate distinctions that are critical for real-world applications. Additionally, the model's Area Under the Curve (AUC) score reached 93.81%, which shows that it has a reliable capacity to classify deepfake and non-deepfake media correctly across varying thresholds. The model also recorded a true positive rate (TPR) at a false positive rate (FPR) of 47.25%. While this result indicates a strong overall performance, it highlights some areas where the model could be improved, specifically in reducing false positives. Nonetheless, the integration of attention-focused features allowed the model to accurately identify regions that commonly exhibit signs of manipulation—particularly eyes, lips, and facial textures, where subtle changes often occur in deepfake media.

Figure 5. Training and Validation Data

5.2 ROC Curve and AUC Analysis

The ROC curve (illustrated in Figure X) reveals the model's performance across various decision thresholds, with the AUC of 93.81% underscoring its reliable ability to differentiate between authentic and manipulated media. This ROC curve provides a visual of how the model balances between true positives and false positives, with the high AUC value reinforcing that our approach is both sensitive and specific—qualities that are essential for dependable deepfake detection.

Figure 6. ROC Curve

Figure 7. Confusion Matrix
5.3 Comparison with Current Methods
To assess the significance of our model, we benchmarked it against other prominent deepfake detection techniques. Our model's accuracy of 94.51% and AUC of 93.81% compare favorably, achieving results that are competitive with or superior to many standard models, such as those based on CNNs or RNNs. By leveraging the EfficientNet-B3 backbone along with a focused attention mechanism, the model achieved approximately 2-3% higher accuracy than several traditional approaches. This improvement validates the importance of our chosen architecture and attention layer, both of which enhance the model's ability to detect nuanced features that might otherwise be overlooked. Table Y provides a summary of this comparative performance, highlighting the superiority of our method in terms of both accuracy and detection consistency.

5.4 Analysis of Misclassifications

An examination of the model's misclassifications highlights some specific challenges. These errors tended to occur in cases with lower-resolution or highly compressed media, where characteristic deepfake signals are not as clearly defined. Some instances of false negatives appeared in cases where manipulations were highly subtle or well-disguised. The attention-guided layer contributed to reducing these errors by emphasizing critical facial regions, though there remains room for improvement to further lower the rate of misclassifications in ambiguous cases.

5.6 Impact of Cross-Modal Fusion

The integration of cross-modal fusion, which combines metadata with visual data, was instrumental in improving the model's accuracy, particularly in cases where data quality posed a challenge. In tests with lower-resolution or ambiguous samples, the metadata features helped the model achieve more accurate predictions, raising its true positive rate even in difficult cases. This multi-modal approach contributed to a notable improvement in overall performance, making the model adaptable and reliable across various media types.

6. Conclusion

Deepfake technology, while innovative, poses substantial challenges due to its capacity to create highly realistic manipulated media. Addressing this, our study proposes an efficient detection approach that integrates an EfficientNet-B3 backbone, GELU activation, binary classification layer, and Binary Cross-Entropy Loss optimized by the Adam algorithm with a learning rate scheduler.

The EfficientNet-B3 backbone offers a balanced architecture for efficient and accurate feature extraction, leveraging compound scaling. This is complemented by GELU activation, which enhances nuanced feature representation, critical for identifying subtle artifacts in manipulated images.

The binary classification layer simplifies detection by outputting a probability that an image is real or fake, aiding interpretability and precision. For optimization, Binary Cross-Entropy Loss with the Adam optimizer and a learning rate scheduler ensures adaptive learning, reducing overfitting risks while improving classification accuracy.

In summary, this method effectively combines a feature-rich architecture, adaptive optimization, and precise classification, providing a robust solution to distinguish genuine media from deepfakes and addressing the pressing need for reliable detection technology in the digital age.

7. References

[1] Mahmud, Bahar & Sharmin, Afsana. (2020). Deep Insights of Deepfake Technology: A Review.

[2] Rafique, Rimsha et al. "Deep fake detection and classification using error-level analysis and deep learning." Scientific Reports, vol. 13, no. 1, 7422, 8 May 2023, doi:10.1038/s41598-023-34629-3.

[3] Westerlund, Mika. (2019). The Emergence of Deepfake Technology: A Review. Technology Innovation Management Review. 9. 39-52. 10.22215/timreview/1282.

[4] Nguyen, Thanh & Hung, Nguyen & Nguyen, Tien & Nguyen, Duc & Huynh-The, Thien & Nahavandi, Saeid & Nguyen, Tam & Pham, Viet & Nguyen, Cuong M. (2022). Deep learning for deepfakes creation and detection: A survey. Computer Vision and Image Understanding. 223. 103525. 10.1016/j.cviu.2022.103525.

[5] Sontakke, Nikhil & Utekar, Sejal & Rastogi, Shivansh & Sonawane, Shriraj. (2023). Comparative Analysis of Deep-Fake Algorithms. 10.48550/arXiv.2309.03295.

[6] Khatri, Nishika & Borar, Varun & Garg, Rakesh. (2023). A Comparative Study: Deepfake Detection Using Deep-learning. 1-5. 10.1109/Confluence56041.2023.10048888.

[7] Heidari, Arash & Navimipour, Nima & Dag, Hasan & Unal, Mehmet. (2023). Deepfake detection using deep learning methods: A systematic and comprehensive review. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery. 14. 10.1002/widm.1520.

[8] Mukta, Saddam & Ahmad, Jubaer & Raiaan, Mohaimenul & Islam, Salekul & Azam, Sami & Ali, Mohammed Eunus & Jonkman, Mirjam. (2023). An Investigation of the Effectiveness of Deepfake Models and Tools. Journal of Sensor and Actuator Networks. 12. 61. 10.3390/jsan12040061.

[9] R. Kumar, P. Jaiswal, and S. Jaiswal, "Deep insights of deepfake technology: A review," ResearchGate, 2021. [Online]. Available: https://doi.org/10.3390/jsan12040061

[10] S. T. Ikram, P. V, S. Chambial, D. Sood, and A. V, "A Performance Enhancement of Deepfake Video Detection through the use of a Hybrid CNN Deep Learning Model", IJECES, vol. 14, no. 2, pp. 169-178, Feb. 2023.

[11] Ruben Tolosana, Ruben Vera-Rodriguez, Julian Fierrez, Aythami Morales, Javier Ortega-Garcia, Deepfakes and beyond: A survey of face manipulation and fake detection, Information Fusion, Volume 64, 2020, Pages 131-148, ISSN 1566-2535, https://doi.org/10.1016/j.inffus.2020.06.014.

[12] Ebrima Hydara, Masato Kikuchi, Tadachika Ozono, "Empirical Assessment of Deepfake Detection: Advancing Judicial Evidence Verification Through Artificial Intelligence", IEEE Access, vol. 12, pp. 151188-151203, 2024.

[13] Amaan M. Kalemullah, Prakash P, Sakthivel V, "Deepfake Classification For Human Faces using Custom CNN", 2024 7th International Conference on Circuit Power and Computing Technologies (ICCPCT), vol. 1, pp. 744-750, 2024.

[14] Groh, M., Sankaranarayanan, A., Singh, N. et al. Human detection of political speech deepfakes across transcripts, audio, and video. Nat Commun 15, 7629 (2024). https://doi.org/10.1038/s41467-024-51998-z

[15] Y. Zhai, T. Luan, D. Doermann, and J. Yuan, "Towards generic image manipulation detection with weakly-supervised self-consistency learning," in ICCV, 2023.

[16] Z. Sun, Y. Chen, and S. Xiong, "Ssat: A symmetric semantic-aware transformer network for makeup transfer and removal," in AAAI, 2022.

[17] R. Jain, K. K. Singh, M. Hemani, J. Lu, M. Sarkar, D. Ceylan, and B. Krishnamurthy, "Vgflow: Visibility guided flow network for human reposing," in CVPR, 2023.

[18] B. Lei, K. Yu, M. Feng, M. Cui, and X. Xie, "Diffusiongan3d: Boosting text-guided 3d generation and domain adaption by combining 3d gans and diffusion priors," arXiv, 2023.

[19] C. Feng, Z. Chen, and A. Owens, "Self-supervised video forensics by audio-visual anomaly detection," in CVPR, 2023.

[20] D. Cozzolino, A. Pianese, M. Nießner, and L. Verdoliva, "Audio-visual person-of-interest deepfake detection," in CVPR, 2023.

[21] A. Ciamarra, R. Caldelli, F. Becattini, L. Seidenari, and A. Del Bimbo, "Deepfake detection by exploiting surface anomalies: the surfake approach," in WACV, 2024.

[22] T. Wang and K. P. Chow, "Noise based deepfake detection via multi-head relative-interaction," in AAAI, 2023.

[23] Z. Guo, Z. Jia, L. Wang, D. Wang, G. Yang, and N. Kasabov, "Constructing new backbone networks via space-frequency interactive convolution for deepfake detection," TIFS, 2023.

[24] S. Aneja, J. Thies, A. Dai, and M. Nießner, "Clipface: Text-guided editing of textured 3d morphable models," in SIGGRAPH, 2023.

[25] P. Zhou, L. Xie, B. Ni, and Q. Tian, "Cips-3d++: End-to-end real-time high-resolution 3d-aware gans for gan inversion and stylization," TPAMI, 2023.
