Unmasking Deepfakes: Leveraging Augmentations and Features Variability for Deepfake Speech Detection

Rimon, Inbal; Gal, Oren; Permuter, Haim

arXiv:2501.05545 (cs)

[Submitted on 9 Jan 2025 (v1), last revised 13 Nov 2025 (this version, v2)]

Abstract: Deepfake speech detection presents a growing challenge as generative audio technologies continue to advance. We propose a hybrid training framework that advances detection performance through novel augmentation strategies. First, we introduce a dual-stage masking approach that operates both at the spectrogram level (MaskedSpec) and within the latent feature space (MaskedFeature), providing complementary regularization that improves tolerance to localized distortions and enhances generalization. Second, we introduce a compression-aware strategy during self-supervised pretraining to increase variability in low-resource scenarios while preserving the integrity of learned representations, thereby improving the suitability of pretrained features for deepfake detection. The framework integrates a learnable self-supervised feature extractor with a ResNet classification head in a unified training pipeline, enabling joint adaptation of acoustic representations and discriminative patterns. On the ASVspoof5 Challenge (Track 1), the system achieves state-of-the-art results with an Equal Error Rate (EER) of 4.08% under closed conditions, further reduced to 2.71% through fusion of models with diverse pretrained feature extractors. When trained on ASVspoof2019, our system obtains leading performance on the ASVspoof2019 evaluation set (0.18% EER) and the ASVspoof2021 DF task (2.92% EER).
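The results above are reported as Equal Error Rate, the operating point at which the false acceptance rate (spoofed speech accepted as genuine) equals the false rejection rate (genuine speech rejected). As a minimal illustration of the metric only, and not the ASVspoof scoring tooling or the authors' code, the sketch below approximates EER by sweeping every observed score as a threshold. The function name `equal_error_rate`, the toy scores, and the convention that a higher score means "more likely bona fide" are all assumptions made for this example.

```python
def equal_error_rate(bona_fide_scores, spoof_scores):
    """Approximate the EER by trying every observed score as a threshold.

    Assumption: a higher score means the utterance looks more bona fide,
    and an utterance is accepted when its score is >= the threshold.
    """
    thresholds = sorted(set(bona_fide_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False rejection rate: bona fide utterances scored below the threshold.
        frr = sum(s < t for s in bona_fide_scores) / len(bona_fide_scores)
        # False acceptance rate: spoofed utterances scored at or above it.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest,
            # and report their average as the EER estimate.
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy scores (hypothetical): one bona fide and one spoof utterance overlap.
bona_fide = [0.9, 0.8, 0.7, 0.4]
spoof = [0.1, 0.2, 0.3, 0.6]
print(f"EER = {equal_error_rate(bona_fide, spoof):.2%}")  # EER = 25.00%
```

With finite score lists the two error rates may never be exactly equal, so this sketch averages them at the closest threshold; official evaluation scripts may interpolate differently.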

Subjects:	Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2501.05545 [cs.SD]
	(or arXiv:2501.05545v2 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2501.05545

Submission history

From: Inbal Rimon
[v1] Thu, 9 Jan 2025 19:31:10 UTC (558 KB)
[v2] Thu, 13 Nov 2025 09:49:59 UTC (388 KB)
