UEC Int’l Mini-Conference No.54
Detection Model for Audio Deepfakes
Using Self-Supervised Learning to Prevent
Identity Spoofing Attacks
¹Medina Castro Maria Vianney, ²Prof. Minami Yasuhiro
¹Instituto Politécnico Nacional, Mexico, UEC Exchange Study Program JUSST,
Department of Computer and Network Engineering, ²Graduate School of Informatics and Engineering
The University of Electro-Communications, Japan
INTRODUCTION
Voice cloning techniques based on artificial intelligence have
advanced significantly, enabling the generation of synthetic
audio that is nearly indistinguishable from real human
voices. These artificial voices, known as audio deepfakes,
pose a growing threat to digital security and the authenticity
of communications, as they can be used to carry out identity
spoofing, manipulate conversations, or spread
misinformation.
METHODOLOGY
Fig. 3 Visualization of the frequency spectrum over time for
a real audio signal (a) and a fake one (b).
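Spectrograms like those in Fig. 3 visualize signal energy per frequency band over time and can be computed with a short-time Fourier transform. A minimal NumPy sketch follows; the frame size, hop length, and test tone are illustrative choices, not parameters taken from the poster.

```python
import numpy as np

def spectrogram(signal, n_fft=512, hop=128):
    """Magnitude spectrogram: windowed frames -> real FFT per frame."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    # Shape: (n_frames, n_fft // 2 + 1), one row per time step
    return np.abs(np.fft.rfft(frames, axis=1))

# Example input: one second of a 1 kHz tone at a 16 kHz sampling rate
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 1000 * t)

S = spectrogram(tone)
peak_bin = S.mean(axis=0).argmax()      # strongest frequency bin
peak_hz = peak_bin * sr / 512           # convert bin index to Hz
```

Plotting `S` (log-scaled) over time and frequency yields the kind of image shown in Fig. 3, where real and synthetic audio can exhibit different spectral patterns.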
EXPERIMENTAL RESULTS AND CONCLUSIONS
The system correctly distinguished between
genuine and synthetic audio with high accuracy.
Representations extracted with HuBERT were used
for classification.
The results achieved an F1-score of up to 0.905 and
showed a low error rate.
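The F1-score is the harmonic mean of precision and recall, computed from the confusion-matrix counts. The counts below are hypothetical, chosen only so the example reproduces the reported value of 0.905; they are not the poster's actual results.

```python
def f1_score(tp, fp, fn):
    """F1 from confusion-matrix counts: true/false positives, false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts: 181 true positives, 19 false positives, 19 false negatives
f1 = f1_score(181, 19, 19)  # precision = recall = 0.905, so F1 = 0.905
```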
Fig. 1 Process of extracting features from an audio signal
using HuBERT.
The figure shows the processing flow of an audio signal, up to
obtaining embeddings with a CNN encoder and a
transformer.
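The poster does not spell out how the frame-level embeddings from HuBERT become a single fixed-size input for the classifier; a common choice is temporal mean pooling, sketched here with random numbers standing in for real model output (HuBERT Base produces 768-dimensional vectors, roughly one per 20 ms of audio).

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for HuBERT output on one utterance: T frames x D dimensions.
# Values are random placeholders, not real model features.
T, D = 122, 768
frame_embeddings = rng.normal(size=(T, D))

# Mean pooling over time collapses a variable-length sequence of frames
# into one fixed-size utterance vector suitable for a classifier.
utterance_vec = frame_embeddings.mean(axis=0)
```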
Fig. 4 Representation of embeddings
Fig. 5 Classifier confusion matrix
Table 1. Results for SSL feature selection
Fig. 2 Process of classifying audio characteristics to determine
their authenticity.
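The classifier used in Fig. 2 is not specified in this excerpt. As a hedged stand-in, the sketch below uses a nearest-centroid rule on toy two-dimensional "embeddings" to illustrate the real-versus-fake decision; the data, dimensions, and decision rule are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy embeddings clustered around two class centroids (real vs. fake);
# real HuBERT embeddings would be much higher-dimensional.
real = rng.normal(loc=[+1.0, +1.0], scale=0.3, size=(50, 2))
fake = rng.normal(loc=[-1.0, -1.0], scale=0.3, size=(50, 2))

def nearest_centroid(x, c_real, c_fake):
    """Label 0 = real, 1 = fake, by Euclidean distance to each centroid."""
    return 0 if np.linalg.norm(x - c_real) <= np.linalg.norm(x - c_fake) else 1

c_real, c_fake = real.mean(axis=0), fake.mean(axis=0)
preds = [nearest_centroid(x, c_real, c_fake) for x in np.vstack([real, fake])]

# Accuracy on the toy data (first 50 samples are real, last 50 are fake)
acc = (sum(p == 0 for p in preds[:50]) + sum(p == 1 for p in preds[50:])) / 100
```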
As future work, the model will be evaluated in various
environments, different languages, and more databases.

References
[1] Abdeldayem, M. (2024). The Fake-or-Real (FoR) Dataset (deepfake audio) [Data set].