
UEC Int’l Mini-Conference No.54                                                               73









              Detection Model for Audio Deepfakes Using Self-Supervised Learning

                                    to Prevent Identity Spoofing Attacks


                                 Maria Vianney MEDINA*1 and Minami YASUHIRO2

                                   1 UEC Exchange Study Program (JUSST Program)
                                            2 Yasuhiro Minami’s Department
                                The University of Electro-Communications, Tokyo, Japan



             Keywords: Voice deepfake, Self-Supervised Learning (SSL), Spoofing attacks, Synthetic audio, Machine learning.



                                                        Abstract
                    The advancement of generative speech models has enabled the creation of highly convincing synthetic
                 audio, known as voice deepfakes. These artificially generated utterances pose significant risks to security
                 systems, biometric authentication, and the credibility of digital communications. This paper presents
                 a method for detecting voice deepfakes by leveraging HuBERT, a self-supervised speech representation
                 model. The approach involves extracting latent acoustic embeddings from raw audio using HuBERT,
                 followed by classification through a neural network. The model is trained to distinguish bona fide speech
                 from spoofed audio, exploiting subtle inconsistencies introduced by generative processes. The system is
                 evaluated on benchmark datasets and compared against traditional handcrafted features such as MFCC
                 and CQCC, as well as alternative neural back-ends. Experimental results are expected to demonstrate
                 the effectiveness of HuBERT representations in detecting various types of audio spoofing attacks and
                 highlight the potential of self-supervised learning for secure and generalizable deepfake detection.
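
The two-stage pipeline described in the abstract (frame-level embeddings, then a neural classifier) can be illustrated with a minimal sketch. The abstract does not specify the pooling strategy, the classifier architecture, or the HuBERT variant, so everything below is an assumption made for illustration: mean pooling over time, a one-hidden-layer network with untrained random weights, and a 768-dimensional embedding as produced by HuBERT Base. Random vectors stand in for real HuBERT outputs so that the example runs with the Python standard library only; `SpoofClassifier`, `mean_pool`, and the other names are hypothetical.

```python
import math
import random

EMBED_DIM = 768  # assumption: frame-embedding width of HuBERT Base


def mean_pool(frames):
    """Average frame-level embeddings into one utterance-level vector."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def linear(x, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(logits):
    """Numerically stable softmax over a short list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


class SpoofClassifier:
    """Hypothetical untrained head: pooled embedding -> hidden -> 2 classes."""

    def __init__(self, in_dim, hidden_dim=16, seed=0):
        rng = random.Random(seed)
        s1 = 1.0 / math.sqrt(in_dim)
        s2 = 1.0 / math.sqrt(hidden_dim)
        self.w1 = [[rng.uniform(-s1, s1) for _ in range(in_dim)]
                   for _ in range(hidden_dim)]
        self.b1 = [0.0] * hidden_dim
        self.w2 = [[rng.uniform(-s2, s2) for _ in range(hidden_dim)]
                   for _ in range(2)]
        self.b2 = [0.0, 0.0]

    def predict_proba(self, frames):
        """Return [P(bona fide), P(spoof)] for one utterance."""
        pooled = mean_pool(frames)
        hidden = relu(linear(pooled, self.w1, self.b1))
        return softmax(linear(hidden, self.w2, self.b2))


# Stand-in for HuBERT output: 50 frames (about one second of audio at a
# 20 ms frame stride), each a random 768-dimensional vector.
rng = random.Random(1)
frames = [[rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)] for _ in range(50)]

probs = SpoofClassifier(EMBED_DIM).predict_proba(frames)
print(probs)  # two probabilities that sum to 1; meaningless until trained
```

In practice the random `frames` would be replaced by the hidden states of a pretrained HuBERT model, and the weights would be learned from labeled bona fide and spoofed utterances; neither step is shown here.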































               * The author is supported by a JASSO Scholarship.