Talluri, Tarun Kumar (2024) A security application for social media and web platforms to identify the sound based deepfakes using signal processing infused deep learning framework. Masters thesis, Dublin, National College of Ireland.
Available files: PDF (Master of Science thesis, 1MB); PDF (Configuration Manual, 617kB).
Abstract
Deepfake audio poses a severe threat to digital communication, since the AI used to create fakes can imitate voices and intonation with a high degree of accuracy. This research addresses current gaps in detection frameworks by combining signal processing with deep learning as a more efficient methodology. Mel-Frequency Cepstral Coefficients (MFCCs) and voice-quality metrics (jitter and shimmer) are fed into convolutional (CNN) and recurrent (LSTM) neural networks, enabling the framework to detect both temporal and spectral changes in the audio. The study compares multiple models across datasets: XGBoost performs best on the clean ‘In-the-Wild’ dataset with 99.20% accuracy, while CNN+LSTM achieves 69% precision on the noisy Fake-or-Real dataset. Preprocessing and the integration of multiple feature modalities improve detection performance and contribute to scalability and generalization against novel deepfake generation algorithms. Near real-time deployment through an API demonstrates the practical relevance of the framework. This work lays the groundwork for the adaptive, scalable audio deepfake detection systems needed to maintain trust and security in digital society.
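The abstract centres on MFCC features as input to the detection models. As a rough illustration of what that feature-extraction step involves (not the author's actual pipeline, whose parameters and tooling are not given here), the following is a minimal NumPy sketch of MFCC computation: pre-emphasis, windowed framing, power spectrum, a mel filterbank, log compression, and a DCT. All parameter values are illustrative defaults, and the jitter/shimmer features mentioned in the abstract are omitted.

```python
import numpy as np
from scipy.fftpack import dct

def mfcc(signal, sr=16000, n_fft=512, hop=256, n_mels=26, n_ceps=13):
    """Minimal MFCC sketch; parameters are illustrative, not the thesis's."""
    # Pre-emphasis to boost high frequencies
    sig = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Frame the signal and apply a Hamming window
    n_frames = 1 + max(0, (len(sig) - n_fft) // hop)
    frames = np.stack([sig[i * hop: i * hop + n_fft] for i in range(n_frames)])
    frames *= np.hamming(n_fft)
    # Per-frame power spectrum
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular mel filterbank spanning 0 .. sr/2
    def hz2mel(f): return 2595 * np.log10(1 + f / 700)
    def mel2hz(m): return 700 * (10 ** (m / 2595) - 1)
    mel_pts = np.linspace(hz2mel(0), hz2mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel2hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    # Log mel energies, then DCT to decorrelate -> cepstral coefficients
    feats = np.log(power @ fbank.T + 1e-10)
    return dct(feats, type=2, axis=1, norm='ortho')[:, :n_ceps]
```

The resulting (frames × coefficients) matrix is the kind of 2-D feature map that a CNN can scan for spectral artefacts and an LSTM can scan frame-by-frame for temporal inconsistencies, which is the division of labour the abstract describes.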