Enhancing Deepfake Detection with MultiModal Transformers

Hombal, Sahana

Hombal, Sahana (2024) Enhancing Deepfake Detection with MultiModal Transformers. Masters thesis, Dublin, National College of Ireland.

Abstract

The recent development of deepfake technology has made it harder to verify the authenticity of digital content, which matters both for preventing manipulation and for cybersecurity. Sophisticated multi-modal false information is difficult to identify because current detection methods tend to rely on a single modality, such as visual or audio data alone. This research addresses that limitation by proposing a multi-modal deepfake detection system that adopts transformer architectures to analyze both image and audio data. The proposed system is designed to improve the accuracy and efficiency of detection and to offer a more complete solution for detecting manipulated media content. For image-based detection, the study used the Deepfake and Real Images Dataset; for audio-based detection, it used the Fake-or-Real Dataset. Real and fake images were classified using ResNet50, VGG16, MobileNetV2, and InceptionV3 with adjusted layers, and a combined convolutional and recurrent model (CNN-LSTM) was designed for the audio data. Data enhancement techniques, normalization, and, for the audio corpus, spectrogram generation were applied to the training and testing data to improve accuracy. Model performance was measured using accuracy, precision, recall, and F1-score to give a thorough assessment. The results confirm the effectiveness of the multi-modal system, with the ResNet50 model reaching 94.5% accuracy for image detection and the CNN-LSTM reaching a 91.4% F1-score for audio detection. These results indicate that, by combining spatial and temporal features, the method is effective at identifying subtle artifacts across media. However, the system has difficulty handling distortions and real-world conditions. This work sets a new state of the art in deepfake detection, with important implications for media authentication, cybersecurity, and countering fake news.
Future work should focus on making the system perform better in real-world environments and on integrating additional modalities to strengthen detection.
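
The four evaluation metrics named in the abstract can be illustrated with a minimal sketch. The function name `binary_metrics`, the label convention (1 = fake, 0 = real), and the toy labels are assumptions made for illustration; they are not taken from the thesis, and the numbers they produce are unrelated to its reported results.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                   positive: int = 1) -> BinaryMetrics:
    """Compute accuracy, precision, recall and F1 for a binary classifier.

    `positive` is the label treated as the positive class
    (assumed here to mean "fake"; hypothetical convention).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    # Count the four cells of the confusion matrix.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Fall back to 0.0 when a denominator is zero (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return BinaryMetrics(accuracy, precision, recall, f1)


if __name__ == "__main__":
    # Toy labels for illustration only (1 = fake, 0 = real).
    truth = [1, 1, 1, 0, 0, 0, 1, 0]
    preds = [1, 1, 0, 0, 0, 1, 1, 0]
    print(binary_metrics(truth, preds))
```

Reporting precision and recall alongside accuracy is useful when the real and fake classes are imbalanced, since accuracy alone can look high for a model that rarely predicts the minority class.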

Item Type:	Thesis (Masters)
Supervisors:	Menghwar, Teerath Kumar (email unspecified)
Uncontrolled Keywords:	Deepfake Detection; Multi-Modal Transformer; PyTorch; TensorFlow; Media Forensics; CNN-LSTM Hybrid Model; Digital Content Authenticity
Subjects:	Q Science > QA Mathematics > Electronic computers. Computer science; T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science; Q Science > Q Science (General) > Self-organizing systems. Conscious automata > Machine learning
Divisions:	School of Computing > Master of Science in Data Analytics
Depositing User:	Ciara O'Brien
Date Deposited:	02 Sep 2025 12:48
Last Modified:	02 Sep 2025 12:48
URI:	https://norma.ncirl.ie/id/eprint/8710
