Deepfake Detection Using a Lightweight Hybrid CNNViT Model for Low-Power Devices

Elayadath, Ajul Gopu

Elayadath, Ajul Gopu (2025) Deepfake Detection Using a Lightweight Hybrid CNNViT Model for Low-Power Devices. Masters thesis, Dublin, National College of Ireland.

PDF (Master of Science), 1MB
PDF (Configuration Manual), 1MB

Abstract

The rise of deepfake technology has introduced significant challenges in maintaining the authenticity of digital content. While high-performance deepfake detection models exist, they are typically large and resource-intensive, making them unsuitable for deployment on low-power devices such as smartphones and CPU-only systems. This research aims to address this gap by proposing a lightweight hybrid model that combines the strengths of Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs).

To achieve this, a hybrid architecture was designed that combines MobileNetV2 (a compact CNN) for efficient local feature extraction with TinyViT (a small Vision Transformer) for capturing global context, balancing performance and computational efficiency. The model is trained on preprocessed facial regions from deepfake videos: OpenCV is used to extract frames and MTCNN to detect faces. Data augmentation, normalization, and frame skipping are employed to improve generalization and reduce redundancy. The training process involves freezing and unfreezing backbone layers, applying focal loss to handle class imbalance, and using early stopping to prevent overfitting. Post-training quantization is applied to compress the model, reducing its size and improving inference speed without significantly degrading accuracy.
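
The abstract names focal loss as the remedy for class imbalance but gives no formula or hyperparameters. As a rough illustration only, the sketch below implements the standard binary focal loss for a single sample in plain Python. The function name `binary_focal_loss` and the values `alpha=0.25` and `gamma=2.0` are assumptions chosen for illustration (they are commonly used defaults), not settings taken from the thesis.

```python
import math


def binary_focal_loss(p_fake, label, alpha=0.25, gamma=2.0, eps=1e-7):
    """Focal loss for one sample (illustrative; hyperparameters are assumed).

    p_fake: predicted probability that the face is fake, in [0, 1].
    label:  1 for fake, 0 for real.
    """
    # Clamp the probability to avoid log(0).
    p = min(max(p_fake, eps), 1.0 - eps)
    # p_t is the probability assigned to the true class.
    p_t = p if label == 1 else 1.0 - p
    # alpha_t weights the two classes differently.
    alpha_t = alpha if label == 1 else 1.0 - alpha
    # The (1 - p_t) ** gamma factor shrinks the loss of well-classified samples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# A confidently correct prediction versus a poor one for a fake sample.
easy = binary_focal_loss(0.95, 1)
hard = binary_focal_loss(0.30, 1)
print(f"easy sample loss: {easy:.6f}")
print(f"hard sample loss: {hard:.6f}")
```

With `gamma=0` the modulating factor disappears and the expression reduces to class-weighted cross-entropy, which is why larger `gamma` values shift training effort toward hard, misclassified samples.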

The proposed system achieved 99.66% validation accuracy on FaceForensics++ and 69.0% accuracy on the unseen Celeb-DF v2 dataset. Quantization reduced model size by 53% (from 32.8 MB to 15.4 MB) and improved fake recall from 0.50 to 0.55 and accuracy from 69% to 79%, indicating enhanced sensitivity to manipulated content. These results show the feasibility of deploying the system in resource-constrained environments.
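
The reported 53% size reduction can be checked directly from the two model sizes quoted above. The short sketch below does that arithmetic; the helper name `size_reduction_percent` is hypothetical and is introduced only for this check.

```python
def size_reduction_percent(size_before_mb, size_after_mb):
    """Percentage by which the model shrank (hypothetical helper)."""
    return (size_before_mb - size_after_mb) / size_before_mb * 100.0


# Sizes as reported in the abstract.
before_mb = 32.8
after_mb = 15.4

reduction = size_reduction_percent(before_mb, after_mb)
ratio = before_mb / after_mb

print(f"size reduction: {reduction:.1f}%")
print(f"compression ratio: {ratio:.2f}x")
```

The figures are consistent: the quantized model is a little under half the original size, a compression ratio of roughly 2.1x.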

Item Type:	Thesis (Masters)
Supervisors:	Del Rosal, Victor (email unspecified)
Uncontrolled Keywords:	Deepfake detection; Convolutional Neural Networks; Vision Transformers; MobileNetV2; TinyViT; Model compression; Quantisation
Subjects:	Q Science > QH Natural history > QH301 Biology > Methods of research. Technique. Experimental biology > Data processing. Bioinformatics > Artificial intelligence; Q Science > Q Science (General) > Self-organizing systems. Conscious automata > Artificial intelligence; Q Science > QH Natural history > QH301 Biology > Methods of research. Technique. Experimental biology > Data processing. Bioinformatics > Artificial intelligence > Computer vision; Q Science > Q Science (General) > Self-organizing systems. Conscious automata > Artificial intelligence > Computer vision
Divisions:	School of Computing > Master of Science in Artificial Intelligence for Business
Depositing User:	Ciara O'Brien
Date Deposited:	24 Jun 2026 11:18
Last Modified:	24 Jun 2026 11:18
URI:	https://norma.ncirl.ie/id/eprint/9399
