Building a Balanced Deepfake Dataset: Aligned Faces for Robust Model Training and Evaluation

Islam, Md Raisul; Islam, Mohammad Monirul; Rakib, MD. Aminul Islam; Anik, Md. Shihab Mahmud; Kundu, Amit Kumar; Tuhin, Md. Golam Morshed

Islam, Md Raisul, Islam, Mohammad Monirul, Rakib, MD. Aminul Islam, Anik, Md. Shihab Mahmud, Kundu, Amit Kumar and Tuhin, Md. Golam Morshed (2026) Building a Balanced Deepfake Dataset: Aligned Faces for Robust Model Training and Evaluation. In: 2025 IEEE 4th International Conference on Robotics, Automation, Artificial-Intelligence and Internet-of-Things (RAAICON). IEEE, Dhaka, Bangladesh, pp. 486-491. ISBN 979-8-3315-9282-0

Full text not available from this repository.

Official URL: https://doi.org/10.1109/RAAICON69033.2025.11502474

Abstract

Deepfake media - highly realistic AI-generated face swaps - pose a growing threat to the authenticity of digital content. To support research into detecting such forgeries, we introduce a large-scale, well-annotated deepfake video dataset. It comprises 480 videos (110,694 frames), equally split between genuine and forged content. Deepfake videos were generated with two modern face-swap workflows (the open-source Roop Face-Swapper and the commercial Akool AI tool) applied to videos of 30 volunteers (15 males, 15 females). Frames were sampled at 5 fps, and a face detector (MTCNN) was used to crop and align the main face, yielding 3,746 real-face images and 106,948 fake-face images. The dataset is balanced across demographics and generation methods, so it can serve as a diverse reference for deepfake detection. Ethical clearance was obtained from Daffodil International University. The whole dataset, including images, labels and metadata, is accessible to the research community. Our dataset differs from previous works such as FaceForensics++ and Celeb-DF in its demographic diversity, multi-pipeline generation, and transparent creation pipeline. To illustrate its effectiveness, a custom CNN classifier trained on the dataset achieved an accuracy of 97.8% in distinguishing real from fake faces. For transparency and reproducibility, we provide full details of the recording, generation, and preprocessing pipeline.
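
The abstract states that frames were taken at 5 fps before face cropping, but this record does not say how the sampling was implemented. The following is a minimal sketch of one common approach, not the authors' code: it picks which frame indices to keep from a clip so that roughly 5 frames per second survive. The function name `sample_indices` and its parameters are hypothetical, and the face detection and alignment step (MTCNN in the paper) is deliberately left out.

```python
def sample_indices(total_frames: int, native_fps: float, target_fps: float = 5.0) -> list[int]:
    """Return the indices of frames to keep when reducing a clip to about target_fps.

    Assumption: frames are evenly spaced in time, so the k-th kept frame is the
    one nearest below time k / target_fps seconds.
    """
    if total_frames < 0 or native_fps <= 0 or target_fps <= 0:
        raise ValueError("frame count must be non-negative and frame rates positive")
    if target_fps >= native_fps:
        # Nothing to drop: the clip is already at or below the target rate.
        return list(range(total_frames))

    indices = []
    k = 0
    while True:
        # Frame index corresponding to the k-th sampling instant.
        idx = int(k * native_fps / target_fps)
        if idx >= total_frames:
            break
        indices.append(idx)
        k += 1
    return indices


if __name__ == "__main__":
    # A hypothetical 2-second clip recorded at 30 fps keeps every 6th frame.
    print(sample_indices(60, 30.0))
```

Each kept index would then be passed to a face detector for cropping and alignment; that step depends on the detector library used and is not shown here.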

Item Type:	Book Section
Uncontrolled Keywords:	Akool AI; data preprocessing; Deepfake dataset; digital forensics; face swapping; facial alignment; Roop
Subjects:	Q Science > QA Mathematics > Electronic computers. Computer science; T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science; Q Science > QH Natural history > QH301 Biology > Methods of research. Technique. Experimental biology > Data processing. Bioinformatics > Artificial intelligence; Q Science > Q Science (General) > Self-organizing systems. Conscious automata > Artificial intelligence; Q Science > QH Natural history > QH301 Biology > Methods of research. Technique. Experimental biology > Data processing. Bioinformatics > Artificial intelligence > Computer vision; Q Science > Q Science (General) > Self-organizing systems. Conscious automata > Artificial intelligence > Computer vision
Divisions:	School of Computing
Depositing User:	Tamara Malone
Date Deposited:	23 Jun 2026 13:25
Last Modified:	23 Jun 2026 13:25
URI:	https://norma.ncirl.ie/id/eprint/9387
