A Comparative Study on the Impact of Input Length on Transformer Model Performance in Misinformation Classification

-, Mohammad Shehnaj

-, Mohammad Shehnaj (2025) A Comparative Study on the Impact of Input Length on Transformer Model Performance in Misinformation Classification. Masters thesis, Dublin, National College of Ireland.

Abstract

Context – The rise of digital platforms has accelerated the spread of fake news, undermining trust and influencing public opinion. Transformer-based models are widely used for detection, yet how input text length, ranging from short headlines to full articles, shapes their performance has been only partially investigated in prior studies.

Objective – This study evaluates how varying input lengths (short, medium, long, hybrid) affect the classification performance and computational efficiency of five transformer models (BERT, RoBERTa, Longformer, BigBird and LLaMA) in order to establish benchmarks that balance accuracy, resource use and inference speed for real-world misinformation detection.

Method – A balanced dataset of 157,690 news items was assembled from multiple benchmark sources. All samples were preprocessed, tokenized and categorized into length bins before each model was fine-tuned separately.
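
The length-binning step described in the Method can be sketched as follows. This is a minimal illustration only: the abstract does not state the bin boundaries or the tokenizer used for counting, so the thresholds `SHORT_MAX` and `MEDIUM_MAX` and the whitespace tokenizer are assumptions, and the hybrid category is left out because the abstract does not define how it is built.

```python
from collections import Counter

# Hypothetical token-count thresholds; the abstract does not give the real boundaries.
SHORT_MAX = 32
MEDIUM_MAX = 256


def whitespace_tokenize(text):
    """Stand-in tokenizer that splits on whitespace.

    A real pipeline would count tokens with each model's own subword tokenizer.
    """
    return text.split()


def length_bin(text, short_max=SHORT_MAX, medium_max=MEDIUM_MAX):
    """Assign a text to a length bin based on its token count."""
    n_tokens = len(whitespace_tokenize(text))
    if n_tokens <= short_max:
        return "short"
    if n_tokens <= medium_max:
        return "medium"
    return "long"


def bin_counts(texts):
    """Count how many texts fall into each length bin."""
    return Counter(length_bin(t) for t in texts)
```

Counting samples per bin with `bin_counts` is one way to check that each length category stays balanced before fine-tuning.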

Results – Input length affected performance: repeated-measures ANOVA showed significant effects for RoBERTa, BERT and BigBird (p < 0.05), although most pairwise tests were non-significant. Longer inputs generally achieved higher F1-scores, with RoBERTa, BigBird and Longformer exceeding 0.98, while shorter inputs lowered accuracy for most models, most notably LLaMA (F1 < 0.71). Medium and hybrid lengths offered a balanced trade-off, with BERT delivering competitive accuracy alongside the fastest inference time (15.69 ms per sample) and BigBird the lowest memory usage. RoBERTa maintained strong performance across all lengths, whereas LLaMA consistently underperformed, indicating limited ability to leverage extended context. These results highlight input length as a critical factor in balancing accuracy and computational efficiency in transformer-based fake news detection.
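
For readers unfamiliar with the metric, the per-length F1 comparison reported above can be illustrated with a small sketch. It assumes a binary task in which label 1 marks the positive class and computes plain binary F1; the abstract does not say which label encoding or averaging scheme (binary, macro or weighted) the thesis uses, and the record layout `(bin, true_label, predicted_label)` is hypothetical.

```python
from collections import defaultdict


def f1_score(y_true, y_pred, positive=1):
    """Binary F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


def f1_by_bin(records, positive=1):
    """Group (bin, true_label, predicted_label) records by bin and score each group."""
    grouped = defaultdict(lambda: ([], []))
    for length_bin, true_label, pred_label in records:
        grouped[length_bin][0].append(true_label)
        grouped[length_bin][1].append(pred_label)
    return {b: f1_score(t, p, positive) for b, (t, p) in grouped.items()}
```

Scoring each length bin separately, rather than pooling all predictions, is what allows a comparison such as "long inputs versus short inputs" for a given model.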

Conclusion – These findings provide length-aware benchmarks that guide the selection of transformer architectures and input strategies, enabling a balanced trade-off between accuracy, efficiency and deployment feasibility in real-world misinformation detection systems. They are supported by a token attribution analysis that highlights how predictive cues concentrate in short texts and diffuse in longer articles.

Item Type:	Thesis (Masters)
Supervisors:	Razzaq, Abdul (email: UNSPECIFIED)
Subjects:	Q Science > QH Natural history > QH301 Biology > Methods of research. Technique. Experimental biology > Data processing. Bioinformatics > Artificial intelligence; Q Science > Q Science (General) > Self-organizing systems. Conscious automata > Artificial intelligence; P Language and Literature > P Philology. Linguistics > Computational linguistics. Natural language processing; Z Bibliography. Library Science. Information Resources > ZA Information resources > ZA4150 Computer Network Resources > The Internet > World Wide Web > Websites > Online social networks; T Technology > TK Electrical engineering. Electronics. Nuclear engineering > Telecommunications > The Internet > World Wide Web > Websites > Online social networks
Divisions:	School of Computing > Master of Science in Artificial Intelligence
Depositing User:	Ciara O'Brien
Date Deposited:	28 May 2026 11:32
Last Modified:	28 May 2026 11:36
URI:	https://norma.ncirl.ie/id/eprint/9309
