NORMA eResearch @NCI Library

Transformer Framework for Language Translation in Low Resource Languages

Gumus, Muhammet (2025) Transformer Framework for Language Translation in Low Resource Languages. Masters thesis, Dublin, National College of Ireland.

PDF (Master of Science), 2MB
PDF (Configuration Manual), 3MB

Abstract

Language translation is the process of converting written content from one natural language into another while preserving its meaning and intent. Low-resource languages are those for which fewer resources are available than for widely spoken languages. A transformer is a neural network architecture used in machine learning, particularly for processing sequential data. Although this relatively new architecture is a significant improvement over older machine translation systems, determining which model will produce the highest-quality translation for a given sentence remains a persistent challenge. This research proposes a framework that leverages multiple transformer-based translation models and selects the best candidate translation on a per-sentence basis. The framework combines general-purpose multilingual models, models fine-tuned with LoRA on domain-specific datasets, and a meta-learning component that predicts the most suitable translation model for each input sentence. Experiments use the English–Turkish TED2020, Tatoeba and Opus100 datasets and show practical and research value in efficient model selection for low-resource translation. Translation quality is evaluated using BLEU scores and reference-free signals from large language models, which provide informative supervision for the meta-learner. Results show that the meta-learner system, powered by transformer models, can achieve accuracy of up to 71% with an ensemble strategy of ML models such as RandomForest, XGBoost, and LightGBM, given limited datasets and budget.
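The per-sentence selection described in the abstract can be illustrated with a small sketch of how supervision labels for a meta-learner might be derived. This is not the thesis implementation: the scoring function is a simplified, smoothed BLEU-style measure written for illustration, and the model names `general_multilingual` and `lora_finetuned` and the example sentences are hypothetical. It assumes each candidate translation is scored against a reference and the highest-scoring model becomes the label for that sentence.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, reference, max_n=2):
    """Simplified sentence-level BLEU-style score with add-one smoothing.

    Illustration only: this is not the metric implementation used in the thesis.
    """
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        total = sum(cand_counts.values())
        if total == 0:  # candidate too short to contain any n-gram of this order
            continue
        overlap = sum((cand_counts & ref_counts).values())  # clipped matches
        log_precisions.append(math.log((overlap + 1) / (total + 1)))
    # Brevity penalty: penalise candidates shorter than the reference.
    brevity = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return brevity * math.exp(sum(log_precisions) / len(log_precisions))


def best_model(candidates, reference):
    """Return the name of the best-scoring model and all per-model scores."""
    scores = {name: simple_bleu(text, reference) for name, text in candidates.items()}
    return max(scores, key=scores.get), scores


# Hypothetical candidates for one English-to-Turkish sentence.
reference = "ben okula gidiyorum"
candidates = {
    "general_multilingual": "ben okul gidiyor",
    "lora_finetuned": "ben okula gidiyorum",
}
label, scores = best_model(candidates, reference)
print(label, scores)
```

Labels produced this way could then serve as training targets for a classifier that predicts the most suitable model from the input sentence alone, which is the role the abstract assigns to the meta-learner.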

Item Type: Thesis (Masters)
Supervisors: Stynes, Paul (email: UNSPECIFIED)
Subjects: P Language and Literature > P Philology. Linguistics
Q Science > QH Natural history > QH301 Biology > Methods of research. Technique. Experimental biology > Data processing. Bioinformatics > Artificial intelligence
Q Science > Q Science (General) > Self-organizing systems. Conscious automata > Artificial intelligence
P Language and Literature > P Philology. Linguistics > Computational linguistics. Natural language processing
Q Science > Q Science (General) > Self-organizing systems. Conscious automata > Machine learning
Divisions: School of Computing > Master of Science in Artificial Intelligence
Depositing User: Ciara O'Brien
Date Deposited: 28 May 2026 13:58
Last Modified: 28 May 2026 13:58
URI: https://norma.ncirl.ie/id/eprint/9321
