NORMA eResearch @NCI Library

Cross-Lingual RAG for English News article Summarization using Hindi Context

Kumar, Sandeep (2025) Cross-Lingual RAG for English News article Summarization using Hindi Context. Masters thesis, Dublin, National College of Ireland.

Abstract

This research project examines the efficacy of cross-lingual Retrieval-Augmented Generation (RAG) for improving English news summarization with Hindi contextual information. It addresses the problem of producing high-quality, concise summaries of English news articles by incorporating relevant information retrieved from a Hindi news corpus. We introduce a framework that uses multilingual sentence embeddings (LaBSE) to vectorize Hindi articles, stores them in ChromaDB, and retrieves contextual chunks according to their semantic similarity to English news articles. A key step is translating the retrieved Hindi context into English with the Opus-MT model so that it can be combined with English summarization models. To test the proposed framework, we performed four experiments: baseline summarization (English article only), RAG with untranslated Hindi context, RAG with translated Hindi context, and RAG with semantically re-ranked Hindi chunks using a multilingual re-ranker (cross-encoder/ms-marco-MiniLM-L-6-v2). The summarization models are BART, T5, Mistral, and Gemini. Evaluation used Recall-Oriented Understudy for Gisting Evaluation (ROUGE), Bilingual Evaluation Understudy (BLEU), and BERTScore, together with manual human assessment. Our results show that including translated Hindi context improves the quality and informativeness of English news summaries, especially when the English article lacks sufficient detail. The research offers insight into the potential of cross-lingual RAG systems to enhance cross-lingual access to, and comprehension of, information.
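
The retrieve, translate, and augment flow described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the thesis implementation: the hand-written vectors stand in for LaBSE embeddings, the in-memory list stands in for ChromaDB, the lookup table stands in for Opus-MT, and every name (`Chunk`, `retrieve`, `translate_hi_to_en`, `build_prompt`) is hypothetical. Re-ranking and the summarization model itself are omitted.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Chunk:
    text_hi: str         # Hindi chunk text
    vector: List[float]  # toy embedding (assumption: a real system would use LaBSE)


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_vec: List[float], store: List[Chunk], k: int = 2) -> List[Chunk]:
    """Return the k chunks most similar to the query vector."""
    ranked = sorted(store, key=lambda c: cosine(query_vec, c.vector), reverse=True)
    return ranked[:k]


def translate_hi_to_en(text: str, glossary: Dict[str, str]) -> str:
    """Stand-in for a translation model: look the chunk up in a fixed table."""
    return glossary.get(text, text)


def build_prompt(article_en: str, context_en: List[str]) -> str:
    """Combine the English article with translated context for a summarizer."""
    bullets = "\n".join(f"- {line}" for line in context_en)
    return (
        "Summarize the article using the context.\n\n"
        f"Article:\n{article_en}\n\nContext:\n{bullets}"
    )


# Toy Hindi "corpus" with made-up 3-dimensional embeddings.
store = [
    Chunk("सरकार ने नई नीति की घोषणा की।", [0.9, 0.1, 0.1]),
    Chunk("मैच बारिश के कारण रद्द हो गया।", [0.0, 1.0, 0.0]),
    Chunk("नीति अगले महीने से लागू होगी।", [0.7, 0.0, 0.7]),
]
glossary = {
    "सरकार ने नई नीति की घोषणा की।": "The government announced a new policy.",
    "मैच बारिश के कारण रद्द हो गया।": "The match was cancelled due to rain.",
    "नीति अगले महीने से लागू होगी।": "The policy will take effect from next month.",
}

article_en = "The government unveiled a policy today."
query_vec = [1.0, 0.0, 0.2]  # made-up embedding of the English article

top = retrieve(query_vec, store, k=2)
context_en = [translate_hi_to_en(c.text_hi, glossary) for c in top]
prompt = build_prompt(article_en, context_en)
print(prompt)
```

In this toy setup the two policy-related chunks rank above the unrelated sports chunk, so only they are translated and appended to the prompt. Dropping the `translate_hi_to_en` step would correspond to the untranslated-context experiment.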

Item Type: Thesis (Masters)
Supervisors: Haque, Rejwanul (email: UNSPECIFIED)
Subjects: P Language and Literature > P Philology. Linguistics
Q Science > QH Natural history > QH301 Biology > Methods of research. Technique. Experimental biology > Data processing. Bioinformatics > Artificial intelligence
Q Science > Q Science (General) > Self-organizing systems. Conscious automata > Artificial intelligence
P Language and Literature > P Philology. Linguistics > Computational linguistics. Natural language processing
Divisions: School of Computing > Master of Science in Artificial Intelligence
Depositing User: Ciara O'Brien
Date Deposited: 02 Jun 2026 11:05
Last Modified: 02 Jun 2026 11:05
URI: https://norma.ncirl.ie/id/eprint/9329
