Kumar, Sandeep (2025) Cross-Lingual RAG for English News article Summarization using Hindi Context. Masters thesis, Dublin, National College of Ireland.
PDF (Master of Science), 1MB
PDF (Configuration Manual), 459kB
Abstract
This research project examines the efficacy of Cross-Lingual Retrieval-Augmented Generation (RAG) for improving English news summarization using Hindi contextual information. It addresses the problem of producing high-quality, concise summaries of English news articles by incorporating relevant information retrieved from a Hindi news corpus. We introduce a new framework that uses multilingual sentence embeddings (LaBSE) to vectorize Hindi articles, stores them in ChromaDB, and retrieves contextual chunks according to their semantic similarity to English news articles. Crucially, we translate the retrieved Hindi context into English using the Opus-MT model so that it can be combined with English summarization models. To test the proposed framework, we performed four experiments: baseline summarization (English article only), RAG with untranslated Hindi context, RAG with translated Hindi context, and RAG with semantically re-ranked Hindi chunks using a multilingual re-ranker (cross-encoder/ms-marco-MiniLM-L-6-v2). The summarization models are BART, T5, Mistral, and Gemini. Evaluation used Recall-Oriented Understudy for Gisting Evaluation (ROUGE), Bilingual Evaluation Understudy (BLEU), and BERTScore, together with manual human assessment. Our results show that including translated Hindi context increases the quality and informativeness of English news summaries, especially when the English article lacks sufficient detail. The research offers insight into the potential of cross-lingual RAG systems to improve cross-lingual access to and comprehension of information.
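The retrieve, translate, and combine steps described in the abstract can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the thesis implementation: the hand-written vectors stand in for LaBSE embeddings, the in-memory list stands in for a ChromaDB collection, and `toy_translate` is a hypothetical placeholder for the Opus-MT model. The function names, the prompt layout, and the value of `k` are all assumptions made for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_vec: Vector, store: List[Tuple[str, Vector]], k: int = 2) -> List[str]:
    """Return the k chunks whose vectors are most similar to the query vector."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(article: str, chunks: List[str], translate: Callable[[str], str]) -> str:
    """Translate each retrieved chunk and prepend it as context to the article."""
    context = "\n".join(translate(chunk) for chunk in chunks)
    return f"Context:\n{context}\n\nArticle:\n{article}\n\nSummary:"


def toy_translate(text: str) -> str:
    """Hypothetical stand-in for Hindi-to-English machine translation."""
    return text.replace("HI:", "EN:")


# Toy store: (chunk text, embedding). Real vectors would come from an encoder.
store = [
    ("HI: flood relief in Assam", [0.9, 0.1, 0.0]),
    ("HI: cricket final result", [0.0, 0.2, 0.9]),
    ("HI: Assam rainfall totals", [0.8, 0.3, 0.1]),
]

article = "Heavy rain has displaced thousands in Assam."
article_vec = [1.0, 0.2, 0.0]  # assumed embedding of the English article

top_chunks = retrieve(article_vec, store, k=2)
prompt = build_prompt(article, top_chunks, toy_translate)
print(prompt)
```

The resulting prompt would then be passed to a summarization model. Skipping the `translate` step (passing the identity function) corresponds to the untranslated-context experiment, and re-sorting `top_chunks` with a second scoring model before building the prompt corresponds to the re-ranking experiment.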
| Item Type: | Thesis (Masters) |
|---|---|
| Supervisors: | Haque, Rejwanul (email unspecified) |
| Subjects: | P Language and Literature > P Philology. Linguistics; Q Science > QH Natural history > QH301 Biology > Methods of research. Technique. Experimental biology > Data processing. Bioinformatics > Artificial intelligence; Q Science > Q Science (General) > Self-organizing systems. Conscious automata > Artificial intelligence; P Language and Literature > P Philology. Linguistics > Computational linguistics. Natural language processing |
| Divisions: | School of Computing > Master of Science in Artificial Intelligence |
| Depositing User: | Ciara O'Brien |
| Date Deposited: | 02 Jun 2026 11:05 |
| Last Modified: | 02 Jun 2026 11:05 |
| URI: | https://norma.ncirl.ie/id/eprint/9329 |