NORMA eResearch @NCI Library

Customer Churn Prediction using RAG-Based Sentiment Analysis with LLMs and CatBoost

Kalungepatil, Sakshi Kacheshwar (2025) Customer Churn Prediction using RAG-Based Sentiment Analysis with LLMs and CatBoost. Masters thesis, Dublin, National College of Ireland.


Abstract

In today’s competitive e-commerce landscape, vast quantities of sentiment-rich textual and behavioral customer data are available, and predicting customer churn is crucial to understanding customer behavior and supporting business sustainability, user engagement, and market growth. Doing so requires advanced techniques that extract meaningful insights for data-driven decision-making. This research investigates a hybrid approach for detecting early signs of customer churn that integrates a fine-tuned large language model (LLM) with an ensemble machine learning technique. The study combines the Retrieval-Augmented Generation (RAG) framework with an instruction-following, fine-tuned Large Language Model Meta AI (LLaMA) to perform sentiment analysis on customer review data. To boost predictive performance, sentiment-driven features obtained from the RAG module are combined with structural features, such as verified purchase and review length, and passed to a CatBoost model for the final churn prediction. The research uses a Kaggle dataset of Amazon customer reviews (2023) that contains both textual and behavioral characteristics. The RAG-based fine-tuned LLaMA model achieved an accuracy of 86.75%, and the CatBoost model achieved an accuracy of 75.9%. These results support the effectiveness of combining sentiment-rich textual data with structural features for churn prediction in real-world applications.
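
The feature-combination step described in the abstract can be illustrated with a minimal sketch. It is not taken from the thesis: the `Review` class, the `SENTIMENT_SCORE` mapping, the `build_feature_row` helper, and the column names are all hypothetical, and the sentiment label is assumed to have already been produced by a sentiment module such as the RAG-based LLaMA model. The sketch only shows how a sentiment-driven feature might sit alongside the two structural features named in the abstract (verified purchase and review length) before the rows are handed to a classifier such as CatBoost.

```python
from dataclasses import dataclass


@dataclass
class Review:
    """Hypothetical container for one customer review."""
    text: str
    verified_purchase: bool
    sentiment: str  # assumed output of the sentiment module


# Assumed numeric encoding of the sentiment label (illustrative only).
SENTIMENT_SCORE = {"negative": -1, "neutral": 0, "positive": 1}


def build_feature_row(review: Review) -> dict:
    """Combine one sentiment-driven feature with two structural features."""
    return {
        "sentiment_score": SENTIMENT_SCORE[review.sentiment],
        "verified_purchase": int(review.verified_purchase),
        # Review length measured in whitespace-separated words (an assumption).
        "review_length": len(review.text.split()),
    }


reviews = [
    Review("Stopped working after a week", True, "negative"),
    Review("Great value and fast delivery", False, "positive"),
]

# One feature dictionary per review, ready to be turned into a training table.
rows = [build_feature_row(r) for r in reviews]
```

Rows built this way form a small tabular dataset; in a setup like the one the abstract describes, such a table together with churn labels would be the input to the final classifier.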

Item Type: Thesis (Masters)
Supervisors: Haque, Rejwanul (email: unspecified)
Subjects: Q Science > QH Natural history > QH301 Biology > Methods of research. Technique. Experimental biology > Data processing. Bioinformatics > Artificial intelligence
Q Science > Q Science (General) > Self-organizing systems. Conscious automata > Artificial intelligence
P Language and Literature > P Philology. Linguistics > Computational linguistics. Natural language processing
H Social Sciences > HF Commerce > Marketing > Consumer Behaviour
H Social Sciences > HF Commerce > Electronic Commerce
Divisions: School of Computing > Master of Science in Data Analytics
Depositing User: Ciara O'Brien
Date Deposited: 01 Jul 2026 11:03
Last Modified: 01 Jul 2026 11:03
URI: https://norma.ncirl.ie/id/eprint/9430
