Comparative performance of RF and GBM for short-term customer segmentation forecasting

Jose, Thomas

Jose, Thomas (2024) Comparative performance of RF and GBM for short-term customer segmentation forecasting. Masters thesis, Dublin, National College of Ireland.

Abstract

This research explores the use of Recency, Frequency, and Monetary (RFM) analysis for customer segmentation. Companies often use segmentation techniques to gain insight into purchase behaviour and to rank and group customers quantitatively for targeted marketing campaigns. Analysts typically ask how much data such analyses need and how much confidence the predictions deserve. Little is known, however, about which month is the easiest to classify, or which month contains the largest share of target customers. Here we present a detailed analysis of 1-month customer segmentation forecasting using two well-known machine learning techniques. We critically evaluate the classification accuracy on three segments: “good”, “medium” and “bad” customers.
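
To make the RFM idea concrete, the following is a minimal sketch of how purchase records might be turned into recency, frequency, and monetary values and then into three segments. The function names, the rank-based 1 to 3 scoring, and the thresholds `good_at` and `medium_at` are illustrative assumptions; the abstract does not state the thesis's actual RFM score function or "good" customer threshold.

```python
from datetime import date


def rfm_table(transactions, snapshot):
    """Aggregate (customer, purchase_date, amount) rows into (R, F, M) per customer."""
    table = {}
    for customer, day, amount in transactions:
        rec = table.setdefault(customer, {"last": day, "frequency": 0, "monetary": 0.0})
        rec["last"] = max(rec["last"], day)   # most recent purchase date
        rec["frequency"] += 1                 # number of purchases
        rec["monetary"] += amount             # total spend
    return {
        c: ((snapshot - r["last"]).days, r["frequency"], r["monetary"])
        for c, r in table.items()
    }


def rank_score(values, higher_is_better=True, bins=3):
    """Map each value to a score in 1..bins by rank (assumed scoring scheme)."""
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=not higher_is_better)
    scores = [0] * len(values)
    for pos, i in enumerate(order):
        scores[i] = 1 + pos * bins // len(values)
    return scores


def segment(rfm, good_at=8, medium_at=5):
    """Label customers by summed R+F+M score; thresholds are hypothetical."""
    customers = list(rfm)
    r = rank_score([rfm[c][0] for c in customers], higher_is_better=False)
    f = rank_score([rfm[c][1] for c in customers])
    m = rank_score([rfm[c][2] for c in customers])
    labels = {}
    for customer, total in zip(customers, (a + b + c for a, b, c in zip(r, f, m))):
        if total >= good_at:
            labels[customer] = "good"
        elif total >= medium_at:
            labels[customer] = "medium"
        else:
            labels[customer] = "bad"
    return labels


# Made-up purchase records, for illustration only.
transactions = [
    ("A", date(2024, 4, 28), 120.0),
    ("A", date(2024, 4, 10), 80.0),
    ("A", date(2024, 3, 15), 60.0),
    ("B", date(2024, 3, 1), 40.0),
    ("B", date(2024, 4, 2), 30.0),
    ("C", date(2024, 1, 5), 10.0),
]
segments = segment(rfm_table(transactions, snapshot=date(2024, 5, 1)))
```

Recency is scored in reverse because a smaller number of days since the last purchase indicates a more active customer.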

Based on the findings, we also evaluate the need for automated model selection and hyper-parameter optimisation of the customer segmentation models. While many papers go straight to the optimisation step, we want to verify whether such tools are actually needed to improve classification accuracy. Based on the literature review, Random Forest (RF) and Gradient Boosting Machine (GBM) classifiers were singled out as the top techniques for this classification task. It is not known, however, which one delivers the better classification accuracy. As part of the research, we show that plain RF and GBM are unsuitable for the task because the distribution of “good” customers is uneven. To address this class imbalance, we used stratified classification.

The findings indicate that April is the best month for prediction with RF. The discussion nevertheless highlights issues such as the classification of only a few “good” customers, small sample sizes, and the need to tune the RFM score function or the threshold for “good” customers. GBM outperforms RF by far, especially in months with fewer “good” customers. With stratified classification and 10-fold cross-validation, GBM and RF reach 98.5% and 91% classification accuracy, respectively. Both models exceed 90% accuracy, which is good enough that further hyper-parameter tuning is not needed. The results and research methodology are expected to provide valuable insights for analysts planning customer segmentation and forecasting of customer behaviour.
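
As an illustration of the stratified 10-fold cross-validation mentioned above, the sketch below builds folds that each preserve the overall share of “good”, “medium” and “bad” labels. It shows the fold construction only and trains no classifier. The function names and the class counts are assumptions made for the example, not the thesis's data or code; in practice a library routine such as scikit-learn's `StratifiedKFold` would normally be used.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds, keeping class proportions per fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    offset = 0  # rotate the starting fold so overall fold sizes stay balanced
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[(offset + pos) % k].append(idx)
        offset += len(indices)
    return folds


def cross_validation_splits(labels, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    folds = stratified_folds(labels, k, seed)
    for i, test_idx in enumerate(folds):
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx


# Hypothetical imbalanced label set: "good" customers are the minority class.
labels = ["good"] * 20 + ["medium"] * 60 + ["bad"] * 120
fold_counts = [
    Counter(labels[i] for i in test_idx)
    for _, test_idx in cross_validation_splits(labels, k=10)
]
```

Because the folds are filled class by class, every test fold contains some “good” customers even when that class is small, which a plain random split does not guarantee.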

Item Type:	Thesis (Masters)
Supervisors:	Estrada, Giovani (email unspecified)
Subjects:	Q Science > QA Mathematics > Electronic computers. Computer science; T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science; T Technology > T Technology (General) > Information Technology > Cloud computing; H Social Sciences > HF Commerce > Marketing > Consumer Behaviour
Divisions:	School of Computing > Master of Science in Cloud Computing
Depositing User:	Ciara O'Brien
Date Deposited:	03 Jun 2025 13:31
Last Modified:	03 Jun 2025 13:31
URI:	https://norma.ncirl.ie/id/eprint/7726
