NORMA eResearch @NCI Library

Enhancing Machine Learning Performance using Feature Engineering Techniques for Online Course Recommendation System

Kattukottai Mani, Srivatsav (2023) Enhancing Machine Learning Performance using Feature Engineering Techniques for Online Course Recommendation System. Masters thesis, Dublin, National College of Ireland.


Abstract

A recommendation system is a machine learning-based system that suggests items to users based on their previous records, supporting their decision-making. Major recommendation systems have been developed for applications such as medical search, movie search, movie reviews, and course recommendation. Online learning platforms offer high-quality, flexible courses with no restrictions on time or location. However, recommendation without proper features or feature engineering may be less effective and less likely to yield personalized recommendations. Feature engineering also helps produce sound recommendations for users looking for new items, especially when the data is sparse and unstructured, such as text or images. To address these limitations, this research focuses on feature engineering techniques, namely stopword removal, stemming, decontraction, sentence-to-word tokenization ("sent to words"), lemmatization, and vectorization, to examine how they can improve machine learning algorithms that could feed a real-world system for recommending online courses. Many e-learning courses are available on websites such as Coursera, edX, and Udacity. The analysis uses the publicly available Udacity dataset from Kaggle. The raw data is transformed into a more suitable form and fed into three classification models: Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and Adaptive Boosting (AdaBoost). A comparative analysis is performed on both raw and transformed data, using a confusion matrix to derive recall, accuracy, precision, and F1 score as performance measures. The comparative results show that all models improve on the transformed data, indicating that the developed model can provide relevant courses tailored to users' preferences. The AdaBoost model achieved 72.5% accuracy on the transformed data, compared with 37.25% on the raw data.
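
The text-preparation steps named in the abstract (decontraction, sentence-to-word tokenization, stopword removal, stemming, and vectorization) can be illustrated with a minimal sketch. This is not the thesis code: the abstract does not say which libraries or word lists were used, so the example relies only on Python's standard library. The contraction map, the stopword set, and the suffix-stripping `naive_stem` function are small hypothetical stand-ins for the fuller resources a real pipeline would use, and the vectorizer is a plain bag-of-words count.

```python
import re
from collections import Counter

# Assumption: tiny illustrative lists, not the resources used in the thesis.
CONTRACTIONS = {
    "won't": "will not",
    "can't": "cannot",
    "n't": " not",
    "'re": " are",
    "'ll": " will",
    "'ve": " have",
}
STOPWORDS = {"a", "an", "the", "and", "or", "to", "of", "in", "for",
             "is", "are", "be", "you", "will", "with", "this"}


def decontract(text):
    """Lowercase the text and expand common English contractions."""
    text = text.lower()
    # Specific forms come first in the dict, so "won't" is handled before "n't".
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)
    return text


def tokenize(text):
    """Split text into alphabetic word tokens (sentence to words)."""
    return re.findall(r"[a-z]+", text)


def naive_stem(token):
    """Strip one common suffix; a crude stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Decontract, tokenize, drop stopwords, then stem what remains."""
    tokens = tokenize(decontract(text))
    return [naive_stem(t) for t in tokens if t not in STOPWORDS]


def vectorize(docs):
    """Return a sorted vocabulary and one term-count vector per document."""
    processed = [preprocess(d) for d in docs]
    vocab = sorted({tok for doc in processed for tok in doc})
    vectors = []
    for doc in processed:
        counts = Counter(doc)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


if __name__ == "__main__":
    # Hypothetical course titles, used only to exercise the functions.
    vocab, vectors = vectorize(["Learn machine learning",
                                "Machine learning models"])
    print(vocab)
    print(vectors)
```

Count vectors of this kind are what a classifier such as SVM, KNN, or AdaBoost would take as input in place of the raw text.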

Item Type: Thesis (Masters)
Supervisors: Trinh, Anh Duong (Email: UNSPECIFIED)
Uncontrolled Keywords: Recommendation System; Machine learning; Feature Engineering; SVM; KNN; AdaBoost; Stopwords; Stemming; Decontraction; sent to words; lemmatization; Vectorization
Subjects: Q Science > QA Mathematics > Electronic computers. Computer science
T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science
L Education > LC Special aspects / Types of education > E-Learning
Q Science > Q Science (General) > Self-organizing systems. Conscious automata > Machine learning
Divisions: School of Computing > Master of Science in Data Analytics
Depositing User: Tamara Malone
Date Deposited: 26 Nov 2024 11:43
Last Modified: 26 Nov 2024 11:43
URI: https://norma.ncirl.ie/id/eprint/7198
