NORMA eResearch @NCI Library

A recommender systems and social networking approach to alleviate the issue of cold start

Matele, Mahesh Arjun (2021) A recommender systems and social networking approach to alleviate the issue of cold start. Masters thesis, Dublin, National College of Ireland.

Abstract

Cold start is the most frequent issue faced by recommender systems (RS), because it occurs for every new user who enters the system. A good RS is characterized by its recommendations, which depend entirely on the user's preferences, such as the movie genres the user likes to watch. Cold start occurs when a new user enters the system and none of the user's preferences are available to the RS for making recommendations. Because the user is new, they have not set any preferences in the system or rated any movies, so the RS has no history for that user.

As part of this research, we propose a supervised classification model that recommends movies to users with the help of online social networks (OSN). OSN data is combined with the user's demographic data to recommend movies, which also supports the research goal of recommending movies with minimal demographic data. The OSN data can be sourced from Twitter, Amazon or simply MovieLens and fed to the RS. This work contributes to the RS community by showing how to recommend movies to a new user with a minimal number of demographic variables.

In the experiment, we achieved an AUC-ROC of 0.70 using a KNN algorithm with tuned hyperparameters and were able to recommend movies of different genres to the user. The state-of-the-art experiment of J. Herce-Zelaya (2020) uses Random Forest and scored a Mean Absolute Error of 0.298 with 12 demographic variables from the user's Twitter profile. The KNN algorithm was tuned by finding the optimal number of neighbours K using the silhouette method. The optimized leaf node value was 28, distance calculation between data points was most efficient with the Manhattan method, and the best number of neighbours was 28 using the brute algorithm.
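The brute-force KNN search with Manhattan distance mentioned above can be illustrated with a minimal sketch. This is not the thesis implementation: the function names, the feature encoding (age scaled to [0, 1], gender as 0/1) and the toy data are all hypothetical, and only the standard library is used. The default k=28 mirrors the tuned value reported in the abstract.

```python
from collections import Counter


def manhattan(a, b):
    """Manhattan (L1) distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_predict(train_X, train_y, query, k=28):
    """Brute-force KNN: measure the distance from the query to every
    training point, keep the k closest, and return their majority label."""
    k = min(k, len(train_X))  # cannot use more neighbours than samples
    ranked = sorted(zip(train_X, train_y),
                    key=lambda pair: manhattan(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (scaled age, encoded gender) -> preferred genre.
train_X = [(0.20, 0), (0.22, 0), (0.25, 1), (0.60, 1), (0.65, 1), (0.70, 0)]
train_y = ["Action", "Action", "Action", "Drama", "Drama", "Drama"]

# A small k is used here only because the toy dataset has six rows.
prediction = knn_predict(train_X, train_y, query=(0.21, 0), k=3)
print(prediction)
```

The brute-force strategy compares the query against every stored point, which is simple and exact but scales linearly with the size of the training set.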

As mentioned earlier, an AUC-ROC of 0.69 was achieved as part of this research, with a precision of 0.68 and a recall of 0.76 using KNN, which improves on the state-of-the-art experiment by 5% in recall while keeping the same precision. In addition, the state-of-the-art experiment uses 12 demographic variables, whereas the score achieved in this experiment uses only 2 demographic variables, namely gender and age, to recommend movies. The infrastructure constraints and other limitations and challenges faced during the experiment are highlighted in the challenges section (3.1).
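For readers unfamiliar with the metrics quoted above, the following sketch shows how precision, recall and AUC-ROC are defined for a binary task. It is an illustration under simplifying assumptions, not the thesis evaluation code: the labels, scores and the 0.5 decision threshold are made-up values, and the function names are hypothetical.

```python
def precision_recall(y_true, y_pred):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def roc_auc(y_true, scores):
    """AUC-ROC as the fraction of (positive, negative) pairs in which the
    positive example receives the higher score; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels (1 = user liked the movie) and classifier scores.
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.7, 0.2, 0.5]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

precision, recall = precision_recall(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(precision, recall, auc)
```

Precision and recall depend on the chosen threshold, while AUC-ROC summarizes ranking quality across all thresholds, which is why the abstract reports them separately.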

For the community interested in further studies on this topic, this research sets a benchmark of 2 demographic variables with precision on par with the state-of-the-art experiment. Future research can continue with more advanced machine learning models and try to improve precision with a greater number of demographic variables.

Item Type: Thesis (Masters)
Subjects: Q Science > QA Mathematics > Electronic computers. Computer science
T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science
Z Bibliography. Library Science. Information Resources > ZA Information resources > ZA4150 Computer Network Resources > The Internet > World Wide Web > Websites > Online social networks
T Technology > TK Electrical engineering. Electronics. Nuclear engineering > Telecommunications > The Internet > World Wide Web > Websites > Online social networks
Divisions: School of Computing > Master of Science in Data Analytics
Depositing User: Clara Chan
Date Deposited: 08 Dec 2021 19:10
Last Modified: 08 Dec 2021 19:10
URI: http://norma.ncirl.ie/id/eprint/5188
