NORMA eResearch @NCI Library

Opinion Mining for Automated Restaurant Reviews Rating: Technical Report

Mijakovac, Dejan (2022) Opinion Mining for Automated Restaurant Reviews Rating: Technical Report. Undergraduate thesis, Dublin, National College of Ireland.



The restaurant industry is a high-grossing one, both in Ireland and worldwide. With most businesses moving online during the COVID-19 pandemic, customers began spending more time online and writing more reviews. Not only were more reviews being written, but people also relied more heavily on those reviews when deciding whether to use a particular restaurant's service. All this put additional pressure on restaurants to maintain a positive online presence and high ratings. This pressure has increased recently with high levels of inflation, which force customers to cut back on dining out and food orders. As a result, customers pay even closer attention to online ratings.

The aim of this project was to use the KDD methodology to develop a model that can predict the rating of a review, on a scale of 1 to 5, solely on the basis of its text. An additional aim was to develop a model that can predict the sentiment of a review as negative, positive or neutral. While sentiment classification is a well-known topic in the field of Natural Language Processing, classifying reviews according to rating is rarely attempted because of the complexity of the classification process.

The textual reviews were vectorized using two different approaches, Count Vectorizer and TfIdf Vectorizer, so that they could be used by the machine learning algorithms. Four of the five algorithms used are those typically applied to sentiment analysis (Logistic Regression, Support Vector Machines, Random Forest and Naïve Bayes), while Neural Networks were introduced in this project because of the high number of features and reviews that needed to be processed. As Neural Networks perform better with large, complex datasets and high-dimensional features, their choice was a logical one, especially for the ratings classification.
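
As an illustration of the two vectorization approaches named above, the following is a minimal, dependency-free sketch of count vectors and TF-IDF weighting. It is not the code used in the thesis: the helper names (tokenize, build_vocabulary, count_vectors, tfidf_vectors), the toy reviews, and the textbook weighting idf = ln(N / df) + 1 are all assumptions made for this example, and library implementations typically apply different smoothing and normalisation.

```python
import math
from collections import Counter

PUNCTUATION = ".,!?;:\"'()"

def tokenize(text):
    # Lowercase each whitespace-separated word and strip surrounding punctuation.
    tokens = (word.strip(PUNCTUATION).lower() for word in text.split())
    return [tok for tok in tokens if tok]

def build_vocabulary(docs):
    # Sorted list of the distinct tokens seen across all documents.
    return sorted({tok for doc in docs for tok in tokenize(doc)})

def count_vectors(docs, vocab):
    # One row per document, one column per vocabulary term (raw term counts).
    rows = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        rows.append([counts[term] for term in vocab])
    return rows

def tfidf_vectors(docs, vocab):
    # Weight each raw count by an inverse document frequency term.
    # Assumed formula for this sketch: idf = ln(N / df) + 1.
    counts = count_vectors(docs, vocab)
    n_docs = len(docs)
    doc_freq = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / df) + 1.0 for df in doc_freq]
    return [[c * w for c, w in zip(row, idf)] for row in counts]

# Hypothetical toy reviews, invented for illustration only.
reviews = [
    "Great food, great service",
    "Terrible service",
    "Food was fine",
]

vocab = build_vocabulary(reviews)
counts = count_vectors(reviews, vocab)
tfidf = tfidf_vectors(reviews, vocab)
```

In the first toy review, "great" occurs twice and appears in no other review, so its TF-IDF weight is larger than that of "food", which occurs once and is shared with another review. Either matrix could then be passed as features to a classifier such as Logistic Regression.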

The final results show that Logistic Regression is the best-performing algorithm in both use cases, ratings classification and sentiment classification. Support Vector Machines and Neural Networks also perform very well. When the sentiment classification results are compared with those obtained using a tool already on the market (VADER), all models developed during the project outperform it. This means that these models, especially the best-performing ones, can be used either on restaurants' websites to predict ratings or sentiment, or internally, when reviewing customer feedback, to identify problems that need to be rectified and to note what customers enjoy.

Item Type: Thesis (Undergraduate)
Subjects: Q Science > QA Mathematics > Electronic computers. Computer science
T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science
Q Science > QA Mathematics > Computer software
T Technology > T Technology (General) > Information Technology > Computer software
H Social Sciences > HF Commerce > Customer Service
H Social Sciences > HD Industries. Land use. Labor > Specific Industries > Hospitality Industry
Divisions: School of Computing > Bachelor of Science (Honours) in Computing
Depositing User: Clara Chan
Date Deposited: 12 Sep 2022 09:01
Last Modified: 12 Sep 2022 09:01
