NORMA eResearch @NCI Library

Predicting Airline Customer Review Scores Based on Sentiment Analysis of Reviews: Technical Report

Graham, Jordan-Lee (2022) Predicting Airline Customer Review Scores Based on Sentiment Analysis of Reviews: Technical Report. Undergraduate thesis, Dublin, National College of Ireland.

The purpose of this research was to contribute to a better understanding of sentiment analysis as a text classification task for use in prediction models. Much of the available literature focuses on lexicon-based approaches, which match the words in a text against dictionaries of 'positive' and 'negative' words and assign a binary label of negative or positive sentiment. This study took a statistical approach instead: the 'Term Frequency-Inverse Document Frequency' (TF-IDF) algorithm was used to weight each word by how often it occurs in a review relative to how many reviews in the whole collection contain it. Feature sets of 100 and 300 TF-IDF features were supplied to the supervised learning model that performed most consistently in the relevant literature, a 'Support Vector Machine' (SVM). The model was tested with a range of hyperparameter values in combination with the linear and radial basis function (RBF) kernels. The non-parametric RBF kernel performed best at classifying review scores on the 1-10 scale, with an accuracy of 44%. Methods were then applied to improve the model, in order to determine the best accuracy a multiclass model could achieve compared with similar studies that performed binary classification. The improved model achieved 83% accuracy on the three-class task of 'negative', 'neutral' and 'positive'. A precision of 77% was achieved on the 'neutral' label, which was underrepresented because of class imbalance in the data. The best-performing hyperparameters for each classification attempt are recorded. The study recommends using a larger number of TF-IDF features together with a non-parametric kernel to obtain the best results in multiclass sentiment classification.
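The TF-IDF weighting described in the abstract can be sketched in plain Python. This is a minimal illustration, not the author's code: the abstract does not say which TF-IDF variant or library was used, so the sketch assumes the textbook form tfidf(t, d) = tf(t, d) × log(N / df(t)), where tf is a term's share of the tokens in review d, N is the number of reviews and df(t) is the number of reviews containing t. The `tokenize` and `tfidf` helpers and the `max_features` cap (standing in for the 100- and 300-feature settings) are hypothetical names. Library implementations differ in detail; scikit-learn's `TfidfVectorizer`, for example, uses a smoothed idf and scales each row to unit L2 norm by default. The resulting rows would then be the input to the SVM classifier, which is not shown here.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Hypothetical tokenizer: lower-case and keep runs of letters/apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def tfidf(docs: list[str], max_features: int) -> tuple[list[str], list[list[float]]]:
    """Return (vocab, rows): rows[i][j] is the TF-IDF weight of vocab[j] in docs[i]."""
    tokenized = [tokenize(d) for d in docs]
    doc_sets = [set(toks) for toks in tokenized]
    n_docs = len(docs)

    # Keep the max_features most frequent terms across the corpus
    # (ties broken alphabetically), e.g. 100 or 300 as in the study.
    corpus_counts = Counter(tok for toks in tokenized for tok in toks)
    ranked = sorted(corpus_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    vocab = [term for term, _ in ranked[:max_features]]

    # Inverse document frequency: log(N / df), where df counts the reviews
    # that contain the term at least once. Terms found in every review get 0.
    idf = {t: math.log(n_docs / sum(t in s for s in doc_sets)) for t in vocab}

    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        length = len(toks) or 1  # avoid dividing by zero for an empty review
        rows.append([counts[t] / length * idf[t] for t in vocab])
    return vocab, rows


# Toy reviews for illustration only.
reviews = ["Great flight, friendly crew", "Delayed flight and rude crew", "Great food"]
vocab, rows = tfidf(reviews, max_features=5)
print(vocab)  # → ['crew', 'flight', 'great', 'and', 'delayed']
```

A word that appears in every review (such as `a` in `tfidf(["a b", "a c"], 3)`) receives an idf of log(1) = 0, so frequent but uninformative words carry no weight; this is the effect of comparing each review against the whole collection.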
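The step from the 1-10 score scale to three sentiment classes, and the per-label precision quoted for 'neutral', can be illustrated as follows. The abstract does not give the score cut-offs, so the thresholds `NEGATIVE_MAX = 4` and `NEUTRAL_MAX = 6` and all function names below are hypothetical; the sketch only shows how accuracy and per-class precision are computed once a model's predictions exist, and the toy data are not results from the thesis.

```python
from collections import Counter

# Hypothetical cut-offs (assumptions, not taken from the thesis):
# 1-4 -> negative, 5-6 -> neutral, 7-10 -> positive.
NEGATIVE_MAX = 4
NEUTRAL_MAX = 6


def to_sentiment(score: int) -> str:
    """Map a 1-10 review score to a three-class sentiment label."""
    if not 1 <= score <= 10:
        raise ValueError(f"score out of range 1-10: {score}")
    if score <= NEGATIVE_MAX:
        return "negative"
    if score <= NEUTRAL_MAX:
        return "neutral"
    return "positive"


def accuracy(y_true: list[str], y_pred: list[str]) -> float:
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_precision(y_true: list[str], y_pred: list[str]) -> dict[str, float]:
    """Precision per label: correct predictions of a label / all predictions of it."""
    predicted = Counter(y_pred)
    correct = Counter(p for t, p in zip(y_true, y_pred) if t == p)
    return {label: correct[label] / n for label, n in predicted.items()}


# Toy data for illustration only.
scores = [1, 3, 5, 6, 7, 9, 10, 2]
y_true = [to_sentiment(s) for s in scores]
y_pred = ["negative", "negative", "neutral", "positive",
          "positive", "positive", "neutral", "negative"]

print(accuracy(y_true, y_pred))             # → 0.75
print(per_class_precision(y_true, y_pred))  # → {'negative': 1.0, 'neutral': 0.5, 'positive': 0.6666666666666666}
```

Because precision for a label divides the correct predictions of that label by all predictions of that label, a minority class such as 'neutral' can score quite differently from overall accuracy when the classes are imbalanced.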

Item Type: Thesis (Undergraduate)
Subjects: Q Science > QA Mathematics > Electronic computers. Computer science
T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science
Q Science > QA Mathematics > Computer software
T Technology > T Technology (General) > Information Technology > Computer software
H Social Sciences > HD Industries. Land use. Labor > Specific Industries > Aviation Industry
Divisions: School of Computing > Bachelor of Science (Honours) in Computing
Depositing User: Clara Chan
Date Deposited: 30 Aug 2022 13:05
Last Modified: 30 Aug 2022 13:05