NORMA eResearch @NCI Library

Early prediction of a film's box office success using natural language processing techniques and machine learning

O'Driscoll, Sean (2016) Early prediction of a film's box office success using natural language processing techniques and machine learning. Masters thesis, Dublin, National College of Ireland.

Abstract

This research applied natural language processing and machine learning techniques to film scripts in order to predict whether or not a film would be financially successful. The film scripts were transformed into a term-document matrix, with term frequency-inverse document frequency (TF-IDF) scores used to assign feature importance. The machine learning algorithms used in this research were decision trees, random forest, naive Bayes, and support vector machines. The results were evaluated using accuracy, precision, recall, F1 score and, where appropriate, Cohen's Kappa. The results were also compared to predictions made using information about the films that is either known or can be reasonably estimated before the film has been made. Film scripts were also analysed after first being segregated by genre, so that scripts with more similar or related material could be compared. Overall, judged against this research's stringent evaluation criteria, the predictions made using data generated from the film scripts were poor, and the predictions made using information about the films were only slightly better.
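
As a rough illustration of two steps named in the abstract, the sketch below computes TF-IDF weights for a toy corpus and derives accuracy, precision, recall, F1 score and Cohen's Kappa from binary predictions. It is not taken from the thesis. The function names `tfidf` and `binary_metrics`, the whitespace tokenisation, the tf * log(N / df) weighting variant and the toy data are all assumptions made for the example; the abstract does not say which TF-IDF variant or tooling was used.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document (hypothetical helper).

    Assumed variant: relative term frequency multiplied by log(N / df).
    Documents are assumed to be non-empty strings.
    """
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    weights = []
    for tokens in tokenised:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def binary_metrics(y_true, y_pred):
    """Compute evaluation metrics for 0/1 labels (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = len(pairs)

    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)

    # Cohen's Kappa: observed agreement corrected for chance agreement,
    # where chance agreement comes from the marginal label frequencies.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (accuracy - p_chance) / (1 - p_chance) if p_chance != 1 else 0.0

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "kappa": kappa}


# Toy data, invented for illustration only.
scripts = ["the hero saves the city", "the villain robs the bank"]
weights = tfidf(scripts)

actual = [1, 1, 1, 0, 0, 0, 0, 0]      # 1 = financially successful
predicted = [1, 1, 0, 1, 0, 0, 0, 0]
scores = binary_metrics(actual, predicted)
```

In this toy corpus a term that occurs in every script, such as "the", receives a weight of zero, which is the reason TF-IDF is used to down-weight uninformative terms. Cohen's Kappa is useful alongside accuracy because it discounts agreement that would be expected by chance when one class is more common than the other.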

Item Type: Thesis (Masters)
Subjects: Q Science > QA Mathematics > Electronic computers. Computer science
T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science
H Social Sciences > HD Industries. Land use. Labor > Specific Industries > Film Industry
Divisions: School of Computing > Master of Science in Data Analytics
Depositing User: Caoimhe Ní Mhaicín
Date Deposited: 30 Jan 2017 09:20
Last Modified: 30 Jan 2017 09:20
URI: https://norma.ncirl.ie/id/eprint/2531
