NORMA eResearch @NCI Library

Beyond the Eye Test: Improving Football Recruitment Through The Use Of Clustering And Support Vector Machines: Data Science Report

Cannon, Thomas (2023) Beyond the Eye Test: Improving Football Recruitment Through The Use Of Clustering And Support Vector Machines: Data Science Report. Undergraduate thesis, Dublin, National College of Ireland.



The objective of this project is to use data retrieved from a third-party source to predict the positions of football players from the world’s top football leagues, while also identifying similarities between players by applying various clustering and classification methods to the past season’s performance data. This report details a number of machine learning methods and technologies which were analysed to identify the most accurate method of predicting football players’ positions and of identifying their similarities. Data from FBREF relating to the most recent season of football from the top 11 leagues around the world were combined to create a dataset containing over 100 data points on each individual player’s performance throughout the season.

It is hoped that learnings from this project could then be used in the form of an application which could be leveraged to identify talent from other leagues or even applied to data from lower-level leagues to identify lesser-known players. This system could be used by professional football clubs and scouts to aid in the process of scouting and analysing transfer prospects.

K-Means clustering, K-Nearest Neighbour (KNN), and Support Vector Machines (SVM) were among the methods tested over the course of this project. Transfer learning through the use of a Support Vector Machine was applied to the raw data to create a high-level vector space in the hope of achieving more accurate results.
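As a rough illustration of how a nearest-neighbour approach can serve both goals at once (finding similar players and predicting a position), the following minimal sketch uses only the Python standard library. It is not the thesis's implementation: the player names, the three statistics, their values, and the helper names `column_stats`, `standardise`, `nearest` and `predict_position` are all hypothetical, and the real dataset contains over 100 features per player.

```python
import math
from collections import Counter

# Made-up per-90 statistics for illustration only (NOT FBREF data):
# (goals, tackles, passes completed), plus a position label.
PLAYERS = {
    "Player A": ((0.60, 0.5, 20.0), "Forward"),
    "Player B": ((0.45, 0.7, 25.0), "Forward"),
    "Player C": ((0.10, 2.5, 55.0), "Midfielder"),
    "Player D": ((0.08, 2.8, 60.0), "Midfielder"),
    "Player E": ((0.02, 3.5, 45.0), "Defender"),
    "Player F": ((0.01, 3.9, 40.0), "Defender"),
}


def column_stats(rows):
    """Return (mean, population std) for each feature column."""
    stats = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / len(col))
        stats.append((mean, std or 1.0))  # guard against constant columns
    return stats


def standardise(row, stats):
    """Z-score a feature row so no single statistic dominates the distance."""
    return tuple((x - m) / s for x, (m, s) in zip(row, stats))


def nearest(query, k=3):
    """Names of the k known players closest to `query` (Euclidean, z-scored)."""
    stats = column_stats([feats for feats, _ in PLAYERS.values()])
    q = standardise(query, stats)
    ranked = sorted(
        PLAYERS,
        key=lambda name: math.dist(q, standardise(PLAYERS[name][0], stats)),
    )
    return ranked[:k]


def predict_position(query, k=3):
    """Majority vote over the positions of the k nearest players."""
    votes = Counter(PLAYERS[name][1] for name in nearest(query, k))
    return votes.most_common(1)[0][0]
```

Calling `nearest((0.50, 0.6, 22.0), k=2)` returns the two most similar toy players, and `predict_position` with the same query assigns a position by majority vote. Standardising first matters because raw pass counts would otherwise swamp the goal and tackle columns in the distance calculation.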

The highest accuracies achieved by the various models created are as follows:
• Player Position Category through SVM: 88.3% accuracy
• Player Sub-Position through SVM: 74.84% accuracy

The techniques used were deemed to produce a more accurate model for identifying similar players across the leagues. This was validated by a questionnaire posed to participants with an interest in football.

A t-test was carried out to identify whether there was a significant difference between the choices of the questionnaire participants. The results showed that participants leaned towards the output of the proposed improved model.
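The abstract does not state which variant of the t-test was used or how the questionnaire responses were coded. Purely as a sketch under those assumptions, the example below computes a one-sample t statistic on responses coded 1 (participant preferred the proposed model's output) or 0 (preferred the alternative), testing against a no-preference mean of 0.5. The function name `one_sample_t` and the response values are hypothetical and invented for illustration.

```python
import math
from statistics import mean, stdev


def one_sample_t(sample, mu0):
    """Return (t statistic, degrees of freedom) for H0: population mean == mu0."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    s = stdev(sample)  # sample standard deviation (n - 1 denominator)
    if s == 0:
        raise ValueError("sample has zero variance; t is undefined")
    t = (mean(sample) - mu0) / (s / math.sqrt(n))
    return t, n - 1


# Invented responses (NOT the thesis's questionnaire data):
# 1 = chose the proposed model's suggestion, 0 = chose the alternative.
responses = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]
t_stat, dof = one_sample_t(responses, mu0=0.5)
```

The resulting statistic would then be compared with the critical value of the t-distribution for `dof` degrees of freedom at the chosen significance level. A positive `t_stat` indicates that responses lean towards the proposed model, but only the comparison against the critical value says whether that lean is statistically significant.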

Item Type: Thesis (Undergraduate)
Clifford, William
Subjects: Q Science > QA Mathematics > Electronic computers. Computer science
T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science
G Geography. Anthropology. Recreation > GV Recreation Leisure > Sports > Soccer
Q Science > Q Science (General) > Self-organizing systems. Conscious automata > Machine learning
Divisions: School of Computing > Bachelor of Science (Honours) in Computing
Depositing User: Tamara Malone
Date Deposited: 15 Jan 2024 17:40
Last Modified: 15 Jan 2024 17:40
