NORMA eResearch @NCI Library

Investigating the validity of FPL data in determining player performance and the most impactful players in the English Premier League teams

Meena, Nishant (2023) Investigating the validity of FPL data in determining player performance and the most impactful players in the English Premier League teams. Masters thesis, Dublin, National College of Ireland.

Files: PDF (Master of Science), 2MB; PDF (Configuration manual), 3MB

Abstract

The purpose of this research is to determine how far the information in the Fantasy Premier League (FPL) dataset can help professional football managers make accurate judgements of player performance in the English Premier League (EPL). The goal of this project is to identify the potential benefits, for football managers and clubs, of using the FPL dataset to predict player performance before a match and, potentially, to select the best starting eleven for their club. To achieve this goal, this research aggregates and analyses FPL data from previous EPL seasons and builds machine learning models that predict player performance. It also incorporates secondary data from other sources for statistics not present in the FPL dataset, such as xG (Expected Goals) and xA (Expected Assists), to check whether adding these secondary features to the FPL dataset improves the predictions. The data is statistically analysed, and machine learning models are trained to forecast the performance of the players in an English Premier League club. The information derived from the FPL dataset, and its relationship to player performance, can be used to better understand how FPL data could serve as a decision-making tool for real-world football managers. To measure the impact of the secondary dataset, the study was conducted in two phases. In the first phase, only the FPL dataset was used to predict the points achieved by the players; in the second phase, the additional features from the secondary dataset were added to observe how the predicted points changed.
The findings of this study suggest that Random Forest is the best-performing model in both experiments, and that secondary statistics such as xG and xA have minimal impact on the prediction of players' performances: the first experiment, which used only data from the FPL dataset, performed slightly better than the second experiment, which added the secondary statistics xG and xA. This study will allow team managers to apply data-driven strategies to choose their best starting eleven with much greater certainty about which players will perform better in a match. Furthermore, it will allow FPL in-game managers to better understand how each player is likely to perform before the actual match, giving them a deeper understanding and use of the statistics available to them to score high points and potentially win the game.
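The two-phase design described in the abstract amounts to training the same model on two feature sets and comparing their prediction errors. The sketch below assumes Python with scikit-learn's `RandomForestRegressor`; the record does not say which library or evaluation metric the thesis used. All column names (`minutes_avg3`, `xG_avg3`, and so on) and the data itself are hypothetical and synthetic. The features are framed as averages over a player's previous three gameweeks so the model only sees information available before the match. The printed numbers say nothing about the thesis's actual results.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Hypothetical feature names: rolling averages over a player's previous
# three gameweeks, so nothing from the match being predicted leaks in.
FPL_FEATURES = ["minutes_avg3", "goals_avg3", "assists_avg3", "bonus_avg3"]
SECONDARY_FEATURES = ["xG_avg3", "xA_avg3"]


def make_synthetic_data(n_rows=500, seed=0):
    """Build an illustrative synthetic dataset. This is NOT real FPL data."""
    rng = np.random.default_rng(seed)
    xg = rng.gamma(1.0, 0.2, n_rows)
    xa = rng.gamma(1.0, 0.15, n_rows)
    data = {
        "minutes_avg3": rng.uniform(0, 90, n_rows),
        # Recent goals/assists are noisy realisations of the underlying xG/xA.
        "goals_avg3": rng.poisson(3 * xg) / 3,
        "assists_avg3": rng.poisson(3 * xa) / 3,
        "bonus_avg3": rng.uniform(0, 3, n_rows),
        "xG_avg3": xg,
        "xA_avg3": xa,
    }
    # Toy next-match points: a made-up linear mix plus noise,
    # not the official FPL scoring rules.
    points = (
        0.03 * data["minutes_avg3"]
        + 4 * xg
        + 3 * xa
        + 0.5 * data["bonus_avg3"]
        + rng.normal(0, 1.0, n_rows)
    )
    return data, points


def cv_rmse(features, data, points, seed=0):
    """Mean 5-fold cross-validated RMSE of a Random Forest on a feature subset."""
    X = np.column_stack([data[name] for name in features])
    model = RandomForestRegressor(n_estimators=100, random_state=seed)
    scores = cross_val_score(model, X, points, cv=5,
                             scoring="neg_root_mean_squared_error")
    return -scores.mean()  # scikit-learn reports negated errors


data, points = make_synthetic_data()
rmse_phase1 = cv_rmse(FPL_FEATURES, data, points)                       # FPL only
rmse_phase2 = cv_rmse(FPL_FEATURES + SECONDARY_FEATURES, data, points)  # + xG, xA
print(f"Phase 1 (FPL features only): RMSE = {rmse_phase1:.3f}")
print(f"Phase 2 (FPL + xG and xA):   RMSE = {rmse_phase2:.3f}")
```

Holding the model, its random seed, and the cross-validation folds fixed across both phases means any difference in RMSE is attributable to the feature set rather than to a different split or model configuration.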

Item Type: Thesis (Masters)
Supervisors: Horta, Vitor (Email: UNSPECIFIED)
Subjects: Q Science > QA Mathematics > Electronic computers. Computer science
T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science
G Geography. Anthropology. Recreation > GV Recreation Leisure > Sports > Soccer
Q Science > Q Science (General) > Self-organizing systems. Conscious automata > Machine learning
Divisions: School of Computing > Master of Science in Data Analytics
Depositing User: Tamara Malone
Date Deposited: 29 Nov 2024 13:08
Last Modified: 29 Nov 2024 13:08
URI: https://norma.ncirl.ie/id/eprint/7211
