NORMA eResearch @NCI Library

An analysis of the causes and prevention of diabetes

Kelly, Benjamin (2023) An analysis of the causes and prevention of diabetes. Undergraduate thesis, Dublin, National College of Ireland.

The main aim of the project is to explore the factors that influence the onset of diabetes and to examine how diabetes can be controlled by analysing and understanding the risk factors associated with the disease. The project also aims to identify the most suitable model or classifier for predicting the likelihood of someone becoming diabetic based on various data points. The project datasets were collected in CSV format, as this was the most suitable format for use on a variety of platforms. The project report outlines the approach to data cleansing, the analysis using classification models, and an exploration of which models worked best.

Following data preparation, I carried out statistical and exploratory analysis using various methods and platforms, including RStudio. I then applied statistical tests such as Student's t-test and the chi-square test, as they were appropriate for classification problems. These tests explained important points about the data and features, and their initial results are discussed in the results section of this report.
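As a rough illustration of the two tests named above, the following sketch computes a pooled-variance Student's t statistic and a Pearson chi-square statistic using only the Python standard library. The thesis itself reports using RStudio; the function names, the BMI samples and the 2x2 table are hypothetical values chosen for illustration and are not taken from the project data.

```python
from math import sqrt
from statistics import mean, variance


def student_t(sample_a, sample_b):
    """Two-sample Student's t statistic, assuming equal variances (pooled)."""
    na, nb = len(sample_a), len(sample_b)
    pooled_var = ((na - 1) * variance(sample_a)
                  + (nb - 1) * variance(sample_b)) / (na + nb - 2)
    std_err = sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(sample_a) - mean(sample_b)) / std_err


def chi_square(table):
    """Pearson chi-square statistic for a contingency table (no correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical numeric feature (BMI) split by outcome.
bmi_diabetic = [31.0, 33.0, 35.0, 29.0, 32.0]
bmi_non_diabetic = [24.0, 26.0, 28.0, 25.0, 27.0]
t_stat = student_t(bmi_diabetic, bmi_non_diabetic)

# Hypothetical categorical feature: rows = risk factor present / absent,
# columns = diabetic / not diabetic.
counts = [[30, 10], [20, 40]]
chi2_stat = chi_square(counts)

print(f"t = {t_stat:.3f}, chi-square = {chi2_stat:.3f}")
```

A t-test of this kind suits a numeric feature compared across the two outcome groups, while the chi-square test suits a categorical feature cross-tabulated against the outcome. Converting either statistic to a p-value requires the corresponding distribution, which a statistics package would normally supply.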

A number of classification models were implemented in this project, including Decision Tree, Random Forest and Naïve Bayes models, in addition to Neural Network modelling. The results summarise how these performed in the context of the aims of the project and are further summarised in the conclusions section.

I identified previous research in the area of diabetes data analysis, including academic sources and medical journals widely available on the web. These sources highlight the importance of this research to healthcare professionals and how it can help us understand the increasing rates of diabetes. When I analysed the results and models against the project objective (i.e. to predict the incidence of diabetes), the three top-performing models were:

• Neural Network SMOTE Hyper model
• Random Forest down-sampled model
• Decision Tree down-sampled model

In terms of overall model performance, the Neural Network SMOTE model performed best, with an accuracy above 90% compared with 70-75% for the other models. The project also illustrates that the use of AI was superior to conventional machine learning models in the context of the project.
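The model names above refer to two ways of rebalancing classes before training: down-sampling the majority class and SMOTE-style over-sampling of the minority class. The sketch below shows the core idea of each in plain Python. It is a simplified illustration under stated assumptions, not the thesis implementation: `downsample` and `smote_like` are hypothetical helper names, the data is invented, and `smote_like` needs at least two minority points.

```python
import random


def downsample(rows, labels, seed=0):
    """Randomly keep only as many rows per class as the smallest class has."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = min(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        for row in rng.sample(group, target):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic


# Invented, imbalanced toy data: 8 negative cases and 2 positive cases.
rows = [(float(i), float(i)) for i in range(10)]
labels = [0] * 8 + [1] * 2
balanced_rows, balanced_labels = downsample(rows, labels)

minority_points = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
new_points = smote_like(minority_points, n_new=5)
```

Down-sampling discards majority-class rows, so it trades data volume for balance, whereas the SMOTE-style approach keeps all rows and adds synthetic minority examples. Resampling is normally applied to the training split only, so that evaluation reflects the original class balance.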

Item Type: Thesis (Undergraduate)
Bradford, Michael
Subjects: Q Science > QA Mathematics > Electronic computers. Computer science
T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science
Q Science > QH Natural history > QH301 Biology > Methods of research. Technique. Experimental biology > Data processing. Bioinformatics > Artificial intelligence
Q Science > Q Science (General) > Self-organizing systems. Conscious automata > Artificial intelligence
Q Science > Q Science (General) > Self-organizing systems. Conscious automata > Machine learning
Q Science > QP Physiology > Nutrition
Divisions: School of Computing > Bachelor of Science (Honours) in Computing
Depositing User: Tamara Malone
Date Deposited: 16 Jan 2024 16:34
Last Modified: 16 Jan 2024 16:34
