NORMA eResearch @NCI Library

Analysis of the Crime Rate in London Using Machine Learning and Data Mining Models: Technical Report

Woldegiorgis, Tiblet (2021) Analysis of the Crime Rate in London Using Machine Learning and Data Mining Models: Technical Report. Undergraduate thesis, Dublin, National College of Ireland.



Crime is one of the biggest problems countries face, and it has a negative impact on both society and the economy. Numerous studies have been carried out to identify the causes of crime in London. The crime rate in London has been increasing year on year since the period covered by this analysis, with the steepest increase occurring after 2016. This report investigates factors that contribute to the crime rate in London. Many factors could contribute to crime. The factors examined in this study are the unemployment rate, the homelessness rate, and the rate of employees earning below the living wage in the London area.

The datasets were collected from public sources such as ( and (. These datasets were pre-processed manually in Excel and then further pre-processed in Jupyter using Python. Machine learning and data mining models, namely Pearson's correlation coefficient, multiple linear regression analysis, and k-means clustering, were applied using analytics tools such as Python and R, with Tableau used for visualisation.

Pearson's correlation coefficient is a statistical measure of the strength and direction of the linear relationship between two continuous variables. It is suitable for this analysis because the goal is to measure the correlation between each of the three factors (unemployment, homelessness, and employees earning below the living wage) and the crime rate in London.
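As a rough illustration of what the coefficient measures, the sketch below computes Pearson's r from first principles in plain Python: the sum of the products of the deviations from the mean, divided by the square root of the product of the two sums of squared deviations. The function name `pearson_r` and the sample values are hypothetical and are not taken from the report's datasets. In practice a library routine such as `scipy.stats.pearsonr` or `pandas.DataFrame.corr` would typically be used instead.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations (covariance numerator)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when a variable has zero variance")
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical illustrative values, NOT the report's data
homelessness_rate = [2.1, 3.4, 1.8, 4.0, 2.9]
crime_rate = [80.0, 95.0, 70.0, 110.0, 90.0]

r = pearson_r(homelessness_rate, crime_rate)
print(f"r = {r:.3f}")
```

Values of r near +1 or -1 indicate a strong positive or negative linear relationship, and values near 0 indicate little linear relationship.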

Multiple linear regression is a statistical model that estimates the relationship between one dependent variable and two or more independent variables by fitting a linear equation to the data. It is also suitable for this analysis because it helps identify the relationship between each independent variable and the crime rate while holding the other variables constant.

K-means clustering is used for unlabelled data points, that is, data without defined categories or groups. The algorithm works iteratively to assign each data point to one of K groups based on the features provided. For this analysis, K-means was used to group the dataset into three clusters.
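The iterative procedure described above is commonly implemented as Lloyd's algorithm. It alternates an assignment step (each point joins its nearest centroid) with an update step (each centroid moves to the mean of its points) until the centroids stop changing. The minimal sketch below is not the report's own implementation. It assumes two hypothetical features per observation (an unemployment rate and a crime rate) with made-up values, and it seeds the centroids with a deterministic farthest-first traversal for reproducibility, whereas libraries such as scikit-learn's `KMeans` default to k-means++ initialisation. All function names are illustrative.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def mean_point(points: List[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def farthest_first_init(points: List[Point], k: int) -> List[Point]:
    """Deterministic seeding: start from the first point, then repeatedly add
    the point whose distance to its nearest chosen centroid is largest."""
    centroids = [points[0]]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(dist2(p, c) for c in centroids))
        centroids.append(nxt)
    return centroids


def assign(points: List[Point], centroids: List[Point]) -> List[int]:
    """Assignment step: index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: dist2(p, centroids[j]))
            for p in points]


def kmeans(points: List[Point], k: int, max_iter: int = 100):
    """Lloyd's algorithm; returns (centroids, labels)."""
    centroids = farthest_first_init(points, k)
    for _ in range(max_iter):
        labels = assign(points, centroids)
        # Update step: move each centroid to the mean of its members,
        # keeping the old centroid if its cluster is empty.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centroids.append(mean_point(members) if members else centroids[j])
        if new_centroids == centroids:
            break  # converged: centroids no longer change
        centroids = new_centroids
    return centroids, assign(points, centroids)


# Hypothetical (unemployment %, crime rate) observations, NOT the report's data
data = [(3.0, 60.0), (3.2, 62.0), (2.9, 58.0),
        (5.0, 90.0), (5.3, 92.0), (4.8, 88.0),
        (7.5, 130.0), (7.9, 128.0), (7.2, 135.0)]

centroids, labels = kmeans(data, k=3)
print(labels)  # [0, 0, 0, 2, 2, 2, 1, 1, 1]
```

Because plain Euclidean distance is used, the feature with the largest numeric range (here the crime rate) dominates the clustering. Standardising each feature first, for example to zero mean and unit variance, is a common precaution.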

The correlation analysis indicated that the crime rate has a strong negative correlation with low income (employees earning below the living wage), while the homelessness rate is strongly and positively correlated with crime. Unemployment is also positively correlated with crime, but more weakly than homelessness. The multiple linear regression analysis also shows that these factors have a strong relationship with the crime rate, as indicated by the F-statistic and the p-values.
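To show where an overall F-statistic comes from, the sketch below fits a multiple linear regression by ordinary least squares, solving the normal equations with Gaussian elimination, and then computes R² and F = ((SS_tot - SS_res) / p) / (SS_res / (n - p - 1)). This F-statistic tests whether all slope coefficients are zero. The sketch uses only the Python standard library. The function names, the three hypothetical predictor columns and all values are illustrative assumptions, not the report's code or data.

```python
from typing import List, Sequence, Tuple


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (collinear predictors?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols_fit(X: Sequence[Sequence[float]], y: Sequence[float]) -> Tuple[List[float], float, float]:
    """Fit y = b0 + b1*x1 + ... + bp*xp by ordinary least squares.

    Returns (coefficients, R^2, F-statistic); F has (p, n - p - 1) degrees of freedom.
    """
    n, p = len(y), len(X[0])
    if n <= p + 1:
        raise ValueError("need more observations than coefficients")
    rows = [[1.0] + list(xi) for xi in X]  # prepend intercept column
    # Normal equations: (A^T A) beta = A^T y
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(p + 1)] for i in range(p + 1)]
    aty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p + 1)]
    beta = solve(ata, aty)
    fitted = [sum(bk * v for bk, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    df_model, df_resid = p, n - p - 1
    f_stat = float("inf") if ss_res == 0 else ((ss_tot - ss_res) / df_model) / (ss_res / df_resid)
    return beta, r2, f_stat


# Hypothetical observations: (unemployment %, homelessness rate, % below living wage)
X = [(6.5, 2.1, 18.0), (7.2, 2.8, 20.5), (5.9, 1.9, 17.2), (8.1, 3.5, 22.0),
     (6.8, 2.5, 19.1), (7.7, 3.1, 21.3), (5.5, 1.6, 16.4), (8.4, 3.9, 23.2)]
y = [82.0, 95.5, 76.3, 110.2, 88.9, 101.4, 71.8, 117.6]  # hypothetical crime rates

beta, r2, f_stat = ols_fit(X, y)
print("coefficients:", [round(bk, 3) for bk in beta])
print(f"R^2 = {r2:.3f}, F = {f_stat:.2f}")
```

A p-value for the F-statistic is the upper-tail probability of the F distribution with (p, n - p - 1) degrees of freedom. It can be obtained with, for example, `scipy.stats.f.sf(f_stat, p, n - p - 1)`. The results object from statsmodels' `OLS(...).fit()` also reports the F-statistic, its p-value and per-coefficient p-values directly.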

K-means clustering grouped the data points into three clusters, which can be labelled Low Crime Area, Medium Crime Area, and High Crime Area.
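The report does not describe how the three clusters were named. One plausible approach, shown here purely as an assumption, is to rank the cluster centroids by their crime-rate coordinate and assign the names in ascending order. The function `label_clusters`, the `crime_index` parameter and the centroid values are hypothetical.

```python
from typing import Dict, Sequence


def label_clusters(centroids: Sequence[Sequence[float]], crime_index: int) -> Dict[int, str]:
    """Map cluster ids to crime-level names by ranking each centroid's
    value on the crime-rate feature (lowest first)."""
    names = ["Low Crime Area", "Medium Crime Area", "High Crime Area"]
    if len(centroids) != len(names):
        raise ValueError("expected exactly three centroids")
    order = sorted(range(len(centroids)), key=lambda j: centroids[j][crime_index])
    return {cluster_id: names[rank] for rank, cluster_id in enumerate(order)}


# Hypothetical centroids as (unemployment %, crime rate)
centroids = [(2.97, 60.0), (7.53, 131.0), (5.03, 90.0)]
print(label_clusters(centroids, crime_index=1))
# {0: 'Low Crime Area', 2: 'Medium Crime Area', 1: 'High Crime Area'}
```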

The results of all the analyses were consistent: the crime rate in London between 2012 and 2016 was highly influenced by the unemployment rate, the low-wage rate, and homelessness, though to varying degrees.

Item Type: Thesis (Undergraduate)
Subjects: H Social Sciences > HV Social pathology. Social and public welfare > Criminology > Crimes and Offences
Q Science > QA Mathematics > Electronic computers. Computer science
T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science
Divisions: School of Computing > Bachelor of Science (Honours) in Computing
Depositing User: Clara Chan
Date Deposited: 16 Sep 2021 10:25
Last Modified: 16 Sep 2021 16:46
