NORMA eResearch @NCI Library

Air Quality Quantification in Taiwan Using Machine Learning Techniques in Apache Spark Platform

Kandath, Sreenand (2019) Air Quality Quantification in Taiwan Using Machine Learning Techniques in Apache Spark Platform. Masters thesis, Dublin, National College of Ireland.

PDF (Master of Science), 2MB
PDF (Configuration manual), 3MB

Abstract

In an era where oxygen is sold in bottles because of deteriorating outdoor air quality, the need to reduce air pollution, and thereby improve air quality, is urgent. This research is based on historical air quality data for the island of Taiwan. Various machine learning algorithms were applied to predict PM2.5, the main component in calculating the air quality index (AQI), from various meteorological factors. This project determined the prediction accuracy of various regression models and ensemble models in the Apache Spark environment and compared the models in terms of performance efficiency and root mean square error (RMSE). Linear regression, neural network regression, decision forest, and decision tree models, together with the boosted decision tree models AdaBoost and gradient boosted trees, were built in both Apache Spark client and cluster environments. Across multiple comparison parameters, ensemble models with boosted trees were found to be the best at predicting the air quality index, with a prediction accuracy of 80%.
Keywords: Apache Spark, AdaBoost, Neural Network Regression, PM2.5
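
The abstract compares models by root mean square error. As a minimal sketch of how such a comparison might be set up, the plain-Python example below computes RMSE for each candidate model and ranks them from lowest to highest error. The function names (`rmse`, `rank_models`), the model labels, and all numeric values are hypothetical and chosen only for illustration; they are not taken from the thesis, which ran its models in Apache Spark.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def rank_models(actual, predictions_by_model):
    """Return (model_name, rmse) pairs sorted from lowest to highest RMSE."""
    scores = {
        name: rmse(actual, preds)
        for name, preds in predictions_by_model.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


# Made-up PM2.5 values for illustration only (assumption, not thesis data).
observed = [10.0, 20.0, 30.0, 40.0]
candidates = {
    "model_a": [12.0, 18.0, 33.0, 39.0],
    "model_b": [10.0, 25.0, 20.0, 45.0],
}

ranking = rank_models(observed, candidates)
for name, score in ranking:
    print(f"{name}: RMSE = {score:.3f}")
```

A lower RMSE means the predictions sit closer to the observed values on average, so the first entry in `ranking` is the preferred model under this single criterion. The thesis also weighs performance efficiency, which this sketch does not model.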

Item Type: Thesis (Masters)
Subjects: Q Science > QA Mathematics > Electronic computers. Computer science
T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science
Q Science > QA Mathematics > Computer software
T Technology > T Technology (General) > Information Technology > Computer software
G Geography. Anthropology. Recreation > GE Environmental Sciences > Environment
Divisions: School of Computing > Master of Science in Data Analytics
Depositing User: Dan English
Date Deposited: 16 Jun 2020 12:54
Last Modified: 16 Jun 2020 12:54
URI: https://norma.ncirl.ie/id/eprint/4297
