NORMA eResearch @NCI Library

Food Authentication Using Dimensionality Reduction techniques and Ensemble Algorithms on Spectroscopic Datasets

Patil, Tushar (2020) Food Authentication Using Dimensionality Reduction techniques and Ensemble Algorithms on Spectroscopic Datasets. Masters thesis, Dublin, National College of Ireland.

The objective of studies in the food authentication domain is to correctly label unknown food samples. This research study examines three food authenticity datasets of different types: meat, olive oil and honey. Samples collected using near-infrared spectroscopy pose major challenges: the resulting datasets are high dimensional, i.e. the number of predictor variables (p) is much larger than the number of observations (n) (n << p), and the data points suffer from inherent collinearity. This study proposes applying three different dimensionality reduction algorithms to determine the principal components and then feeding these embedding spaces to AdaBoost classifiers with decision tree (DCT) and SVM as base estimators. In addition, a Random Forest classifier is applied to the datasets. The aim of this research is to find the combination of dimensionality reduction algorithm and classification algorithm that yields the highest accuracy. The results show that for the meat data the LDA-AdaBoost SVM approach outperforms the other approaches, whereas for the honey dataset the Random Forest classifier outperforms the other approaches. For the olive oil dataset, AdaBoost DCT applied to the original dataset without any transformation outperforms the other approaches.
Keywords: Linear Discriminant Analysis (LDA), Principal Component Analysis (PCA), Uniform Manifold Approximation and Projection (UMAP), t-distributed Stochastic Neighbour Embedding (t-SNE), Adaptive Boosting (AdaBoost), Decision Tree (DCT), Support Vector Machines (SVM), Random Forest (RF).
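
The workflow described in the abstract (reduce dimensionality, then classify with AdaBoost or Random Forest) can be sketched roughly as follows. This is a minimal illustration and not the thesis code: it assumes scikit-learn, uses synthetic collinear data with n << p in place of the spectroscopic datasets, covers only PCA and LDA (UMAP and t-SNE are left out), and all hyperparameters, variable names and the grid of combinations are hypothetical choices made for the example.

```python
import numpy as np
from sklearn.base import clone
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for spectra (NOT the thesis data): 60 samples, 300
# strongly collinear features driven by 3 latent factors, so n << p.
rng = np.random.default_rng(0)
n_samples, n_features = 60, 300
y = np.repeat([0, 1], n_samples // 2)
latent = rng.normal(size=(n_samples, 3))
latent[:, 0] += 3.0 * y  # shift one latent factor by class
mixing = rng.normal(size=(3, n_features))
X = latent @ mixing + 0.1 * rng.normal(size=(n_samples, n_features))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Dimensionality reduction options; "passthrough" keeps the original features.
reducers = {
    "none": "passthrough",
    "PCA": PCA(n_components=3),
    "LDA": LinearDiscriminantAnalysis(),
}

# Classifiers; the base estimator is passed positionally to AdaBoost.
# probability=True lets the SVM work with boosting variants that need
# class probabilities.
classifiers = {
    "AdaBoost-DCT": AdaBoostClassifier(
        DecisionTreeClassifier(max_depth=1), n_estimators=50, random_state=0
    ),
    "AdaBoost-SVM": AdaBoostClassifier(
        SVC(kernel="linear", probability=True, random_state=0),
        n_estimators=10,
        random_state=0,
    ),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Fit every reducer/classifier combination and record held-out accuracy.
results = {}
for r_name, reducer in reducers.items():
    for c_name, clf in classifiers.items():
        step = reducer if isinstance(reducer, str) else clone(reducer)
        model = Pipeline([("reduce", step), ("clf", clone(clf))])
        model.fit(X_train, y_train)
        results[(r_name, c_name)] = model.score(X_test, y_test)

for (r_name, c_name), acc in sorted(results.items()):
    print(f"{r_name:>5} + {c_name:<12} accuracy = {acc:.2f}")
```

Wrapping the reducer and the classifier in one pipeline means the projection is fitted on the training split only, which keeps the held-out accuracy honest. Accuracies on this toy data say nothing about the results reported in the thesis.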

Item Type: Thesis (Masters)
Subjects: Q Science > QA Mathematics > Electronic computers. Computer science
T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science
Q Science > QA Mathematics > Computer software
T Technology > T Technology (General) > Information Technology > Computer software
Divisions: School of Computing > Master of Science in Data Analytics
Depositing User: Dan English
Date Deposited: 25 Jan 2021 13:58
Last Modified: 25 Jan 2021 13:58
