NORMA eResearch @NCI Library

Classification of Human Age Group by Implementing Deep Learning Models on Audio Data

Pandey, Srijan (2020) Classification of Human Age Group by Implementing Deep Learning Models on Audio Data. Masters thesis, Dublin, National College of Ireland.

PDF (Master of Science), 3MB
PDF (Configuration manual), 913kB


Audio data classification is one of the most challenging fields in data science because of the complexity of pre-processing the data. This motivates researchers to apply various techniques to reduce the complexity of the data and improve performance. This research aims to predict the age group of an individual from their voice. Such a technique would benefit organizations that want to focus on their target age group and pitch the right product to the right section of society. The project reduces the complexity of the audio data by first separating noise and dead air from the audio, using signal enveloping, to generate clean data. Key audio features were then extracted directly as Mel Frequency Cepstral Coefficients (MFCCs) rather than by taking the Discrete Cosine Transform of the log filter-bank energies. MFCC was chosen because it retains a large amount of information from the audio: each frame carries time, frequency, and energy information. The coefficients were scaled and then converted into an array of audio features, and the labels were generated from the corresponding CSV file. The techniques used took a temporal approach, operating directly on the audio samples. The Speech Accent Archive dataset was used, and models were trained using a Fully Connected Convolutional Neural Network (CNN) and a Time Distributed Long Short-Term Memory Recurrent Neural Network (LSTM-RNN). This research also compares the performance of the two models on the same dataset through the accuracy each one obtained. The CNN achieved an accuracy of 62.45% on the test set, whereas the LSTM-RNN model outperformed it with an accuracy of 66.07% on the same dataset.
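The signal-enveloping step can be illustrated with a minimal NumPy sketch. It assumes the envelope is a rolling mean of the absolute amplitude and that samples whose envelope falls below a fixed threshold count as noise or dead air. The function names, the 0.1 s window and the 0.005 threshold are illustrative assumptions, not values reported in the thesis.

```python
import numpy as np

def envelope_mask(signal, rate, threshold=0.005, window_seconds=0.1):
    """Return a boolean mask that is True where the smoothed amplitude
    envelope exceeds `threshold`.

    Assumptions (hypothetical, not from the thesis): the envelope is a
    rolling mean of |signal| over `window_seconds`, and the signal is a
    float array scaled to [-1, 1].
    """
    signal = np.asarray(signal, dtype=float)
    window = max(1, int(rate * window_seconds))
    kernel = np.ones(window) / window
    # mode="same" yields one envelope value per input sample
    envelope = np.convolve(np.abs(signal), kernel, mode="same")
    return envelope > threshold

def remove_dead_air(signal, rate, threshold=0.005, window_seconds=0.1):
    """Keep only the samples whose envelope is above the threshold."""
    signal = np.asarray(signal, dtype=float)
    return signal[envelope_mask(signal, rate, threshold, window_seconds)]
```

For example, a clip made of one second of silence, one second of tone and another second of silence shrinks to roughly the one-second tone, plus a few dozen samples at each edge where the rolling window overlaps the tone. A rolling mean is used rather than the raw amplitude so that zero crossings inside speech are not mistaken for silence.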
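The scaling and array-conversion step might look like the following sketch. It assumes per-coefficient standardization (zero mean, unit variance across all frames of all clips) and zero-padding or truncation to a fixed number of frames. `build_feature_array` and `max_frames` are hypothetical names; the thesis does not state which scaler or sequence length it used.

```python
import numpy as np

def build_feature_array(mfcc_list, max_frames):
    """Scale MFCC matrices and stack them into one 3-D array.

    Each element of `mfcc_list` is assumed to be an (n_mfcc, n_frames)
    matrix. The result has shape (n_samples, max_frames, n_mfcc), i.e.
    time steps first, the layout recurrent layers usually expect.
    """
    # Per-coefficient mean and standard deviation over every frame of every clip
    all_frames = np.concatenate([m.T for m in mfcc_list], axis=0)
    mean = all_frames.mean(axis=0)
    std = all_frames.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant coefficients

    n_mfcc = all_frames.shape[1]
    features = np.zeros((len(mfcc_list), max_frames, n_mfcc))
    for i, m in enumerate(mfcc_list):
        scaled = (m.T - mean) / std           # (n_frames, n_mfcc)
        n = min(max_frames, scaled.shape[0])  # truncate long clips
        features[i, :n] = scaled[:n]          # short clips stay zero-padded
    return features
```

Because the padding value is zero and the features are standardized, padded frames sit at the mean of each coefficient, which keeps them from dominating the scaled range.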
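Label generation from the metadata CSV can be sketched with the standard library alone. This assumes the CSV has `age` and `filename` columns and that ages are grouped into bins; the bin boundaries and label names in `AGE_BINS` are hypothetical, since the abstract does not state the age groups used.

```python
import csv
import io

# Hypothetical age-group boundaries (inclusive); not taken from the thesis.
AGE_BINS = [(0, 19, "under_20"), (20, 29, "20s"), (30, 39, "30s"), (40, 200, "40_plus")]

# Integer class index for each label, as a classifier's output layer needs
CLASS_INDEX = {label: i for i, (_, _, label) in enumerate(AGE_BINS)}

def age_to_group(age):
    """Return the label of the bin that contains `age`."""
    for low, high, label in AGE_BINS:
        if low <= age <= high:
            return label
    raise ValueError(f"age out of range: {age}")

def labels_from_csv(csv_text):
    """Map each recording's file name to an age-group label.

    Assumes (hypothetically) columns named `age` and `filename`;
    rows with an empty age are skipped.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    labels = {}
    for row in reader:
        age = row["age"].strip()
        if not age:
            continue  # no recorded age, so no label
        labels[row["filename"]] = age_to_group(int(float(age)))
    return labels
```

Parsing the age through `float` first tolerates values stored as `24.0`, which spreadsheet exports often produce.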
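To show what "Time Distributed" means in the LSTM-RNN model, the sketch below runs a forward pass in plain NumPy: the same dense layer is applied to every MFCC frame, an LSTM consumes the resulting sequence, and a softmax produces age-group probabilities. The input is assumed to be a (time_steps, n_mfcc) matrix of scaled MFCC frames. The layer sizes, the ReLU activation, the gate ordering and the random weights are assumptions for illustration only; training is omitted, and this is not the thesis's actual architecture.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

def time_distributed_dense(x, w, b):
    """Apply the same dense layer (with ReLU) to every time step.
    x: (time_steps, n_in) -> (time_steps, n_out)."""
    return np.maximum(0.0, x @ w + b)

def lstm_last_state(x, w, u, b):
    """Run one LSTM layer over x (time_steps, n_in) and return the final
    hidden state. Gates are stacked as input, forget, cell, output."""
    hidden = u.shape[0]
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for x_t in x:
        z = x_t @ w + h @ u + b  # (4 * hidden,)
        i = sigmoid(z[:hidden])
        f = sigmoid(z[hidden:2 * hidden])
        g = np.tanh(z[2 * hidden:3 * hidden])
        o = sigmoid(z[3 * hidden:])
        c = f * c + i * g
        h = o * np.tanh(c)
    return h

def init_params(n_mfcc=13, td_units=32, hidden=16, n_classes=4, seed=0):
    """Random (untrained) weights; all sizes are hypothetical."""
    rng = np.random.default_rng(seed)
    scale = 0.1
    return {
        "td_w": rng.normal(0, scale, (n_mfcc, td_units)),
        "td_b": np.zeros(td_units),
        "lstm_w": rng.normal(0, scale, (td_units, 4 * hidden)),
        "lstm_u": rng.normal(0, scale, (hidden, 4 * hidden)),
        "lstm_b": np.zeros(4 * hidden),
        "out_w": rng.normal(0, scale, (hidden, n_classes)),
        "out_b": np.zeros(n_classes),
    }

def predict_age_group(mfcc_frames, params):
    """Forward pass: TimeDistributed dense -> LSTM -> softmax over age groups."""
    x = time_distributed_dense(mfcc_frames, params["td_w"], params["td_b"])
    h = lstm_last_state(x, params["lstm_w"], params["lstm_u"], params["lstm_b"])
    return softmax(h @ params["out_w"] + params["out_b"])
```

The time-distributed layer shares one weight matrix across all frames, so it transforms each frame independently; only the LSTM carries information from one frame to the next, which is what gives the model its temporal view of the audio.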

Item Type: Thesis (Masters)
Subjects: Q Science > QA Mathematics > Electronic computers. Computer science
T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science
Q Science > QA Mathematics > Computer software
T Technology > T Technology (General) > Information Technology > Computer software
Divisions: School of Computing > Master of Science in Data Analytics
Depositing User: Dan English
Date Deposited: 25 Jan 2021 13:09
Last Modified: 25 Jan 2021 13:09
