NORMA eResearch @NCI Library

Classification of Speaker’s Age, Gender and Nationality using Transfer Learning

Koli, Rohan Narayan (2021) Classification of Speaker’s Age, Gender and Nationality using Transfer Learning. Masters thesis, Dublin, National College of Ireland.

PDF (Master of Science), 2MB
PDF (Configuration manual), 1MB


Audio classification remains one of the most complex and challenging problems in the 21st century. While much research has addressed audio classification in sub-categories such as audio scene classification and bioacoustics, very little has addressed human voice classification. In this research I explored the use of pretrained deep convolutional neural network models for classifying log-Mel spectrograms. Five pretrained models (Xception, VGG16, VGG19, ResNet50, Inception V3), along with model stacking, are compared on two datasets, namely Mozilla Common Voice and the Speech Accent dataset. The research achieved 95% accuracy for gender classification, while age group and nationality classification achieved satisfactory results with accuracies of 52% and 48% respectively, which can be used as a basis for developing enhanced models.
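For readers unfamiliar with the log-Mel spectrogram input mentioned in the abstract, the following is a minimal sketch of how one frame of log-Mel features can be computed. It is not the thesis's code: the function names, the HTK-style mel formula, the 8 kHz sample rate, the 8 mel bands and the naive DFT are all assumptions chosen for illustration. In practice an audio library would normally do this step over many overlapping frames.

```python
import cmath
import math


def hz_to_mel(hz):
    # HTK-style mel formula (an assumption; other variants exist).
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    # Naive O(n^2) DFT; returns power for bins 0..n/2. Illustration only.
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def mel_filterbank(n_mels, n_fft, sample_rate):
    # Triangular filters whose edges are evenly spaced on the mel scale
    # between 0 Hz and the Nyquist frequency.
    max_mel = hz_to_mel(sample_rate / 2.0)
    edges_hz = [mel_to_hz(max_mel * i / (n_mels + 1))
                for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for m in range(1, n_mels + 1):
        lo, mid, hi = edges_hz[m - 1], edges_hz[m], edges_hz[m + 1]
        row = []
        for f in bin_hz:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def log_mel_frame(frame, n_mels=8, sample_rate=8000, eps=1e-10):
    # Power spectrum -> mel band energies -> decibels.
    spectrum = power_spectrum(frame)
    bank = mel_filterbank(n_mels, len(frame), sample_rate)
    return [10.0 * math.log10(sum(w * p for w, p in zip(row, spectrum)) + eps)
            for row in bank]


# Usage: a 64-sample frame of a 1 kHz sine sampled at 8 kHz.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(64)]
features = log_mel_frame(frame, n_mels=8, sample_rate=sample_rate)
loudest_band = features.index(max(features))
```

Stacking such per-frame vectors over time gives a two-dimensional array that can be treated as an image, which is what makes image-pretrained convolutional networks applicable to audio.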

Item Type: Thesis (Masters)
Uncontrolled Keywords: Audio Classification; Pretrained Networks; Stacked Ensemble Model; Transfer Learning
Subjects: Q Science > QA Mathematics > Electronic computers. Computer science
T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science
Q Science > QA Mathematics > Computer software
T Technology > T Technology (General) > Information Technology > Computer software
Divisions: School of Computing > Master of Science in Data Analytics
Depositing User: Clara Chan
Date Deposited: 06 Dec 2021 12:46
Last Modified: 06 Dec 2021 12:46
