Koli, Rohan Narayan (2021) Classification of Speaker’s Age, Gender and Nationality using Transfer Learning. Masters thesis, Dublin, National College of Ireland.
PDF (Master of Science)
PDF (Configuration manual)
Abstract
Audio classification remains one of the most complex and challenging problems of the 21st century. While much research has addressed audio classification in the sub-categories of audio scene classification and bioacoustics, very few studies have addressed human voice classification. In this research, I explored the use of pretrained deep convolutional neural network models for classification on log-Mel spectrograms. Five pretrained models (Xception, VGG16, VGG19, ResNet50, Inception V3), along with model stacking, are compared on two datasets: Mozilla Common Voice and the Speech Accent dataset. The research achieved 95% accuracy for gender classification, while age group and nationality classification achieved satisfactory results, with accuracies of 52% and 48% respectively, which can be further utilized to develop enhanced models.
Item Type: | Thesis (Masters) |
---|---|
Uncontrolled Keywords: | Audio Classification; Pretrained Networks; Stacked Ensemble Model; Transfer Learning |
Subjects: | Q Science > QA Mathematics > Electronic computers. Computer science; T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science; Q Science > QA Mathematics > Computer software; T Technology > T Technology (General) > Information Technology > Computer software |
Divisions: | School of Computing > Master of Science in Data Analytics |
Depositing User: | Clara Chan |
Date Deposited: | 06 Dec 2021 12:46 |
Last Modified: | 06 Dec 2021 12:46 |
URI: | https://norma.ncirl.ie/id/eprint/5177 |