Chimthankar, Priyanka Prashant (2021) Speech Emotion Recognition using Deep Learning. Masters thesis, Dublin, National College of Ireland.
PDF (Master of Science), 2MB
PDF (Configuration manual), 1MB
Abstract
Speech Emotion Recognition (SER) has a broad range of applications, and there has been a significant amount of research in this area in recent years. However, its application to the entertainment sector has received comparatively little study. This thesis uses Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) architectures to classify the emotions in audio recordings of actors expressing various emotions. It presents a method that combines a 2D CNN+LSTM model with Mel-frequency cepstral coefficient (MFCC) features extracted from the audio data. Multiple experiments are conducted to assess the reliability of such deep learning systems. The model is trained on four widely used SER datasets (SAVEE, RAVDESS, TESS, and CREMA-D) and achieves a validation accuracy of 67.58%. The model was also evaluated on an unseen dataset of German-language audio samples, where it achieved a testing accuracy of 71.28%.
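MFCC extraction, the feature step named in the abstract, can be sketched in plain NumPy. The result is a 2D matrix (frames × coefficients). A 2D CNN can convolve over this matrix as an image-like input, and an LSTM can treat its frame axis as a sequence. This is a minimal sketch under stated assumptions, not the author's pipeline. The settings (13 coefficients, 40 mel bands, 25 ms Hamming-windowed frames, 10 ms hop, HTK-style mel scale) are common defaults assumed for illustration, and all function names are hypothetical. In practice, a library routine such as `librosa.feature.mfcc` would normally be used instead.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale (assumed; other mel formulas exist)
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale from 0 Hz to Nyquist
    n_bins = n_fft // 2 + 1
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    bin_freqs = np.linspace(0.0, sr / 2.0, n_bins)
    fb = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (bin_freqs - left) / (center - left)
        falling = (right - bin_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def dct_ii_matrix(n_out, n_in):
    # Orthonormal DCT-II basis; row k yields cepstral coefficient k
    n = np.arange(n_in)
    k = np.arange(n_out)[:, None]
    basis = np.cos(np.pi / n_in * (n + 0.5) * k)
    basis[0] *= 1.0 / np.sqrt(2.0)
    return basis * np.sqrt(2.0 / n_in)


def mfcc(signal, sr, n_mfcc=13, n_mels=40, frame_len=0.025, hop_len=0.010):
    """Hypothetical helper: returns an (n_frames, n_mfcc) MFCC matrix."""
    signal = np.asarray(signal, dtype=float)
    frame = int(round(frame_len * sr))
    hop = int(round(hop_len * sr))
    n_fft = 1
    while n_fft < frame:  # next power of two >= frame length
        n_fft *= 2
    if len(signal) < frame:  # ensure at least one full frame
        signal = np.pad(signal, (0, frame - len(signal)))
    n_frames = 1 + (len(signal) - frame) // hop
    # Slice overlapping frames and apply a Hamming window
    idx = np.arange(frame)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame)
    # Power spectrum -> mel band energies -> log -> DCT
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    mel_energy = power @ mel_filterbank(n_mels, n_fft, sr).T
    log_mel = np.log(mel_energy + 1e-10)
    return log_mel @ dct_ii_matrix(n_mfcc, n_mels).T


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    tone = np.sin(2 * np.pi * 440.0 * t)  # 1 s synthetic 440 Hz tone
    print(mfcc(tone, sr).shape)  # (98, 13)
```

The number of frames grows with clip length. For this reason, clips of different durations are typically padded or trimmed to a fixed number of frames before they are batched for a CNN+LSTM model. The abstract does not state how the thesis handled this.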
Item Type: | Thesis (Masters) |
---|---|
Subjects: | Q Science > QA Mathematics > Electronic computers. Computer science; T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science; Q Science > QA Mathematics > Computer software; T Technology > T Technology (General) > Information Technology > Computer software |
Divisions: | School of Computing > Master of Science in Data Analytics |
Depositing User: | Clara Chan |
Date Deposited: | 15 Nov 2021 16:59 |
Last Modified: | 15 Nov 2021 16:59 |
URI: | https://norma.ncirl.ie/id/eprint/5142 |