NORMA eResearch @NCI Library

Recommendation Framework on English Speaking Podcasts using Textual Information Analysis

Singh, Surabhi Shripal (2023) Recommendation Framework on English Speaking Podcasts using Textual Information Analysis. Masters thesis, Dublin, National College of Ireland.



Podcasts are an audio style of talk show that listeners can tune into on demand. They have gained considerable popularity in the digital era and are now available on prominent music streaming platforms. Recently, several studies have sought a suitable approach to increase listener engagement with a platform through podcasts. One notable method is a recommendation system that uses a re-ranking approach based on listeners’ ratings and reviews. In these systems, extracted features such as listeners’ ratings and reviews are passed to a decoder-like long short-term memory (LSTM) model and embedded in a multi-dimensional vector space using the DistMult algorithm. One major challenge of this approach is that the system can only recommend podcasts that are highly rated or have more reviews. This discourages producers from creating content in different genres, as it may not reach active listeners. To overcome this challenge, researchers suggest creating models based on semantic features such as podcast descriptions and textual documentation of audio content. Although a few models have been created to conduct textual analysis, the area is still in the early stages of development. This research proposes a novel framework that performs textual analysis on the descriptions of podcasts and their episodes using word embeddings and cosine similarity. The framework provides a list of the top 9 podcasts for each genre based on the podcast the user is currently listening to. The proposed model combines a customized Word2Vec model, applied to the corpora derived from podcast descriptions, with cosine similarity to identify the podcast titles most similar to the input. The model is trained on a secondary dataset of around 4300 podcasts in 19 genres, collected by querying Spotify’s Web API.
The author also used the PorterStemmer and WordNetLemmatizer techniques from natural language processing to normalize the descriptions for faster textual analysis. To establish the significance of the framework, the cosine similarity matrix of the proposed model is compared with those of three existing models presented by various researchers: Count Vectorizer, TF-IDF, and GloVe. Details about these models are given in the related work. In the comparative analysis, Count Vectorizer is treated as the baseline and performs significantly lower than all other models. The proposed customized Word2Vec model shows the best performance, with similarity 9 percent higher than GloVe, noting that the proposed model was inspired by the GloVe framework. The model is developed to provide a framework design that prioritizes recommendations based on the type of podcast the user chooses to listen to over popular suggestions.
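
The ranking step described in the abstract (vectorize each podcast description, then order the other podcasts by cosine similarity to the one currently being listened to) can be illustrated with a minimal sketch. It is not the thesis implementation: it uses plain token counts, in the spirit of the Count Vectorizer baseline, in place of the customized Word2Vec embeddings, and it skips the stemming and lemmatization steps. The function names, the toy catalogue, and the `top_n` parameter are all hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a description and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (Counter objects)."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(current_title, descriptions, top_n=9):
    """Rank all other podcasts by similarity to the one being listened to."""
    # Assumption: count vectors stand in for the embedding step here.
    vectors = {title: Counter(tokenize(text)) for title, text in descriptions.items()}
    query = vectors[current_title]
    scored = [
        (title, cosine_similarity(query, vec))
        for title, vec in vectors.items()
        if title != current_title
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical toy catalogue; not data from the thesis.
catalogue = {
    "Crime Files": "true crime stories and unsolved murder cases",
    "Cold Cases": "detectives revisit unsolved crime cases",
    "Daily Markets": "stock market news and investing tips",
}

for title, score in recommend("Crime Files", catalogue, top_n=2):
    print(f"{title}: {score:.3f}")
```

Replacing the count vectors with dense description vectors, for example an average of per-word embeddings, would leave the ranking logic unchanged, since cosine similarity only needs a dot product and two norms.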

Item Type: Thesis (Masters)
Cosgrave, Noel
Subjects: P Language and Literature > PE English
Q Science > QA Mathematics > Electronic computers. Computer science
T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science
H Social Sciences > HM Sociology > Information Science > Communication
P Language and Literature > P Philology. Linguistics > Computational linguistics. Natural language processing
Divisions: School of Computing > Master of Science in Data Analytics
Depositing User: Tamara Malone
Date Deposited: 26 May 2023 15:06
Last Modified: 26 May 2023 15:06
