NORMA eResearch @NCI Library

Media Content Analysis of Covid-19 Virus Using Natural Language Processing Techniques

Rouxel, Anaelle (2020) Media Content Analysis of Covid-19 Virus Using Natural Language Processing Techniques. Masters thesis, Dublin, National College of Ireland.

PDF (Master of Science), 802kB
PDF (Configuration manual), 8MB

Abstract

The Covid-19 outbreak of December 2019 spread worldwide during the first half of 2020, affecting populations and economies. The pandemic created an emergency context and highlighted the lack of knowledge in the domain of crisis informatics. Studying the impact of the coronavirus on media usage and on the population's emotional reactions contributes to the current state of the art. The objectives of this research project are to assess the public's interests in and responses to Covid-19, and to assess how social media and news media were used to communicate about the emerging virus, using Natural Language Processing (NLP) techniques. The extracted topics relate to the development of the pandemic, with updates on infection cases and protection measures. Themes are more varied in news media (environment, entertainment, economy, vaccine), whereas Twitter data evoke behavioural instructions and more negative latent topics (search for the origin of the virus, testing). The Latent Dirichlet Allocation (LDA) models achieved a coherence score of 0.381 on the tweets and 0.475 on the news corpus. The sentiment analysis showed the importance of the neutral class: 100% of news articles and 90.2% of tweets fall into this category. Among tweets, 7.2% are positive and 2.6% are negative. A paired t-test comparing mean tweet scores before and after text pre-processing confirmed that the operation affects polarity results: more tweets fell into the neutral class after pre-processing. Lexicon-based emotion detection showed a dominance of fear in tweets, against trust in news, whereas sadness and anticipation are similarly present in both corpora. The project also featured a literature review and identified research gaps concerning media content analysis using NLP techniques.
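
The paired t-test mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis code: the function name `paired_t_statistic` and the score values are hypothetical, invented only to show how the statistic is computed when each tweet is scored once before and once after pre-processing.

```python
import math
import statistics


def paired_t_statistic(before, after):
    """Return (t, degrees_of_freedom) for a paired t-test.

    `before` and `after` hold one score per item, in the same order,
    so that position i in both lists refers to the same tweet.
    """
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    n = len(before)
    if n < 2:
        raise ValueError("need at least two pairs")
    # The test works on the per-item differences, not on the two samples.
    diffs = [a - b for b, a in zip(before, after)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Toy polarity scores, invented for illustration only (not thesis data).
raw_scores = [0.40, -0.30, 0.10, 0.00, 0.25]
clean_scores = [0.20, -0.10, 0.00, 0.00, 0.10]
t, df = paired_t_statistic(raw_scores, clean_scores)
print(f"t = {t:.3f}, df = {df}")
```

The resulting t value would then be compared against a Student's t distribution with `df` degrees of freedom to obtain a p-value, for example with a statistics library.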

Item Type: Thesis (Masters)
Subjects: Q Science > QA Mathematics > Electronic computers. Computer science
T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science
Q Science > QA Mathematics > Computer software
T Technology > T Technology (General) > Information Technology > Computer software
R Medicine > RA Public aspects of medicine
Divisions: School of Computing > Master of Science in Data Analytics
Depositing User: Dan English
Date Deposited: 18 Jan 2021 15:00
Last Modified: 18 Jan 2021 15:00
URI: https://norma.ncirl.ie/id/eprint/4372
