NORMA eResearch @NCI Library

Enhancing Climate Change Stance Detection Through Advanced Synthetic Data Augmentation

Prakash, Likhitha Konasale (2025) Enhancing Climate Change Stance Detection Through Advanced Synthetic Data Augmentation. Masters thesis, Dublin, National College of Ireland.

PDF (Master of Science), 774kB
PDF (Configuration Manual), 4MB

Abstract

Climate change stance detection aims to automatically classify social media messages into viewpoint groups on climate change, commonly separating those who accept climate science, those who reject it, and those who are neutral. This study proposes a synthetic data augmentation system to improve the accuracy of social media stance detection, especially for under-represented minority opinions. The main goal is to address the severe class imbalance in climate debate data, where minority opinions often make up less than 12% of the training data and models consequently fail to detect them. For example, in the Twitter Climate Change Sentiment Dataset only 11.51% of samples express an anti-climate-change stance. This work shows that synthetic data generation can be used to balance such training datasets.

The thesis proposes a general augmentation framework built on OpenAI's GPT-4.1 Mini. It makes three main contributions: stance-adapted generation strategies based on linguistic analysis of climate discourse, a parallel processing architecture that generates 60 or more samples per minute, and a five-layer validation system that checks the quality of the synthetic data. Ten specific strategies were developed through linguistic analysis to produce realistic samples for the under-represented anti and neutral stances. Evaluation on seven models showed substantial improvements. The best model, RoBERTa, reached 88.92% accuracy and improved minority-class identification by 47%. The system produced 20,000 high-quality synthetic instances from 41,000 generation attempts, raising the dataset's anti-stance representation from 11.51% to 25.59%.
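The figures in this paragraph can be related with a little arithmetic. The sketch below is illustrative only and is not taken from the thesis: the function names are hypothetical, the dataset size of 1,000 is a placeholder (the abstract does not state the real size), and the share calculation assumes that samples are added to a single class only, whereas the thesis also augments the neutral stance.

```python
def acceptance_rate(accepted: int, attempted: int) -> float:
    """Fraction of generation attempts that passed validation."""
    return accepted / attempted


def share_after(n_total: float, current_share: float,
                added_to_class: float, added_elsewhere: float = 0.0) -> float:
    """Class share after adding synthetic samples to the dataset."""
    class_count = n_total * current_share + added_to_class
    return class_count / (n_total + added_to_class + added_elsewhere)


def samples_needed(n_total: float, current_share: float,
                   target_share: float) -> float:
    """Samples to add to ONE class so it reaches target_share.

    Solves (p*N + k) / (N + k) = q for k, assuming nothing is added
    to any other class (a simplifying assumption, not the thesis setup).
    """
    if not 0 <= current_share < target_share < 1:
        raise ValueError("need 0 <= current_share < target_share < 1")
    return n_total * (target_share - current_share) / (1 - target_share)


# Figures quoted in the abstract: 20,000 kept out of 41,000 attempts.
rate = acceptance_rate(20_000, 41_000)
print(f"acceptance rate: {rate:.1%}")  # about 48.8%

# Hypothetical dataset of 1,000 samples (placeholder size, NOT from the thesis).
N = 1_000
k = samples_needed(N, 0.1151, 0.2559)
print(f"anti-stance samples to add per {N} originals: {k:.1f}")
print(f"resulting share: {share_after(N, 0.1151, k):.4f}")
```

Under these assumptions, roughly half of all generation attempts survive validation, and moving a class from 11.51% to 25.59% requires adding synthetic samples equal to about 19% of the original dataset size. Any samples added to other classes enlarge the denominator, so the real number of anti-stance samples would need to be higher.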

These experiments provide a pragmatic solution to class imbalance in stance detection and a theoretical advance in synthetic data generation for ideologically sensitive tasks. The approach can be generalized to other types of polarized discourse where identifying minority perspectives remains essential to understanding public opinion dynamics.

Item Type: Thesis (Masters)
Supervisors: Maniganze, Taylou (email: UNSPECIFIED)
Uncontrolled Keywords: Climate Change Stance Detection; Synthetic Data Augmentation; Large Language Models; GPT-4.1 Mini; Class Imbalance; Social Media Analysis; Natural Language Processing; Transformer Models; Multi-layer Validation; Twitter Discourse
Subjects: Q Science > QH Natural history > QH301 Biology > Methods of research. Technique. Experimental biology > Data processing. Bioinformatics > Artificial intelligence
Q Science > Q Science (General) > Self-organizing systems. Conscious automata > Artificial intelligence
P Language and Literature > P Philology. Linguistics > Computational linguistics. Natural language processing
G Geography. Anthropology. Recreation > GE Environmental Sciences > Environment
Z Bibliography. Library Science. Information Resources > ZA Information resources > ZA4150 Computer Network Resources > The Internet > World Wide Web > Websites > Online social networks
T Technology > TK Electrical engineering. Electronics. Nuclear engineering > Telecommunications > The Internet > World Wide Web > Websites > Online social networks
Divisions: School of Computing > Master of Science in Artificial Intelligence
Depositing User: Ciara O'Brien
Date Deposited: 02 Jun 2026 11:52
Last Modified: 02 Jun 2026 11:52
URI: https://norma.ncirl.ie/id/eprint/9338
