NORMA eResearch @NCI Library

Automated Meta-Optimization of Text Pre-Processing Pipelines using DARTS: A Domain-Adaptive Approach for NLP Tasks

Kasturi, Yasaswini (2025) Automated Meta-Optimization of Text Pre-Processing Pipelines using DARTS: A Domain-Adaptive Approach for NLP Tasks. Masters thesis, Dublin, National College of Ireland.

PDF (Master of Science), 791kB
PDF (Configuration Manual), 4MB

Abstract

Text pre-processing significantly impacts Natural Language Processing performance, yet practitioners rely on manual trial-and-error to optimize pre-processing pipelines. This research presents the first adaptation of Differentiable Architecture Search (DARTS) to automate text pre-processing optimization across diverse NLP domains. The framework transforms traditionally discrete pre-processing operations—sentiment amplification, negation handling, context enhancement, keyword extraction, entity recognition, and syntactic analysis—into differentiable components through continuous relaxation, enabling gradient-based optimization via bilevel learning and temperature annealing. Comprehensive experiments across three domains (IMDB movie reviews, fake news detection, and financial sentiment analysis) demonstrate the framework's effectiveness, achieving statistically significant F1-score improvements of 0.15% to 1.69% (p < 0.05) compared to strong transformer baselines. Notably, the framework automatically discovers domain-appropriate strategies: sentiment operations dominate for movie reviews (21.09%), entity recognition proves crucial for fake news (24.51%), while keyword extraction leads in financial text (27.10%). Despite computational overhead of approximately 10x training time, this one-time investment eliminates iterative manual pre-processing design. The research validates that pre-processing optimization can be successfully automated through architecture search, providing both theoretical insights into domain-adaptive pre-processing and practical tools for improving NLP pipelines. This work opens new research directions at the intersection of AutoML and NLP, demonstrating that neural architecture search principles extend effectively beyond model design to pre-processing optimization.
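
The continuous relaxation and temperature annealing mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the three operations, the `alphas` scores, and the exponential schedule in `anneal` are hypothetical stand-ins, and the bilevel optimization step that would learn the scores is omitted. The sketch only shows how a discrete choice among operations becomes a softmax-weighted mixture, and how lowering the temperature pushes that mixture toward a single operation.

```python
import math


def softmax_with_temperature(alphas, temperature):
    """Turn raw architecture scores into mixing weights; lower temperature sharpens them."""
    scaled = [a / temperature for a in alphas]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_operation(features, operations, alphas, temperature):
    """Continuous relaxation: a weighted sum of every candidate operation's output."""
    weights = softmax_with_temperature(alphas, temperature)
    outputs = [op(features) for op in operations]
    return [
        sum(w * out[i] for w, out in zip(weights, outputs))
        for i in range(len(features))
    ]


def anneal(initial_temperature, decay, step, floor=0.1):
    """Exponential temperature schedule (an assumed choice), clipped at a floor."""
    return max(floor, initial_temperature * decay ** step)


# Hypothetical stand-ins for pre-processing operations acting on a feature vector.
operations = [
    lambda x: [2.0 * v for v in x],  # stand-in for an "amplify" operation
    lambda x: [-v for v in x],       # stand-in for a "negate" operation
    lambda x: list(x),               # identity (no pre-processing)
]
alphas = [1.0, 0.2, 0.5]  # illustrative scores, not learned values

for step in (0, 20, 40):
    t = anneal(1.0, 0.95, step)
    weights = softmax_with_temperature(alphas, t)
    print(step, round(t, 3), [round(w, 3) for w in weights])
```

In a full DARTS-style setup the scores would be updated by gradient descent on a validation loss while the model weights are updated on the training loss. Here they are fixed so that the effect of the temperature alone is visible.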

Item Type: Thesis (Masters)
Supervisors: Zahoor, Sheresh (email: UNSPECIFIED)
Subjects: Q Science > QH Natural history > QH301 Biology > Methods of research. Technique. Experimental biology > Data processing. Bioinformatics > Artificial intelligence
Q Science > Q Science (General) > Self-organizing systems. Conscious automata > Artificial intelligence
P Language and Literature > P Philology. Linguistics > Computational linguistics. Natural language processing
Divisions: School of Computing > Master of Science in Artificial Intelligence
Depositing User: Ciara O'Brien
Date Deposited: 28 May 2026 14:17
Last Modified: 28 May 2026 14:19
URI: https://norma.ncirl.ie/id/eprint/9324
