Kasturi, Yasaswini (2025) Automated Meta-Optimization of Text Pre-Processing Pipelines using DARTS: A Domain-Adaptive Approach for NLP Tasks. Masters thesis, Dublin, National College of Ireland.
PDF (Master of Science), 791kB
PDF (Configuration Manual), 4MB
Abstract
Text pre-processing significantly impacts Natural Language Processing performance, yet practitioners rely on manual trial-and-error to optimize pre-processing pipelines. This research presents the first adaptation of Differentiable Architecture Search (DARTS) to automate text pre-processing optimization across diverse NLP domains. The framework transforms traditionally discrete pre-processing operations—sentiment amplification, negation handling, context enhancement, keyword extraction, entity recognition, and syntactic analysis—into differentiable components through continuous relaxation, enabling gradient-based optimization via bilevel learning and temperature annealing. Comprehensive experiments across three domains (IMDB movie reviews, fake news detection, and financial sentiment analysis) demonstrate the framework's effectiveness, achieving statistically significant F1-score improvements of 0.15% to 1.69% (p < 0.05) compared to strong transformer baselines. Notably, the framework automatically discovers domain-appropriate strategies: sentiment operations dominate for movie reviews (21.09%), entity recognition proves crucial for fake news (24.51%), while keyword extraction leads in financial text (27.10%). Despite computational overhead of approximately 10x training time, this one-time investment eliminates iterative manual pre-processing design. The research validates that pre-processing optimization can be successfully automated through architecture search, providing both theoretical insights into domain-adaptive pre-processing and practical tools for improving NLP pipelines. This work opens new research directions at the intersection of AutoML and NLP, demonstrating that neural architecture search principles extend effectively beyond model design to pre-processing optimization.
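The continuous relaxation and temperature annealing mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the toy operations, and the exponential annealing schedule with its default values are all assumptions made for illustration, and the bilevel optimization of the weights is omitted. The sketch shows only how a discrete choice among candidate operations becomes a softmax-weighted mixture that sharpens toward a single operation as the temperature falls.

```python
import math

def softmax_with_temperature(alphas, temperature):
    """Convert raw operation scores into mixing weights.

    A lower temperature gives a sharper (closer to one-hot) distribution.
    """
    scaled = [a / temperature for a in alphas]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def mixed_operation(features, operations, alphas, temperature):
    """Continuous relaxation: weighted sum of every candidate operation's output."""
    weights = softmax_with_temperature(alphas, temperature)
    outputs = [op(features) for op in operations]
    return [
        sum(w * out[i] for w, out in zip(weights, outputs))
        for i in range(len(features))
    ]

def annealed_temperature(step, start=5.0, end=0.1, decay=0.9):
    """Hypothetical schedule: exponential decay from `start`, floored at `end`."""
    return max(end, start * decay ** step)

# Toy stand-ins for pre-processing operations acting on a feature vector.
# These are illustrative assumptions, not the operations used in the thesis.
operations = [
    lambda x: x,                     # identity (no pre-processing)
    lambda x: [2.0 * v for v in x],  # amplification
    lambda x: [-v for v in x],       # sign flip
]
alphas = [0.2, 1.0, -0.5]  # learnable scores; fixed here for illustration

for step in (0, 10, 40):
    t = annealed_temperature(step)
    weights = softmax_with_temperature(alphas, t)
    print(step, round(t, 3), [round(w, 3) for w in weights])
```

In a full DARTS-style setup the scores in `alphas` would be updated by gradient descent on validation loss while model weights are updated on training loss; here they are held constant so that only the effect of the temperature is visible.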
| Item Type: | Thesis (Masters) |
|---|---|
| Supervisors: | Zahoor, Sheresh (email: UNSPECIFIED) |
| Subjects: | Q Science > QH Natural history > QH301 Biology > Methods of research. Technique. Experimental biology > Data processing. Bioinformatics > Artificial intelligence; Q Science > Q Science (General) > Self-organizing systems. Conscious automata > Artificial intelligence; P Language and Literature > P Philology. Linguistics > Computational linguistics. Natural language processing |
| Divisions: | School of Computing > Master of Science in Artificial Intelligence |
| Depositing User: | Ciara O'Brien |
| Date Deposited: | 28 May 2026 14:17 |
| Last Modified: | 28 May 2026 14:19 |
| URI: | https://norma.ncirl.ie/id/eprint/9324 |