Defensive AI for Customer Service Chatbots: Detecting and Mitigating Adversarial Prompt Injections

Boyce, Alan

Boyce, Alan (2025) Defensive AI for Customer Service Chatbots: Detecting and Mitigating Adversarial Prompt Injections. Masters thesis, Dublin, National College of Ireland.

PDF (Master of Science), 992kB
PDF (Configuration Manual), 607kB

Abstract

The rise of Large Language Models (LLMs) in customer service chatbots has created new opportunities for faster, more scalable support while introducing new vulnerabilities, particularly in the form of prompt injection attacks. These attacks exploit the fact that LLMs are designed to understand and respond to virtually any input, which allows malicious users to bypass restrictions or extract confidential data from the system. Although many defensive strategies have been proposed, most remain static and fail to adapt to evolving threats or multi-turn adversarial patterns.

This research proposes a proof-of-concept hybrid adversarial detection system that combines a fine-tuned RoBERTa classifier with a session-aware Graph Neural Network (GNN) capable of modelling conversational context. The system also includes a logging and feedback loop that supports retraining over time, so that it can adapt to emerging threats. The chatbot uses the Mistral-7B model, is accessed through a Streamlit app frontend, and is backed by an SQLite database for logging interactions.
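
The layering described above can be illustrated with a minimal sketch. This is not the thesis implementation: `classify_message` is a hypothetical keyword heuristic standing in for the fine-tuned RoBERTa classifier, `session_score` is a hypothetical average standing in for the session-aware GNN, and the `interactions` table schema, the function names, and the 0.5 threshold are all assumptions made for illustration. Only the overall shape (per-message score, session-level score, SQLite logging for later retraining) follows the abstract.

```python
import sqlite3

# Hypothetical stand-in for the fine-tuned RoBERTa classifier:
# a keyword heuristic that returns a score in [0, 1].
SUSPICIOUS_PHRASES = ("ignore previous instructions", "reveal your system prompt")


def classify_message(text: str) -> float:
    lowered = text.lower()
    return 1.0 if any(p in lowered for p in SUSPICIOUS_PHRASES) else 0.0


def session_score(scores: list[float]) -> float:
    # Hypothetical stand-in for the session-aware GNN:
    # the mean of all per-message scores seen in the session so far.
    return sum(scores) / len(scores) if scores else 0.0


def open_log(path: str = ":memory:") -> sqlite3.Connection:
    # Assumed schema; the thesis only states that interactions are logged to SQLite.
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS interactions ("
        "id INTEGER PRIMARY KEY, session_id TEXT, message TEXT, "
        "msg_score REAL, sess_score REAL, blocked INTEGER)"
    )
    return conn


def handle_message(conn: sqlite3.Connection, session_id: str, text: str,
                   threshold: float = 0.5) -> bool:
    """Score one message, log it, and return True if it should be blocked."""
    msg = classify_message(text)
    rows = conn.execute(
        "SELECT msg_score FROM interactions WHERE session_id = ? ORDER BY id",
        (session_id,),
    ).fetchall()
    sess = session_score([r[0] for r in rows] + [msg])
    # Block if either the single message or the session as a whole looks adversarial.
    blocked = msg >= threshold or sess >= threshold
    conn.execute(
        "INSERT INTO interactions (session_id, message, msg_score, sess_score, blocked) "
        "VALUES (?, ?, ?, ?, ?)",
        (session_id, text, msg, sess, int(blocked)),
    )
    conn.commit()
    return blocked
```

In this sketch a benign-looking message can still be blocked when earlier turns in the same session scored highly, which is the kind of multi-turn behaviour a purely per-message classifier would miss. The logged rows are what a feedback loop could later draw on for retraining.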

Evaluation was conducted using a structured test suite alongside simulated real-world Man-in-the-Middle (MITM) attacks. The baseline chatbot, with minimal defences, achieved a detection accuracy of 52.00%, failing to block a significant number of adversarial inputs. The classifier alone improved accuracy to 87.00%, while the complete hybrid system raised it to 95.00% in both structured testing and real-world scenarios. These results highlight the effectiveness of combining static and contextual reasoning in adversarial detection.
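
For readers unfamiliar with the metric, the following sketch shows how a detection accuracy figure of this kind is commonly computed. It assumes accuracy means the fraction of test inputs whose block/allow decision matches the ground-truth label; the abstract does not spell out the exact definition, and the function name and the toy labels below are hypothetical, not data from the thesis.

```python
def detection_accuracy(predicted: list[bool], actual: list[bool]) -> float:
    """Fraction of inputs where the predicted label (True = adversarial)
    matches the ground-truth label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one test case is required")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Toy labels for illustration only (not results from the thesis).
actual = [True, True, False, False]
predicted = [True, False, False, False]
print(f"{detection_accuracy(predicted, actual):.2%}")  # prints 75.00%
```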

As a proof-of-concept, this work demonstrates the viability and efficacy of adaptive, multi-layered defences in protecting LLM-powered chatbots and lays a foundation for future research into self-improving, context-aware systems.

Item Type:	Thesis (Masters)
Supervisors:	Mustafa, Raza Ul
Subjects:	Q Science > QA Mathematics > Electronic computers. Computer science; T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science; Q Science > QH Natural history > QH301 Biology > Methods of research. Technique. Experimental biology > Data processing. Bioinformatics > Artificial intelligence; Q Science > Q Science (General) > Self-organizing systems. Conscious automata > Artificial intelligence; P Language and Literature > P Philology. Linguistics > Computational linguistics. Natural language processing; Q Science > QA Mathematics > Computer software > Computer Security; T Technology > T Technology (General) > Information Technology > Computer software > Computer Security
Divisions:	School of Computing > Master of Science in Cyber Security
Depositing User:	Ciara O'Brien
Date Deposited:	15 Jun 2026 12:45
Last Modified:	15 Jun 2026 12:45
URI:	https://norma.ncirl.ie/id/eprint/9348
