NORMA eResearch @NCI Library

Evaluation of Large Language Models on MedQUAD Dataset

Shakeel, Muhammad Hassan (2024) Evaluation of Large Language Models on MedQUAD Dataset. Masters thesis, Dublin, National College of Ireland.

PDF (Master of Science) (1MB)
PDF (Configuration Manual) (980kB)

Abstract

In recent times, small-sized LLMs fine-tuned for domain-specific tasks have outperformed larger LLMs such as GPT-2. This paper fine-tunes small-sized LLMs, namely Gemma-2 (2 billion parameters), Phi-2 (2.7 billion parameters) and Llama-2 (7 billion parameters), for the question-answering task on the MedQUAD dataset. Among Gemma-2, Phi-2 and Llama-2, Llama-2 outperformed the others with ROUGE-1 = 0.455, ROUGE-2 = 0.289, ROUGE-L = 0.373 and BLEU = 0.275. In human evaluation along the dimensions of informativeness, relevance, grammaticality, naturalness and sentiment, the three models performed similarly; however, Llama-2 again led with an average score of 7.492. The paper also observed a correlation between model parameter size and performance: larger models gave better results than smaller ones.
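
The abstract reports ROUGE-1/2/L and BLEU scores for the fine-tuned models. Below is a minimal sketch of how such scores can be computed for a single reference-prediction pair; the rouge_score and nltk libraries and the example sentences are assumptions chosen for illustration, not the thesis's actual evaluation code.

```python
# Sketch only: the thesis does not state which libraries were used for scoring.
# rouge_score and nltk are assumed here; the sentences are hypothetical.
from rouge_score import rouge_scorer
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "Diabetes is managed with diet, exercise and medication."    # gold MedQUAD-style answer (hypothetical)
prediction = "Diabetes is usually managed through diet and medication."  # model output (hypothetical)

# ROUGE-1, ROUGE-2 and ROUGE-L F1, the metrics reported for the three models
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
rouge = scorer.score(reference, prediction)
print({name: round(score.fmeasure, 3) for name, score in rouge.items()})

# Sentence-level BLEU with smoothing to avoid zero scores on short answers
bleu = sentence_bleu(
    [reference.split()],
    prediction.split(),
    smoothing_function=SmoothingFunction().method1,
)
print(round(bleu, 3))
```

In practice such per-pair scores would be averaged over the whole MedQUAD test split to obtain corpus-level figures like those quoted above.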

Item Type: Thesis (Masters)
Supervisors: Trinh, Anh Duong (email unspecified)
Subjects: Q Science > QA Mathematics > Electronic computers. Computer science
T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science
P Language and Literature > P Philology. Linguistics > Computational linguistics. Natural language processing
Divisions: School of Computing > Master of Science in Artificial Intelligence
Depositing User: Ciara O'Brien
Date Deposited: 20 Jun 2025 10:37
Last Modified: 20 Jun 2025 10:37
URI: https://norma.ncirl.ie/id/eprint/7968
