Bhujabl, Rutuja Vishnu (2023) Building a question-answering system to extract information from PDF files using BERT transformers. Masters thesis, Dublin, National College of Ireland.
Abstract
Comprehending complex PDF documents such as research documents, clinical reports, and scientific manuals is a time-consuming task. Previous studies have demonstrated significant success in building question-answering systems that provide contextually relevant answers to user queries. However, answering difficult questions with a single end-to-end trained ML model remains challenging. Such systems require a large amount of labeled training data to train the base models for specific tasks, and creating such datasets is still a challenge for complicated documents like the annual reports of big tech companies. This research addresses this challenge by constructing a question-answering system tailored for PDF files, specifically targeting the domains of finance, biomedicine, and scientific literature. Curated datasets for PDFs from the chosen domains were created manually for the evaluation. Pre-trained Bidirectional Encoder Representations from Transformers (BERT) models from the Hugging Face library were applied to the chosen domains and evaluated using the F1 score. BERT Large achieved an F1 score of 44%.
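The abstract reports an F1 score but does not say how it was computed. As a rough illustration only, the sketch below shows the token-overlap F1 commonly used for extractive question answering (in the style of SQuAD evaluation). It is an assumption that the thesis uses a comparable definition; the function names `normalize` and `token_f1` are hypothetical and not taken from the thesis.

```python
from collections import Counter
import re
import string


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation and English articles, split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted answer and a reference answer.

    Assumed SQuAD-style definition, not confirmed as the thesis's metric.
    """
    pred_tokens = normalize(prediction)
    ref_tokens = normalize(reference)
    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears on both sides.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Under this definition, a predicted answer that contains the full reference plus extra words gets perfect recall but reduced precision, so long, unfocused answers are penalized. A corpus-level score would typically be the mean of `token_f1` over all question-answer pairs.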
Item Type: | Thesis (Masters) |
---|---|
Supervisors: | Menghwar, Teerath Kumar (email: UNSPECIFIED) |
Uncontrolled Keywords: | Question answering; Bidirectional Encoder Representations from Transformers |
Subjects: | Q Science > QA Mathematics > Electronic computers. Computer science; T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science; Q Science > QA Mathematics > Computer software; T Technology > T Technology (General) > Information Technology > Computer software; Q Science > Q Science (General) > Self-organizing systems. Conscious automata > Machine learning |
Divisions: | School of Computing > Master of Science in Data Analytics |
Depositing User: | Ciara O'Brien |
Date Deposited: | 07 May 2025 11:46 |
Last Modified: | 07 May 2025 11:46 |
URI: | https://norma.ncirl.ie/id/eprint/7502 |