Amariei, Gabriel (2024) Document Clustering of Irish Government Circulars using Machine Learning Techniques. Masters thesis, Dublin, National College of Ireland.
Full text: PDF (Master of Science), 2MB; PDF (Configuration Manual), 5MB
Abstract
Text clustering has emerged as a powerful tool for coping with the exponential growth in the volume of text documents generated by organizations worldwide. By organizing large document corpora into distinct groups based on content similarity, it improves the efficiency and effectiveness of information retrieval within vast collections.
In this study, we clustered the Irish Government circulars with the goal of improving the accessibility and retrieval of information from these documents. Because the documents had no prior categorization and the number of clusters was unknown, we used unsupervised learning methods to discover their inherent structure. Specifically, we combined three document representation techniques (TF-IDF, Word2Vec, and BERT) with three clustering algorithms: K-Means, Eigenspace-based Fuzzy C-Means (EFCM), and an approach based on a Long Short-Term Memory (LSTM) neural network.
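To make the "representation plus clustering" pipeline concrete, here is a minimal, dependency-free sketch of TF-IDF document vectors fed into K-Means. It is an illustration under stated assumptions, not the thesis's implementation: the four sample snippets are invented, tokenisation is a plain lowercase split, the IDF uses the smoothed form ln((1 + n) / (1 + df)) + 1 (scikit-learn's default), and centroids are seeded with a deterministic farthest-first rule rather than the randomised k-means++ seeding that libraries such as scikit-learn use by default.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocab, vectors): one L2-normalised TF-IDF vector per tokenised doc."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of docs containing each term.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed IDF (assumed form, as in scikit-learn): ln((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vocab, vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the point
    farthest from its nearest chosen centroid (a simplification of k-means++)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid (ties go to the lower index).
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical circular-style snippets (invented, not taken from the dataset).
docs = [
    "pension increase for retired civil servants",
    "civil servants pension scheme changes",
    "annual leave entitlement for staff",
    "staff annual leave and sick leave rules",
]
tokens = [d.lower().split() for d in docs]
vocab, vectors = tfidf_vectors(tokens)
labels = kmeans(vectors, k=2)
print(labels)  # [0, 0, 1, 1]: pension snippets vs. leave snippets
```

Dense Word2Vec or BERT document vectors could be substituted for the output of `tfidf_vectors` without changing the K-Means step, since Lloyd's algorithm only needs points and a distance.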
Our findings indicate that, among the document representation techniques tested, Word2Vec outperformed both TF-IDF and BERT at capturing the nuances of the Irish Government circulars. Among the clustering algorithms, K-Means proved the most effective and consistent for this task. The exploratory LSTM-based method showed promise, but further refinement and testing are needed to assess its capabilities fully in this application.
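EFCM, one of the algorithms compared with K-Means, is less widely known. The sketch below assumes the formulation in which the document-term matrix is projected onto its leading singular vectors (the "eigenspace") and standard fuzzy c-means is then run on the projected points, so each document receives a degree of membership in every cluster rather than a single hard label. The toy matrix, the two retained components, the fuzzifier m = 2, the random initialisation, and the helper names are all illustrative assumptions; the thesis's actual settings are not stated in this abstract.

```python
import numpy as np


def eigenspace_projection(X, n_components):
    """Project the rows of X onto its top right-singular vectors (truncated SVD)."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:n_components].T


def fuzzy_c_means(Z, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means; returns (memberships U of shape (n, c), centres)."""
    rng = np.random.default_rng(seed)
    U = rng.random((Z.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)  # each document's memberships sum to 1
    for _ in range(max_iter):
        W = U ** m
        # Centres: membership-weighted means of the points.
        centres = (W.T @ Z) / W.sum(axis=0)[:, None]
        # d[i, k] = distance from point i to centre k (floored to avoid /0).
        d = np.linalg.norm(Z[:, None, :] - centres[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)
        # u[i, k] = 1 / sum_j (d[i, k] / d[i, j]) ** (2 / (m - 1))
        ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=2)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, centres


# Hypothetical 4-document x 4-term count matrix with two obvious topics.
X = np.array([
    [2, 1, 0, 0],
    [3, 1, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 2, 3],
], dtype=float)
X /= np.linalg.norm(X, axis=1, keepdims=True)  # L2-normalise each document

Z = eigenspace_projection(X, n_components=2)
U, centres = fuzzy_c_means(Z, c=2)
labels = U.argmax(axis=1)  # hard labels derived from the fuzzy memberships
print(np.round(U, 3))
print(labels)
```

Taking the argmax of each row of `U` yields hard labels comparable to K-Means output, while the raw memberships show which documents sit between topics, which is the main practical difference from K-Means.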
Item Type: | Thesis (Masters) |
---|---|
Supervisors: | Onwuegbuche, Faithful |
Subjects: | Q Science > QA Mathematics > Electronic computers. Computer science; T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science; J Political Science > JN Political institutions (Europe) > Ireland > Government Departments; J Political Science > JN Political institutions (Europe) > Ireland; Q Science > Q Science (General) > Self-organizing systems. Conscious automata > Machine learning |
Divisions: | School of Computing > Master of Science in Artificial Intelligence |
Depositing User: | Ciara O'Brien |
Date Deposited: | 17 Jun 2025 18:18 |
Last Modified: | 17 Jun 2025 18:18 |
URI: | https://norma.ncirl.ie/id/eprint/7895 |