NORMA eResearch @NCI Library

Document Clustering of Irish Government Circulars using Machine Learning Techniques

Amariei, Gabriel (2024) Document Clustering of Irish Government Circulars using Machine Learning Techniques. Masters thesis, Dublin, National College of Ireland.


Abstract

Text clustering has emerged as a powerful tool for coping with the exponential growth in the volume of textual documents generated by organizations worldwide. It organizes large document corpora into distinct groups based on content similarity, thus improving the efficiency and effectiveness of information retrieval within vast collections.

In this study, we clustered Irish Government circulars with the goal of improving the accessibility and retrieval of information from these documents. Because the dataset had no prior categorization and the number of clusters was unknown, unsupervised learning methods were employed to discover the inherent structure of the documents. More specifically, we used three advanced document representation techniques (TF-IDF, Word2Vec, and BERT) together with three clustering algorithms: K-Means, Eigenspace-based Fuzzy C-Means (EFCM), and a version of the Long Short-Term Memory (LSTM) neural network.
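
As a rough illustration of one of the pairings named above (TF-IDF with K-Means), the following is a minimal sketch in plain Python. It is not the thesis's implementation: the whitespace tokeniser, the smoothed IDF formula, the deterministic farthest-point seeding, the function names, and the four toy documents (invented placeholders, not real circulars) are all assumptions made for the example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one L2-normalised TF-IDF vector (a list of floats) per document."""
    tokenised = [doc.lower().split() for doc in docs]  # naive whitespace tokeniser (assumption)
    vocab = sorted({tok for toks in tokenised for tok in toks})
    index = {tok: i for i, tok in enumerate(vocab)}
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(tok for toks in tokenised for tok in set(toks))
    vectors = []
    for toks in tokenised:
        vec = [0.0] * len(vocab)
        for tok, count in Counter(toks).items():
            idf = math.log((1 + n_docs) / (1 + df[tok])) + 1.0  # smoothed IDF (assumption)
            vec[index[tok]] = (count / len(toks)) * idf
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, max_iter=100):
    """Cluster vectors into k groups and return one integer label per vector."""
    # Deterministic farthest-point seeding, chosen here only for reproducibility.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(farthest))
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach each vector to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors
        ]
        if new_labels == labels:  # no assignment changed, so the loop has converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy corpus: two tax-related and two school-related documents.
toy_docs = [
    "income tax relief pension",
    "tax credit income revenue",
    "school enrolment teacher allocation",
    "teacher school curriculum primary",
]
vectors = tfidf_vectors(toy_docs)
labels = kmeans(vectors, k=2)
print(labels)
```

The `kmeans` function accepts any fixed-length vectors, so document embeddings from another representation could in principle replace the TF-IDF step; that substitution is not shown here. In this sketch the number of clusters `k` is supplied by hand, whereas the study describes it as unknown in advance.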

Our findings indicate that, among the document representation techniques tested, Word2Vec outperformed both TF-IDF and BERT in capturing the nuances of the Irish Government circulars. Among the clustering algorithms, K-Means proved the most effective and consistent for this task. The exploratory use of the LSTM-based method showed promise, but further refinement and testing would be needed to fully assess its capabilities in this specific application.

Item Type: Thesis (Masters)
Supervisors: Onwuegbuche, Faithful (email unspecified)
Subjects: Q Science > QA Mathematics > Electronic computers. Computer science
T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science
J Political Science > JN Political institutions (Europe) > Ireland > Government Departments
J Political Science > JN Political institutions (Europe) > Ireland
Q Science > Q Science (General) > Self-organizing systems. Conscious automata > Machine learning
Divisions: School of Computing > Master of Science in Artificial Intelligence
Depositing User: Ciara O'Brien
Date Deposited: 17 Jun 2025 18:18
Last Modified: 17 Jun 2025 18:18
URI: https://norma.ncirl.ie/id/eprint/7895
