Redmond, Stephen (2017) A novel text mining approach using a bipartite graph projected onto two dimensions. Masters thesis, National College of Ireland.
PDF - Accepted Version (1MB)
Abstract
The collection of text data is exploding. From blogs to news reports to helpdesk tickets, there seems to be a never-ending supply of writings. The owners of these data seek methods to group texts and look for clusters of topics. Because of the size of the data, solutions that scale across computer clusters are ideal. The traditional term vector approach can lead to the curse of dimensionality. Simple solutions are preferable to complex ones because it is often necessary to explain the model to business users or even regulators. This paper demonstrates a method of keyword mining using the graph-of-words technique, and of classification by projecting the bipartite graph of terms and documents onto two dimensions. This method can be scaled using a cluster computing technology such as Apache Spark, and the results are easily surfaced to users.
Item Type: | Thesis (Masters) |
---|---|
Subjects: | Z Bibliography. Library Science. Information Resources > ZA Information resources > ZA4150 Computer Network Resources |
Divisions: | School of Computing > Master of Science in Data Analytics |
Depositing User: | Timothy Lawless |
Date Deposited: | 17 Aug 2018 11:15 |
Last Modified: | 17 Aug 2018 11:15 |
URI: | https://norma.ncirl.ie/id/eprint/3072 |