NORMA eResearch @NCI Library

A novel text mining approach using a bipartite graph projected onto two dimensions.

Redmond, Stephen (2017) A novel text mining approach using a bipartite graph projected onto two dimensions. Masters thesis, National College of Ireland.

PDF - Accepted Version

Abstract

The collection of text data is exploding. From blogs to news reports to helpdesk tickets, there seems to be a never-ending supply of writings. The owners of these data seek methods to group texts and look for clusters of topics. Because of the size of the data, solutions that scale on clustered computers are ideal. The traditional term vector approach can lead to the curse of dimensionality. Simple solutions are preferable to complex ones because it is often necessary to explain the model to business users or even regulators. This paper demonstrates a method of keyword mining using the graph-of-words technique and classification by projecting the bipartite graph of terms and documents onto two dimensions. This method can be scaled using a cluster computing technology such as Apache Spark, and the results are easily surfaced to users.
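As a rough illustration of the graph-of-words idea mentioned in the abstract, the sketch below builds a term co-occurrence graph with a sliding window and ranks terms by how many distinct neighbours they have. It is not taken from the thesis: the function names `graph_of_words` and `top_keywords`, the `window` and `k` parameters, and the use of node degree as the ranking score are all assumptions made for this example, and the thesis may use a different weighting or ranking scheme.

```python
from collections import defaultdict


def graph_of_words(tokens, window=3):
    """Build an undirected co-occurrence graph (hypothetical helper).

    Each distinct term is a node. Two different terms are linked when they
    appear within `window` consecutive tokens of each other.
    """
    graph = defaultdict(set)
    for i, term in enumerate(tokens):
        # Look ahead at the tokens that share a window with `term`.
        for other in tokens[i + 1 : i + window]:
            if other != term:
                graph[term].add(other)
                graph[other].add(term)
    return graph


def top_keywords(tokens, k=3, window=3):
    """Return the k terms with the most distinct neighbours.

    Degree is an assumed, deliberately simple stand-in for a keyword score.
    Ties are broken alphabetically so the result is deterministic.
    """
    graph = graph_of_words(tokens, window)
    ranked = sorted(graph, key=lambda t: (-len(graph[t]), t))
    return ranked[:k]


if __name__ == "__main__":
    sample = "graph words text graph mining".split()
    print(top_keywords(sample, k=2, window=2))
```

Keeping the graph as a plain dictionary of neighbour sets makes the result easy to inspect. On a cluster computing platform the same edge list could in principle be built in a distributed fashion, though how the thesis does this is not described in the abstract.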

Item Type: Thesis (Masters)
Subjects: Z Bibliography. Library Science. Information Resources > ZA Information resources > ZA4150 Computer Network Resources
Divisions: School of Computing > Master of Science in Data Analytics
Depositing User: Timothy Lawless
Date Deposited: 17 Aug 2018 11:15
Last Modified: 17 Aug 2018 11:15
URI: https://norma.ncirl.ie/id/eprint/3072
