NORMA eResearch @NCI Library

Extractive text summarization of image extracted text

Addya, Sufal (2020) Extractive text summarization of image extracted text. Masters thesis, Dublin, National College of Ireland.

Abstract

Text summarization is a large field within text analytics, and this research proposes a unique approach to summarizing text extracted from images. Optical character recognition using PyTesseract with OpenCV performs very well at extracting text from images, and two unsupervised extractive text summarization algorithms, TextRank and TF-IDF, are then applied to that text to produce a meaningful summary. The proposed pipeline produces promising output that could be used in future to build a text summarization application. Tesseract with OpenCV extracts the text effectively, and the two extractive summarization algorithms successfully produce a meaningful extractive summary. However, evaluating the accuracy of the generated summary remains a challenging part of this research and needs to be addressed in future work.
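
As an illustration of the TF-IDF stage described in the abstract, the following is a minimal sketch and not the thesis's actual implementation. It assumes the text has already been extracted from the image (for example with `pytesseract.image_to_string`), treats each sentence as a "document" when computing inverse document frequency, and uses a naive regular-expression sentence splitter. The function names `split_sentences`, `tokenize`, and `tfidf_summary` are hypothetical and chosen only for this example.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Assumption: a naive splitter that breaks after '.', '!' or '?'
    # followed by whitespace. OCR output may need more careful cleaning.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    # Lowercase alphanumeric tokens; no stop-word removal in this sketch.
    return re.findall(r"[a-z0-9]+", sentence.lower())


def tfidf_summary(text, num_sentences=2):
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    tokenized = [tokenize(s) for s in sentences]
    n = len(sentences)
    if n == 0:
        return []

    # Document frequency: number of sentences that contain each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        if not tokens:
            scores.append(0.0)
            continue
        tf = Counter(tokens)
        # Sentence score: sum over terms of (term frequency * idf).
        score = sum(
            (count / len(tokens)) * math.log(n / df[term])
            for term, count in tf.items()
        )
        scores.append(score)

    # Pick the top-scoring sentence indices, then restore document order.
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:num_sentences]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    # Stand-in for OCR output; in the described pipeline this string
    # would come from the image-extraction step.
    ocr_text = "Cats sleep. Cats sleep. Dogs chase the red ball quickly."
    for sentence in tfidf_summary(ocr_text, num_sentences=1):
        print(sentence)
```

Under this scoring, sentences made up of terms that are rare across the text rank higher than sentences that repeat common terms. A TextRank variant would instead build a sentence-similarity graph and rank its nodes.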

Item Type: Thesis (Masters)
Subjects: Q Science > QA Mathematics > Electronic computers. Computer science
T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science
Q Science > QA Mathematics > Computer software
T Technology > T Technology (General) > Information Technology > Computer software
Divisions: School of Computing > Master of Science in Data Analytics
Depositing User: Dan English
Date Deposited: 22 Jan 2021 10:20
Last Modified: 22 Jan 2021 10:20
URI: https://norma.ncirl.ie/id/eprint/4430
