NORMA eResearch @NCI Library

An Ensemble model to predict the classification of goods using text description

Hernández, Edith (2023) An Ensemble model to predict the classification of goods using text description. Masters thesis, Dublin, National College of Ireland.

Files: Master of Science thesis (PDF, 1MB); Configuration Manual (PDF, 95kB)

Abstract

The main purpose of this research is to tackle the challenge of assigning 6-digit Harmonized System (HS) codes to goods based on their product descriptions. To achieve this, we propose an approach that combines NLP techniques, pre-trained word embeddings and similarity search libraries.

There is a growing need for effective methods to categorise products in large datasets, especially for customs authorities. By leveraging advances in Natural Language Processing (NLP) and deep learning, the experiment aims to improve the accuracy and efficiency of categorising imported goods. The research process involves data collection, analysis and experimental assessment, with every step aligned with the CRISP-DM model.

Integrating FAISS into the proposed experiment with RoBERTa classification improves accuracy, reaching 80%. In contrast, combining FAISS with DistilBERT classification achieves an accuracy of less than 1%.
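The similarity-search idea behind the approach can be illustrated with a minimal sketch. It assumes each product description has already been converted into an embedding vector (for example by a pre-trained language model) and that a reference set of descriptions with known HS codes is available. The vectors, the labels `HS_A`, `HS_B` and `HS_C`, and the helper names `build_index`, `search` and `predict_code` are hypothetical placeholders, not taken from the thesis. The sketch performs exact cosine-similarity search in pure Python, which is what a FAISS `IndexFlatIP` index computes over L2-normalised vectors, and then assigns the most common code among the nearest neighbours. How the thesis actually combines FAISS with RoBERTa or DistilBERT is not reproduced here.

```python
import math


def normalize(vec):
    """Scale a vector to unit L2 length so inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def build_index(labelled_embeddings):
    """Store (unit vector, label) pairs; a pure-Python stand-in for an exact FAISS index."""
    return [(normalize(vec), label) for vec, label in labelled_embeddings]


def search(index, query, k=1):
    """Return the k (score, label) pairs with the highest cosine similarity to query."""
    q = normalize(query)
    scored = [(sum(a * b for a, b in zip(q, vec)), label) for vec, label in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def predict_code(index, query, k=3):
    """Majority vote over the k nearest neighbours; ties go to the most similar label."""
    votes = {}
    for score, label in search(index, query, k):
        count, best = votes.get(label, (0, -1.0))
        votes[label] = (count + 1, max(best, score))
    return max(votes, key=lambda label: votes[label])


# Hypothetical 3-dimensional embeddings; real sentence embeddings are much larger.
training = [
    ([0.9, 0.1, 0.0], "HS_A"),
    ([0.8, 0.2, 0.1], "HS_A"),
    ([0.1, 0.9, 0.2], "HS_B"),
    ([0.0, 0.2, 0.9], "HS_C"),
]
index = build_index(training)
print(predict_code(index, [0.85, 0.15, 0.05], k=3))  # prints HS_A
```

In practice, FAISS replaces the linear scan in `search` with an optimised index, which matters once the reference set grows to many thousands of labelled descriptions.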

The expected outcomes include a better understanding of the challenges and opportunities in classifying goods, as well as a practical solution that can be applied in various contexts.

Item Type: Thesis (Masters)
Supervisors: Haque, Rejwanul (email unspecified)
Uncontrolled Keywords: Natural Language Processing; HS code classification; RoBERTa; Similarity search
Subjects: Q Science > QA Mathematics > Electronic computers. Computer science
T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science
H Social Sciences > Economics > Business
P Language and Literature > P Philology. Linguistics > Computational linguistics. Natural language processing
Q Science > Q Science (General) > Self-organizing systems. Conscious automata > Machine learning
Divisions: School of Computing > Master of Science in Data Analytics
Depositing User: Ciara O'Brien
Date Deposited: 08 May 2025 16:11
Last Modified: 08 May 2025 16:11
URI: https://norma.ncirl.ie/id/eprint/7526
