NORMA eResearch @NCI Library

Safety Report Topic Classification with Transformer-based Data Augmentation

Payne, Jason (2022) Safety Report Topic Classification with Transformer-based Data Augmentation. Masters thesis, Dublin, National College of Ireland.

Files:
PDF (Master of Science): 2MB
PDF (Configuration manual): 891kB

Abstract

Safety Leading Indicators (SLIs) are incident traits, sometimes referred to as weak signals, that, when tracked, enable organisations to plan proactive actions that mitigate the occurrence of significant incidents. This research presents a method for implementing SLIs based on topic classification of safety reports. Rather than imposing a mandatory reporting format, the method implements SLIs indirectly through analysis of report text/content, which significantly reduces implementation complexity and the potential for KPI exploitation/manipulation. The method works in low-data and unlabelled-data regimes and is independent of reporting systems, formats and taxonomies. A new multi-label rule-based approach was developed to assign purpose-designed SLI categories to unlabelled safety reports. This labelled data was then used to fine-tune pre-trained Language Models (LMs) for Transformer-based Data Augmentation (TrDA). TrDA was combined with conventional text augmentation techniques to train performant supervised topic classifiers based on Bidirectional LSTM (Bi-LSTM) models. The Bi-LSTM models were shown to outperform the upstream rule-based methods on new/unseen data. The proposed methodology is organisation- and process-agnostic, and the solution can be deployed in practice using commonly available cloud services.
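The abstract does not publish the rules or the SLI categories, so the following is only a minimal sketch of what multi-label rule-based labelling of free-text safety reports can look like. The category names (`slips_trips_falls`, `ppe`, `housekeeping`), the keyword patterns, and the helpers `label_report` and `to_multi_hot` are all hypothetical, chosen for illustration. Each category is a set of regular expressions; a report receives every category whose patterns match, and reports that match nothing remain unlabelled.

```python
import re
from typing import Dict, List

# HYPOTHETICAL SLI categories and keyword rules, for illustration only;
# the thesis's actual categories and rules are not given on this page.
SLI_RULES: Dict[str, List[str]] = {
    "slips_trips_falls": [r"\bslip(ped|s)?\b", r"\btrip(ped|s)?\b", r"\bfell\b", r"\bfalls?\b"],
    "ppe": [r"\bppe\b", r"\bhelmet\b", r"\bgloves?\b", r"\bhi-?vis\b"],
    "housekeeping": [r"\bclutter(ed)?\b", r"\bspill(ed|age)?\b", r"\bobstruct(ed|ion)\b"],
}

# Combine each category's patterns into one case-insensitive regex.
COMPILED = {
    label: re.compile("|".join(patterns), re.IGNORECASE)
    for label, patterns in SLI_RULES.items()
}

# Fixed label order so multi-hot vectors are consistent across reports.
LABEL_ORDER: List[str] = sorted(SLI_RULES)


def label_report(text: str) -> List[str]:
    """Return every category whose rule matches the report (multi-label).

    An empty list means no rule fired, so the report stays unlabelled.
    """
    return sorted(label for label, rx in COMPILED.items() if rx.search(text))


def to_multi_hot(labels: List[str]) -> List[int]:
    """Encode a label list as a 0/1 vector in LABEL_ORDER, e.g. as classifier targets."""
    return [1 if label in labels else 0 for label in LABEL_ORDER]


if __name__ == "__main__":
    report = "Worker slipped on an oil spill near the loading bay."
    labels = label_report(report)
    print(labels, to_multi_hot(labels))
```

For the sample report, `label_report` returns `['housekeeping', 'slips_trips_falls']` and the multi-hot vector is `[1, 0, 1]`. Keyword rules like these are cheap and transparent, but they only catch phrasings their authors anticipated. That limitation is what makes it useful to train a downstream classifier on the rule-labelled (and augmented) data, as the abstract describes.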

Item Type: Thesis (Masters)
Supervisors: Iqbal, Zahid
Subjects: Q Science > QA Mathematics > Electronic computers. Computer science
T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science
P Language and Literature > P Philology. Linguistics > Computational linguistics. Natural language processing
H Social Sciences > HD Industries. Land use. Labor > Issues of Labour and Work > Health and Safety at Work.
Divisions: School of Computing > Master of Science in Data Analytics
Depositing User: Tamara Malone
Date Deposited: 23 May 2023 16:23
Last Modified: 23 May 2023 16:23
URI: https://norma.ncirl.ie/id/eprint/6632
