Ramesh, Akshai, Uhana, Haque Usuf, Parthasarathy, Venkatesh Balavadhani, Haque, Rejwanul and Way, Andy (2021) Augmenting Training Data for Low-Resource Neural Machine Translation via Bilingual Word Embeddings and BERT Language Modelling. In: International Joint Conference on Neural Networks (IJCNN). IEEE, pp. 1-8. ISBN 978-1-6654-3900-8
Full text not available from this repository.
Abstract
Neural machine translation (NMT) is often described as ‘data hungry’ as it typically requires large amounts of parallel data in order to build a good-quality machine translation (MT) system. However, most of the world's language pairs are low-resource or extremely low-resource. This situation becomes even worse if a specialised domain is taken into consideration for translation. In this paper, we present a novel data augmentation method which makes use of bilingual word embeddings (BWEs) learned from monolingual corpora and Bidirectional Encoder Representations from Transformers (BERT) language models (LMs). We augment a parallel training corpus by introducing new words (i.e. out-of-vocabulary (OOV) items) and increasing the presence of rare words on both sides of the original parallel training corpus. Our experiments on the simulated low-resource German–English and French–English translation tasks show that the proposed data augmentation strategy can significantly improve state-of-the-art NMT systems and outperform the state-of-the-art data augmentation approach for low-resource NMT.
Item Type: | Book Section |
---|---|
Uncontrolled Keywords: | Machine translation; Neural machine translation; Transformer; Language modelling |
Subjects: | Q Science > QA Mathematics > Electronic computers. Computer science; T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science; P Language and Literature > P Philology. Linguistics > Language Services |
Divisions: | School of Computing > Staff Research and Publications |
Depositing User: | Clara Chan |
Date Deposited: | 01 Oct 2021 15:16 |
Last Modified: | 07 Feb 2022 11:47 |
URI: | https://norma.ncirl.ie/id/eprint/5080 |