NORMA eResearch @NCI Library

Evaluation of Multimodal Transformer Data Fusion Techniques

Bamikole, David Oluwatimilehin (2024) Evaluation of Multimodal Transformer Data Fusion Techniques. Masters thesis, Dublin, National College of Ireland.

PDF (Master of Science), 1MB
PDF (Configuration Manual), 1MB

Abstract

Good context improves the understanding of a message, and it can be obtained by extracting information from the different media through which the message is conveyed. When a single medium does not provide enough insight, the recipient either makes assumptions based on the limited context or fails to understand the message, and neither outcome leads to good comprehension. The same applies to machine learning tasks that rely on a single modality such as text, audio or video. Using multiple modalities, however, requires fusing data from those modalities. Existing data fusion strategies include feature-level, decision-level and hybrid fusion, each with a different level of effectiveness and its own trade-offs. This motivated the present research, which evaluates data fusion techniques for multimodal transformers. The CMU-MOSI dataset, which has audio, textual and visual modalities, was used to implement early concatenation fusion, cross-modal attention fusion and hierarchical modal attention fusion. The best hyperparameters were obtained for each strategy. The models were evaluated using mean absolute error (MAE), Pearson correlation coefficient, parameter size and training time. The hierarchical model performs best, with an MAE of 0.0111 and a correlation coefficient of 0.5509, but it is also the largest and slowest model. The cross-modal transformer has the smallest parameter size, and early concatenation fusion is the fastest.
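
The two accuracy metrics named in the abstract, MAE and the Pearson correlation coefficient, can be illustrated with a minimal sketch. This is not the thesis code: the function names and the sample values below are hypothetical and chosen only for illustration, and only the Python standard library is assumed.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between labels and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_correlation(y_true, y_pred):
    """Pearson correlation coefficient between labels and predictions."""
    n = len(y_true)
    if n != len(y_pred) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    # Covariance and variances, left unnormalised because the 1/n factors cancel.
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    if var_t == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_t * var_p)


# Hypothetical sentiment labels and model outputs (illustrative values only).
labels = [1.0, -2.0, 0.5, 3.0]
predictions = [0.5, -1.5, 1.0, 2.5]

mae = mean_absolute_error(labels, predictions)
r = pearson_correlation(labels, predictions)
print(f"MAE: {mae:.4f}, Pearson r: {r:.4f}")
```

A lower MAE and a higher correlation both indicate predictions that track the labels more closely, which is the sense in which the abstract ranks the hierarchical model first.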

Item Type: Thesis (Masters)
Supervisors: Anand, Devanshu (email unspecified)
Uncontrolled Keywords: Multimodal transformer; data fusion techniques; early concatenation fusion; cross-modal attention fusion; hierarchical modal attention fusion
Subjects: Q Science > QA Mathematics > Electronic computers. Computer science
T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science
H Social Sciences > HM Sociology > Information Science > Communication
Q Science > Q Science (General) > Self-organizing systems. Conscious automata > Machine learning
Divisions: School of Computing > Master of Science in Artificial Intelligence
Depositing User: Ciara O'Brien
Date Deposited: 17 Jun 2025 18:34
Last Modified: 17 Jun 2025 18:34
URI: https://norma.ncirl.ie/id/eprint/7898
