Bamikole, David Oluwatimilehin (2024) Evaluation of Multimodal Transformer Data Fusion Techniques. Masters thesis, Dublin, National College of Ireland.
PDF (Master of Science)
PDF (Configuration Manual)
Abstract
Good context makes a message easier to understand, and that context can be obtained by extracting information from the different media through which the message is conveyed. When a single medium does not provide enough insight to understand the message, either assumptions are made from the limited context available or the message is left unclear, and neither leads to good comprehension. The same applies when machine learning tasks rely on a single data modality such as text, audio or video. Using multiple modalities, however, requires fusing data from different modalities. Existing data fusion strategies include feature-level, decision-level and hybrid fusion, each of which achieves a different level of effectiveness and has its own trade-offs. This motivated the present research, which evaluates data fusion techniques for multimodal transformers. The CMU-MOSI dataset, which contains audio, textual and visual modalities, was used to implement early concatenation fusion, cross-modal attention fusion and hierarchical modal attention fusion, and the best hyperparameters were identified for each strategy. The models were evaluated using mean absolute error (MAE), Pearson correlation coefficient, parameter size and training time. The hierarchical model performed best, with an MAE of 0.0111 and a correlation coefficient of 0.5509, but it was also the largest and slowest model. The cross-modal transformer had the smallest parameter size, and the early concatenation fusion model was the fastest.
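To make the difference between feature-level concatenation and cross-modal attention, and the two accuracy metrics reported above, more concrete, here is a minimal pure-Python sketch. It is not the thesis's implementation: the function names, the toy feature sizes and the omission of learned query/key/value projections (which forces both modalities in the attention step to share one feature size) are assumptions made for illustration. Hierarchical modal attention fusion is not sketched, since the abstract does not describe its structure.

```python
import math

# Illustrative sketch only: function names, toy feature sizes and the
# omission of learned projections are assumptions, not the thesis code.


def early_concat(text, audio, visual):
    """Early (feature-level) fusion: concatenate the per-step feature
    vectors of each modality before any modelling. Assumes the three
    sequences are already aligned to the same number of steps."""
    if not (len(text) == len(audio) == len(visual)):
        raise ValueError("modalities must be aligned to the same length")
    return [t + a + v for t, a, v in zip(text, audio, visual)]


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(target, source):
    """Cross-modal attention: each step of the target modality (queries)
    attends over every step of the source modality (keys and values)
    with scaled dot-product attention. Learned Q/K/V projections are
    omitted here (assumption), so both modalities must share size d."""
    d = len(source[0])
    fused = []
    for q in target:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in source]
        weights = softmax(scores)
        # Weighted sum of source vectors: one output vector per target step.
        fused.append([sum(w * v[j] for w, v in zip(weights, source))
                      for j in range(d)])
    return fused


def mae(y_true, y_pred):
    """Mean absolute error between true and predicted scores."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson(x, y):
    """Pearson correlation coefficient; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)  # sum of squared deviations
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


if __name__ == "__main__":
    # Toy inputs with made-up feature sizes (2 text, 1 audio, 3 visual).
    text = [[1.0, 0.0], [0.0, 1.0]]
    audio = [[0.5], [0.2]]
    visual = [[0.1, 0.3, 0.9], [0.4, 0.4, 0.4]]
    print(len(early_concat(text, audio, visual)[0]))  # 6 features per step
    print(cross_modal_attention(text, [[0.2, 0.8], [0.9, 0.1]]))
    print(mae([1.0, -2.0, 0.5], [0.8, -1.5, 0.5]))
    print(pearson([1, 2, 3], [2, 4, 6]))  # 1.0
```

Early concatenation hands a single, wider vector per step to one encoder, which keeps the model simple and cheap, whereas cross-modal attention lets one modality selectively weight the steps of another at the cost of extra computation.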
Field | Value |
---|---|
Item Type: | Thesis (Masters) |
Supervisors: | Anand, Devanshu |
Uncontrolled Keywords: | Multimodal transformer; data fusion techniques; early concatenation fusion; cross-modal attention fusion; hierarchical modal attention fusion |
Subjects: | Q Science > QA Mathematics > Electronic computers. Computer science; T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science; H Social Sciences > HM Sociology > Information Science > Communication; Q Science > Q Science (General) > Self-organizing systems. Conscious automata > Machine learning |
Divisions: | School of Computing > Master of Science in Artificial Intelligence |
Depositing User: | Ciara O'Brien |
Date Deposited: | 17 Jun 2025 18:34 |
Last Modified: | 17 Jun 2025 18:34 |
URI: | https://norma.ncirl.ie/id/eprint/7898 |