Description

Multimodal learning, which consists of building models that take information from multiple modalities as input, is growing in popularity due to its potential. Deep learning-based multimodal models can be applied to a variety of downstream tasks such as video description, sentiment analysis, event detection, cross-modal translation, and cross-modal retrieval. We can inherently expect multimodal models to outperform unimodal models because the additional modalities provide more information; human experience and learning are themselves multimodal, since we combine multiple senses to perceive the world around us. In the ideal case, we assume the data are complete, meaning that all modalities are fully present. However, this assumption does not always hold at test time, so multimodal models intended for real-world applications must be robust to missing modalities. We address this missing-modality problem at test time by comparing several feature reconstruction methods on multimodal emotion recognition datasets.
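To make the idea of feature reconstruction concrete, the sketch below shows one common strategy (not necessarily the exact method studied here): a small network is trained on complete, paired data to map the features of an available modality to the feature space of a missing one, and at test time the reconstructed features are fused in place of the absent modality. All dimensions, modality names, and the fusion step are illustrative assumptions.

# Minimal sketch, assuming PyTorch and pre-extracted audio/text features.
import torch
import torch.nn as nn

class FeatureReconstructor(nn.Module):
    """Maps features of an available modality to the feature space of a missing one."""
    def __init__(self, src_dim: int, tgt_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(src_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, tgt_dim),
        )

    def forward(self, src_feats: torch.Tensor) -> torch.Tensor:
        return self.net(src_feats)

# Training on complete (paired) data: learn to predict text features from audio features.
audio_dim, text_dim = 128, 300              # assumed feature sizes
recon = FeatureReconstructor(audio_dim, text_dim)
optimizer = torch.optim.Adam(recon.parameters(), lr=1e-3)
mse = nn.MSELoss()

audio_feats = torch.randn(32, audio_dim)    # stand-ins for real extracted features
text_feats = torch.randn(32, text_dim)

for _ in range(100):
    optimizer.zero_grad()
    loss = mse(recon(audio_feats), text_feats)
    loss.backward()
    optimizer.step()

# Test time with the text modality missing: reconstruct it, then fuse as usual
# (here, simple late fusion by concatenation) before the downstream classifier.
with torch.no_grad():
    test_audio = torch.randn(4, audio_dim)
    reconstructed_text = recon(test_audio)
    fused = torch.cat([test_audio, reconstructed_text], dim=-1)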
