Multi-task learning has proven effective for cross-modal retrieval, the process of issuing a query in one modality and retrieving relevant results in another. However, multi-task learning performance can vary greatly depending on hand-tuned task weights, which determine how much each task in the cross-modal retrieval model contributes to the model’s overall loss function during training. Recent studies have shown that these task weights can instead be learned during training. The learned weights act as a measure of each task’s uncertainty: the model learns to de-emphasize more uncertain tasks and focus on tasks with less uncertainty. Learned task weights are particularly appealing for cross-modal retrieval because multimedia datasets commonly contain large amounts of label noise and uncertainty, which the weights should reflect. However, a model with learned task weights can behave unpredictably in the presence of label noise. In this report, we first show how learned task weights react to polynomial data with added label noise, and we compare these results to a multi-task model with hand-tuned weights. We also analyze the conditions under which learned task weights are robust to label noise in image classification tasks. We then apply these findings to a cross-modal retrieval model to better understand how factors such as task complexity and the modalities being used affect the robustness of the learned-task-weight model to label noise. We find that on high-noise datasets, uncertainty alone is an insufficient basis for learning task weights: task complexity must also be accounted for, either through dimensionality reduction or as an additional learned parameter.
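To make the uncertainty-weighting idea concrete, the following is a minimal sketch (not the report's implementation) of one common formulation, in the style of Kendall et al.: each task's loss L_i is scaled by exp(-s_i), where s_i = log σ_i² is a learned log-variance, and s_i itself is added as a regularizer so the model cannot drive all weights to zero. With the task losses held fixed at illustrative values, gradient descent on s_i recovers the closed-form optimum σ_i² = L_i, so the noisier task receives the smaller effective weight.

```python
import numpy as np

def learn_task_weights(task_losses, lr=0.1, steps=500):
    """Minimize sum_i exp(-s_i) * L_i + s_i over the log-variances s_i.

    Returns the effective task weights exp(-s_i) = 1 / sigma_i^2.
    The task losses are treated as fixed scalars for illustration.
    """
    s = np.zeros(len(task_losses))
    for _ in range(steps):
        # d/ds_i [exp(-s_i) * L_i + s_i] = -exp(-s_i) * L_i + 1
        grad = -np.exp(-s) * task_losses + 1.0
        s -= lr * grad
    return np.exp(-s)

# Hypothetical losses: task 0 is low-noise (small loss),
# task 1 is high-noise (large loss).
weights = learn_task_weights(np.array([0.5, 4.0]))
print(weights)  # ≈ [2.0, 0.25]: the noisier task is down-weighted
```

In a real model the losses L_i change every step as the shared parameters train, but the same dynamic holds: tasks whose losses stay high (for example, because their labels are noisy) accumulate larger learned variances and thus smaller weights, which is exactly the behavior the report probes under added label noise.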
