Description

Learning the affordances of unseen objects is an important aspect of learning how to interact with and understand the world. However, current research on this subject is restricted to small datasets that are limited in variety. Recent efforts to develop weakly-supervised approaches have made affordance prediction more generalizable, but a performance gap remains relative to supervised methods. This paper attempts to close this gap by framing affordance prediction primarily as a representation learning task between inactive images and active videos. Accordingly, it proposes an unsupervised representation learning objective that can be trained on images not in the training data. The final model improves the robustness of learning Grounded Interaction Hotspots to changes in model type and to out-of-domain objects, and further narrows the gap between weakly- and strongly-supervised approaches, all while requiring no extra parameters.
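The abstract does not specify the exact training objective. As a rough illustration only, a common unsupervised representation learning setup for aligning inactive-image and active-video embeddings is a contrastive (InfoNCE-style) loss; the function name, encoder dimensions, and batch size below are hypothetical and not taken from the paper.

```python
# Hypothetical sketch: contrastive (InfoNCE-style) alignment between
# embeddings of inactive object images and active interaction videos.
# This is NOT the paper's stated objective; names and sizes are illustrative.
import torch
import torch.nn.functional as F

def info_nce_loss(image_emb, video_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/video embeddings.

    image_emb: (B, D) embeddings of inactive object images
    video_emb: (B, D) embeddings of the corresponding active videos
    """
    image_emb = F.normalize(image_emb, dim=-1)
    video_emb = F.normalize(video_emb, dim=-1)

    # Cosine-similarity logits; matching pairs lie on the diagonal.
    logits = image_emb @ video_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Contrast in both directions (image-to-video and video-to-image).
    loss_i2v = F.cross_entropy(logits, targets)
    loss_v2i = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_i2v + loss_v2i)

if __name__ == "__main__":
    # Placeholder embeddings standing in for encoder outputs.
    img = torch.randn(32, 512)   # batch of inactive-image embeddings
    vid = torch.randn(32, 512)   # batch of active-video embeddings
    print(info_nce_loss(img, vid).item())
```

Because such an objective only reshapes the shared embedding space, it would add no parameters beyond the existing encoders, which is consistent with the abstract's claim of requiring no extra parameters.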
