Description
Recent breakthroughs in computer vision and natural language processing have been largely propelled by scaling up both dataset diversity and model capacity, leading to robust generalization. In this thesis I address two questions: 1) whether, for learning-based robotic manipulation, we can similarly scale up dataset diversity and model capacity to achieve generalization and adaptation to new scenes and environments, new objects, new tasks, and even different types of robots; and 2) how we can avoid re-collecting data from scratch for every new task and environment, a practice that often leads to poor generalization and performance. To answer these questions we propose two methodologies: a model-based reinforcement learning approach built on video prediction, and a model-free approach based on imitation learning. We collect several of the largest robotic interaction datasets to date and show that, by leveraging and effectively reusing diverse prior datasets, an agent can generalize to never-before-seen objects, learn new tasks from only a handful of demonstrations, and even adapt to new robot types.