Description

In recent years, artificial learning systems have demonstrated tremendous advances in a number of challenging domains such as computer vision, natural language processing, and speech recognition. A striking characteristic of these recent advances has been the seemingly simple formula of combining flexible deep function approximators with large datasets collected for specific problems. However, these systems struggle to leverage their learned capabilities when generalizing to new inputs or acquiring new capabilities, often requiring re-training on a similarly large dataset from scratch. This is in stark contrast to humans, who have a remarkable ability to build upon their prior experiences and learn new concepts from only a few examples. In the first part of this thesis, we will study the question of how to construct systems that mimic this ability to adapt rapidly to new tasks. A core principle underlying this part of the thesis will be to leverage structure across a large number of prior experiences/tasks to enable fast adaptation and reasoning under uncertainty. We will first study the setting of reward specification, a common challenge in reinforcement learning, and then study how a probabilistic framing of the meta-learning setting can enable reasoning under uncertainty.

In the second part of this thesis, given the established potential of a prior dataset of tasks to accelerate learning, we will ask the natural question of how to enable agents to collect data completely autonomously. This would remove the need for a human to "curate" the dataset of tasks for the artificial agent and enable fully scalable, never-ending embodied learning. The central theme of our approach will be to consider the online, real-world nature of the "tasks" an agent must solve, and through this lens revisit the basic assumptions of episodic RL. Finally, we will conclude with a demonstration of these ideas in the domain of real-world dexterous manipulation and provide some directions for future work in this more "autonomous" reinforcement learning setting.
