What Supervision Scales? Practical Learning Through Interaction

Campo, Carlos Florensa

Description

For an agent to learn useful behaviors, we must be able to specify the desired outcomes. This supervision can come in many forms, such as the reward in Reinforcement Learning (RL), the target state-action pairs in Imitation Learning, or the dynamics model in motion planning. Each form of supervision needs to be evaluated along three cost dimensions that dictate how well it scales: how much domain knowledge is required to provide that form of supervision, how much total interaction under such supervision is needed to learn a task, and how this amount increases with every new task that the agent has to learn. For example, guiding rewards provided at every time-step might speed up an RL algorithm, but it is hard to design dense rewards that are both easy to provide and induce a solution to the task at hand, and the design process must be repeated for every new task. On the other hand, a completion signal is a weaker form of supervision, and non-expert users can specify the objective of many tasks in this way; unfortunately, standard RL algorithms struggle with such sparse rewards. In the first part of this dissertation, we study how to overcome this limitation by learning hierarchies over reusable skills. In the second part, we extend the scope to explicitly minimize the supervision needed to learn distributions of tasks. This paradigm shifts the focus away from the complexity of learning a single task, paving the way towards more general agents that efficiently learn from multiple tasks. To achieve this objective, we propose two automatic curriculum generation methods. In the third part, we investigate how to leverage different kinds of partial experts as supervision. First, we propose a method that does not require any reward and is still able to largely surpass the performance of the demonstrator in goal-reaching tasks. This allows us to leverage sub-optimal “experts”, hence lowering the cost of the provided supervision. Finally, we explore how to exploit a rough description of a task and an “expert” able to operate in only parts of the state space. This is a common setting in robotic applications, where the model provided by the manufacturer allows efficient motion planning as long as there are no contacts or perception errors, but fails to complete the last contact-rich part of the task, such as inserting a key. These are all key pieces for providing supervision that scales to generate robotic behavior for practical tasks.

Details

Title

What Supervision Scales? Practical Learning Through Interaction

Creator

Campo, Carlos Florensa, Author

Published

2020-05-30

Full Collection Name

Electrical Engineering & Computer Sciences Technical Reports

Type

Text

Format

technical reports

Extent

118 p

Language

eng

Usage Statement

Researchers may make free and open use of the UC Berkeley Library’s digitized public domain materials. However, some materials in our online collections may be protected by U.S. copyright law (Title 17, U.S.C.). Use or reproduction of materials protected by copyright beyond that allowed by fair use (Title 17, U.S.C. § 107) requires permission from the copyright owners. The use or reproduction of some materials may also be restricted by terms of University of California gift or purchase agreements, privacy and publicity rights, or trademark law. Responsibility for determining rights status and permissibility of any use or reproduction rests exclusively with the researcher. To learn more or make inquiries, please see our permissions policies (https://www.lib.berkeley.edu/about/permissions-policies).

Collection

EECS Technical Reports
