Description
The key insight of this dissertation is that when a robot is deployed in an environment that humans have been acting in, the state of the environment is already optimized for what humans want, and is thus informative about human preferences.
We formalize this setting by assuming that a human H has been acting in an environment for some time, and a robot R observes the final state that results. From this final state, R must infer as much as possible about H's reward function. We analyze this problem formulation theoretically and show that it is particularly well suited to inferring aspects of the state that should not be changed -- exactly the aspects of the reward that H is likely to forget to specify. We develop a dynamic programming algorithm for tabular environments, analogous to value iteration, and demonstrate its behavior on several simple environments. To scale to high-dimensional environments, we use function approximators judiciously, so that the various components of our algorithm can be trained without enumerating all possible states.
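To illustrate the flavor of this inference, the sketch below (a minimal toy example, not the algorithm developed in the dissertation) uses a hypothetical 3x3 gridworld with a breakable vase, a small discrete set of candidate reward functions, and a Boltzmann-rational model of H. A value-iteration-style backward pass computes H's policy under each candidate reward, a forward pass computes the distribution over final states that policy induces, and Bayes' rule then scores each candidate against the single observed final state. All environment details, hypotheses, and hyperparameters are invented for the example.

    import numpy as np

    # Hypothetical toy environment: a 3x3 grid with a breakable vase in the
    # centre cell.  States are (x, y, vase_intact); H starts at (0, 0) with
    # the vase intact and acts for T steps.
    SIZE, T, VASE = 3, 6, (1, 1)
    states = [(x, y, v) for x in range(SIZE) for y in range(SIZE) for v in (0, 1)]
    idx = {s: i for i, s in enumerate(states)}
    actions = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]  # stay/right/left/up/down

    def step(s, a):
        x, y, v = s
        x2 = min(max(x + a[0], 0), SIZE - 1)
        y2 = min(max(y + a[1], 0), SIZE - 1)
        v2 = 0 if (x2, y2) == VASE else v          # walking onto the vase breaks it
        return (x2, y2, v2)

    def features(s):
        x, y, v = s
        return np.array([float((x, y) == (SIZE - 1, SIZE - 1)),  # at the goal corner
                         float(v)])                              # vase still intact

    # Candidate reward hypotheses: weights on [at goal, vase intact].
    candidates = {"goal only": np.array([1.0, 0.0]),
                  "goal and vase": np.array([1.0, 1.0]),
                  "vase only": np.array([0.0, 1.0])}

    def boltzmann_policy(w, beta=5.0, gamma=0.95, iters=60):
        """Boltzmann-rational policy for reward weights w, computed with a
        value-iteration-style backward pass over the tabular state space."""
        V = np.zeros(len(states))
        for _ in range(iters):
            Q = np.array([[features(step(s, a)) @ w + gamma * V[idx[step(s, a)]]
                           for a in actions] for s in states])
            V = np.log(np.exp(beta * Q).sum(axis=1)) / beta   # soft max over actions
        return np.exp(beta * (Q - V[:, None]))                # P(a | s)

    def p_final_state(policy):
        """Forward pass: distribution over states after T steps from the start."""
        d = np.zeros(len(states))
        d[idx[(0, 0, 1)]] = 1.0
        for _ in range(T):
            d2 = np.zeros_like(d)
            for i, s in enumerate(states):
                for j, a in enumerate(actions):
                    d2[idx[step(s, a)]] += d[i] * policy[i, j]
            d = d2
        return d

    # R observes only the final state: H is at the goal and the vase is intact.
    observed = (SIZE - 1, SIZE - 1, 1)
    lik = {name: p_final_state(boltzmann_policy(w))[idx[observed]]
           for name, w in candidates.items()}
    Z = sum(lik.values())
    for name, l in lik.items():
        print(f"P({name} | final state) ~ {l / Z:.3f}")   # uniform prior over hypotheses

In this toy setting, observing that H reached the goal while leaving the vase intact favors the hypothesis that H cares about both, which illustrates why the observed state is especially informative about aspects of the environment that should not be changed.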
Of course, there is no point in learning about H's reward function unless we use it to guide R's decision-making. While we could have R simply optimize the inferred reward, this suffers from a "status quo bias": the inferred reward is likely to strongly prefer the observed state, since by assumption that state is already optimized for H's preferences. To get R to make changes to the environment, we will usually need to integrate the inferred reward with other sources of preference information. To support such reward combination, we use a model in which R must maximize a reward function that it does not know and that is known only to H. Learning from the state of the world arises as an instrumentally useful behavior in such a setting, and can be used to form a prior belief over the reward function that is then updated through further interaction with H.
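As a minimal sketch of this combination (with made-up numbers, not a model from the dissertation), the distribution produced by the state-of-the-world inference can be treated as a prior over reward hypotheses and updated with a standard Bayesian step when H later provides additional feedback:

    import numpy as np

    hypotheses = ["goal only", "goal and vase", "vase only"]
    prior = np.array([0.20, 0.70, 0.10])      # hypothetical output of the inference sketch above
    # Likelihood of H's later feedback under each hypothesis -- invented numbers
    # standing in for whatever feedback model R uses (comparisons, demonstrations, ...).
    likelihood = np.array([0.6, 0.3, 0.1])
    posterior = prior * likelihood
    posterior /= posterior.sum()
    print(dict(zip(hypotheses, posterior.round(3))))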