Learning when Objectives are Hard to Specify

Bhatia, Kush


Description

Deploying learning systems in the real world requires aligning their objectives with those of the humans they interact with. Existing algorithmic approaches to this alignment try to infer these objectives from human feedback. The correctness of these algorithms depends crucially on several simplifying assumptions about (1) how humans represent these objectives, (2) how humans respond to queries given these objectives, and (3) how well the hypothesis space represents these objectives. In this thesis, we question the robustness of existing approaches to misspecification of these assumptions and develop principled approaches to overcome such misspecifications.

We begin by studying misspecification of the hypothesis class assumed by the learner. We propose an agnostic learning setup in which we show that all existing approaches based on learning from comparisons incur constant regret. We further show that humans must provide more detailed feedback, in the form of higher-order comparisons, and we obtain sharp bounds on the regret as a function of the order of the comparisons. Next, we focus on misspecification of human behavioral models and establish, through both theoretical and empirical analyses, that inverse RL methods can be extremely brittle in the worst case. However, under reasonable assumptions, we show that these methods are robust and can recover the underlying reward functions up to a small error term. We then study misspecification of the assumptions about how humans represent objective functions. We first show that a uni-criterion approach to modeling human preferences fails to capture real-world human objectives, and we propose a new multi-criteria, comparison-based framework that overcomes these limitations. In the next part, we shift our focus to hand-specified reward functions in reinforcement learning, an alternative to learning rewards from humans. We empirically study the effects of misspecification in these hand-specified rewards, showing that over-optimizing such proxy rewards can hurt performance in the long run.
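The abstract refers to inferring objectives from human comparison feedback and to the assumptions such methods make about how humans answer queries. As a generic illustration of one such assumption, and not the method developed in this thesis, the sketch below assumes a linear utility w·x and the standard Bradley-Terry (logistic) response model, simulates pairwise comparisons from a known w, and recovers w by maximum likelihood. The function names (`simulate_comparisons`, `fit_bradley_terry`), the true weights, and all hyperparameters are hypothetical choices made for illustration.

```python
import math
import random

# Illustrative sketch only. Assumptions (not from the thesis): utility is linear,
# u(x) = w . x, and a human prefers item a over item b with probability
# sigmoid(u(x_a) - u(x_b)), i.e. the Bradley-Terry / logistic response model.


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(u, v) -> float:
    return sum(a * b for a, b in zip(u, v))


def simulate_comparisons(w_true, n_pairs, dim, rng):
    """Draw random feature pairs and a Bradley-Terry label (1 if a is preferred)."""
    data = []
    for _ in range(n_pairs):
        xa = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        xb = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        diff = [a - b for a, b in zip(xa, xb)]
        label = 1 if rng.random() < sigmoid(dot(w_true, diff)) else 0
        data.append((diff, label))
    return data


def fit_bradley_terry(data, dim, lr=0.5, steps=200):
    """Maximum-likelihood estimate of w via full-batch gradient ascent
    on the (concave) Bradley-Terry log-likelihood."""
    w = [0.0] * dim
    n = len(data)
    for _ in range(steps):
        grad = [0.0] * dim
        for diff, label in data:
            # Gradient of one comparison's log-likelihood: (label - p) * diff
            err = label - sigmoid(dot(w, diff))
            for j in range(dim):
                grad[j] += err * diff[j]
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    return w


def cosine(u, v) -> float:
    return dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))


if __name__ == "__main__":
    rng = random.Random(0)
    w_true = [2.0, -1.0, 0.5]  # hypothetical "human" utility weights
    data = simulate_comparisons(w_true, n_pairs=1000, dim=3, rng=rng)
    w_hat = fit_bradley_terry(data, dim=3)
    print("estimated w:", [round(x, 2) for x in w_hat])
    print("cosine similarity to w_true:", round(cosine(w_hat, w_true), 3))
```

Here the simulated responses follow the assumed model exactly, so the estimate should align closely with `w_true`. If the responses were generated by a different behavioral model, or the true utility were not linear in the features, the same estimator would be misspecified; these are the kinds of assumptions whose failure the thesis examines.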

Details

Title

Learning when Objectives are Hard to Specify

Creator

Bhatia, Kush, Author

Published

EECS Department, University of California at Berkeley, Berkeley, California, 6/22/2022

Full Collection Name

Electrical Engineering & Computer Sciences Technical Reports

Type

Text

Format

technical reports

Extent

186 p

Language

eng

Usage Statement

Researchers may make free and open use of the UC Berkeley Library’s digitized public domain materials. However, some materials in our online collections may be protected by U.S. copyright law (Title 17, U.S.C.). Use or reproduction of materials protected by copyright beyond that allowed by fair use (Title 17, U.S.C. § 107) requires permission from the copyright owners. The use or reproduction of some materials may also be restricted by terms of University of California gift or purchase agreements, privacy and publicity rights, or trademark law. Responsibility for determining rights status and permissibility of any use or reproduction rests exclusively with the researcher. To learn more or make inquiries, please see our permissions policies (https://www.lib.berkeley.edu/about/permissions-policies).

Collection

EECS Technical Reports
