Description
This thesis studies the interaction of model learning and decision making through two central issues: compounding prediction errors and objective mismatch. The compounding error challenge emerges when errors accumulate over recursive applications of a one-step transition model. Most dynamics models are trained for single-step accuracy, which often yields models with substantial long-term prediction error. Additionally, training a model for accurate transitions does not guarantee high-performing policies on the downstream task. The lack of correlation between model and policy metrics when the two are optimized separately is coined and studied as Objective Mismatch.
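As a concrete illustration of compounding error, the sketch below rolls out an imperfect one-step model recursively and shows the prediction error growing with the horizon even though the single-step error is small. The linear dynamics, the perturbed model, and all numerical values are toy assumptions for illustration, not systems studied in the thesis.

```python
import numpy as np

# Hypothetical linear system and an imperfect "learned" one-step model
# (both matrices are illustrative placeholders).
A_true = np.array([[1.0, 0.1], [0.0, 0.99]])      # true transition matrix
A_hat = A_true + 1e-2 * np.array([[1.0, -0.5],    # learned model with a small
                                  [0.3, 1.0]])    # single-step fitting error

def rollout(A, x0, horizon):
    """Recursively apply a one-step model: x_{t+1} = A x_t."""
    xs = [x0]
    for _ in range(horizon):
        xs.append(A @ xs[-1])
    return np.stack(xs)

x0 = np.array([1.0, 0.0])
true_traj = rollout(A_true, x0, horizon=50)
pred_traj = rollout(A_hat, x0, horizon=50)

# Per-step prediction error grows with the horizon even though A_hat is
# accurate for a single step: this is the compounding error effect.
errors = np.linalg.norm(true_traj - pred_traj, axis=1)
print(errors[1], errors[10], errors[50])
```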
These challenges are studied primarily in the context of sample-based model predictive control (MPC) algorithms, where the learned model is used to simulate candidate trajectories and the rewards they are predicted to accrue. To mitigate compounding error and objective mismatch, this thesis proposes the trajectory-based dynamics model, a feedforward prediction parametrization that includes a direct representation of time. This model represents one small but important step towards more useful dynamics models in model-based reinforcement learning. The thesis concludes with future directions on the synergy of prediction and control in MBRL, focused primarily on state abstractions, temporal correlation, and future prediction methodologies.
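The sketch below is a minimal, hypothetical illustration of these two ideas together: a feedforward model that takes the time index as an explicit input, so any step of the horizon can be predicted without recursive composition, and a sample-based loop that scores candidate controller parameters by the model's predicted rewards. The architecture, the reward, and the way the two pieces are combined are placeholder assumptions, not the exact formulation in the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)
state_dim, ctrl_dim, hidden, horizon = 2, 3, 32, 20

# Illustrative two-layer MLP weights; the real model's architecture and
# training procedure are described in the thesis, these are placeholders.
params = (rng.normal(size=(hidden, state_dim + 1 + ctrl_dim)), np.zeros(hidden),
          rng.normal(size=(state_dim, hidden)), np.zeros(state_dim))

def trajectory_model(s0, t, ctrl_params):
    """Map (initial state, time index, controller parameters) directly to the
    predicted state at time t -- no recursive one-step composition."""
    W1, b1, W2, b2 = params
    x = np.concatenate([s0, [float(t)], ctrl_params])  # time is an explicit input
    h = np.tanh(W1 @ x + b1)
    return W2 @ h + b2

def reward(state):
    return -np.sum(state ** 2)  # toy quadratic reward, an assumption

def sample_based_control(s0, n_candidates=256):
    """Score sampled controller parameters by the rewards the model predicts
    for their trajectories, and keep the best candidate."""
    candidates = rng.uniform(-1.0, 1.0, size=(n_candidates, ctrl_dim))
    returns = [sum(reward(trajectory_model(s0, t, theta)) for t in range(horizon))
               for theta in candidates]
    return candidates[int(np.argmax(returns))]

best_theta = sample_based_control(np.array([1.0, 0.0]))
```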