
Description

Learning effective policies with model-based reinforcement learning (MBRL) depends heavily on the accuracy of the dynamics model. Recently, a new parametrization called the trajectory-based model was introduced, which takes an initial state, a future time index, and control policy parameters, and returns the state at that future time index [3]. This method has demonstrated improved long-horizon prediction accuracy, increased sample efficiency, and the ability to predict the task reward. However, its transferability to MBRL is limited by the low expressivity of its low-dimensional control policy parameter inputs. In this work, we examine how effectively the trajectory-based model predicts environment dynamics under higher-dimensional, more expressive neural network control policies. The trajectory-based model shows some capability in learning from these neural network policies, and still outperforms the traditional one-step state-action model because it suffers less compounding error.
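To make the contrast concrete, the sketch below compares the two parametrizations described above: a trajectory-based model that maps (initial state, time index, policy parameters) to a future state in a single forward pass, versus a one-step state-action model that must be composed t times, letting errors compound. All names, network sizes, and the untrained random weights are illustrative assumptions, not the authors' actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM, HIDDEN = 2, 1, 16
# Assumed small linear policy: theta flattens an (ACTION_DIM x STATE_DIM+1) matrix.
POLICY_DIM = ACTION_DIM * (STATE_DIM + 1)

def mlp(x, params):
    """Tiny two-layer network standing in for either learned model."""
    W1, b1, W2, b2 = params
    return np.tanh(x @ W1 + b1) @ W2 + b2

def init(in_dim, out_dim):
    return (rng.normal(size=(in_dim, HIDDEN)) * 0.1, np.zeros(HIDDEN),
            rng.normal(size=(HIDDEN, out_dim)) * 0.1, np.zeros(out_dim))

# Trajectory-based model: f(s0, t, theta) -> s_t, one forward pass per query.
traj_params = init(STATE_DIM + 1 + POLICY_DIM, STATE_DIM)

def trajectory_model(s0, t, theta):
    x = np.concatenate([s0, [t / 100.0], theta])  # time index is an input
    return mlp(x, traj_params)

# One-step model: f(s, a) -> s_{t+1}, composed t times, so errors compound.
step_params = init(STATE_DIM + ACTION_DIM, STATE_DIM)

def policy(s, theta):
    # Hypothetical linear policy parametrized by the same theta vector.
    return theta.reshape(ACTION_DIM, -1) @ np.concatenate([s, [1.0]])

def one_step_rollout(s0, t, theta):
    s = s0
    for _ in range(t):
        s = mlp(np.concatenate([s, policy(s, theta)]), step_params)
    return s

s0 = rng.normal(size=STATE_DIM)
theta = rng.normal(size=POLICY_DIM)
print(trajectory_model(s0, 50, theta).shape)  # single evaluation
print(one_step_rollout(s0, 50, theta).shape)  # fifty chained evaluations
```

The key design difference: predicting the state at horizon t costs one network evaluation for the trajectory-based model but t chained evaluations for the one-step model, which is where compounding error enters.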
