Autonomous skill acquisition has the potential to dramatically expand the range of tasks robots can perform, in settings ranging from manufacturing to household robotics. Reinforcement learning offers a general framework that enables skill acquisition solely from environment interaction with little human supervision, making it a scalable approach for the widespread adoption of robotic agents. While reinforcement learning has achieved tremendous success, that success has been largely limited to simulated domains, such as video games, computer graphics, and board games. Its most promising methods typically require large amounts of interaction with the environment to learn optimal policies. On real robotic systems, such extensive interaction causes wear and tear, creates unsafe scenarios during the learning process, and can be prohibitively time-consuming for practical applications. One promising avenue for minimizing the interaction between the agent and the environment is the family of methods under the umbrella of model-based reinforcement learning. Model-based methods are characterized by learning a predictive model of the environment, which is then used for learning a policy or for planning. By exploiting the structure of the reinforcement learning problem and making better use of the collected data, model-based methods can achieve better sample complexity. Prior to this work, model-based methods were limited to simple environments and tended to achieve lower performance than model-free methods. Here, we illustrate the model-bias problem, the set of difficulties that prevents typical model-based methods from achieving optimal policies, and propose solutions that tackle it. The proposed methods achieve the same asymptotic performance as model-free methods while being two orders of magnitude more sample efficient. We unify these methods into an asynchronous model-based framework that allows fast and efficient learning.
We successfully learn manipulation policies, such as block stacking and shape matching, on a real PR2 robot within 10 minutes of wall-clock time. Finally, we take a further step toward real-world robotics and propose a method that can efficiently adapt to changes in the environment. We demonstrate it on a real six-legged robot navigating different terrains, such as grass and rock.
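The collect-data, fit-model, plan pattern that characterizes model-based reinforcement learning can be sketched on a toy problem. The code below is a minimal illustration of that generic pattern, not the thesis's actual method: all names (`env_step`, `fit_model`, `plan`, the target-reaching task, and the random-shooting planner) are hypothetical choices made for this example.

```python
# Illustrative model-based RL loop on a toy 1-D "reach the target" task.
# The agent never plans through the true dynamics; it fits a model from
# interaction data and plans through that model instead.
import random

TARGET = 5.0

def env_step(state, action):
    # True dynamics (unknown to the agent): s' = s + a,
    # reward = negative distance from the target.
    next_state = state + action
    return next_state, -abs(next_state - TARGET)

def fit_model(transitions):
    # Fit the learned dynamics s' ~ s + k*a by least squares.
    num = sum(a * (s2 - s) for s, a, s2 in transitions)
    den = sum(a * a for _, a, _ in transitions) or 1.0
    k = num / den
    return lambda s, a: s + k * a

def plan(model, state, horizon=5, n_candidates=200, rng=random.Random(0)):
    # Random-shooting MPC: sample action sequences, roll each out in the
    # learned model, execute the first action of the best sequence.
    best_return, best_first = float("-inf"), 0.0
    for _ in range(n_candidates):
        seq = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        s, ret = state, 0.0
        for a in seq:
            s = model(s, a)
            ret += -abs(s - TARGET)
        if ret > best_return:
            best_return, best_first = ret, seq[0]
    return best_first

# 1) Collect a small batch of random interaction data.
rng = random.Random(1)
transitions, s = [], 0.0
for _ in range(50):
    a = rng.uniform(-1.0, 1.0)
    s2, _ = env_step(s, a)
    transitions.append((s, a, s2))
    s = s2

# 2) Fit the dynamics model; 3) act by planning through it.
model = fit_model(transitions)
s = 0.0
for _ in range(20):
    a = plan(model, s)
    s, _ = env_step(s, a)
print(round(s, 2))  # the planner should drive the state near TARGET
```

The sample-efficiency argument is visible even at this scale: the 50 random transitions are reused to evaluate every one of the planner's candidate action sequences, rather than being consumed once as in a typical model-free update.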



