
Description

In this work, we strive to narrow the gap between theory and practice in bonus-based exploration, establishing new connections between the UCB algorithm and Random Network Distillation and offering observations about pre- and post-projection reward bonuses. We propose an algorithm that reduces to UCB in the linear case and evaluate it empirically in challenging exploration environments. In the Randomised Chain and Maze environments, our algorithm consistently outperforms Random Network Distillation in reaching unseen states during training.
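For context, Random Network Distillation derives its exploration bonus from the prediction error of a trained network against a fixed random target network: states far from the visited distribution are predicted poorly and so receive a larger bonus. Below is a minimal, hypothetical NumPy sketch of that idea (the network shapes, random-feature predictor, and state distributions are illustrative assumptions, not the paper's actual setup):

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 8, 32  # illustrative state and feature dimensions

# Fixed, randomly initialised target network f(s) that is never trained.
A = rng.normal(size=(d, k))
b = rng.normal(size=k)
def target(s):
    return np.maximum(s @ A, 0.0) @ b

# Predictor: a different random feature map with a trainable output layer.
C = rng.normal(size=(d, k))
def features(s):
    return np.maximum(s @ C, 0.0)

# Fit the predictor's output weights on visited ("seen") states only.
seen = rng.normal(size=(512, d))
w, *_ = np.linalg.lstsq(features(seen), target(seen), rcond=None)

def bonus(s):
    # Exploration bonus: squared prediction error of predictor vs target.
    return (features(s) @ w - target(s)) ** 2

# States far from the training distribution receive a larger bonus.
unseen = rng.normal(size=(512, d)) + 5.0
print(bonus(seen).mean(), bonus(unseen).mean())
```

In this toy setting the mean bonus on shifted, unvisited states comes out larger than on visited ones, which is the mechanism the abstract's comparison against Random Network Distillation refers to.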
