
Description

Generating textual descriptions of environments in reinforcement learning can be difficult, especially in the absence of an explicit signal for recognizing the relationships within them. In this work, we propose a curiosity-driven approach to collecting the trajectories needed to generate textual descriptions of environments. We formulate the search for an exploration policy as a two-player game between the policy and a forward model that predicts state transitions. In addition, we propose a meta-training scheme that allows both players to adapt to changes in the environment, as well as a text generation scheme for producing natural language descriptions from trajectories. Finally, we evaluate our model in the RTFM domain [65], which contains two monster and weapon pairs. The agent is expected to learn a policy that lets it discover all the monster and weapon relation pairs through interaction with the environment, and to generate a natural language document summarizing the environment dynamics.
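The two-player game described above can be sketched in its simplest form: a forward model is trained to predict state transitions, and its remaining prediction error serves as the curiosity (intrinsic) reward that the exploration policy maximizes. The sketch below is a minimal, hypothetical illustration with a linear forward model on toy vectors; the dimensions, learning rate, and function names are assumptions for illustration only, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: a linear forward model f(s, a) ~ s'.
STATE_DIM, ACTION_DIM = 4, 2

W = rng.normal(scale=0.1, size=(STATE_DIM, STATE_DIM + ACTION_DIM))

def forward_model(state, action):
    """Predict the next state from the current state and action."""
    return W @ np.concatenate([state, action])

def intrinsic_reward(state, action, next_state):
    """Curiosity reward: squared prediction error of the forward model."""
    pred = forward_model(state, action)
    return float(np.sum((pred - next_state) ** 2))

def train_forward_model(state, action, next_state, lr=0.01):
    """One SGD step reducing prediction error: the model 'player' tries
    to drive down the reward that the policy 'player' maximizes."""
    global W
    x = np.concatenate([state, action])
    err = W @ x - next_state          # gradient of 0.5*||Wx - s'||^2 is err @ x.T
    W -= lr * np.outer(err, x)

# One toy transition (s, a, s').
s = rng.normal(size=STATE_DIM)
a = rng.normal(size=ACTION_DIM)
s_next = rng.normal(size=STATE_DIM)

r_before = intrinsic_reward(s, a, s_next)
for _ in range(100):
    train_forward_model(s, a, s_next)
r_after = intrinsic_reward(s, a, s_next)
# As the model learns this transition, its curiosity reward shrinks,
# steering the policy toward transitions it has not yet mastered.
```

In this adversarial loop, well-explored transitions become unrewarding, which pushes the policy to seek out novel dynamics, here the interactions that reveal the monster and weapon relations.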
