Generating textual descriptions of environments in reinforcement learning can be difficult, especially in the absence of an explicit signal for recognizing the relationships within them. In this work, we propose a curiosity-driven approach for collecting the trajectories needed to generate such descriptions. We formulate the task of finding the exploration policy as a two-player game between the policy and a forward model that predicts state transitions. In addition, we propose a meta-training scheme that allows both the policy and the forward model to adapt to changes in the environment. We also propose a text generation scheme that produces natural language descriptions from trajectories. Finally, we evaluate our model in the RTFM domain [65], in which two monster-weapon pairs exist. The agent is expected to learn a policy that lets it discover all monster-weapon relations through interaction with the environment, and to generate a natural language document that summarizes the environment dynamics.



