It is typical to ignore the cost of computing an observation or action in the perception-action loop: the agent is free to sense the environment and deliberate at length before time steps forward, so decision making and the dynamics of the environment are treated as synchronous. Yet the need to act and react efficiently is a basic constraint in natural environments, one that shapes the behavior of animals. We consider a setting in which the environment is asynchronous and the computational components of decision making, such as observation, prediction, and action selection, carry associated costs. These costs induce a trade-off that shapes the policy, and they can be incorporated into the learning setting as an intrinsic reward function. As a first attempt, we develop a simple hierarchical approach that adaptively chooses between two operations, explicitly observing or implicitly predicting, in order to update its hidden state, and we analyze the emergent strategies on a number of environments involving partial observability and stochastic dynamics.
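The observe-versus-predict trade-off described above can be illustrated with a minimal sketch. All names here (the costs, the toy drift dynamics, the trivial forward model) are illustrative assumptions, not the paper's actual environment or architecture:

```python
import random

# Assumed per-step costs: explicit observation is more expensive than
# implicitly predicting with an internal model (illustrative values).
C_OBSERVE = 0.1
C_PREDICT = 0.02

def step_env(true_state):
    """Toy stochastic dynamics: the environment drifts regardless of the agent."""
    return true_state + random.choice([-1, 0, 1])

def run_episode(policy, steps=100):
    """Run one episode; policy(t) -> True to observe, False to predict."""
    true_state, belief = 0, 0
    total_cost = 0.0
    for t in range(steps):
        true_state = step_env(true_state)  # environment steps asynchronously
        if policy(t):
            belief = true_state            # explicit observation: accurate, costly
            total_cost += C_OBSERVE
        else:
            total_cost += C_PREDICT        # implicit prediction: cheap, belief may drift
    error = abs(true_state - belief)       # final belief error
    return total_cost, error

# Observing every step is accurate but expensive; never observing is cheap
# but lets the belief drift. An adaptive policy must trade these off,
# e.g. via an intrinsic cost penalty added to the task reward.
cost_always, _ = run_episode(lambda t: True)
cost_never, _ = run_episode(lambda t: False)
```

Here the intrinsic reward would simply subtract `total_cost` from the task return, so a learned high-level policy is pushed to observe only when the expected belief error of predicting outweighs the saved computation.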



