
Description

Markov Decision Processes (MDPs) allow us to compute policies that maximize the expectation of an objective in stochastic environments where the state of the world is fully observed. Partially Observable MDPs (POMDPs) provide machinery for planning when we additionally face uncertainty over the state of the world. We can make a similar jump from MDPs to characterize uncertainty over other elements of the environment, namely the transition and reward functions. Here, we introduce new classes of uncertain MDPs for handling these kinds of uncertainty and present several folk theorems showing that certain subsets of them can be reduced to standard POMDPs, in which the only uncertainty is over the state. We are particularly interested in developing these frameworks to explore applications to Negotiable Reinforcement Learning, a method for dynamically balancing the utilities of multiple actors.
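
As one illustration of the kind of reduction described above, here is a minimal sketch, not taken from the report, under assumptions the description does not state: a finite state space, a finite set of candidate transition matrices with a known prior, and a known reward function. The names UncertainMDP and to_pomdp are hypothetical. An MDP whose transition function is uncertain can be recast as a standard POMDP whose hidden state is the pair (world state, candidate transition function), with the world state emitted as the observation, so that the only remaining uncertainty is over the state.

```python
import numpy as np

class UncertainMDP:
    """MDP whose transition matrix is one of several candidates (illustrative helper)."""
    def __init__(self, candidate_T, prior, R, gamma=0.95):
        self.candidate_T = candidate_T   # list of k arrays, each of shape (S, A, S)
        self.prior = np.asarray(prior)   # probability vector over the k candidates
        self.R = np.asarray(R)           # rewards, shape (S, A)
        self.gamma = gamma

def to_pomdp(umdp, s0_dist):
    """Reduce to a POMDP whose hidden state is (world state s, candidate index i).
    The observation is s alone, so the only partial observability is over i."""
    k = len(umdp.candidate_T)
    S, A = umdp.R.shape
    n = S * k  # augmented hidden states, indexed as s * k + i

    # Augmented transitions: the candidate index i never changes;
    # the world state s evolves according to candidate T_i.
    T = np.zeros((n, A, n))
    R = np.zeros((n, A))
    for i, Ti in enumerate(umdp.candidate_T):
        for s in range(S):
            for a in range(A):
                T[s * k + i, a, np.arange(S) * k + i] = Ti[s, a]
                R[s * k + i, a] = umdp.R[s, a]

    # Observation model: the agent observes the world state s deterministically.
    O = np.zeros((n, S))
    for s in range(S):
        O[s * k + np.arange(k), s] = 1.0

    # Initial belief: product of the start-state distribution and the prior over candidates.
    b0 = np.outer(np.asarray(s0_dist), umdp.prior).reshape(n)
    return T, R, O, b0, umdp.gamma
```

Under the same assumptions, uncertainty over the reward function could be handled analogously by letting the reward in the augmented model depend on the candidate index as well.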
