Before applying an AI system to any real-world problem, we must explicitly specify an objective for the system to solve. Whether this is a loss function over a dataset, an MDP to plan in, or a simulator in which to train an RL agent, we must eventually write down a specification that reduces our vague real-world problem to an explicit computational problem. In this thesis, I view writing an explicit problem specification as an act of communication. On this view, a problem specification is more effective the better it serves the AI designer's goal of communicating the true problem.

We begin by formalizing the problem faced by the designer, which we call the Specification Design Problem. In this problem, the designer must choose among a set of available problem specifications and aims to select one whose solution performs well on the designer's true, unspecified problem. The designer's goal in the specification design problem is to minimize the specification error: the utility the designer loses by writing down the wrong problem. We show that specification error decomposes into underspecification error, caused by leaving out helpful information, and misspecification error, caused by specifying incorrect information. We analyze underspecification error through a new measure of the value of information and misspecification error through a novel geometric analysis. These tools allow us to describe when the AI designer should stop specifying.

In the second part, we use this perspective to study a class of problem specifications often used in practice to mitigate misspecification errors. Such specifications, including maximizing worst-case utility, are often used to avoid specifying complex features of the true problem. These approaches contrast with the Bayesian framework, which requires fixing some distribution for each such feature. It has yet to be shown, however, that such approaches are strictly necessary: one may hope that they could eventually be replaced with an appropriate model of uncertainty within the Bayesian framework. We provide an example where a specification resembling maximizing worst-case utility is practically unavoidable, as any explicit specification within the standard Bayesian framework requires specifying the task in excessive and impractical detail. This example shows that, counterintuitively, the optimal solution for a Bayesian AI designer in a specification design problem may be to create a non-Bayesian agent.

To match the needs of the AI designer in such specification design problems, I extend the Bayesian framework to include these helpful problem specifications. The extension also enables specifications that use other natural self-referential claims, such as “you are unlikely to be able to fix the engine”. Such claims are natural in everyday conversation and are critical to the efficient solution of a wide range of specification design problems.

Finally, I discuss how the problem specifications developed in this theory can be effectively solved by scalable RL systems through the automatic design of training environments. We formalize this approach as Unsupervised Environment Design (UED). We propose a UED algorithm called Protagonist Antagonist Induced Regret Environment Design (PAIRED), which finds minimax regret strategies as the Nash equilibrium of a 3-agent system. We show that this approach promotes the transfer of the resulting policy and results in a natural automatic curriculum of increasing complexity.
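
To make the minimax regret objective concrete, the following is a minimal sketch over a small, finite table of utilities. It is not the PAIRED algorithm itself: PAIRED is described above as finding minimax regret strategies as the Nash equilibrium of a 3-agent system, whereas this sketch simply enumerates pure strategies. The policy names, environment names, and utility values are all hypothetical and chosen only for illustration.

```python
from typing import Dict

# Hypothetical utilities, indexed as UTILITY[policy][environment].
# These numbers are illustrative assumptions, not results from the thesis.
UTILITY: Dict[str, Dict[str, float]] = {
    "cautious": {"easy": 5.0, "hard": 4.0, "impossible": 0.0},
    "bold": {"easy": 10.0, "hard": 1.0, "impossible": 0.0},
}


def best_per_env(utility: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Best utility achievable in each environment by any listed policy."""
    envs = next(iter(utility.values())).keys()
    return {env: max(row[env] for row in utility.values()) for env in envs}


def max_regret(utility: Dict[str, Dict[str, float]], policy: str) -> float:
    """Largest gap, over environments, between the best policy and `policy`."""
    best = best_per_env(utility)
    return max(best[env] - utility[policy][env] for env in best)


def worst_case_utility(utility: Dict[str, Dict[str, float]], policy: str) -> float:
    """Lowest utility `policy` receives in any environment."""
    return min(utility[policy].values())


def minimax_regret_policy(utility: Dict[str, Dict[str, float]]) -> str:
    """Pure-strategy policy that minimizes its maximum regret."""
    return min(utility, key=lambda p: max_regret(utility, p))


if __name__ == "__main__":
    for name in UTILITY:
        print(name, max_regret(UTILITY, name), worst_case_utility(UTILITY, name))
    print("minimax regret choice:", minimax_regret_policy(UTILITY))
```

In this toy table, both policies have the same worst-case utility because no policy can succeed in the "impossible" environment, so maximizing worst-case utility cannot distinguish them. Regret is zero in that environment for every policy, so the minimax regret criterion is instead decided by the environments in which some policy can do well.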

Overall, by analyzing the specification design problem, understanding the effectiveness of self-referential claims in problem specifications, and allowing for their scalable implementation through UED, this thesis lays the necessary groundwork for a study of problem specification, well-grounded theoretically, empirically, and practically.



