We begin by presenting a class of models called neural module networks (NMNs) and their application to natural language question answering problems. NMNs are designed to simultaneously exploit the representational capacity of deep networks and the compositional linguistic structure of questions, in order to target question answering applications not well supported by standard logical approaches. Our approach decomposes questions into their linguistic substructures, and uses these structures to dynamically instantiate question-specific networks built from an inventory of reusable modules. The resulting compound networks are jointly trained. We evaluate our approach on datasets for question answering backed by images and structured knowledge bases.
Next, we apply the same modeling principles to family of policy learning problems. We describe a framework for multitask reinforcement learning guided by policy sketches. Sketches annotate each task with a sequence of named subtasks, providing information about high-level structural relationships among tasks, but not the detailed guidance required by previous work on learning policy abstractions for reinforcement learning (e.g. intermediate rewards, subtask completion signals, or intrinsic motivations). Our approach associates every subtask with its own modular subpolicy, and jointly optimizes over full task-specific policies by tying parameters across shared subpolicies. Experiments illustrate two main advantages of this approach: first, it outperforms standard baselines that learn task-specific or shared monolithic policies; second, it naturally induces a library of primitive behaviors that can be recombined to rapidly acquire policies for new tasks.
The final two chapters explore ways of using information from language the context of less explicitly structured models. First, we exhibit a class of problems in which the space of natural language strings provides a parameter space that captures natural task structure. We describe an approach that, in a pretraining phase, learns a language interpretation model that transforms inputs (e.g. images) into outputs (e.g. labels) given natural language descriptions. To learn a new concept (e.g. a classifier), we then propose to search directly in the space of descriptions to minimize the interpreter's loss on training examples. We then show that a related technique can be used to generate explanations of model behaviors: using the core insight that learned representations and natural language utterances carry the same meaning when they induce the same distribution over observations, we are able to automatically translate learned communication protocols into natural language.