The desire to efficiently solve problems has driven humans to create tools to accomplish more with less. To be useful in a variety of contexts, a tool must encode knowledge of how to solve a general problem, knowledge that models the system that the tool manipulates. For most of human history, tools enabled humans to better manipulate only physical systems, such as using a lever for lifting heavy objects. These tools implicitly modeled the physical system via their specialized design. The computer is significant because it was the first universal tool for modeling and manipulating any system.

Unfortunately, this universality has historically been restricted to systems that humans manually model and manipulate via code. Humans have long acted as the interface between computers and the physical world, but we will increasingly become the bottleneck to progress as computers become more powerful and the world becomes more complex. If we could build machines that automatically model and manipulate systems on their own, then we would solve more problems with less effort: we would need only specify what the problem is rather than bother with how to solve it.

The problem of building machines that automatically model and manipulate systems is not new and arguably encompasses the entire field of artificial intelligence (AI). Solving such a problem implies two things: first, that the machine can represent system interactions, and second, that the machine can learn such representations automatically. To represent system interactions is to represent the entities in the environment, the transformations that change the state of these entities, and the choices the agent makes to apply these transformations. To learn representations automatically is for these representations to be learned functions of the machine's raw sensorimotor stream. For such representations to be effective for automatically modeling and manipulating systems, they need to generalize over the combinatorial space of possible combinations of entities, of transformations, and of choices, a criterion that I call combinatorial generalization.

My central thesis is that there is a deep similarity between electronic circuits and neural networks, and that adapting the methods we invented almost a century ago for creating modular software programs on top of analog circuits can enable neural networks to exhibit generalization properties similar to those of software. I argue that separation of concerns was the key design principle that enabled representations in software to generalize, and that contextual refinement was the key technique that enabled us to implement this principle at every level of the computing stack. This thesis presents various ways to instantiate contextual refinement in neural networks and shows the gains in combinatorial generalization that this technique brings.



