(This is a condensed version. See dissertation for the full version.)

Machine learning is the embodiment of an unapologetically data-driven philosophy that has increasingly become one of the most important drivers of progress in AI and beyond. Existing machine learning methods, however, entail making trade-offs in terms of computational efficiency, modelling flexibility and/or formulation faithfulness. In this dissertation, we will cover three different ways in which limitations along each axis can be overcome, without compromising on other axes.

Computational Efficiency

We start with limitations on computational efficiency. Many large-scale machine learning methods require performing nearest neighbour search under the hood. Unfortunately, all exact algorithms suffer from either the curse of ambient dimensionality, or of intrinsic dimensionality. In fact, despite 40+ years of research, no exact algorithm can run faster than naive exhaustive search when the intrinsic dimensionality is high, which is almost certainly the case in machine learning.

We introduce a new family of exact algorithms, known as Dynamic Continuous Indexing, which overcomes both the curse of ambient dimensionality and of intrinsic dimensionality. The key insight is that existing methods require distances between each point and a query to be approximately preserved in the data structure, whereas a method that only approximately preserves the *ordering* of nearby points relative to distant points would suffice. In practice, our algorithm achieves a 14 - 116x speedup and a 21x reduction in memory consumption compared to locality-sensitive hashing (LSH).

Modelling Flexibility

Next we move onto probabilistic modelling, which is critical to realizing one of the central objectives of machine learning, which is to model the uncertainty that is inherent in prediction. There is often a tradeoff between modelling flexibility and computational efficiency: simple models can often be learned straightforwardly and efficiently but are not expressive; complex models are expressive, but in general cannot be learned both exactly and efficiently.

Implicit probabilistic models, like generative adversarial nets (GANs), aim to get around this tradeoff by using a highly expressive function, e.g.: a neural net, in its sampling procedure. Unfortunately, GANs fall short of learning the underlying distribution because of mode collapse, i.e.: they can effectively ignore some arbitrary subset of the training data. We argue this arises from the direction in which generated samples are matched to the real data - by inverting this direction, we devise a new method, known as Implicit Maximum Likelihood Estimation (IMLE), which fundamentally overcomes mode collapse. This can be shown to be equivalent to maximizing a lower bound on the log-likelihood.

Formulation Faithfulness

Finally we introduce a novel formulation that can enable the automatic discovery of new iterative gradient-based optimization algorithms, which have become the workhorse of modern machine learning. This effectively allows us to apply machine learning to improve machine learning, which has been a dream of machine learning researchers since the early days of the field. The key challenge, however, is that it is unclear how to represent a complex object like an algorithm in a way that is amenable to machine learning.

We get around this issue by observing that an optimization algorithm can be uniquely characterized by its update formula - different iterative optimization algorithms only differ in their choice of the update formula. Therefore, if we can learn the update formula, we can then automatically discover new optimization algorithms. We approximate the update formula using a neural net - then by learning the parameters of the neural net, we can search over different yet-to-be-discovered optimization algorithms efficiently.




Download Full History