Description
The advent of deep convolutional networks has powered a new wave of progress in visual recognition. These learned representations far outperform hand-engineered features on visual recognition tasks while also generalizing better across datasets. However, as general as these models may seem, their performance still degrades when there is a mismatch between the data they were trained on and the data they are asked to operate on. Domain adaptation offers a potential solution, allowing us to adapt networks from the source domain they were trained on to new target domains, where labeled data is sparse or entirely absent. However, before the rise of end-to-end learnable representations, visual domain adaptation techniques were largely limited to adapting classifiers trained on top of fixed, hand-designed visual features. In this thesis, we show how visual domain adaptation can be integrated with deep learning to directly learn representations that are resilient to domain shift, thereby enabling models to generalize beyond the source domain.
In Chapter 2, we demonstrate how to design losses that estimate how different two domains are. We show that by optimizing representations to minimize these losses, we can learn features that generalize better from source to target.
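One common instantiation of such a hand-designed domain discrepancy loss is maximum mean discrepancy (MMD). The sketch below is illustrative only, assuming a PyTorch implementation with an RBF kernel rather than the exact formulation used in Chapter 2; it shows how a discrepancy can be computed between batches of source and target features and minimized alongside the usual classification loss.

```python
import torch

def rbf_kernel(x, y, sigma=1.0):
    # Pairwise RBF (Gaussian) kernel values between rows of x and rows of y.
    sq_dists = torch.cdist(x, y) ** 2
    return torch.exp(-sq_dists / (2 * sigma ** 2))

def mmd_loss(source_feats, target_feats, sigma=1.0):
    # Simple (biased) estimate of squared MMD:
    # E[k(s, s')] + E[k(t, t')] - 2 E[k(s, t)].
    k_ss = rbf_kernel(source_feats, source_feats, sigma).mean()
    k_tt = rbf_kernel(target_feats, target_feats, sigma).mean()
    k_st = rbf_kernel(source_feats, target_feats, sigma).mean()
    return k_ss + k_tt - 2 * k_st

# Example: features produced by a shared encoder for a source batch and a target batch.
source_feats = torch.randn(32, 256)
target_feats = torch.randn(32, 256)
domain_loss = mmd_loss(source_feats, target_feats)
# Full objective: classification loss on labeled source data + a weight times domain_loss.
```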
In Chapters 3 and 4, rather than hand-designing these domain losses, we show that we can train models that themselves estimate domain discrepancy. Since these models are end-to-end learnable, we can backpropagate through them to learn representations that minimize the learned discrepancy. This approach is similar in concept to Generative Adversarial Networks, and we additionally explore the relationship between the two, showing how techniques developed for GANs can be applied in the adversarial adaptation setting as well.
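To make the idea concrete, the following minimal sketch assumes a PyTorch implementation with small stand-in modules, not the exact models from these chapters. A domain classifier is trained to separate source features from target features, while a gradient reversal layer causes the shared encoder to maximize that same loss, i.e. to produce features the domain classifier cannot distinguish.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips the sign of the gradient going back."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.clone()

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

encoder = nn.Sequential(nn.Linear(784, 256), nn.ReLU())                      # shared feature extractor
classifier = nn.Linear(256, 10)                                              # task (label) predictor
domain_clf = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 2))  # learned discrepancy

def adaptation_step(x_src, y_src, x_tgt, lambd=1.0):
    f_src, f_tgt = encoder(x_src), encoder(x_tgt)
    task_loss = F.cross_entropy(classifier(f_src), y_src)

    feats = torch.cat([f_src, f_tgt])
    domains = torch.cat([torch.zeros(len(f_src), dtype=torch.long),
                         torch.ones(len(f_tgt), dtype=torch.long)])
    # The domain classifier minimizes this loss; because of the reversed gradient,
    # the encoder effectively maximizes it, a minimax game as in GANs.
    domain_loss = F.cross_entropy(domain_clf(GradReverse.apply(feats, lambd)), domains)
    return task_loss + domain_loss
```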
Finally, in Chapters 5 and 6, we show that adaptation need not be limited to intermediate features of a deep network. Adversarial adaptation techniques can also be used to train models that directly alter the pixels of images, transforming them into cross-domain analogues. These transformed images can then be used as a labeled pseudo-target dataset to learn supervised models better suited for the target domain. We show that this technique is complementary to feature-based adaptation, yielding even better performance when the two are combined.
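As an illustration of the basic structure, the sketch below assumes PyTorch, 32x32 RGB images, and deliberately tiny stand-in networks rather than the architectures used in these chapters: a generator maps source images toward the target style, a discriminator distinguishes generated images from real target images, and a task classifier is trained on the transformed images with their original source labels.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Tiny stand-in networks; real models would be much deeper. Assumes 3x32x32 inputs.
generator = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                          nn.Conv2d(32, 3, 3, padding=1), nn.Tanh())
discriminator = nn.Sequential(nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
                              nn.Flatten(), nn.Linear(32 * 16 * 16, 1))
classifier = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                           nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 10))

def generator_and_task_losses(x_src, y_src):
    fake_tgt = generator(x_src)
    # Adversarial loss: the generator wants the discriminator to call its
    # outputs "real target" (label 1).
    g_adv = F.binary_cross_entropy_with_logits(
        discriminator(fake_tgt), torch.ones(len(x_src), 1))
    # Task loss: the source labels still apply to the transformed images,
    # which serve as a labeled pseudo-target training set.
    task = F.cross_entropy(classifier(fake_tgt), y_src)
    return g_adv + task

def discriminator_loss(x_src, x_tgt):
    fake_tgt = generator(x_src).detach()
    real = F.binary_cross_entropy_with_logits(
        discriminator(x_tgt), torch.ones(len(x_tgt), 1))
    fake = F.binary_cross_entropy_with_logits(
        discriminator(fake_tgt), torch.zeros(len(x_src), 1))
    return real + fake
```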