Description
Data-efficient representation learning focuses on learning useful representations from less data (labeled or unlabeled), which, as discussed throughout this dissertation, can be particularly important for applications with limited data availability. Label-efficient representation learning focuses on learning useful representations with little or no human annotation of the training data. As will be discussed, this is important for applications where accurately labeled data is difficult or impossible to obtain, such as privacy-sensitive fields or applications with highly ambiguous label definitions.
The four chapters of this dissertation that address these topics are:

(1) SelfAugment: Automatic Augmentation Policies for Self-Supervised Learning, which explored how to develop augmentation policies for self-supervised learning pipelines using little or no labeled training data and only a small amount of unlabeled data.

(2) Data Efficient Self-Supervised Representation Learning, which explored how a form of hierarchical pretraining can make pretraining up to 80x more data efficient.

(3) Region Similarity Representation Learning, which introduced one of the first methods for learning region-level representations by performing contrastive learning at the region (patch) level, leading to substantial improvements on downstream tasks such as object detection and segmentation when few labeled examples were available; a minimal sketch of the idea appears after this list.

(4) Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning, which explored methods for leveraging known scale information (e.g., ground sample distance) in geospatial representation learning.
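To make the region-level idea in (3) concrete, the sketch below shows an InfoNCE-style contrastive loss applied per spatial region rather than per image. It is a simplified illustration, not the exact method from that chapter: it assumes the two augmented views produce spatially aligned feature maps, so region i in one view corresponds to region i in the other, and all function and variable names are hypothetical.

```python
# Minimal, illustrative sketch of region-level contrastive learning:
# InfoNCE over spatial regions of two aligned augmented views.
# Assumes aligned feature maps; names here are hypothetical.
import torch
import torch.nn.functional as F

def region_info_nce(feats_a: torch.Tensor, feats_b: torch.Tensor,
                    temperature: float = 0.1) -> torch.Tensor:
    """feats_a, feats_b: (B, C, H, W) feature maps from two aligned views."""
    b, c, h, w = feats_a.shape
    # Treat each spatial location as a region embedding: (B*H*W, C).
    za = F.normalize(feats_a.permute(0, 2, 3, 1).reshape(-1, c), dim=1)
    zb = F.normalize(feats_b.permute(0, 2, 3, 1).reshape(-1, c), dim=1)
    # Similarity of every region in view A to every region in view B.
    logits = za @ zb.t() / temperature
    # Corresponding regions (the diagonal) are positives; all others
    # serve as negatives.
    targets = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, targets)

# Example: feature maps from a backbone applied to two augmented views.
fa, fb = torch.randn(2, 256, 7, 7), torch.randn(2, 256, 7, 7)
loss = region_info_nce(fa, fb)
```

The contrast with standard image-level contrastive learning is that the loss is computed over many region embeddings per image instead of one pooled embedding, which is what drives the improvements on spatially dense tasks such as detection and segmentation.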