Representation Learning for Perception and Control

Lakshminarayanan, Aravind Srinivas

Description

Extracting rich, reusable representations that capture what matters for downstream tasks remains challenging, even though deep learning has made tremendous progress in this direction. This thesis presents a few promising contributions toward that goal. The contributions fall along two axes: (1) self-supervised (or unsupervised) representation learning; (2) deep neural network architectures powered by self-attention. Progress in architectures and the ability to leverage massive amounts of unlabeled data have driven major advances in NLP, such as GPT-x and BERT. This thesis presents small steps toward realizing similar progress for perception and reinforcement learning tasks. It is a thesis by articles comprising four articles: two focused on computer vision benchmarks and two focused on reinforcement learning.

With respect to the first axis, the thesis presents three articles: (1) Data-Efficient Image Recognition using Contrastive Predictive Coding (CPCv2); (2) Contrastive Unsupervised Representations for Reinforcement Learning (CURL); (3) Reinforcement Learning with Augmented Data (RAD). The first two articles explore a form of unsupervised learning called contrastive learning, a technique better suited to raw inputs such as images than the generative pre-training popular for language. The first article presents results for label-efficient image recognition. The second shows the benefits of contrastive learning for sample-efficient reinforcement learning from pixels. In practice, contrastive learning depends heavily on data augmentation, and the third article presents a detailed investigation and discussion of the role of data augmentation.
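As a rough illustration of the contrastive objective mentioned above, the sketch below computes an InfoNCE-style loss for one query embedding against one positive key and several negative keys, using plain dot-product similarity. The function names, the temperature default, and the toy vectors are assumptions for illustration only; the objectives in the articles differ in their details (CURL, for example, uses a bilinear similarity and a momentum-updated key encoder). In typical contrastive setups the positive pair comes from two augmentations of the same input, which is one reason augmentation matters so much.

```python
import math
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot-product similarity between two embedding vectors."""
    return sum(x * y for x, y in zip(a, b))


def info_nce(
    query: Sequence[float],
    positive: Sequence[float],
    negatives: Sequence[Sequence[float]],
    temperature: float = 1.0,
) -> float:
    """InfoNCE-style loss for a single query (illustrative sketch).

    Frames the task as classifying which key is the positive: logits are
    similarities divided by the temperature, and the loss is the softmax
    cross-entropy with the positive key at index 0.
    """
    keys = [positive, *negatives]
    logits = [dot(query, k) / temperature for k in keys]
    # Log-sum-exp with the max subtracted for numerical stability.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    # Negative log-probability that the softmax assigns to the positive key.
    return log_sum_exp - logits[0]


# Hypothetical toy embeddings: the positive stands in for an encoding of a
# second augmented view of the same input; the negatives come from other inputs.
query = [1.0, 0.0]
positive = [0.9, 0.1]
negatives = [[-1.0, 0.0], [0.0, 1.0]]
print(f"aligned positive:    {info_nce(query, positive, negatives):.3f}")
print(f"misaligned positive: {info_nce(query, [-1.0, 0.0], [positive, [0.0, 1.0]]):.3f}")
```

The loss is smaller when the query is most similar to its positive key than when it is most similar to a negative, which is the signal that pushes an encoder to map different views of the same input close together.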

As for the second axis, the thesis presents a thorough empirical investigation of the benefits of self-attention and Transformer-like architectures for computer vision through the article Bottleneck Transformers for Visual Recognition. Self-attention has revolutionized language processing, but computer vision poses a challenge for vanilla Transformers: high-resolution inputs strain the primitive, whose memory and computational cost grow quadratically with the number of input positions. The article demonstrates the empirical effectiveness of a straightforward hybrid of convolutions and self-attention, and unifies ResNet- and Transformer-based architecture design for computer vision.
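To make the quadratic cost concrete, the sketch below counts the entries in the attention matrices of a single global self-attention layer when every spatial position of a feature map is treated as a token. The helper name `attention_map_entries` is hypothetical, and the count ignores batch size, bytes per entry, and memory-saving attention kernels.

```python
def attention_map_entries(height: int, width: int, heads: int = 1) -> int:
    """Count attention-weight entries for one global self-attention layer.

    Every spatial position of a height x width feature map is treated as a
    token, so each head builds an n x n matrix with n = height * width.
    Batch size and bytes per entry are ignored.
    """
    n = height * width  # number of tokens
    return heads * n * n  # one n x n attention matrix per head


for side in (7, 14, 28, 56, 224):
    tokens = side * side
    print(f"{side}x{side}: {tokens:,} tokens, "
          f"{attention_map_entries(side, side):,} entries")
```

A 14x14 map gives 196 tokens and 38,416 entries per head, while a 224x224 grid (the size of a standard ImageNet crop, if raw pixels were used as tokens) gives 50,176 tokens and 2,517,630,976 entries. Doubling the side length multiplies the count by 16, which is one motivation for hybrids that let convolutions first reduce the input to a low-resolution feature map and apply self-attention only there.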

Details

Title

Representation Learning for Perception and Control

Creator

Lakshminarayanan, Aravind Srinivas, Author

Published

2021-08-13

Full Collection Name

Electrical Engineering & Computer Sciences Technical Reports

Type

Text

Format

technical reports

Extent

114 p

Language

eng

Usage Statement

Researchers may make free and open use of the UC Berkeley Library’s digitized public domain materials. However, some materials in our online collections may be protected by U.S. copyright law (Title 17, U.S.C.). Use or reproduction of materials protected by copyright beyond that allowed by fair use (Title 17, U.S.C. § 107) requires permission from the copyright owners. The use or reproduction of some materials may also be restricted by terms of University of California gift or purchase agreements, privacy and publicity rights, or trademark law. Responsibility for determining rights status and permissibility of any use or reproduction rests exclusively with the researcher. To learn more or make inquiries, please see our permissions policies (https://www.lib.berkeley.edu/about/permissions-policies).

Collection

EECS Technical Reports
