Perceiving People over Long Periods: Algorithms, Architectures & Datasets

Mangalam, Karttikeya

Description

Long-form video understanding remains one of the last enduring open problems in computer vision. While the natural world offers long periods of visual stimuli, most computer vision systems still operate within a limited temporal scope, typically just a few seconds of input and output. This thesis presents my work developing the neural machinery, i.e., the algorithms, architectures, and datasets, that extends the temporal capacity of video understanding systems to minutes and beyond. I start by presenting my work on algorithms for long-term multimodal human motion forecasting, termed PECNet and Y-net. Next, I introduce my contributions to hierarchical, temporally scalable, and memory-efficient neural architectures for understanding long-form videos, in the form of MViT and Rev-ViT. Finally, I close by presenting my work on EgoSchema, the first certifiably long-form video-language dataset, which serves as a benchmark for evaluating the long-form understanding capabilities of multimodal models. Benchmark results on EgoSchema highlight the existing performance gap between current state-of-the-art models and human-level long-form video understanding. I believe that the advancements presented here in algorithms, architectures, and datasets not only address several existing limitations but also open new avenues for future research and application.

Details

Title

Perceiving People over Long Periods: Algorithms, Architectures & Datasets

Creator

Mangalam, Karttikeya, Author

Published

EECS Department, University of California at Berkeley, Berkeley, California, 12/15/23

Full Collection Name

Electrical Engineering & Computer Sciences Technical Reports

Other Identifiers

EECS-2023-282

Type

Text

Format

technical reports

Extent

139 p

Language

eng

Archive

The Library

Usage Statement

Researchers may make free and open use of the UC Berkeley Library’s digitized public domain materials. However, some materials in our online collections may be protected by U.S. copyright law (Title 17, U.S.C.). Use or reproduction of materials protected by copyright beyond that allowed by fair use (Title 17, U.S.C. § 107) requires permission from the copyright owners. The use or reproduction of some materials may also be restricted by terms of University of California gift or purchase agreements, privacy and publicity rights, or trademark law. Responsibility for determining rights status and permissibility of any use or reproduction rests exclusively with the researcher. To learn more or make inquiries, please see our permissions policies (https://www.lib.berkeley.edu/about/permissions-policies).

Collection

EECS Technical Reports