Video has become a primary medium for producing and consuming entertainment, educational content, and documented events. As the cost of recording, sharing, and storing video has decreased, the prevalence of video has increased. Yet, video remains challenging to use as an informative medium because it is difficult to search, browse and skim the underlying content.

Using a timeline-based video player, users must scrub back and forth through a video to gain an overview or locate content of interest. Some video viewing interfaces let users search and browse videos via text transcripts or caption files. However, transcribed and captioned speech often contains the disfluencies and redundancies typical of spoken language, which make transcripts time-consuming to read and difficult to skim. Further, transcripts and captions lack structure: transcripts consist of long blocks of text, while captions are a series of short phrases. Without structured organization, it can be difficult for viewers to browse topics or get a high-level overview of the content.

This thesis explores new ways to search, browse, and skim videos through structured text. We aim to create navigable representations of videos that let users query video content as efficiently and flexibly as possible, while keeping production costs low. This thesis introduces systems that embody these goals, using structured-text documents aligned to video to enable efficient and flexible video searching, browsing, and skimming across three domains: (1) informational lecture videos, (2) films, and (3) casually recorded feedback.
