Description
The presence of hidden structure in human data, including natural language but also sources like music, historical documents, and other complex artifacts, makes this data extremely difficult to analyze. In this thesis, we develop unsupervised methods that better cope with hidden structure across several domains of human data. We accomplish this by incorporating rich domain knowledge using two complementary approaches: (1) we develop detailed generative models that more faithfully describe how the data originated, and (2) we develop structured priors that provide useful inductive bias. First, we find that a variety of transcription tasks (for example, historical document transcription and polyphonic music transcription) can be viewed as linguistic decipherment problems. By building a detailed generative model of the relationship between the input (e.g. an image of a historical document) and its transcription (the text the document contains), we are able to learn these models in a completely unsupervised fashion, without ever seeing an input annotated with its transcription, effectively deciphering the hidden correspondence. The resulting systems not only work well for both tasks, achieving state-of-the-art results, but also outperform their supervised counterparts. Next, for a range of linguistic analysis tasks (for example, word alignment and grammar induction), we find that structured priors based on linguistically motivated features can improve upon state-of-the-art generative models. Further, by coupling model parameters across multiple languages in a phylogeny-structured prior, we develop an approach to multilingual grammar induction that substantially outperforms independent learning.
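To make the decipherment framing concrete, the sketch below is a minimal toy analogue, not the thesis's actual models: it assumes a simple substitution cipher, a small hypothetical alphabet, a fixed character bigram language model, and EM with forward-backward inference over an HMM whose latent states are plaintext characters. It illustrates how the parameters relating latent text to observations can be learned without a single annotated example.

```python
import numpy as np

# Toy alphabet and a tiny corpus used to estimate a fixed character bigram LM.
ALPHABET = list("abcdehlort ")
IDX = {ch: i for i, ch in enumerate(ALPHABET)}
V = len(ALPHABET)

def bigram_lm(text, alpha=0.1):
    """Add-alpha-smoothed bigram LM: trans[i, j] = P(next char j | prev char i)."""
    counts = np.full((V, V), alpha)
    for prev, nxt in zip(text, text[1:]):
        counts[IDX[prev], IDX[nxt]] += 1.0
    return counts / counts.sum(axis=1, keepdims=True)

def em_decipher(cipher, trans, iters=50):
    """EM for an HMM whose transitions are the fixed LM and whose emissions
    (plaintext char -> cipher symbol) are the unknown 'key' to be learned."""
    obs = np.array([IDX[c] for c in cipher])
    emit = np.random.dirichlet(np.ones(V), size=V)   # P(cipher | plain), learned
    start = np.full(V, 1.0 / V)
    T = len(obs)
    for _ in range(iters):
        # E-step: scaled forward-backward posteriors over latent plaintext chars.
        fwd = np.zeros((T, V)); bwd = np.zeros((T, V))
        fwd[0] = start * emit[:, obs[0]]
        fwd[0] /= fwd[0].sum()
        for t in range(1, T):
            fwd[t] = (fwd[t - 1] @ trans) * emit[:, obs[t]]
            fwd[t] /= fwd[t].sum()
        bwd[-1] = 1.0
        for t in range(T - 2, -1, -1):
            bwd[t] = trans @ (emit[:, obs[t + 1]] * bwd[t + 1])
            bwd[t] /= bwd[t].sum()
        post = fwd * bwd
        post /= post.sum(axis=1, keepdims=True)
        # M-step: re-estimate the emission table from expected counts.
        counts = np.full((V, V), 1e-6)
        for t in range(T):
            counts[:, obs[t]] += post[t]
        emit = counts / counts.sum(axis=1, keepdims=True)
    # Decode: most probable plaintext character at each position.
    return "".join(ALPHABET[i] for i in post.argmax(axis=1))

corpus = "the cat ate the rat " * 20
trans = bigram_lm(corpus)
key = np.random.permutation(V)                        # hidden substitution key
plain = "the cat ate the rat " * 10
cipher = "".join(ALPHABET[key[IDX[c]]] for c in plain)
print(em_decipher(cipher, trans))
```

The design mirrors the decipherment intuition described above: domain knowledge enters through the fixed language model over the latent text, while the unknown correspondence between latent text and observations is recovered purely from unannotated data (on an example this small, EM may need several random restarts to find a good key). The thesis's actual models replace this toy emission table with richer generative components, such as a typesetting model relating characters to document images.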