Unsupervised Models of Entity Reference Resolution

Haghighi, Aria Delier; EECS Department, University of California

Description

A central aspect of natural language understanding consists of linking information across multiple sentences and even combining multiple sources (for example: articles, conversations, blogs and tweets). Understanding this global information structure requires identifying the people, objects, and events as they evolve over a discourse. While natural language processing (NLP) has made great progress on sentence-level tasks such as parsing and machine translation, far less progress has been made on the processing and understanding of larger units of language such as a document or a conversation. The initial step in understanding discourse structure is to recognize the entities (people, artifacts, locations, and organizations) being discussed and track their references throughout. Entities are referred to in many ways: with proper names (Barack Obama), nominal descriptions (the president), and pronouns (he or him). Entity reference resolution is the task of deciding to which entity a textual mention refers. Entity reference resolution is influenced by a variety of constraints, including syntactic, discourse, and semantic constraints. Even some of the earliest work (Hobbs, 1977, 1979) recognized that while syntactic and discourse constraints can be declaratively specified, semantic constraints are more elusive. While past work has successfully learned many of the syntactic and discourse cues, there has yet to be an entity reference resolution system that exploits semantic cues and operationalizes these observations into a coherent model. This dissertation presents unified statistical models for entity reference resolution that can be learned in an unsupervised way (without labeled data) and that model soft semantic constraints probabilistically along with hard grammatical constraints.
While the linguistic insights which underlie this model have been observed in some of the earliest anaphora resolution literature (Hobbs, 1977, 1979), the machine learning techniques which allow these cues to be used collectively and effectively are relatively recent (Blei et al., 2003; Teh et al., 2006; Blei and Frazier, 2009). In particular, our models use recent insights into Bayesian nonparametric modeling (Teh et al., 2006) to effectively learn entity partition structure when the number of entities is not known ahead of time. The primary contribution of this dissertation is combining the linguistic observations of past researchers with modern structured machine learning techniques. The models presented herein yield state-of-the-art reference resolution results against other systems, supervised or unsupervised.

Details

Title

Unsupervised Models of Entity Reference Resolution

Creator

Haghighi, Aria Delier, Author
EECS Department, University of California, Publisher

Published

2010-09-02

Full Collection Name

Electrical Engineering & Computer Sciences Technical Reports

Other Identifiers

EECS-2010-121

Type

Text

Format

technical reports

Extent

114 p.

Archive

The Engineering Library

Usage Statement

Researchers may make free and open use of the UC Berkeley Library’s digitized public domain materials. However, some materials in our online collections may be protected by U.S. copyright law (Title 17, U.S.C.). Use or reproduction of materials protected by copyright beyond that allowed by fair use (Title 17, U.S.C. § 107) requires permission from the copyright owners. The use or reproduction of some materials may also be restricted by terms of University of California gift or purchase agreements, privacy and publicity rights, or trademark law. Responsibility for determining rights status and permissibility of any use or reproduction rests exclusively with the researcher. To learn more or make inquiries, please see our permissions policies (https://www.lib.berkeley.edu/about/permissions-policies).

Collection

EECS Technical Reports
