Optimal Search Algorithms for Structured Problems in Natural Language Processing

Pauls, Adam David; EECS Department, University of California

PDF

Description

Many tasks in Natural Language Processing (NLP) can be formulated as the assignment of a label to an input. Often, the set of possible labels can be unmanageably large and even enumerating the set would be intractable. For example, the set of possible labels for machine translation might be the set of all sentences in the output language, which can in principle be infinite. Fortunately, these large label sets typically exhibit some kind of structure – that is, they are composed of smaller parts. Because they are composed of parts, it is possible to construct labels piece by piece. The problem of predicting such structured labels is referred to as structured prediction, and the process of incrementally constructing the best structured label(s) is called search. In this thesis, we consider problems for which it is feasible to perform exact or optimal search – that is, search that is guaranteed to find the best possible label according to some model. Many such problems can be formulated as a search for the best path in a weighted directed hypergraph. For example, monolingual parsing, bitext parsing, and syntactic machine translation can all be formulated in this way. We will discuss both known and novel algorithms that can find the best path without considering all hyperedges in the hypergraph, and hence can speed up search without sacrificing search quality. We will provide simplified proofs of correctness for these algorithms. We also propose two novel algorithms that permit extraction of the k-best paths instead of the single best. We compare these approaches both against exhaustive search, and against approximate search techniques which speed up search by sacrificing optimality guarantees.

Details

Title

Optimal Search Algorithms for Structured Problems in Natural Language Processing

Creator

Pauls, Adam David, Author
EECS Department, University of California, Publisher

Published

2012-12-13

Full Collection Name

Electrical Engineering & Computer Sciences Technical Reports

Other Identifiers

EECS-2012-239

Type

Text

Format

technical reports

Extent

83 p

Archive

The Engineering Library

Usage Statement

Researchers may make free and open use of the UC Berkeley Library’s digitized public domain materials. However, some materials in our online collections may be protected by U.S. copyright law (Title 17, U.S.C.). Use or reproduction of materials protected by copyright beyond that allowed by fair use (Title 17, U.S.C. § 107) requires permission from the copyright owners. The use or reproduction of some materials may also be restricted by terms of University of California gift or purchase agreements, privacy and publicity rights, or trademark law. Responsibility for determining rights status and permissibility of any use or reproduction rests exclusively with the researcher. To learn more or make inquiries, please see our permissions policies (https://www.lib.berkeley.edu/about/permissions-policies).

Collection

EECS Technical Reports

Files

Statistics

Download Full History

Download

Formats

Format
BibTeX
MARCXML
TextMARC
MARC
DublinCore
EndNote
NLM
RefWorks
RIS

Add to Basket