Learning Grounded Pragmatic Communication

Fried, Daniel

Description

This dissertation shows how language generation and interpretation across varied grounded domains can be improved through pragmatic inference: explicitly reasoning about the actions and intents of the people the systems interact with. We train neural generation (speaker) and interpretation (listener) models that ground language in a world context, then layer a pragmatic inference procedure on top of these models. This pragmatic procedure predicts how human listeners will interpret text generated by the models and reasons counterfactually about why human speakers produced the text they did.

We begin by showing that explicit pragmatic inference aids in correctly generating and following natural language for complex, sequential grounded instruction tasks. Evaluation of language generation and interpretation shows that pragmatic inference improves state-of-the-art listener models (at correctly interpreting human instructions) and speaker models (at producing instructions correctly interpreted by humans).

Next, we describe extensions of this approach to vision-and-language navigation. We combine visually-grounded listener and speaker models, using the speaker model to synthesize new instructions for data augmentation in addition to evaluating candidate action sequences in pragmatic inference. Both models are supported by a panoramic action space that reflects the granularity of human-generated instructions. Experiments show that all three components of this approach (speaker-driven data augmentation, pragmatic inference, and the panoramic action space) dramatically improve the performance of a baseline instruction follower, more than doubling the success rate over the best existing approach on a standard benchmark.
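As a rough illustration of how a speaker model might evaluate candidate action sequences during pragmatic inference, the sketch below rescores a few candidates by mixing listener and speaker log-probabilities. It is a toy under stated assumptions, not the dissertation's implementation: the function name `rescore_candidates`, the interpolation `weight`, the route names, and all probabilities are hypothetical.

```python
import math

def rescore_candidates(candidates, listener_logp, speaker_logp, weight=0.5):
    """Return the candidate with the best weighted mix of two scores.

    listener_logp[c]: log P(candidate c | instruction) from a base listener.
    speaker_logp[c]:  log P(instruction | candidate c) from a speaker.
    weight:           hypothetical interpolation weight on the speaker score.
    """
    def score(c):
        return (1 - weight) * listener_logp[c] + weight * speaker_logp[c]
    return max(candidates, key=score)

# Made-up scores for three candidate action sequences.
candidates = ["route_a", "route_b", "route_c"]
listener_logp = {"route_a": math.log(0.5), "route_b": math.log(0.4), "route_c": math.log(0.1)}
speaker_logp = {"route_a": math.log(0.1), "route_b": math.log(0.6), "route_c": math.log(0.3)}

best = rescore_candidates(candidates, listener_logp, speaker_logp)
listener_only = rescore_candidates(candidates, listener_logp, speaker_logp, weight=0.0)
print(best, listener_only)
```

With these made-up numbers the base listener alone prefers `route_a`, while the combined score prefers `route_b`, the candidate under which the speaker finds the instruction most likely.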

Finally, we present a grounded neural dialogue model that successfully collaborates with people in a partially-observable reference game. We focus on a setting where two agents each observe an overlapping part of a world context and need to identify and agree on some object they share. Therefore, the agents should pool their information and communicate pragmatically to solve the task. Our dialogue agent accurately grounds referents from the partner's utterances using a structured reference resolver, conditions on these referents using a recurrent memory, and uses a pragmatic generation procedure to ensure the partner can resolve the references the agent produces.
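The idea of generating references that the partner can resolve can be sketched with a toy example: among candidate utterances, choose the one under which a listener model is most likely to pick the intended referent. Everything here is an assumption made for illustration (the word-matching listener, the object names, and the attribute sets); the dissertation's agent uses learned neural models instead.

```python
def listener_probs(utterance, objects):
    """Toy listener: uniform over objects whose attributes cover every word."""
    words = set(utterance.split())
    matches = [name for name, attrs in objects.items() if words <= attrs]
    return {name: (1 / len(matches) if name in matches else 0.0) for name in objects}

def pragmatic_choice(target, utterances, objects):
    """Pick the utterance that makes the toy listener most likely to choose target."""
    return max(utterances, key=lambda u: listener_probs(u, objects)[target])

# Hypothetical shared context: three objects described by attribute words.
objects = {
    "obj1": {"large", "dark"},
    "obj2": {"large", "light"},
    "obj3": {"small", "dark"},
}
utterances = ["large", "dark", "large dark"]

chosen = pragmatic_choice("obj1", utterances, objects)
print(chosen)
```

Here "large" and "dark" each leave the listener split between two objects, so the procedure selects "large dark", the only candidate that identifies `obj1` unambiguously.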

Details

Title

Learning Grounded Pragmatic Communication

Creator

Fried, Daniel, Author

Published

EECS Department, University of California at Berkeley, Berkeley, California, 2021-12-01

Full Collection Name

Electrical Engineering & Computer Sciences Technical Reports

Type

Text

Format

technical reports

Extent

83 p

Language

eng

Usage Statement

Researchers may make free and open use of the UC Berkeley Library’s digitized public domain materials. However, some materials in our online collections may be protected by U.S. copyright law (Title 17, U.S.C.). Use or reproduction of materials protected by copyright beyond that allowed by fair use (Title 17, U.S.C. § 107) requires permission from the copyright owners. The use or reproduction of some materials may also be restricted by terms of University of California gift or purchase agreements, privacy and publicity rights, or trademark law. Responsibility for determining rights status and permissibility of any use or reproduction rests exclusively with the researcher. To learn more or make inquiries, please see our permissions policies (https://www.lib.berkeley.edu/about/permissions-policies).

Collection

EECS Technical Reports
