Automating the Debugging of Datacenter Applications with ADDA

Candea, George; Stoica, Ion; Altekar, Gautam; Zamfir, Cristian; EECS Department, University of California

Description

Debugging data-intensive distributed applications running in a datacenter ("datacenter applications") is complex and time-consuming. Developers wish they had a way to deterministically replay failed executions with little human effort, but unfortunately no such tool exists today. We see two challenges in replay-based debugging. First, the clusters used to run datacenter applications consist of many nodes, so the nondeterminism resulting from multithreaded execution on a single node is compounded by the size of the cluster. Second, datacenter applications produce terabytes of intermediate data shipped from one node to the next; the total data volume, itself proportional to cluster size, makes full input recording for potential subsequent replay infeasible.

We present ADDA, a replay-debugging system for datacenter applications. We observe that these applications often consist of a separate "control plane" and "data plane," and that the applications' initial inputs are typically persisted in append-only storage for reasons unrelated to debugging. Building upon these observations, ADDA leverages the control/data plane separation to make recording of debug-critical data scalable even in large clusters, deterministically re-synthesizes intermediate data based on the (already available) initial inputs, and performs reduced-scale replay, i.e., recreates failed executions on just a subset of the original cluster.

We show that ADDA scales well and deterministically replays real-world failures in Hypertable and Memcached. We also argue that ADDA's techniques generalize to a broader set of datacenter applications.

Details

Title

Automating the Debugging of Datacenter Applications with ADDA

Creator

Candea, George, Author
Stoica, Ion, Author
Altekar, Gautam, Author
Zamfir, Cristian, Author
EECS Department, University of California, Publisher

Published

2011-04-04

Full Collection Name

Electrical Engineering & Computer Sciences Technical Reports

Other Identifiers

EECS-2011-22

Type

Text

Format

technical reports

Extent

17 p

Archive

The Engineering Library

Usage Statement

Researchers may make free and open use of the UC Berkeley Library’s digitized public domain materials. However, some materials in our online collections may be protected by U.S. copyright law (Title 17, U.S.C.). Use or reproduction of materials protected by copyright beyond that allowed by fair use (Title 17, U.S.C. § 107) requires permission from the copyright owners. The use or reproduction of some materials may also be restricted by terms of University of California gift or purchase agreements, privacy and publicity rights, or trademark law. Responsibility for determining rights status and permissibility of any use or reproduction rests exclusively with the researcher. To learn more or make inquiries, please see our permissions policies (https://www.lib.berkeley.edu/about/permissions-policies).

Collection

EECS Technical Reports