Debugging data-intensive distributed applications running in a datacenter ("datacenter applications") is complex and time-consuming. Developers wish they had a way to deterministically replay failed executions with little human effort, but unfortunately no such tool exists today. We see two challenges in replay-based debugging: First, the clusters used to run datacenter applications consist of many nodes, so the nondeterminism resulting from multithreaded execution on a single node is compounded by the size of the cluster. Second, datacenter applications produce terabytes of intermediate data shipped from one node to the next-the total data volume, itself proportional to cluster size, makes full input recording for potential subsequent replay infeasible.
We present ADDA, a replay-debugging system for datacenter applications. We observe that these applications often consist of a separate "control plane" and "data plane," and that the applications' initial inputs are typically persisted in append-only storage for reasons unrelated to debugging. Building upon these observations, ADDA leverages the control / data plane separation to make recording of debug-critical data scalable even in large clusters, it deterministically re-synthesizes intermediate data based on the (already available) initial inputs, and performs reduced-scale replay, i.e., recreates failed executions on just a subset of the original cluster.
We show that ADDA scales well and deterministically replays real-world failures in Hypertable and Memcached. We also argue that ADDA,s techniques generalize to a broader set of datacenter applications.
Title
Automating the Debugging of Datacenter Applications with ADDA
Published
2011-04-04
Full Collection Name
Electrical Engineering & Computer Sciences Technical Reports
Other Identifiers
EECS-2011-22
Type
Text
Extent
17 p
Archive
The Engineering Library
Usage Statement
Researchers may make free and open use of the UC Berkeley Library’s digitized public domain materials. However, some materials in our online collections may be protected by U.S. copyright law (Title 17, U.S.C.). Use or reproduction of materials protected by copyright beyond that allowed by fair use (Title 17, U.S.C. § 107) requires permission from the copyright owners. The use or reproduction of some materials may also be restricted by terms of University of California gift or purchase agreements, privacy and publicity rights, or trademark law. Responsibility for determining rights status and permissibility of any use or reproduction rests exclusively with the researcher. To learn more or make inquiries, please see our permissions policies (https://www.lib.berkeley.edu/about/permissions-policies).