Description

Replay debugging systems make it possible to reproduce and debug non-deterministic failures that occur in production application runs. However, no existing replay system is suitable for datacenter applications such as Cassandra, Hadoop, and Hypertable. These programs are large-scale, distributed, and data-intensive. For them, existing methods either incur excessive overhead in production or do not scale to multi-node, terabyte-scale processing.

In this position paper, we hypothesize, and empirically verify, that control plane determinism is the key to record-efficient and high-fidelity replay of datacenter applications. The key idea behind control plane determinism is that debugging does not always require a precise replica of the original datacenter run. Instead, it often suffices to produce some run that exhibits the original behavior of the control plane, that is, the application code responsible for controlling and managing data flow through a datacenter system.