PDF

Description

We've built a tool for debugging non-deterministic failures in production datacenter applications. Our system, called DCR, is the first to efficiently record and replay large scale, distributed, and data-intensive systems such as HDFS/GFS, HBase/Bigtable, and Hadoop/MapReduce. The enabling idea behind DCR is that debugging doesn't require a precise replica of the original datacenter run. Instead, it suffices to produce some run that exhibits the original control-plane behavior. This report details the design and implementation of DCR and provides preliminary results.

Details

Files

Statistics

from
to
Export
Download Full History