Abstract
We present two novel data structures developed in the SCADS distributed storage toolkit for synchronizing replicated datasets with predictable performance. The first, Nye's trie, is a lightweight index for ordered key-value sets that supports synchronization with time and bandwidth utilization proportional to the number of diverging entries. While efficient, this process is predictable only if the number of divergent entries can be measured. To measure it, we introduce the second structure, the floret estimator, a novel sublinear-space set summarization structure used to estimate the cardinalities of set difference, union, and intersection operations. We describe how these structures satisfy the design requirements of the SCADS system, detail their design and implementation, and present a set of microbenchmarks demonstrating their functionality.
Title
Nye's Trie and Floret Estimators: Techniques for Detecting and Repairing Divergence in the SCADS Distributed Storage Toolkit
Published
2010-03-18
Full Collection Name
Electrical Engineering & Computer Sciences Technical Reports
Other Identifiers
EECS-2010-30
Type
Text
Extent
20 p
Archive
The Engineering Library
Usage Statement
Researchers may make free and open use of the UC Berkeley Library’s digitized public domain materials. However, some materials in our online collections may be protected by U.S. copyright law (Title 17, U.S.C.). Use or reproduction of materials protected by copyright beyond that allowed by fair use (Title 17, U.S.C. § 107) requires permission from the copyright owners. The use or reproduction of some materials may also be restricted by terms of University of California gift or purchase agreements, privacy and publicity rights, or trademark law. Responsibility for determining rights status and permissibility of any use or reproduction rests exclusively with the researcher. To learn more or make inquiries, please see our permissions policies (https://www.lib.berkeley.edu/about/permissions-policies).