Low-latency, high-throughput systems for serving interactive queries are crucial to today's web services. Building such systems for today's web services is challenging due to the massive volumes of data they must cater to, and the requirement for supporting sophisticated queries (e.g., searches, filters, aggregations, regular expression matches, graph queries, etc.). Several recent approaches have highlighted the importance of in-memory storage for meeting the low-latency and high-throughput requirements, but these approaches are unable to sustain this performance when the data grows larger than DRAM capacity. Existing systems thus achieve these goals either by assuming large enough DRAM (too expensive) or by supporting only a limited set of queries (e.g., key-value stores).
In this dissertation, we explore algorithmic and data structure-driven solutions to these system design problems. We present Succinct, a distributed data store that addresses these challenges using a fundamentally new approach --- executing a wide range of queries (e.g., search, random access, range, wildcard) directly on a compressed representation of the input data --- thereby enabling efficient execution of queries on data sizes much larger than DRAM capacity. We then describe BlowFish, a system that builds on Succinct to enable a dynamic storage-performance tradeoff in data stores, providing applications the flexibility to modify the storage and performance fractionally, just enough to meet the desired goals. Finally, we explore approaches that enable even richer query semantics on compressed data, including graph queries using ZipG, a memory efficient graph store, and regular expression queries using Sprint, a query rewriting technique.
Title
Queries on Compressed Data
Published
2019-11-02
Full Collection Name
Electrical Engineering & Computer Sciences Technical Reports
Other Identifiers
EECS-2019-141
Type
Text
Extent
132 p
Archive
The Engineering Library
Usage Statement
Researchers may make free and open use of the UC Berkeley Library’s digitized public domain materials. However, some materials in our online collections may be protected by U.S. copyright law (Title 17, U.S.C.). Use or reproduction of materials protected by copyright beyond that allowed by fair use (Title 17, U.S.C. § 107) requires permission from the copyright owners. The use or reproduction of some materials may also be restricted by terms of University of California gift or purchase agreements, privacy and publicity rights, or trademark law. Responsibility for determining rights status and permissibility of any use or reproduction rests exclusively with the researcher. To learn more or make inquiries, please see our permissions policies (https://www.lib.berkeley.edu/about/permissions-policies).