Understanding Memory Configurations for In-Memory Analytics

Reiss, Charles

PDF

Description

A proliferation of frameworks have emerged to handle the challenges of making distributed computations reliable and scalable. These enable users to easily perform analysis of large datasets on commodity clusters. As users have demanded better response times for these computations, newer versions of these frameworks have focused on efficiently keeping computation in memory. A major challenge in deploying such frameworks is understanding application memory requirements. Just as the layers of abstraction of frameworks assist in writing efficient and robust applications, they hide the true memory requirements.

In this dissertation, I describe and evaluate SLAMR, a tool I have developed for providing users with memory recommendations for programs written for the Apache Spark analytics stack. These recommendations practically address the lack of visibility users have into memory requirements even as they are asked to assign resources to deploy their analytics programs. My tool records activity from an example execution, both from the framework and the garbage collector of its underlying language runtime. Given this instrumentation, it estimates a memory configuration that effectively keeps the entire computation in memory, such that allocating more memory would have minimal benefit on performance. Because in-memory analytics frameworks are built to take advantage of the memory available to them, simply observing actual memory usage is not an effective way to produce such estimates. A challenge, then, is to produce these estimates without performing effort similar to trying configurations around the ultimate recommendation.

SLAMR provides these recommendations without requiring many example executions or detailed analysis of the semantics of the program. What I instrument allows it to predict the effect of different memory configurations rather than simply reflecting the available memory. It collects information corresponding to the abstractions provided by frameworks, so it can distinguish which memory usage is useful and account for when alternate storage was used. To understand the requirements of the underlying language runtime in this analytics stack, it also collects statistics about the program execution that can replayed into a dramatically simplified model of a garbage collector. Both of these models are built around the goal of providing a conservative bound, allowing users to use the resulting memory recommendations in confidence rather than inflate them to avoid the risk of memory exhaustion from errors in the estimates. I show through evaluation that SLAMR provides effective, consistently safe recommendations for a variety of analytics programs and does so with minimal measurement overhead.

Details

Title

Understanding Memory Configurations for In-Memory Analytics

Creator

Reiss, Charles, Author

Published

2016-08-03

Full Collection Name

Electrical Engineering & Computer Sciences Technical Reports

Other Identifiers

EECS-2016-136

Type

Text

Format

technical reports

Extent

198 p

Archive

The Engineering Library

Usage Statement

Researchers may make free and open use of the UC Berkeley Library’s digitized public domain materials. However, some materials in our online collections may be protected by U.S. copyright law (Title 17, U.S.C.). Use or reproduction of materials protected by copyright beyond that allowed by fair use (Title 17, U.S.C. § 107) requires permission from the copyright owners. The use or reproduction of some materials may also be restricted by terms of University of California gift or purchase agreements, privacy and publicity rights, or trademark law. Responsibility for determining rights status and permissibility of any use or reproduction rests exclusively with the researcher. To learn more or make inquiries, please see our permissions policies (https://www.lib.berkeley.edu/about/permissions-policies).

Collection

EECS Technical Reports

Files

Statistics

Download Full History

Download

Formats

Format
BibTeX
MARCXML
TextMARC
MARC
DublinCore
EndNote
NLM
RefWorks
RIS

Add to Basket