This thesis explores how to architect systems for performance clarity: the ability to understand where bottlenecks lie and the performance implications of various system changes. First, we focus on incrementally adding performance clarity to current data analytics frameworks. We develop blocked time analysis, a methodology for quantifying performance bottlenecks in parallelized systems, and use it to analyze the Spark framework’s performance on two SQL benchmarks and one production workload. Contrary to commonly-held beliefs about performance, we find that (i) CPU (and not I/O) is often the bottleneck, (ii) improving network performance can improve job completion time by at most 2%, and (iii) the causes of most stragglers can be identified.
Blocked time analysis helped to understand performance bottlenecks in today’s frameworks, but fell short of enabling users to reason about the impact of potential hardware and software configuration changes. Given the challenges to providing performance clarity in current architectures, the second part of this thesis focuses on a new system architecture built from the ground up for performance clarity. Rather than breaking jobs into tasks that pipeline many resources, as in today’s frameworks, we propose breaking jobs into units of work that each use a single resource, called monotasks. We demonstrate that explicitly separating the use of different resources simplifies reasoning about performance without sacrificing fast runtimes. Our implementation of monotasks provides job completion times within 9% of Apache Spark, and leads to a model for job completion time that predicts runtime under different hardware and software configurations with at most 28% error for most predictions.