Description

System designers in industry are often overwhelmed by large-scale data, while researchers in academia often confront a lack of publicly available production data. In this paper, we analyze a large-scale production workload trace recently made publicly available by Google. We offer a statistical profile of the data, with several interesting discoveries regarding job arrival patterns, CPU and memory consumption, task durations, and other characteristics. We further perform k-means clustering to identify common groups of jobs, with several methodological departures from prior work on similar data and findings that differ from it. We also perform correlation analysis between job semantics and job behavior, which yields helpful perspectives on capacity planning and system tuning. Our key finding is that while the limited dataset size prevents us from generalizing the trace behaviors observed, the analytical methods we describe nonetheless allow us to extract many system design insights.
