We observe that the challenges software optimizers and microarchitects face every day boil down to a single problem: bottleneck analysis. A bottleneck is any event or resource that contributes to execution time, such as a critical cache miss or window stall. Tasks such as tuning processors for energy efficiency and finding the right loads to prefetch all require measuring the performance costs of bottlenecks.

In the past, simple event counts were enough to find the important bottlenecks. Today, the parallelism of modern processors makes such analysis much more difficult, rendering traditional performance counters less useful. If two microarchitectural events (such as a fetch stall and a cache miss) occur in the same cycle, which event should we blame for the cycle? What cost should we assign to each event?

In this work, we propose a new way of thinking about performance that employs a global view of program execution to determine the criticality of program events. With the understanding that a rigorous notion of criticality provides, we can correctly identify which events are bottlenecks, something that is not generally possible with event counts alone.
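To make the idea of a global view concrete, the following is a minimal sketch (not the paper's actual model) in which a microexecution is represented as a dependence graph: nodes are events with latencies, edges are dependences, and the events on the latency-weighted longest path are the critical ones. The event names and latencies are invented for illustration.

```python
# Hypothetical sketch: criticality as the longest path through a
# dependence graph of microexecution events. Names/latencies invented.
import functools

edges = {  # event -> events that depend on it
    "fetch":     ["decode"],
    "decode":    ["load_miss", "alu"],
    "load_miss": ["commit"],
    "alu":       ["commit"],
    "commit":    [],
}
latency = {"fetch": 1, "decode": 1, "load_miss": 100, "alu": 1, "commit": 1}

@functools.lru_cache(maxsize=None)
def longest_from(node):
    """Latency-weighted length of the longest path starting at node."""
    succs = edges[node]
    return latency[node] + (max(longest_from(s) for s in succs) if succs else 0)

def critical_path(start):
    """Follow the heaviest successor at each step: these events are critical."""
    path, node = [start], start
    while edges[node]:
        node = max(edges[node], key=longest_from)
        path.append(node)
    return path

print(longest_from("fetch"))   # 103: execution time of this toy graph
print(critical_path("fetch"))  # ['fetch', 'decode', 'load_miss', 'commit']
```

Note that the `alu` event never appears on the critical path, so a counter reporting how often it occurred would say nothing about whether it was a bottleneck.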

Our work goes even further, however. We can also quantify how much a particular bottleneck adds to execution time, how much slack a non-bottleneck event has, and how the cost and slack of multiple events interact. Along with efficient hardware implementations of these analyses, we show how our advanced performance analysis can assist in the design and optimization of microprocessors and parallel systems.
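A minimal sketch of cost and slack on the same kind of toy dependence graph, assuming the natural definitions: an event's cost is the reduction in execution time if its latency were zero, and its slack is the largest latency increase that leaves execution time unchanged. The graph, event names, and latencies are invented for illustration.

```python
# Hypothetical sketch of event cost and slack on a toy dependence graph.
edges = {  # event -> events that depend on it
    "fetch":     ["load_miss", "alu"],
    "load_miss": ["commit"],
    "alu":       ["commit"],
    "commit":    [],
}
base = {"fetch": 1, "load_miss": 100, "alu": 1, "commit": 1}

def exec_time(lat):
    """Latency-weighted longest path from 'fetch' to the end of the graph."""
    def f(n):
        return lat[n] + max((f(s) for s in edges[n]), default=0)
    return f("fetch")

def cost(event):
    """How much execution time shrinks if the event's latency were zero."""
    return exec_time(base) - exec_time({**base, event: 0})

def slack(event):
    """Largest latency increase that leaves execution time unchanged."""
    extra = 0
    while exec_time({**base, event: base[event] + extra + 1}) == exec_time(base):
        extra += 1
    return extra

print(cost("load_miss"))  # 99, not 100: the alu path becomes critical
print(slack("alu"))       # 99 cycles of slack before alu matters
print(slack("load_miss")) # 0: any extra latency lengthens execution
```

The `load_miss` cost of 99 rather than 100 illustrates why costs cannot simply be read off event counts or raw latencies: removing one bottleneck exposes the next-longest path.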
