Graph processing is experiencing a surge of renewed interest as applications in social networks and their analysis have grown in importance. Additionally, graph algorithms have found new applications in speech recognition and the sciences. In order to deliver the full potential of these emerging applications, graph processing must become substantially more efficient, as graph processing's communication-intensive nature often results in low arithmetic intensity that underutilizes available hardware platforms.

To improve graph algorithm performance, this dissertation characterizes graph processing workloads on shared memory multiprocessors in order to understand graph algorithm performance. By querying performance counters to measure utilizations on real hardware, we find that contrary to prevailing wisdom, caches provide great benefit for graph processing and the systems are rarely memory bandwidth bound. Leveraging the insights of our workload characterization, we introduce the Graph Algorithm Iron Law (GAIL), a simple performance model that allows for reasoning about tradeoffs across layers by considering algorithmic efficiency, cache locality, and memory bandwidth utilization. We also provide the Graph Algorithm Platform (GAP) Benchmark Suite to help the community improve graph processing evaluations through standardization.

In addition to understanding graph algorithm performance, we make contributions to improve graph algorithm performance. We present our direction-optimizing breadth-first search algorithm that is advantageous for low-diameter graphs, which are becoming increasingly relevant as social network analysis becomes more prevalent. Finally, we introduce propagation blocking, a technique to reduce memory communication on cache-based systems by blocking graph computations in order to improve spatial locality.




Download Full History