Description
The most important contribution of this dissertation is its identification and exploitation of work distribution locality properties. Previous work on irregular parallel program scheduling unearthed the following dilemma: compilers can not predict work distribution accurately enough to schedule programs efficiently; however, runtime load balancing solutions, while more accurate, incur prohibitive overhead. This dissertation shows how to avoid this dilemma whenever irregular loops within parallel programs have work distribution locality, that is, when a loop retains a similar distribution of individual iteration execution times from one execution instance to the next. An execution instance is simply an execution of the entire loop, possibly in parallel.
Where this common case arises, we exploit it through work distribution caching: guessing the work distribution of a loop execution instance based on earlier measurements. We also exploit work distribution locality through deferred load balancing: reducing the communication overhead and thrashing potential of load balancing algorithms by applying them across multiple execution instances of a loop.
We evaluated these scheduling techniques using a set of application programs, including climate modeling, circuit simulation, and x-ray tomography, that contain irregular parallel operations. The results demonstrate that, for these applications, the techniques described in this dissertation achieve near-optimal efficiency on large numbers of processors. In addition, they perform significantly better, on these problems, than any previously proposed static or dynamic scheduling method.