This dissertation describes a methodology for compiling and executing irregular parallel programs. Such programs implement parallel operations whose size and work distribution depend on input data. Irregular operations pose a particularly difficult scheduling problem because the information necessary to execute these operations efficiently cannot be known at the time the program is compiled. This dissertation describes a set of four run-time scheduling techniques that can execute many irregular parallel programs efficiently. A common thread among these techniques is that they gather information about the work distribution of a program during its execution and use this information to adjust the allocation of processing resources.
The most important contribution of this dissertation is its identification and exploitation of work distribution locality properties. Previous work on irregular parallel program scheduling unearthed the following dilemma: compilers cannot predict work distribution accurately enough to schedule programs efficiently; however, run-time load balancing solutions, while more accurate, incur prohibitive overhead. This dissertation shows how to avoid this dilemma whenever irregular loops within parallel programs have work distribution locality, that is, when a loop retains a similar distribution of individual iteration execution times from one execution instance to the next. An execution instance is simply an execution of the entire loop, possibly in parallel.
Where this common case arises, we exploit it through work distribution caching: guessing the work distribution of a loop execution instance based on earlier measurements. We also exploit work distribution locality through deferred load balancing: reducing the communication overhead and thrashing potential of load balancing algorithms by applying them across multiple execution instances of a loop.
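The idea of work distribution caching can be illustrated with a minimal sketch. It is not the dissertation's implementation: the class `CachedLoopScheduler`, the helpers `partition_by_cost` and `makespan`, and the greedy longest-first partitioning heuristic are all assumptions introduced here for illustration, and per-iteration costs are supplied as plain numbers rather than measured on a real parallel machine.

```python
from typing import List, Optional


def partition_by_cost(costs: List[float], num_workers: int) -> List[List[int]]:
    """Greedy heuristic (an assumption, not the dissertation's method):
    take iterations from most to least expensive and give each one
    to the worker with the smallest load so far."""
    loads = [0.0] * num_workers
    parts: List[List[int]] = [[] for _ in range(num_workers)]
    for i in sorted(range(len(costs)), key=lambda k: -costs[k]):
        w = loads.index(min(loads))
        parts[w].append(i)
        loads[w] += costs[i]
    return parts


def makespan(parts: List[List[int]], costs: List[float]) -> float:
    """Time at which the slowest worker finishes, given true costs."""
    return max(sum(costs[i] for i in p) for p in parts)


class CachedLoopScheduler:
    """Hypothetical scheduler that reuses the costs measured in the
    previous execution instance of a loop to plan the next one."""

    def __init__(self, num_workers: int) -> None:
        self.num_workers = num_workers
        self.cached_costs: Optional[List[float]] = None

    def plan(self, num_iterations: int) -> List[List[int]]:
        # With no usable cache, fall back to assuming uniform costs.
        if self.cached_costs is None or len(self.cached_costs) != num_iterations:
            guess = [1.0] * num_iterations
        else:
            guess = self.cached_costs
        return partition_by_cost(guess, self.num_workers)

    def record(self, measured_costs: List[float]) -> None:
        # Cache the measurements for the next execution instance.
        self.cached_costs = list(measured_costs)


# Synthetic per-iteration costs that stay the same across instances,
# i.e. the loop has perfect work distribution locality.
true_costs = [8.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0]
scheduler = CachedLoopScheduler(num_workers=2)

first_plan = scheduler.plan(len(true_costs))   # planned without a cache
scheduler.record(true_costs)                   # "measure" the instance
second_plan = scheduler.plan(len(true_costs))  # planned from the cache

first_makespan = makespan(first_plan, true_costs)
second_makespan = makespan(second_plan, true_costs)
```

In this toy setting the second plan is better balanced than the first because the cached measurements reveal which iteration dominates the work. When the distribution drifts between instances, the cached guess degrades accordingly, which is why the approach depends on the locality property described above.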
We evaluated these scheduling techniques using a set of application programs, including climate modeling, circuit simulation, and x-ray tomography, that contain irregular parallel operations. The results demonstrate that, for these applications, the techniques described in this dissertation achieve near-optimal efficiency on large numbers of processors. In addition, on these problems they perform significantly better than any previously proposed static or dynamic scheduling method.
Title: Adaptive Parallel Programs
Published: 1994-08-01
Full Collection Name: Electrical Engineering & Computer Sciences Technical Reports
Other Identifiers: CSD-95-864
Type: Text
Extent: 85 p
Archive: The Engineering Library