MAGNIFY makes two main contributions: it provides new, complex parallelization strategies that would be prohibitively difficult to implement by hand, and it lets the programmer decide where those strategies are used. Guided selectively by the programmer through an interactive dialog, MAGNIFY applies a novel set of code transformations to the application. The transformations reveal opportunities for concurrency beyond those available through traditional loop-based optimizations. Conventional parallelizing compilers focus on individual parallel computations, usually expressed as loops, and introduce a synchronization operation after each one. MAGNIFY instead manages the interactions among parallel computations, which lets them execute more efficiently.
MAGNIFY summarizes the data access behavior of sub-computations (such as loop nests) using symbolic data descriptors. The descriptors contain extensive symbolic and conditional information, which makes them more accurate than previously developed summary structures. Once the code is analyzed, MAGNIFY uses the descriptors to apply transformations that expose concurrency and pipelining opportunities. The key transformation is split, which reduces synchronization constraints by subdividing computations. MAGNIFY also applies traditional loop transformations such as loop interchange and loop-invariant code motion.
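The idea behind descriptors and split can be illustrated with a minimal Python sketch. It is not MAGNIFY's descriptor format: the real descriptors carry symbolic and conditional information, whereas this sketch assumes concrete integer index ranges. The names Region, Descriptor, interferes, split_iterations and consumer are hypothetical and chosen only for illustration. The sketch shows a consumer loop that must wait for a producer, and how subdividing the consumer yields a piece that no longer interferes with the producer and so needs no synchronization with it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A contiguous slice [lo, hi) of a named array (assumed, simplified form)."""
    array: str
    lo: int
    hi: int

    def overlaps(self, other: "Region") -> bool:
        return (self.array == other.array
                and self.lo < other.hi and other.lo < self.hi)


@dataclass(frozen=True)
class Descriptor:
    """Summary of the data a sub-computation reads and writes."""
    reads: tuple
    writes: tuple


def interferes(a: Descriptor, b: Descriptor) -> bool:
    """True if a write in one computation overlaps any access in the other."""
    def hit(writes, accesses):
        return any(w.overlaps(x) for w in writes for x in accesses)
    return hit(a.writes, b.reads + b.writes) or hit(b.writes, a.reads)


def split_iterations(lo: int, hi: int, conflict: Region):
    """Split iterations [lo, hi) into (dependent, independent) ranges,
    where dependent iterations touch indices inside the conflict region."""
    c_lo, c_hi = max(lo, conflict.lo), min(hi, conflict.hi)
    if c_lo >= c_hi:
        return [], [(lo, hi)]
    independent = [(a, b) for a, b in ((lo, c_lo), (c_hi, hi)) if a < b]
    return [(c_lo, c_hi)], independent


def consumer(lo: int, hi: int) -> Descriptor:
    """Descriptor for a loop `for i in range(lo, hi): A[i] = f(B[i])`."""
    return Descriptor(reads=(Region("B", lo, hi),),
                      writes=(Region("A", lo, hi),))


# The producer writes B[0:50]; the consumer loop covers iterations 0..99.
producer = Descriptor(reads=(), writes=(Region("B", 0, 50),))
whole = consumer(0, 100)
dependent, independent = split_iterations(0, 100, producer.writes[0])
pieces = [consumer(lo, hi) for lo, hi in independent]
```

In this sketch the whole consumer interferes with the producer, but after the split only iterations 0 to 49 do. The remaining iterations can run concurrently with the producer rather than waiting behind a synchronization point.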
After the programmer has used MAGNIFY to transform an application, the parallelization strategy is encoded in an intermediate form based on two notations: a coordination language called Delirium and an annotation language called Dossier. An adaptive run-time system executes the application, using run-time information to improve the scheduling efficiency. The run-time system incorporates algorithms that allocate processing resources to concurrently executing sub-computations and choose communication granularity.
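As a rough illustration of what allocating processing resources to concurrent sub-computations can involve, the following Python sketch divides a fixed pool of processors among tasks in proportion to their estimated work. This is an assumed heuristic written for this example, not the algorithm used by MAGNIFY's run-time system, and the function name allocate and its parameters are hypothetical. An adaptive system could call such a routine again whenever run-time measurements update the work estimates.

```python
def allocate(total_procs: int, work: dict) -> dict:
    """Assign processors to tasks: one each, then the rest in proportion
    to estimated work (largest fractional remainders are rounded up)."""
    if total_procs < len(work):
        raise ValueError("need at least one processor per task")
    spare = total_procs - len(work)
    total_work = sum(work.values())
    if total_work == 0:
        # No estimates available yet: share the spare processors evenly.
        shares = {t: spare / len(work) for t in work}
    else:
        shares = {t: spare * w / total_work for t, w in work.items()}
    alloc = {t: 1 + int(s) for t, s in shares.items()}
    leftover = total_procs - sum(alloc.values())
    # Hand out processors lost to rounding, largest fractional part first.
    by_fraction = sorted(shares, key=lambda t: shares[t] - int(shares[t]),
                         reverse=True)
    for t in by_fraction[:leftover]:
        alloc[t] += 1
    return alloc


# Two concurrent sub-computations with a 3:1 ratio of estimated work.
plan = allocate(10, {"a": 300, "b": 100})
```

Guaranteeing every task at least one processor keeps all concurrent sub-computations making progress, and the proportional share tries to make them finish at about the same time.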
MAGNIFY has been used to analyze and transform three production scientific applications. Performance measurements show that, on large numbers of processors, the resulting parallel implementations are far more efficient than implementations produced by traditional static decomposition strategies.