The widespread emergence of parallel computers in the last decade has created a substantial programming challenge for application developers who wish to attain peak performance for their applications. Parallel programming requires significant expertise, and programming tools---general-purpose languages, compilers, libraries, etc.---have had limited success in hiding the complexity of parallel architectures. Furthermore, the parallel programming burden is likely to increase as processor core counts grow and memory hierarchies become deeper and more complex.

The challenge of delivering productive high-performance computing is especially relevant to computational imaging. One technique in particular, iterative image reconstruction, has become prominent in medical and scientific imaging because it offers enticing application benefits. However, it often demands high-performance implementations that can meet tight application deadlines, and the ongoing development of iterative reconstruction techniques discourages ad-hoc performance optimization efforts.

This work explores productive techniques for implementing fast image reconstruction codes. We present a domain-specific programming language that is expressive enough to represent a variety of important reconstruction problems, but restrictive enough that its programs can be analyzed and transformed to attain good performance on modern multi-core, many-core, and GPU platforms. We present case studies from magnetic resonance imaging (MRI), ptychography, magnetic particle imaging, and microscopy that achieve up to 90% of peak performance. We extend our work to the distributed-memory setting for an MRI reconstruction task. There, our approach achieves perfect strong scaling for reasonable machine sizes and sets the best-known reconstruction time for that task. The results indicate that a domain-specific language can be successful in hiding much of the complexity of implementing fast reconstruction codes.



