Description
Large-scale parallel machines are programmed mainly with the single program, multiple data (SPMD) model of parallelism. This model offers scalability and simplicity, combining independent threads of execution with global collective communication and synchronization operations. However, the model does not fit well with divide-and-conquer parallelism or with hierarchical machines that mix shared and distributed memory. In this paper, we define a hierarchical team mechanism that retains the performance and analysis advantages of SPMD parallelism while supporting hierarchical algorithms and machines. We demonstrate how to ensure alignment of collective operations on teams, eliminating a class of deadlocks. We present application case studies showing that the team mechanism is both elegant and powerful, enabling users to exploit hardware features at different levels of a hierarchical machine and resulting in significant performance gains.