In December 2006, we published a broad survey of the issues for the whole field concerning the multicore/manycore sea change (Asanovic, Bodik et al. 2006). We view the ultimate goal as the ability to productively create efficient and correct software that scales smoothly as the number of cores per chip doubles biennially. This much shorter report covers the specific research agenda that a large group of us at Berkeley is going to follow.

This report is based on a proposal for creating a Universal Parallel Computing Research Center (UPCRC) that a technical committee from Intel and Microsoft unanimously selected as the top proposal in a competition with the top 25 computer science departments. The five-year, $10M UPCRC forms the foundation for the U.C. Berkeley Parallel Computing Laboratory, or Par Lab, a multidisciplinary research project exploring the future of parallel processing.

To take a fresh approach to the longstanding parallel computing problem, our research agenda will be driven by compelling applications developed by domain experts. Past efforts to resolve these challenges have often been driven "bottom-up" from the hardware, with applications as an afterthought. We will focus on exciting new applications that need much more computing horsepower to run well, rather than on legacy programs that already run well on today's computers. Our applications are in the areas of personal health, image retrieval, music, speech understanding, and web browsers.

The development of parallel software is the heart of our research agenda. The task will be divided into two layers: an efficiency layer that aims at low overhead for the best 10 percent of programmers, and a productivity layer for the rest of the programming community, including domain experts, that reuses the parallel software developed at the efficiency layer. Key to this approach is a layer of libraries and programming frameworks centered on the 13 computational bottlenecks ("motifs") that we identified in the original Berkeley View report (Asanovic, Bodik et al. 2006). We will also create a Composition and Coordination Language to make it easier to compose these components. Finally, we will rely on autotuning to map the software efficiently to a particular parallel computer. By contrast, past attempts have often relied on a single programming abstraction and language for all programmers and on automatically parallelizing compilers.
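As a rough illustration of the autotuning idea mentioned above, the following minimal sketch times one kernel under several candidate parameter values on the current machine and keeps the fastest. It is not the Par Lab's autotuner: the names `blocked_sum` and `autotune`, the block-size parameter, and the candidate values are all hypothetical, chosen only to show the empirical search.

```python
import time


def blocked_sum(data, block):
    """Hypothetical tunable kernel: sum `data` in chunks of `block` elements."""
    total = 0
    for start in range(0, len(data), block):
        total += sum(data[start:start + block])
    return total


def autotune(kernel, data, candidates, repeats=3):
    """Time `kernel` once per candidate parameter and return the fastest.

    For each candidate, the best of `repeats` runs is kept to reduce noise.
    Returns (best_parameter, {parameter: best_time_in_seconds}).
    """
    timings = {}
    for param in candidates:
        best = float("inf")
        for _ in range(repeats):
            t0 = time.perf_counter()
            kernel(data, param)
            best = min(best, time.perf_counter() - t0)
        timings[param] = best
    return min(timings, key=timings.get), timings


# Illustrative search space; real autotuners explore far larger ones.
data = list(range(100_000))
candidates = [64, 1024, 16384]
best_block, timings = autotune(blocked_sum, data, candidates)
print("fastest block size on this machine:", best_block)
```

The selected value depends on the machine running the search, which is the point: the tuning decision is made by measurement on the target rather than fixed in advance.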

The role of the operating system and the architecture in this project is to support software and applications in achieving the ultimate goal, in contrast to the conventional approach, in which they fix the environment in which parallel software must survive. Example innovations include very thin hypervisors, which allow user-level control of processor scheduling, and hardware support for partitioning and fast barrier synchronization.

We will prototype the hardware of the future using field-programmable gate arrays (FPGAs), which we believe are fast enough to be interesting to parallel software researchers, yet flexible enough to "tape out" new designs every day, while being cheap enough that university researchers can afford to construct systems containing hundreds of processors. This prototyping infrastructure is called RAMP (Research Accelerator for Multiple Processors), and is being developed by a consortium of universities and companies.



