Recursion continues to play an important role in high-performance computing. However, parallelizing recursive algorithms while achieving high performance is nontrivial and can result in complex, hard-to-maintain code. In particular, assigning processors to subproblems is complicated by recent observations that communication costs often dominate computation costs. Previous work demonstrates that carefully choosing which divide-and-conquer steps to execute in parallel (breadth-first steps) and which to execute sequentially (depth-first steps) can result in significant performance gains over naive scheduling. Our Framework for Recursive Parallel Algorithms (FRPA) separates an algorithm's implementation from its parallelization. The programmer simply defines how to split a problem, solve the base case, and merge solved subproblems; FRPA handles parallelizing the code and tuning the recursive parallelization strategy, enabling algorithms to achieve high performance. To demonstrate FRPA's performance capabilities, we present a detailed analysis of two algorithms: Strassen-Winograd and Communication-Optimal Parallel Recursive Rectangular Matrix Multiplication (CARMA). Our single-precision CARMA implementation is fewer than 80 lines of code and achieves a speedup of up to 11x over Intel's Math Kernel Library (MKL) matrix multiplication routine on "skinny" matrices. Our double-precision Strassen-Winograd implementation, at just 150 lines of code, is up to 45% faster than MKL for large square matrix multiplications. To show FRPA's generality and simplicity, we implement six additional algorithms: mergesort, quicksort, TRSM, SYRK, Cholesky decomposition, and Delaunay triangulation. FRPA is implemented in C++, runs in shared-memory environments, uses Intel's Cilk Plus for task-based parallelism, and leverages OpenTuner to tune the parallelization strategy.