Sparse Gaussian Elimination on High Performance Computers

Computer Science Division; Li, Xiaoye S.


Description

This dissertation presents new techniques for solving large sparse unsymmetric linear systems on high performance computers using Gaussian elimination with partial pivoting. The efficiency of the new algorithms is demonstrated on matrices from various application fields and on a variety of high performance machines.

In the first part we discuss optimizations of a sequential algorithm that exploit the memory hierarchies found in most RISC-based superscalar computers. We begin with the left-looking supernode-column algorithm of Eisenstat, Gilbert and Liu, which incorporates Eisenstat and Liu's symmetric structural reduction for fast symbolic factorization. Our key contribution is the development of both numeric and symbolic schemes that perform supernode-panel updates, achieving better data reuse in cache and floating-point registers. A further refinement, a two-dimensional matrix partitioning scheme, improves performance for large matrices or on machines with small caches. We conduct extensive performance evaluations on several recent superscalar architectures, including the IBM RS/6000-590, MIPS R8000 and DEC Alpha 21164, and show that our new algorithm is much faster than its predecessors, with the advantage most evident for large problems. In addition, we develop a detailed model for systematically choosing the algorithm's blocking parameters.
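For readers unfamiliar with the underlying kernel, the left-looking, column-at-a-time form of Gaussian elimination with partial pivoting can be sketched as below. This is a minimal dense sketch written for illustration, not code from the dissertation: the function name left_looking_lu is hypothetical, and the sketch omits the sparsity, supernodes, panels, and symbolic factorization that the work above is about. Each column j is first updated by all previously computed columns of L (the step that supernode-panel updates block for cache reuse), then a pivot is chosen and the column is scaled.

```python
def left_looking_lu(a):
    """Dense left-looking LU with partial pivoting (illustrative sketch only).

    Returns (lu, perm): the strictly lower part of lu holds L (unit diagonal
    implied), the upper part holds U, and row i of L @ U equals row perm[i]
    of the input matrix a.
    """
    n = len(a)
    lu = [list(map(float, row)) for row in a]  # work on a copy
    perm = list(range(n))
    for j in range(n):
        # Left-looking update: apply every finished column k < j of L to
        # column j. Processing k in increasing order means lu[k][j] has
        # already received all updates from columns < k, so it is final.
        for k in range(j):
            ukj = lu[k][j]
            for i in range(k + 1, n):
                lu[i][j] -= lu[i][k] * ukj
        # Partial pivoting: pick the largest-magnitude entry on or below
        # the diagonal of the updated column.
        p = max(range(j, n), key=lambda i: abs(lu[i][j]))
        if lu[p][j] == 0.0:
            raise ValueError("matrix is singular")
        if p != j:
            # Swap whole rows so earlier L columns and untouched later
            # columns stay consistent with the recorded permutation.
            lu[j], lu[p] = lu[p], lu[j]
            perm[j], perm[p] = perm[p], perm[j]
        # Scale the subdiagonal entries to obtain column j of L.
        pivot = lu[j][j]
        for i in range(j + 1, n):
            lu[i][j] /= pivot
    return lu, perm
```

Because every column is touched only when it becomes the current column, the updates to column j read many earlier columns but write just one; grouping several target columns into a panel lets those reads be reused, which is the motivation for the supernode-panel scheme described above.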

The second part focuses on the design, implementation and performance analysis of a shared memory parallel algorithm based on our new serial algorithm. We parallelize the computation along the column dimension of the matrix, assigning one block of columns (a panel) to a processor. The parallel algorithm retains the serial algorithm's ability to reuse cached data. We develop a dynamic scheduling mechanism that assigns tasks to available processors; one merit of this approach is that it balances the workload automatically. The algorithm attempts to schedule independent tasks on different processors; when this is not possible in the later stages of factorization, a pipelined approach is used to coordinate dependent computations. We demonstrate that the new parallel algorithm is very efficient on shared memory machines with modest numbers of processors, such as the SGI Power Challenge, DEC AlphaServer 8400, and Cray C90/J90. We also develop performance models to study the available concurrency and identify performance bottlenecks.
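The dynamic scheduling idea can be illustrated with a simplified thread-pool sketch. It assumes panels are numbered in a dependency-respecting order and that a dependency list per panel is supplied as input (in a sparse solver this would be derived from the matrix structure; here it is a hypothetical parameter). Idle workers claim the next unclaimed panel from a shared counter, so the load balances automatically; a worker whose panel still depends on unfinished panels waits for them, a crude stand-in for the pipelined coordination described above. The names schedule_panels, deps, and work are illustrative assumptions, not the dissertation's interface.

```python
import threading


def schedule_panels(deps, num_workers, work):
    """Dynamically self-schedule panel tasks onto worker threads (sketch).

    deps[j] lists panels that must finish before panel j can finish; every
    entry is assumed to be smaller than j (hypothetical input format).
    work(j) stands in for the numeric factorization of panel j.
    Returns the order in which panels completed.
    """
    n = len(deps)
    next_panel = 0
    done = [False] * n
    finished_order = []
    cond = threading.Condition()

    def worker():
        nonlocal next_panel
        while True:
            with cond:
                if next_panel == n:
                    return  # no unclaimed panels remain
                j = next_panel  # an idle worker grabs the next panel
                next_panel += 1
                # Pipeline step: block until all panels j depends on are
                # finished. A real solver would apply updates from panels
                # that are already done instead of simply idling here.
                cond.wait_for(lambda: all(done[d] for d in deps[j]))
            work(j)  # factor panel j outside the lock
            with cond:
                done[j] = True
                finished_order.append(j)
                cond.notify_all()  # wake workers waiting on this panel

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return finished_order
```

Because panels are claimed in increasing order and every dependency has a smaller index, the lowest-numbered unfinished panel always has all of its dependencies satisfied, so the sketch cannot deadlock.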

Details

Title

Sparse Gaussian Elimination on High Performance Computers

Creator

Computer Science Division, Publisher
Li, Xiaoye S., Author

Published

1996-09-01

Full Collection Name

Electrical Engineering & Computer Sciences Technical Reports

Other Identifiers

CSD-96-919

Type

Text

Format

technical reports

Extent

131 p

Archive

The Engineering Library

Usage Statement

Researchers may make free and open use of the UC Berkeley Library’s digitized public domain materials. However, some materials in our online collections may be protected by U.S. copyright law (Title 17, U.S.C.). Use or reproduction of materials protected by copyright beyond that allowed by fair use (Title 17, U.S.C. § 107) requires permission from the copyright owners. The use or reproduction of some materials may also be restricted by terms of University of California gift or purchase agreements, privacy and publicity rights, or trademark law. Responsibility for determining rights status and permissibility of any use or reproduction rests exclusively with the researcher. To learn more or make inquiries, please see our permissions policies (https://www.lib.berkeley.edu/about/permissions-policies).

Collection

EECS Technical Reports
