Description

Research in AI is dominated by model experimentation, and training a single model can be extremely expensive. Yet there is no efficient way of recovering information if something goes wrong or the model behaves differently than expected. Flor is a system that provides a record-replay approach to ML training, giving developers the flexibility to retrieve the data they want after execution. It takes intermittent checkpoints during model training to speed up and parallelize replay. However, record-replay requires serializing expensive checkpoints during program execution, which incurs high overhead and makes the approach less palatable. Flor achieves fast, low-overhead materialization with its multiprocessing materializer, which exploits parallelism to offload the burden of serialization to other processes. The multiprocessing materializer uses forking both to spawn processes and to provide one-way interprocess communication (IPC), allowing Flor to quickly share and serialize expensive checkpoints. We also show that our multiprocessing materializer outperforms other popular multiprocessing and IPC methods. Notably, it achieves checkpointing at only 1.74% additional runtime cost.
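
To make the fork-based idea above concrete, here is a minimal Python sketch, assuming a pickle-serializable checkpoint object; materialize_async and the checkpoint names are hypothetical illustrations, not Flor's actual API. After os.fork(), the child holds a copy-on-write snapshot of the parent's memory, so training can resume immediately while the child performs the slow serialization; the fork itself serves as the one-way IPC channel.

    import os
    import pickle

    def materialize_async(obj, path):
        # Fork a child that inherits a copy-on-write snapshot of the
        # parent's address space; the snapshot is the one-way IPC.
        pid = os.fork()
        if pid == 0:
            # Child: do the expensive serialization off the critical
            # path, then exit without running parent-side cleanup.
            try:
                with open(path, "wb") as f:
                    pickle.dump(obj, f)
            finally:
                os._exit(0)
        # Parent: returns immediately and resumes training; the child
        # can be reaped later with os.waitpid(pid, 0).
        return pid

In a training loop, one might call pid = materialize_async(model_state, "ckpt_%d.pkl" % epoch) at each checkpoint and reap the children at the end of the run; model_state here stands in for whatever checkpoint object the application maintains.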
