Towards a Distributed OS for Data-Intensive Cloud Applications

Wang, Stephanie

Description

Commodity hardware is reaching fundamental limits, while the demands of data-intensive applications continue to grow. Thus, we now rely on horizontal scale-out and hardware accelerators to improve application performance and scale, while developing a myriad of distributed execution frameworks that are specialized to specific application domains, from data analytics to machine learning. While this reduces the burden on certain applications, it also creates three problems: (1) duplicated system implementation effort, (2) reduced framework evolvability, and (3) difficulty interoperating efficiently between applications, especially when large data is involved.

This thesis describes the first steps towards a distributed “operating system” that can provide essential services to such data-intensive applications. This would allow currently monolithic frameworks to be built as libraries instead, making them easier to build, evolve, and compose. Towards this vision, we propose an intermediate and interoperable execution layer that handles common problems in distributed execution and memory management.

We first propose distributed futures, a general-purpose programming interface that extends the RPC abstraction with pass-by-reference semantics and a shared address space. Distributed futures act as a virtual memory-like abstraction for the distributed setting, enabling distributed memory management to be factored out into a common system. Next, we present a design for this system that provides flexible fault tolerance with low overheads. We begin with a fault-tolerant architecture for distributed futures that provides automatic memory management. We show how this system factors out system complexity from data-intensive applications without sacrificing performance, using MapReduce workloads as an example. Finally, we show how stronger recovery guarantees can be layered on top of this core architecture to provide greater recovery flexibility to end applications. Thus, we show how an end-to-end approach to fault tolerance can expand system generality.
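
To make the pass-by-reference idea concrete, the following is a minimal single-process sketch, not the system described in the thesis. It assumes a toy runtime in which a thread pool stands in for remote workers and a dictionary stands in for the shared address space. The names ToyRuntime, ObjectRef, make_data, and total are hypothetical and chosen only for illustration. A call returns a reference immediately, and a reference passed as an argument is resolved by the runtime rather than by the caller.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


class ObjectRef:
    """Hypothetical handle that names a value in the shared store."""

    def __init__(self):
        self.id = uuid.uuid4().hex


class ToyRuntime:
    """Toy stand-in for a distributed futures runtime (assumption: one process)."""

    def __init__(self, workers=4):
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._store = {}  # ref id -> concurrent.futures.Future

    def submit(self, fn, *args):
        """Start fn asynchronously and return a reference to its result."""
        ref = ObjectRef()

        def run():
            # Resolve reference arguments inside the worker, so the caller
            # never has to fetch or copy the underlying values itself.
            resolved = [
                self._store[a.id].result() if isinstance(a, ObjectRef) else a
                for a in args
            ]
            return fn(*resolved)

        self._store[ref.id] = self._pool.submit(run)
        return ref

    def get(self, ref):
        """Block until the referenced value is ready, then return it."""
        return self._store[ref.id].result()


def make_data():
    return list(range(1000))


def total(values):
    return sum(values)


rt = ToyRuntime()
data_ref = rt.submit(make_data)        # returns a reference right away
total_ref = rt.submit(total, data_ref)  # the reference is passed, not the list
result = rt.get(total_ref)
```

In this sketch the caller only touches the final value through get. A real system would additionally have to decide where each value lives, when it can be freed, and how to recover it after a failure, which are the memory-management and fault-tolerance concerns the abstract refers to.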

Details

Title

Towards a Distributed OS for Data-Intensive Cloud Applications

Creator

Wang, Stephanie, Author

Published

EECS Department, University of California at Berkeley, Berkeley, California, 01/11/24

Full Collection Name

Electrical Engineering & Computer Sciences Technical Reports

Other Identifiers

EECS-2024-3

Type

Text

Format

technical reports

Extent

210 p

Language

eng

Archive

The Library

Usage Statement

Researchers may make free and open use of the UC Berkeley Library’s digitized public domain materials. However, some materials in our online collections may be protected by U.S. copyright law (Title 17, U.S.C.). Use or reproduction of materials protected by copyright beyond that allowed by fair use (Title 17, U.S.C. § 107) requires permission from the copyright owners. The use or reproduction of some materials may also be restricted by terms of University of California gift or purchase agreements, privacy and publicity rights, or trademark law. Responsibility for determining rights status and permissibility of any use or reproduction rests exclusively with the researcher. To learn more or make inquiries, please see our permissions policies (https://www.lib.berkeley.edu/about/permissions-policies).

Collection

EECS Technical Reports
