Description
Serverless computing has become increasingly popular in the last few years because it simplifies the developer's experience of constructing and deploying applications. Simultaneously, it enables cloud providers to pack multiple users' workloads onto shared physical resources at a fine granularity, achieving higher resource efficiency. However, existing serverless Function-as-a-Service (FaaS) systems have significant shortcomings around state management: notably, high-latency IO, disabled point-to-point communication, and high function invocation overheads.
In this dissertation, we present a line of work in which we redesign serverless infrastructure to natively support efficient, consistent, and fault-tolerant state management. We first explore the architecture of a stateful FaaS system we designed called Cloudburst, which overcomes many of the limitations of commercial FaaS systems. We then turn to consistency and fault-tolerance, describing how we provide read atomic transactions in the context of FaaS applications. Finally, we describe the design and implementation of a serverless dataflow API and optimization framework specifically designed to support machine learning prediction serving workloads.