Description
We identify two main challenges in leveraging serverless computing for highly distributed applications such as Big Data Analytics and Machine Learning. The first challenge concerns the automatic management of resources through higher-level abstractions for serverless applications. The second concerns the performance and scalability of distributed network communication on serverless platforms. In this thesis we present two systems that tackle these challenges. We solve the first with Cirrus, a system for automatic, end-to-end serverless ML workflows. We solve the second with Zip, a system that provides high-performance, scalable distributed primitives for inter-lambda serverless communication.
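To make the first contribution concrete, the sketch below shows what an end-to-end serverless ML workflow might look like behind a high-level API in the spirit of Cirrus. All names here (Workflow, preprocess, train, run) are illustrative assumptions for this sketch, not Cirrus's actual interface; the point is that the developer declares stages while the runtime handles provisioning lambdas, sharding data, and running training.

```python
# A minimal sketch of a Cirrus-style high-level API for end-to-end
# serverless ML workflows. All names (Workflow, preprocess, train, run)
# are hypothetical for illustration, not Cirrus's actual interface.

class Workflow:
    """Declares preprocessing and training stages; a real runtime would
    fan each stage out to serverless workers (lambdas) automatically."""

    def __init__(self, name):
        self.name = name
        self.stages = []

    def preprocess(self, dataset, normalize=True):
        # Stage 1: shard the input and normalize features in parallel
        # lambdas, leaving shards in shared storage for training.
        self.stages.append(("preprocess", dataset, normalize))
        return self

    def train(self, model, workers=32, epochs=5):
        # Stage 2: asynchronous SGD: each lambda fetches the latest
        # model from a parameter server, computes a gradient on its
        # shard, and pushes an update back.
        self.stages.append(("train", model, workers, epochs))
        return self

    def run(self):
        # This sketch only records and prints the plan; a real runtime
        # would provision lambdas, storage, and a parameter server here.
        for stage in self.stages:
            print("launching serverless stage:", stage[0])


wf = (Workflow("click-prediction")
      .preprocess("s3://bucket/dataset", normalize=True)
      .train(model="logistic_regression", workers=64, epochs=5))
wf.run()
```

The design point this illustrates is the one the thesis argues for: a narrow, declarative API in front of a specialized backend, rather than requiring developers to manage serverless resources directly.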
In this thesis we show that it is possible to provide developers with simple APIs while achieving significantly better performance than today's approaches. For instance, Cirrus achieves two orders of magnitude more model updates per second during training than PyWren, a serverless MapReduce framework, because it pairs a high-level API with a backend highly optimized for ML tasks. Similarly, Zip provides 1.3-12x speedups across different communication patterns over the next-best alternative, a memory-backed store for inter-lambda communication.
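To illustrate why direct communication primitives can beat a memory-backed store, the sketch below contrasts the two data paths under simple assumptions: relaying a message through a store costs a put by the sender plus a get by the receiver (two network hops), while a direct, Zip-style channel costs a single send. The classes MemoryStore and DirectChannel are hypothetical stand-ins, not Zip's API.

```python
# A hypothetical sketch contrasting inter-lambda communication relayed
# through a memory-backed store with a direct, Zip-style channel. The
# names MemoryStore and DirectChannel are illustrative, not Zip's API.

import queue


class MemoryStore:
    """Stand-in for a memory-backed store: each message costs a put by
    the sender plus a get by the receiver, i.e. two network hops."""

    def __init__(self):
        self.kv = {}

    def put(self, key, value):
        self.kv[key] = value

    def get(self, key):
        return self.kv.pop(key)


class DirectChannel:
    """Stand-in for a direct lambda-to-lambda channel: one hop per
    message, with no intermediary on the data path."""

    def __init__(self):
        self.q = queue.Queue()

    def send(self, value):
        self.q.put(value)

    def recv(self):
        return self.q.get()


# Relayed path: two hops per message (sender -> store -> receiver).
store = MemoryStore()
store.put("msg/0", b"gradient-shard")
assert store.get("msg/0") == b"gradient-shard"

# Direct path: one hop per message; fewer hops on the data path is one
# reason direct primitives can outperform store-based relaying.
chan = DirectChannel()
chan.send(b"gradient-shard")
assert chan.recv() == b"gradient-shard"
```

How much this matters plausibly depends on how many such hops a given communication pattern places on the critical path, which is consistent with the range of speedups reported above.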