High-performance, parallel programs want uninterrupted access to physical resources. This characterization is true not only for traditional scientific computing, but also for high-priority data center applications that run on parallel processors. These applications require high, predictable performance and low latency, and they are important enough to warrant engineering effort at all levels of the software stack. Given the recent resurgence of interest in parallel computing as well as the increasing importance of data center applications, what changes can we make to operating system abstractions to support parallel programs?

Akaros is a research operating system designed for single-node, large-scale SMP and many-core architectures. The primary feature of Akaros is a new process abstraction called the "Many-Core Process" (MCP) that embodies transparency, application control of physical resources, and performance isolation. The MCP is built on the idea of separating cores from threads: the operating system grants spatially partitioned cores to the MCP, and the application schedules its threads on those cores. Data centers typically have a mix of high-priority applications and background batch jobs, where the demands of the high-priority application can change over time. For this reason, an important part of Akaros is the provisioning, allocation, and preemption of resources, and the MCP must be able to handle having a resource revoked at any moment.

In this work, I describe the MCP abstraction and the salient details of Akaros. I discuss how the kernel and user-level libraries work together to give an application control over its physical resources and to adapt to the revocation of cores at any time - even when the code is holding locks. I show an order of magnitude less interference for the MCP compared to Linux, more resilience to the loss of cores for an HPC application, and how a customized user-level scheduler can increase the performance of a simple webserver.




Download Full History