Modern operating systems are notoriously complex and hard to make dependable. Due to performance, flexibility and historical reasons, most of them are written in relatively low level languages like C and C++. These languages lack safe type systems and provide few guarantees of safety and reliability. We propose to improve the dependability of these systems with more sophisticated program analysis using tools that understands the tricky issues that human beings often mistake about, for example, locking and memory safety. The main challenge we see here is the complexity and scale of the systems, which makes fully automated verification hard. Therefore we propose that these analyses require some help from the developers, in the form of non-intrusive annotations in the code. Moreover, some analyses can be made hybrid, consists of both static and dynamic (runtime) checks, thus making the analysis much simpler without losing precision. We argue that we should still strive for soundness despite the scale of the problems, as only sound analyses can provide us with guarantees that the system is in absence of certain categories of bugs.

We present two case studies of applying these techniques to the Linux kernel. Both of them are efficient enough to be used on the whole kernel itself, require at most a few percent of code changes, and find real bugs in the kernel.

First, we present a static analysis tool that analyzes the interaction between processor execution contexts and locks in the Linux kernel, and tries to finds bugs related to these aspects. Execution contexts are certain state of the processor, which can decide at this particular time what operations the kernel is allowed to do. For example, the kernel is not allowed to do a task switch while serving an interrupt (in interrupt context). We analyze the process context and hardware interrupt context, and the usages of spin lock primitives in these contexts. The analysis is a flow-sensitive, inter-procedural and context-insensitive analysis. The current prototype analyzes a minimally-configured Linux 2.6.20 kernel (over 850K lines of C code) in less than 2 minutes. It found 6 confirmed bugs in the kernel.

Then, we present a system for detecting and recovering from type safety violations in Linux device drivers. It uses a novel type system, partly specified with annotations, that provides fine-grained isolation for existing Linux device drivers. In addition, we track invariants using simple wrappers for the host system API and restore them when recovering from a violation. This approach achieves fine-grained memory error detection and recovery with few code changes and at a significantly lower performance cost than existing solutions based on hardware-enforced domains, such as Nooks [Swift et al., 2003], L4 [LeVasseur et al., 2004], and Xen [Fraser et al., 2004], or software-enforced domains, such as SFI [Wahbe et al., 1993].

The principles of the analyses and tools presented are general. These aspects such as execution contexts, or device driver interfaces are similar among different operating systems. So we believe these techniques can be adapted to other operating systems as well.




Download Full History