Preview
Description
We describe techniques that allow us to apply shape analysis to data structures that occur commonly in systems code. These data structures often use arrays, hash tables, C strings, and buffers of a known size. Sometimes, memory in these data structures is managed by manual reference counting. Analyzing such code is difficult or impossible with existing shape analyses. Most difficult of all, many data structures use several of these patterns at the same time, such as a hash table pointing to reference counted objects through which a doubly linked list threads.
We describe an analysis capable of handling these data structures easily and efficiently. Our technique uses abstract interpretation over the combination of two abstract domains. One, based on three-valued logic, is used for analyzing the heap. The other domain reasons about integers and set cardinality. The key feature of the combined domain is that quantified facts can be shared between the integer and heap domains. The precision we achieve is significantly greater than if either domain were used independently.
Besides improvements in precision, we also describe changes that make both domains more scalable and efficient. We present the results of experiments analyzing the cache data structure of the thttpd web server, which uses a hash table, linked lists, and reference counting in a single data structure. We successfully prove the absence of memory errors in about two minutes.