Description

In this dissertation we study in-cache address translation, a new approach to implementing virtual memory. In-cache translation combines the functions of the traditional translation lookaside buffer with a virtual address cache. Rather than dedicating hardware resources specifically to hold pagetable entries, in-cache translation lets the pagetable share the regular cache with instructions and data. By combining the two mechanisms we simplify the memory system design, and potentially reduce the cycle time. In addition, by eliminating the translation lookaside buffer, we simplify the translation consistency problem in multiprocessors: the operating system can use the regular data cache coherency protocol to maintain a consistent view of the pagetable.
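
To make the lookup concrete, the sketch below shows one TLB-less translation step in C, assuming a virtually addressed cache and a linear pagetable mapped at a fixed virtual base. The functions cache_lookup, cache_fill, and root_lookup, and all constants, are illustrative assumptions, not the SPUR hardware interface.

```c
#include <stdint.h>
#include <stdbool.h>

#define PAGE_SHIFT   12u                          /* 4 KB pages (assumed)      */
#define PAGE_MASK    ((1u << PAGE_SHIFT) - 1u)
#define PTABLE_BASE  0xC0000000u                  /* pagetable's virtual base  */
#define PTE_VALID    0x1u

bool     cache_lookup(uint32_t vaddr, uint32_t *word); /* true on a cache hit  */
void     cache_fill(uint32_t vaddr, uint32_t paddr);   /* refill from memory   */
uint32_t root_lookup(uint32_t vaddr);                  /* pinned root table    */

/* Translate a virtual address by fetching its pagetable entry through
 * the ordinary data cache -- no dedicated translation lookaside buffer. */
uint32_t translate(uint32_t vaddr)
{
    /* Pagetable pages themselves are mapped by a small pinned root
     * table, which bounds the recursion below to one level (assumed). */
    if (vaddr >= PTABLE_BASE)
        return root_lookup(vaddr);

    uint32_t pte_vaddr = PTABLE_BASE + (vaddr >> PAGE_SHIFT) * sizeof(uint32_t);
    uint32_t pte;

    if (!cache_lookup(pte_vaddr, &pte)) {
        /* PTE miss: bring the entry into the same cache that holds
         * instructions and data, then retry the lookup. */
        cache_fill(pte_vaddr, translate(pte_vaddr));
        cache_lookup(pte_vaddr, &pte);
    }
    if (!(pte & PTE_VALID)) {
        /* page fault: trap to the operating system (omitted) */
    }
    return (pte & ~PAGE_MASK) | (vaddr & PAGE_MASK);
}
```

Because a pagetable entry is fetched with an ordinary cache fill, pagetable entries compete for cache frames with instructions and data rather than occupying dedicated translation hardware.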

Trace-driven simulation shows that in-cache translation has better performance than many translation lookaside buffer designs. As cache memories grow larger, the advantage of in-cache translation will also increase. Other simulation results indicate that in-cache translation performs well over a wide range of cache configurations.

To further understand in-cache translation, we implemented it as a central feature of SPUR, a multiprocessor workstation developed at the University of California at Berkeley. A five-processor prototype convincingly demonstrates the feasibility of this new mechanism. In addition, a set of event counters, embedded in the custom VLSI cache controller chip, provides a means to measure cache performance. We use these counters to evaluate the performance of in-cache translation for 23 workloads. These measurements validate some of the key simulation results.
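
The measurement idea is simple: read the counters before and after a workload and compute ratios from the differences. The sketch below illustrates this; the counter names and the read_counter hook are hypothetical stand-ins for the cache controller's actual interface.

```c
#include <stdint.h>
#include <stdio.h>

enum { CTR_REFERENCES, CTR_MISSES };      /* assumed counter identifiers */

uint64_t read_counter(int id);            /* hypothetical access hook    */
void     run_workload(void);

int main(void)
{
    uint64_t refs0   = read_counter(CTR_REFERENCES);
    uint64_t misses0 = read_counter(CTR_MISSES);

    run_workload();

    /* Counter deltas isolate the workload's own cache behavior. */
    uint64_t refs   = read_counter(CTR_REFERENCES) - refs0;
    uint64_t misses = read_counter(CTR_MISSES)     - misses0;

    printf("miss ratio: %.4f\n", refs ? (double)misses / (double)refs : 0.0);
    return 0;
}
```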

The measurements and simulations also show that the performance of in-cache translation is sensitive to pagetable placement. We propose a variation of the algorithm, called inverted in-cache translation, which reduces this sensitivity. We also examine alternative ways to support reference and dirty bits in a virtual address cache. An analytic model and measurements from the prototype show that the SPUR mechanism has the best performance, but that emulating dirty bits with protection is not much worse and does not require additional hardware. Finally, we show that the miss bit approximation to reference bits, where the bit is only checked on cache misses, performs better than true reference bits, which require a partial cache flush.
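
As an illustration of the protection-based alternative, here is a minimal sketch of emulating dirty bits in software: clean pages are mapped read-only, the first store to a page traps, and the handler records the page as dirty before re-enabling writes. The set_protection hook and the dirty bitmap are hypothetical, not the SPUR or operating-system interface.

```c
#include <stdint.h>
#include <stdbool.h>

#define PAGE_SHIFT 12u
#define NPAGES     (1u << 20)             /* 32-bit space, 4 KB pages  */

typedef enum { PROT_READ_ONLY, PROT_READ_WRITE } prot_t;

void set_protection(uint32_t vpn, prot_t p);  /* hypothetical kernel hook */

static bool page_dirty[NPAGES];               /* software dirty bits      */

/* Called on a write-protection fault at the faulting address. */
void write_fault_handler(uint32_t fault_vaddr)
{
    uint32_t vpn = fault_vaddr >> PAGE_SHIFT;
    page_dirty[vpn] = true;                   /* remember the page is dirty */
    set_protection(vpn, PROT_READ_WRITE);     /* allow subsequent stores    */
    /* return from the trap and re-execute the faulting store */
}

/* When the pager cleans a page, re-protect it so the next store
 * faults again and the dirty bit is re-armed. */
void page_cleaned(uint32_t vpn)
{
    page_dirty[vpn] = false;
    set_protection(vpn, PROT_READ_ONLY);
}
```

The cost is one protection fault per page between cleanings; in exchange, no dedicated dirty-bit hardware is needed, which is consistent with the measurements reported above.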
