In a shared-memory multiprocessor, a page table entry (PTE) may be replicated in multiple translation lookaside buffers (TLBs), causing an inconsistency problem when the PTE is updated. More generally, this problem exists among virtually-tagged caches, which keep PTE information, such as protection bits, in every cache line. Operating systems and applications that exploit virtual memory remapping must consider the overhead of synchronizing TLBs.

We explore a spectrum of software TLB synchronization algorithms for various consistency semantics and TLB characteristics. We analyze and simulate the performance of the three most general ones: 2-phase, optimistic-synchronous, and optimistic-asynchronous. The queueing models for these algorithms do not have product-form solutions because of the interaction among processors (for example, the 2-phase algorithm enforces locking by stalling processors). Instead, we obtain approximations using a computationally efficient iterative analysis method, the accuracy of which is verified by simulation results.

The performance results show that software TLB synchronization algorithms do not scale well with (1) the number of processors, (2) the rate of PTE updates, or (3) the overhead of flushing a TLB entry. Hence TLB synchronization should be avoided in some future architectures (e.g., scalable cache-coherent shared-memory multiprocessors) and under some workloads (e.g., moving high-bandwidth multimedia data to a user address space by virtual memory remapping). To this end, we describe mechanisms for tolerating TLB inconsistency, and classify them according to three fundamental types of tolerable inconsistency: safe, transient and trusted inconsistency. We also discuss how to fit these mechanisms into the software architecture of the virtual memory system.




Download Full History