Almost all published or implemented solutions to the cache consistency problem have relied solely on hardware, and they suffer from cost and performance disadvantages, especially for large numbers of processors. Some of the hardware solutions cannot be made to work at all for large numbers of processors. The generally known software solution for cache consistency requires write-through on all references and cache purging whenever a shared data area is released; its performance is limited by the memory bandwidth that write-through consumes.
In this paper, we propose a new software-controlled cache consistency mechanism which does not require a shared bus and needs only limited hardware support. Shared writable data is treated as write-through in the cache; all other data is (optionally) copy-back, to minimize memory traffic. Write-through ensures that the main memory copy of write-shared regions is always up to date, so that a processor reading a line from main memory always gets the current value. A "one-time identifier" is associated with the TLB entry for each shared page and with the address tag of each cache-resident line of that page; one-time identifiers function as unique capabilities. Stale shared cache contents are made inaccessible by changing the one-time identifier in the TLB entry for a page, so that the identifier stored with the address tag of each old cache line no longer matches; this avoids the need to purge the cache whenever write-shared regions of memory are passed between processors. Limiting write-through to data items that will be read by other processors minimizes memory traffic, and cache purges are required only when the supply of unique identifiers is exhausted. We also discuss possible optimizations of this basic idea.
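The mechanism described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware design proposed in the paper: the class `Processor`, its methods `acquire`, `read` and `write`, the dictionary-based cache and TLB, and the identifier supply size `num_ids` are all hypothetical names and simplifications introduced here. The model also assumes that when the identifier supply runs out, both the cache and the TLB identifiers are dropped, so that a reused identifier can never match an old line.

```python
class Processor:
    """Toy model of one processor's TLB and cache (all names hypothetical)."""

    def __init__(self, memory, num_ids=4):
        self.memory = memory      # shared main memory: (page, offset) -> value
        self.num_ids = num_ids    # assumed size of the identifier supply
        self.next_id = 0
        self.tlb = {}             # page -> current one-time identifier
        self.cache = {}           # (page, offset) -> (identifier, value)
        self.misses = 0
        self.purges = 0

    def _fresh_id(self):
        if self.next_id == self.num_ids:
            # Supply exhausted: purge the cache and drop TLB identifiers so
            # that a reused identifier can never match an old cache line.
            self.cache.clear()
            self.tlb.clear()
            self.purges += 1
            self.next_id = 0
        oti = self.next_id
        self.next_id += 1
        return oti

    def acquire(self, page):
        """Gain access to a shared page; lines cached earlier stop matching."""
        self.tlb[page] = self._fresh_id()

    def _id_for(self, page):
        if page not in self.tlb:  # modelled as a TLB miss
            self.acquire(page)
        return self.tlb[page]

    def read(self, page, offset):
        oti = self._id_for(page)
        line = self.cache.get((page, offset))
        if line is not None and line[0] == oti:
            return line[1]        # hit: address and identifier both match
        self.misses += 1          # true miss, or a stale line with an old identifier
        value = self.memory.get((page, offset), 0)
        self.cache[(page, offset)] = (oti, value)
        return value

    def write(self, page, offset, value):
        oti = self._id_for(page)
        self.memory[(page, offset)] = value        # write-through to main memory
        self.cache[(page, offset)] = (oti, value)


memory = {}
p0, p1 = Processor(memory), Processor(memory)

p0.acquire(7)
p0.write(7, 0, 10)
p1.acquire(7)
first = p1.read(7, 0)    # miss: fetched from main memory
p0.write(7, 0, 20)       # main memory is updated, p1's cached copy is not
stale = p1.read(7, 0)    # hit on the old copy, since p1 has not re-acquired
p1.acquire(7)            # new identifier: the old line no longer matches
fresh = p1.read(7, 0)    # miss: current value fetched from main memory
print(first, stale, fresh, p1.misses, p1.purges)
```

In this model the second `acquire` on `p1` costs only a TLB update; nothing is removed from the cache, and the stale line is simply replaced on the next miss. A purge happens only inside `_fresh_id`, when the assumed supply of `num_ids` identifiers has been used up.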
The principal advantage of the cache consistency mechanism proposed here is that no shared bus is needed, so memory interconnection schemes with much higher bandwidth become possible. This in turn permits high-performance shared-memory multiprocessors to be built with many more processors than shared-bus schemes allow.