As more computer systems turn to multiprocessing for improved performance, additional research is needed to evaluate and improve the performance of cache consistency protocols. In this study, we use trace-driven simulation to examine the performance of several consistency protocols, including some new adaptive protocols which have not been examined in prior research. This study uses a wider variety of traces than have been previously analyzed, including some production applications from a vector mini-supercomputer system, and presents a wider variety of analyses than have been previously presented for a given workload.

We find that the sharing characteristics of application programs have a large bearing on the relative performance of the different protocols. Update-based protocols outperform invalidate-based protocols when accesses to shared data are highly interleaved among different processors (fine-grain sharing), while invalidate-based protocols are superior if one processor performs all accesses to shared data over long periods of time (coarse-grain sharing). Adaptive protocols provide the best overall performance across all applications; we present a new protocol called Update-Once, which yields the highest average performance. In even the best cases, however, estimated processor utilizations are unacceptably low due to the overhead to maintain consistent caches. To extract good performance from multiprocessor systems, existing application programs must be recoded to reduce sharing between processors.




Download Full History