Shared-memory multiprocessors can provide impressive performance at reasonable cost, although private caches are usually needed to alleviate the potential bottleneck at shared memory. These private caches in turn require cache-consistency (coherency) protocols, whose performance depends strongly on the reference behavior of multiprocessor applications. In this paper we characterize the memory reference behavior in a wide variety of scalar and vector multiprocessor address traces from production workloads. The purpose of this analysis is to estimate and improve the performance of cache-consistency protocols. Our work extends previous results in the literature by performing a wider variety of analyses on a larger and more diverse set of multiprocessor traces, including a production vector workload.

We find wide differences between the sharing behavior observed in vector and scalar applications. Compared to scalar programs, vector programs reference shared data more frequently and exhibit more processor locality, the tendency for shared data to be used by only one processor over periods of time. Write sharing by different processors over short intervals is infrequent in one workload but frequent in another. This implies that sequentially consistent programming models will remain necessary unless applications are recoded to avoid such reference patterns.
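
As an illustration of the processor-locality notion defined above, the following minimal sketch measures, for each shared address, how many consecutive references come from the same processor before a different processor touches it. The trace format (a list of `(cpu, address)` pairs) and the function names `processor_locality_runs` and `mean_run_length` are assumptions made for this example, not the trace format or the metric definition used in the paper.

```python
from collections import defaultdict


def processor_locality_runs(trace):
    """Map each address to the lengths of its single-processor reference runs.

    `trace` is assumed to be an iterable of (cpu, address) pairs in
    reference order; this layout is hypothetical.
    """
    last_cpu = {}             # address -> processor that referenced it last
    run_len = {}              # address -> length of the run in progress
    runs = defaultdict(list)  # address -> completed run lengths

    for cpu, addr in trace:
        if last_cpu.get(addr) == cpu:
            # Same processor as the previous reference: extend the run.
            run_len[addr] += 1
        else:
            # A different processor (or a first reference) starts a new run.
            if addr in last_cpu:
                runs[addr].append(run_len[addr])
            last_cpu[addr] = cpu
            run_len[addr] = 1

    # Close the runs still open at the end of the trace.
    for addr, n in run_len.items():
        runs[addr].append(n)
    return dict(runs)


def mean_run_length(runs):
    """Average run length over all addresses; longer means more locality."""
    lengths = [n for per_addr in runs.values() for n in per_addr]
    return sum(lengths) / len(lengths) if lengths else 0.0


# Tiny synthetic trace: address 0x10 shows long runs, 0x20 ping-pongs.
trace = [(0, 0x10), (0, 0x10), (0, 0x10), (1, 0x10), (1, 0x10),
         (0, 0x20), (1, 0x20)]
runs = processor_locality_runs(trace)
print(runs, mean_run_length(runs))
```

Under these assumptions, a longer mean run length corresponds to more processor locality, while an address whose runs all have length one is passed between processors on every reference.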



