Cache memories have not been used in vector supercomputers, as far as we know, because of a belief that the behavior of programs in the relevant workloads precludes efficient cache operation. Such machines have been used efficiently by carefully programming around the resulting long memory delays, although unmodified "dusty-deck" code usually performs poorly. In related research, we have found that hit ratios are high for large caches in processors running vector workloads. In this paper, we address a specific issue: the direct effect of cache memory on vector processor performance.

The central issue in processor design is machine performance, of which the cache hit ratio is only one determinant. In this paper, we simulate three vector processors whose designs are derived by applying expected technology changes to the Ardent Titan. Our simulator is an accurate timing model that incorporates the necessary aspects of the cache and memory system design. We find that current trends in memory and processor performance will lead to increasingly severe limitations on memory speed and bandwidth. Each of the two designs with large cache memories (2 MB and 4 MB) doubles processor performance on average, relative to the design without a cache. For almost all of the programs used in the trace-driven simulations, which are drawn from real Ardent workloads, hit ratios exceed 99%. Based on the work presented here and elsewhere, we recommend that future supercomputers incorporate cache memories.
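To make "hit ratio" and "trace-driven simulation" concrete, here is a minimal sketch of how a hit ratio can be measured from an address trace. This is not the authors' simulator, which is a full timing model of Titan-derived designs. It assumes a direct-mapped cache, a hypothetical 64-byte line size, and a synthetic strided vector trace; all function names are hypothetical. The `effective_access_time` helper uses the textbook hit/miss average. It deliberately ignores memory bandwidth and overlap, which is one reason the hit ratio alone does not determine performance.

```python
def simulate_direct_mapped(trace, cache_bytes, line_bytes):
    """Hit ratio of a direct-mapped cache over a list of byte addresses.

    Assumption: direct mapping and no prefetching; this is an illustrative
    sketch, not the timing simulator described in the abstract.
    """
    num_lines = cache_bytes // line_bytes
    tags = [None] * num_lines  # one tag per cache line; None means empty
    hits = 0
    for addr in trace:
        block = addr // line_bytes   # memory block number
        index = block % num_lines    # cache line this block maps to
        tag = block // num_lines     # high-order bits identifying the block
        if tags[index] == tag:
            hits += 1
        else:
            tags[index] = tag        # miss: fetch the block, evicting any old one
    return hits / len(trace) if trace else 0.0


def vector_trace(base, length, stride_bytes, passes):
    """Hypothetical synthetic trace: repeated strided sweeps over one vector."""
    return [base + i * stride_bytes for _ in range(passes) for i in range(length)]


def effective_access_time(hit_ratio, t_cache, t_memory):
    """Average access time under a simple hit/miss model (ignores bandwidth)."""
    return hit_ratio * t_cache + (1.0 - hit_ratio) * t_memory


if __name__ == "__main__":
    # A 4096-element vector of 8-byte words (32 KB), swept 10 times.
    trace = vector_trace(base=0, length=4096, stride_bytes=8, passes=10)
    h = simulate_direct_mapped(trace, cache_bytes=2 * 1024 * 1024, line_bytes=64)
    print(f"hit ratio: {h:.4f}")
    # Hypothetical latencies in processor cycles.
    print(f"effective access time: {effective_access_time(h, 1.0, 50.0):.3f} cycles")
```

With these parameters the vector fits in the 2 MB cache. Only the first sweep misses, once per 64-byte line, which gives 512 misses out of 40,960 references and a hit ratio of 0.9875. Reuse across repeated sweeps is what lets a large cache reach a high hit ratio on such a trace.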