Our studies show that parallel programs incur substantially higher miss ratios and bus utilization than comparable uniprocessor programs. The sharing component of these metrics proportionally increases with both cache size and block size, and for some cache configurations it determines both their magnitude and their trend. The amount of overhead depends on the pattern of memory references to the shared data: programs that exhibit good per-processor locality perform better than those with fine-grain sharing. This suggests that parallel software writers and better compiler technology can improve program performance through better memory organization of shared data.
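The contrast between per-processor locality and fine-grain sharing can be illustrated with a toy layout model. The sketch below is not the measurement method used in these studies. It assumes a hypothetical array of 64 four-byte elements divided among 4 processors, and the names `shared_fraction`, `interleaved`, and `blocked` are invented for illustration. It reports the fraction of cache blocks that hold data owned by more than one processor, which is a rough proxy for the blocks exposed to sharing traffic.

```python
from collections import defaultdict

# Hypothetical parameters, chosen only for illustration.
N_ELEMS = 64      # number of shared array elements
N_PROCS = 4       # number of processors
ELEM_SIZE = 4     # bytes per element


def shared_fraction(owner_of, n_elems, elem_size, block_size):
    """Return the fraction of cache blocks holding elements of >1 processor."""
    owners = defaultdict(set)
    for i in range(n_elems):
        block = (i * elem_size) // block_size  # cache block containing element i
        owners[block].add(owner_of(i))
    multi = sum(1 for procs in owners.values() if len(procs) > 1)
    return multi / len(owners)


def interleaved(i):
    """Fine-grain sharing: consecutive elements belong to different processors."""
    return i % N_PROCS


def blocked(i):
    """Per-processor locality: each processor owns one contiguous chunk."""
    return i // (N_ELEMS // N_PROCS)


if __name__ == "__main__":
    for block_size in (4, 16, 64, 128):
        fine = shared_fraction(interleaved, N_ELEMS, ELEM_SIZE, block_size)
        local = shared_fraction(blocked, N_ELEMS, ELEM_SIZE, block_size)
        print(f"block={block_size:3d}B  interleaved={fine:.2f}  blocked={local:.2f}")
```

Under these assumptions, the interleaved layout places several processors' data in the same block as soon as a block holds more than one element, while the contiguous layout keeps blocks private until the block size exceeds a processor's 64-byte chunk. The model ignores reference order, coherence protocol, and cache size, so it only indicates which layouts are exposed to sharing overhead, not how large that overhead is.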