In this paper we summarize the tradeoffs made in the designs of this and previous simulators. In previous simulators, these tradeoffs have often led to sacrificing accuracy for faster simulation. However by careful selection of techniques and when to apply them, we have built a simulator that is both faster and more accurate than previous simulation systems. The improved accuracy comes from applying code augmentation techniques at a uniform low level and from having such fast context switching that accuracy/performance tradeoffs become unnecessary.
Our simulator has a modular design and has been configured in many ways. It has been used to conduct numerous experiments on multithreaded machine behavior, application behavior, cache behavior, compiler optimization, and traffic patterns. Because of its high performance, we have been able to perform simulations of larger machines than would otherwise have been feasible.