Evaluations of the behavior of Internet services during partial outages tend to rely upon system throughput as their primary unit of measurement. Other metrics such as response time and data quality may be equally important, however, because they describe the performance of the system from the user's perspective. Because the success of most online services depends upon user satisfaction, such user-oriented metrics may be a more appropriate way to rate a system's effectiveness.

This report investigates how system evaluators can use response time as a performance metric when conducting reliability benchmarks and long-term performability modeling of online systems. Specifically, it describes the results of fault-injection tests run on a cluster-based web server under an emulated client workload and explores various ways of presenting and summarizing data about the observed system behavior. The report concludes with a discussion of how online services can improve their effectiveness by adopting more useful and descriptive performability measurements.




