This dissertation explores how to reduce the system administration cost of disk storage systems. There are several reasons why reducing the operator's burden is key to the success of large storage systems. One is that the cost of system administration usually dominates the budget of a storage system. Another is that an operator error on a storage system can easily have disastrous results. Moreover, studies in physiology and psychology have shown that reducing the mental and physical stress on operators is crucial to preventing human error.

This dissertation describes Tertiary Disk, a large-scale disk array system built from commodity components, and how we evaluated the feasibility of its design. Rather than incur the cost of custom hardware, we rely on careful design and software to solve the problems that custom hardware would otherwise address. Tertiary Disk is a cluster of storage nodes connected by switched Ethernet. Each storage node is a PC that hosts a few dozen SCSI disks and runs the FreeBSD operating system. The system serves as a web-based image server for the Zoom Project, in cooperation with the Fine Arts Museums of San Francisco. It is fully redundant in both hardware and software and is designed to have no single point of failure.
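
To make "no single point of failure" concrete, here is a minimal sketch that models a storage cluster as a directed graph running from the client toward the disks, then fails each component in turn and checks whether any data becomes unreachable. The topology (two Ethernet switches, two nodes, two SCSI strings, one block mirrored on two disks) and the names `EDGES`, `REPLICAS`, `data_available`, and `single_points_of_failure` are hypothetical, chosen only for illustration; they do not describe the actual Tertiary Disk wiring.

```python
from collections import deque

# Hypothetical topology (an assumption, not the real Tertiary Disk wiring):
# each edge points from a component toward the components it gives access to.
EDGES = {
    "client": ["switch0", "switch1"],
    "switch0": ["nodeA", "nodeB"],
    "switch1": ["nodeA", "nodeB"],
    "nodeA": ["string0", "string1"],
    "nodeB": ["string0", "string1"],
    "string0": ["disk0"],
    "string1": ["disk1"],
}
# Hypothetical data layout: one block mirrored on disks behind different strings.
REPLICAS = {"block0": ["disk0", "disk1"]}


def reachable(edges, source, failed):
    """Breadth-first search from source that skips failed components."""
    if source in failed:
        return set()
    seen, queue = {source}, deque([source])
    while queue:
        for v in edges.get(queue.popleft(), []):
            if v not in failed and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def data_available(edges, source, replicas, failed):
    """True if every block still has at least one reachable replica."""
    seen = reachable(edges, source, failed)
    return all(any(d in seen for d in disks) for disks in replicas.values())


def single_points_of_failure(edges, source, replicas):
    """Components whose failure alone makes some block unavailable."""
    components = set(edges) | {v for vs in edges.values() for v in vs}
    return [c for c in sorted(components - {source})
            if not data_available(edges, source, replicas, {c})]


print(single_points_of_failure(EDGES, "client", REPLICAS))  # -> []
# Dropping the client's link to switch1 leaves switch0 as the only path.
print(single_points_of_failure(dict(EDGES, client=["switch0"]), "client", REPLICAS))  # -> ['switch0']
```

Because each disk sits behind redundant switches, nodes, and strings, and each block is mirrored, no single failure disconnects the data; removing one redundant link is enough for the same check to report a single point of failure.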

There are several approaches to lowering the human cost of system administration. One is to make the system as autonomous as possible. I have designed a self-maintenance extension to the operating system that keeps the system running in the event of failures. The system also includes several other improvements that make the operator's job easier.

Finally, we evaluate the feasibility of our system through simulation, using two programs. Failure data collected on Tertiary Disk over several years were used to design the first, an event generator. The second, a simulator, models the system as a directed acyclic graph and computes its availability by solving a connectivity problem. The results show that our system performs as expected with the current set of parameters and also scales well into the future.
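
The two-stage evaluation, an event generator feeding a simulator that solves a connectivity problem on a directed acyclic graph, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the dissertation's simulator: failure and repair times are drawn from exponential distributions with invented MTTF/MTTR placeholders rather than derived from the Tertiary Disk failure data, and the topology, the mirrored block, and the names `RATES`, `generate_events`, `data_available`, and `simulate` are hypothetical.

```python
import random
from collections import deque

# Hypothetical topology (an assumption): edges point from the client toward the disks.
EDGES = {
    "client": ["switch0", "switch1"],
    "switch0": ["nodeA", "nodeB"],
    "switch1": ["nodeA", "nodeB"],
    "nodeA": ["string0", "string1"],
    "nodeB": ["string0", "string1"],
    "string0": ["disk0"],
    "string1": ["disk1"],
}
REPLICAS = {"block0": ["disk0", "disk1"]}  # hypothetical mirrored block

# Placeholder (MTTF, MTTR) in hours per component; invented, NOT measured data.
RATES = {
    "switch0": (50_000.0, 24.0), "switch1": (50_000.0, 24.0),
    "nodeA": (10_000.0, 4.0), "nodeB": (10_000.0, 4.0),
    "string0": (100_000.0, 24.0), "string1": (100_000.0, 24.0),
    "disk0": (300_000.0, 48.0), "disk1": (300_000.0, 48.0),
}


def generate_events(rates, horizon, rng):
    """Event generator: sorted (time, component, is_up) state transitions.

    Each component alternates exponentially distributed up and down periods.
    """
    events = []
    for comp, (mttf, mttr) in rates.items():
        t = 0.0
        while True:
            t += rng.expovariate(1.0 / mttf)  # time until the next failure
            if t >= horizon:
                break
            events.append((t, comp, False))
            t += rng.expovariate(1.0 / mttr)  # time to repair
            if t >= horizon:
                break
            events.append((t, comp, True))
    events.sort()
    return events


def data_available(failed):
    """Connectivity check: every block has at least one reachable replica."""
    seen, queue = {"client"}, deque(["client"])
    while queue:
        for v in EDGES.get(queue.popleft(), []):
            if v not in failed and v not in seen:
                seen.add(v)
                queue.append(v)
    return all(any(d in seen for d in disks) for disks in REPLICAS.values())


def simulate(horizon, seed=0):
    """Simulator: fraction of [0, horizon) during which all data is available."""
    rng = random.Random(seed)
    failed, last, up_time = set(), 0.0, 0.0
    for t, comp, is_up in generate_events(RATES, horizon, rng):
        # The system state is constant between consecutive events.
        if data_available(failed):
            up_time += t - last
        last = t
        if is_up:
            failed.discard(comp)
        else:
            failed.add(comp)
    if data_available(failed):
        up_time += horizon - last
    return up_time / horizon


print(f"estimated availability: {simulate(10_000_000.0, seed=1):.6f}")
```

Because the system state changes only at events, the simulator jumps from one event to the next and runs the reachability check once per interval, which yields the exact up time of the sampled trace rather than a time-stepped approximation.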



