This thesis focuses on an often-neglected area of computer system design: tertiary storage. In the last decade, several advances in tertiary storage have made it of increasing interest, including increased tape capacities, less expensive tape drives and optical disk drives, and the proliferation of robots for loading tertiary devices automatically. Concurrently, faster processor speeds have enabled a growing number of applications that would benefit from fast access to massive storage. We evaluate the usefulness of current tertiary storage systems for some of these new applications.

First, we describe the design and performance of tertiary storage products. Next, we evaluate the technique of data striping in tape arrays. We find that tape striping improves the performance of sequential workloads. However, striped tape systems perform poorly when several non-sequential, concurrent requests are active in the tape library, because those requests contend for a small number of tape drives.

We characterize two new workloads: video-on-demand servers and digital libraries. For the former, we evaluate design alternatives for providing storage in a movies-on-demand system. First, we study disk farms in which one movie is stored per disk. This is a simple scheme, but it wastes substantial disk bandwidth, since disks holding less popular movies are under-utilized; also, good performance requires that movies be replicated to reflect the user request pattern. Next, we examine disk farms in which movies are striped across disks, and find that striped video servers achieve close to full utilization of the disks through better load balancing. Finally, we evaluate storage hierarchies for video service that combine a tertiary library with a disk farm. Unfortunately, we show that neither magnetic tape libraries nor optical disk jukeboxes, when used as part of a storage hierarchy, perform adequately to service the predicted distribution of movie accesses.
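The load-balancing argument above can be illustrated with a small back-of-the-envelope sketch. It is not the model used in the thesis: the Zipf-like popularity distribution, the per-disk stream capacity, the number of disks, and all function names are assumptions chosen for illustration, and replication and striping overheads are ignored.

```python
def zipf_popularity(n_movies, skew=1.0):
    """Assumed request distribution: movie at rank r gets weight 1 / r**skew."""
    weights = [1.0 / (rank ** skew) for rank in range(1, n_movies + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def utilization(per_disk_demand, streams_per_disk):
    """Fraction of aggregate disk capacity actually used to serve streams.

    Each disk serves at most streams_per_disk concurrent streams; demand
    beyond that limit is rejected, and capacity without demand sits idle.
    """
    served = sum(min(d, streams_per_disk) for d in per_disk_demand)
    return served / (len(per_disk_demand) * streams_per_disk)


# Hypothetical configuration: 10 disks, 20 concurrent streams per disk,
# and offered load equal to the aggregate capacity of the farm.
n_disks = 10
streams_per_disk = 20
offered_streams = n_disks * streams_per_disk
popularity = zipf_popularity(n_disks)

# One movie per disk: each disk sees only the demand for its own movie.
one_per_disk = utilization(
    [offered_streams * p for p in popularity], streams_per_disk
)

# Striped: every movie is spread over all disks, so (ideally) each disk
# sees an equal share of the total demand.
striped = utilization(
    [offered_streams / n_disks] * n_disks, streams_per_disk
)

print(f"one movie per disk: {one_per_disk:.2f}")
print(f"striped:            {striped:.2f}")
```

Under these assumptions, the disks holding popular movies saturate and reject requests while the disks holding unpopular movies sit partly idle, so the one-movie-per-disk farm uses only part of its aggregate bandwidth. The striped farm spreads the same demand evenly and, in this idealized sketch, uses all of it.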

Throughout the dissertation, we identify several desirable changes in tertiary storage systems. To support new applications with higher concurrency, tertiary libraries should be redesigned with a higher ratio of drives to media, higher bandwidth per drive, and faster access times.




