The widening disparity between processor speeds and disk performance is causing an increasing I/O performance gap. One method of increasing disk bandwidth is through arrays of multiple disks (RAIDs). In addition, to prevent the file server from limiting disk performance, new controller architectures connect the disks directly to the network so that data movement bypasses the file server. These developments raise two questions for file systems: how to get the best performance from a RAID, and how to use such a controller architecture.
This thesis describes the implementation of a high-bandwidth log-structured file system called "Sawmill" that uses a RAID disk array. Sawmill runs on the RAID-II storage system; this architecture provides a fast data path that moves data rapidly among the disks, high-speed controller memory, and the network.
By using a log-structured file system, Sawmill avoids the high cost of small writes to a RAID. Small writes through Sawmill are a factor of three faster than writes to the underlying RAID. Sawmill also uses new techniques to obtain better bandwidth from a log-structured file system. By performing disk layout "on-the-fly," rather than through a block cache as in previous log-structured file systems, the CPU overhead of processing cache blocks is reduced and write transfers can take place in large, efficient units.
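The cost of small writes can be illustrated with a minimal sketch, assuming a parity-protected array in the style of RAID level 5 (the abstract does not state the RAID level, and the function names below are hypothetical, not taken from Sawmill). Updating one block in place needs two reads and two writes, whereas writing a full stripe, as a log can, needs only one write per disk.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equal-length blocks (the parity function)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def small_write(stripe, parity, index, new_block):
    """Update one data block in place via read-modify-write.

    Returns (new_parity, io_count). The I/O count is an assumption of
    this model: read old data, read old parity, write data, write parity.
    """
    old_block = stripe[index]      # read 1: old data
    old_parity = parity            # read 2: old parity
    new_parity = xor_blocks([old_parity, old_block, new_block])
    stripe[index] = new_block      # write 1: new data
    return new_parity, 4           # write 2: new parity


def full_stripe_write(new_blocks):
    """Write a whole stripe; parity is computed from data already in memory.

    Returns (parity, io_count): one write per data block plus one for parity.
    """
    parity = xor_blocks(new_blocks)
    return parity, len(new_blocks) + 1


if __name__ == "__main__":
    data = [bytes([i] * 8) for i in range(1, 5)]   # 4 data blocks
    parity = xor_blocks(data)

    # Four independent small writes: 4 I/Os each.
    total = 0
    for i in range(4):
        parity, ios = small_write(data, parity, i, bytes([10 + i] * 8))
        total += ios
    print("small writes, total I/Os:", total)

    # The same four blocks batched into one full-stripe write.
    _, ios = full_stripe_write(data)
    print("full-stripe write, total I/Os:", ios)
```

In this model, batching four block updates into one stripe replaces 16 disk operations with 5. The factor-of-three speedup reported above is a measured result and is not derived from this count.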
The thesis also examines how a file system can take advantage of the data path and controller memory of a storage system such as RAID-II. Sawmill uses a stream-based approach instead of a block cache to permit large, efficient transfers.
Sawmill can read at up to 21 MB/s and write at up to 15 MB/s while running on a fairly slow (9 SPECmarks) Sun-4 workstation. In comparison, existing file systems provide less than 1 MB/s on the RAID-II architecture because they perform inefficient small operations and do not take advantage of the RAID-II data path. In many cases, Sawmill performance is limited by the relatively slow server CPU, suggesting that the system could handle larger and faster disk arrays simply by using a faster processor.
Sawmill: A Logging File System for a High-Performance RAID Disk Array
Full Collection Name: Electrical Engineering & Computer Sciences Technical Reports
The Engineering Library