Description
This thesis describes the implementation of a high-bandwidth log-structured file system called "Sawmill" that uses a RAID disk array. Sawmill runs on the RAID-II storage system; this architecture provides a fast data path that moves data rapidly among the disks, high-speed controller memory, and the network.
By using a log-structured file system, Sawmill avoids the high cost of small writes to a RAID. Small writes through Sawmill are a factor of three faster than writes to the underlying RAID. Sawmill also uses new techniques to obtain better bandwidth from a log-structured file system. By performing disk layout "on-the-fly," rather than through a block cache as in previous log-structured file systems, Sawmill reduces the CPU overhead of processing cache blocks and allows write transfers to take place in large, efficient units.
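The cost of small writes mentioned above can be illustrated with a minimal sketch. It assumes a parity-protected array in the style of RAID level 5, which the text here does not specify, and all function names are hypothetical. It is not Sawmill code. It only shows why updating one block in place takes four disk operations, while writing a whole stripe at once, as a log-structured layout allows, needs no reads.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Bytewise XOR of one or more equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe_parity(data_blocks: list[bytes]) -> bytes:
    """Parity of a full stripe: XOR of all its data blocks."""
    return xor_blocks(*data_blocks)


def updated_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """Parity after overwriting one block (read-modify-write).

    The old data and old parity must first be read from disk, which is
    what makes a small in-place write expensive.
    """
    return xor_blocks(old_parity, old_data, new_data)


def small_write_ios() -> int:
    """Disk operations for one in-place block update (assumed RAID-5 style):
    read old data, read old parity, write new data, write new parity."""
    return 4


def full_stripe_write_ios(data_disks: int) -> int:
    """Disk operations for a full-stripe write: one write per data block
    plus one parity write; parity is computed from the new data alone."""
    return data_disks + 1
```

Under these assumptions, an array with four data disks per stripe spends 4 operations per block on small in-place writes but 5 operations for 4 blocks (1.25 per block) on a full-stripe write. This is simple operation counting and is not a model of the measured factor-of-three result.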
The thesis also examines how a file system can take advantage of the data path and controller memory of a storage system such as RAID-II. Sawmill uses a stream-based approach instead of a block cache to permit large, efficient transfers.
Sawmill can read at up to 21 MB/s and write at up to 15 MB/s while running on a fairly slow (9 SPECmarks) Sun-4 workstation. In comparison, existing file systems provide less than 1 MB/s on the RAID-II architecture because they perform inefficient small operations and do not take advantage of the RAID-II data path. In many cases, Sawmill's performance is limited by the relatively slow server CPU, suggesting that the system could handle larger and faster disk arrays simply by using a faster processor.