Today's Internet datacenters run many complex and large-scale Web applications that are very difficult to manage. The main challenges are understanding user workloads and application performance, and quickly identifying and resolving performance problems. Statistical Machine Learning (SML) provides a methodology for quickly processing the large quantities of monitoring data generated by these applications, finding repeating patterns in their behavior, and building accurate models of their performance.
This dissertation argues that SML is a useful tool for simplifying and automating datacenter operations and demonstrates application of SML to three important problems in this area: characterization and synthesis of workload spikes, dynamic resource allocation in stateful systems, and quick and accurate identification of recurring performance problems.
Automating Datacenter Operations Using Machine Learning
Researchers may make free and open use of the UC Berkeley Library’s digitized public domain materials. However, some materials in our online collections may be protected by U.S. copyright law (Title 17, U.S.C.). Use or reproduction of materials protected by copyright beyond that allowed by fair use (Title 17, U.S.C. § 107) requires permission from the copyright owners. The use or reproduction of some materials may also be restricted by terms of University of California gift or purchase agreements, privacy and publicity rights, or trademark law. Responsibility for determining rights status and permissibility of any use or reproduction rests exclusively with the researcher. To learn more or make inquiries, please see our permissions policies (https://www.lib.berkeley.edu/about/permissions-policies).