The rise of data center computing and Internet-connected devices has led to an unparalleled explosion in the volume of data collected across a multitude of industries and academic disciplines. This data serves as fuel for statistical machine learning techniques that in turn enable some of today's most advanced applications, including those powered by image classification, speech recognition, and natural language understanding; we broadly term these machine learning applications.
Unfortunately, the tools and techniques needed to apply recent advances in machine learning at the scales demanded by modern datasets, and thus to develop these applications, have until recently been available only to experts in fields such as distributed computing, statistics, and optimization.
I describe my efforts to make these tools accessible to a broader audience of application developers, and I demonstrate that, by taking a holistic approach and capturing end-to-end, high-level specifications of machine learning applications, the systems presented here can make novel, high-impact optimizations that decrease resource consumption while simultaneously increasing throughput. These improvements are designed to shorten development time, improve application quality, and increase developer productivity. I demonstrate the viability of these optimizations through experiments on a number of real-world applications in domains such as collaborative filtering, computer vision, and natural language processing.
Many of the ideas presented in this thesis have already had practical impact, as embodied in the open-source software packages KeystoneML and Apache Spark MLlib.
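As a rough illustration of what an end-to-end, high-level pipeline specification looks like in practice, the sketch below uses the Apache Spark MLlib Pipeline API (one of the packages named above) to chain featurization and model training into a single declarative specification. The specific stages, column names, and toy data are illustrative assumptions, not material taken from the thesis itself; KeystoneML exposes a similar chained-operator style in Scala.

import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import org.apache.spark.sql.SparkSession

object EndToEndPipelineSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("EndToEndPipelineSketch").getOrCreate()

    // Toy labeled text data, purely illustrative.
    val training = spark.createDataFrame(Seq(
      (0L, "large scale machine learning with pipelines", 1.0),
      (1L, "unrelated text with no relevant content", 0.0)
    )).toDF("id", "text", "label")

    // Each stage declares one step of the application; chaining them into a
    // Pipeline captures the end-to-end specification from raw text to a
    // trained classifier, which the system can then optimize and execute.
    val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
    val hashingTF = new HashingTF().setInputCol("words").setOutputCol("features")
    val lr = new LogisticRegression().setMaxIter(10)

    val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, lr))

    // Fitting the whole specification at once, rather than each stage in
    // isolation, is what gives the system room for end-to-end optimization.
    val model = pipeline.fit(training)
    model.transform(training).select("id", "prediction").show()

    spark.stop()
  }
}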
Title
End-to-End Large Scale Machine Learning with KeystoneML
Published
2016-12-15
Full Collection Name
Electrical Engineering & Computer Sciences Technical Reports
Other Identifiers
EECS-2016-200
Type
Text
Extent
143 p.
Archive
The Engineering Library