Description
This dissertation aims to improve the usability and resource efficiency of systems for developing and productionizing machine learning applications. It pursues several directions identified through extensive gathering and analysis of empirical evidence. First, we study the applied machine learning literature and execution traces of publicly available machine learning workflows to understand common practices and shed light on the highly iterative process of model development. Using these insights, we present a solution that accelerates iterative model development. Next, we analyze the provenance graph of thousands of production pipelines to uncover latent inefficiencies in how these pipelines are run, and we propose a solution that significantly reduces wasted computation in such systems. Both solutions harness classic techniques from systems, databases, and programming languages to automate data management and optimize computation in machine learning application development. Finally, we synthesize findings from interviews with current users of automated machine learning tools to examine the role of automation in model development and to look ahead to the future of machine learning developer tools.