The last decade has seen two major trends in large-scale computing: on the one hand, the growth of cloud computing, where many big data applications are deployed on shared clusters of machines; on the other hand, a deluge of machine learning algorithms used for applications ranging from image classification and machine translation to graph processing and scientific analysis of large datasets. In light of these trends, a number of challenges arise in how we program, deploy, and achieve high performance for large-scale machine learning applications. In this dissertation we study the execution properties of machine learning applications and, based on these properties, we present the design and implementation of systems that address the above challenges. We first identify how choosing the appropriate hardware affects application performance, and describe Ernest, an efficient performance prediction scheme that uses experiment design to minimize the cost and time taken to build performance models. We then design scheduling mechanisms that improve performance through two approaches: first, by improving data access time using data-aware scheduling that accounts for locality; and second, by using scalable scheduling techniques that reduce coordination overheads.
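To make the performance-modeling idea concrete, the following is a minimal sketch, not Ernest's actual implementation: it assumes a simple scaling model of the form t(m) ≈ a + b/m (a fixed serial cost plus a perfectly parallelizable component), fits the two coefficients by ordinary least squares over a few small training runs, and then extrapolates the runtime at a larger cluster size. The function names and the choice of model terms here are illustrative assumptions; Ernest itself uses a richer set of features and selects which training runs to perform via optimal experiment design.

```python
# Hypothetical sketch of performance-model fitting, NOT Ernest's real code.
# Model assumption: runtime t(m) = a + b/m for m machines, fit by
# least squares (normal equations for the 2x2 system solved by hand).

def fit_scaling_model(samples):
    """samples: list of (machines, runtime) pairs from small training runs."""
    # Design-matrix columns are [1, 1/m]; accumulate normal-equation sums.
    s11 = float(len(samples))
    s12 = sum(1.0 / m for m, _ in samples)
    s22 = sum((1.0 / m) ** 2 for m, _ in samples)
    y1 = sum(t for _, t in samples)
    y2 = sum(t / m for m, t in samples)
    det = s11 * s22 - s12 * s12
    a = (y1 * s22 - y2 * s12) / det  # fixed (serial) cost
    b = (s11 * y2 - s12 * y1) / det  # parallelizable cost
    return a, b

def predict_runtime(a, b, machines):
    """Extrapolate the fitted model to a larger cluster size."""
    return a + b / machines

# Training runs at small scales (synthetic data following t = 2 + 100/m):
runs = [(4, 27.0), (8, 14.5), (16, 8.25)]
a, b = fit_scaling_model(runs)
print(predict_runtime(a, b, 32))  # predicted runtime on 32 machines
```

The point of the sketch is that a handful of cheap runs at small scale suffice to choose a cluster size; experiment design, as in Ernest, further reduces how many such runs are needed and which configurations to measure.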