In this thesis, we illustrate the impact of system-aware machine learning through the lens of optimization, a crucial component in formulating and solving most machine learning problems. Classically, the performance of an optimization method is measured in terms of accuracy (i.e., does it realize the correct machine learning model?) and convergence rate (i.e., after how many iterations?). In modern computing regimes, however, achieving the best overall performance also requires considering a number of systems-related aspects. These aspects range from low-level details, such as data structures or machine specifications, to higher-level concepts, such as the tradeoff between communication and computation.
We propose a general optimization framework for machine learning, CoCoA, that gives careful consideration to systems parameters, often incorporating them directly into the method and theory. We illustrate the impact of CoCoA in two popular distributed regimes: the traditional cluster-computing environment, and the increasingly common setting of on-device (federated) learning. Our results indicate that by marrying systems-level parameters and optimization techniques, we can achieve orders-of-magnitude speedups for solving modern machine learning problems at scale. We corroborate these empirical results with theoretical guarantees that explicitly expose systems parameters, giving further insight into the observed performance.