In this thesis, we explore theoretical foundations of distributed machine learning under communication constraints. We study the trade-offs between communication and computation, as well as the trade-off between communication and learning accuracy. In particular settings, we are able to design algorithms that don't compromise on either side. We also establish fundamental limits that apply to all distributed algorithms. In more detail, this thesis makes the following contributions:
* We propose communication-efficient algorithms for statistical optimization. These algorithms achieve the best possible statistical accuracy and suffer the least possible computation overhead.
* We extend the same algorithmic idea to non-parametric regression, proposing an algorithm which also guarantees the optimal statistical rate and superlinearly reduces the computation time.
* In the general setting of regularized empirical risk minimization, we propose a distributed optimization algorithm whose communication cost is independent of the data size, and is only weakly dependent on the number of machines.
* We establish lower bounds on the communication complexity of statistical estimation and linear algebraic operations. These lower bounds characterize the fundamental limits of any distributed algorithm.
* We design and implement a general framework for parallelizing sequential algorithms. The framework consists of a programming interface and an execution engine. The programming interface allows machine learning experts to implement the algorithm without concerning any detail about the distributed system. The execution engine automatically parallelizes the algorithm in a communication-efficient manner.