In EM and related algorithms, E-step computations distribute easily, because data items are independent given parameters. For very large data sets, however, even storing all of the parameters in a single node for the M-step can be prohibitive. We present a framework which exploits parameter sparsity to fully distribute the entire EM procedure. Each node interacts with only the subset of parameters relevant to its data, sending messages to other nodes along a junction-tree topology. We demonstrate the effectiveness of our framework over a MapReduce approach, on two tasks: word alignment for machine translation, and LDA for topic modeling.