This dissertation presents new implementation techniques for lazy multithreading systems on conventional machines and compares a variety of design choices. We develop an abstract machine that makes explicit the design decisions for achieving lazy multithreading. We introduce new options on each of the four axes in the design space: the storage model, the thread representation, the disconnection method, and the queueing mechanism. Stacklets, our new storage model, allows parallel calls to maintain the invariants of sequential calls. Thread seeds, our new thread representation, allows threads to be stolen without requiring thread migration or shared memory. Lazy-disconnect, our new disconnection method, does not restrict the use of pointers. Implicit and Lazy queueing, our two new queueing mechanisms, eliminate the need for explicit bookkeeping. Additionally, we develop a core set of compilation techniques and runtime primitives that form the basis for the efficient implementation of any design point.
We have evaluated the different approaches by incorporating them into a compiler for an explicitly parallel extension of Split-C. We show that there exist points in the design space (e.g., stacklet, thread seeds, lazy-disconnect, and lazy queueing) for which fine-grained parallelism can be efficiently supported even on distributed memory machines, thus allowing programmers freedom to specify the parallelism in a program without concern for excessive overhead.