Application specific multiprocessors are capable of high performance implementations while remaining flexible enough to support a range of applications. Architects of these systems achieve high performance through domain specific optimizations such as multiple processing elements, dedicated logic, and specialized memory and interconnection. However, these features are often introduced at the expense of programming productivity. For application-specific programmable systems to succeed, it is necessary to deliver high performance implementations quickly. Three of the most important and time-consuming steps to arriving at implementations on these platforms are (1) to extract parallelism from applications descriptions, (2) to arrive at a model of the architecture, and (3) to map the parallelized application to the architectural model. We examine this problem for the networking domain and target a commercial family of network processors: Intel's IXP series. We propose and demonstrate a solution that starts with a domain specific language, which allows the extraction of parallelism without designer intervention. We show that high level application information enables optimizations and automated transformations to a task graph. A task graph is a model of an application that exposes its computation, data, and communication. The elements of the task graph can be mapped to processing elements, memory, and interconnect of the target architecture. We formulatethe mapping problem as an integer linear programming (ILP) problem. We demonstrate that this method finds efficient solutions with fast run times on the data plane of network applications of different scales and complexity including IP forwarding, differentiated services, network address translation, and web switching. Using this design flow designers enjoy design times up to 3.5 times shorter than other new productive approaches. The resulting implementations are within 17% of hand mapped approaches. While the demonstration vehicle for this work has been network processing, we believe it will have wider applicability to other application domains that are starting to employ more single chip multiprocessing.




Download Full History