Data layout, data placement, and synchronization processes are not usually part of a speech application expert's daily concerns. Yet failing to account for them carefully in a highly parallel implementation on graphics processing units (GPUs) can cost an order of magnitude in application performance. In this paper we present an application framework for parallel programming of automatic speech recognition (ASR) applications that allows a speech application expert to implement speech applications effectively on the GPU. The framework crystallizes and transfers the often tacit knowledge of effective parallel programming techniques while allowing flexible adaptation to various application usage scenarios. It includes an application context description, a software architecture, a reference implementation, and a set of extension points for flexible customization. We describe how a speech expert can use the framework in a parallel application design flow and present two case studies that illustrate how the framework adapts to different usage scenarios. The case studies extend the framework to an advanced audio-only speech recognition application and to an audio-visual recognition application that enables lip-reading in high-noise recognition environments. The adaptation to the latter scenario also demonstrates how the ASR application framework enabled a Matlab/Java programmer to use a GPU effectively, producing an implementation that achieves a 20x speedup in recognition throughput compared to a sequential CPU implementation.
