Professor John Canny, our capstone advisor, has developed the BIDData Suite, a machine learning toolkit that uses GPUs efficiently enough to achieve record-breaking "roofline" performance on a single machine. Our capstone extends BIDData's statistical models so that they can train effectively in parallel on a cluster.
Our team has developed several cluster-enabling modules within BIDData's codebase, including (1) an inter-machine communication framework, covered in Jiaqi Xie's technical report, (2) a network throughput monitor, covered in Quanlai Li's technical report, and (3) several distributed variants of practical machine learning models, covered in depth in Chapter 1 of this report.
Chapter 2 focuses on the issues that arise from the growing industry trend of using machine learning to analyze massive datasets, and on how our project aims to alleviate some of them. Chapter 2 also analyzes the market strategy of our industry partner, OpenChai, which is trying to bring the benefits of machine learning to lagging industries such as healthcare and banking.