There has been a growing trend towards automated generation of massive amounts of data at multiple distributed locations. These systems can take advantage of learning and inference algorithms to assemble local observations and reach global conclusions. This requires performing probabilistic inference in a distributed fashion. Distributed inference algorithms eliminate single points of failure, distribute the computation across several nodes and avoid the need to share sensitive data.
The design and implementation of distributed algorithms is very challenging. Our work involves using a combination of overlays and declarative programming to simplify the design of distributed inference algorithms. Our main contribution involves using a declarative language, Overlog, to implement a set of existing probabilistic inference algorithms and evaluating their performance. We also present the conciseness of our declarative implementation. For example, we could implement Junction Tree Running Intersection Property in just 7 rules in the Overlog.
We then use the distributed inference architecture for collaborative spam detection. This work is a first step towards applying distributed inference techniques for network monitoring. During the design of the application we learned that multiple factors like algorithm selection, data partitioning and aggregation play an important role when solving network monitoring problems using distributed inference. Future work in this direction involves solving the issues faced to make the spam detection application scalable and practical to provide real-time detection.