Using secure multiparty computation (MPC), organizations which own sensitive data (e.g., in healthcare, finance or law enforcement) can train machine learning models over their joint dataset without revealing their data to each other. At the same time, secure computation restricts operations on the joint dataset, which impedes computation to assess its quality. Without such an assessment, deploying a jointly trained model is potentially illegal. Regulations, such as the European Union’s General Data Protection Regulation (GDPR), require organizations to be legally responsible for the errors, bias, or discrimination caused by their machine learning models. Hence, testing data quality emerges as an indispensable step in secure collaborative learning. However, performing distribution testing is prohibitively expensive using current techniques, as shown in our experiments.

We present HOLMES, a protocol for performing distribution testing efficiently. In our experiments, compared with three non-trivial baselines, HOLMES achieves a speedup of more than 10x for classical distribution tests and up to 104x for multidimensional tests. The core of HOLMES is a hybrid protocol that integrates MPC with zero-knowledge proofs and a new ZK-friendly and naturally oblivious sketching algorithm for multidimensional tests, both with significantly lower computational complexity and concrete execution costs.




Download Full History