Description
Even with a clear specification of what to look for, uncovering sophisticated attacks has long eluded enterprises because such attacks give rise to a detection problem with two challenging constraints: an extreme class imbalance and a lack of ground truth. In particular, targeted enterprise attacks occur at a low rate, reflect the work of stealthy attackers (and thus frequently remain unknown and unlabeled), and transpire amidst a sea of anomalous-but-benign activity that inherently occurs within modern enterprise networks. This setting poses fundamental challenges to traditional machine learning methods, causing them to detect an insufficient number of attacks or produce an intractable volume of false positives. To overcome these challenges, we present a new approach to anomaly detection for security settings, specification-based anomaly detection, which we use to construct new detection algorithms for identifying rare attacks in large, unlabeled datasets.
Combining these algorithms with the attack models we develop, we design and implement a set of detection systems that collectively form a defense-in-depth approach to unearthing and mitigating enterprise attacks. Through collaborations with three large organizations, we validate the efficacy and practicality of our approach. Given the ability of our systems to detect a wide range of attacks, the low volume of false positives they generate, and the real-world adoption of many of our ideas, this dissertation illustrates the utility and promise of a data-empowered approach to thwarting enterprise attacks.