This dissertation aims to advance the state of credential compromise detection in enterprise settings. We leverage several years' worth of real-world network logs from the Lawrence Berkeley National Laboratory (LBNL) to develop systems for detecting: (i) stealthy, distributed brute-force attacks, which compromise password-based credentials by making many guesses against the site's servers while spreading the work across an army of machines, so that each individual host makes only a few attempts and becomes hard to differentiate from a legitimate user who fails to log in, and (ii) anomalous logins indicating that a user's login credentials may have been compromised, either through brute-forcing attacks or more broadly through other vectors (phishing attacks and credential-stealing malware).
For the detection of stealthy brute-force attacks, we first develop a general approach for flagging distributed malicious activity in which individual attack sources each operate in a stealthy, low-profile manner. We base our approach on observing statistically significant changes in a parameter that summarizes aggregate activity, bracketing a distributed attack in time, and then determining which sources present during that interval appear to have coordinated their activity. We then apply this approach to the problem of detecting stealthy distributed SSH brute-forcing activity, showing that we can model the process of legitimate users failing to authenticate using a beta-binomial distribution, which enables us to tune a detector that trades off an expected level of false positives against time-to-detection. Using the detector, we study the prevalence of distributed brute-forcing, finding dozens of instances in an extensive eight-year dataset collected at LBNL. Many of the attacks---some of which last months---would be quite difficult to detect from the activity of any individual source. While a number of the attacks reflect indiscriminate global probing, we also find attacks that targeted only the local site, as well as occasional attacks that succeeded.
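As an illustration only, the following minimal sketch shows one way a change detector over an aggregate parameter could be built on a beta-binomial model of legitimate failures. It is not the dissertation's implementation: the choice of a one-sided CUSUM test, the per-window count `k` out of `n` attempts, and the names `alpha`, `beta`, `slack`, and `threshold` are all assumptions introduced for this example.

```python
import math


def log_beta(a, b):
    """Natural log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_pmf(k, n, alpha, beta):
    """P(K = k) when K ~ BetaBinomial(n, alpha, beta).

    Assumption for this sketch: K is the number of failed
    authentications among n attempts in one time window.
    """
    log_choose = (math.lgamma(n + 1) - math.lgamma(k + 1)
                  - math.lgamma(n - k + 1))
    return math.exp(log_choose
                    + log_beta(k + alpha, n - k + beta)
                    - log_beta(alpha, beta))


def beta_binomial_mean(n, alpha, beta):
    """Expected number of failures per window under the benign model."""
    return n * alpha / (alpha + beta)


def cusum_alarms(counts, n, alpha, beta, slack, threshold):
    """One-sided CUSUM over per-window failure counts (hypothetical detector).

    Accumulates how far each count exceeds the benign mean plus `slack`,
    and records the window index whenever the sum exceeds `threshold`.
    A larger threshold lowers false positives but delays detection.
    """
    mu = beta_binomial_mean(n, alpha, beta)
    s = 0.0
    alarms = []
    for i, k in enumerate(counts):
        s = max(0.0, s + (k - mu - slack))
        if s > threshold:
            alarms.append(i)
            s = 0.0  # restart accumulation after raising an alarm
    return alarms
```

In this sketch, `slack` and `threshold` are the tuning knobs: the benign beta-binomial model could be used to estimate how often the accumulated sum would cross a given threshold by chance, which is the kind of false-positive versus time-to-detection trade-off described above.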
For the detection of anomalous logins, we first extensively characterize the volume and diversity of login activity on LBNL's network, with the goal of engineering features that can serve with good confidence as indicators of compromise. We then develop a practical rule-based detector that leverages the global view of the network as well as the historical profile of each user to flag potentially compromised credentials. On average, our detector raises 2--3 alarms per week---a reasonable analyst workload for an enterprise with several thousand users. To understand these alarms, we worked with the site operators, who deemed the top ten percent of instances suspicious enough to merit an inquiry to the affected user. Our detector successfully flagged a known compromised account and discovered an instance of a (benign) shared credential in use by a remote collaborator.