In the first case study, we examine the problem of keyboard acoustic emanations, in which attackers use security inference to analyze the sounds produced by typing on a computer keyboard. We present a novel attack that takes as input a 10-minute sound recording of a user typing English text on a keyboard and recovers up to 96% of the characters typed. The attack requires no labeled training recording. Moreover, the recognizer bootstrapped this way can even recognize random text such as passwords: in our experiments, an attacker given 20 or fewer attempts can guess 90% of random letter-only 5-character passwords and 70% of 10-character passwords. This case study demonstrates that applying statistical analysis to security problems provides new tools for drawing powerful conclusions.
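To make the "20 or fewer attempts" figure concrete, the following minimal sketch shows one way a ranked list of password guesses could be produced. It assumes, hypothetically, that the recognizer outputs a probability for each candidate letter at each keystroke and that keystrokes are scored independently; the function name `top_guesses` and the input format are illustrative and are not taken from the thesis.

```python
import heapq
import math


def top_guesses(position_probs, limit=20):
    """Return up to `limit` candidate strings, most probable first.

    position_probs: one dict per keystroke, mapping letter -> probability
    (a hypothetical recognizer output; each dict is assumed to contain at
    least one letter with nonzero probability).
    """
    # Rank the candidate letters for each keystroke, best first.
    ranked = [
        sorted(((c, p) for c, p in probs.items() if p > 0),
               key=lambda cp: -cp[1])
        for probs in position_probs
    ]

    def log_score(idx):
        # Log-probability of the string picked out by one index per position.
        return sum(math.log(ranked[i][j][1]) for i, j in enumerate(idx))

    start = tuple(0 for _ in ranked)
    heap = [(-log_score(start), start)]
    seen = {start}
    guesses = []
    # Best-first search: pop the most probable unseen combination, then
    # push its neighbours that demote exactly one position by one rank.
    while heap and len(guesses) < limit:
        _, idx = heapq.heappop(heap)
        guesses.append("".join(ranked[i][j][0] for i, j in enumerate(idx)))
        for i in range(len(idx)):
            if idx[i] + 1 < len(ranked[i]):
                nxt = idx[:i] + (idx[i] + 1,) + idx[i + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    heapq.heappush(heap, (-log_score(nxt), nxt))
    return guesses
```

Under these assumptions, a password counts as guessed within 20 attempts if it appears anywhere in `top_guesses(position_probs, limit=20)`.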
In the second case study, system administrators (or defenders) use security inference to determine the nature of attackers. We develop new techniques to map botnet membership and other characteristics of botnets using spam traces. The data consist of side-channel traces from attackers: spam email messages received by Hotmail, one of the largest Web mail services. The basic assumption is that spam email messages with similar content often originate from the same controlling entity. Because such messages serve a common economic interest, it is likely that a single entity also controls the machines that send them. By grouping spam email messages with similar content and identifying their senders, one can infer the composition of the botnet. This approach can analyze botnets regardless of their internal organization and means of communication. This work also reports new statistics about botnets.
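The grouping step described above can be illustrated with a minimal sketch. It assumes that content similarity is measured by Jaccard overlap of word shingles and that messages are clustered greedily against the first message of each group; these choices, the threshold, and the names `shingles`, `jaccard`, and `group_senders` are illustrative assumptions, not the techniques developed in the thesis.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles of a message body."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_senders(messages, threshold=0.5, k=3):
    """Group (sender_ip, body) pairs by content and collect the senders.

    Each message joins the first group whose representative (the group's
    first message) is at least `threshold` similar; otherwise it starts a
    new group. Returns one set of sender IPs per group.
    """
    groups = []  # each entry: (representative shingle set, set of senders)
    for sender, body in messages:
        s = shingles(body, k)
        for rep, senders in groups:
            if jaccard(s, rep) >= threshold:
                senders.add(sender)
                break
        else:
            groups.append((s, {sender}))
    return [senders for _, senders in groups]
```

Each returned set is a candidate membership list for one spam campaign: the machines that sent near-identical content. A sender may appear in more than one set if it sent messages belonging to different groups.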
In this thesis, we leverage recent developments in the areas of applied data mining, statistical learning, and distributed data analysis. The approaches we discuss are easily deployable to real systems.