We propose a method for describing the asymptotic behavior of programs in practice by measuring their empirical computational complexity. Our method involves running a program on workloads spanning several orders of magnitude in size, measuring its performance on each, and fitting these observations to a model that predicts performance as a function of workload size. Comparing these models to the programmer's expectations or to theoretical asymptotic bounds can reveal performance bugs or confirm that a program's performance scales as expected.
We develop our methodology for constructing these models of empirical complexity as we describe and evaluate two techniques. Our first technique, BB-TrendProf, constructs models that predict how many times each basic block runs as a linear (y = a + b*x) or a power-law (y = a*x^b) function of user-specified features of the program's workloads. To present output succinctly and focus attention on scalability-critical code, BB-TrendProf groups and ranks program locations based on these models. We demonstrate BB-TrendProf's power relative to existing tools by running it on several large programs and reporting cases where its models show (1) an implementation of a complex algorithm scaling as expected, (2) two complex algorithms beating their worst-case theoretical complexity bounds when run on realistic inputs, and (3) a performance bug.
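The model-fitting step described above can be sketched as follows. This is a minimal illustration, not BB-TrendProf's actual implementation: the function names and the synthetic n log n block counts are assumptions made for the example, and the power-law fit is done the standard way, as a linear least-squares fit in log-log space.

```python
# Hedged sketch: fit per-workload execution counts to linear and
# power-law models, in the spirit of BB-TrendProf. Names and data
# are illustrative, not the tool's actual API or output.
import math

def fit_linear(xs, ys):
    # Ordinary least squares for y = a + b*x.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

def fit_powerlaw(xs, ys):
    # y = a * x^b becomes log y = log a + b * log x,
    # so fit a line through the log-transformed points.
    la, b = fit_linear([math.log(x) for x in xs],
                       [math.log(y) for y in ys])
    return math.exp(la), b

# Workloads spanning several orders of magnitude in size, with a
# synthetic basic-block count that grows as n log n (e.g. a sort).
sizes = [10 ** k for k in range(1, 6)]
counts = [n * math.log2(n) for n in sizes]

a, b = fit_powerlaw(sizes, counts)
print(f"power-law fit: y = {a:.2f} * x^{b:.2f}")
```

On these inputs the fitted exponent comes out a little above 1, illustrating how a power-law model can summarize near-linearithmic growth and how the exponent can then be compared against the programmer's expectation.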
Our second technique, CF-TrendProf, models the performance of loops and functions both per-function-invocation and per-workload. It improves upon the precision of BB-TrendProf's models by using control flow to generate candidates from a richer family of models and a novel model selection criterion to choose among these candidates. We show that CF-TrendProf's improvements to model generation and selection allow it to correctly characterize or closely approximate the empirical scalability of several well-known algorithms and data structures, and to diagnose several synthetic, but realistic, scalability problems without observing an egregiously expensive workload. We also show that CF-TrendProf handles multiple workload features better than BB-TrendProf. We qualitatively compare the output of BB-TrendProf and CF-TrendProf and discuss their relative strengths and weaknesses.