Description
Several widely held assumptions about collecting profile data are not true. It is not true that the optimal instrumentation problem has been solved, and it is not true that counting traversals of the arcs of a program flow graph is more expensive and complex than counting executions of basic blocks. There are simple program flow graphs for which finding optimal instrumentations is possibly exponential. An algorithm is presented that computes instrumentations of a program to count arc traversals (and therefore basic block counts also). Such instrumentations impose 10% to 20% overhead on the execution of a program, often less than the overhead required for collecting basic block execution counts.
An algorithm called Greedy Sewing improves the behavior of programs on machines with instruction caches. By moving basic blocks physically closer together if they are executed close together in time, miss rates in instruction caches can be reduced up to 50%. Arc-count profile data not only allows the compiler to know which basic blocks to move closer together, it also allows those situations that will have little or no effect on the final performance of the reorganized program to be ignored. Such a low-level compiler optimization would be difficult to do without arc-count profile data.
The primary contribution of this work is the development of TYPESETTER, a programming system that utilizes profile data to select implementations of program abstractions. The system integrates the development, evaluation, and selection of alternative implementations of programming abstractions into a package that is transparent to the programmer. Unlike previous systems, TYPESETTER does not require programmers to know details of the compiler implementation. Experience indicates that the TYPESETTER approach to system synthesis has considerable benefit, and will continue to be a promising avenue of research.