The Branch Target Buffer (BTB) can reduce the performance penalty of branches in pipelined processors by predicting the path of the branch and caching information used by the branch. This paper discusses two major issues in the design of BTBs, with the theme of achieving maximum performance with a limited number of bits allocated to the BTB design. The first is BTB management: when to enter branches into the BTB and when to discard them. Higher performance can be obtained by entering branches into the BTB only when they are taken. A new method for discarding branches from the BTB is also examined. This method discards the branch with the smallest expected value for improving performance; it outperforms the LRU strategy by a small margin, at the cost of additional complexity.
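
To make the management question concrete, the following is a minimal sketch of a fully associative BTB that can enter branches either on every execution or only when taken, and that discards entries with the LRU baseline. The class `LruBtb`, the helper `replay`, the trace format, and all addresses are hypothetical names chosen for illustration; the expected-value discard policy is not specified in this abstract and is therefore not modeled here.

```python
from collections import OrderedDict


class LruBtb:
    """Toy fully associative BTB with LRU discard (illustrative assumption)."""

    def __init__(self, num_entries, allocate_on_taken_only=True):
        self.num_entries = num_entries
        self.allocate_on_taken_only = allocate_on_taken_only
        self.entries = OrderedDict()  # branch address -> target address
        self.hits = 0
        self.misses = 0

    def access(self, branch_addr, taken, target_addr):
        """Record one executed branch; return True on a BTB hit."""
        if branch_addr in self.entries:
            self.hits += 1
            self.entries.move_to_end(branch_addr)  # mark most recently used
            if taken:
                self.entries[branch_addr] = target_addr  # refresh the target
            return True
        self.misses += 1
        # Entry policy: optionally skip branches that were not taken.
        if taken or not self.allocate_on_taken_only:
            if len(self.entries) >= self.num_entries:
                self.entries.popitem(last=False)  # discard the LRU entry
            self.entries[branch_addr] = target_addr
        return False


def replay(trace, **btb_kwargs):
    """Run a list of (branch_addr, taken, target_addr) tuples through a BTB."""
    btb = LruBtb(**btb_kwargs)
    for branch_addr, taken, target_addr in trace:
        btb.access(branch_addr, taken, target_addr)
    return btb


# Made-up trace: one always-taken loop branch and two never-taken branches.
TRACE = [(0x100, True, 0x80), (0x200, False, 0x0), (0x300, False, 0x0)] * 4

taken_only = replay(TRACE, num_entries=2, allocate_on_taken_only=True)
always = replay(TRACE, num_entries=2, allocate_on_taken_only=False)
print(taken_only.hits, always.hits)
```

On this toy trace, entering only taken branches keeps the loop branch resident in the two-entry buffer, whereas entering every branch lets the never-taken branches evict it before it is reused. The trace is contrived to show the effect and says nothing about the size of the benefit on real workloads.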

The second major issue is what information to store in the BTB. A BTB entry can consist of one or more of the following: branch tag (i.e., the branch address), prediction information, the branch target address, and instructions at the branch target. A variety of BTB designs, with one or more of these fields, are evaluated and compared. This study is then extended to multilevel BTBs, in which different levels have different amounts of information per entry. For the specific implementation assumptions used, multilevel BTBs improved performance over single-level BTBs only slightly, at the cost of additional complexity. Multilevel BTBs may, however, provide significant performance improvements for other implementations.
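
The trade-off between information per entry and number of entries under a fixed bit budget can be sketched as follows. The type `BtbEntryFormat`, the function `entries_within_budget`, and every bit width below are assumptions made for illustration, not figures taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BtbEntryFormat:
    """Bits spent on each optional field of a BTB entry (0 = field omitted)."""
    tag_bits: int = 0           # branch tag, i.e. the branch address
    prediction_bits: int = 0    # prediction information
    target_addr_bits: int = 0   # branch target address
    target_instr_bits: int = 0  # instructions at the branch target

    def bits_per_entry(self):
        return (self.tag_bits + self.prediction_bits
                + self.target_addr_bits + self.target_instr_bits)


def entries_within_budget(fmt, budget_bits):
    """Number of whole entries of format `fmt` that fit in `budget_bits`."""
    per_entry = fmt.bits_per_entry()
    if per_entry <= 0:
        raise ValueError("an entry must contain at least one field")
    return budget_bits // per_entry


# Two hypothetical designs competing for the same 8192-bit budget.
prediction_only = BtbEntryFormat(prediction_bits=2)
full_entry = BtbEntryFormat(tag_bits=30, prediction_bits=2,
                            target_addr_bits=32, target_instr_bits=64)

BUDGET_BITS = 8192
print(entries_within_budget(prediction_only, BUDGET_BITS))
print(entries_within_budget(full_entry, BUDGET_BITS))
```

Under these assumed widths, a richer entry buys more help per hit but leaves room for far fewer entries. A multilevel design could be described in the same terms as a list of (format, entry count) pairs whose total bits stay within the budget.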

Design target miss ratios for BTBs are also developed, so that the performance of BTBs for real workloads may be estimated.



