Microprocessor architecture is evolving rapidly as silicon integrated circuits increase in density. On-chip cache memories are becoming an established feature of 32-bit microprocessor designs because they significantly improve performance. Microprocessor performance is degraded by bus contention between instruction and data memory traffic. On-chip instruction caches reduce this contention by supplying many of the instructions the microprocessor executes. The SPUR instruction unit is a direct-mapped cache of 512 bytes, or 128 instructions. It is organized in sub-blocks to provide efficient instruction fetching and prefetching from external memory. The instruction unit is controlled by two finite state machines: one for instruction fetching and one for instruction prefetching. These control functions are implemented using PLAs and standard logic cells. The standard cells are implemented in domino logic to meet speed and area constraints. SPICE simulations indicate that the slowest signal path in the instruction unit has a delay of 14.7 ns. The SPUR instruction unit contains 39,400 transistors and occupies 4200 µm × 6000 µm in a 2 µm technology. Area and speed metrics for alternative instruction units indicate that implementations with either larger sub-blocks or two-way set associativity would satisfy the SPUR CPU speed requirements. A two-way set-associative implementation would consume approximately 20% more silicon area.
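
As a rough illustration of the cache organization described above, the following sketch models a direct-mapped instruction cache of 128 four-byte instructions with one tag per block and one valid bit per sub-block. Only the 512-byte and 128-instruction figures come from the text. The sub-block size (one instruction), the number of sub-blocks per block (4), and the names `SubBlockICache`, `split`, and `fetch` are assumptions made for illustration, and prefetching is not modeled.

```python
INSTRUCTION_BYTES = 4       # 512 bytes / 128 instructions, from the text
CACHE_INSTRUCTIONS = 128    # from the text
SUBBLOCKS_PER_BLOCK = 4     # assumption: not stated in the text


class SubBlockICache:
    """Direct-mapped cache: one tag per block, one valid bit per sub-block.

    Assumes each sub-block holds exactly one instruction (hypothetical).
    """

    def __init__(self, subblocks_per_block=SUBBLOCKS_PER_BLOCK):
        self.spb = subblocks_per_block
        self.num_blocks = CACHE_INSTRUCTIONS // subblocks_per_block
        self.tags = [None] * self.num_blocks
        self.valid = [[False] * self.spb for _ in range(self.num_blocks)]

    def split(self, addr):
        """Split a byte address into (tag, block index, sub-block offset)."""
        word = addr // INSTRUCTION_BYTES
        sub = word % self.spb
        index = (word // self.spb) % self.num_blocks
        tag = word // (self.spb * self.num_blocks)
        return tag, index, sub

    def fetch(self, addr):
        """Return True on a hit; on a miss, fill only the missing sub-block."""
        tag, index, sub = self.split(addr)
        if self.tags[index] != tag:
            # Tag mismatch: the block is reallocated, so every sub-block
            # that belonged to the old tag becomes invalid.
            self.tags[index] = tag
            self.valid[index] = [False] * self.spb
        hit = self.valid[index][sub]
        if not hit:
            self.valid[index][sub] = True  # models a fetch from external memory
        return hit
```

With these assumed parameters, addresses 0 and 512 map to the same block, so fetching one evicts the other. This is the kind of conflict miss that a two-way set-associative organization would reduce, at the area cost noted above.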