Aggressive technology scaling has necessitated the development of techniques to ensure resilience to device faults, including soft errors, circuit wear-out, variability, and environmental effects. All error resilience techniques employ some form of redundancy, which adds cost in the form of area or power overhead. Existing selective hardening techniques focus on identifying the most vulnerable components and then statically hardening them, trading resilience against overhead. This paper proposes a new technique that can further reduce this overhead for error resilience mechanisms that are controllable. The key idea is to generate control predicates that turn the resilience mechanisms ON and OFF dynamically and at the right time. These predicates are mined using a 0-1 integer linear optimization formulation. An experimental evaluation shows that the proposed approach provides a systematic way to control error resilience so as to meet reliability targets under a specified power budget. For example, for a chip multiprocessor router, our approach achieves the same amount of soft error resilience with only half the power overhead of the static hardening approach.
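The abstract does not state the optimization formulation itself, so the following is only a minimal sketch of what a 0-1 selection of control predicates under a power budget could look like, not the paper's method. It assumes a toy budgeted-coverage model: each candidate predicate has a power cost and covers a set of error scenarios, and a binary variable decides whether the predicate is used. The function name `select_predicates`, the cost values, and the coverage sets are all hypothetical, and exhaustive enumeration stands in for a real integer programming solver.

```python
from itertools import product


def select_predicates(costs, covers, budget):
    """Solve a toy 0-1 program by exhaustive enumeration (hypothetical model).

    costs[j]  : assumed power cost of enabling protection under predicate j
    covers[j] : assumed set of error scenarios protected when predicate j is used
    budget    : maximum total power cost allowed

    Maximizes the number of covered scenarios subject to the budget;
    ties are broken in favor of the lower total cost.
    """
    best_x, best_cov, best_cost = None, -1, 0
    # Each x is one 0-1 assignment: x[j] == 1 means predicate j is selected.
    for x in product((0, 1), repeat=len(costs)):
        cost = sum(c for c, xj in zip(costs, x) if xj)
        if cost > budget:
            continue  # violates the power budget constraint
        covered = set()
        for xj, cov in zip(x, covers):
            if xj:
                covered |= cov
        n = len(covered)
        if n > best_cov or (n == best_cov and cost < best_cost):
            best_x, best_cov, best_cost = x, n, cost
    return best_x, best_cov, best_cost


# Illustrative data only: four candidate predicates, five error scenarios.
costs = [3, 2, 1, 6]
covers = [{0, 1, 2}, {2, 3}, {4}, {0, 1, 2, 3, 4}]
selection, num_covered, total_cost = select_predicates(costs, covers, budget=5)
print(selection, num_covered, total_cost)
```

In this toy instance the predicate that covers every scenario exceeds the budget on its own, so the best feasible choice combines two cheaper predicates. Enumeration grows exponentially with the number of candidates, so a practical flow would hand the same model to an integer programming solver.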



