UBC Theses and Dissertations

Integrated hardware-software diagnosis of intermittent faults

Dadashikelayeh, Majid

Abstract

Intermittent hardware faults are hard to diagnose because they occur non-deterministically. Hardware-only diagnosis techniques incur significant power and area overheads. Software-only diagnosis techniques, on the other hand, have low power and area overheads, but they have limited visibility into many micro-architectural structures and therefore cannot diagnose faults in them. To overcome these limitations, we propose an integrated hardware-software framework for diagnosing intermittent faults. The hardware part of our framework, called SCRIBE, continuously records the resource usage information of every instruction in the processor and exposes it to the software layer. SCRIBE has an on-chip area overhead of 0.95% and incurs, on average, a performance overhead of 12% and a power overhead of 9%. The software part of our framework, called SIED, backtracks from the program's crash dump to find the faulty micro-architectural resource. Our technique diagnoses the faulty resource with an average accuracy of 84%, which in turn enables fine-grained deconfiguration with less than 2% performance loss after deconfiguration.
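As a rough illustration of the backtracking idea, the Python sketch below assumes a simplified per-instruction log in which each dynamic instruction records its destination register, its source registers, and the set of micro-architectural resources it used. The names `TraceEntry`, `backward_slice`, and `rank_resources` and the resource labels (`alu1`, `rob6`, ...) are hypothetical. This is not SCRIBE's log format or SIED's actual algorithm, only a toy version: it takes a backward slice from the crash point over the faulty register's data dependences, then counts which resources appear most often along that slice.

```python
from collections import Counter
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class TraceEntry:
    """One dynamic instruction in a hypothetical per-instruction resource log."""
    dest: Optional[str]        # register written (None if nothing was written back)
    srcs: Tuple[str, ...]      # registers read
    resources: FrozenSet[str]  # resources used, e.g. "alu1" or "rob6" (made-up labels)


def backward_slice(trace: List[TraceEntry], crash_index: int, bad_reg: str) -> List[int]:
    """Walk backwards from the crash and return the indices of instructions that
    transitively produced the value of bad_reg, nearest first."""
    needed = {bad_reg}
    slice_indices = []
    for i in range(crash_index - 1, -1, -1):
        entry = trace[i]
        if entry.dest is not None and entry.dest in needed:
            slice_indices.append(i)
            needed.discard(entry.dest)  # this write defines the value we were tracking
            needed.update(entry.srcs)   # now track the values it was computed from
        if not needed:
            break
    return slice_indices


def rank_resources(trace: List[TraceEntry], crash_index: int, bad_reg: str):
    """Count how many instructions in the backward slice used each resource,
    most frequent first; the top entries are the suspect resources."""
    counts = Counter()
    for i in backward_slice(trace, crash_index, bad_reg):
        counts.update(trace[i].resources)
    return counts.most_common()


# Toy trace: the load at index 4 crashes on a bad address held in r4.
trace = [
    TraceEntry("r1", (), frozenset({"alu0", "rob3"})),       # r1 = constant
    TraceEntry("r2", ("r1",), frozenset({"alu1", "rob4"})),  # r2 = r1 + 8
    TraceEntry("r3", (), frozenset({"alu0", "rob5"})),       # unrelated to the crash
    TraceEntry("r4", ("r2",), frozenset({"alu1", "rob6"})),  # r4 = r2 << 2
    TraceEntry(None, ("r4",), frozenset({"lsu0", "rob7"})),  # load through r4 faults
]
print(backward_slice(trace, 4, "r4"))     # [3, 1, 0]
print(rank_resources(trace, 4, "r4")[0])  # ('alu1', 2)
```

In this toy trace, `alu1` is the only resource shared by two instructions in the faulty value's dependence chain, so it ranks first. The unrelated instruction at index 3 never enters the slice. Real diagnosis would need far more than a frequency count, so treat this only as an intuition for why per-instruction resource records make software backtracking possible.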

Rights

Attribution-NonCommercial-NoDerivs 2.5 Canada