
Hardware error detection in multicore parallel programs


Files in this item

File: ubc_2012_fall_wei_jiesheng.pdf
Size: 491.6Kb
Format: Adobe Portable Document Format
 
Title: Hardware error detection in multicore parallel programs
Author: Wei, Jiesheng
Degree: Master of Applied Science - MASc
Program: Electrical and Computer Engineering
Copyright Date: 2012
Publicly Available in cIRcle: 2012-08-17
Abstract: The scaling of silicon devices has exacerbated the unreliability of modern computer systems, and power constraints have necessitated the involvement of software in hardware error detection. Simultaneously, the multi-core revolution has impelled software to become parallel. Therefore, there is a compelling need to protect parallel programs from hardware errors. Because parallel programs are written with high-level programming models, their tasks have significant similarity in control data. In this thesis, we propose BlockWatch, which leverages the similarity in parallel programs’ control data to detect hardware errors. BlockWatch statically extracts the similarity among different threads of a parallel program and checks the similarity at runtime. We evaluate BlockWatch on eight SPLASH-2 benchmarks to measure its performance overhead and error detection coverage. We find that BlockWatch incurs an average overhead of 15% across all programs, and provides an average silent data corruption (SDC) coverage of 97% for faults in the control data.
URI: http://hdl.handle.net/2429/42961
Scholarly Level: Graduate
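
Illustration: The abstract describes checking, at runtime, a similarity among threads that was identified statically. The Python sketch below shows one way such a check could look. It is not taken from the thesis and is not BlockWatch's implementation. It assumes a single branch whose condition depends only on a value shared by all threads, so every fault-free thread should take the same direction. The names `BranchMonitor`, `worker`, `run`, the branch id `"b0"`, and the `flip` fault-injection flag are all hypothetical.

```python
import threading
from collections import defaultdict


class BranchMonitor:
    """Hypothetical runtime checker that records per-thread branch outcomes."""

    def __init__(self):
        self._lock = threading.Lock()
        # branch_id -> {thread_id: outcome (True if the branch was taken)}
        self._outcomes = defaultdict(dict)

    def report(self, branch_id, thread_id, taken):
        # Called by each thread when it executes an instrumented branch.
        with self._lock:
            self._outcomes[branch_id][thread_id] = bool(taken)

    def mismatches(self):
        # A branch is flagged when its threads disagree on the outcome.
        with self._lock:
            return sorted(
                branch_id
                for branch_id, per_thread in self._outcomes.items()
                if len(set(per_thread.values())) > 1
            )


def worker(monitor, thread_id, shared_limit, flip=False):
    # Assumption: branch "b0" depends only on shared_limit, which is the
    # same for every thread, so all outcomes should match.
    taken = shared_limit > 10
    if flip:
        # Simulate a hardware fault that corrupts the branch outcome.
        taken = not taken
    monitor.report("b0", thread_id, taken)


def run(n_threads, faulty_thread=None):
    # Start n_threads workers; optionally inject a fault into one of them.
    monitor = BranchMonitor()
    threads = [
        threading.Thread(target=worker, args=(monitor, t, 42, t == faulty_thread))
        for t in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return monitor.mismatches()
```

In this sketch, `run(4)` reports no mismatches, while `run(4, faulty_thread=2)` flags branch `"b0"` because one thread's outcome differs from the others. A real checker would also need the static analysis that decides which branches are expected to agree, which this sketch takes as given.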


All items in cIRcle are protected by copyright, with all rights reserved.
