A Software-Based Redundant Execution Programming Model for Transient Fault Detection and Correction
- Resource Type
- Conference
- Authors
- Chen, Yi-Shen; Chen, Peng-Sheng
- Source
- 2016 45th International Conference on Parallel Processing Workshops (ICPPW), pp. 66-71, Aug. 2016
- Subject
- Computing and Processing
Parallel processing
Conferences
multi-threading
reliability
transient fault
fault tolerance
compiler
- Language
- ISSN
- 2332-5690
Software reliability is becoming increasingly important as computer systems assume ever greater roles in everyday life. This paper proposes a software-based redundant execution programming model for transient fault detection and correction. A multi-threading technique handles thread-level redundant execution for fault detection, and majority voting is used to recover from errors. A watchdog thread copes with threads that stop responding. Preliminary experiments on benchmark programs show that the proposed programming model can detect errors caused by transient faults and that the majority voting strategy can correctly resume program execution. Applying the proposed model will improve programs' fault tolerance.
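
The abstract does not show the paper's implementation, so the following is only a minimal illustrative sketch of the three ideas it names: redundant execution in threads, majority voting, and a watchdog for unresponsive threads. It assumes Python threads, three replicas, and hashable results. All names (`run_redundant`, `checksum`, `hang_one`) are hypothetical, and the watchdog is simplified to a shared join deadline rather than a dedicated thread.

```python
import threading
import time
from collections import Counter


def run_redundant(task, args=(), replicas=3, timeout=1.0):
    """Run `task` in several threads and return the majority result.

    Hypothetical helper, not the paper's API. `task` receives the replica
    index first so that the examples below can inject a fault into one replica.
    """
    results = [None] * replicas
    finished = [False] * replicas

    def worker(i):
        results[i] = task(i, *args)
        finished[i] = True

    # Daemon threads, so a hung replica cannot keep the process alive.
    threads = [threading.Thread(target=worker, args=(i,), daemon=True)
               for i in range(replicas)]
    for t in threads:
        t.start()

    # Simplified watchdog: all replicas share one deadline; any replica
    # that has not finished by then is treated as unresponsive.
    deadline = time.monotonic() + timeout
    for t in threads:
        t.join(max(0.0, deadline - time.monotonic()))

    votes = Counter(results[i] for i in range(replicas) if finished[i])
    if not votes:
        raise RuntimeError("no replica responded before the deadline")
    value, count = votes.most_common(1)[0]
    if count <= replicas // 2:
        raise RuntimeError("no majority among replica results")
    return value


def checksum(replica, data):
    """Sum `data`; replica 1 simulates a transient single-bit flip."""
    total = sum(data)
    if replica == 1:
        total ^= 1 << 4  # injected fault, for illustration only
    return total


def hang_one(replica, data):
    """Sum `data`; replica 2 never returns, simulating a hung thread."""
    if replica == 2:
        threading.Event().wait()  # blocks forever
    return sum(data)
```

Under these assumptions, `run_redundant(checksum, ([1, 2, 3],))` returns 6 because two of the three replicas agree, and `run_redundant(hang_one, ([1, 2, 3],), timeout=0.2)` also returns 6 once the deadline expires and the hung replica is excluded from the vote.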