Title

Efficient Transient-Fault Tolerance For Multithreaded Processors Using Dual-Thread Execution

Keywords

Fault tolerance; Microprocessors; Multi-threaded architectures; Redundant systems

Abstract

Reliability has become a key issue in computer system design as microprocessors are increasingly susceptible to transient faults. Many previously proposed schemes exploit simultaneous multithreaded (SMT) architectures to achieve transient-fault tolerance by running a program concurrently on two threads, a main thread and a redundant checker thread. Such schemes, however, often incur high performance overheads due to resource contention and redundancy checking. In this paper, we propose dual-thread execution (DTE) for SMT processors to efficiently achieve transient-fault tolerance. DTE is derived from the recently proposed fault-tolerant dual-core execution (FTDCE) paradigm, in which two processor cores on a single chip perform redundant execution to improve both reliability and performance. We apply the same principles as in FTDCE to SMT architectures and explore fetch policies to address the critical resource-sharing issue in SMT architectures. Our experimental results show that DTE achieves an average speedup of 56.1% over the previously proposed simultaneously and redundantly threaded processor with recovery (SRTR). More impressively, even compared to single-thread execution, DTE achieves full-coverage transient-fault tolerance along with an average performance improvement of 15.5%. © 2006 IEEE.

Publication Date

12-1-2006

Publication Title

IEEE International Conference on Computer Design, ICCD 2006

Pages

120-126

Document Type

Article; Proceedings Paper

Personal Identifier

scopus

DOI Link

https://doi.org/10.1109/ICCD.2006.4380804

Scopus ID

49749109535 (Scopus)

Source API URL

https://api.elsevier.com/content/abstract/scopus_id/49749109535
