Keywords
Fault tolerance in Engineering, Field programmable gate arrays, Genetic algorithms, Markov processes, Redundancy in Engineering, reconfigurable computing, evolvable hardware, organic computing, self x properties, space missions
Abstract
As reconfigurable devices' capacities and the complexity of applications that use them increase, the need for self-reliance of deployed systems becomes increasingly prominent. A Sustainable Modular Adaptive Redundancy Technique (SMART) composed of a dual-layered organic system is proposed, analyzed, implemented, and experimentally evaluated. SMART relies upon a variety of self-regulating properties to control availability, energy consumption, and area used, in dynamically-changing environments that require high degree of adaptation. The hardware layer is implemented on a Xilinx Virtex-4 Field Programmable Gate Array (FPGA) to provide self-repair using a novel approach called a Reconfigurable Adaptive Redundancy System (RARS). The software layer supervises the organic activities within the FPGA and extends the self-healing capabilities through application-independent, intrinsic, evolutionary repair techniques to leverage the benefits of dynamic Partial Reconfiguration (PR). A SMART prototype is evaluated using a Sobel edge detection application. This prototype is shown to provide sustainability for stressful occurrences of transient and permanent fault injection procedures while still reducing energy consumption and area requirements. An Organic Genetic Algorithm (OGA) technique is shown capable of consistently repairing hard faults while maintaining correct edge detector outputs, by exploiting spatial redundancy in the reconfigurable hardware. A Monte Carlo driven Continuous Markov Time Chains (CTMC) simulation is conducted to compare SMART's availability to industry-standard Triple Modular Technique (TMR) techniques. Based on nine use cases, parameterized with realistic fault and repair rates acquired from publically available sources, the results indicate that availability is significantly enhanced by the adoption of fast repair techniques targeting aging-related hard-faults. Under harsh environments, SMART is shown to improve system availability from 36.02% with lengthy repair techniques to 98.84% with fast ones. This value increases to "five nines" (99.9998%) under relatively more favorable conditions. Lastly, SMART is compared to twenty eight standard TMR benchmarks that are generated by the widely-accepted BL-TMR tools. Results show that in seven out of nine use cases, SMART is the recommended technique, with power savings ranging from 22% to 29%, and area savings ranging from 17% to 24%, while still maintaining the same level of availability.
Notes
If this is your thesis or dissertation, and want to learn how to access it or for more information about readership statistics, contact us at STARS@ucf.edu
Graduation Date
2011
Semester
Summer
Advisor
DeMara, Ronald F.
Degree
Doctor of Philosophy (Ph.D.)
College
College of Engineering and Computer Science
Department
Electrical Engineering and Computer Science
Format
application/pdf
Identifier
CFE0003993
URL
http://purl.fcla.edu/fcla/etd/CFE0003993
Language
English
Release Date
August 2011
Length of Campus-only Access
None
Access Status
Doctoral Dissertation (Open Access)
STARS Citation
Al-Haddad, Rawad N., "An Adaptive Modular Redundancy Technique to Self-regulate Availability, Area, and Energy Consumption in Mission-critical Applications" (2011). Electronic Theses and Dissertations. 6656.
https://stars.library.ucf.edu/etd/6656