Reconfigurable logic devices, or Field-Programmable Gate Arrays (FPGAs), can dynamically adapt their computational circuits to user-specified or operating-condition requirements. This dissertation uses such hardware platforms to develop adaptive techniques that achieve reliable and sustainable operation while autonomously meeting these requirements. In particular, the properties of resource uniformity and in-field reconfiguration via on-chip processors are exploited to implement Evolvable Hardware (EHW). EHW utilizes genetic algorithms to realize logic circuits at runtime, as directed by an objective function. However, the problems solved using EHW have been limited to relatively compact circuits compared with traditional approaches, because the complexity of the genetic algorithm grows with circuit size. To address this scalability challenge, the Netlist-Driven Evolutionary Refurbishment (NDER) technique was designed and implemented herein to enable on-the-fly mitigation of permanent faults in FPGA circuits. NDER has been shown to refurbish relatively large benchmark circuits compared with related works. Additionally, Design Diversity (DD) techniques that aid such evolutionary refurbishment are proposed, and the efficacy of various DD techniques is quantified and evaluated.
Similarly, there is a growing need for adaptable logic datapaths in custom-designed nanometer-scale ICs to ensure operational reliability in the presence of Process, Voltage, and Temperature (PVT) variations and transistor-aging variations, owing to the decreased feature sizes of electronic devices. Without such adaptability, excessive design guardbands are required to maintain the desired integration and performance levels. To address these challenges, the circuit-level technique of Self-Recovery Enabled Logic (SREL) was designed herein. At design time, vulnerable portions of the circuit, identified using conventional Electronic Design Automation tools, are replicated to provide post-fabrication adaptability via intelligent techniques. In-situ timing sensors are utilized in a feedback loop to activate the datapaths that optimize performance and energy consumption under current conditions. Primarily, SREL mitigates the timing degradation caused by transistor aging in sub-micron devices by using power gating to reduce the stress induced on active elements. As a result, fewer guardbands are needed to achieve comparable performance levels, which leads to considerable energy savings over the operational lifetime.
The need for energy-efficient operation in current computing systems has given rise to Near-Threshold Computing, as opposed to the conventional approach of operating devices at nominal voltage. In particular, the goal of the exascale computing initiative in High Performance Computing (HPC) is to achieve 1 EFLOPS within a power budget of 20 MW. However, operating at lower voltages comes at the cost of increased reliability concerns, such as greater performance variation and more soft errors. This has raised the resiliency requirements for HPC applications, which must remain functional within given error thresholds while operating at lower voltages. This dissertation research devised techniques and tools to quantify the effects of radiation-induced transient faults in distributed applications on large-scale systems. A combination of compiler-level code transformation and instrumentation is employed for runtime monitoring to assess the speed and depth of application state corruption resulting from fault injection. Finally, fault propagation models are derived for each HPC application; these can be used to estimate the number of corrupted memory locations at runtime. Additionally, the tradeoffs between performance and vulnerability, and the causal relations between compiler optimization and application vulnerability, are investigated.
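As an illustrative aside, the idea of a genetic algorithm realizing a logic circuit under the direction of an objective function can be sketched in a few lines of Python. This is a toy example and not the NDER technique described in the abstract: it assumes the "circuit" is a single look-up table (LUT) encoded as a list of truth-table bits, and the names `fitness` and `evolve_lut`, together with the population size, mutation rate, and generation limit, are hypothetical choices made only for illustration.

```python
import random


def fitness(lut, target):
    """Objective function: number of truth-table rows that match the target."""
    return sum(a == b for a, b in zip(lut, target))


def evolve_lut(target, pop_size=20, mutation_rate=0.1,
               max_generations=500, seed=0):
    """Evolve a LUT bit-vector toward `target` (toy sketch, hypothetical API).

    Returns (best_candidate, generation_at_which_the_search_stopped).
    """
    rng = random.Random(seed)  # seeded so that runs are repeatable
    n = len(target)
    population = [[rng.randint(0, 1) for _ in range(n)]
                  for _ in range(pop_size)]

    for generation in range(max_generations):
        # Rank candidates by the objective function, best first.
        population.sort(key=lambda lut: fitness(lut, target), reverse=True)
        if fitness(population[0], target) == n:
            return population[0], generation

        # Elitism: the better half survives unchanged as parents.
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)          # single-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit independently with probability mutation_rate.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        population = parents + children

    population.sort(key=lambda lut: fitness(lut, target), reverse=True)
    return population[0], max_generations


if __name__ == "__main__":
    # Target behavior: 4-input odd parity (16 truth-table rows).
    parity = [bin(i).count("1") % 2 for i in range(16)]
    best, generations = evolve_lut(parity)
    print(generations, fitness(best, parity), "of", len(parity))
```

Even in this toy setting the search space doubles with every added truth-table row, which gives some intuition for the scalability limitation the abstract attributes to EHW on larger circuits.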
Doctor of Philosophy (Ph.D.)
College of Engineering and Computer Science
Electrical Engineering and Computer Engineering
Doctoral Dissertation (Open Access)
Ashraf, Rizwan Arshad, "Adaptive Architectural Strategies for Resilient Energy-Aware Computing" (2015). Electronic Theses and Dissertations. 5009.