Computer Architecture
A. Teymouri; H. Dorosti; M. Ersali Salehi Nasab; S.M. Fakhraie
Abstract
Background and Objectives: The growing demands of multimedia and signal processing applications force IC designers to use efficient high-performance techniques in increasingly complex SoCs, achieving higher computing throughput while also improving energy/power efficiency. In recent technologies, variation effects and leakage power strongly affect design specifications, and designers must consider these parameters at design time. Addressing both challenges while also boosting computation throughput makes the design more difficult.
Methods: In this article, we propose a simple serial core for higher energy/power efficiency and use data-level parallel structures to achieve the required computation throughput.
Results: The proposed core yields a 35% (75%) energy (power) improvement, and the parallel structure provides 8x higher throughput. The proposed architecture delivers 76 MIPS of computation throughput while consuming only 2.7 pJ per instruction. The outstanding feature of this processor is its resilience against variation effects.
Conclusion: The simple serial architecture reduces the effect of variations on design paths; furthermore, the effect of process variation on throughput loss and energy dissipation is negligible. The proposed processor architecture is suitable for energy/power-constrained applications such as the Internet of Things (IoT) and mobile devices, where it eases energy harvesting for a longer lifetime.
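As a rough illustration of how the two reported figures relate, average power can be estimated as throughput times energy per instruction. The sketch below assumes that the 76 MIPS and 2.7 pJ per instruction figures describe the same operating point; the resulting power value is derived here and is not a number reported in the abstract. The function name is hypothetical.

```python
def average_power_watts(mips: float, energy_per_instruction_pj: float) -> float:
    """Estimate average power as (instructions per second) x (joules per instruction).

    Assumption: both inputs describe the same operating point.
    """
    instructions_per_second = mips * 1e6
    joules_per_instruction = energy_per_instruction_pj * 1e-12
    return instructions_per_second * joules_per_instruction


# Figures taken from the abstract: 76 MIPS at 2.7 pJ per instruction.
power_w = average_power_watts(76, 2.7)
print(f"{power_w * 1e6:.1f} uW")  # prints "205.2 uW"
```

Under that assumption, the implied power is on the order of a couple of hundred microwatts, which is consistent with the energy-harvesting use case the abstract mentions.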
Computer Architecture
H. Dorosti
Abstract
Background and Objectives: With the fast growth of the low-power Internet of Things, power/energy and performance constraints have become more challenging at both design time and operation time. Static and dynamic variations worsen the situation in terms of reliability, performance, and energy consumption. In this work, a novel slack measurement circuit is proposed to enable precise frequency management based on timing violation measurement.
Methods: The proposed slack measurement circuit measures the delay difference between the clock edge and a possible transition on the path end-points (primary outputs of the design). The output of the proposed slack monitoring circuit is a digital code that reflects the current state of the target critical path delay. Converting this digital code to the equivalent delay difference requires the delay of a reference gate, which is the basic unit of the proposed monitor. This monitor enables more precise and efficient frequency management while maintaining correct functionality in low-power mode.
Results: Applying this method to a MIPS processor reduces the performance penalty and recovery energy overhead by up to 30% with only 2% additional hardware. Results for benchmark applications in low-power mode show a 7-30% power improvement in normal execution mode. If the application is resilient against errors due to timing violations, the proposed method achieves a 20-60% power reduction using approximate computation, as long as the application remains resilient. The performance of the proposed method depends on the degree of application resilience against timing errors. To keep the proposed monitor general across different applications, the resilience threshold is user-programmable so that it can be configured according to the requirements of each application.
Conclusion: The results show that precise frequency scheduling is more energy/power efficient in static and dynamic variation management. A monitor capable of measuring the amount of violation enables finer frequency management. In addition, this method helps exploit the resilience of the application by estimating the possible error value from the measured violation amount.
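The code-to-delay conversion and the programmable resilience threshold described in this abstract can be sketched in software. This is a minimal illustration, not the authors' circuit or control policy: it assumes the digital code counts reference-gate delays linearly and represents the measured violation amount. The class name, the action labels, and the numeric values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SlackMonitorModel:
    """Toy software model of a slack monitor readout (hypothetical names)."""

    reference_gate_delay_ps: float  # assumed known from characterization
    resilience_threshold_ps: float  # user-programmable, per application

    def code_to_delay_ps(self, code: int) -> float:
        # Assumption: the digital code is a linear count of reference-gate delays.
        return code * self.reference_gate_delay_ps

    def action(self, code: int) -> str:
        violation_ps = self.code_to_delay_ps(code)
        if violation_ps == 0:
            return "keep_frequency"
        if violation_ps <= self.resilience_threshold_ps:
            # The application is assumed to tolerate small timing errors.
            return "tolerate"
        return "reduce_frequency"


# Illustrative values only: 20 ps reference gate, 50 ps tolerated violation.
monitor = SlackMonitorModel(reference_gate_delay_ps=20.0, resilience_threshold_ps=50.0)
for code in (0, 2, 3):
    print(code, monitor.code_to_delay_ps(code), monitor.action(code))
```

The point of the sketch is the design choice the abstract argues for: knowing the size of a violation, rather than only that one occurred, lets the controller distinguish errors an application can absorb from those that require a frequency change.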
Digital Design
B. Soltani Farani; H. Dorosti; M. E. Salehi; Si M. Fakhraie
Abstract
Background and Objectives: Digital signal processors are widely used in energy-constrained applications in which battery lifetime is critical, so designing ultra-low-energy processors is a major concern. In this work, as a first step, we propose a sub-threshold DSP processor.
Methods: As our baseline architecture, we use a modified version of an existing ultra-low-power general-purpose processor. We then add new instructions to the processor's instruction set to better suit signal processing applications. In the second step, we use the proposed processor as a simple basic core in a many-core architecture built from sub-threshold cores.
Results: In comparison with the baseline architecture, these modifications reduce the program memory size by about 42% on average. In addition, data memory accesses are reduced by about 60% on average, and more than 90% speed-up is achieved. Given the improvements in total execution time (93%) and power consumption (27%), the total consumed energy is reduced by about 95% on average, with at most 2.6% area overhead and without increasing the effects of process variation on processor specifications.
Conclusion: The results show that for parallel applications, such as the FFT in the LTE standard, exploiting sub-threshold processors in a many-core architecture not only satisfies the required performance but also reduces power consumption by about 50% or more.
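The reported energy figure follows from the time and power figures if energy is taken as average power times execution time. The short check below makes that arithmetic explicit; it assumes the 93% and 27% values are relative reductions against the same baseline, and the function name is hypothetical.

```python
def energy_reduction(time_reduction: float, power_reduction: float) -> float:
    """Relative energy reduction when energy = average power x execution time."""
    remaining_fraction = (1 - time_reduction) * (1 - power_reduction)
    return 1 - remaining_fraction


# Figures from the abstract: 93% less execution time, 27% less power.
reduction = energy_reduction(0.93, 0.27)
print(f"{reduction:.1%}")  # prints "94.9%"
```

The result, roughly 94.9%, agrees with the "about 95%" energy reduction stated in the abstract.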