Precision multiplier for aging-aware design with AHL in real time applications.
Digital multipliers are among the most critical arithmetic functional units in many applications, such as transforms and filtering. The throughput of these applications depends on the multipliers; if the multipliers are too slow, the performance of the entire circuit is reduced.
Furthermore, negative bias temperature instability (NBTI) occurs when a pMOS transistor is under negative bias ($V_{gs} = -V_{dd}$). Similarly, positive bias temperature instability (PBTI) occurs when an nMOS transistor is under positive bias. Compared with the NBTI effect, the PBTI effect is much smaller on oxide/polygate transistors and is therefore usually ignored.
Traditional methods to mitigate the aging effect include guard-banding and gate oversizing; however, these approaches can be very pessimistic and inefficient in area and power. To avoid this problem, many NBTI-aware methodologies have been proposed.
Traditional circuits use the critical path delay as the overall circuit clock cycle in order to perform correctly. However, the probability that the critical paths are activated is low. In most cases, the path delay is shorter than the critical path delay. For these noncritical paths, using the critical path delay as the overall cycle period results in significant timing waste. Hence, the variable-latency design was proposed to reduce the timing waste of traditional circuits.
In , a modified Booth multiplier is designed that provides flexible arithmetic capacity and a tradeoff between output precision and power consumption through the use of the SPST architecture. In , reversible logic is used to minimize the number of garbage outputs and reduce the delay; power dissipation is thus controlled by the reduced number of garbage values. The Braun multiplier removes the extra correction circuitry needed. However, the limitation of this technique is that it cannot stop the switching activity even if the bit coefficient is zero, so the power and area consumption is high. For more details, refer .
The critical paths are divided into two shorter paths that could be unequal and the clock cycle is set to the delay of the longer one. These designs were able to reduce the timing waste of traditional circuits to improve performance, but they did not consider the aging effect.
In this paper, we propose an aging-aware reliable multiplier design with an adaptive hold logic (AHL) circuit. The multiplier is based on the variable-latency technique and can adjust the AHL circuit to achieve reliable operation under the influence of NBTI and PBTI effects. Specifically, the contributions of this paper are summarized as follows:
1) A row-bypassing multiplier architecture with an AHL circuit. The AHL circuit can decide whether the input patterns require one or two cycles to ensure that there is minimum performance degradation after considerable aging occurs.
2) A comprehensive analysis and comparison of the multiplier's performance under different cycle periods to show the effectiveness of our proposed architecture.
3) An aging-aware reliable multiplier design method that is suitable for large multipliers. Our proposed architecture can be easily extended to large designs.
4) Experimental results for our proposed architecture with the precision multiplier, which reduces area and power and increases speed compared with the row-bypassing multiplier.
The paper is organized as follows. Section II introduces the background of the column-bypassing and row-bypassing multipliers. Section III details the aging-aware variable-latency multiplier based on the row-bypassing multiplier. The aging-aware design based on the precision multiplier is presented in Section IV. The experimental setup and results are presented in Section V. Section VI concludes this paper.
II. Background:
A. Column-Bypassing Multiplier:
A column-bypassing multiplier is an improvement on the normal array multiplier (AM). The AM is a fast parallel multiplier and is shown in Fig. 1. The multiplier array consists of (n - 1) rows of carry save adders (CSAs), in which each row contains (n - 1) full adder (FA) cells. Each FA in the CSA array has two outputs: 1) the sum bit goes down and 2) the carry bit goes to the lower left FA. The last row is a ripple adder for carry propagation.
[FIGURE 1 OMITTED]
A carry save adder (CSA) is a type of digital adder used to compute the sum of three or more n-bit binary numbers. It differs from other digital adders in that it outputs two numbers of the same dimensions as the inputs: one is a sequence of partial sum bits and the other is a sequence of carry bits.
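The carry-save step can be illustrated with a small sketch. The function name `carry_save_add` and the use of Python integers as bit vectors are assumptions made for illustration only; they are not part of the design described in this paper.

```python
def carry_save_add(x, y, z):
    """Reduce three non-negative integers to a (partial_sum, carry) pair.

    Each bit position is handled by an independent full adder, so no
    carry ripples between positions inside this step.
    """
    partial_sum = x ^ y ^ z                      # per-bit sum output
    carry = ((x & y) | (y & z) | (x & z)) << 1   # per-bit carry, one position higher
    return partial_sum, carry


# A final carry-propagate addition merges the two output vectors.
s, c = carry_save_add(0b1001, 0b1111, 0b0110)
total = s + c   # equals 0b1001 + 0b1111 + 0b0110
```

Only the last addition needs carry propagation, which is why the array multiplier ends with a single ripple adder row.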
Fig. 2 shows a 4 x 4 column-bypassing multiplier. Supposing the inputs are $1001_2 \times 1111_2$, it can be seen that for the FAs in the second and third diagonals, two of the three input bits are 0. Therefore, the carry output of the adders in both diagonals is 0, and the output sum bit is simply equal to the third input bit, which is the sum output of its upper FA.
Hence, the FA is modified to add two tristate gates and one multiplexer. The multiplicand bit $a_i$ can be used as the selector of the multiplexer to decide the output of the FA, and $a_i$ can also be used as the selector of the tristate gate to turn off the input path of the FA. If $a_i$ is 0, the inputs of the FA are disabled, and the sum bit of the current FA is equal to the sum bit from its upper FA, thus reducing the power consumption of the multiplier. If $a_i$ is 1, the normal sum result is selected. More details for the column-bypassing multiplier can be found in .
[FIGURE 2 OMITTED]
B. Row-Bypassing Multiplier:
A low-power row-bypassing multiplier reduces the power of the AM. The operation of the row-bypassing multiplier is similar to that of the column-bypassing multiplier, but the selector of the multiplexers and the tristate gates use the multiplicator.
[FIGURE 3 OMITTED]
Fig. 3 is a 4 x 4 row-bypassing multiplier. Each input is connected to an FA through a tristate gate. When the inputs are $1111_2 \times 1001_2$, the two inputs in the first and second rows are 0 for the FAs. Because $b_1$ is 0, the multiplexers in the first row select $a_i b_0$ as the sum bit and select 0 as the carry bit. The inputs are bypassed to the FAs in the second row, and the tristate gates turn off the input paths to the FAs. Therefore, no switching activity occurs in the first-row FAs, and power consumption is reduced. Similarly, because $b_2$ is 0, no switching activity occurs in the second-row FAs. However, the FAs must be active in the third row because $b_3$ is not zero. More details for the row-bypassing multiplier can also be found in .
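As a rough illustration of which rows are switched off for a given multiplicator, the following sketch lists the bypassed CSA rows. It assumes that row j (j = 1 to n - 1) is gated by multiplicator bit b_j, which matches the pairing of the second row with b2 and the third row with b3 in the example; the function name `bypassed_rows` is hypothetical.

```python
def bypassed_rows(multiplicator, n):
    """Return the 1-based CSA rows whose full adders are bypassed.

    Assumption: row j (j = 1 .. n-1) is gated by multiplicator bit b_j,
    and a row is bypassed when that bit is 0.
    """
    return [j for j in range(1, n) if ((multiplicator >> j) & 1) == 0]


# Multiplicator 1001: b1 = b2 = 0 and b3 = 1.
rows = bypassed_rows(0b1001, 4)
```

For the 4 x 4 example, only the third row remains active, so the first two rows contribute no switching activity.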
III. Aging-Aware Of Row Bypassing Multiplier:
This section details the aging-aware reliable multiplier design. It introduces the overall architecture and the functions of each component, and also describes how to design the AHL circuit that adjusts the circuit when significant aging occurs.
Row Bypassing Multiplier Architecture:
Fig. 4 shows the aging-aware design using the row-bypassing multiplier architecture, which includes two m-bit inputs (m is a positive integer), one 2m-bit output, one column- or row-bypassing multiplier, 2m 1-bit Razor flip-flops, and an AHL circuit. The inputs of the row-bypassing multiplier are the symbols in the parentheses. Razor flip-flops can be used to detect whether timing violations occur before the next input pattern arrives.
Fig. 5 shows the details of the Razor flip-flops. The main idea of Razor is to tune the supply voltage by monitoring the error rate during operation. This approach to dynamic voltage scaling (DVS) is based on dynamic detection and correction of speed path failures in digital designs. A shadow latch, controlled by a delayed clock, augments each flip-flop in the design.
In a given clock cycle, if the combinational logic (here, the multiplier) meets the setup time of the main flip-flop for the clock's rising edge, then both the main flip-flop and the shadow latch latch the correct data. In this case, the error signal at the XOR gate's output remains low, leaving the pipeline's operation unaltered. If the multiplier does not complete its computation in time, the main flip-flop latches an incorrect value, while the shadow latch latches the late-arriving correct value. The error signal then goes high, prompting restoration of the correct value from the shadow latch into the main flip-flop, and the correct value becomes available at the output stage.
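The detection and restoration behaviour described above can be sketched as a simple behavioural model. This is not the circuit itself: setup and hold times are ignored, the data is assumed to settle before the delayed clock edge, and the names `razor_sample`, `arrival_time`, and `shadow_delay` are hypothetical.

```python
def razor_sample(arrival_time, clock_period, shadow_delay, new_value, old_value):
    """Behavioural model of one 1-bit Razor flip-flop for a single cycle.

    Assumptions: setup/hold times are ignored, and the data settles
    before the delayed clock edge (clock_period + shadow_delay), so the
    shadow latch always captures the correct value.
    """
    assert arrival_time <= clock_period + shadow_delay
    # The main flip-flop only sees the new value if it arrived in time.
    main_q = new_value if arrival_time <= clock_period else old_value
    shadow_q = new_value
    error = main_q ^ shadow_q                    # XOR of main and shadow outputs
    restored_q = shadow_q if error else main_q   # value after restoration
    return main_q, error, restored_q


on_time = razor_sample(0.8, 1.0, 0.5, new_value=1, old_value=0)
late = razor_sample(1.2, 1.0, 0.5, new_value=1, old_value=0)
```

In the late case the main flip-flop holds the stale value, the error flag is raised, and the restored output equals the shadow latch value.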
If an error occurs in the multiplier during a particular clock cycle, the data at the output stage during the following clock cycle is incorrect and must be flushed from the pipeline. However, because the shadow latch contains the correct output data from the multiplier, the instruction need not re-execute through this failing stage. Thus, a key feature of Razor is that if an instruction fails in a particular pipeline stage, it re-executes through the following pipeline stage while incurring a one-cycle penalty. If the number of errors exceeds a predefined threshold, the aging indicator in the AHL is asserted. For more details of Razor flip-flops, refer .
[FIGURE 4 OMITTED]
[FIGURE 5 OMITTED]
The AHL circuit is the key component in the aging-aware variable-latency multiplier. Fig. 6 shows the details of the AHL circuit.
[FIGURE 6 OMITTED]
The AHL circuit contains an aging indicator, two judging blocks, one mux, and one D flip-flop. The aging indicator indicates whether the circuit has suffered significant performance degradation due to the aging effect. It is implemented as a simple counter that counts the number of errors over a certain number of operations and is reset to zero at the end of those operations. If the cycle period is too short, the column- or row-bypassing multiplier is not able to complete these operations successfully, causing timing violations. These timing violations are caught by the Razor flip-flops, which generate error signals. If errors happen frequently and exceed a predefined threshold, the circuit has suffered significant timing degradation due to the aging effect, and the aging indicator outputs 1; otherwise, it outputs 0 to indicate that the aging effect is still not significant and no action is needed.
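A minimal sketch of such a counter-based aging indicator is given below. The class name `AgingIndicator`, the parameters `window` and `threshold`, and the choice to update the output only at the end of each window are assumptions for illustration; the paper does not specify these details.

```python
class AgingIndicator:
    """Counts Razor errors over a fixed window of operations.

    Assumptions: `window` and `threshold` are illustrative parameters,
    and the output is updated only when a window ends.
    """

    def __init__(self, window, threshold):
        self.window = window
        self.threshold = threshold
        self.ops = 0
        self.errors = 0
        self.output = 0

    def record(self, error):
        """Register one finished operation and return the current output."""
        self.ops += 1
        self.errors += 1 if error else 0
        if self.ops == self.window:
            # Compare against the threshold, then reset for the next window.
            self.output = 1 if self.errors > self.threshold else 0
            self.ops = 0
            self.errors = 0
        return self.output


indicator = AgingIndicator(window=4, threshold=1)
outputs = [indicator.record(e) for e in [1, 0, 1, 0, 0, 0, 0, 0]]
```

In this run the first window contains two errors, which exceeds the threshold of one, so the output rises to 1 at the end of that window and falls back to 0 after an error-free window.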
The first judging block in the AHL circuit outputs 1 if the number of zeros in the multiplicand (multiplicator for the row-bypassing multiplier) is larger than n (n is a positive number, which will be discussed in Section IV), and the second judging block outputs 1 if the number of zeros in the multiplicand (multiplicator) is larger than n + 1. Both are employed to decide whether an input pattern requires one or two cycles, but only one of them is chosen at a time. In the beginning, the aging effect is not significant and the aging indicator produces 0, so the first judging block is used. After a period of time, when the aging effect becomes significant, the second judging block is chosen. Compared with the first judging block, the second judging block allows a smaller number of patterns to become one-cycle patterns because it requires more zeros in the multiplicand (multiplicator). When the output of the multiplexer is 1, which means the input pattern requires one cycle, the !(gating) signal is 1 and the input flip-flops latch new data in the next cycle. On the other hand, when the output of the multiplexer is 0, which means the input pattern requires two cycles to complete, the OR gate outputs 0 to the D flip-flop. Therefore, the !(gating) signal is 0 to disable the clock signal of the input flip-flops in the next cycle. Note that only one cycle of the input flip-flops is disabled because the D flip-flop latches 1 in the next cycle. For more details, refer .
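The selection between the two judging blocks can be sketched as follows. The function names `count_zeros` and `is_one_cycle` are hypothetical, and the sketch covers only the zero-counting comparison and the multiplexer; the OR gate and D flip-flop that generate the gating signal are not modelled.

```python
def count_zeros(operand, width):
    """Number of 0 bits in the `width` least significant bits of operand."""
    return width - bin(operand & ((1 << width) - 1)).count("1")


def is_one_cycle(operand, width, n, aging_indicator):
    """Return 1 if the pattern is predicted to finish in one cycle.

    The first judging block requires more than n zeros; the second
    requires more than n + 1. The aging indicator selects between them.
    """
    zeros = count_zeros(operand, width)
    first_block = 1 if zeros > n else 0
    second_block = 1 if zeros > n + 1 else 0
    return second_block if aging_indicator else first_block


fresh = is_one_cycle(0b1001, width=4, n=1, aging_indicator=0)
aged = is_one_cycle(0b1001, width=4, n=1, aging_indicator=1)
```

With n = 1, the operand 1001 has two zeros, so it is treated as a one-cycle pattern before aging but as a two-cycle pattern once the aging indicator is set.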
The overall flow of our proposed architecture is as follows: when input patterns arrive, the column- or row-bypassing multiplier and the AHL circuit execute simultaneously. According to the number of zeros in the multiplicand (multiplicator), the AHL circuit decides whether the input patterns require one or two cycles. If the input pattern requires two cycles to complete, the AHL outputs 0 to disable the clock signal of the flip-flops. Otherwise, the AHL outputs 1 for normal operation. When the column- or row-bypassing multiplier finishes the operation, the result is passed to the Razor flip-flops. The Razor flip-flops check whether there is a path delay timing violation. If a timing violation occurs, the cycle period is not long enough for the current operation to complete, and the execution result of the multiplier is incorrect. Thus, the Razor flip-flops output an error to inform the system that the current operation needs to be reexecuted using two cycles to ensure that the operation is correct. In this situation, the extra reexecution cycles caused by the timing violation incur a penalty on the overall average latency. However, our proposed AHL circuit can accurately predict whether the input patterns require one or two cycles in most cases. Only a few input patterns may cause a timing violation when the AHL circuit judges incorrectly. In this case, the extra reexecution cycles do not produce significant timing degradation.
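To make the latency accounting concrete, the sketch below computes the average latency over a stream of operations. It assumes that a mispredicted one-cycle pattern spends its failed cycle and is then reexecuted in two cycles, for three cycles in total; this accounting, and the names `cycles_for_operation` and `average_latency`, are assumptions rather than figures from the paper.

```python
def cycles_for_operation(predicted_one_cycle, timing_violation):
    """Cycles consumed by one multiplication.

    Assumption: a mispredicted one-cycle pattern spends its failed cycle
    and is then reexecuted using two cycles, three cycles in total.
    """
    if not predicted_one_cycle:
        return 2
    return 3 if timing_violation else 1


def average_latency(operations, cycle_period):
    """Average latency over (predicted_one_cycle, timing_violation) pairs."""
    total_cycles = sum(cycles_for_operation(p, v) for p, v in operations)
    return total_cycles * cycle_period / len(operations)


# Two clean one-cycle patterns, one two-cycle pattern, one misprediction.
stream = [(True, False), (True, False), (False, False), (True, True)]
latency = average_latency(stream, cycle_period=1.0)
```

The example shows why occasional mispredictions matter little: a single reexecution among four operations raises the total from five to seven cycles, while a fixed-latency design clocked at two cycles per operation would need eight.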
In summary, our proposed multiplier design has three key features. First, it is a variable-latency design that minimizes the timing waste of the noncritical paths. Second, it can provide reliable operations even after the aging effect occurs. The Razor flip-flops detect the timing violations and reexecute the operations using two cycles. Finally, our architecture can adjust the percentage of one-cycle patterns to minimize performance degradation due to the aging effect. When the circuit is aged, and many errors occur, the AHL circuit uses the second judging block to decide if an input is one cycle or two cycles.
IV. Proposed Method:
Fig. 7 shows the proposed method of aging-aware design. The multiplicand and multiplicator are given as inputs to both the precision multiplier and the shadow copy of the precision multiplier. The operation is performed in both multipliers, and the results are compared in the comparator. If the results are the same, the result is output as the product; otherwise, the erroneous outputs are fed back as inputs, and the process continues until the correct product is produced.
[FIGURE 7 OMITTED]
V. Experimental Result:
Our experiments were conducted on the Windows 8 operating system. The proposed multiplier was designed in Verilog using Xilinx software. The inputs enter the row/column-bypassing multiplier and then the Razor flip-flops. If the operation is correct, the output is produced at the product side; if it is wrong, the error enters the AHL. The process then continues, and the required output is produced at the product side.
1) Simulation for the row bypassing multiplier architecture:
[FIGURE 8 OMITTED]
2) Simulation for Precision multiplier:
[FIGURE 9 OMITTED]
3) Usage of Components in the architecture:
Table 1: Number of components used in Aging-Aware Design

Row Bypassing Multiplier
Logic Utilization             Used   Available   Utilization
Number of Slice LUTs            29        2448            1%
Number of Slice Flip Flops      17        4896            0%
Number of Bonded IOBs           20         158           12%

Precision Multiplier
Logic Utilization             Used   Available   Utilization
Number of Slice LUTs            25       63400            0%
Number of Slice Flip Flops       0          25            0%
Number of Bonded IOBs           31         210           14%
VI. Conclusion:
The multiplier is able to adjust the AHL to mitigate performance degradation due to increased delay. The experimental results show that the existing architecture for 4 x 4 multiplication, with a carry look-ahead adder (CLA) as the last stage instead of the normal ripple-carry adder (RCA), decreases the delay and improves the performance compared with previous designs.
Electromigration occurs when the current density is high enough to cause the drift of metal ions along the direction of electron flow. The metal atoms are gradually displaced over time, and the geometry of the wires changes. If a wire becomes narrower, its resistance and delay increase, and in the end, electromigration may lead to open circuits. This issue is more serious in advanced process technology because metal wires are narrower, and changes in the wire width cause larger resistance differences. If the aging effects caused by the BTI effect and electromigration are considered together, the delay and performance degradation will be more significant.
Our proposed method consists of the aging-aware design with the precision multiplier. Compared with the row-bypassing multiplier, the precision multiplier gives better performance in terms of reduced area and power. It also increases the speed.
[1.] Chinababu Vanama, M. Sumalatha, 2013. "Implementation of High Speed Modified Booth Multiplier and Accumulator (Mac) Unit", in proc. IOSR Journal of Electronics and Communication Engineering (IOSRJECE).
[2.] Vinod Kumar Jigalur, S.P. Meharunnis, 2015. "Efficient reversible multiplier using column bypass technique for dsp applications", in proc. International Journal of Engineering Research and General Science.
[3.] Nithya, J., G. Sathiyabama, K. Revathi, 2015. "Comparative Study of Low Power Low Area Bypass Multipliers for Signal Processing Applications", in proc. International Journal of Engineering Research and Applications.
[4.] Wen, M.-C., S.-J. Wang and Y.-N. Lin, 2005. "Low power parallel multiplier with column bypassing," in Proc. IEEE ISCAS.
[5.] Ohban, J., V.G. Moshnyaga and K. Inoue, 2002. "Multiplier energy reduction through bypassing of partial products," in Proc. APCCAS.
[6.] Yu-Shih Su, Da-Chung Wang, Shih-Chieh Chang and Malgorzata Marek-Sadowska, 2010. "Performance Optimization Using Variable-Latency Design Style", in Proc. IEEE Transaction in Very Large Scale Integration System.
[7.] Kai Du, Peter Varman, Kartik Mohanram, 2008. "High Performance Reliable Variable Latency Carry Select Addition", in Proc. IEEE International Journal of Electronics and Communication Engineering.
[8.] Mehmet Basoglu, Michael Orshansky, Mattan Erez, 2010. "NBTI-Aware DVFS: A New Approach to Saving Energy and Increasing Processor Lifetime", in proc. ISLPED.
[9.] Ernst, D. et al., 2003. "Razor: A low-power pipeline based on circuit-level timing speculation," in Proc. IEEE/ACM MICRO.
[10.] CH.D. Vishnu Priya, C. Srijana Devi, 2014. "Design and Implementation of Aging-Aware Reliable Multiplier by Using Carry Look-Ahead Adder", in proc. International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering.
(1) Abinaya. L, (2) Kalaichelvi. K, (3) Raja.S
(1) PG Scholar, Dept of ECE, VSB Engineering College, Karur
(2,3) Assistant Professor, Dept of ECE, VSB Engineering College, Karur
Received 25 January 2016; Accepted 18 April 2016; Available 28 April 2016
Address For Correspondence:
Abinaya. L, PG Scholar, Dept of ECE, VSB Engineering College, E-mail: email@example.com
|Title Annotation:||adaptive hold logic|
|Author:||Abinaya, L.; Kalaichelvi, K; Raja, S.|
|Publication:||Advances in Natural and Applied Sciences|
|Date:||Apr 1, 2016|