# A modified implementation of tristate inverter based static master-slave flip-flop with improved power-delay-area product.

1. Introduction

Flip-flops are the key elements used in sequential digital systems. The appropriate selection of flip-flop topologies is instrumental in the design of VLSI integrated circuits such as microprocessors, microcontrollers, and other high complexity chips. However, factors such as high performance, low power, transistor count, clock load, design robustness, power-delay, and power-area tradeoffs are generally considered before choosing a particular flip-flop design. The highest operating frequency of clocked digital systems is determined by the flip-flops. Flip-flops and clock distribution network generally account for 30-70% of the total chip power consumption [1, 2]. Clock load is another major concern for digital system designers and several contributions have been reported in the past to reduce clock load and the associated power dissipation in the clocking network [3-5]. A design with elevated transistor count occupies a larger area on chip and leads to an increase in the overall manufacturing cost. Hence, design and implementation of low power high performance flip-flops with the least possible chip area is the main target of the modern chip manufacturing industry.

Flip-flops are broadly classified into three main categories, namely, master-slave [6-11], pulse-triggered [12-17], and differential flip-flops [18-21]. Among them, master-slave and pulse-triggered flip-flops are the most efficient in terms of power-delay product. Master-slave flip-flops exhibit positive (negative) set-up time (hold time) requirements and hence are not suitable for high speed systems due to extended data-to-output delays. However, they are power efficient and can be used in low power applications. Their main limitation is lower robustness to clock skew. Pulse-triggered flip-flops have negative set-up time and thus lead to smaller data-to-output delay. They exhibit an inherent soft clock edge property which minimizes clock skew related cycle time loss.

A classification of master-slave flip-flops is further elaborated in Figure 1. Clock-gated topologies exhibit internal clock gating to suppress the power consumption at lower data switching activities based on a clock gating logic and a comparator circuit. Clock-gated structures generally consume less power at low switching activities [22]. However, they have extended latency due to increased clock-to-output delays along with increased chip area overhead. TGFF represents the best choice in the nonclock gated flip-flop category in terms of power-delay product [6], whereas existence of NMOS transistors in the critical path along with partially nongated keepers leads to less significant power-delay tradeoff characteristics in case of write port master-slave flip-flop (WPMS) [7, 8] and pass transistor logic based flip-flop (PTLFF) [9].

In this paper, we introduce an alternative design approach for designing [C.sup.2]MOS based master-slave flip-flop, based on a new architecture with reduced transistor count and improved power-delay-area product. The proposed configurations m[C.sup.2]MOSff1 and m[C.sup.2]MOSff2 fall under the nonclock gated flip-flop category as shown in Figure 1.

The rest of the paper is organized as follows. Section 2 compares the conventional master-slave flip-flop configurations with proposed designs. Section 3 highlights the simulation parameters and test bench along with techniques used for transistor sizing and methodology adopted for optimization of timing and power-delay product. Section 4 describes the simulation results. Section 5 concludes the paper. An appendix is added to show calibration of parameters for delay calculations using LE theory and to outline the strategy followed for designing the eight-bit ripple counter.

2. Overview of Previous Work and Proposed Designs

Figure 2 shows the conventional master-slave flip-flop architecture, whereby two regenerative loops (L1 and L2) are present in the master and slave sections to account for a static functionality. Both loops operate independently of each other on complementary clock signals. Regenerative loops are composed of cross-coupled inverters. It can be observed from Figure 2 that for each loop, regenerative action is achieved through one inversion in the forward (critical) path while the other (clocked) inversion takes place in the feedback path. Moreover, there is no common component between both loops.
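The master-slave principle described above can be illustrated with a purely behavioural sketch. The class name `MasterSlaveFF` and the clock polarity (master transparent while the clock is high, slave transparent while it is low) are assumptions made for illustration; the polarity is chosen to match the negative-edge-triggered operation stated in Section 4, and the model ignores all transistor-level effects.

```python
class MasterSlaveFF:
    """Behavioural negative-edge-triggered master-slave D flip-flop.

    Assumed polarity (illustrative): the master latch is transparent
    while clk is 1 and the slave latch is transparent while clk is 0.
    """

    def __init__(self):
        self.master = 0  # value held by the master regenerative loop
        self.slave = 0   # value held by the slave regenerative loop (Q)

    def step(self, clk, d):
        """Apply one clock level and data value, return the output Q."""
        if clk == 1:
            # Master follows D; slave loop is closed and keeps Q.
            self.master = d
        else:
            # Master loop is closed and keeps the sampled value;
            # slave passes it to the output.
            self.slave = self.master
        return self.slave


ff = MasterSlaveFF()
stimulus = [(1, 1), (0, 0), (1, 0), (0, 1), (1, 1), (0, 1)]  # (clk, d) pairs
trace = [ff.step(clk, d) for clk, d in stimulus]
print(trace)
```

In this sketch the output only changes when the clock goes low, and it takes the value that D had while the clock was high, which is the edge-triggered behaviour obtained from two level-sensitive loops on complementary clocks.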

Since an inverter followed by transmission gate is equivalent to a clocked inverter, the combination is replaced by a clocked inverter to form a [C.sup.2]MOS based flip-flop architecture as shown in Figure 3 [23]. Two regenerative loops L3 and L4 are used in a similar manner as in the previous case to maintain the static nature of the flip-flop.

However, in the proposed architecture as reported in Figure 4(a), both inversions take place in the forward (critical) path and the loop is completed by a clocked switch for loop L6 while loop L5 is completed by using an inverter in the feedback path. It is clearly noticed from Figure 4(a) that the output node is always driven and never floating thus ensuring a static flip-flop operation. The size of transistors in the feedback path marked by asterisks (*) is kept at 360 nm (minimum technology width) to eliminate race conditions at nodes U and V. Yet another implementation is shown in Figure 4(b) which uses inverter INVX in the critical path and a clocked switch to form a regenerative loop L7. It is to be noted that INVX is common to both the regenerative loops L7 and L8 which is contrary to the realization of previous architectures.

Figure 5 represents the actual circuit design based on the proposed architectures in Figure 4, while TGFF is implemented using transmission gates as switches in the conventional architecture as demonstrated in Figure 6.

It can be clearly observed that m[C.sup.2]MOSff1 and m[C.sup.2]MOSff2 are each realized using sixteen transistors. As a result, the area occupied by the proposed designs is significantly smaller than that of the conventional designs. Moreover, the number of clocked transistors in m[C.sup.2]MOSff1 is six as compared to eight in case of TGFF or the conventional clocked inverter based flip-flop [C.sup.2]MOSff [23].

To illustrate the superior performance of the proposed flip-flop configurations, other flip-flop topologies, namely, TGFF, WPMS, PTLFF, gated master-slave latch (GMSL) [10], and data transition look ahead flip-flop (DTLA) [11] belonging to the master-slave class have been used for comparisons. Of these topologies, GMSL and DTLA represent flip-flops with internal clock gating. Schematic diagrams of WPMS, PTLFF, GMSL, and DTLA are shown in Figures 7, 8, 9, and 10, respectively.

3. Simulation Parameters, Test Bench, and Optimization Methodology

Table 1 lists the CMOS parameters used for creating the simulation environment. The flip-flops were designed to layout level in 180 nm/1.8 V CMOS process at 250 MHz clock frequency. The width of transistors in the feedback structures was invariably fixed at the minimum value 360 nm while the slope of the data and clock signals was kept at 100 ps. Performances of the various flip-flop configurations are evaluated through SPICE simulation of the circuits extracted from the layout with the inclusion of parasitics.

Figure 11 shows the simulation test bench for characterization and comparison of the FF designs [3]. The clock and data signals are fed to the flip-flop through a two stage buffer. Data-to-output delay ([T.sub.DQ,min]) is used for performance comparisons. Logical effort theory is extensively used for designing fast CMOS circuits based on pencil and paper calculations and is widely adopted in the literature [24]. Hence, the delay sensitivity factor introduced by Alioto et al. [25] based on logical effort theory has been used for performance optimization.

A 16-cycle long pseudorandom sequence with a switching factor [alpha] = 0.5 is supplied at the data input for measurement of average power [26]. Since the delay and power characterization are strongly dependent on the capacitive load offered to FFs [27], varying capacitive loads {4, 16, 64} [C.sub.min], where [C.sub.min] is the input capacitance of a symmetrical minimum inverter ([W.sub.p] = 2[W.sub.n] = 2[W.sub.min]), have been used to test the FF behaviour. The transistor sizing methodology adopted is the same as that in [28, 29], whereas power-delay product (PDP) and power-delay-area product (PDAP) are the chosen figures of merit (FOM).
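As a small illustration of what a switching factor of 0.5 means for a 16-cycle data pattern, the sketch below counts input toggles per clock cycle. The function name `switching_activity`, the periodic treatment of the sequence, and the bit pattern itself are assumptions chosen for illustration; the actual sequence used in the simulations is not given in the text.

```python
def switching_activity(bits):
    """Fraction of clock cycles in which the data input toggles.

    The sequence is treated as periodic, so the last bit is compared
    with the first one (an assumption of this sketch).
    """
    n = len(bits)
    toggles = sum(bits[i] != bits[(i + 1) % n] for i in range(n))
    return toggles / n


# Hypothetical 16-cycle pattern, not the one used by the authors.
pattern = [0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1]
alpha = switching_activity(pattern)
print(len(pattern), alpha)
```

Any 16-cycle pattern with eight toggles gives the same switching factor, so the measured average power should be read as a function of [alpha] rather than of one particular bit ordering.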

The expression relating the absolute gate capacitance ([C.sub.GATE]) in terms of fF (femtofarads) and absolute transistor width (W) in terms of nanometers (nm) obtained at 180 nm process node by fitting simulation data [30] is given as

[C.sub.GATE] = (1.15 x [10.sup.-3]) x W. (1)
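Equation (1) can be applied directly to the minimum inverter defined above to obtain [C.sub.min] and the three load values. The following is a minimal sketch; the function and constant names are illustrative, and the only inputs are the fitted coefficient of (1) and the 360 nm minimum width quoted in the text.

```python
K_GATE_FF_PER_NM = 1.15e-3  # fitted slope of Eq. (1), fF per nm of gate width
W_MIN_NM = 360.0            # minimum transistor width quoted in the text


def gate_capacitance_ff(width_nm):
    """Eq. (1): gate capacitance in fF for a transistor width in nm."""
    return K_GATE_FF_PER_NM * width_nm


# Symmetrical minimum inverter: Wn = Wmin and Wp = 2 * Wmin.
c_min = gate_capacitance_ff(W_MIN_NM) + gate_capacitance_ff(2 * W_MIN_NM)

# Capacitive loads used to test the flip-flops: {4, 16, 64} * Cmin.
loads = {m: m * c_min for m in (4, 16, 64)}

print(f"Cmin = {c_min:.3f} fF")
for m, c in loads.items():
    print(f"{m:2d}X load = {c:.2f} fF")
```

With the rounded coefficient of (1), the 16X load evaluates to about 19.87 fF, close to the 19.92 fF used later in the paper; the small difference presumably comes from rounding of the fitted coefficient.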

LE method states that the optimized delay D of a path of N cascaded stages is

D = N [Nth root of GBH] + P, (2)

D = N [Nth root of F] + P, (3)

where G, B, H (= [C.sub.L]/[C.sub.in]) are the logical effort, branching effort, and electrical effort while P, F (= GBH) and [C.sub.L] are parasitic delay, path effort, and final load capacitance, respectively. One has the following:

D = P(1 + t). (4)

From (2) and (4),

t = N [Nth root of GB][Nth root of [C.sub.L]]/P[Nth root of [C.sub.in]], (5)

where t represents the relative delay increment with respect to parasitic delay. Equations (4) and (5) indicate that larger values of [C.sub.in] lead to a saturation in the optimized delay and based on the above analysis, the delay sensitivity factor introduced by Alioto et al. [25] is utilized to obtain the upper bound on the transistor widths for exploration of the power-delay design space with least computational effort. Consider the following:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], (6)

where [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] is the delay sensitivity factor and is obtained from (3) to (5). The upper bounds on the normalized transistor widths [w.sub.i] (normalized with respect to [W.sub.min]) have been obtained such that the delay sensitivity remains under a minimum value [S.sub.min] which is chosen as -5% for our analysis. The input capacitance [C.sub.in] of the flip-flop is expressed in terms of normalized width w1 as follows:

[C.sub.in] = (w1 x 360 + 2 x w1 x 360) (1.15 x [10.sup.-3]). (7)
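The saturation of the optimized delay with increasing [C.sub.in] can be seen numerically by combining (7) with (2). The sketch below is only illustrative: it borrows the TGFF values G = 4, B = 1.23, N = 4, P = 6, and [C.sub.L] = 19.92 fF quoted in Section 4, the chosen widths are arbitrary, and it does not implement the delay sensitivity factor of (6).

```python
K_GATE = 1.15e-3   # fF per nm of gate width, Eq. (1)
W_MIN_NM = 360.0   # minimum transistor width in nm


def input_capacitance_ff(w1):
    """Eq. (7): NMOS of width w1*Wmin plus PMOS of width 2*w1*Wmin."""
    return (w1 * W_MIN_NM + 2 * w1 * W_MIN_NM) * K_GATE


def normalized_delay(c_in_ff, c_load_ff, G, B, N, P):
    """Eq. (2): D = N * (G*B*H)**(1/N) + P with H = C_L / C_in."""
    H = c_load_ff / c_in_ff
    return N * (G * B * H) ** (1.0 / N) + P


# TGFF path parameters as quoted in Section 4 of the paper.
G, B, N, P = 4.0, 1.23, 4, 6.0
C_LOAD_FF = 19.92

widths = [2, 4, 10, 20]  # normalized widths w1, arbitrary sample points
delays = [
    normalized_delay(input_capacitance_ff(w), C_LOAD_FF, G, B, N, P)
    for w in widths
]
for w, d in zip(widths, delays):
    print(f"w1 = {w:2d}  C_in = {input_capacitance_ff(w):5.2f} fF  D = {d:5.2f}")
```

Each doubling of w1 buys a smaller delay reduction than the previous one, because the effort term scales with the N-th root of 1/[C.sub.in] while the parasitic term P stays fixed. This diminishing return is what makes an upper bound on the transistor widths meaningful.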

Figure 12 shows the conventional TGFF design. The sizing is done by assuming the transistors in the critical path to be independent design variables (IDVs) and optimizing for maximum performance using LE theory. The inverter before transmission gate in the first stage protects the input terminal from noise variations [31]. Table 2 exhibits delay variation for increasing [C.sub.in] values. It is noteworthy that the delay saturates at 153 ps for [C.sub.in] = 24.8 fF. As a result, the upper bounds on transistor widths are exposed and the limits of power (energy)-delay design space are defined early in the design cycle [32]. The table also includes the corresponding power dissipation along with the power-delay product and it is observed that minimum power-delay product is obtained at [C.sub.in] = 9.92 fF. The technology parameters used for capacitance calculations throughout this paper are listed in Table 3.

4. Results and Discussion

It is a well-established fact that the conventional [C.sup.2]MOS flip-flop, although slower, is skew tolerant and occupies less area than TGFF [23, 33]. Moreover, m[C.sup.2]MOSff1 and m[C.sup.2]MOSff2 show nearly identical characteristics in terms of power, delay, and area and hence only m[C.sup.2]MOSff1 is considered for comparisons.

The waveforms in Figure 13 represent the transient analysis of m[C.sup.2]MOSff1 carried out over a period of 8 clock cycles. The SPICE simulation results verify the correct flip-flop operation at 1 GHz clock frequency (all the flip-flops reported in the paper are designed for negative edge triggered operation). The variation of absolute data-to-output delays [T.sub.DQ,min] with FF input capacitance ([C.sub.in]) for 16X (19.92 fF) capacitive load is illustrated in Figure 14.

TGFF utilizes transmission gates in the critical path and hence it is faster than the rival designs. There is exactly the same number of stages in the critical path of TGFF and m[C.sup.2]MOSff1, the only difference being that the latching circuit in case of TGFF is an inverter followed by a clocked transmission gate (inverting latch), whereas a clocked/tristate inverter is present in m[C.sup.2]MOSff1. The logical effort of both latches is considered to be two; however, an inverter followed by a transmission gate is faster because the output node is driven by both transistors of the transmission gate in parallel, and this behaviour is reflected in Figure 14. It follows that the logical effort of an inverting latch can be assumed to be two for most theoretical purposes, but for comparison with a [C.sup.2]MOS latch, it must be taken as slightly less than two if delays are to be modelled precisely.

Equation (2) clearly indicates that lesser branching effort leads to a faster circuit operation. The branching effort for a path with internal fan-out is expressed as [24]

b = ([C.sub.on-path] + [C.sub.off-path])/[C.sub.on-path], (8)

where [C.sub.on-path] represents the load capacitance along the path under analysis and [C.sub.off-path] represents the capacitance of the connections that lead off the path.

The branching effort along the critical path is given as

B = [PI][b.sub.i]. (9)
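Equations (8) and (9) translate into a few lines of code. The sketch below is illustrative only: the function names are assumptions, and the capacitance pairs are placeholder values, not the node capacitances used in (10) to (13).

```python
from math import prod


def branching_effort(c_on_path, c_off_path):
    """Eq. (8): b = (C_on-path + C_off-path) / C_on-path."""
    return (c_on_path + c_off_path) / c_on_path


def path_branching_effort(branches):
    """Eq. (9): B is the product of the branching efforts along the path."""
    return prod(branching_effort(on, off) for on, off in branches)


# Placeholder (C_on-path, C_off-path) pairs in fF for two branch points.
branches = [(10.0, 1.0), (8.0, 1.0)]
B = path_branching_effort(branches)
print(f"B = {B:.4f}")
```

A branch with no off-path capacitance contributes b = 1, so only the nodes that drive a feedback structure or another side load increase B above unity.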

There are two branches each in TGFF and m[C.sup.2]MOSff1 represented as b1, b2 and b3, b4 in Figures 6 and 5(a), respectively. The branching effort corresponding to branches b1, b2, b3, and b4 is calculated as follows.

4.1. Branching Effort in Case of TGFF. One has the following.

b1 Calculation:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. (10)

b2 Calculation:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. (11)

4.2. Branching Effort in Case of m[C.sup.2]MOSff1. One has the following.

b3 Calculation:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. (12)

b4 Calculation:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], (13)

where [C.sub.gd] is gate to drain capacitance, [C.sub.db] is drain to body capacitance, and [C.sub.g] is the gate capacitance of respective transistors.

Accordingly, using (2) and putting G = 4, B = 1.23, H = 19.92/12.4 = 1.60, N = 4, and P = 6, we have D = 12.7 (absolute delay 165.1 ps) for TGFF, whereas putting G = 4, B = 1.30, H = 19.92/12.4 = 1.60, N = 4, and P = 6, we have D = 12.79 (absolute delay 166.27 ps) for m[C.sup.2]MOSff1. Absolute delays [D.sub.abs] are obtained by multiplying parameter D with the process-dependent delay unit [tau] as follows:

[D.sub.abs] = D[tau]. (14)

It is clearly observed that the delay of m[C.sup.2]MOSff1 is marginally higher than the delay of TGFF. Now, keeping the other parameters the same and assuming the logical effort of the inverting latch to be 1.8, the updated delay of TGFF is evaluated as D = 12.35 (absolute delay 160.55 ps).

The value of the process dependent parameter [tau] is determined as approximately 13 ps using the calibration technique described by Sutherland et al. [24]. The detailed procedure is discussed in the Appendix. The absolute delay measurements obtained through simulation are 162 ps for TGFF and 196 ps for m[C.sup.2]MOSff1, which are in close agreement with the theoretical values 160.55 ps and 166.27 ps, respectively (typically within 15% error).
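The logical effort figures quoted above can be reproduced with a short calculation based on (2) and (14). The sketch assumes that the path logical effort G = 4 is the product of two latch stages with logical effort 2 and two unit-effort inverters, so that a latch effort of 1.8 gives G = 1.8 x 1.8 = 3.24; this decomposition is an assumption of the sketch, although it reproduces the reported D = 12.35. The dictionary keys are illustrative labels.

```python
TAU_PS = 13.0  # process-dependent delay unit reported in the text


def le_delay(G, B, H, N, P):
    """Eq. (2): D = N * (G*B*H)**(1/N) + P."""
    return N * (G * B * H) ** (1.0 / N) + P


H = 19.92 / 12.4  # electrical effort C_L / C_in
N, P = 4, 6.0

cases = {
    "tgff": {"G": 4.0, "B": 1.23},
    "mc2mosff1": {"G": 4.0, "B": 1.30},
    # Assumed decomposition: two latches of effort 1.8, two inverters of effort 1.
    "tgff_g18": {"G": 1.8 * 1.8, "B": 1.23},
}

results = {}
for name, params in cases.items():
    D = le_delay(params["G"], params["B"], H, N, P)
    results[name] = (D, D * TAU_PS)  # (normalized delay, absolute delay in ps)
    print(f"{name:10s} D = {D:.2f}  D_abs = {D * TAU_PS:.1f} ps")
```

The values agree with those in the text to within rounding; the text rounds H to 1.60 before evaluating (2), which accounts for differences in the second decimal place.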

WPMS and PTLFF topologies show degraded performance due to the presence of pass transistors in the critical path while the speed of clock-gated structures is worst mainly because gating circuit is inserted between the clock and the flip-flop terminals which deteriorates the timing characteristics. The characterizations are done assuming that [C.sub.in] = 12.4 fF and [C.sub.L] = 19.92 fF (16X) where [C.sub.L] represents the flip-flop load capacitance.

The variation of average power with [C.sub.in] for 16X loading condition is depicted in Figure 15. Due to threshold voltage drop at internal nodes, WPMS and PTLFF display the worst power dissipation characteristics because of short circuit power dissipation. GMSL and DTLA exhibit greater power dissipation than their nongated counterparts because the pseudorandom sequence has an activity factor of 0.5: the additional comparator and clock gating circuit are beneficial only at sufficiently low switching activities and otherwise lead to both increased area and power overhead.

4.3. Clock Load Calculations. One has the following.

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], (15)

{Transistors contributing towards clock load in the critical path} + {Transistors contributing towards clock load in the feedback structure}

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. (16)

{Transistors contributing towards clock load in the critical path} + {Transistors contributing towards clock load in the feedback structure}

= 22.18 fF + 0.84 fF = 23.02 fF.

Apart from the clock load, the capacitance value at internal nodes of m[C.sup.2]MOSff1 is reduced as compared to TGFF by eliminating transistors TN6 and TP6 from the feedback structure.

4.4. Capacitance Calculations at Internal Nodes of TGFF

Internal Capacitance at Nodes P and K

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

Internal Capacitance at Nodes M and N

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

4.5. Capacitance Calculations at Internal Nodes of m[C.sup.2]MOSff1

Internal Capacitance at Nodes P' and K

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

Internal Capacitance at Node M'

Node M': [C.sub.g](TN15) + [C.sub.g] (TP15) = 12.35 fF.

It can be easily concluded from calculations above that a total of 19.34 fF capacitance has been reduced from the internal nodes in the critical path of m[C.sup.2]MOSff1 in comparison to TGFF. This leads to reduced internal power dissipation at these nodes as lesser capacitance has to be charged or discharged per clock cycle. However, reduction in the clock load of m[C.sup.2]MOSff1 due to transistors eliminated from the feedback structure is nullified due to PMOS transistors TP10 and TP11 whose size is twice that of transistors TP1 and TP5 in case of TGFF and as a result the total power dissipation of both the flip-flops is nearly the same as it can be clearly observed from Figure 16. Following a similar procedure, the clock load of various flip-flops is obtained and listed in Table 4 along with number of clocked transistors and power consumption values. It is seen that TGFF and m[C.sup.2]MOSff1 represent the most efficient designs in terms of reduced power consumption having power dissipation comparable to DTLA at [C.sub.in] = 12.4 fF and [C.sub.L] = 19.92 fF.

It can be observed that m[C.sup.2]MOSff1 has the least transistor count along with PTLFF while GMSL and DTLA consist of the maximum number of transistors. Since only sixteen transistors are used for circuit realization of m[C.sup.2]MOSff1, its power dissipation is comparable to TGFF. It is worth noting that GMSL and DTLA offer minimum clock load; as a result, these topologies exhibit the least power dissipation at lower switching activities. The reason for extended clock-to-output delays of GMSL and DTLA is the insertion of clock gating circuitry while DTLA has a pulsed operation and hence shows negative set-up time requirements. Based on the power and delay measurements, power-delay product characteristics are derived for all the flip-flops as shown in Figure 16. The optimum power-delay products of the gated structures GMSL and DTLA are, respectively, 3.30x and 3.34x that of TGFF. Among the nonclock gated structures, the pass transistor based designs WPMS and PTLFF exhibit a 1.77x and 1.57x higher power-delay product with respect to the benchmark flip-flop TGFF. TGFF also shows 20% improvement over m[C.sup.2]MOSff1 in terms of minimum power-delay product. However, despite the fact that TGFF represents a better alternative in terms of performance and optimum power-delay product, the area requirements also remain a major concern. It has been observed in the literature that the conventional [C.sup.2]MOS based flip-flop is up to 20-25% more efficient in terms of occupied chip area. This stems mainly from the fact that at layout level (i) in comparison to TGFF, diffusion areas of most of the transistors can be shared in [C.sup.2]MOS flip-flop [33], (ii) the number of contact holes can be reduced in the layout pattern [23], and (iii) less complicated feedback structure leads to fewer interconnections.

The layouts were implemented using [C.sub.in] = 12.4 fF, indicating almost similar transistor sizes throughout the critical path with the exception of TP10 and TP11 belonging to m[C.sup.2]MOSff1 which are twice in size compared to TP1 and TP5 in accordance with the LE theory. The layouts for TGFF and m[C.sup.2]MOSff1 are shown in Figures 17 and 18, respectively. Table 5 clearly shows that while TGFF is better in terms of PDP by 18.4%, m[C.sup.2]MOSff1 shows a 12.4% improvement in the PDAP making it suitable for high density applications where performance can be compromised.
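The relation between the two figures of merit can be made explicit with a back-of-the-envelope calculation. The sketch below defines PDP and PDAP as plain products and then infers the area ratio implied by the two percentages quoted above. It rests on an assumed percentage convention (each improvement is expressed relative to the worse design), and the resulting area ratio is an inference of this sketch, not a figure reported in the paper.

```python
def pdp(power, delay):
    """Power-delay product."""
    return power * delay


def pdap(power, delay, area):
    """Power-delay-area product."""
    return power * delay * area


# Percentages quoted in the text (Table 5).
PDP_GAIN_TGFF = 0.184  # assumed to mean PDP_tgff = (1 - 0.184) * PDP_m
PDAP_GAIN_M = 0.124    # assumed to mean PDAP_m = (1 - 0.124) * PDAP_tgff

pdp_ratio = 1.0 / (1.0 - PDP_GAIN_TGFF)  # PDP_m / PDP_tgff
pdap_ratio = 1.0 - PDAP_GAIN_M           # PDAP_m / PDAP_tgff

# Since PDAP = PDP * area, the area ratio follows from the two ratios above.
area_ratio = pdap_ratio / pdp_ratio      # A_m / A_tgff under the stated assumptions

print(f"PDP ratio  (m/TGFF) = {pdp_ratio:.3f}")
print(f"PDAP ratio (m/TGFF) = {pdap_ratio:.3f}")
print(f"implied area ratio  = {area_ratio:.3f}")
```

Under these assumptions the PDAP advantage of m[C.sup.2]MOSff1 can only arise if its layout area is sufficiently smaller than that of TGFF to outweigh its PDP penalty, which is the tradeoff the paragraph above describes.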

The power dissipation results as illustrated in Figure 19 are obtained using [C.sub.in] = 12.4 fF which ensures that all the transistors in the critical path have similar widths. At zero switching activity, clock-gated topologies are the most power efficient. GMSL and DTLA show 32.5% and 46.3% reduction in power in case of logic high at the input, whereas for logic low, the power consumption is reduced by 19.2% and 35.4%, respectively. Again, it can be clearly observed that there is only a slight difference in the power dissipation of TGFF and m[C.sup.2]MOSff1 at different switching activities.

The correct functionality of the proposed flip-flop m[C.sup.2]MOSff1 is validated by designing an 8-bit ripple counter at 16X capacitive load and the average power measurements were carried out over 256 clock cycles. It was noticed that the power consumption of the m[C.sup.2]MOSff1 based counter is comparable to the TGFF at varying frequencies. Again, LE theory has been adopted for sizing individual flip-flops in each counter for optimum performance which is expressed in detail in the Appendix.

The flip-flops were also designed and simulated to layout level with inclusion of parasitics at 130 nm, 90 nm, and 65 nm CMOS processes to address scalability issues at more advanced process nodes. The simulation test bench and optimization methodology are similar to those described in Section 3. PVT variations are emphasized to evaluate the performance of flip-flops at all process corners, namely, FF, SS, FS, and SF with voltages scaled from 0.9 to 1.1 V while the temperatures varied from 0 to 125 degrees as shown in Table 6. The simulation and technology parameters are also listed in Table 6 where [C.sub.G] represents the capacitance per unit gate oxide and was evaluated to be 1.3 fF/um by fitting simulation data. In addition, the capacitances per unit length of poly, metal 1, and metal 2 interconnects are also mentioned.

For illustration purposes, the delay and power variations with the flip-flop input capacitance with respect to different process corners at 65 nm CMOS technology for m[C.sup.2]MOSff1 are demonstrated in Figures 20 and 21, respectively, at 16X capacitive loading. Both m[C.sup.2]MOSff1 and m[C.sup.2]MOSff2 showed correct circuital behaviour at the aforementioned process nodes which indicates that no internal noise violations exist especially due to the fact that logic levels are retained even at FF process corner. However, it is to be pointed out that m[C.sup.2]MOSff1 in a manner similar to TGFF starts to fail at SS corner for lower values of [C.sub.in] [34].

5. Conclusion

In this paper, an alternative architecture for designing [C.sup.2]MOS based flip-flops is presented with a modified feedback strategy while preserving the fully static operation. Using the new feedback approach, a modified topology m[C.sup.2]MOSff1 is proposed with decreased parasitic capacitances at internal nodes in comparison to the TGFF which is the finest design in terms of PDP. However, postlayout simulations and analyses indicate that the modified configuration m[C.sup.2]MOSff1 presents the best alternative in terms of PDAP among all the conventional designs. Therefore, for high performance applications, TGFF still remains the best choice but it can be replaced by m[C.sup.2]MOSff1 for high density applications. Comparisons were carried out with state-of-the-art flip-flops in the master-slave class. The simulation results are well supported with mathematical analysis based on logical effort theory within acceptable error (typically less than 15%).

Appendices

A. Delay Calibration Using LE Theory

For modelling delays using LE theory, all the delays are initially expressed in terms of a basic delay unit [tau] which is process dependent, such that the absolute delay is represented as the product of the unitless delay of the gate, as shown in (2), and the delay unit [tau]. Accordingly,

[D.sub.abs] = D[tau]. (A.1)

While D represents the delay for a multistage path, d corresponds to the delay of a single stage logic gate. Parameter [tau] needs to be estimated in order to obtain absolute delays and accordingly a delay versus fanout curve is determined for an inverter as shown in Figure 22 by fitting simulation data. The curve is approximated as a straight line and the slope of the line represents [tau] since d = (gh + p)[tau] and the logical effort of an inverter is 1. In our case, [tau] is estimated as 13 ps.
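The calibration step amounts to a straight-line fit of inverter delay against fanout. The sketch below shows the fit with an ordinary least-squares routine written out by hand. The data points are placeholders generated from d = (gh + p)[tau] with g = 1, [tau] = 13 ps, and an assumed inverter parasitic delay p = 1; they stand in for the simulated curve of Figure 22, which is not reproduced here.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Placeholder data: inverter delay in ps for several fanouts h,
# generated with g = 1, tau = 13 ps and an assumed p = 1.
fanouts = [1, 2, 3, 4, 6, 8]
delays_ps = [(1.0 * h + 1.0) * 13.0 for h in fanouts]

tau_ps, intercept_ps = fit_line(fanouts, delays_ps)
p_inv = intercept_ps / tau_ps  # parasitic delay of the inverter in units of tau

print(f"tau = {tau_ps:.2f} ps, p_inv = {p_inv:.2f}")
```

The slope gives [tau] directly because the inverter has unit logical effort, while the intercept divided by the slope gives the inverter parasitic delay in units of [tau].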

B. Implementation of 8-Bit Ripple Counter

An 8-bit asynchronous counter was implemented by converting the D flip-flop configuration to a T flip-flop configuration using an EXOR gate as illustrated in Figure 23.

The T flip-flop designed using TGFF is shown in Figure 24. It is considered to be a five stage design and optimized for highest speed using LE theory. The EXOR gate was realized using transmission gates as revealed in Stage 1 of Figure 24. A similar procedure was followed for designing m[C.sup.2]MOSff1 based T flip-flop.

For designing the modulo 256 counter, the output Q of each stage is connected to the clock terminal of the next stage through two intermediate inverters (acting as a buffer) sized ([W.sub.p] = 11.52 um, [W.sub.n] = 5.76 um) such that the input capacitance of the first inverter acts as the load capacitance for the flip-flop configuration of the previous stage as depicted in Figure 25. As a result, the load at the output terminal of each flip-flop is uniformly fixed at 19.92 fF.
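The counter organization described above can be checked with a zero-delay behavioural model. In the sketch below, each stage is a negative-edge-triggered D flip-flop with D = Q XOR T and T tied to 1, and the two-inverter buffer is modelled as a plain non-inverting connection from Q to the next clock input. The class names and the zero-delay assumption are illustrative; no timing, loading, or power behaviour is modelled.

```python
class TFlipFlop:
    """Negative-edge-triggered D flip-flop wired as a T flip-flop (D = Q XOR T)."""

    def __init__(self):
        self.q = 0
        self._prev_clk = 0

    def clock(self, clk, t=1):
        """Apply a clock level; toggle Q on a falling edge when T is 1."""
        if self._prev_clk == 1 and clk == 0:
            self.q ^= t  # D = Q XOR T is captured on the falling edge
        self._prev_clk = clk
        return self.q


class RippleCounter:
    """Asynchronous counter: Q of each stage clocks the next stage."""

    def __init__(self, bits=8):
        self.stages = [TFlipFlop() for _ in range(bits)]

    def clock(self, clk):
        level = clk
        for stage in self.stages:
            # The two-inverter buffer is non-inverting, so Q is passed on as is.
            level = stage.clock(level)
        return self.value()

    def value(self):
        return sum(stage.q << i for i, stage in enumerate(self.stages))


def count_after(pulses, bits=8):
    """Counter value after a number of full clock pulses (high, then low)."""
    counter = RippleCounter(bits)
    for _ in range(pulses):
        counter.clock(1)
        counter.clock(0)
    return counter.value()


print(count_after(5), count_after(255), count_after(256))
```

The model wraps around after 256 pulses, as expected for a modulo 256 counter. As a side check on the buffer sizing, the first inverter has a total gate width of 17.28 um, which by (1) corresponds to roughly 19.9 fF, consistent with the stated load.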

http://dx.doi.org/10.1155/2014/453675

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] H. Kawaguchi and T. Sakurai, "A reduced clock-swing flip-flop (RCSFF) for 63% power reduction," IEEE Journal of Solid-State Circuits, vol. 33, no. 5, pp. 807-811, 1998.

[2] G. Yeap, Practical Low Power Digital VLSI Design, Kluwer Academic, 1998.

[3] V. Oklobdzija, V. Stojanovic, D. Markovic, and N. Nedovic, Digital System Clocking: High-Performance and Low-Power Aspects, Wiley-IEEE Press, 2003.

[4] B. Mesgarzadeh, M. Hansson, and A. Alvandpour, "Jitter characteristic in charge recovery resonant clock distribution," IEEE Journal of Solid-State Circuits, vol. 42, no. 7, pp. 1618-1625, 2007.

[5] C. Giacomotto, N. Nedovic, and V. G. Oklobdzija, "The effect of the system specification on the optimal selection of clocked storage elements," IEEE Journal of Solid-State Circuits, vol. 42, no. 6, pp. 1392-1404, 2007.

[6] G. Gerosa, S. Gary, C. Dietz et al., "2.2 W, 80 MHz superscalar RISC microprocessor," IEEE Journal of Solid-State Circuits, vol. 29, no. 12, pp. 1440-1454, 1994.

[7] D. Markovic, J. Tschanz, and V. De, "Transmission-gate based flip-flop," US Patent 6642765, 2003.

[8] S. K. Hsu, S. K. Mathew, M. A. Anders et al., "A 110 GOPS/W 16-bit multiplier and reconfigurable PLA loop in 90-nm CMOS," IEEE Journal of Solid-State Circuits, vol. 41, no. 1, pp. 256-264, 2006.

[9] R. Hossain, L. D. Wronski, and A. Albicki, "Low power design using double edge triggered flip-flops," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 2, no. 2, pp. 261-264, 1994.

[10] A. G. M. Strollo, E. Napoli, and D. de Caro, "Low-power flip-flops with reliable clock gating," Microelectronics Journal, vol. 32, no. 1, pp. 21-28, 2001.

[11] M. Nogawa and Y. Ohtomo, "A data-transition look-ahead DFF circuit for statistical reduction in power consumption," IEEE Journal of Solid-State Circuits, vol. 33, no. 5, pp. 702-706, 1998.

[12] F. Klass, C. Amir, A. Das et al., "A new family of semidynamic and dynamic flip-flops with embedded logic for high-performance processors," IEEE Journal of Solid-State Circuits, vol. 34, no. 5, pp. 712-716, 1999.

[13] P. Zhao, T. Darwish, and M. Bayoumi, "Low power and high speed explicit-pulsed flip-flops," in Proceedings of the 45th Midwest Symposium on Circuits and Systems, pp. II477-II480, August 2002.

[14] H. Partovi, R. Burd, U. Salim, F. Weber, L. DiGregorio, and D. Draper, "Flow-through latch and edge-triggered flip-flop hybrid elements," in Proceedings of the IEEE International Solid-State Circuits Conference, pp. 138-139, February 1996.

[15] R. Heald, K. Aingaran, C. Amir et al., "Third-generation SPARC V9 64-b microprocessor," IEEE Journal of Solid-State Circuits, vol. 35, no. 11, pp. 1526-1538, 2000.

[16] N. Nedovic, M. Aleksic, and V. G. Oklobdzija, "Conditional techniques for low power consumption flip-flops," in Proceedings of the 8th IEEE International Conference on Electronics, Circuits and Systems (ICECS '01), pp. 803-806, September 2001.

[17] S. D. Naffziger, G. Colon-Bonet, T. Fischer, R. Riedlinger, T. J. Sullivan, and T. Grutkowski, "The implementation of the Itanium 2 microprocessor," IEEE Journal of Solid-State Circuits, vol. 37, no. 11, pp. 1448-1460, 2002.

[18] B.-S. Kong, S.-S. Kim, and Y.-H. Jun, "Conditional-capture flip-flop for statistical power reduction," IEEE Journal of Solid-State Circuits, vol. 36, no. 8, pp. 1263-1271, 2001.

[19] S. Shin and B. Kong, "Variable sampling window flip-flops for low power high-speed VLSI," IEE Proceedings of Circuits, Devices and Systems, vol. 152, no. 3, pp. 266-271, 2005.

[20] B. Nikolic, V. G. Oklobdzija, V. Stojanovic, W. Jia, J. K.-S. Chiu, and M. M.-T. Leung, "Improved sense-amplifier-based flip-flop: design and measurements," IEEE Journal of Solid-State Circuits, vol. 35, no. 6, pp. 876-884, 2000.

[21] N. Nedovic, V. G. Oklobdzija, and W. W. Walker, "A clock skew absorbing flip-flop," in Proceedings of the IEEE International Solid-State Circuits Conference, vol. 1, pp. 342-344, February 2003.

[22] A. G. M. Strollo and D. de Caro, "Low power flip-flop with clock gating on master and slave latches," Electronics Letters, vol. 36, no. 4, pp. 294-295, 2000.

[23] Y. Suzuki, K. Odagawa, and T. Abe, "Clocked CMOS Calculator Circuitry," IEEE Journal of Solid-State Circuits, vol. SC-8, no. 6, pp. 462-469, 1973.

[24] I. Sutherland, B. Sproull, and D. Harris, Logical Effort: Designing Fast CMOS Circuits, Morgan Kaufmann, Los Altos, Calif, USA, 1998.

[25] M. Alioto, E. Consoli, and G. Palumbo, "General strategies to design nanometer flip-flops in the energy-delay space," IEEE Transactions on Circuits and Systems I, vol. 57, no. 7, pp. 1583-1596, 2010.

[26] V. Stojanovic and V. G. Oklobdzija, "Comparative analysis of master-slave latches and flip-flops for high-performance and low-power systems," IEEE Journal of Solid-State Circuits, vol. 34, no. 4, pp. 536-548, 1999.

[27] S. Heo and K. Asanovic, "Load-sensitive flip-flop characterization," in Proceedings of the IEEE Computer Society Workshop on VLSI, pp. 87-92, 2001.

[28] M. Alioto, E. Consoli, and G. Palumbo, "Analysis and comparison in the energy-delay-area domain of nanometer CMOS Flip-Flops. Part I: methodology and design strategies," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 19, no. 5, pp. 725-736, 2011.

[29] M. Alioto, E. Consoli, and G. Palumbo, "Analysis and comparison in the energy-delay-area domain of nanometer CMOS Flip-Flops. Part II: results and figures of merit," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 19, no. 5, pp. 737-750, 2011.

[30] G. Palumbo and M. Pennisi, "Design guidelines for high-speed transmission-gate latches: analysis and comparison," in Proceedings of the 15th IEEE International Conference on Electronics, Circuits and Systems (ICECS '08), pp. 145-148, September 2008.

[31] E. Consoli, G. Palumbo, and M. Pennisi, "Reconsidering high-speed design criteria for transmission-gate-based master-slave flip-flops," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 20, no. 2, pp. 284-295, 2012.

[32] M. Alioto, E. Consoli, and G. Palumbo, "From energy-delay metrics to constraints on the design of digital circuits," International Journal of Circuit Theory and Applications, vol. 40, pp. 815-834, 2012.

[33] H. J. Chao and C. A. Johnston, "Behavior analysis of CMOS D flip-flops," IEEE Journal of Solid-State Circuits, vol. 24, no. 5, pp. 1454-1458, 1989.

[34] H. Q. Dao, K. Nowka, and V. G. Oklobdzija, "Analysis of clocked timing elements for dynamic voltage scaling effects over process parameter variation," in Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED '01), pp. 56-59, August 2001.

Kunwar Singh (1), Satish Chandra Tiwari (2), and Maneesha Gupta (2)

(1) Department of Electrical Engineering, Delhi Technological University, Room No. FW1-SF1, EED, DTU, New Delhi 110042, India

(2) Division of ECE, Netaji Subhas Institute of Technology (NSIT), University of Delhi, Sector 3, Dwarka, New Delhi 110078, India

Correspondence should be addressed to Kunwar Singh; kunwarsingh@dce.ac.in

Received 28 August 2013; Accepted 13 October 2013; Published 27 February 2014

Academic Editors: L. Donetti, E. Tlelo-Cuautle, and F. Yuan

TABLE 1: Simulation parameters.

| Parameter | Value |
|---|---|
| $W_{min}$ | 360 nm |
| $L_{min}$ | 140 nm |
| $C_{min}$ | 1.24 fF |
| $V_{DD}$ | 1.8 V |
| Frequency | 250 MHz |
| Signal slope | 100 ps |

TABLE 2: Traditional transmission gate flip-flop at 19.92 fF load (16X).

| $C_{in}$ (fF) | w1 | w2 | w3 | w4 | $T_{DQ,min}$ (ps) | Power (µW) | PDP (fJ) |
|---|---|---|---|---|---|---|---|
| 2.48 | 2 | 2.35 | 2.79 | 6.65 | 226 | 554 | 125.2 |
| 4.96 | 4 | 3.95 | 3.95 | 7.91 | 191 | 585 | 111.7 |
| 7.44 | 6 | 5.35 | 4.84 | 8.76 | 173 | 599 | 103.6 |
| 9.92 | 8 | 6.65 | 5.59 | 9.41 | 166 | 615 | 102 |
| 12.4 | 10 | 7.86 | 6.25 | 9.95 | 162 | 632 | 102.3 |
| 14.8 | 12 | 9.01 | 6.85 | 10.4 | 159 | 648 | 103 |
| 17.3 | 14 | 10.1 | 7.40 | 10.8 | 157 | 665 | 104.4 |
| 19.8 | 16 | 11.1 | 7.91 | 11.2 | 155 | 675 | 104.6 |
| 22.3 | 18 | 12.2 | 8.39 | 11.5 | 154 | 682 | 105 |
| 24.8 | 20 | 13.2 | 8.84 | 11.8 | 153 | 689 | 105.4 |

TABLE 3: Technology parameters used for estimation of capacitances.

| Parameter | $C_{gdo}$ (F/m) | $C_{gso}$ (F/m) | $C_{jsw}$ (F/m) | $C_{j}$ (F/m²) | $L_{D}$ (m) | $L_{S}$ (m) |
|---|---|---|---|---|---|---|
| NMOS | 2.78E-10 | 2.78E-10 | 7.9E-10 | 0.00365 | 31.6E-09 | 31.6E-09 |
| PMOS | 2.78E-10 | 2.78E-10 | 1.44E-9 | 0.00138 | 31.6E-09 | 31.6E-09 |

TABLE 4: Comparison of flip-flop parameters at $C_{in}$ = 12.4 fF and 16X capacitive loading.

| Design | TGFF | mC²MOSff1 | WPMS | PTLFF | GMSL | DTLA |
|---|---|---|---|---|---|---|
| Transistor count | 20 | 16 | 24 | 16 | 31 | 46 |
| No. of clocked transistors | 8 | 6 | 6 | 4 | 2 | 3 |
| Clock-to-output delay (ps) | 92 | 116 | 206 | 204 | 419 | 683 |
| Optimum setup time (ps) | 70 | 80 | 40 | 50 | 80 | -140 |
| Hold time (ps) | -19 | -21 | -33 | -32 | -23 | 25 |
| $T_{DQ,min}$ (ps) | 162 | 196 | 246 | 254 | 499 | 543 |
| Clock load (fF) | 16.44 | 23.02 | 9.05 | 8.22 | 776 | 7.31 |
| Power dissipation (µW) \* | 632 | 640 | 786 | 679 | 676 | 643 |
| Leakage power (µW) | 59.38 | 5751 | 72.64 | 69.83 | 74.91 | 76.73 |

\* Pseudorandom sequence with α = 0.5 is used for power calculations.

TABLE 5: PDAP comparison of TGFF and mC²MOSff1.

| Design | Transistor count | Transistor widths (µm) | Delay (ps) | Power (µW) | Layout area (µm²) | PDP (fJ) | PDAP (fJ × µm²) |
|---|---|---|---|---|---|---|---|
| TGFF | 20 | 52.52 | 162 | 632 | 175 | 102.3 | 17902 |
| mC²MOSff1 | 16 | 58.95 | 196 | 640 | 125 | 125.4 | 15675 |

TABLE 6: Flip-flop simulation parameters at 65 nm CMOS technology.

| Process corner | Temperature (°C) | $V_{DD}$ (V) |
|---|---|---|
| TT | 70 | 1 |
| FF | 0 | 1.1 |
| SS | 125 | 0.9 |
| FS | 70 | 1 |
| SF | 70 | 1 |

| Simulation/technology parameter | Value |
|---|---|
| $L_{min}$ | 60 nm |
| $W_{min}$ | 120 nm |
| $C_{min}$ | 507 aF |
| Frequency | 2 GHz |
| Signal slope | 20 ps |
| $C_{Poly}$ | 0.268 $C_{G}$ |
| $c_{metal1}$ | 0.215 $C_{G}$ |
| $c_{metal2}$ | 0.175 $C_{G}$ |
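The PDP and PDAP columns of Table 5 can be cross-checked from the delay, power, and layout-area columns. The following is a minimal sketch, assuming PDP is delay times power and PDAP is PDP times layout area, as the column names suggest. The function and variable names are illustrative and do not come from the paper.

```python
# Cross-check of the PDP and PDAP columns in Table 5.
# Assumption: PDP = delay * power and PDAP = PDP * layout area.
# All names below are illustrative, not taken from the paper.

# design: (delay_ps, power_uW, area_um2, reported_pdp_fJ, reported_pdap_fJ_um2)
TABLE5 = {
    "TGFF": (162, 632, 175, 102.3, 17902),
    "mC2MOSff1": (196, 640, 125, 125.4, 15675),
}


def pdp_fj(delay_ps, power_uw):
    """Power-delay product in fJ (1 ps * 1 uW = 1e-18 J = 1e-3 fJ)."""
    return delay_ps * power_uw * 1e-3


def pdap_fj_um2(delay_ps, power_uw, area_um2):
    """Power-delay-area product in fJ * um^2."""
    return pdp_fj(delay_ps, power_uw) * area_um2


def recompute(table=TABLE5):
    """Return {design: (pdp, pdap)} recomputed from the raw columns."""
    return {
        design: (pdp_fj(d, p), pdap_fj_um2(d, p, a))
        for design, (d, p, a, _, _) in table.items()
    }


if __name__ == "__main__":
    for design, (pdp, pdap) in recompute().items():
        _, _, _, pdp_ref, pdap_ref = TABLE5[design]
        print(f"{design}: PDP {pdp:.1f} fJ (table {pdp_ref}), "
              f"PDAP {pdap:.0f} fJ*um^2 (table {pdap_ref})")
```

The recomputed values agree with the tabulated ones to within rounding, which suggests that the table's PDAP entries were formed from the already rounded PDP figures. On these numbers the mC²MOSff1 design has a PDAP roughly 12% lower than TGFF, despite its higher PDP, because of its smaller layout area.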


Publication: The Scientific World Journal, 2014.
