Dynamic Logic Circuits: Combinational and Sequential Design for Digital ICs

Navneet Kaur, Varun Nehru and Deep Sehgal
VLSI Design Division, Semiconductor Laboratory
Chandigarh, India
Abstract— Dynamic CMOS logic circuits, or Precharge-Evaluate logic circuits, have been widely used in high performance digital processors. Their high speed capability has been exploited in both combinational gates and sequencing elements. For combinational gates, the use of such clocked logic is neither straightforward nor supported by the standard EDA tools used by digital IC designers. As a result, dynamic logic gates have remained in their niche of the custom design flow. In this paper, various Precharge-Evaluate circuit styles for high speed combinational design are reviewed. A methodology for using Dual Rail Domino logic with standard EDA tools is discussed and implemented on a floating point divider block. The use of Precharge-Evaluate logic for high speed sequencing elements is also discussed, and different high speed flip-flop circuits are compared. These flip-flops are supported by EDA tools and are specifically designed for use in a standard cell library. The test circuits for the flip-flops are simulated using device models of the SCL 180nm CMOS process.

Keywords— Dynamic CMOS Logic, EDA Tools, Standard cell, Design Synthesis, Technology mapping, Differential, PDP

978-1-5386-0576-9/18/$XX.00 ©2018 IEEE

I. INTRODUCTION
Dynamic logic [1][2], also known as clocked logic or Precharge-Evaluate logic, has played a significant role in high performance processors. It has been used in high speed combinational gates as well as in sequencing elements, namely latches and flip-flops.

Figure 1 explains the operation of dynamic logic. When the clock signal CK is low, transistor MN is switched off and the dynamic node X is charged to VDD by transistor MP. This is called the pre-charge phase. When CK rises to logic ‘1’, the pull-up transistor MP is switched off and, depending on the input A, X either floats at VDD or is discharged to ground. This is called the evaluation phase. In the evaluation phase, a rising transition on signal A is transferred to the output X (as a falling transition of X), whereas a falling transition on A is not propagated to X, as shown by the waveforms in Figure 1.

Figure 1: Dynamic Logic Principle

Due to this asymmetry in behaviour for rising and falling input transitions, dynamic logic can be used directly neither as a combinational gate nor as a memory element. Dynamic logic is clocked and blocks the falling transition of its input in the evaluation phase, and hence cannot be used as a gate or as a latch. Further, it passes a rising input transition even after the clock edge, and hence does not directly comply with the definition of a flip-flop, which is expected to react only to clock edges. In spite of these limitations, dynamic logic has been used successfully in high performance gates and sequencing elements, as discussed in the subsequent sections.

Section II deals with the use of dynamic logic in combinational gates in its various forms. A methodology for using dual rail domino logic [5] is implemented and its feasibility for use in a standard digital ASIC flow is discussed. Section III briefs the circuit innovations used in building short latency flip-flops such as the Hybrid Latch Flip-Flop [6], the Semi-Dynamic Flip-Flop [7] and the Sense-Amplifier Flip-Flop [8] using dynamic logic. The use of these flip-flops in a standard cell library is discussed, and their performance is compared with a conventional Transmission Gate based flip-flop using device models of the Semi-Conductor Laboratory (SCL) 180nm CMOS process. The Sense-Amplifier Flip-Flop is designed for use in the standard cell library. Finally, Section IV concludes the work on dynamic logic for combinational and sequential design.

II. DYNAMIC CIRCUITS FOR COMBINATIONAL LOGIC
Dynamic logic is used for high speed combinational logic owing to its input capacitance, which is half that of static CMOS logic, and its practically zero rise delay [2]. The transparency of dynamic gates is ensured by choosing a proper clocking strategy [4] such that the gates enter the evaluation phase at the optimal time. As explained with Figure 1, after entering the evaluation phase, dynamic logic responds only to rising input transitions and not to falling ones. Moreover, it produces only falling transitions at its output [1]. Hence, dynamic logic gates cannot be cascaded directly with one another.

The following three possible solutions deal with the aforementioned issue, and each constitutes a separate dynamic CMOS logic family.

i. The dynamic output X in Figure 1 is passed through a high-skewed static inverter that converts the monotonically falling output transition into a monotonically rising transition. This inverted output can be safely used at the input of another dynamic front-end. This combination of a dynamic logic stage and a static inversion stage is called a Domino Gate [1][2].

ii. The dynamic output X is fed to the pMOS transistors of a pull-up network, which remain off while the dynamic node X is precharged to logic ‘1’. Thus, the second stage consists of pMOS based logic with an nMOS transistor as the pre-discharge transistor. This is called NP CMOS or NORA logic [1][2] because of its alternating n-type and p-type logic blocks. In this logic, the noise-prone dynamic output is used to drive sensitive dynamic logic, which makes this logic less reliable.
iii. The clock of the second stage is delayed such that it does not enter evaluation mode until the first stage has evaluated. In doing so, the second stage is prevented from seeing a falling input transition in the evaluation mode. This family is called Delayed Precharge-Evaluate (PE) Clock Logic or Clock-Delayed Domino Logic [3]. In this logic, however, the delay elements inserted in the PE clock path, rather than the gates themselves, govern the speed of the circuit. The delay elements require a margin of 20-30% to work at all corners, which decreases the speed of the system.

Figure 2: Dual Rail Domino AND/NAND Gate
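The monotonicity problem and the domino fix (solution i) can be illustrated with a zero-delay, logic-level model. The Python sketch below is illustrative only. The class `DynamicStage` and the helper functions are hypothetical names, and each stage is an idealised single-input dynamic inverter front-end as in Figure 1. To stand in for the finite discharge time, stage 2 sees the output of stage 1 with a one-step lag. Leakage, charge sharing and keepers are ignored.

```python
from dataclasses import dataclass


@dataclass
class DynamicStage:
    """Idealised single-input dynamic front-end as in Figure 1 (hypothetical model).

    x is the dynamic node: it is precharged to 1 while ck == 0 and can only be
    discharged (1 -> 0) while ck == 1 and the input is 1.
    """
    x: int = 1

    def step(self, ck: int, a: int) -> int:
        if ck == 0:
            self.x = 1      # pre-charge phase: MP pulls X to VDD
        elif a == 1:
            self.x = 0      # evaluation phase: pull-down path conducts
        # otherwise X floats and keeps its previous value
        return self.x


def single_stage_trace(ck_wave, a_wave):
    """Return the sequence of X values for given clock and input samples."""
    stage = DynamicStage()
    return [stage.step(ck, a) for ck, a in zip(ck_wave, a_wave)]


def link(x: int, use_inverter: bool) -> int:
    """Signal driven into stage 2: X itself, or X through a static inverter."""
    return 1 - x if use_inverter else x


def run_two_stages(a: int, use_inverter: bool, eval_steps: int = 3) -> int:
    """Cascade two dynamic stages and return the final logic output.

    Intended function in both cases is 'a': a direct cascade of two inverting
    stages should give X2 = a, and a domino cascade should give NOT X2 = a.
    Stage 2 sees the stage-1 output from the previous time step.
    """
    s1, s2 = DynamicStage(), DynamicStage()
    s1.step(0, a)                       # pre-charge both stages
    s2.step(0, 0)
    into_s2 = link(s1.x, use_inverter)
    for _ in range(eval_steps):         # evaluation phase (ck == 1)
        s2.step(1, into_s2)
        s1.step(1, a)
        into_s2 = link(s1.x, use_inverter)
    return 1 - s2.x if use_inverter else s2.x


if __name__ == "__main__":
    # A rises and then falls during evaluation: X falls once and never recovers.
    print(single_stage_trace([0, 1, 1, 1], [0, 1, 0, 0]))  # [1, 0, 0, 0]
    for a in (0, 1):
        print(a, run_two_stages(a, use_inverter=False), run_two_stages(a, use_inverter=True))
```

For a = 1, the direct cascade returns 0 even though the intended result is 1. Stage 2 was discharged by the still-precharged (high) value of X1 before stage 1 had evaluated, and it cannot recover. With the static inverter in between, stage 2 only ever sees a rising input, and both input values give the correct result.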
Amongst the above three dynamic circuit styles, Domino logic is the safest, fastest and most widely accepted for the implementation of the data-path of high performance processors. However, due to the mandatory inversion in domino logic, only non-inverting functions can be realized using domino gates, and no signal inversion is allowed between two gates. Domino is therefore an incomplete logic family, and hence it is not supported by standard design synthesis tools like Design Compiler, which expect logic completeness for technology mapping.

Dual Rail Domino [1][2], on the other hand, is a complete logic family which realizes both inverting and non-inverting functions in a single gate, as shown in Figure 2. It should be noted that in dual rail domino, monotonicity of the input signals is still required, and only the generation of the inverted function is worked around. For example, the NAND function ZN = (A·B)' is generated in dual rail domino as ZN = AN + BN, where AN, BN and ZN are all monotonically rising signals generated from dual rail domino gates. The signals Z and ZN in Figure 2 are not strictly complementary to each other. During the pre-charge phase, both outputs go to logic ‘0’, and during the evaluation phase only one of them rises, depending on the inputs. Their polarity needs to be preserved. Like domino gates, dual rail domino gates cannot have a static inverter between them. Standard technology mapping tools do not take this into account, which results in synthesis failure when dual rail domino logic is used. However, because dual rail domino brings logic completeness over domino logic, a methodology for using dual rail domino logic with standard EDA tools was presented in [5].

A. Dual Rail Domino Logic Implementation of Floating Point Divider Block
In [5], a 4x4 array multiplier was synthesized, and it was shown that dual rail domino logic can provide up to 62% speed-up over conventional static logic with standard EDA tools. The design synthesis and logic optimization capability of the tools with static CMOS logic gates was leveraged, and the gates were then replaced with their dual rail domino counterparts using a two phase clocking scheme [4]. The methodology presented in [5] was implemented on a pipelined floating point divider block.

A standard cell library of 31 dual rail domino gates was created and characterized. As proposed in [5], the block was first synthesized in the Synopsys Design Compiler tool using conventional static logic based libraries. The design was synthesized at the highest possible frequency. The netlist of the combinational logic having the most critical timing path was then processed and converted into a dual rail domino logic netlist. In this step, inverters were inserted at the registered inputs of the critical block to create true and complementary inputs for the dual rail domino gates. All the intermediate inverters in the logic were removed, since an inversion in dual rail logic amounts to swapping the true and complementary rails. Each cell was replaced by its counterpart in the dual rail domino library. The logic was then partitioned, on the basis of timing, into two equal phases, phase-1 and phase-2. Domino buffers were inserted at the interface of the two phases. Phase-1 gates were clocked with a delayed system clock [5], and phase-2 gates were clocked using the inverted system clock. These clocks were generated from the system clock by a local clock generator in the block. The functionality of the modified logic block was then verified with circuit level SPICE simulations. The complete block netlist, with the critical combinational logic modified and the clock generator module inserted, was then taken to the Synopsys Placement and Routing tool.

However, no significant speed improvement was achieved using dual rail domino logic. The conventional static logic based floating point divider achieved a frequency of 134MHz, whereas its dual rail domino logic counterpart could reach only 149MHz (an improvement of about 11%), even with a perfectly equal time partitioning of the combinational logic.

B. Analysis of Dual Rail Domino Implementation using Standard Design Tools
This technique of leveraging static logic technology mapping for dual rail domino logic appears promising because both types of logic contain dual networks. However, it has been observed that a performance gain is achieved only when the design is synthesized at moderate frequency targets rather than at high frequencies. The technique is not successful in achieving frequency goals beyond what conventional static CMOS logic libraries offer with standard digital design tools. The reason for this is twofold.
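The dual-rail encoding of Figure 2 and the inverter-removal step of the conversion flow in Section II-A can be illustrated at the logic level. This Python sketch uses a hypothetical netlist format (tuples of output, cell type and inputs), hypothetical function names and idealised zero-delay gates. It does not reproduce the flow of [5]. It maps each net to a (true, false) rail pair, removes inverters by swapping rails, turns every other cell into a dual-rail AND/NAND or OR/NOR gate, and checks the result against the static netlist. Phase partitioning, domino buffers and clocking are not modelled.

```python
from itertools import product

# Hypothetical static netlist (output, cell, inputs) in topological order:
# y = NOR(NAND(a, b), NOT c)
STATIC_NETLIST = [
    ("n1", "NAND", ("a", "b")),
    ("n2", "INV", ("c",)),
    ("y", "NOR", ("n1", "n2")),
]
PRIMARY_INPUTS = ("a", "b", "c")


def eval_static(netlist, inputs):
    """Zero-delay evaluation of the static CMOS netlist."""
    v = dict(inputs)
    for out, cell, ins in netlist:
        x = [v[i] for i in ins]
        if cell == "INV":
            v[out] = 1 - x[0]
        elif cell in ("AND", "NAND"):
            v[out] = int(all(x)) ^ (cell == "NAND")
        elif cell in ("OR", "NOR"):
            v[out] = int(any(x)) ^ (cell == "NOR")
        else:
            raise ValueError(f"unsupported cell {cell}")
    return v


def to_dual_rail(netlist, primary_inputs):
    """Map every net to a (true_rail, false_rail) pair of monotonic signals."""
    # Registered inputs get both polarities (inverters added at block inputs).
    rails = {p: (p + ".t", p + ".f") for p in primary_inputs}
    cells = []  # (kind, true_inputs, false_inputs, z_name, zn_name)
    for out, cell, ins in netlist:
        if cell == "INV":
            t, f = rails[ins[0]]
            rails[out] = (f, t)  # inverter removed: inversion is a rail swap
            continue
        if cell not in ("AND", "NAND", "OR", "NOR"):
            raise ValueError(f"unsupported cell {cell}")
        kind = "AND/NAND" if cell in ("AND", "NAND") else "OR/NOR"
        ts = [rails[i][0] for i in ins]
        fs = [rails[i][1] for i in ins]
        z, zn = out + ".z", out + ".zn"
        cells.append((kind, ts, fs, z, zn))
        # AND/OR take z as the true rail; NAND/NOR take zn as the true rail.
        rails[out] = (z, zn) if cell in ("AND", "OR") else (zn, z)
    return rails, cells


def eval_dual_rail(rails, cells, inputs, precharge=False):
    """Evaluate the dual-rail netlist; during pre-charge every rail is 0."""
    v = {}
    for p, val in inputs.items():
        t, f = rails[p]
        v[t], v[f] = (0, 0) if precharge else (val, 1 - val)
    for kind, ts, fs, z, zn in cells:
        xt = [v[n] for n in ts]
        xf = [v[n] for n in fs]
        if kind == "AND/NAND":
            v[z], v[zn] = int(all(xt)), int(any(xf))  # Z = A.B, ZN = AN + BN
        else:
            v[z], v[zn] = int(any(xt)), int(all(xf))  # Z = A + B, ZN = AN.BN
    return v


def check(netlist=STATIC_NETLIST, primary_inputs=PRIMARY_INPUTS, out="y"):
    """Compare both implementations exhaustively; return the dual-rail cell count."""
    rails, cells = to_dual_rail(netlist, primary_inputs)
    t, f = rails[out]
    for bits in product((0, 1), repeat=len(primary_inputs)):
        inputs = dict(zip(primary_inputs, bits))
        ref = eval_static(netlist, inputs)[out]
        ev = eval_dual_rail(rails, cells, inputs)
        pre = eval_dual_rail(rails, cells, inputs, precharge=True)
        assert (ev[t], ev[f]) == (ref, 1 - ref)  # exactly one output rail rises
        assert not any(pre.values())             # all rails low in pre-charge
    return len(cells)


if __name__ == "__main__":
    print(check())
```

Here `check()` returns 2. The NAND and the NOR each become one dual-rail gate, while the inverter disappears into a rail swap. This is why the conversion can drop every intermediate inverter but still cannot tolerate a static inverter placed between two dual-rail gates.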
At high frequencies, the synthesis tool optimizes the technology mapping by placing more and more NAND gates on the critical path, since these gates are universal and the fastest among all static CMOS gates. At the same time, NAND gates perform better when implemented in static CMOS logic than in dual rail domino logic.

Table I. NAND GATE AVERAGE DELAY
Average Delay    Static NAND Gate    Dual Rail Domino AND/NAND Gate
Pre-Layout       112ps               96ps
Post-Layout      127ps               131ps

Table I gives the average FO2 delay for the static CMOS NAND gate and the dual rail domino AND/NAND gate. Before layout, the delay of the dual rail domino AND/NAND gate is less than that of the static NAND gate, but after layout the trend is reversed: the delay of the dual rail domino AND/NAND gate is 3% more than that of the static NAND gate. The higher post-layout delay of the dual rail domino AND/NAND gate can be attributed to the large parasitic effects of its 13-transistor implementation, compared with the compact four-transistor implementation of the static CMOS NAND gate. Figure 3 shows the layouts of both gates.

When a design is synthesized at a moderate frequency, synthesis tools like Synopsys Design Compiler provide the least-area technology mapping solution that supports the target frequency. For this, they place complex standard cells such as adder, XOR and multiplexer cells on the critical path, so that the total area of the block is minimized. Such complex cells have less delay when implemented using dual rail domino logic; for instance, the delay of the dual rail domino adder cell is 62ps, whereas that of the static CMOS adder cell is 128ps. If such cells in the timing-critical module of the design netlist provided by the synthesis tool are then replaced by their dual rail domino counterparts and the methodology stated in [5] is used, the achieved frequency of the design can be expected to increase by 60% to 70%. However, such a comparison is not meaningful, as it does not compare against the maximum frequency achievable using static CMOS. When the design is synthesized with tighter, highest-achievable frequency targets, the synthesis tool optimizes the mapping and, in turn, uses more and more small and fast static gates like NAND gates on the critical path. Replacing such gates with their dual rail domino counterparts does not increase the achievable frequency of the design.

In order to validate the above inference, the Floating Point Unit (FPU) was synthesized using the Design Compiler tool, both with and without the NAND gates in the standard library. These gates have implementations in static logic that are faster than, or as fast as, their dual rail domino counterparts. It was found that the FPU could reach a clock frequency of up to 150MHz in the presence of NAND gates in the static CMOS standard library, but only up to 110MHz without these cells. This clearly shows that, with standard design synthesis tools, the clock frequency of a system can be raised beyond what static CMOS logic provides only by a logic family whose NAND gates are faster than conventional static CMOS NAND gates. Dual rail domino is not such a family.

Moreover, during the physical layout stage of the design, logic optimizations are carried out very effectively by the Automated Placement and Routing tools to meet the frequency target. These optimizations are possible only when a conventional static CMOS logic library is used. In the dual rail domino implementation, the tool is not able to optimize the logic, and the performance of the circuit is found to degrade further after Automated Placement and Routing.

For the above reasons, it is not feasible to reap the high speed benefits of dynamic or domino logic based gates using conventional EDA tools. Dynamic logic gates have thus been used only in custom flow based processors [1][2], in which these gates and their clock trees are handcrafted by designers to achieve high frequency goals.

III. DYNAMIC CIRCUITS FOR SEQUENTIAL LOGIC
Unlike dynamic logic gates, dynamic logic based latches and flip-flops are fully compatible with standard EDA tools and can easily be incorporated in a standard cell based digital ASIC design flow. From the design testability perspective, flip-flops are preferred over latches as sequencing elements, and hence this paper focuses only on the former.

Conventional standard cell libraries contain Transmission Gate based Master-Slave Flip-Flops, as shown in Figure 4(a). These flip-flops are compact but suffer from a large clock-to-Q delay and a large positive set-up time. For this reason they are not suitable for use in high performance designs with low logical depth or in RF frequency synthesizer circuits.
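The link between flip-flop overhead and logical depth can be made concrete with the usual sequencing constraint: cycle time ≥ tClk->Q + tlogic + tSetUp. The Python sketch below is a rough illustration under stated assumptions. It takes the average latencies (tSetUp + tClk->Q) of TGFF and SAFF from Table II. As an assumption made only for illustration, it uses the pre-layout static NAND FO2 delay of Table I (112ps) as a per-level logic delay. It ignores clock skew and hold constraints, and the function names are hypothetical.

```python
# Average latency (tSetUp + tClk->Q) from Table II, in ps.
FF_LATENCY_PS = {"TGFF": 390.0, "SAFF": 224.0}
# Assumption: one logic level costs the pre-layout static NAND FO2 delay of
# Table I. This is a stand-in for illustration, not a measured path delay.
GATE_DELAY_PS = 112.0


def cycle_time_ps(ff: str, logic_depth: int) -> float:
    """Minimum cycle time: flip-flop latency plus logic delay (skew ignored)."""
    return FF_LATENCY_PS[ff] + logic_depth * GATE_DELAY_PS


def overhead_fraction(ff: str, logic_depth: int) -> float:
    """Fraction of the cycle consumed by the flip-flop itself."""
    return FF_LATENCY_PS[ff] / cycle_time_ps(ff, logic_depth)


if __name__ == "__main__":
    for depth in (4, 16):
        t_tg, t_sa = cycle_time_ps("TGFF", depth), cycle_time_ps("SAFF", depth)
        print(f"depth {depth:2d}: "
              f"TGFF {1e3 / t_tg:.2f} GHz ({overhead_fraction('TGFF', depth):.0%} overhead), "
              f"SAFF {1e3 / t_sa:.2f} GHz ({overhead_fraction('SAFF', depth):.0%} overhead)")
```

With four logic levels, the TGFF consumes about 47% of the cycle and replacing it with the SAFF shortens the cycle by about 25%. With sixteen levels, the TGFF share drops below a fifth and the gain falls to about 8%. This is the sense in which low logical depth designs are the ones that need a low-latency flip-flop.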
Dynamic logic, or Precharge-Evaluate logic, is used in a plethora of high performance flip-flops. As pointed out in Section I, when dynamic logic is used in a flip-flop, it has the shortcoming of responding to a rising input transition after the clock edge. In the literature, many flip-flop circuits that overcome this limitation have been proposed, such as those shown in Figure 4(b)-(c) and Figure 5. These flip-flops use different techniques to block a rising input transition that arrives after the clock edge.
Figure 3: Layout of (a) Static NAND Gate (b) Dual Rail Domino AND/NAND Gate

In [6], the Hybrid Latch Flip-Flop (HLFF), a pulsed flip-flop that uses dynamic logic, was proposed. In this flip-flop, shown in Figure 4(b), another nMOS transistor, driven by the delayed and inverted clock CPD, is placed in series with the evaluation nMOS transistor in the pull-down network. This makes the pull-down network responsive only during the short interval in which CP and CPD are both at logic ‘1’. After this brief transparency period, whose length is set by the delay of the inverter chain, the pull-down network of the dynamic logic is blocked and further activity on the input signal is not captured. The drawback of HLFF is the trade-off involved in the width of the transparency period. The period must be wide enough for the input to be sampled reliably in all process corners, but a large width increases the hold time of the flip-flop. This problem is overcome in the Klass Semi-Dynamic Flip-Flop (SDFF) [7], in which a conditional shut-off circuit is employed, as shown in Figure 4(c). This circuit automatically blocks the pull-down network of the dynamic front-end gate after the rising clock edge if the D input is sampled as logic ‘0’ at the clock edge. A rising transition of the D input that occurs after this shut-off is thus not captured.
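The difference between a plain dynamic front-end and the two blocking schemes can be seen in a logic-level sketch that only asks which D transitions end up captured during the clock-high phase. The Python sketch below rests on stated assumptions. Time is counted in 1ps steps after the rising clock edge. Q is treated as the complement of the dynamic node X. CPD and CKD are idealised square waves delayed by a hypothetical `window_ps`. The function names and all numbers are illustrative, not values from [6] or [7].

```python
from typing import Callable

# D value at time t (ps) after the rising clock edge.
Waveform = Callable[[int], int]


def capture_plain(d: Waveform, high_ps: int) -> int:
    """Plain dynamic front-end: X can discharge at any time while CK is high."""
    x = 1
    for t in range(high_ps):
        if d(t):
            x = 0
    return 1 - x  # assumption: Q follows NOT X


def capture_hlff(d: Waveform, high_ps: int, window_ps: int) -> int:
    """HLFF-style: series nMOS driven by CPD (delayed, inverted clock).

    The pull-down conducts only while CP and CPD are both 1, that is, for
    window_ps after the edge (the inverter-chain delay).
    """
    x = 1
    for t in range(high_ps):
        cpd = 1 if t < window_ps else 0
        if cpd and d(t):
            x = 0
    return 1 - x


def capture_sdff(d: Waveform, high_ps: int, window_ps: int) -> int:
    """SDFF-style conditional shut-off: the extra nMOS is driven by NAND(X, CKD),
    CKD being the clock delayed by window_ps. If X is still high when CKD rises
    (D sampled as 0), the pull-down is shut off for the rest of the phase."""
    x = 1
    for t in range(high_ps):
        ckd = 1 if t >= window_ps else 0
        enable = 1 - (x & ckd)
        if enable and d(t):
            x = 0
    return 1 - x


def d_late(t: int) -> int:
    """Hypothetical D that rises 100 ps after the clock edge."""
    return int(t >= 100)


def d_early(t: int) -> int:
    """Hypothetical D that rises 10 ps after the edge, inside the window."""
    return int(t >= 10)


if __name__ == "__main__":
    HIGH, WINDOW = 500, 30  # hypothetical clock-high time and window (ps)
    for name, d in (("late", d_late), ("early", d_early)):
        print(name, capture_plain(d, HIGH),
              capture_hlff(d, HIGH, WINDOW), capture_sdff(d, HIGH, WINDOW))
```

In this logic-level view, both HLFF and SDFF reject the late rising D that the plain dynamic stage wrongly captures. Both still accept a D that changes inside the window, which is the hold-time cost of a wide transparency period. The circuit-level differences between HLFF and SDFF (sizing, keepers, robustness across corners) are not represented.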
The improved Sense-Amplifier based Flip-Flop (SAFF) of Nikolic et al. [8] is a differential flip-flop that uses dynamic logic, as shown in Figure 5. The differential structure has many advantages. Conditional shut-off can be implemented easily with only two additional transistors. Cross-coupled keepers give lower latency and higher robustness than conventional keepers. Rise and fall times stay symmetrical over large load variations, and the metastability window is very narrow.
To use these high performance flip-flops in a standard cell library, a static inverter is added at the output of each of them. This inverter prevents noise from the external environment from coupling back into the flip-flop when it is used in a design. Apart from providing noise isolation, the output inverter also allows multiple drive strengths of these flip-flops to be implemented easily by sizing the inverter.

Figure 4: Flip-Flop Circuit Diagrams (a) TGFF (b) HLFF (c) SDFF

A. Performance Comparison of Flip-Flops
SPICE simulations were carried out for all the discussed flip-flops at the typical corner of the 180nm CMOS process. The test bench circuit described in [8] was used, with an input transition time of 0.1ns and a capacitive load of 40fF.

The simulation results are shown in Table II. The setup time of each flip-flop is measured as the minimum time between the input edge and the clock edge for which the clock-to-output delay stays within 10% of the nominal clock-to-Q delay. The nominal delay is the delay measured when the input is sufficiently far from the active clock edge. The ‘Avg. Power’ reported is the average power dissipated in the flip-flop and the clock buffer at a frequency of 200MHz. SAFF has the shortest latency among all the discussed high performance flip-flops, followed by SDFF and HLFF. In Figures 6-9, the clock-to-Q delay and the D-to-Q delay are plotted for the compared flip-flops for a rising Q transition, and the minimum D-to-Q delay is found for each flip-flop. The Sense-Amplifier Flip-Flop has the smallest minimum D-to-Q delay, 225ps, whereas the Transmission Gate Flip-Flop has the largest, 371ps.
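The setup-time criterion and the minimum D-to-Q search described above can be written as a short procedure. The Python sketch below is illustrative only. The sweep values are hypothetical and are not the simulated data behind Table II or Figures 6-9, and the function names are invented. The procedure scans a sweep of (D-to-clock offset, clock-to-Q) pairs from far away towards the clock edge and returns the last offset whose clock-to-Q stays within 10% of the nominal value. Separately, it finds the offset that minimizes the D-to-Q delay, taken as offset plus clock-to-Q.

```python
# Hypothetical sweep: (D-to-clock offset in ps, measured clock-to-Q in ps).
# A positive offset means D settles before the clock edge. Illustrative values only.
SWEEP = [(300, 218), (200, 218), (100, 219), (50, 221), (20, 226),
         (0, 235), (-5, 239), (-10, 250), (-15, 270), (-20, 320)]


def setup_time(sweep, degradation=0.10):
    """Smallest offset before clock-to-Q degrades by more than `degradation`."""
    pts = sorted(sweep, reverse=True)     # from far-from-edge towards the edge
    nominal = pts[0][1]                   # clk->Q with D far from the edge
    limit = (1.0 + degradation) * nominal
    setup = None
    for offset, tcq in pts:
        if tcq > limit:
            break                         # first point violating the 10% rule
        setup = offset
    return setup, nominal


def min_d_to_q(sweep):
    """Offset giving the smallest D-to-Q delay (offset + clk->Q) and that delay."""
    offset, tcq = min(sweep, key=lambda p: p[0] + p[1])
    return offset, offset + tcq


if __name__ == "__main__":
    print(setup_time(SWEEP))   # (setup time in ps, nominal clk->Q in ps)
    print(min_d_to_q(SWEEP))   # (optimal offset in ps, minimum D-to-Q in ps)
```

On this made-up sweep, the setup time comes out negative (-5ps), like several of the entries in Table II. The minimum D-to-Q delay occurs at the same offset, which is typical when clock-to-Q rises steeply near the edge.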
Table II. PERFORMANCE PARAMETERS OF FLIP-FLOPS
Parameter                              TGFF (SCL Lib)   HLFF [6]   SDFF [7]   SAFF [8]
tClk->Q (rise) (ps)                    304              251        269        218
tClk->Q (fall) (ps)                    314              253        267        214
tSetUp (rise) (ps)                     +59              -4         -21        -5
tSetUp (fall) (ps)                     +104             +35        -35        +21
Avg. Latency (tSetUp+tClk->Q) (ps)     390              267        240        224
Avg. Power (uW)                        19.35            22.68      21.14      32.53
Normalized PDP                         1                0.800      0.670      0.963
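The derived rows of Table II, and the 41% figure quoted later for Table III, follow from simple arithmetic on the measured values. The Python sketch below makes the following assumptions:
- ‘Avg. Latency’ is the mean of the rise and fall sums tSetUp + tClk->Q.
- The normalized PDP is Avg. Power × Avg. Latency divided by the TGFF value.
- The maximum-frequency estimate treats a flip-flop driving itself with no logic in between, using the worse of the two edges. This is only an assumption, since the text does not state how the 1.9GHz and 3.4GHz values were obtained.

The function names are hypothetical.

```python
# Values transcribed from Table II (pre-layout) and Table III (post-layout):
# (tClk->Q rise, tClk->Q fall, tSetUp rise, tSetUp fall) in ps, avg. power in uW.
TABLE_II = {
    "TGFF": ((304, 314, 59, 104), 19.35),
    "HLFF": ((251, 253, -4, 35), 22.68),
    "SDFF": ((269, 267, -21, -35), 21.14),
    "SAFF": ((218, 214, -5, 21), 32.53),
}
TABLE_III = {
    "TGFF": ((370, 415, 61, 90), 26.81),
    "SAFF": ((260, 253, 0, 38), 38.88),
}


def avg_latency(tcq_r, tcq_f, tsu_r, tsu_f):
    """Mean of (tSetUp + tClk->Q) over the rising and falling cases."""
    return ((tsu_r + tcq_r) + (tsu_f + tcq_f)) / 2


def normalized_pdp(table, ref="TGFF"):
    """Power x latency, normalised to the reference flip-flop."""
    pdp = {name: power * avg_latency(*timing) for name, (timing, power) in table.items()}
    return {name: value / pdp[ref] for name, value in pdp.items()}


def fmax_ghz(tcq_r, tcq_f, tsu_r, tsu_f):
    """Assumed estimate: flip-flop feeding itself, worst of the two edges."""
    return 1000.0 / max(tcq_r + tsu_r, tcq_f + tsu_f)


if __name__ == "__main__":
    pdp = normalized_pdp(TABLE_II)
    for name, (timing, _) in TABLE_II.items():
        print(name, avg_latency(*timing), round(pdp[name], 3))
    tg, sa = TABLE_III["TGFF"][0], TABLE_III["SAFF"][0]
    print("post-layout latency reduction:", 1 - avg_latency(*sa) / avg_latency(*tg))
    print("fmax estimates (GHz):", fmax_ghz(*tg), fmax_ghz(*sa))
```

The computed latencies (390.5, 267.5, 240 and 224ps) and normalized PDPs (about 0.803, 0.671 and 0.964) agree with Table II to within rounding. The post-layout latency reduction evaluates to about 0.41. The assumed fmax estimates come out at about 1.98GHz and 3.44GHz, comparable to the values quoted with Table III.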
Figure 5: Sense Amplifier Flip-Flop
Figure 6: TGFF Clock to Q & D to Q delay
Figure 7: HLFF Clock to Q & D to Q delay
Figure 8: SDFF Clock to Q & D to Q delay
Figure 9: SAFF Clock to Q & D to Q delay

B. Sense-Amplifier Flip-Flop Layout
The layout of the Sense-Amplifier Flip-Flop was made for use in the SCL standard cell library with a cell height of 5.6um, as shown in Figure 10. The cell width of the flip-flop is 15.01um.

Figure 10: Standard Cell Layout of Sense Amplifier Flip-Flop

Table III. POST-LAYOUT PERFORMANCE PARAMETERS OF SAFF AND TGFF
Parameter              TG based Flip-Flop (SCL Library)   Sense Amplifier Flip-Flop
tRise(Clk->Q) (ps)     370                                260
tFall(Clk->Q) (ps)     415                                253
tSetUp_rise (ps)       +61                                0
tSetUp_fall (ps)       +90                                +38
Avg. Power (uW)        26.81                              38.88
Cell Width (um)        10.64                              15.01

Table III shows the post-layout performance of the Sense-Amplifier Flip-Flop and of the conventional Transmission Gate based Master-Slave Flip-Flop from the SCL library. After layout, the latency of the Sense-Amplifier Flip-Flop is 41% lower than that of the TG based Flip-Flop. The maximum operating frequency is 1.9GHz for TGFF and 3.4GHz for SAFF. The Sense-Amplifier based flip-flop is thus suitable for use in high performance processors with low logical depth and in high frequency applications such as phase detectors for RF frequency synthesizers. At the same time, its use is fully supported by standard digital IC design tools.

IV. CONCLUSION
In this paper, the use of dynamic logic for high performance combinational and sequential design was discussed with respect to the standard cell based digital IC design flow using EDA tools. It was concluded that the technology mapping of design synthesis tools cannot be used to obtain high speed dynamic logic based combinational designs. To reap the high speed benefits of dynamic circuit based combinational logic, either the custom design flow should be followed or special technology mapping tools tailored to the requirements of the particular logic family should be used. Dynamic logic based high performance flip-flops were also discussed and compared for use in a standard cell library. A Sense-Amplifier based Flip-Flop was designed and characterized that can readily be used by standard design tools in deeply pipelined processors and in high frequency circuits like RF frequency synthesizers.
REFERENCES
[1] N. H. E. Weste and D. Harris, "CMOS VLSI Design: A Circuits and Systems Perspective", 4th Edition, Addison Wesley, 2011.
[2] J. M. Rabaey, A. Chandrakasan, and B. Nikolic, "Digital Integrated Circuits: A Design Perspective", 2nd Edition, PHI Learning Private Limited, 2003.
[3] G. Yee and C. Sechen, "Dynamic Logic Synthesis", IEEE Custom Integrated Circuits Conference, 1997.
[4] D. Harris and M. A. Horowitz, "Skew-Tolerant Domino Circuits", IEEE Journal of Solid-State Circuits, Vol. 32, No. 11, November 1997.
[5] N. Kaur, U. Khambete, and D. Sehgal, "High Speed Dual Rail Domino Logic for SCL 180nm CMOS Technology", 3rd ISSE National Conference on Complex Engineering Systems of National Importance, October 2017.
[6] H. Partovi et al., "Flow-Through Latch and Edge-Triggered Flip-Flop Hybrid Elements", IEEE International Solid-State Circuits Conference, Session 8, 1996.
[7] F. Klass et al., "A New Family of Semidynamic and Dynamic Flip-Flops with Embedded Logic for High-Performance Processors", IEEE Journal of Solid-State Circuits, Vol. 34, No. 5, May 1999.
[8] B. Nikolic, V. G. Oklobdzija, and V. Stojanovic, "Improved Sense-Amplifier-Based Flip-Flop: Design and Measurements", IEEE Journal of Solid-State Circuits, Vol. 35, No. 6, June 2000.