Arithmetic Circuits
References:
CMOS VLSI Design: A Circuits and Systems Perspective, 4th Edition, by Neil Weste
Handbook of Digital CMOS Technology, Circuits, and Systems, by Karim Abbas
Contents
• Ripple Carry
• Carry Look-Ahead
• Carry Skip
• Carry Increment
• Tree Adders: Brent-Kung, Sklansky, Kogge-Stone
Single-Bit Addition
Half Adder
S = A ⊕ B
Cout = A · B

A B | Cout S
0 0 |  0   0
0 1 |  0   1
1 0 |  0   1
1 1 |  1   0

Full Adder
S = A ⊕ B ⊕ C
Cout = AB + BC + AC

A B C | Cout S
0 0 0 |  0   0
0 0 1 |  0   1
0 1 0 |  0   1
0 1 1 |  1   0
1 0 0 |  0   1
1 0 1 |  1   0
1 1 0 |  1   0
1 1 1 |  1   1
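The equations above can be checked exhaustively. Below is a minimal Python sketch in which the function names `half_adder` and `full_adder` are hypothetical and chosen only for illustration. It evaluates S and Cout straight from the Boolean equations and prints the full-adder rows in the same order as the table.

```python
from itertools import product


def half_adder(a: int, b: int) -> tuple[int, int]:
    """Return (cout, s) for a half adder: S = A xor B, Cout = A and B."""
    return a & b, a ^ b


def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """Return (cout, s): S = A xor B xor C, Cout = AB + BC + AC (majority)."""
    s = a ^ b ^ c
    cout = (a & b) | (b & c) | (a & c)
    return cout, s


# Walk the inputs in truth-table order (000, 001, ..., 111).
for a, b, c in product((0, 1), repeat=3):
    cout, s = full_adder(a, b, c)
    # Read as a 2-bit number, (Cout, S) counts how many inputs are 1.
    assert 2 * cout + s == a + b + c
    print(a, b, c, cout, s)
```

The assertion shows why the full adder is a "counter": the two outputs encode the number of 1 inputs.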
Carry-Ripple Adder
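An N-bit carry-ripple adder chains N full adders: the carry-out of each stage is the carry-in of the next stage.
Example: Consider adding two 4-bit binary numbers, A = 1011 and B = 1101:
A = 1011 (11)
B = 1101 (13)
S = 11000 (24)
The 4-bit sum is 1000, with a final carry-out (C4) of 1.
[Figure: 4-bit carry-ripple adder. Four full adders are chained from bit 1 to bit 4, with inputs A4...A1, B4...B1 and Cin, internal carries C1, C2, C3, sums S4...S1, and carry-out Cout.]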
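The worked example can be reproduced with a short simulation. This is a minimal Python sketch, not a hardware description. The names `full_adder` and `ripple_carry_add` are hypothetical, and operands are passed as binary strings with the most significant bit first.

```python
def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """One full-adder stage: returns (carry_out, sum_bit)."""
    return (a & b) | (b & c) | (a & c), a ^ b ^ c


def ripple_carry_add(a_bits: str, b_bits: str, cin: int = 0) -> tuple[int, str]:
    """Add two equal-length binary strings (MSB first) with a chain of full adders.

    Returns (cout, sum_bits). Each stage's carry-out feeds the next stage's
    carry-in, so the last stage must wait for the carry to pass every stage.
    """
    assert len(a_bits) == len(b_bits)
    carry = cin
    sum_bits = []
    # Start at the least significant bit (right end of the string).
    for a, b in zip(reversed(a_bits), reversed(b_bits)):
        carry, s = full_adder(int(a), int(b), carry)
        sum_bits.append(str(s))
    return carry, "".join(reversed(sum_bits))


cout, s = ripple_carry_add("1011", "1101")
print(cout, s)  # 1 1000, i.e. the result 11000
```

The loop is sequential for the same reason the hardware is slow: stage i cannot produce its carry until stage i-1 has.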
Advantages:
• Simplicity: The design is straightforward, making it easy to understand and implement
• Scalability: It's simple to extend the adder to handle more bits by adding more full
adders
Disadvantages:
• Speed: The main drawback of a Ripple Carry Adder is its speed. The carry output from
each full adder must propagate to the next adder, causing a delay proportional to the
number of bits (n). This delay is known as the carry propagation delay
• Performance: For large bit-widths, this delay can be significant, making ripple carry
adders slower than other adders like the Carry Look-ahead Adder (CLA) or the Carry
Save Adder (CSA)
Applications:
• Ripple Carry Adders are used in situations where simplicity and ease of
implementation are more important than speed, such as in small-scale
arithmetic operations in digital circuits or as part of more complex
adders in hierarchical designs.
Carry Look-Ahead Adder
• Most other arithmetic operations, e.g., multiplication and division, are
implemented using several add/subtract steps. Thus, improving the
speed of addition will improve the speed of all other arithmetic
operations
• One widely used approach, based on the principle of carry look-ahead,
solves the carry-propagation problem by calculating the carry signals in
advance from the input signals. This type of adder circuit is called a Carry
Look-Ahead adder (CLA adder)
• Unlike the Ripple Carry Adder, where each carry bit must wait for the
previous one to be computed, the CLA adder calculates carry bits in
advance based on the input bits, thereby "looking ahead" to the final
result
It is based on the fact that a carry signal will be generated in two cases:
1. when both bits Ai and Bi are 1, or
2. when one of the two bits is 1 and the carry-in (carry of the previous stage) is 1.
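That is,
Ci+1 = Ai·Bi + (Ai ⊕ Bi)·Ci
where Ai·Bi is called the carry generate, Gi, and Ai ⊕ Bi is called the carry propagate, Pi.
Therefore,
Ci+1 = Gi + Pi·Ci
Si = Pi ⊕ Ci
(In the group notation introduced later, the carry into bit i is written Gi-1:0, giving Si = Pi ⊕ Gi-1:0.)
Alternatively, many adders use K′i = Ai + Bi in place of Pi in the carry equation, because OR is
faster than XOR; the sum still requires Pi = Ai ⊕ Bi.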
The Boolean expression of the carry outputs of various stages can be
written as follows:
C1 = G0 + P0C0
C2 = G1 + P1C1 = G1 + P1 (G0 + P0C0)
= G1 + P1G0 + P1P0C0
C3 = G2 + P2C2 = G2 + P2 (G1 + P1G0 + P1P0C0)
= G2 + P2G1 + P2P1G0 + P2P1P0C0
C4 = G3 + P3C3
= G3 + P3G2 + P3P2G1 + P3P2P1G0 + P3P2P1P0C0
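The flattened carry equations above can be checked against ordinary integer addition. This is a minimal Python sketch. The function names `cla4_carries` and `cla4_add` are hypothetical, and bits are stored least significant first (a[0] is bit 0).

```python
from itertools import product


def cla4_carries(a: list[int], b: list[int], c0: int) -> list[int]:
    """Return [C1, C2, C3, C4] from the two-level CLA equations (bit 0 = LSB)."""
    g = [ai & bi for ai, bi in zip(a, b)]  # Gi = Ai.Bi
    p = [ai ^ bi for ai, bi in zip(a, b)]  # Pi = Ai xor Bi
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    return [c1, c2, c3, c4]


def cla4_add(a: list[int], b: list[int], c0: int = 0):
    """Return (sum bits S0..S3, C4); every carry depends only on inputs, not on other carries."""
    c = [c0] + cla4_carries(a, b, c0)
    s = [(a[i] ^ b[i]) ^ c[i] for i in range(4)]  # Si = Pi xor Ci
    return s, c[4]


# Exhaustive check over all 4-bit operands and both carry-in values.
for x, y, c0 in product(range(16), range(16), (0, 1)):
    a = [(x >> i) & 1 for i in range(4)]
    b = [(y >> i) & 1 for i in range(4)]
    s, c4 = cla4_add(a, b, c0)
    assert sum(bit << i for i, bit in enumerate(s)) + (c4 << 4) == x + y + c0
print("all", 16 * 16 * 2, "cases match")
```

Each carry expression is written out in full, so no carry waits on another carry, which is exactly what the two-level hardware exploits.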
• The 2-level implementation of the carry
signals has a propagation delay of 2 gates (2τ)
• The 4-bit carry look-ahead (CLA) adder
consists of 3 levels of logic:
• First level: generates all the P and G signals using
four sets of P and G logic (each consisting of an XOR
gate and an AND gate). The outputs of this level (the
P's and G's) are valid after 1τ
• Second level: the CLA logic block, which consists
of four 2-level logic circuits, generates the carry
signals C1, C2, C3, and C4. The outputs of this level
are valid after 3τ
• Third level: four XOR gates generate the
sum signals (Si = Pi ⊕ Ci). The outputs of
this level (S0, S1, S2, and S3) are valid after 4τ
• Thus, all four sum signals (S0, S1, S2, and S3) are valid after a total delay
of 4τ, compared to a delay of (2n+1)τ for a Ripple Carry adder.
• For a 4-bit adder (n = 4), the Ripple Carry adder delay is 9τ.
• The disadvantage of the CLA adders is that the carry expressions (and
hence logic) become quite complex for more than 4 bits.
• Thus, CLA adders are usually implemented as 4-bit modules that are
used to build larger size adders.
16-Bit Carry Look-Ahead Adder
Group Generate and Propagate
For larger adders, it is inefficient to compute each carry bit individually,
especially as the number of bits grows. Instead, the CLA uses group generate
and group propagate to compute carries for groups of bits in parallel.
• Group Generate (G₀₋₃) for a group of bits from bit 0 to bit 3: This signal indicates
that a carry is generated somewhere within the group and propagated out of it, so the
group produces a carry-out regardless of its carry input. It can be thought of as an
OR across the generate conditions within the group, each qualified by the propagate
signals of the bits above it.
• Group Propagate (P₀₋₃) for the same group: This signal indicates that a carry-in to
the group will be propagated through to the next group. It is an AND across all
propagate conditions within the group.
Calculating Group Generate and Group Propagate
For a group of 4 bits (let’s say bits 0 to 3), the group generate and propagate can be
defined as:
Group Generate (G₀₋₃):
G0-3 = G3+(P3⋅G2) + (P3⋅P2⋅G1) + (P3⋅P2⋅P1⋅G0)
This means that the group will generate a carry if the most significant bit generates a
carry (G₃), or if a lower bit generates a carry and every bit above it, up to and including
bit 3, propagates it.
Group Propagate (P₀₋₃):
P0-3=P3⋅P2⋅P1⋅P0
This means that the group will propagate a carry if all bits within the group propagate
the carry.
Use Group Generate and Propagate to Calculate Carry-Out for Group:
• If you have a carry-in C0 for the 4-bit group, the carry-out C4 can be
calculated using the group generate and propagate:
C4 = G0-3 + (P0-3⋅C0)
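The group equations can be used to build a 16-bit adder from four 4-bit groups. This is a minimal Python sketch under stated assumptions. The names `group_gp` and `add16_grouped` are hypothetical. The four group carries C4, C8, C12, C16 are chained one group at a time with Cout = G + P·Cin; a second level of lookahead over the group G and P signals could replace that chain but is not shown. For brevity, the sum bits inside each group are computed by rippling from the group carry-in, where a real 4-bit CLA block would use the flattened carry equations.

```python
def group_gp(g: list[int], p: list[int]) -> tuple[int, int]:
    """Group generate/propagate for one 4-bit group (index 0 = least significant bit)."""
    G = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    P = p[3] & p[2] & p[1] & p[0]
    return G, P


def add16_grouped(x: int, y: int, cin: int = 0) -> tuple[int, int]:
    """16-bit addition from four 4-bit groups. Returns (sum, cout)."""
    a = [(x >> i) & 1 for i in range(16)]
    b = [(y >> i) & 1 for i in range(16)]
    g = [ai & bi for ai, bi in zip(a, b)]
    p = [ai ^ bi for ai, bi in zip(a, b)]

    # Carries between groups: C0 = cin, then C4, C8, C12, and C16 = Cout.
    group_cin = [cin]
    for k in range(4):
        G, P = group_gp(g[4 * k:4 * k + 4], p[4 * k:4 * k + 4])
        group_cin.append(G | (P & group_cin[-1]))  # C_out = G + P * C_in

    # Sum bits inside each group, starting from that group's carry-in.
    total = 0
    for k in range(4):
        c = group_cin[k]
        for i in range(4 * k, 4 * k + 4):
            total |= (p[i] ^ c) << i
            c = g[i] | (p[i] & c)
    return total, group_cin[4]


print(add16_grouped(0xFFFF, 0x0001))  # (0, 1)
```

In the example, group 0 generates a carry and groups 1 to 3 each have a group propagate of 1, so the carry passes through them using only one G/P evaluation per group.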
Applications
Carry Look-Ahead Adders are used in situations where high-speed
arithmetic operations are critical, such as in:
• ALUs (Arithmetic Logic Units): High-performance processors and microcontrollers.
• Floating-point units: In CPUs and GPUs for faster addition operations.
• Digital Signal Processing (DSP): For efficient computations in signal processing
tasks.
Overall, the CLA adder offers a significant speed advantage over simpler
adder designs, making it a popular choice for high-speed computing
applications.
PGK
• For a full adder, define what happens to carries
• Generate: Cout = 1 independent of C
• G=A•B
• Propagate: Cout = C
• P = A ⊕ B
• Kill: Cout = 0 independent of C
• K = ~A • ~B
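The three cases can be expressed as a classifier per bit position. This is a minimal Python sketch with hypothetical names `pgk` and `carry_out`. It labels a bit as generate, propagate or kill and checks that these three rules reproduce the full-adder carry Cout = AB + BC + AC.

```python
def pgk(a: int, b: int) -> str:
    """Classify one bit position as 'G' (generate), 'P' (propagate) or 'K' (kill)."""
    if a & b:   # G = A.B: carry out is 1 regardless of the carry in
        return "G"
    if a ^ b:   # P = A xor B: carry out equals the carry in
        return "P"
    return "K"  # K = ~A.~B: carry out is 0 regardless of the carry in


def carry_out(a: int, b: int, cin: int) -> int:
    """Carry out of one bit, decided only by its G/P/K class and the carry in."""
    return {"G": 1, "P": cin, "K": 0}[pgk(a, b)]


# The three cases agree with the majority function for every input.
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            assert carry_out(a, b, c) == (a & b) | (b & c) | (a & c)
print([pgk(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # ['K', 'P', 'P', 'G']
```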
Carry Propagate Adders
• Carry into each bit can influence the carry into all subsequent bits.
• N-bit adder
• Each sum bit depends on all previous carries
• How do we compute all these carries quickly?
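[Figure: N-bit adder block with inputs AN...1, BN...1 and Cin, producing SN...1 and Cout]
Example (4 bits): A4...1 = 1111, B4...1 = 0000
• With Cin = 0, the carries are 00000 and S4...1 = 1111
• With Cin = 1, the carries are 11111 and S4...1 = 0000 (Cout = 1)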
Group Generate / Propagate
• Equations often factored into G and P
• Generate and propagate for groups spanning i:j
Gi:j = Gi:k + Pi:k·Gk-1:j
Pi:j = Pi:k·Pk-1:j
• Base case
Gi:i ≡ Gi = Ai·Bi    G0:0 ≡ G0 = Cin
Pi:i ≡ Pi = Ai ⊕ Bi    P0:0 ≡ P0 = 0
• The carry out of bit i is Gi:0, so the carry into bit i is Gi-1:0
• Sum:
Si = Pi ⊕ Gi-1:0
PG Diagram Notation
• Black cell: takes (Gi:k, Pi:k) and (Gk-1:j, Pk-1:j) and produces Gi:j and Pi:j
• Gray cell: takes (Gi:k, Pi:k) and Gk-1:j and produces only Gi:j
• Buffer: passes Gi:j and Pi:j through unchanged
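The black and gray cells implement the group equations Gi:j = Gi:k + Pi:k·Gk-1:j and Pi:j = Pi:k·Pk-1:j. This is a minimal Python sketch with hypothetical names `black_cell` and `gray_cell`. It checks that the 4-bit group G3:0 and P3:0 come out the same whether the group is split at k = 2 or folded one bit at a time. Both results also match the flattened group equations from the CLA section. This freedom to choose the split point is what lets tree adders rearrange the cells.

```python
from itertools import product


def black_cell(hi: tuple[int, int], lo: tuple[int, int]) -> tuple[int, int]:
    """Combine (Gi:k, Pi:k) with (Gk-1:j, Pk-1:j) into (Gi:j, Pi:j)."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return g_hi | (p_hi & g_lo), p_hi & p_lo


def gray_cell(hi: tuple[int, int], g_lo: int) -> int:
    """Produce only Gi:j = Gi:k + Pi:k * Gk-1:j (no propagate output)."""
    g_hi, p_hi = hi
    return g_hi | (p_hi & g_lo)


# Try every pattern of bit-level (Gi, Pi) for bits 3..0.
for bits in product((0, 1), repeat=8):
    g, p = bits[:4], bits[4:]  # g[i], p[i] belong to bit i (0 = LSB)
    cell = [(g[i], p[i]) for i in range(4)]
    # Split at k = 2: combine group 3:2 with group 1:0.
    split = black_cell(black_cell(cell[3], cell[2]), black_cell(cell[1], cell[0]))
    # Serial fold from the top: ((3 with 2) with 1) with 0.
    serial = black_cell(black_cell(black_cell(cell[3], cell[2]), cell[1]), cell[0])
    # Flattened four-bit group equations.
    direct_g = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    direct_p = p[3] & p[2] & p[1] & p[0]
    assert split == serial == (direct_g, direct_p)
    # A gray cell yields the same G as a black cell, just without P.
    assert gray_cell(cell[3], cell[2][0]) == black_cell(cell[3], cell[2])[0]
print("G3:0 and P3:0 agree for all 256 patterns")
```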
PG Logic
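[Figure: 4-bit adder built in three stages from inputs A4...A1, B4...B1 and Cin]
1: Bitwise PG logic: computes G4 P4, G3 P3, G2 P2, G1 P1, with G0 = Cin and P0 = 0
2: Group PG logic: computes G3:0, G2:0, G1:0, G0:0, which are the carries C3, C2, C1, C0;
the carry-out C4 (Cout) is G4:0
3: Sum logic: Si = Pi ⊕ Gi-1:0, producing S4, S3, S2, S1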
Carry-Ripple Revisited
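Gi:0 = Gi + Pi·Gi-1:0
[Figure: the same three-stage 4-bit structure (bitwise PG, group PG, sum logic), with the group PG logic built as a chain: each Gi:0 is produced from Gi, Pi and Gi-1:0, giving the carries C0 to C3 and Cout = C4.]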
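The three-step PG structure with a rippled group-PG stage can be written directly from Gi:0 = Gi + Pi·Gi-1:0. This is a minimal Python sketch with the hypothetical name `pg_ripple_add`. Bit i (i = 1..N) is stored at list index i-1, and position 0 holds the carry-in as G0 = Cin, P0 = 0, following the base case above.

```python
def pg_ripple_add(a_bits: list[int], b_bits: list[int], cin: int) -> tuple[list[int], int]:
    """Three-step PG adder whose group-PG stage is a carry-ripple chain.

    a_bits[i-1] and b_bits[i-1] hold bit i for i = 1..N (index 0 is bit 1, the LSB).
    Returns (sum bits S1..SN, Cout).
    """
    n = len(a_bits)
    # Step 1: bitwise PG; position 0 carries the carry-in (G0 = Cin, P0 = 0).
    g = [cin] + [a & b for a, b in zip(a_bits, b_bits)]
    p = [0] + [a ^ b for a, b in zip(a_bits, b_bits)]
    # Step 2: group PG as a chain, Gi:0 = Gi + Pi * Gi-1:0.
    g_prefix = [g[0]]  # G0:0 = Cin
    for i in range(1, n + 1):
        g_prefix.append(g[i] | (p[i] & g_prefix[i - 1]))
    # Step 3: sum logic, Si = Pi xor Gi-1:0.
    s = [p[i] ^ g_prefix[i - 1] for i in range(1, n + 1)]
    return s, g_prefix[n]  # Cout = GN:0


# A = 1011, B = 1101, listed from bit 1 (LSB) upward.
print(pg_ripple_add([1, 1, 0, 1], [1, 0, 1, 1], 0))  # ([0, 0, 0, 1], 1)
```

Step 2 is a loop-carried dependency, which is the software picture of the N-1 gray cells in series on the critical path.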
Carry-Ripple PG Diagram
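[Figure: 16-bit carry-ripple PG diagram. Bit positions 15 to 0 run across the top and delay increases downward; the group signals 0:0, 1:0, ..., 15:0 are produced one after another along the carry chain.]
tripple = tpg + (N - 1)·tAO + txor
where tpg is the delay of the bitwise PG logic, tAO is the delay of one AND-OR stage of the chain, and txor is the delay of the final sum XOR.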
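The ripple delay formula can be evaluated for a few widths. This is a minimal Python sketch with the hypothetical function name `t_ripple`. The unit delays (tpg = tAO = txor = 1) are assumptions used only to show that the delay grows linearly with N; real gate delays depend on the technology and sizing.

```python
def t_ripple(n: int, t_pg: float, t_ao: float, t_xor: float) -> float:
    """Critical-path delay of an N-bit carry-ripple PG adder:
    t_ripple = t_pg + (N - 1) * t_AO + t_xor."""
    return t_pg + (n - 1) * t_ao + t_xor


# Hypothetical unit delays, only to show the linear growth in N.
for n in (4, 8, 16, 32):
    print(n, t_ripple(n, t_pg=1, t_ao=1, t_xor=1))  # 4 5 / 8 9 / 16 17 / 32 33
```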
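Carry-Skip / Carry-Bypass Adder
• Carry-ripple is slow through all N stages
• Carry-skip allows the carry to skip over groups of n bits
• The decision is based on the n-bit group propagate signal: if every bit in a group propagates, the group's carry-out equals its carry-in
[Figure: 16-bit carry-skip adder made of four 4-bit ripple blocks (bits 4:1, 8:5, 12:9, 16:13) between Cin and Cout. Each block computes its group propagate (P4:1, P8:5, P12:9, P16:13), which drives a multiplexer at the block's carry output: when the propagate is 1, the block's carry-in skips ahead; when it is 0, the block's own ripple carry-out is used. The multiplexer outputs are C4, C8, C12 and Cout; the block sums are S4:1, S8:5, S12:9, S16:13.]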
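The skip decision can be modeled functionally. This is a minimal Python sketch with hypothetical names `ripple_block` and `carry_skip_add`, and the 16-bit width and 4-bit blocks are taken from the figure. Logically, the multiplexer output always equals the block's ripple carry-out: when every bit propagates, the carry-out is the carry-in. The sketch asserts this, because the benefit of carry-skip is timing (the carry bypasses the block) rather than a different result.

```python
def ripple_block(a: list[int], b: list[int], cin: int) -> tuple[list[int], int]:
    """Ripple-carry one block; a and b are bit lists, least significant bit first."""
    sums, c = [], cin
    for ai, bi in zip(a, b):
        sums.append(ai ^ bi ^ c)
        c = (ai & bi) | (ai & c) | (bi & c)
    return sums, c


def carry_skip_add(x: int, y: int, cin: int = 0, n: int = 16, block: int = 4) -> tuple[int, int]:
    """n-bit carry-skip adder built from ripple blocks of `block` bits. Returns (sum, cout)."""
    a = [(x >> i) & 1 for i in range(n)]
    b = [(y >> i) & 1 for i in range(n)]
    total, c = 0, cin
    for lo in range(0, n, block):
        a_blk, b_blk = a[lo:lo + block], b[lo:lo + block]
        sums, ripple_cout = ripple_block(a_blk, b_blk, c)
        # Group propagate: true only if every bit in the block has Ai xor Bi = 1.
        group_p = all(ai ^ bi for ai, bi in zip(a_blk, b_blk))
        # Multiplexer: skip (pass the block's carry-in) when group_p is 1,
        # otherwise take the block's own ripple carry-out.
        c_next = c if group_p else ripple_cout
        # Both choices agree logically; the skip only shortens the worst-case path.
        assert c_next == ripple_cout
        for i, s in enumerate(sums):
            total |= s << (lo + i)
        c = c_next
    return total, c


print(carry_skip_add(0xFFFF, 0x0001))  # (0, 1)
```

In the example, block 4:1 generates a carry and the three upper blocks all propagate, which is the worst case for plain ripple and the case that skipping helps.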
Carry-Increment Adder
• Consists of RCAs/CLAs and
increment circuitry
• An 8-bit carry-increment adder includes two 4-bit RCAs.
• The first RCA adds the lower 4 bits of the inputs, producing the lower 4 sum bits and a carry-out.
• The carry-out of the first RCA is given to the CIN of the conditional increment block.
• Thus, the lower 4 sum bits are taken directly from the first RCA's outputs.
• The second RCA adds the upper 4 bits at the same time, regardless of the first RCA's output
(i.e., as if its own carry-in were 0), and feeds its results to the conditional increment block.
• The input CIN to the first RCA block is always held low.
• The conditional increment block consists of half adders.
• Based on the value of Cout of the first RCA block, the increment operation takes place: the
half adders in the carry-increment block add 1 to the second RCA's sum when that carry is 1.
• Hence, the upper sum bits are taken from the output of the carry-increment block.
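The 8-bit carry-increment scheme can be simulated bit by bit. This is a minimal Python sketch with hypothetical names `ripple_block`, `increment_block` and `carry_increment_add8`. Both RCAs get a carry-in of 0, following the description above. The source does not say how the final carry-out is formed, so the sketch assumes it is the OR of the second RCA's carry-out and the increment chain's carry-out. These two can never both be 1, because an upper sum that overflowed is at most 1110.

```python
def ripple_block(a: list[int], b: list[int], cin: int) -> tuple[list[int], int]:
    """4-bit RCA; a and b are bit lists, least significant bit first."""
    sums, c = [], cin
    for ai, bi in zip(a, b):
        sums.append(ai ^ bi ^ c)
        c = (ai & bi) | (ai & c) | (bi & c)
    return sums, c


def increment_block(s: list[int], inc: int) -> tuple[list[int], int]:
    """Chain of half adders that adds `inc` (0 or 1) to the bit list s (LSB first)."""
    out, c = [], inc
    for bit in s:
        out.append(bit ^ c)  # half-adder sum
        c = bit & c          # half-adder carry
    return out, c


def carry_increment_add8(x: int, y: int) -> tuple[int, int]:
    """8-bit carry-increment adder from two 4-bit RCAs. Returns (sum, cout)."""
    a = [(x >> i) & 1 for i in range(8)]
    b = [(y >> i) & 1 for i in range(8)]
    # Both RCAs can start at once; neither waits for the other's carry.
    lo_sum, lo_cout = ripple_block(a[:4], b[:4], 0)  # first RCA, CIN held low
    hi_sum, hi_cout = ripple_block(a[4:], b[4:], 0)  # second RCA, carry-in assumed 0
    # Conditional increment: add the first RCA's carry-out to the upper sum.
    hi_sum, inc_cout = increment_block(hi_sum, lo_cout)
    total = sum(bit << i for i, bit in enumerate(lo_sum + hi_sum))
    # Assumption: final carry-out is the OR of the two carries (never both 1).
    return total, hi_cout | inc_cout


print(carry_increment_add8(0b10110110, 0b01011101))  # (19, 1), i.e. 182 + 93 = 275
```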
Tree Adder
• If lookahead is good, lookahead across lookahead!
• Recursive lookahead gives O(log N) delay
• Many variations on tree adders
Brent-Kung Adders
• Logic Stages for Logarithmic/Tree Adders
• Black circular unit (dot operator): computes the group propagate and generate logic
• A parallel adder with a regular layout, designed to minimize chip area and ease
manufacturing
• Two n-bit numbers can be added in O(log2 n) time with a chip area of
O(n·log2 n)
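The Brent-Kung arrangement of dot operators can be sketched as an up-sweep (a binary reduction tree) followed by a down-sweep that fills in the remaining prefixes. This is a minimal Python sketch with hypothetical names `dot`, `brent_kung_prefix` and `bk_add`. It assumes the width is a power of two. The carry-in is folded into bit 0 as G0 ← G0 + P0·Cin, so each prefix G of bit i is the carry out of bit i. The sketch also counts dot operators and logic levels.

```python
def dot(hi: tuple[int, int], lo: tuple[int, int]) -> tuple[int, int]:
    """Dot operator (black cell): (G, P) of the upper span combined with the lower span."""
    return hi[0] | (hi[1] & lo[0]), hi[1] & lo[1]


def brent_kung_prefix(gp: list[tuple[int, int]]):
    """Brent-Kung prefix over (G, P) pairs, bit 0 first; len(gp) must be a power of two.

    Returns (prefixes, dot_count, levels), where prefixes[i] = (Gi:0, Pi:0).
    """
    node = list(gp)
    n = len(node)
    count = levels = 0
    d = 1
    while d < n:  # up-sweep: binary reduction tree toward the top bit
        for i in range(2 * d - 1, n, 2 * d):
            node[i] = dot(node[i], node[i - d])
            count += 1
        levels += 1
        d *= 2
    d = n // 4
    while d >= 1:  # down-sweep: fill in the prefixes the tree skipped
        for i in range(3 * d - 1, n, 2 * d):
            node[i] = dot(node[i], node[i - d])
            count += 1
        levels += 1
        d //= 2
    return node, count, levels


def bk_add(x: int, y: int, cin: int = 0, n: int = 16) -> tuple[int, int]:
    """n-bit addition using the Brent-Kung prefix network. Returns (sum, cout)."""
    a = [(x >> i) & 1 for i in range(n)]
    b = [(y >> i) & 1 for i in range(n)]
    g = [ai & bi for ai, bi in zip(a, b)]
    p = [ai ^ bi for ai, bi in zip(a, b)]
    g[0] |= p[0] & cin  # fold the carry-in into bit 0
    prefix, _, _ = brent_kung_prefix(list(zip(g, p)))
    carries = [cin] + [G for G, _ in prefix]  # carries[i] = carry into bit i
    total = sum((p[i] ^ carries[i]) << i for i in range(n))
    return total, carries[n]


print(bk_add(0xFFFF, 0x0001))                # (0, 1)
print(brent_kung_prefix([(0, 0)] * 16)[1:])  # (26, 7): 26 dot operators in 7 levels
```

The structure trades depth for area: 2·log2 n − 1 levels instead of log2 n, but only about 2n dot operators, which suits the regular, compact layout described above.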
Kogge-Stone (KS) Adders
• The depth of the KS adder increases logarithmically with the number of bits.
• The depth is defined as the number of units in series from the earliest input to the last
output; thus it is also the critical path.
• Hence, the delay of the KS adder is O(log2 n).
• While its delay at very low bit widths can be worse than that of simpler adders, its scaling
behavior is unchallenged, so it is among the fastest large adders.
• This behavior is common among most parallel prefix adders (PPAs).
• Its disadvantage is its area. The 4-bit KS adder has 5 dot operators arranged in 2 levels, the
8-bit adder has 17, and the 16-bit adder contains 49.
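The Kogge-Stone network can be sketched as one full row of dot operators per level, with the span doubling each level. This is a minimal Python sketch with hypothetical names `dot`, `kogge_stone_prefix` and `ks_add`. At the level with span d, every position i ≥ d combines with position i − d, so a level has n − d dot operators. Counting them reproduces the 5, 17 and 49 figures quoted above.

```python
def dot(hi: tuple[int, int], lo: tuple[int, int]) -> tuple[int, int]:
    """Dot operator (black cell): (G, P) of the upper span combined with the lower span."""
    return hi[0] | (hi[1] & lo[0]), hi[1] & lo[1]


def kogge_stone_prefix(gp: list[tuple[int, int]]):
    """Kogge-Stone prefix over (G, P) pairs, bit 0 first.

    Returns (prefixes, dot_count, levels), where prefixes[i] = (Gi:0, Pi:0).
    """
    node = list(gp)
    n = len(node)
    count = levels = 0
    d = 1
    while d < n:
        # All cells of one level work in parallel, so build the level from the
        # previous level's values; positions below d already hold i:0.
        node = [dot(node[i], node[i - d]) if i >= d else node[i] for i in range(n)]
        count += n - d
        levels += 1
        d *= 2
    return node, count, levels


def ks_add(x: int, y: int, cin: int = 0, n: int = 16) -> tuple[int, int]:
    """n-bit addition using the Kogge-Stone prefix network. Returns (sum, cout)."""
    a = [(x >> i) & 1 for i in range(n)]
    b = [(y >> i) & 1 for i in range(n)]
    g = [ai & bi for ai, bi in zip(a, b)]
    p = [ai ^ bi for ai, bi in zip(a, b)]
    g[0] |= p[0] & cin  # fold the carry-in into bit 0
    prefix, _, _ = kogge_stone_prefix(list(zip(g, p)))
    carries = [cin] + [G for G, _ in prefix]  # carries[i] = carry into bit i
    total = sum((p[i] ^ carries[i]) << i for i in range(n))
    return total, carries[n]


for n in (4, 8, 16):
    _, count, levels = kogge_stone_prefix([(0, 0)] * n)
    print(f"{n}-bit: {count} dot operators in {levels} levels")
# 4-bit: 5 dot operators in 2 levels
# 8-bit: 17 dot operators in 3 levels
# 16-bit: 49 dot operators in 4 levels
```

Compared with the Brent-Kung sketch, the depth is only log2 n levels, but nearly every position has a dot operator at every level, which is the area cost noted above.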