Implementation of A Parallel Prefix Adder Based On Kogge-Stone Tree
Abstract-The binary adder is an important module in microprocessors. In this paper we present a novel binary adder structure named the Hyper-Parallel Prefix Adder (HPPA). The basic idea of HPPA is to divide the addition into two levels, both of which use a parallel prefix adder architecture. It reduces the delay due to wire loads by shortening the wires in the critical paths, so the performance of the adder can be significantly improved. Moreover, we present an optimized carry tree structure based on the Kogge-Stone tree, named the Grouped-Kogge-Stone tree (GKS tree), which has small nodes, especially in the top level. Combining HPPA with the GKS tree, we designed and implemented a 64-bit full adder. At the top level, we divide the two 64-bit operands into 8 groups of 8 bits each. At the bottom level, we divide each 8-bit group into 4 equal sub-groups. Both levels are composed of parallel prefix adders based on the Grouped-Kogge-Stone tree. We describe the adder in Verilog HDL to verify its performance and implement it in a 0.13um CMOS process. Simulation results show that the maximal delay of the proposed adder layout is 578ps with an average power of 18.3mW.

Keywords-parallel prefix adder; CMOS process; maximal delay; average power; Kogge-Stone tree

I. INTRODUCTION
The binary adder is an important module of microprocessors. It is used not only for addition and subtraction, but also to implement multiplication, division and other operations. To the best of our knowledge, the main adder structures include carry-ripple, carry-skip, carry-select, carry-look-ahead and parallel prefix.

Carry-ripple is the basic adder structure. Its carry must propagate from the least significant bit to the most significant bit, which forms the critical path of the adder [4]. The carry-skip adder divides the operands into several groups and improves performance only in some cases. Because carry-select adders compute ahead of the incoming carry, their performance improves significantly. The critical path of a carry-select adder is the cascade of the carry bit of the first group and several multiplexers. In a carry-look-ahead adder, each carry bit is computed independently of the preceding carry bit. However, the growing complexity of each carry circuit makes the delay of the adder increase linearly. The parallel prefix adder is an improved architecture of the carry-look-ahead adder.

The parallel prefix adder is particularly attractive because it can be fast and compact when implemented in VLSI [3]. The critical path of a parallel prefix adder is the carry tree, so much effort has gone into improving the speed of the carry tree, and many carry tree structures have been proposed, such as the Kogge-Stone tree [1], Brent-Kung tree [5], Sklansky tree [6], Han-Carlson tree [7], Ladner-Fischer tree and Knowles tree. These structures share the same goal, which is to compute all the carry bits as soon as possible. They differ in the links between the nodes of the tree. Among the carry tree architectures, the Kogge-Stone tree is the fastest in principle, but its area and power costs are high.

In this paper, we present an optimized carry tree structure based on the Kogge-Stone tree, named the Grouped-Kogge-Stone tree (GKS tree). By using the GKS tree hierarchically, we can design optimized hyper-parallel prefix adders.

The remainder of this paper is organized as follows: Section II describes the basic algorithm of the parallel prefix adder; Section III presents the Grouped-Kogge-Stone tree; Section IV presents the hyper-parallel prefix adder (HPPA) based on the GKS tree; Section V describes the design and implementation of a 64-bit HPPA based on the 8-bit GKS tree, including the detailed structure of the adder, the RTL model and its verification, the logic circuits, and the simulation results and comparison. Finally, we give the conclusions and future work.

II. BASIC ALGORITHM OF PARALLEL PREFIX ADDER
There are many theories about parallel prefix adders. This section describes the basic algorithm of the parallel prefix adder. The description is similar to [3][11][13][14][15], with some changes to the reasoning.

A parallel prefix adder is built from the bit-generate $G_i$ and bit-propagate $P_i$ functions, and can be viewed as a circuit with three stages: a preprocessing stage, a parallel prefix calculation stage and a sum computation stage. Figure I shows the detailed work flow and the operations of each stage.

Consider the addition of two n-bit binary numbers denoted as $A = A_{n-1}A_{n-2} \ldots A_0$ and $B = B_{n-1}B_{n-2} \ldots B_0$. The carry bits and the sum are denoted as $C = C_{n-1}C_{n-2} \ldots C_0$ and $S = S_{n-1}S_{n-2} \ldots S_0$. The following relations hold. Note that the symbols $\cdot$, $+$ and $\oplus$ denote logical AND, OR and exclusive-OR operations, respectively.
$$(G_{i,j}, P_{i,j}) = (G_i, P_i) \circ (G_{i-1}, P_{i-1}) \circ \cdots \circ (G_{j+1}, P_{j+1}) \circ (G_j, P_j) \tag{6}$$
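The three stages and the group composition in equation (6) can be illustrated with a small behavioral model. The sketch below is not the authors' circuit: it assumes the standard prefix operator $(G, P) \circ (G', P') = (G + P \cdot G', P \cdot P')$, which this excerpt does not spell out, and it uses a plain Kogge-Stone tree rather than the GKS tree. The function names `prefix_op` and `kogge_stone_add` are hypothetical.

```python
def prefix_op(hi, lo):
    """Assumed standard prefix operator: (G, P) o (G', P') = (G | P & G', P & P')."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)


def kogge_stone_add(a, b, n=8, cin=0):
    """Add two n-bit integers with a plain Kogge-Stone carry tree.

    Returns (sum, carry_out). Behavioral sketch only, not a gate-level model.
    """
    # Stage 1, preprocessing: G_i = A_i AND B_i, P_i = A_i XOR B_i.
    gp = [(((a >> i) & (b >> i)) & 1, ((a >> i) ^ (b >> i)) & 1) for i in range(n)]
    p_bits = [p for _, p in gp]  # bit-level propagate, reused by the sum stage

    # Stage 2, parallel prefix: each level combines entry i with entry i - d,
    # doubling d, so that gp[i] ends up as (G_{i,0}, P_{i,0}).
    d = 1
    while d < n:
        gp = [prefix_op(gp[i], gp[i - d]) if i >= d else gp[i] for i in range(n)]
        d *= 2

    # Fold in the external carry: C_i = G_{i,0} + P_{i,0} * C_in.
    carries = [g | (p & cin) for g, p in gp]

    # Stage 3, sum: S_i = P_i XOR C_{i-1}, with C_in standing in for C_{-1}.
    total = 0
    for i in range(n):
        c_prev = cin if i == 0 else carries[i - 1]
        total |= (p_bits[i] ^ c_prev) << i
    return total, carries[n - 1]
```

For example, `kogge_stone_add(0x5A, 0xC3, 8)` returns `(0x1D, 1)`, since 0x5A + 0xC3 = 0x11D. The list comprehension in stage 2 reads only the previous level's values, which mirrors the fact that all nodes of one tree level operate in parallel.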
[Figure I: work flow from the inputs $(G_{n-1}, P_{n-1}), \ldots, (G_0, P_0)$ and $C_{in}$ to the outputs $S_{n-1}, \ldots, S_0$ and $C_{out}$. Relations shown in the figure: $(G_{0,0}, P_{0,0}) = (G_0, P_0)$; $(G_{1,0}, P_{1,0}) = (G_1, P_1) \circ (G_0, P_0)$; $(G_{j,0}, P_{j,0}) = (G_j, P_j) \circ \cdots \circ (G_0, P_0)$; $C_0 = G_0 + P_0 \cdot C_{in}$; $C_j = G_j + P_j \cdot C_{j-1}$; $C_i = G_{i,j} + P_{i,j} \cdot C_{j-1}$; $S_i = P_i \oplus C_{i-1}$.]
FIGURE I. THE WORK FLOW OF PARALLEL PREFIX ADDER

FIGURE II. 4-BIT GROUPED-KOGGE-STONE TREE OF 64-BIT ADDERS

IV. HYPER-PARALLEL PREFIX ADDER
We designed a Hyper-Parallel Prefix Adder (HPPA) using the Grouped-Kogge-Stone tree to improve the performance of the adder. In this adder, the two operands of the addition are divided into several groups. As shown in Figure III, operand A is divided into j groups, $A_0, A_1, \ldots, A_i, \ldots, A_{j-2}, A_{j-1}$. Accordingly, operand B is also split into j groups, $B_0, B_1, \ldots, B_i, \ldots, B_{j-2}, B_{j-1}$.
Note that $A_i$ and $B_i$ must have the same number of bits, which is assumed to be $N_i$. In the operation A plus B, $A_i$ and $B_i$ must be added together with a carry bit that is not yet known. To speed up the operation, we use two child parallel prefix adders to compute the sum, in the same way as carry-select adders. The select control signal comes from the GKS tree. All the groups shown in Figure III are divided again.

[Figure III: operands split into the groups $A_{j-1}, A_{j-2}, \ldots, A_i, \ldots, A_1, A_0$.]
FIGURE III. DIVIDE OPERANDS A AND B

A hyper-parallel prefix adder thus has two levels, both of which are parallel prefix adders, which is the reason for its name. The top level combines all the lower-level adders.

The hyper-parallel prefix adder, which contains several child parallel prefix adders, is an improvement of the parallel prefix adder. It can fully exploit the parallelism inside the adder, and hence may significantly improve the performance of the adder. To test the performance of HPPA, we design a 64-bit HPPA based on the 8-bit GKS tree.

Both of the two 8-bit PPAs have two common output signals, G and P, which are used in the top level of the 64-bit adder. The input signals of the multiplexers come from 2-bit carry-ripple adders. The output signals of the multiplexers are also used by the top level of the 64-bit adder.

[Figure IV: two 8-bit prefix trees over bit positions 7 to 0, each with a PG row followed by the levels CM0, CM1 and CM2.]
FIGURE IV. THE CARRY 8-BIT PPA AND NON-CARRY 8-BIT PPA BASED ON 2-BIT GKS TREE

B. The Top Level of the Adder
The architecture of the top level of the 64-bit adder is shown in Figure V. All of the input signals of the top level come from the bottom level.
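The two-level organization described in this section can be illustrated with a small behavioral model in Python. This is a sketch under stated assumptions, not the authors' RTL: each 8-bit group produces a sum for carry-in 0, a sum for carry-in 1 and a group G/P pair (Python's `+` stands in for the two child adders), the top level runs a plain Kogge-Stone prefix over the group G/P pairs rather than the GKS tree, and a multiplexer-style selection picks the final sum of each group. The names `group_signals` and `hppa_add` are hypothetical.

```python
def group_signals(a, b, width):
    """Bottom level for one group: sums for carry-in 0 and 1, plus group G and P."""
    mask = (1 << width) - 1
    t0 = a + b        # stands in for the child adder with carry-in 0
    t1 = a + b + 1    # stands in for the child adder with carry-in 1
    g_group = t0 >> width            # the group generates a carry on its own
    p_group = int((a ^ b) == mask)   # every bit of the group propagates
    return t0 & mask, t1 & mask, g_group, p_group


def hppa_add(a, b, groups=8, width=8, cin=0):
    """Two-level addition: per-group candidate sums, then a top-level carry tree."""
    mask = (1 << width) - 1
    sig = [
        group_signals((a >> (k * width)) & mask, (b >> (k * width)) & mask, width)
        for k in range(groups)
    ]

    # Top level: prefix computation over the group (G, P) pairs.
    gp = [(g, p) for _, _, g, p in sig]
    d = 1
    while d < groups:
        gp = [
            (gp[k][0] | (gp[k][1] & gp[k - d][0]), gp[k][1] & gp[k - d][1])
            if k >= d else gp[k]
            for k in range(groups)
        ]
        d *= 2
    carry_out = [g | (p & cin) for g, p in gp]  # carry leaving each group

    # Selection: the carry entering a group chooses between its two sums.
    total = 0
    for k in range(groups):
        carry_in_k = cin if k == 0 else carry_out[k - 1]
        total |= (sig[k][1] if carry_in_k else sig[k][0]) << (k * width)
    return total, carry_out[-1]
```

With the defaults, `hppa_add(0xFFFFFFFFFFFFFFFF, 1)` returns `(0, 1)`: the carry generated in the lowest group propagates through the seven all-ones groups, and each of them selects its carry-in-1 sum. The model omits the further split of each 8-bit group into 2-bit sub-groups described in the paper.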
slight changes to the previous circuits to minimize the number of logic levels of the adder. The changes are shown in Table I.

In our design, we use dynamic domino logic circuits to implement P and G, and pass-transistor logic circuits to implement the multiplexers. The other circuits are designed using static logic. The transistor sizes in the circuits are carefully chosen. To maximize the adder performance, we adopt skewed CMOS circuits when sizing the transistors. The skewed CMOS logic circuit [12] was presented by Alexandre Solomatnikov et al. for designing noise-immune, high-performance, low-power static circuits.

We designed the 64-bit adder based on the proposed structure using a 0.13um CMOS process. The layout of the adder is shown in Figure VI. Its length is 635um and its height is 40um.

TABLE I. UNITS AND CORRESPONDING SYMBOLS
Level            P          G
P&G              P Ai Bi    Gi Ai Bi
CM0, CM2, CM4    P Pi Pj    G Gi (Pi Gj)
CM1, CM3, CM5    P Pi Pj    G Gi Pi Gj

FIGURE VI. LAYOUT OF 64 BIT FULL ADDER.

FIGURE VII. SIMULATION RESULT

[Figure VIII: bar chart, vertical axis 0 to 100, series Delay (ns), Power (mW), PDP (pJ) and Area (mm), for the adders of [8], [3], [9] and the HPPA.]
FIGURE VIII. PERFORMANCE COMPARISON

source. The simulation results show that the maximal delay of the proposed adder is 679ps, as shown in Figure VII. Figure VIII compares these adders and gives detailed information about the other adders designed in references [3], [8], [9], [10].

VI. CONCLUSIONS
In this paper, we present an adder architecture, the Hyper-Parallel Prefix Adder (HPPA), based on the Grouped-Kogge-Stone tree. It can fully exploit the parallelism inside the adder and reduce the delay of wire loads by reducing the wire length in the critical path, and hence can significantly improve the performance of adders. In the future, we will consider improving the performance of the adder by reducing the number of carry tree levels through other node structures for the carry tree.

ACKNOWLEDGMENT
This work was supported by the Natural Science Foundation of China (Grant No. 61303061) and the State Key Laboratory of High Performance Computing (Grant No. 201513-01).

REFERENCES
[1] P. Kogge, H. Stone, IEEE Trans. Computers, vol. C-22, no. 8, pp. 786-793 (1973).
[2] Dong-Yu Zheng, Yan Sun, Shao-Qing Li and Liang Fang, J. Comput. Sci. & Technol., 1, pp. 25-27 (2007).
[3] Yan Sun, Master's thesis, National University of Defense Technology (2005).
[4] Jan M. Rabaey, Anantha Chandrakasan, Borivoje Nikolic, Digital Integrated Circuits, p. 412 (2004).
[5] R. Brent, H. Kung, IEEE Trans. Computers, vol. C-31, no. 3, pp. 260-264 (1982).
[6] J. Sklansky, Conditional-sum addition logic, IRE Trans. Electronic Computers, vol. EC-9, pp. 226-231 (1960).
[7] T. Han, D. Carlson, Proc. 8th Symp. Comp. Arith., pp. 49-56 (1987).
[8] Sun Xuguang, Mao Zhigang, Lai Fengchang, Design and Implementation of a 64bit CMOS Parallel Adder with Modified Architecture, Chinese Journal of Semiconductors, vol. 24, no. 2, 2003.
[9] A. Neve, H. Schettler, T. Ludwig, et al., Power-Delay Product Minimization in High-Performance 64-bit Carry-Select Adders, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 12, no. 3, March 2004.
[10] Xiujiang Ren, Optimized Design of 64bit GHz Integer Arithmetic and Logical Unit, Adder, Master's thesis, National University of Defense Technology (2007).
[11] Xiaofei Fan, The Research and Design on 64-bit 1.47GHz High-Performance Integer Adder, Master's thesis, National University of Defense Technology (2008).
[12] Alexandre Solomatnikov et al., Skewed CMOS: Noise-Immune High-Performance Low-Power Static Circuit Family, in: Proceedings of the IEEE International Conference on Computer Design, pp. 241-246, 2000.
[13] Lakshmanan, Ali Meaamar and Masuri Othman, High-Speed Hybrid Parallel-Prefix Carry-Select Adder Using Ling's Algorithm, ICSE2006 Proc., 2006.
[14] Robert Jackson and Sunil Talwar, High Speed Binary Addition, 2004.
[15] Giorgos Dimitrakopoulos, Dimitris Nikolos, High-Speed Parallel-Prefix VLSI Ling Adders, IEEE Transactions on Computers, vol. 54, no. 2, February 2005.