
Field Programmable Gate Array Prototyping of End-Around Carry Parallel Prefix Tree Architectures

This document summarizes research on implementing an end-around carry parallel prefix tree architecture for field programmable gate arrays (FPGAs). It describes a 128-bit end-around carry adder designed to work independently for evaluation purposes. Fourteen different parallel prefix tree architectures are implemented and evaluated based on area requirements and critical path delay. Experimental results show that one configuration has lower area and higher performance than the others.

Uploaded by

Sandeep Anugandula



www.ietdl.org
Published in IET Computers & Digital Techniques Received on 27th March 2009 Revised on 27th September 2009 doi: 10.1049/iet-cdt.2009.0036

ISSN 1751-8601

Field programmable gate array prototyping of end-around carry parallel prefix tree architectures
F. Liu1, Q. Tan1, G. Chen2, X. Song3, O. Ait Mohamed4, M. Gu5
1 National Lab of Parallel Distributed Processing, Hunan, China
2 Lingcore Lab, Portland, OR, USA
3 ECE Department, Portland State University, Portland, OR, USA
4 ECE Department, Concordia University, Montreal, Quebec, Canada
5 School of Software, TsingHua University, Beijing, China
E-mail: [email protected]

Abstract: As an important part of many processors' floating-point units, the fused multiply-add unit performs a multiplication followed immediately by an addition. In the fused multiply-add unit of the IBM POWER6 microprocessor, a fast 128-bit floating-point end-around-carry (EAC) adder is proposed. Very few algorithmic details about this adder exist in today's literature. In this study, a completely designed EAC adder that can work independently as a regular adder is proposed. Details of the proposed EAC adder's arithmetic algorithms are described. In IBM's original EAC adder, the Kogge-Stone tree was chosen for its high performance on ASIC technology. In this study, the authors present a comparative study of different parallel prefix trees used in the design of the new EAC adder targeting field programmable gate array (FPGA) technology. The study highlights the main performance differences among 14 different architecture configurations, focusing on the area requirements and the critical path delay. The experimental results show that there is one architecture configuration with a lower area requirement and a higher performance than the others.

IET Comput. Digit. Tech., 2010, Vol. 4, Iss. 4, pp. 306-316 doi: 10.1049/iet-cdt.2009.0036
© The Institution of Engineering and Technology 2010

1 Introduction

The fused multiply-add unit plays an important role in modern microprocessors. It performs a floating-point multiplication followed immediately by an addition of the product with a third floating-point operand. In 2007, a seven-cycle fused multiply-add pipeline unit was proposed [1] as part of the floating-point unit in IBM's POWER6 microprocessor. In this fused multiply-add dataflow, the product should be aligned before it is added to the addend. Because the magnitude of the product is unknown in the early stages prior to the combination with the addend, it is difficult to determine a priori which operand is bigger [2]. Even if it were determined early that the product was bigger, there would be a problem in conditionally complementing two intermediate operands, the carry and sum outputs of the counter tree. Thus, an adder needs to be designed that always outputs a positive magnitude result and preferably only needs to conditionally complement one operand [2]. Therefore a new 128-bit end-around carry (EAC) adder was designed and fabricated in IBM's fused multiply-add unit [3]. The intention is not to produce an adder with the best stand-alone performance but to provide the one with the best overall floating-point performance [3].

IBM implemented its EAC adder in a 65 nm SOI technology [4], and some sub-components are implemented using the Kogge-Stone tree [5]. In fact, the Ladner-Fischer tree [6] was used in IBM's first-pass test chip. Compared to the Ladner-Fischer design, the Kogge-Stone design is about 0.5 FO4 faster with only 6% area overhead and 5% power increase [3]. Therefore the Kogge-Stone tree was chosen in the final design. Besides the Kogge-Stone tree and the Ladner-Fischer tree, it is known that there are many other variations of parallel prefix trees [7]. The motivation for our work was to find the best EAC adder for use in a fused multiply-add unit.

We also notice that field programmable gate array (FPGA) technology has recently enjoyed rapidly increasing popularity. In the nanotechnology era, the logic density of FPGAs has increased dramatically. The fixed structure and the large variety of resources of an FPGA have the potential to affect the implementation results significantly. It is therefore interesting to check whether the EAC adder works well on FPGA technology and to study the performance differences among different architecture configurations, focusing on the area requirements and the critical path delay.

Since it would be difficult to evaluate the full floating-point performance, in this paper we propose a completely designed EAC adder that can work independently without being part of the fused multiply-add unit. Very few descriptions of the EAC adder's formulations exist in today's literature; therefore the details of the proposed EAC adder's arithmetic algorithms are explained. Because the algorithms of our EAC adder mainly follow the IBM EAC adder's arithmetic algorithms and can be read without knowledge of the whole floating-point unit, we believe that our description helps readers to better understand the nature of the EAC design. To make our EAC adder work as a regular adder, we design some new logic units such as the input logic unit, the sign logic unit and so on. This design makes it easier to implement and test other design choices. Moreover, because the additional logic units do not affect the EAC adder's key behaviours, evaluations of the different designs of our EAC adder remain relevant to fused multiply-add unit design. We study the performance of the EAC adder with different parallel prefix trees on FPGA technology. The experimental results show that there is one architecture configuration with a lower area requirement and a higher performance than the others.

The paper is organised as follows. In Section 2, the related works are reviewed. In Section 3, some preliminaries and the algorithms of the 32-bit adder block are presented. Section 4 describes the architecture of our proposed 128-bit EAC adder and its arithmetic algorithms. Section 5 explains the implementation of different parallel prefix trees in our EAC adder and reports the simulation results. Section 6 concludes this paper.

2 Related work

In the past few years, several adders used in the fused multiply-add operation have been proposed [8-10]. These adder schemes are based on the delay profile of the multiply compression tree. As a result, they are power efficient only when the final addition is performed right after the compression tree and when the EAC computation is not needed [3]. For higher floating-point performance, the EAC adder is used in recent processors. Although the EAC adder has become common hardware design practice, this technique has not been well documented. Shedletsky [11] analysed some behaviours of the EAC adder using examples of real circuits. Yu et al. [3] proposed a fast 128-bit EAC adder which is fabricated as part of the IBM POWER6 microprocessor. They described the adder's architecture and analysed its performance and power dissipation. Zhang et al. [12] presented a 108-bit EAC adder which is also used by a fused multiply-add unit. Structure-aware layout techniques were used to optimise their adder's structure. All the works above focused on the EAC adder's architecture design, while the details of its arithmetic algorithms were not explained. Schwarz [2] discussed some aspects of the EAC adder's algorithms, but some details were still not included.

On the other hand, the parallel prefix tree has recently been used as a subcomponent of the EAC adder. Many classic parallel prefix adders have been proposed, including Sklansky [13], Kogge-Stone and Brent-Kung [14]. These prefix networks achieve three extreme goals: minimal logic levels and wire tracks, minimal max-fanout and logic levels, and minimal wire tracks and max-fanout, respectively. In addition, Ladner-Fischer, Han-Carlson [15] and Knowles [7] implemented trade-offs between each pair of the extreme cases. The structure of the prefix network determines the type of the prefix adder. Ziegler et al. [16] considered sparsity, fanout and radix as three dimensions in the design space of regular parallel prefix adders and presented a unified formalism to describe such structures. Liu et al. [17] studied how to find optimal prefix structures for specific applications and proposed an integer linear programming method to build minimal-power prefix adders within given timing and area constraints. For the EAC adder of the IBM POWER6, chip tests showed that the Kogge-Stone tree was a better choice than the Ladner-Fischer tree.

The works discussed above are based on ASIC technology. Vitoroulis et al. [18] investigated the performance of parallel prefix adders implemented with FPGA technology and reported the area requirements and critical path delay for a variety of classical parallel prefix adder structures. However, the parallel prefix trees were implemented as single adders, without being part of a bigger design. In our work, we try to answer these questions: What are the arithmetic algorithms of the EAC adder with a parallel prefix tree? If we use different parallel prefix trees in the EAC adder on FPGA technology, which one is better? How do parallel prefix trees affect the other parts of the EAC adder? As a part of the EAC adder, should the implementation of the parallel prefix tree itself be changed?

3 Thirty-two-bit adder block in EAC adder

In this paper, the symbols 0 and 1 denote Boolean false and true, or the digital numbers zero and one, respectively; the symbol $\wedge$ denotes the Boolean AND; $\vee$ (or $+$) denotes the Boolean OR; $\oplus$ denotes the Boolean Exclusive OR. A binary number of length $n$ ($n \ge 1$) is an ordered sequence of binary bits where each bit can assume one of the values 0 or 1. For a traditional integer adder, we use $x = (x_{n-1} x_{n-2} \ldots x_1 x_0)$ and $y = (y_{n-1} y_{n-2} \ldots y_1 y_0)$ to denote the two $n$-bit addends and $s = (s_{n-1} s_{n-2} \ldots s_0)$ to denote the corresponding sum ($n \ge 1$); $x_i$, $y_i$, $s_i$ denote the binary bits of $x$, $y$, $s$ at position $i$, where $0 \le i \le n-1$. Let $c = \{c_n, c_{n-1}, \ldots, c_0\}$ be the corresponding set of carries, where $c_0$ is the initial incoming carry, $c_i$ denotes the carry from bit position $i-1$ and $c_n$ is the outgoing carry.

To explain the adder's algorithm, some standard notions such as propagated carry, generated carry, group-propagated carry and group-generated carry should be introduced. These notions are related to parallel prefix trees and their definitions can be found in Koren [19]. In this paper, we use $P_i = x_i \oplus y_i$ and $G_i = x_i \wedge y_i$ (for simplicity, $G_i = x_i y_i$) to denote the propagated carry and the generated carry at bit position $i$, respectively. We use $P_{i:j}$ and $G_{i:j}$ to denote the group-propagated carry and the group-generated carry for the bit positions $i, i-1, \ldots, j$, respectively.

The notation of the carry select adder is also important. For the group that consists of $k$ bit positions starting with bit position $j$ and ending with bit position $i$, where $i = j + k - 1$, the outputs of the carry select adder are the sum bits $s_i, s_{i-1}, \ldots, s_j$ and the outgoing carry $c_{i+1}$. These outputs can be selected by the incoming carry into this group, $c_j$, as follows

$$c_{i+1} = [c^0_{i+1} \wedge \overline{c_j}] \vee [c^1_{i+1} \wedge c_j], \qquad s_m = [s^0_m \wedge \overline{c_j}] \vee [s^1_m \wedge c_j] \quad (m = j, j+1, \ldots, i) \tag{1}$$

Here $\overline{c_j}$ is the Boolean complement of $c_j$; $s^0_m$ is the sum bit at bit position $m$ under the condition that the incoming carry is 0, and $c^0_{i+1}$ is the corresponding outgoing carry; $s^1_m$ and $c^1_{i+1}$ are the sum bit at position $m$ and the outgoing carry under the condition that the incoming carry into the group is 1. Other useful notions and formulations about parallel prefix trees and the carry select adder can be found in Koren [19].

Since the carry signal is on the critical path, a 128-bit EAC adder was designed in the IBM POWER6 microprocessor to obtain a high-performance floating-point unit. Fig. 1 shows its block diagram. This adder is divided into three sub-blocks: the 32-bit adder block, the EAC logic block and the final sum selection block [3]. Each 32-bit adder block is also partitioned into three sub-components. The first sub-component is an 8-bit prefix-2 Kogge-Stone tree with sparseness of 2 that generates 8-bit propagates as well as conditional sums that are needed later for sum selection. The second sub-component is a prefix-2 Kogge-Stone tree with sparseness of 8 that generates 32-bit propagated terms as well as 32-bit conditional sums. The last sub-component is a sum selection block [3].

Figure 1 Block diagram of the 128-bit binary adder [3]

From Fig. 1 we know that the 128-bit EAC adder is composed of four 32-bit adder blocks. The architecture of each 32-bit adder block is shown in Fig. 2. Each 32-bit adder block is actually a carry select adder consisting of four 8-bit adder blocks. Each 8-bit adder block has the structure depicted in Fig. 3. In each 8-bit adder block, there are two 8-bit adders which are implemented using a parallel prefix tree. In IBM's design, this is an 8-bit Kogge-Stone tree. The real structure of each 8-bit adder block is a conditional sum adder. In fact, there are two levels of parallel prefix tree in the 32-bit adder block, as Fig. 2 shows. The first level is the 8-bit parallel prefix tree with sparseness of 2 that generates 8-bit carry signals, propagate terms and conditional sums. The second level is the parallel prefix tree with sparseness of 8 that generates 32-bit carry signals, propagate terms and conditional sums. In Fig. 2, we just show one implementation of the parallel prefix tree for the second level.

Figure 2 Block diagram of 32-bit adder block

Figure 3 Eight-bit adder block

4 Complete EAC adder design

Although the EAC adder has been implemented on several microprocessors, very few details on its formulations and arithmetic algorithms can be found in today's literature. Schwarz [2] gave nice explanations of some aspects of the EAC adder's algorithms, but some details were not included. In this section, we try to describe the details of the EAC adder's algorithms clearly. We propose a completely designed EAC adder and describe its architecture. The new architecture makes our EAC adder independent of the fused multiply-add unit. Our new design mainly follows the algorithms of the EAC adder implemented in the IBM POWER6 microprocessor. The additional logic units of our EAC adder ensure that the whole adder can work independently. They do not affect the key algorithms. Therefore we take our EAC design as the example to explain the EAC adder's arithmetic algorithms, which makes our descriptions clearer and easier to read. Readers can understand them without knowledge of other details of the IBM POWER6 floating-point unit. Another advantage is that our new design is easy to implement and test, which makes it possible to implement different architecture configurations and compare their properties such as performance.

Figure 4 Architecture of modified EAC adder

Fig. 4 shows the architecture of the proposed EAC adder. In this adder, the inputs are two 129-bit binary addends $x = (x.s,\ x_{127} x_{126} \ldots x_0)$ and $y = (y.s,\ y_{127} y_{126} \ldots y_0)$, and the output is the sum $s = (s.s,\ s_{127} s_{126} \ldots s_0)$. They are all in sign-magnitude format. $x.x$, $y.y$ and $s.s$ are the magnitudes of $x$, $y$ and $s$, and $x.s$, $y.s$ and $s.s$ are the corresponding sign bits; the notation $s.s$ therefore names both the magnitude and the sign of the sum, and the intended meaning follows from the context. The magnitudes of the operands are used to produce the positive magnitude of the sum, and the sign bits of the operands are used to produce the sign of the sum. The adder in Fig. 4

can implement four operations: $x.x + y.y$, $x.x - y.y$, $(-x.x) + y.y$ and $(-x.x) + (-y.y)$.

4.1 Integrating addition and subtraction

EAC means that when subtracting two signed numbers in sign-magnitude format, the subtraction is implemented by adding the first operand to the Boolean complement of the second operand. For this addition, instead of setting a carry into the least significant digit, the carry out of the most significant digit is taken as the carry into the least significant digit. This ensures that the result of the addition is always a positive magnitude and that preferably only one operand needs to be conditionally complemented. The EAC adder is designed to form a unique sum for every possible pair of addends. When adding, it is similar to other regular adders. When subtracting, it uses the end-around carry to ensure that the sum is always positive. Hence, with EAC, the adder shown in Fig. 4 should satisfy the following constraints:

(1) When $x.s = y.s$, the adder should do addition, and we have the sign $s.s = x.s$ and the magnitude $s.s = x.x + y.y$.
(2) When $x.s \ne y.s$, the adder should do subtraction. If $x.x \ge y.y$, then the sign is $s.s = x.s$ and the magnitude is $s.s = x.x - y.y$; if $x.x < y.y$, then the sign is $s.s = y.s$ and the magnitude is $s.s = y.y - x.x$.

Fig. 5 shows the subtraction dataflow of our EAC adder. The algorithm is described as follows:

1. Decide which one is bigger between $x.x$ and $y.y$ by performing an effective subtraction $x.x - y.y$. If $x.x - y.y \ge 0$, then $x.x \ge y.y$, otherwise $x.x < y.y$. We use $\overline{y.y}$ to denote the Boolean complement of $y.y$. Modulo $2^n$, $x.x - y.y = x.x + \overline{y.y} + 1$, and as an integer $x.x + \overline{y.y} + 1 = x.x + 2^n - y.y$. We therefore have the property: when $x.x \ge y.y$, the outgoing carry of $x.x + \overline{y.y} + 1$ will be 1. The outgoing carry of $x.x + \overline{y.y} + 1$, denoted by $c_{out}$, can thus be used to decide whether $x.x$ is bigger than $y.y$. If $c_{out} = 1$, then $x.x \ge y.y$; if $c_{out} = 0$, then $x.x < y.y$.

2. Do the addition $x.x + \overline{y.y} + c_{out}$ and compute the Boolean complement of the sum, which gives $\overline{x.x + \overline{y.y} + c_{out}}$. When $x.x \ge y.y$, which means $c_{out} = 1$, the subtraction $x.x - y.y$ can be rewritten as the magnitude $s.s = x.x - y.y = x.x + \overline{y.y} + 1 = x.x + \overline{y.y} + c_{out}$. When $x.x < y.y$, which means $c_{out} = 0$, the subtraction $y.y - x.x$ can be rewritten as follows

$$s.s = y.y - x.x = -(x.x - y.y) = -(x.x + \overline{y.y} + 1) = -(x.x + \overline{y.y}) - 1 = \overline{(x.x + \overline{y.y} + 0)} + 1 - 1 = \overline{(x.x + \overline{y.y} + 0)} \tag{2}$$

With the above equation we obtain the following property: when $x.x < y.y$, the output of the EAC adder is defined by the following equation

$$s.s = \overline{x.x + \overline{y.y} + c_{out}} \tag{3}$$

3. Finally, the outgoing carry $c_{out}$ is used to select the correct magnitude $s.s$. When $x.x \ge y.y$, the output of the EAC adder should be $s.s = x.x + \overline{y.y} + c_{out}$; when $x.x < y.y$, the output of the EAC adder should be $s.s = \overline{x.x + \overline{y.y} + c_{out}}$.

After discussing how to implement the effective subtraction of the operands $x.x$ and $y.y$, we focus on their addition. It is easy to implement $x.x + y.y$ by itself. However, we must combine the addition with the subtraction in one single adder. Fig. 6 shows how to integrate them. In Fig. 6, the add/sub logic unit takes $x.s$, $y.s$ as the inputs and $o_s$ as the output. The output $o_s$ is defined by

$$o_s = x.s \oplus y.s \tag{4}$$

The input logic unit takes $o_s$, $y.y$ as the inputs and $y_t$ as the output. The output $y_t$ is defined by

$$y_t = \begin{cases} y.y, & o_s = 0 \\ \overline{y.y}, & o_s = 1 \end{cases} \tag{5}$$

The sign logic unit takes $x.s$, $y.s$, $c_{out}$ as the inputs and the sign $s.s$ as the output. The output $s.s$ is calculated by

$$s.s = (x.s \wedge c_{out}) \vee (y.s \wedge \overline{c_{out}}) \tag{6}$$
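The three-step subtraction flow and the logic units of equations (4)-(6) can be illustrated with a small word-level sketch. It is only a behavioural model under stated assumptions: the function name `eac_add`, its argument names and the use of Python integers in place of bit vectors are illustrative choices and do not come from the paper, and the two additions are written sequentially here for clarity.

```python
def eac_add(x_sign, x_mag, y_sign, y_mag, n=128):
    """Behavioural sketch of sign-magnitude addition with an end-around carry.

    Hypothetical helper: names and the integer-based modelling are assumptions.
    Returns (sign of sum, magnitude of sum, c_out).
    """
    mask = (1 << n) - 1
    o_s = x_sign ^ y_sign                        # eq. (4): 1 means effective subtraction
    y_t = (~y_mag & mask) if o_s else y_mag      # eq. (5): complement y.y when subtracting
    # Step 1: first addition with carry-in o_s; its carry-out is c_out.
    c_out = (x_mag + y_t + o_s) >> n
    if o_s:
        # Step 2: second addition, with c_out fed back as the carry-in.
        total = (x_mag + y_t + c_out) & mask
        # Step 3: keep the sum when x.x >= y.y (c_out = 1), else complement it (eq. (3)).
        s_mag = total if c_out else (~total & mask)
    else:
        s_mag = (x_mag + y_t) & mask             # plain addition; c_out is the overflow carry
    s_sign = (x_sign & c_out) | (y_sign & (1 - c_out))   # eq. (6)
    return s_sign, s_mag, c_out
```

For example, with `n=4`, `eac_add(0, 3, 1, 5, n=4)` models (+3) + (-5): the first addition produces no carry, so the complemented second sum is selected and the sign follows `y`.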

Figure 5 Subtraction dataflow of EAC adder

Figure 6 Integration of addition and subtraction

In Fig. 6, when $o_s = 0$, we can use the EAC adder to do the addition $x.x + y.y$; when $o_s = 1$, we use the EAC adder to perform the subtraction as shown in Fig. 5. The inputs of the EAC adder are $y_t$, $x.x$, $o_s$; the outputs are $c_{out}$ and the magnitude $s.s$. When $o_s = 0$, because $y_t = y.y$, the inputs are actually $x.x$, $y.y$ and the incoming carry 0; the outputs should be the sum $s.s = x.x + y.y$ and the outgoing carry $c_{out}$. When $o_s = 1$, the inputs are $y_t = \overline{y.y}$, $x.x$ and the incoming carry 1; the outputs should be the correct result computed by the algorithm in Fig. 5. In this way, we perform both the addition and the subtraction using a single adder. We can use another logic unit, named the EAC logic unit, to implement this method.

4.2 EAC logic unit

Fig. 6 shows the way to combine the addition with the subtraction. The effective subtraction actually needs two addition operations, as shown in Fig. 5. With the help of the EAC logic unit, we can implement the addition and the subtraction by only one addition operation. Fig. 4 shows the design. In Fig. 4, the input logic unit, the sign logic unit and the add/sub logic unit are similar to those in Fig. 6. The four 32-bit adder blocks are all instances of the 32-bit adder block shown in Fig. 2. In fact, the real addends of our EAC adder in Fig. 4 are the two 128-bit binary numbers $x.x$ and $y_t$. They are divided into four groups as the inputs into the four 32-bit adder blocks. The four 32-bit adder blocks output the group-propagated carries $P_{127:96}, \ldots, P_{31:0}$ and the group-generated carries $G_{127:96}, \ldots, G_{31:0}$, which are used by the EAC logic unit; they also output the conditional sums $s^0_{127:0}$ and $s^1_{127:0}$, which are used by the last sum selection unit. $s^0_{127:0}$ is the sum under the condition that the incoming carry is 0, while $s^1_{127:0}$ is the sum under the condition that the incoming carry is 1.

In Fig. 4, the signals $G_{127:96}, P_{127:96}, \ldots, G_{31:0}, P_{31:0}$ are used by both the R logic unit and the EAC logic unit. The R logic unit takes $P_{31:0}$, $o_s$ as the inputs and $P^t_{31:0}$ as the output. $P^t_{31:0}$ is computed by

$$P^t_{31:0} = \begin{cases} P_{31:0}, & o_s = 1 \\ 0, & o_s = 0 \end{cases} \tag{7}$$

The EAC logic unit takes the signals $G_{127:96}, P_{127:96}, \ldots, G_{31:0}$ together with $P^t_{31:0}$ as the inputs to calculate the incoming carries into each group, $c_0, c_1, c_2, c_3$. With the help of the above logic units, the algorithm of the EAC adder is as follows:

1. When $x.s \ne y.s$, we have $o_s = 1$, $y_t = \overline{y.y}$ and $P^t_{31:0} = P_{31:0}$. From the formulation of the carry lookahead adder with incoming carry $c_{in}$, we can obtain

$$\begin{aligned}
c'_1 &= G_{31:0} + P_{31:0}\, c_{in} \\
c'_2 &= G_{63:32} + P_{63:32} G_{31:0} + P_{63:32} P_{31:0}\, c_{in} \\
c'_3 &= G_{95:64} + P_{95:64} G_{63:32} + P_{95:64} P_{63:32} G_{31:0} + P_{95:64} P_{63:32} P_{31:0}\, c_{in} \\
c'_0 &= G_{127:96} + P_{127:96} G_{95:64} + P_{127:96} P_{95:64} G_{63:32} + P_{127:96} P_{95:64} P_{63:32} G_{31:0} + P_{127:96} P_{95:64} P_{63:32} P_{31:0}\, c_{in}
\end{aligned} \tag{8}$$

Here $c'_1$, $c'_2$, $c'_3$ are the carries into the second, third and fourth 32-bit groups, and $c'_0$ is the carry out of the most significant group. Following the first step of Fig. 5, we know that $x.x + \overline{y.y} + 1$ should be done and the outgoing carry $c_{out}$ should be used to decide whether $x.x$ is bigger than $y.y$ or not. Thus, by the above equations, assuming $c_{in} = 1$, $c_{out}$ can be computed as

$$c_{out} = c'_0 = G_{127:96} + P_{127:96} G_{95:64} + P_{127:96} P_{95:64} G_{63:32} + P_{127:96} P_{95:64} P_{63:32} G_{31:0} + P_{127:96} P_{95:64} P_{63:32} P_{31:0} \tag{9}$$

Then, for the second addition, which means $x.x + \overline{y.y} + c_{out}$, we take $c_{out} = c'_0$ as the incoming carry. Using the formulations of the carry lookahead adder again, we can obtain the group carry signals as

$$\begin{aligned}
c''_1 &= G_{31:0} + P_{31:0}\, c_{out} \\
&= G_{31:0} + P_{31:0} \{ G_{127:96} + P_{127:96} G_{95:64} + P_{127:96} P_{95:64} G_{63:32} + P_{127:96} P_{95:64} P_{63:32} G_{31:0} + P_{127:96} P_{95:64} P_{63:32} P_{31:0} \} \\
&= G_{31:0} + P_{31:0} G_{127:96} + P_{127:96} P_{31:0} G_{95:64} + P_{127:96} P_{95:64} P_{31:0} G_{63:32} + P_{127:96} P_{95:64} P_{63:32} P_{31:0} \\
c''_2 &= G_{63:32} + P_{63:32} G_{31:0} + P_{63:32} P_{31:0}\, c_{out} \\
&= G_{63:32} + P_{63:32} G_{31:0} + P_{63:32} P_{31:0} G_{127:96} + P_{127:96} P_{63:32} P_{31:0} G_{95:64} + P_{127:96} P_{95:64} P_{63:32} P_{31:0} \\
c''_3 &= G_{95:64} + P_{95:64} G_{63:32} + P_{95:64} P_{63:32} G_{31:0} + P_{95:64} P_{63:32} P_{31:0}\, c_{out} \\
&= G_{95:64} + P_{95:64} G_{63:32} + P_{95:64} P_{63:32} G_{31:0} + P_{95:64} P_{63:32} P_{31:0} G_{127:96} + P_{127:96} P_{95:64} P_{63:32} P_{31:0} \\
c''_0 &= G_{127:96} + P_{127:96} G_{95:64} + P_{127:96} P_{95:64} G_{63:32} + P_{127:96} P_{95:64} P_{63:32} G_{31:0} + P_{127:96} P_{95:64} P_{63:32} P_{31:0}\, c_{out} \\
&= G_{127:96} + P_{127:96} G_{95:64} + P_{127:96} P_{95:64} G_{63:32} + P_{127:96} P_{95:64} P_{63:32} G_{31:0} + P_{127:96} P_{95:64} P_{63:32} P_{31:0}
\end{aligned} \tag{10}$$

$c''_0, c''_1, c''_2, c''_3$ can be used to select the correct sum $x.x + \overline{y.y} + c_{out} = sum_{127:0}$. In the following, we show how the EAC logic unit completes the task mentioned above.

Definition 4.1 (EAC logic unit): The EAC logic unit takes the signals $G_{127:96}, P_{127:96}, \ldots, G_{31:0}, P^t_{31:0}$ as the inputs and $c_0, c_1, c_2, c_3$ as the outputs. The outputs are defined as follows

$$\begin{aligned}
c_1 &= G_{31:0} + P^t_{31:0} G_{127:96} + P^t_{31:0} P_{127:96} G_{95:64} + P^t_{31:0} P_{127:96} P_{95:64} G_{63:32} + P_{127:96} P_{95:64} P_{63:32} P^t_{31:0} \\
c_2 &= G_{63:32} + P_{63:32} G_{31:0} + P^t_{31:0} P_{63:32} G_{127:96} + P^t_{31:0} P_{127:96} P_{63:32} G_{95:64} + P_{127:96} P_{95:64} P_{63:32} P^t_{31:0} \\
c_3 &= G_{95:64} + P_{95:64} G_{63:32} + P_{95:64} P_{63:32} G_{31:0} + P_{95:64} P_{63:32} P^t_{31:0} G_{127:96} + P_{127:96} P_{95:64} P_{63:32} P^t_{31:0} \\
c_0 &= G_{127:96} + P_{127:96} G_{95:64} + P_{127:96} P_{95:64} G_{63:32} + P_{127:96} P_{95:64} P_{63:32} G_{31:0} + P_{127:96} P_{95:64} P_{63:32} P^t_{31:0}
\end{aligned} \tag{11}$$

As we know, when $x.s \ne y.s$, we have $o_s = 1$ and $P^t_{31:0} = P_{31:0}$. So, for the EAC logic unit, the equations for calculating $c_0, c_1, c_2, c_3$ can be rewritten as

$$\begin{aligned}
c_1 &= G_{31:0} + P_{31:0} G_{127:96} + P_{31:0} P_{127:96} G_{95:64} + P_{31:0} P_{127:96} P_{95:64} G_{63:32} + P_{127:96} P_{95:64} P_{63:32} P_{31:0} \\
c_2 &= G_{63:32} + P_{63:32} G_{31:0} + P_{31:0} P_{63:32} G_{127:96} + P_{31:0} P_{127:96} P_{63:32} G_{95:64} + P_{127:96} P_{95:64} P_{63:32} P_{31:0} \\
c_3 &= G_{95:64} + P_{95:64} G_{63:32} + P_{95:64} P_{63:32} G_{31:0} + P_{95:64} P_{63:32} P_{31:0} G_{127:96} + P_{127:96} P_{95:64} P_{63:32} P_{31:0} \\
c_0 &= G_{127:96} + P_{127:96} G_{95:64} + P_{127:96} P_{95:64} G_{63:32} + P_{127:96} P_{95:64} P_{63:32} G_{31:0} + P_{127:96} P_{95:64} P_{63:32} P_{31:0}
\end{aligned} \tag{12}$$

In this case, it is easy to see that the equations for calculating $c_0, c_1, c_2, c_3$ are equivalent to the formulations for computing $c''_0, c''_1, c''_2, c''_3$ in (10). Therefore the EAC logic unit can be used to implement the subtraction dataflow shown in Fig. 5 by only one addition. Furthermore, $o_s$ and $c_0$ can be used to select the correct magnitude $s.s$ from $sum_{127:0}$ and $\overline{sum_{127:0}}$ according to the following rules:

When $x.x \ge y.y$, we have $o_s = 1$, $c_0 = 1$ and $c_{out} = c_0 = 1$. As a result, $sum_{127:0} = x.x + \overline{y.y} + 1$, and the sum is selected as $s.s = sum_{127:0} = x.x + \overline{y.y} + 1$.

When $x.x < y.y$, we have $o_s = 1$ and $c_0 = c_{out} = 0$. The sum is $sum_{127:0} = x.x + \overline{y.y} + 0$. Then, $s.s = \overline{sum_{127:0}} = \overline{x.x + \overline{y.y} + 0}$.

2. When $x.s = y.s$, we should do the addition $x.x + y.y$. Taking the formulations of the carry lookahead adder and assuming the incoming carry $c_{in} = 0$, the group carries can be calculated as follows

$$\begin{aligned}
c'_1 &= G_{31:0} + P_{31:0}\, c_{in} = G_{31:0} \\
c'_2 &= G_{63:32} + P_{63:32} G_{31:0} + P_{63:32} P_{31:0}\, c_{in} = G_{63:32} + P_{63:32} G_{31:0} \\
c'_3 &= G_{95:64} + P_{95:64} G_{63:32} + P_{95:64} P_{63:32} G_{31:0} + P_{95:64} P_{63:32} P_{31:0}\, c_{in} \\
&= G_{95:64} + P_{95:64} G_{63:32} + P_{95:64} P_{63:32} G_{31:0} \\
c'_0 &= G_{127:96} + P_{127:96} G_{95:64} + P_{127:96} P_{95:64} G_{63:32} + P_{127:96} P_{95:64} P_{63:32} G_{31:0} + P_{127:96} P_{95:64} P_{63:32} P_{31:0}\, c_{in} \\
&= G_{127:96} + P_{127:96} G_{95:64} + P_{127:96} P_{95:64} G_{63:32} + P_{127:96} P_{95:64} P_{63:32} G_{31:0}
\end{aligned} \tag{13}$$

With $c_{in} = 0$, $c'_1$, $c'_2$, $c'_3$, we can select the correct sum $x.x + y.y$ from the outputs $s^0_{127:0}$ and $s^1_{127:0}$, and $c'_0$ is the outgoing carry.

On the other hand, because $x.s = y.s$, we have $o_s = 0$, $y_t = y.y$ and $P^t_{31:0} = 0$, so the EAC logic unit's formulations can be rewritten as follows

$$\begin{aligned}
c_1 &= G_{31:0} + P^t_{31:0} G_{127:96} + P^t_{31:0} P_{127:96} G_{95:64} + P^t_{31:0} P_{127:96} P_{95:64} G_{63:32} + P_{127:96} P_{95:64} P_{63:32} P^t_{31:0} \\
&= G_{31:0} + 0 \cdot G_{127:96} + 0 \cdot P_{127:96} G_{95:64} + 0 \cdot P_{127:96} P_{95:64} G_{63:32} + P_{127:96} P_{95:64} P_{63:32} \cdot 0 = G_{31:0} \\
c_2 &= G_{63:32} + P_{63:32} G_{31:0} + P^t_{31:0} P_{63:32} G_{127:96} + P^t_{31:0} P_{127:96} P_{63:32} G_{95:64} + P_{127:96} P_{95:64} P_{63:32} P^t_{31:0} \\
&= G_{63:32} + P_{63:32} G_{31:0} \\
c_3 &= G_{95:64} + P_{95:64} G_{63:32} + P_{95:64} P_{63:32} G_{31:0} + P_{95:64} P_{63:32} P^t_{31:0} G_{127:96} + P_{127:96} P_{95:64} P_{63:32} P^t_{31:0} \\
&= G_{95:64} + P_{95:64} G_{63:32} + P_{95:64} P_{63:32} G_{31:0} \\
c_0 &= G_{127:96} + P_{127:96} G_{95:64} + P_{127:96} P_{95:64} G_{63:32} + P_{127:96} P_{95:64} P_{63:32} G_{31:0} + P_{127:96} P_{95:64} P_{63:32} P^t_{31:0} \\
&= G_{127:96} + P_{127:96} G_{95:64} + P_{127:96} P_{95:64} G_{63:32} + P_{127:96} P_{95:64} P_{63:32} G_{31:0}
\end{aligned} \tag{14}$$

We can see that the equations calculating $c_0, c_1, c_2, c_3$ are the same as the equations calculating $c'_0, c'_1, c'_2, c'_3$ in (13). Therefore $c_1, c_2, c_3$ can be used to select the correct sum. Here, the first group is a special case: $sum_{31:0}$ is controlled not only by $c_0$, but also by $o_s$

$$sum_{31:0} = \begin{cases} s^1_{31:0}, & c_0 \wedge o_s = 1 \\ s^0_{31:0}, & \text{otherwise} \end{cases} \tag{15}$$

In this way, when $x.s = y.s$, whatever the value of $c_0$ is, we always have $sum_{127:0} = x.x + y.y$. The EAC logic unit can implement the simple addition $x.x + y.y$ correctly. Furthermore, $o_s$ and $c_0$ can also be used to select the correct sum in the subtraction dataflow discussed above. In this way, the EAC logic unit can combine the addition and the subtraction correctly by doing only one addition operation. In [2], the formulation of the EAC adder is similar, but some details of the algorithms were not explained, and the EAC logic unit is introduced as a part of the fused multiply-add unit, which means that it cannot do the addition independently. Our design given in Fig. 4 can perform the addition independently. So, it is easy to verify the correctness of the adder's algorithms.
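The single-pass behaviour of the EAC logic unit can be checked with a small behavioural sketch of equations (5), (7), (11) and (15). Everything about the modelling is an assumption made for illustration: the names `group_gp` and `eac_magnitude`, the group width parameter `w` (32 in the paper, smaller values are convenient for exhaustive checks), and the use of integer arithmetic in place of the prefix trees that produce the group signals and conditional sums in hardware.

```python
def group_gp(a, b, w):
    """Group-generate and group-propagate of one w-bit slice (hypothetical helper)."""
    mask = (1 << w) - 1
    g = (a + b) >> w              # carry-out of the slice when its carry-in is 0
    p = int((a ^ b) == mask)      # 1 only if every bit position propagates
    return g, p

def eac_magnitude(x_mag, y_mag, o_s, w=32):
    """Return (magnitude of sum, c0) using four w-bit groups and one addition pass."""
    mask, n_mask = (1 << w) - 1, (1 << 4 * w) - 1
    y_t = (~y_mag & n_mask) if o_s else y_mag                 # eq. (5), input logic unit
    xs = [(x_mag >> (k * w)) & mask for k in range(4)]        # group 0 is bits w-1..0
    ys = [(y_t >> (k * w)) & mask for k in range(4)]
    (g0, p0), (g1, p1), (g2, p2), (g3, p3) = [group_gp(a, b, w) for a, b in zip(xs, ys)]
    pt = p0 & o_s                                             # eq. (7), R logic unit
    # eq. (11): group carries with the end-around term folded in
    c1 = g0 | pt & g3 | pt & p3 & g2 | pt & p3 & p2 & g1 | p3 & p2 & p1 & pt
    c2 = g1 | p1 & g0 | pt & p1 & g3 | pt & p3 & p1 & g2 | p3 & p2 & p1 & pt
    c3 = g2 | p2 & g1 | p2 & p1 & g0 | p2 & p1 & pt & g3 | p3 & p2 & p1 & pt
    c0 = g3 | p3 & g2 | p3 & p2 & g1 | p3 & p2 & p1 & g0 | p3 & p2 & p1 & pt
    sums0 = [(a + b) & mask for a, b in zip(xs, ys)]          # conditional sums, carry-in 0
    sums1 = [(a + b + 1) & mask for a, b in zip(xs, ys)]      # conditional sums, carry-in 1
    select = [c0 & o_s, c1, c2, c3]                           # eq. (15) gates the first group
    total = 0
    for k in range(4):
        total |= (sums1[k] if select[k] else sums0[k]) << (k * w)
    if o_s and not c0:                                        # x.x < y.y: complement the sum
        total = ~total & n_mask
    return total, c0
```

With `w=2` the model is an 8-bit adder, which is small enough to compare against ordinary integer arithmetic for every operand pair; the design point this illustrates is that the subtraction needs no second carry-propagating pass once the end-around terms are part of the group-carry equations.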

5.1 Parallel prex tree design in EAC adder

To implement a parallel prex tree, we need half-adder to calculate generated-carry and propagated-carry at each bit position. Then, using these carry signals, we need some other cells to compute group-generated carries and grouppropagated carries. Fig. 7 shows some gate-level basic cells which calculate group-propagated carry Pi:j and groupgenerated carry Gi:j in the parallel prex trees intermediate stages. In Fig. 7, the quadrate cell calculates Pi:j and Gi:j simultaneously whereas the triangular cell just calculates Gi:j . Therefore the circuit of the quadrate cell is more complex than that of the triangular cell. With the help of these basic cells, the rough implementation of Brent Kung tree is shown in Fig. 8. We use HAi (0 i 7) to denote Half adder. Here, we do not take the buffers into account. Here, for a regular parallel prex adder which does addition of two addends,

Implementation and validation

From the arithmetic algorithms discussed above, we know that for the 32-bit adder block in IBM's EAC adder design, the first-level and second-level parallel prefix trees are both Kogge–Stone trees. Compared with the Ladner–Fischer tree, the Kogge–Stone tree is the better choice on ASIC technology. Here, we try to find the best choice on FPGA technology. Our proposed EAC adder follows all the key algorithms of IBM's design; the additional logic units mainly ensure that the EAC adder can work independently. This makes the EAC adder easy to implement and test, and also makes it useful as a reference for finding a better design for the EAC adder used in a fused multiply-add unit. We implement different parallel prefix tree architecture configurations in our EAC adder and report the simulation results.

Figure 7 Basic cells in parallel prefix tree

5.1 Parallel prefix tree design in EAC adder

Knowles [7] has presented complete classes of regular fan-out prefix adders, which are bounded at the extremes by the Kogge–Stone tree and the Ladner–Fischer tree. For our study, using FPGA technology, we choose the regular parallel prefix trees of Knowles's adder family and other basic parallel prefix trees to implement the first-level 8-bit parallel prefix tree depicted in Fig. 3. The chosen parallel prefix trees are Kogge–Stone; Ladner–Fischer; Brent–Kung; Han–Carlson; Knowles [1, 1, 4]; Knowles [1, 2, 2]; and Knowles [1, 1, 2]. For the second-level parallel prefix trees in Fig. 2, we choose Knowles [1, 1] and Knowles [1, 2] from Knowles's adder family. These adders were selected because they span the design limits and intermediate cases in terms of area, depth of prefix network, fan-out and interconnect count. The notions introduced in Section 3 are helpful for understanding how these parallel prefix trees work. However, we must change their regular implementation to ensure that they work correctly in the EAC adder.

IET Comput. Digit. Tech., 2010, Vol. 4, Iss. 4, pp. 306–316, doi: 10.1049/iet-cdt.2009.0036
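The 14 architecture configurations evaluated in this paper follow from pairing each of the seven first-level 8-bit trees with each of the two second-level trees. A minimal Python sketch of that enumeration is given below; the variable names are illustrative and are not taken from the authors' VHDL.

```python
from itertools import product

# First-level 8-bit prefix trees listed in the text.
FIRST_LEVEL_8BIT = [
    "Kogge-Stone", "Ladner-Fischer", "Brent-Kung", "Han-Carlson",
    "Knowles [1, 1, 4]", "Knowles [1, 2, 2]", "Knowles [1, 1, 2]",
]
# Second-level trees listed in the text.
SECOND_LEVEL_32BIT = ["Knowles [1, 1]", "Knowles [1, 2]"]

# Hypothetical helper list: every (second-level, first-level) pairing.
configurations = list(product(SECOND_LEVEL_32BIT, FIRST_LEVEL_8BIT))

for second, first in configurations:
    print(f"32-bit {second} tree and 8-bit {first} tree")
print(len(configurations))
```

Each printed line names one configuration in the same "32-bit ... and 8-bit ..." form used in the experimental results.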

Figure 8 Eight-bit Brent–Kung tree

© The Institution of Engineering and Technology 2010

www.ietdl.org
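As a reminder of how the prefix trees chosen in this section operate, the sketch below models the prefix operator on (generate, propagate) pairs and an 8-bit Kogge–Stone network in Python. It is an illustrative bit-level model that assumes a zero carry-in, not the authors' VHDL; the function names `combine`, `kogge_stone` and `add_with_prefix_tree` are hypothetical.

```python
def combine(hi, lo):
    """Prefix operator: merge the (G, P) pair of a more significant
    group `hi` with that of the adjacent less significant group `lo`."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)


def kogge_stone(gp):
    """Return [(G_{i:0}, P_{i:0})] for every bit i using Kogge-Stone stages."""
    n = len(gp)
    dist = 1
    while dist < n:
        # Each stage merges position i with position i - dist.
        gp = [combine(gp[i], gp[i - dist]) if i >= dist else gp[i]
              for i in range(n)]
        dist *= 2
    return gp


def add_with_prefix_tree(x, y, n=8):
    """n-bit addition with carry-in c_0 = 0; returns (sum, carry_out)."""
    xs = [(x >> i) & 1 for i in range(n)]
    ys = [(y >> i) & 1 for i in range(n)]
    # Bit-level generate g_i = x_i AND y_i, propagate p_i = x_i XOR y_i.
    gp = [(a & b, a ^ b) for a, b in zip(xs, ys)]
    prefix = kogge_stone(gp)
    # With c_0 = 0 the carry into bit i is just G_{i-1:0}.
    carries = [0] + [g for g, _ in prefix[:-1]]
    bits = [p ^ c for (_, p), c in zip(gp, carries)]
    total = sum(b << i for i, b in enumerate(bits))
    return total, prefix[-1][0]
```

For example, `add_with_prefix_tree(200, 100)` returns `(44, 1)`, since 200 + 100 = 256 + 44.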
In a regular parallel prefix adder, we always assume that the incoming carry is $c_0 = 0$. For two $n$-bit binary addends $x = (x_{n-1} x_{n-2} \ldots x_0)$ and $y = (y_{n-1} y_{n-2} \ldots y_0)$, the carry and sum at bit position $i$ in a parallel prefix tree are computed as

$$c_i = G_{i-1:0} \lor (P_{i-1:0} \land c_0), \qquad s_i = P_i \oplus c_i, \qquad 0 \le i \le n-1.$$

Because $c_0 = 0$, we have $c_i = G_{i-1:0} \lor (P_{i-1:0} \land c_0) = G_{i-1:0}$. That is why we can use two different basic cells in Fig. 7 to build the regular Brent–Kung tree in Fig. 8: where only the signal $G_{i-1:0}$ is needed, the simpler triangular cell can be used to reduce complexity.

Vitoroulis [18] compared the performance and area of regular parallel prefix trees implemented on FPGA technology. But when the parallel prefix trees are implemented as components of our EAC adder in Fig. 4, they cannot be designed in the regular way shown in Fig. 8. Both $G_{i:0}$ and $P_{i:0}$ must be kept as outputs for reuse in the next stage. For example, if we use the Brent–Kung tree as the component in the EAC adder, that is, the parallel prefix tree in Fig. 3 is implemented as a Brent–Kung tree, we can only use the quadrate cell to calculate the signals in the intermediate stages, so the regular design of Fig. 8 must be changed. Fig. 9 shows the rough architecture of the modified Brent–Kung tree we adopted. Therefore, on FPGA technology, the area and performance of the different parallel prefix trees will differ from the results in Vitoroulis's report. As a result, to implement different parallel prefix trees in our EAC adder, we must first change the implementation of the parallel prefix tree itself and then take into account the relationship between the parallel prefix tree and the other parts of the EAC adder.

Figure 9 Eight-bit Brent–Kung tree in EAC adder

5.2 Experimental results

In this section, we present the simulation, synthesis and implementation results. EAC adders were designed using the various tree structures already discussed in this paper. They were first coded in VHDL at two different levels, and then all 14 architecture configurations were modelled in the Aldec Active-HDL simulation environment. The adder functionality was successfully verified using 100 000 random test vectors. After functional verification, all 14 adder architectures were implemented on a high-performance Xilinx Virtex-II Pro FPGA (XC2VP100) in the Xilinx ISE synthesis environment. We measure the area of an implemented design as the number of FPGA slices it occupies, and its speed as the delay of the longest signal path, or critical path (ns). The area and speed results are compared in Figs. 10–13.

Figure 10 Critical path delay, logic delay and route delay (ns)

Figure 11 Logic delay (ns)

Figure 12 Routing delay (ns)

Figure 13 Number of slices

These results show that we achieve the minimum area with the 32-bit Knowles [1, 1] tree and 8-bit Ladner–Fischer tree configuration, and the maximum area with the 32-bit Knowles [1, 1] tree and 8-bit Knowles [1, 1, 2] tree configuration (Fig. 13), which is 18% larger. Comparing the critical delay results of the various EAC adders in Fig. 10, we find that the 32-bit Knowles [1, 2] tree and 8-bit Han–Carlson tree configuration has the lowest delay; the 32-bit Knowles [1, 1] tree and 8-bit Ladner–Fischer tree configuration has the maximum delay, which is about 22.5% larger.

The critical path delay has two main components: the logic delay and the routing delay. Fig. 10 shows that the routing delay for all adders is more than the logic delay, with very little variation. Here, the wiring (routing) is chosen automatically by the synthesis tools. Sometimes it can be optimised in the final phase of a design, manually or by other methods, to decrease the delay, but sometimes this optimisation is very hard. Although the logic delay is always related to the routing delay, we still compare them separately in Figs. 11 and 12. The results show that the 32-bit Knowles [1, 2] tree and 8-bit Han–Carlson tree configuration also has the lowest logic delay, but not the lowest routing delay; the 32-bit Knowles [1, 1] tree and 8-bit Ladner–Fischer tree configuration seems to have the maximum logic delay and the maximum routing delay. Overall, the 32-bit Knowles [1, 2] and 8-bit Han–Carlson configuration seems to be the best compromise between area and speed. Even though its occupied area is about 3% larger than the minimum, this is more than compensated by a significant increase in speed.

It is also known that FPGAs have built-in carry logic based on fast-carry computations, which outperforms parallel prefix adders in both area and delay [18]. This is mainly because the built-in carry logic in an FPGA can use a high-speed bus to propagate the carry. In our experiments, the logic delay of the built-in carry logic is about 13.7 ns (Fig. 11), longer than that of the parallel prefix adders, which is about 10 ns. However, the routing delay of the built-in carry logic is only 3.4 ns. In contrast, the routing delay of the Brent–Kung adder, which is almost the minimum among all parallel prefix adders, is 11.8 ns. In summary, the total delay of the built-in carry logic is 17.1 ns, less than that of the Brent–Kung adder, which is 21.9 ns. This result confirms that built-in carry logic is a better choice on an FPGA than a parallel prefix adder. However, an EAC adder uses not only the sum signals from the adder but also the group propagated carries and group generated carries, which can only be obtained from parallel prefix adders. That is, porting the EAC adder to an FPGA still requires a parallel prefix tree, and experiments over different parallel prefix trees help to find the best implementation.

For the power consumption, the Xilinx power estimation tool, XPower, gives only very rough estimates. For all implementations of the EAC adders, the estimated power dissipation was approximately 572 mW. We note that Vitoroulis also did not list the power consumption of regular parallel prefix trees [18]. We will therefore keep looking for tools that can report precise power dissipation and consider power consumption as a metric in future work. For now, based on the results we have, we can say that the Kogge–Stone tree is not the better choice on FPGA technology, as it is on ASIC technology. Compared to other parallel prefix trees, the Kogge–Stone implementation has longer delay, bigger area and similar power consumption.
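The requirement that each first-level tree keep both group signals, generate and propagate, as outputs can be illustrated with a small Python model. The sketch below is a simplified two-level scheme assumed for illustration: four 8-bit groups are combined into a 32-bit block with an arbitrary carry-in, and the second level is evaluated serially for brevity rather than as a Knowles tree. It is not the authors' EAC datapath, and the names `group_gp` and `block_carries` are hypothetical.

```python
def group_gp(x, y, width=8):
    """Group generate/propagate (G_{width-1:0}, P_{width-1:0}) of one group."""
    g_grp, p_grp = 0, 1
    for i in range(width):                  # fold bits from LSB to MSB
        a, b = (x >> i) & 1, (y >> i) & 1
        g, p = a & b, a ^ b
        g_grp, p_grp = g | (p & g_grp), p & p_grp
    return g_grp, p_grp


def block_carries(x, y, carry_in, groups=4, width=8):
    """Carry into each group, plus the block carry-out, from group (G, P)."""
    mask = (1 << width) - 1
    carries = [carry_in]
    for k in range(groups):
        g, p = group_gp((x >> (k * width)) & mask,
                        (y >> (k * width)) & mask, width)
        # c_{k+1} = G_k OR (P_k AND c_k): P is needed once carry_in may be 1.
        carries.append(g | (p & carries[-1]))
    return carries
```

With `x = 0x00FFFFFF` and `y = 0`, no group generates a carry, so a carry-in of 1 reaches the top group only through the propagate outputs: `block_carries(0x00FFFFFF, 0, 1)` gives `[1, 1, 1, 1, 0]`, whereas a carry-in of 0 gives all zeros. A tree that emitted only the generate signals could not distinguish these two cases.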

Conclusion

In this paper, we proposed a complete design of a binary floating-point EAC adder and explained the details of its arithmetic algorithms. Our EAC adder's algorithms mainly follow the 128-bit binary floating-point adder implemented in the IBM POWER6 microprocessor. Compared with IBM's design, our EAC adder can work independently, which makes it easy to implement and test. Because few details of the EAC adder's arithmetic algorithms exist in today's literature, our paper can help designers understand this arithmetic unit. We then studied the performance of parallel prefix trees implemented in our EAC adder on FPGA technology. After analysing the relationships between the parallel prefix trees and the other parts of the EAC adder, we modified the implementation of the regular parallel prefix trees so that they can be used correctly within the EAC adder. By comparing the area and performance of 14 different parallel prefix tree architecture configurations, we found that the 32-bit Knowles [1, 1] and 8-bit Ladner–Fischer configuration has the minimum area, while the 32-bit Knowles [1, 2] and 8-bit Han–Carlson configuration has the minimum critical path delay. Although its occupied area is about 3% larger than the minimum, the 32-bit Knowles [1, 2] and 8-bit Han–Carlson configuration seems to be the best compromise between area and speed for the FPGA implementation.

References

[1] CURRAN B., MCCREDIE B., SIQAL L., ET AL.: 4GHz+ low-latency fixed-point and binary floating-point execution units for the POWER6 processor. Digest of 2006 IEEE Int. Solid-State Circuits Conf., 2006, pp. 1728–1734
[2] SCHWARZ E.M.: Binary floating-point unit design, in U.S.S. (ED.): High performance energy efficient microprocessor design (Springer, 2006), pp. 189–208
[3] YU X.Y., FLEISCHER B., CHAN Y.H., ET AL.: A 5 GHz+ 128-bit binary floating-point adder for the POWER6 processor. Proc. Int. Conf. 32nd European Solid-State Circuits, 2006, pp. 166–169
[4] LEOBANDUNG D.M.E., NAYAKAMA H., ET AL.: High performance 65 nm SOI technology with dual stress liner and low capacitance SRAM cell. Digest of 2005 Symp. on VLSI Technology, 2005
[5] KOGGE P.M., STONE H.S.: A parallel algorithm for the efficient solution of a general class of recurrence equations, IEEE Trans. Comput., 1973, 22, (8), pp. 786–793
[6] LADNER R., FISCHER M.: Parallel prefix computation, J. ACM, 1980, 27, (4), pp. 831–838
[7] KNOWLES S.: A family of adders. Proc. 15th IEEE Symp. on Computer Arithmetic, 2001, pp. 277–281
[8] OKLOBDZIJA V.G., VILLEGER D.: Improving multiplier design by using improved column compression tree and optimized final adder in CMOS technology, IEEE Trans. VLSI Syst., 1995, 3, (2)
[9] STELLING P., OKLOBDZIJA V.G.: Design strategies for optimal hybrid final adders in a parallel multiplier, J. VLSI Signal Process. (Special issue on VLSI Arithmetic), 1996, 14, (3)
[10] ZEYDEL B.R., OKLOBDZIJA V.G., MATHEW S., KRISHNAMURTHY R.K., BORKAR S.: A 90 nm 1 GHz 22 mW 16 × 16-bit 2's complement multiplier for wireless baseband. Proc. 2003 Symp. on VLSI Circuits, 2003
[11] SHEDLETSKY J.J.: Comment on the sequential and indeterminate behavior of an end-around-carry adder, IEEE Trans. Comput., 1977, pp. 271–271
[12] ZHANG X.Y., CHAN Y.H., MONTOYE R., ET AL.: A 270 ps 20 mW 108-bit end-around carry adder for multiply-add fused floating point unit, J. Signal Process. Syst., 2009
[13] SKLANSKY J.: Conditional-sum addition logic, IRE Trans. Electronic Comput., 1960, EC-9, pp. 226–231
[14] BRENT R.P., KUNG H.T.: A regular layout for parallel adders, IEEE Trans. Comput., 1982, C, (31), pp. 260–264
[15] HAN T., CARLSON D.: Fast area-efficient VLSI adders. Proc. Eighth Symp. Comp, 1987, pp. 49–56
[16] ZIEGLER M.M., STAN M.R.: A unified design space for regular parallel prefix adders. Proc. Design, Automation and Test in Europe Conf. and Exhibition (DATE'04), 2004, pp. 1386–1387
[17] LIU J.H., ZHU Y., ZHU H.K., ET AL.: Optimum prefix adders in a comprehensive area, timing and power design space. Proc. 12th Conf. on Asia South Pacific Design Automation (ASP-DAC'07), 2007, pp. 609–615
[18] VITOROULIS K., AL-KHALILI A.J.: Performance of parallel prefix adders implemented with FPGA technology. IEEE Northeast Workshop on Circuits and Systems, 2007, pp. 498–501
[19] KOREN I.: Computer arithmetic algorithms (A.K. Peters, Natick, MA, 2002)


