
A Low-power Carry Cut-Back Approximate Adder with Fixed-point Implementation and Floating-point Precision


Vincent Camus, Jeremy Schlachter, Christian Enz
Integrated Circuits Laboratory (ICLAB)
Ecole Polytechnique Fédérale de Lausanne (EPFL), Switzerland
vincent.camus@epfl.ch

ABSTRACT
This paper introduces an approximate adder architecture based on a digital quasi-feedback technique called Carry Cut-Back, in which high-significance stages can cut the carry propagation chain at lower-significance positions. This lightweight approach prevents activation of the critical path, improving energy efficiency while guaranteeing low worst-case relative error. It offers a degree of freedom that makes it possible to dissociate precision and dynamic range in fixed-point implementation. A design methodology is presented along with results and a comparative study. For a worst-case accuracy of 98 %, energy savings up to 44 % and power-delay-area reductions up to 62 % are demonstrated compared to low-power conventional designs.

Categories and Subject Descriptors
B.5.1 [Register-Transfer-Level Implementation]: Design - arithmetic and logic units, data-path design; G.1.0 [Numerical Analysis]: General - multiple precision arithmetic, error analysis

Keywords
Approximate adders, error tolerance, approximate computing, approximate circuit design, low-power digital circuits

Acknowledgments
This work has been supported by the NanoTera IcySoC project of the Swiss National Science Foundation.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
DAC '16, June 05-09, 2016, Austin, TX, USA
© 2016 ACM. ISBN 978-1-4503-4236-0/16/06...$15.00
DOI: http://dx.doi.org/10.1145/2897937.2897964

1. INTRODUCTION
Density, speed and energy efficiency of integrated circuits have been increasing exponentially for the last four decades following Gordon Moore's remarkable prediction. However, power and reliability pose several challenges to the future of technology scaling. Power has definitely emerged as a critical concern due to the poor scaling of $V_{dd}$ and $V_{th}$, while transistor miniaturization reaching atomic scale has led to tremendous Process-Voltage-Temperature (PVT) variations. Unfortunately, achieving low power and robustness against variability requires complex and conflicting design constraints. As a result, designers are being pushed to seek new energy-efficient computing techniques to meet the increasing demand of data processing.

Error tolerance, i.e. accepting error in a design to save resources, is a well-known concept existing in many abstraction layers, from physical sensors and analog circuits to software design. Built on these ideas, approximate computing [1, 2] has emerged as a promising candidate to improve performance and energy efficiency and sustain technology scaling. Designing approximate circuits explores a new trade-off, not only by accepting unreliability, but by intentionally introducing errors to overcome limitations of traditional design. With the exploding amount of data being processed in the cloud and on mobile devices, a wide range of applications can trade a little accuracy without compromising the functionality or the user experience. In image or video processing applications, a small proportion of errors stays imperceptible to humans. The growing demand for statistical algorithms such as data mining, recognition, search or machine learning represents another opportunity to compute in an approximate way, as the outcome of those applications is not required to be a single golden result, but an adequate match. Finally, iterative applications like vision and tracking are inherently resilient to errors since those can be compensated in the succeeding frames or steps.

To design approximate circuits, several approaches have been investigated at different levels of hardware design, such as voltage-frequency over-scaling [3] at physical level, Gate-Level Pruning [4] at circuit level, or Significance Driven Computation [5] at algorithmic level. Another way consists in redesigning the architecture of combinational circuits into an approximate version with smaller delay, area or power consumption. This technique is particularly suited for arithmetic operators such as adders. Trading numerical accuracy for circuit efficiency is not a new trend: it has existed since the beginning of digital electronics with the discretization of signals into binary data, first in fixed-point arithmetic, and later in Floating-Point Units (FPU). From that time on, those two standards have been at the heart of DSPs and hardware accelerators. Even though the FPU features a superior computational ability and greater dynamic range, fixed-point arithmetic continues to be the mainstay in industry by offering a lower hardware complexity and power consumption.

This paper introduces a novel approximate adder architecture optimizing circuit timing together with arithmetic precision thanks to a new technique called Carry Cut-Back (CCB). Using a quasi-feedback between two carry-chain positions, it prevents the critical-path activation, therefore relaxing the timing constraints in the entire design and strongly improving
[Figure 1 diagram: the operand slices A(n-1..n-x), A(n-x-1..n-2x), A(n-2x-1..n-3x), ..., A(x-1..0) and the matching B slices feed a chain of ADD blocks producing the sum slices S(n-1..n-x), ..., S(x-1..0). Between adjacent ADD blocks, a PROP and a SPEC block generate the cut signal.]

Figure 1: General block diagram of the proposed Carry Cut-Back (CCB) approximate adder.

the circuit efficiency. With this approach, high-significance carry stages are monitored to cut the carry propagation chain at lower-significance positions, guaranteeing low relative errors of floating-point type. A brief design methodology is presented together with results and a comparative analysis for both high-performance and low-power implementations.

2. RELATED WORK
Adders are the most common arithmetic blocks used in DSPs, thus many attempts have been made to build them in an approximate manner. At architectural level, an interesting way to build approximate adders is to use carry speculation [6]. This technique exploits the fact that carry propagate sequences in additions are typically short, making it possible to estimate, more or less accurately, an intermediate carry using a limited number of previous stages. Thus, the carry chain, which is the critical path of the circuit, can be split into two or more shorter paths, relaxing the constraints over the entire design and pushing energy, delay and area beyond the limits imposed by traditional design.

A number of speculative adders have been proposed in the literature based on the ETAII concept [7]. It consists in slicing the addition into regular sub-adder blocks with input carries speculated in a carry-lookahead approach. The ETBA [8], direct descendant of the ETAII, uses an error balancing technique based on multiplexers to mitigate the relative error in case of incorrect carry speculation. The ETAIV [9] and CSA [10] have enhanced the error rate and mean by chaining several blocks for carry speculation, but at the cost of increased complexity and energy. The ISA [11] has generalized and optimized the architecture of speculative compensated adders by shortening the speculation overhead and by introducing a dual-direction compensation mechanism that improves both circuit performance and accuracy. However, the multiplexers required for a good error compensation still represent a substantial area and energy overhead, particularly for low-speed implementations.

With the progressive increase of accuracy and design flexibility, speculative adders have repeatedly been proven to be usable in specific applications [8, 10, 12, 13]. Despite all this progress, they have not yet been adopted by designers, the main reason being that occurrence and impact of errors in real-life applications are not easy to anticipate today. Error metrics are often based on full-range number assumption and uniform distribution that fit neither typical use nor worst possible cases. Moreover, applications and case studies remain singular examples. The goal of this work is to propose a more efficient adder architecture that can ensure that all errors remain below an upper bound, so that it can be integrated without uncertainty in codecs and hardware accelerators.

3. CARRY CUT-BACK ADDER

3.1 Proposed Architecture
The structural diagram of the proposed adder is depicted in Fig. 1. The CCB adder is based on a conventional fixed-point adder circuit (formed by the chain of ADD blocks) with insertion of several multiplexers that can cut the carry propagation chain to shorten the effective critical path. The decision to cut the chain is taken by the carry propagate block (PROP) by generating the cut signal. The cut occurs upstream in the carry chain, by multiplexing the real carry with a carry speculated from a short chain in an optional carry speculator block (SPEC). Taking place at a lower-significance stage, the carry cut-back guarantees a low relative error. The cut-back module appears functionally as a feedback between two carry-chain positions, but is not a recursive loop as it uses the local carry propagate and generate signals directly precomputed from the operand inputs. Hence, it cannot influence the stability of the circuit.

The main advantage of this approach lies in its timing characteristic. Typically, the carry propagation chain of an addition is naturally broken because of short-size operands or by the distribution of the input bits. Spanning over the whole adder length, the critical path is only activated if all the stages are in propagate mode. Even if the adder within the CCB architecture physically contains the entire carry chain through the ADD blocks and multiplexers, this path can never be activated. By monitoring one or more stages of the adder, the PROP quickly detects such risk and calls a shorter path to be used instead, ensuring that the design meets tighter timing constraints. The shorter path can either be a carry speculated from the SPEC or a straight cut using a monotonic gate, such as the OR-cut of the adder implemented in Fig. 2 (cut = 1 dictates the OR output regardless of its second input).

Fig. 2 shows a case study of the longest propagation chains that can flow through a CCB adder built with two cut-backs. Each cut-back module splits the carry chain with two possibilities:
• cut = 0: No deliberate carry-cut in the typical case, i.e. the carry chain is naturally broken somewhere in the PROP. The critical path is limited since it cannot entirely cross over the PROP. Case 1 in Fig. 2 shows two examples of such behavior.
• cut = 1: All the stages within the PROP are in propagate mode. The carry chain necessarily propagates through the PROP and there is a risk of long critical-path activation if the other non-monitored stages also propagate. Therefore, by intentionally cutting the carry chain, its maximum length still remains limited. Case 2 in Fig. 2 shows two examples of such behavior.
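The arithmetic effect of a single OR-cut cut-back can be illustrated with a small behavioural model. This is only a sketch under stated assumptions, not the authors' implementation: the function names, the 8-bit width, the cut position (bit 2) and the 1-bit PROP position (bit 5) are hypothetical values chosen for the example, and the model captures the arithmetic only, not the timing.

```python
def ccb_add_or_cut(a, b, cut_pos=2, prop_pos=5):
    """Behavioural model of an adder with one OR-cut cut-back (illustrative).

    cut_pos  : bit index whose carry-in can be overridden (assumed value)
    prop_pos : bit index monitored by a 1-bit PROP (assumed value)
    """
    low_mask = (1 << cut_pos) - 1
    low = (a & low_mask) + (b & low_mask)      # exact low-order addition
    real_carry = low >> cut_pos                # true carry into bit cut_pos
    cut = ((a ^ b) >> prop_pos) & 1            # monitored stage is in propagate mode
    carry = real_carry | cut                   # OR-cut: cut = 1 forces the carry to 1
    high = (a >> cut_pos) + (b >> cut_pos) + carry
    return (high << cut_pos) | (low & low_mask)


def characterize(n=8, cut_pos=2, prop_pos=5):
    """Exhaustively collect the absolute errors and the worst relative error."""
    errors, worst = set(), 0.0
    for a in range(1 << n):
        for b in range(1 << n):
            exact = a + b
            approx = ccb_add_or_cut(a, b, cut_pos, prop_pos)
            errors.add(approx - exact)
            if exact:
                worst = max(worst, abs(approx - exact) / exact)
    return errors, worst


errors, worst = characterize()
print(sorted(errors), worst)   # [0, 4] 0.125
```

In this toy configuration every error has the weight of the cut position (2^2 = 4), and the worst relative error is reached when the only non-zero operand bit is the monitored PROP bit (32 + 0 gives 36).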
[Figure 2 diagram: from MSB to LSB, the adder blocks are ADDlast, ADD2, ADD1, ADD3, ADD2, ADD1, ADDfirst, with a PROP over each ADD2 generating cut2 and cut1. The longest chains are drawn for Case 1 (cut2 = 0, cut1 = 0), Case 2 (cut2 = 1, cut1 = 1), Case 3 (cut2 = 1, cut1 = 0) and Case 4 (cut2 = 0, cut1 = 1).]

Figure 2: Diagram of the longest carry chains and resulting effective critical paths in the example of an implementation of CCB adder with two OR-cuts.

Cases 3 and 4 both contain one naturally broken chain (cut = 0) and one intentional cut (cut = 1).

Despite the fact that the full carry chain physically exists in the design, no input combination can activate it from the start to the end. It is a false path and can therefore be excluded from the timing analysis. The effective critical paths in Fig. 2 sum up the longest propagate chains that can occur in the circuit among the different cases. Insertion of more carry cut-back modules, possibly overlapping each other, would lead to shorter effective critical paths.

3.2 Arithmetic and Errors
The CCB addition arithmetic is illustrated in Fig. 3. Errors only occur with the concurrence of three factors:
• Sequence of propagate signals spanning the entire PROP bit-width, triggering the cut.
• Sequence of propagate signals spanning the entire SPEC bit-width, making the exact carry prediction impossible with the SPEC bits only.
• Wrong guess of the carry that inputs the SPEC (Fig. 3a) or that directly substitutes for the real carry (Fig. 3b). This occurs with a 50 % binary probability.

[Figure 3 diagram: for each example, the operands, the inexact sum and the expected sum are shown in binary, with the cut signals and propagate stages marked. Panel (a) uses PROP and SPEC with an input guess; panel (b) uses PROP and an OR-cut.]

Figure 3: Example of CCB addition arithmetic with (a) 2-bit PROP and SPEC, (b) 1-bit PROP OR-cut.

An error occurs in the right-hand path of Fig. 3a because of the simultaneous occurrence of the three aforementioned properties. In the OR-cut implementation of Fig. 3b (without SPEC), the cut signal is also the guessed carry. The first condition of error occurrence is met for the two right-hand paths. The guess unintentionally follows the real carry and leads to a correct sum in the central path, but happens to be wrong and leads to a faulty sum in the right-hand path.

Occurrence of an error implies that one or both operands have non-zero bits at the PROP position. As the error occurs at the carry cut, at a lower-significance position, the expected sum is necessarily much larger than the introduced error. In the computation of Fig. 3a, the absolute error is 16 while the expected sum is 43,265, so the relative error is 0.04 %. In the example of Fig. 3b, the relative error is only 0.006 %. Such low relative errors are typical in speculative adders for calculations involving large value operands. However, it is the worst case that gives the upper-bound relative error and defines the minimum floating-point precision of the adder.

3.3 Floating-point Precision

Error propagation
It is interesting to note from Fig. 3 that the error caused by the cut can propagate on many bits, but seems to keep the magnitude of the carry cut-back position, that is, of the first wrong bit. This statement, which seems straightforward with the example, has to be demonstrated carefully. Indeed, a successive series of erroneous sum bits can result in different errors. Let $S_i$, $C_i$ and $P_i$ denote the sum, carry-in and propagate signals of the $i$th stage addition, respectively. The sum and carry propagation are defined by:

$S_i = P_i \oplus C_i$   (1)
$P_i = 1 \implies C_{i+1} = C_i$.   (2)

Assume a carry error at the $i$th bit of the adder, with an erroneous carry of value $C_{err}$. The sum bit and the carry-out depend on the value of $P_i$. If $P_i = 1$, (1) gives $S_i = \overline{C_{err}}$ and (2) propagates the wrong carry $C_{err}$ to the next stage, where the same formulae apply again. If $P_i = 0$, (1) gives $S_i = C_{err}$ and the wrong carry is not propagated, so the next stage addition is correct. Assuming that the erroneous sum spreads from the $m$th to the $p$th stage, the error pattern appears as shown in Fig. 4:

Erroneous sum bits: $C_{err}$ at bit $p$; $\overline{C_{err}}$ at bits $p-1$, $p-2$, ..., $i$, ..., $m$

Figure 4: Balanced error pattern

Just as in Fig. 3, where the error patterns are shown in red, the last faulty bit counterbalances the first ones and the absolute error value is reduced to:

$2^p - 2^{p-1} - 2^{p-2} - \cdots - 2^m = 2^m$.   (3)

This result is valid if the carry propagates normally. But there can be more than one cut-back module, and if all the stages between two cut-backs propagate, it could disrupt the normal propagation driven by (2). Thus, the previous result needs to be recomputed for that case. Assume the same carry error ($C_i = C_{err}$) in a propagating stage ($P_i = 1$, else there would be no carry-chain perturbation). If another cut-back happens to guess the same faulty carry $C_{err}$, then it transparently follows (2) and the previous result holds. But if the carry-cut is in the opposite direction $\overline{C_{err}}$, as it runs
against (2), it reverses the error: the carry, which was false until now, comes back to the value of the expected addition, so the next stage is correct. But the current sum, determined by (1), is $S_i = \overline{C_{err}}$. The error pattern appears this time as:

Erroneous sum bits: $\overline{C_{err}}$ at bits $p$, $p-1$, $p-2$, ..., $i$, ..., $m$

Figure 5: Unbalanced error pattern

All the erroneous bits are in the same direction and the absolute error is simply their sum:

$2^p + 2^{p-1} + 2^{p-2} + \cdots + 2^m = 2^{p+1} - 2^m$.   (4)

This error is of much higher magnitude than in the first case, but can only occur if several carry-cuts happen in opposite directions. To avoid such dramatic errors, the SPEC guess or the straight carry-cut must be chosen in the same direction for all the CCB modules of the adder.

Worst-case relative error
Having validated the fact that any error has the magnitude of the cut-back bit that caused it, the low impact of the error on the expected sum should be demonstrated.

The worst case happens when the error magnitude is the highest on the lowest expected calculation. Occurrence of an error implies that the three factors mentioned in Section 3.2 are realized, which means that the PROP and SPEC intercept propagate signals only. All the non-zero operand bits producing those propagates add up to the expected sum:
• Standing at higher-significance positions than the carry error, the PROP non-zero bits significantly contribute to maximizing the expected result and thus to minimizing the worst-case relative error.
• Positioned directly before the carry-cut, the SPEC non-zero bits contribute to a lesser extent to increasing the sum by attenuating a portion of the magnitude of the error. However, they participate equally with the PROP bits in reducing the rate of errors.
• When the SPEC guess or the straight carry-cut is 0, i.e. speculating a low carry, an error replaces a real carry at state 1 coming from a generate stage. Added to the SPEC bits, this stage further increases the sum to $2^m$.

Whenever a carry-cut error occurs, while it keeps the magnitude of the cut bit significance, i.e. an arithmetic error of value $2^m$, the sum is always expected to be at least:

$2^m + \sum_{k \in PROP} 2^k$   and   $\sum_{k \in SPEC} 2^k + \sum_{k \in PROP} 2^k$,   (5)

leading to a relative error of at most:

$\frac{2^m}{2^m + \sum_{k \in PROP} 2^k}$   and   $\frac{2^m}{\sum_{k \in SPEC} 2^k + \sum_{k \in PROP} 2^k}$,   (6)

in the cases where the carry guess is at 0 and 1, respectively. This result holds if multiple errors occur in different carry-cut modules, as the ratio of error over sum is preserved.

A floating-point precision is thus configurable at design time by sizing and positioning PROP and SPEC and selecting the carry guess. It is easy to verify that the worst-case relative error is 7.7 % for the example in Fig. 3a and 12.5 % in Fig. 3b, corresponding to precisions between 4 and 5 bits.

3.4 Design and Implementation Strategy
The CCB technique allows considerable improvements concurrently in circuit performance and error control. This section describes how to exploit its architectural advantages.

Design implementation
Both PROP and SPEC can be implemented in a carry-lookahead approach and should have very short bit-widths to limit overheads. Their areas can fortunately be balanced, as the adder segments that they overlay can be cut down to simple sum generators. Moreover, the delay overhead is limited by the slower of PROP and SPEC since they are executed in parallel.

The CCB adder physically contains the entire adder carry chain, but the cut-back scheme prevents its activation and splits the critical path into multiple shorter paths. However, long-established Static Timing Analysis (STA) used in synthesis tools cannot easily identify those timing exceptions. It is thus necessary to provide the tools with additional timing constraints to manually exclude from the timing analysis all the false paths generated by the CCB modules. This additional information prevents the synthesis tools from unnecessarily trying to meet delay constraints on them.

Delay optimization
Assuming identical cut-back modules inserted in the adder and with the notation of Fig. 2, the effective critical-path delays are:

$$
\begin{cases}
t_{first} = t_{ADD1} + t_{ADD2} + t_{ADDfirst} \\
t_{mid} = 2\,(t_{ADD1} + t_{ADD2}) + t_{ADD3} + t_{mux} + \max(t_{SPEC}, t_{PROP}) \\
t_{last} = t_{ADD1} + t_{ADD2} + t_{ADDlast} + t_{mux} + \max(t_{SPEC}, t_{PROP})
\end{cases}
\qquad (7)
$$

where $t_{first}$ and $t_{last}$ are the two boundary-path delays, and $t_{mid}$ is the delay of all the intermediate paths. The multiplexer delay $t_{mux}$ can optionally be replaced by the delay of the straight-cut gate.

In order to optimize the timing budget, it is possible to equalize those paths by sizing ADDfirst and ADDlast, and to equalize the SPEC and PROP bit-widths. Note that the PROP and ADD2 blocks are proportionate as they cover the same adder segment, the first as carry-lookahead and the latter as sum generator.

Error optimization
The CCB adder makes it possible to dissociate the precision from the dynamic range of the adder, which is fixed by the total adder bit-width. It offers a large design space to minimize the application quality loss and maximize the savings by trading off mean, maximum and rate of errors, configurable by choosing positions and bit-widths of the CCB modules. The error rate depends on the number of cut-back modules and on the PROP and SPEC bit-widths. The maximum error can be adjusted mainly by sizing the PROP bit-width and positioning the carry-cut (i.e. sizing ADD1), and to a lesser extent by modifying the SPEC bit-width and input guess. Optimum trade-offs to adjust Signal-to-Noise Ratio (SNR), Root Mean Square (RMS) error or any other accuracy metric can be achieved using the same models as those built for speculative adders [12, 13].
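The closed forms of Eqs. (3) and (4) and the worst-case bound of Eq. (6) from Section 3.3 can be checked numerically, which also shows how the maximum error moves with the PROP position. The sketch below is illustrative only: the function names are hypothetical, and the bit positions passed in the two calls are assumptions chosen so that the bound reproduces the 7.7 % and 12.5 % quoted for Fig. 3a and Fig. 3b. They are not read from the figure itself.

```python
from fractions import Fraction


def balanced_error(p, m):
    """Eq. (3): bit p is wrong in one direction, bits m..p-1 in the other."""
    return 2**p - sum(2**k for k in range(m, p))


def unbalanced_error(p, m):
    """Eq. (4): bits m..p are all wrong in the same direction."""
    return sum(2**k for k in range(m, p + 1))


def worst_case_relative_error(m, prop_bits, spec_bits=(), guess=0):
    """Eq. (6): bound on the relative error for a carry cut at bit m.

    prop_bits / spec_bits are the bit indices covered by PROP and SPEC;
    guess is the speculated carry value (0 or 1).
    """
    prop = sum(2**k for k in prop_bits)
    spec = sum(2**k for k in spec_bits)
    smallest_sum = 2**m + prop if guess == 0 else spec + prop
    return Fraction(2**m, smallest_sum)


# Assumed layouts: cut at bit 4 in both cases (hypothetical positions).
fig3a = worst_case_relative_error(4, prop_bits=(6, 7), spec_bits=(2, 3), guess=0)
fig3b = worst_case_relative_error(4, prop_bits=(7,), guess=1)
print(f"{float(fig3a):.1%} {float(fig3b):.1%}")   # 7.7% 12.5%
```

Moving the PROP window one bit higher (a wider ADD1) halves the second bound, which is the lever described above for adjusting the maximum error.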
[Figure 6 plot: normalized costs (PDAP and energy, linear axis from 0 to 1.2) and relative errors ($RE_{MAX}$ and $RE_{RMS}$ in %, logarithmic axis) for the exact adder and a selection of CCB configurations.]

Figure 6: Relative errors and normalized costs of various 32-bit CCB adders (on the horizontal axis) synthesized at 3.3 GHz in a 65 nm technology.

[Figure 7 plot: normalized costs (PDAP and energy, linear axis from 0 to 1.2) and relative errors ($RE_{MAX}$ and $RE_{RMS}$ in %, logarithmic axis) for the exact adder and a selection of CCB configurations.]

Figure 7: Relative errors and normalized costs of various 32-bit CCB adders (on the horizontal axis) synthesized at 0.8 GHz in a 65 nm technology.

4. RESULTS AND COMPARATIVE STUDY

4.1 General Considerations

Metrics
The metrics used to characterize approximate adders in this work are based on the relative error (RE), which has the advantage of being independent of the size of the adder. It is defined as:

$RE = \left| \frac{S_{approx} - S_{exact}}{S_{exact}} \right|$   (8)

where $S_{approx}$ and $S_{exact}$ are the approximate and correct sums of an addition, respectively.

The main metric considered is the maximum of the relative error ($RE_{MAX}$), which delimits the minimum precision of the circuit. The RMS of the relative error ($RE_{RMS}$) is also taken into account as it is proportional to the SNR and interesting for many applications, particularly in multimedia processing.

Methodology
Approximate adders are commonly characterized and validated through the simulation of random sets of inputs. As a matter of fact, the presented results are statistical estimations depending on the random sample distribution (occurrence of specific patterns initiates errors in specific adders). In this work, adders are characterized using two samples of five million unsigned random inputs. First, a logarithmically uniform distribution exhibiting a very large dynamic range is used to detect the worst-case error $RE_{MAX}$. Then, a uniform distribution is used to estimate $RE_{RMS}$.

In this work, several 32-bit approximate adders have been synthesized for low-power (0.8 GHz) and high-performance (3.3 GHz) operation in an industrial 65 nm technology. Over 5000 implementations with diverse error characteristics have been investigated by varying design parameters. All circuits have been generated with regular block structures from high-level descriptions in order to benefit from the compiler's optimization libraries and most favorable architecture choices to fit each timing constraint. Delay, area and power have been estimated using Synopsys Design Compiler.

4.2 CCB Adder Results
Error characteristics and normalized costs in terms of energy and Power-Delay-Area Product (PDAP) are shown for a selection of CCB adders at 3.3 GHz in Fig. 6 and at 0.8 GHz in Fig. 7. Costs are normalized to the exact adder represented on the left of the figures. CCB adders are denoted by quintuples of bit-widths: (number of cut-backs, ADD1, PROP, ADD3, SPEC), assuming a regular block structure and the optimizations described in Section 3.4. These figures highlight the large design space and error engineering possibilities enabled by the proposed adder. The CCB design parameters allow tuning the precision over more than three orders of magnitude of errors with optimal circuit efficiency.

Timing constraints have a significant influence on the results. At equivalent precision, low-speed implementations show better savings than high-performance ones compared to the exact adders. At 2 % $RE_{MAX}$, CCB adders at 3.3 GHz achieve 14 % energy savings and 27 % PDAP reductions, against 44 % and 62 % for adders at 0.8 GHz. This is due to the fact that high-speed circuits require more CCB modules to split the carry chain into smaller pieces, but at the cost of additional hardware overhead.

Fig. 7 presents a small but sharp drop in circuit efficiency at 1.7 % $RE_{MAX}$. This corresponds to the precision from which the design becomes delay constrained. Indeed, higher precision demands wider PROP, SPEC and ADD1, which all lie in the effective critical path. This does not appear for 3.3 GHz adders, which are always tightly constrained.

Note that $RE_{RMS}$ and $RE_{MAX}$ follow the same trend, but with a larger variability for high-speed and low-precision adders. Those generally contain several cut-back modules, so a small change in their structure repeated over many of them strongly impacts the overall error rate and mean.

4.3 Comparative Study
Tables 1 and 2 compare the costs and PDAP of the CCB adder with other approximate adders at 3.3 GHz and 0.8 GHz. Only ETBA [8] and ISA [11] are shown for comparison as they exhibit good savings for a bit-width of 32 bits. A modified ETBA has been built with fixed carry guess
as the original variable guess was weakening its efficiency. For given $RE_{MAX}$, the best implementation of each architecture has been selected. All structures are regular and denoted by n-tuples of bit-widths: (block size) for ETBA, (block size, SPEC, correction, reduction) for ISA and as already stated for CCB adders.

Table 1: Comparison of 32-bit adders at 3.3 GHz

Architecture                 RE_MAX (%)   Energy (fJ)   Area (μm²)   PDAP
Exact                        0            47            910          424
ETBA modified (2)            35           24            497          117
ISA (2, 0, 1, 2)             35           13            317          41
CCB adder (7, 1, 1, 2, 0)    35           22            472          102
ETBA modified (4)            6            38            738          281
ISA (4, 2, 2, 2)             6            37            689          252
CCB adder (3, 4, 1, 2, 0)    6            35            643          223
ISA (16, 0, 5, 4)            3            46            826          378
CCB adder (2, 5, 1, 3, 0)    3            38            749          287

Table 2: Comparison of 32-bit adders at 0.8 GHz

Architecture                 RE_MAX (%)   Energy (fJ)   Area (μm²)   PDAP
Exact                        0            79            401          316
ETBA modified (4)            6            50            380          187
ISA (8, 4, 0, 4)             6            49            309          150
CCB adder (4, 4, 1, 0, 0)    6            41            253          103
ETBA modified (8)            0.4          82            451          370
ISA (8, 4, 4, 4)             0.4          69            379          263
CCB adder (2, 8, 1, 0, 0)    0.4          62            334          216
ISA (16, 3, 7, 2)            0.2          78            401          313
CCB adder (1, 10, 1, -, 0)   0.1          68            362          246

Among high-performance adders (Table 1), CCB and ETBA architectures are completely overtaken by the ISA for very low precisions (35 % $RE_{MAX}$). In this case, the minimal architecture of the ISA optimally fits the difficult delay constraint without loss of circuit efficiency. The situation reverses at higher accuracy, for which the need of wider speculation and compensation hardware in the critical path reduces the efficiency of ETBA and ISA. At 6 % $RE_{MAX}$, the CCB adder performs 11 % better than the ISA and 21 % better than the ETBA in terms of PDAP. Increasing the precision to 3 % $RE_{MAX}$ widens the gap, with the CCB adder performing 24 % better than the ISA.

For low-power implementations (Table 2), the CCB adder always outperforms the state-of-the-art. Indeed, low speed allows smaller and more energy-efficient architectures to be used in the addition sub-blocks. The speculation and compensation blocks of ISA and ETBA thus become a large area and energy overhead. Thanks to its lightweight cut-back mechanism, CCB architectures exhibit 18-30 % PDAP reductions compared to ISA and 40-45 % compared to ETBA while maintaining equal or greater precision.

Moreover, while circuit savings of ISA and ETBA progressively disappear at higher accuracy compared to the exact adder, the CCB architecture still offers significant savings. Up to 14 % energy savings and 22 % PDAP reductions are demonstrated for 0.1 % $RE_{MAX}$, corresponding to 11-bit precision, i.e. the mantissa precision of a 16-bit FPU.

5. CONCLUSION
This paper has introduced a novel architecture of approximate adder optimizing circuit timing together with arithmetic precision. By performing a quasi-feedback in the carry chain, the Carry Cut-Back (CCB) technique prevents the critical-path activation, therefore relaxing timing constraints and strongly improving circuit efficiency. In this approach, high-significance carry stages are monitored to cut the carry chain at lower-significance positions to guarantee a precision of floating-point type with a marginal overhead.

For a worst-case relative error of 2 %, the results for 32-bit adders show energy savings up to 44 % and PDAP reductions of up to 62 % compared to low-power conventional circuits. Besides, the proposed adder surpasses the state-of-the-art approximate adders, performing up to 30 % better than the ISA and 45 % better than the ETBA in terms of PDAP.

Thanks to the intrinsic floating-point precision which ensures that all errors remain below an upper bound, this approximate adder could help design low-power and highly efficient hardware accelerators with a more acceptable and predictable impact on their accuracy.

6. REFERENCES
[1] C. M. Kirsch and H. Payer. Incorrect Systems: It's not the Problem, It's the Solution. In Design Automation Conference (DAC), 2012 49th ACM/EDAC/IEEE, pages 913–917, June 2012.
[2] K. Palem and A. Lingamneni. Ten Years of Building Broken Chips: The Physics and Engineering of Inexact Computing. In ACM Transactions on Embedded Computing Systems, volume 12, pages 87:1–87:23, May 2013.
[3] S. Ghosh, S. Bhunia, and K. Roy. CRISTA: A New Paradigm for Low-power, Variation-tolerant, and Adaptive Circuit Synthesis Using Critical Path Isolation. In IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, volume 26, pages 1947–1956, Nov 2007.
[4] J. Schlachter, V. Camus, C. Enz, and K. Palem. Automatic Generation of Inexact Digital Circuits by Gate-Level Pruning. In Circuits and Systems (ISCAS), 2015 IEEE International Symposium on, pages 173–176, May 2015.
[5] D. Mohapatra, G. Karakonstantis, and K. Roy. Significance Driven Computation: A Voltage-scalable, Variation-aware, Quality-tuning Motion Estimator. In Low Power Electronics and Design (ISLPED), 2009 ACM/IEEE International Symposium on, pages 195–200, Aug 2009.
[6] T. Liu and S.-L. Lu. Performance Improvement with Circuit-level Speculation. In Microarchitecture (MICRO-33), 2000 33rd Annual IEEE/ACM International Symposium on, pages 348–355, Dec 2000.
[7] N. Zhu, W.-L. Goh, and K.-S. Yeo. An Enhanced Low-power High-speed Adder For Error-tolerant Application. In Integrated Circuits (ISIC), 12th IEEE International Symposium on, pages 69–72, Dec 2009.
[8] M. Weber, M. Putic, H. Zhang, J. Lach, and J. Huang. Balancing Adder for Error Tolerant Applications. In Circuits and Systems (ISCAS), 2013 IEEE International Symposium on, pages 3038–3041, May 2013.
[9] N. Zhu, W.-L. Goh, G. Wang, and K.-S. Yeo. Enhanced Low-power High-speed Adder for Error-tolerant Application. In SoC Design Conference (ISOCC), 2010 IEEE International, pages 323–327, Nov 2010.
[10] Y. Kim, Y. Zhang, and P. Li. An Energy Efficient Approximate Adder with Carry Skip for Error Resilient Neuromorphic VLSI Systems. In Computer-Aided Design (ICCAD), 2013 IEEE/ACM International Conference on, pages 130–137, Nov 2013.
[11] V. Camus, J. Schlachter, and C. Enz. Energy-efficient Inexact Speculative Adder with High Performance and Accuracy Control. In Circuits and Systems (ISCAS), 2015 IEEE International Symposium on, pages 45–48, May 2015.
[12] C. Liu, J. Han, and F. Lombardi. An Analytical Framework for Evaluating the Error Characteristics of Approximate Adders. In IEEE Transactions on Computers, volume 64, pages 1268–1281, May 2015.
[13] J. Miao, K. He, A. Gerstlauer, and M. Orshansky. Modeling and Synthesis of Quality-energy Optimal Approximate Adders. In Computer-Aided Design (ICCAD), 2012 IEEE/ACM International Conference on, Nov 2012.
