Camus Dac16
Figure 1: General block diagram of the proposed Carry Cut-Back (CCB) approximate adder.
the circuit efficiency. With this approach, high-significance carry stages
are monitored to cut the carry propagation chain at lower-significance
positions, guaranteeing low relative errors of floating-point type. A brief
design methodology is presented together with results and a comparative
analysis for both high-performance and low-power implementations.

2. RELATED WORK
Adders are the most common arithmetic blocks used in DSPs, thus many
attempts have been made to build them in an approximate manner. At the
architectural level, an interesting way to build approximate adders is to
use carry speculation [6]. This technique exploits the fact that carry
propagate sequences in additions are typically short, making it possible to
estimate, more or less accurately, an intermediate carry using a limited
number of previous stages. Thus, the carry chain, which is the critical path
of the circuit, can be split into two or more shorter paths, relaxing the
constraints over the entire design and pushing energy, delay and area beyond
the limits imposed by traditional design.
A number of speculative adders have been proposed in the literature based on
the ETAII concept [7]. It consists of slicing the addition into regular
sub-adder blocks with input carries speculated in a carry-lookahead
approach. The ETBA [8], a direct descendant of the ETAII, uses an error
balancing technique based on multiplexers to mitigate the relative error in
case of incorrect carry speculation. The ETAIV [9] and CSA [10] have
improved the error rate and mean error by chaining several blocks for carry
speculation, but at the cost of increased complexity and energy. The ISA
[11] has generalized and optimized the architecture of speculative
compensated adders by shortening the speculation overhead and by introducing
a dual-direction compensation mechanism that improves both circuit
performance and accuracy. However, the multiplexers required for good error
compensation still represent a substantial area and energy overhead,
particularly for low-speed implementations.
With the progressive increase of accuracy and design flexibility,
speculative adders have repeatedly been proven to be usable in specific
applications [8, 10, 12, 13]. Despite all this progress, they have not yet
been adopted by designers, the main reason being that the occurrence and
impact of errors in real-life applications are not easy to anticipate today.
Error metrics are often based on a full-range number assumption and a
uniform distribution that fit neither typical use nor worst possible cases.
Moreover, applications and case studies remain singular examples. The goal
of this work is to propose a more efficient adder architecture that can
ensure that all errors remain below an upper bound, so that it can be
integrated without uncertainty in codecs and hardware accelerators.

3. CARRY CUT-BACK ADDER
3.1 Proposed Architecture
The structural diagram of the proposed adder is depicted in Fig. 1. The CCB
adder is based on a conventional fixed-point adder circuit (formed by the
chain of ADD blocks) with insertion of several multiplexers that can cut the
carry propagation chain to shorten the effective critical path. The decision
to cut the chain is taken by the carry propagate block (PROP), which
generates the cut signal. The cut occurs upstream in the carry chain, by
multiplexing the real carry with a carry speculated from a short chain in an
optional carry speculator block (SPEC). Taking place at a lower-significance
stage, the carry cut-back guarantees a low relative error. The cut-back
module appears functionally as a feedback between two carry-chain positions,
but it is not a recursive loop, as it uses the local carry propagate and
generate signals directly precomputed from the operand inputs. Hence, it
cannot influence the stability of the circuit.
The main advantage of this approach lies in its timing characteristics.
Typically, the carry propagation chain of an addition is naturally broken
because of short-size operands or by the distribution of the input bits.
Spanning the whole adder length, the critical path is only activated if all
the stages are in propagate mode. Although the adder within the CCB
architecture physically contains the entire carry chain through the ADD
blocks and multiplexers, this path can never be activated. By monitoring one
or more stages of the adder, the PROP quickly detects this risk and selects
a shorter path to be used instead, ensuring that the design meets tighter
timing constraints. The shorter path can either be a carry speculated from
the SPEC or a straight cut using a monotonic gate, such as the OR-cut of the
adder implemented in Fig. 2 (cut = 1 dictates the OR output regardless of
its second input).
Fig. 2 shows a case study of the longest propagation chains that can flow
through a CCB adder built with two cut-backs. Each cut-back module splits
the carry chain with two possibilities:
• cut = 0: No deliberate carry-cut in the typical case, i.e. the carry
  chain is naturally broken somewhere in the PROP. The critical path is
  limited since it cannot entirely cross over the PROP. Case 1 in Fig. 2
  shows two examples of such behavior.
• cut = 1: All the stages within the PROP are in propagate mode. The carry
  chain necessarily propagates through the PROP and there is a risk of long
  critical-path activation if the other non-monitored stages are also in
  propagate mode. Therefore, by intentionally cutting the carry chain, its
  maximum length still remains limited. Case 2 in Fig. 2 shows two examples
  of such behavior.
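The cut-back behavior described in Section 3.1 can be sketched as a bit-level model. The following Python sketch is not the authors' implementation: it assumes OR-cut modules only (no SPEC block), and the function names, the `(cut_pos, prop_lo, prop_hi)` tuple layout and the 8-bit example configuration are hypothetical choices made for illustration.

```python
def ccb_add_or_cut(a, b, width, modules):
    """Behavioral model of a CCB adder with OR-cuts (illustrative sketch).

    modules: list of (cut_pos, prop_lo, prop_hi) tuples (hypothetical layout).
    Each PROP block monitors bits prop_lo..prop_hi; when all of them are in
    propagate mode, the carry entering bit cut_pos is forced to 1.
    Returns the (width + 1)-bit result, carry-out included.
    """
    p = a ^ b  # per-bit propagate signals
    g = a & b  # per-bit generate signals

    # Cut signals are computed directly from the operands (no feedback loop).
    cut_at = {}
    for cut_pos, prop_lo, prop_hi in modules:
        mask = ((1 << (prop_hi - prop_lo + 1)) - 1) << prop_lo
        all_propagate = (p & mask) == mask
        cut_at[cut_pos] = cut_at.get(cut_pos, False) or all_propagate

    carry = 0
    result = 0
    for i in range(width):
        if cut_at.get(i, False):
            carry = 1  # OR-cut: (real carry) OR cut, with cut = 1
        p_i = (p >> i) & 1
        g_i = (g >> i) & 1
        result |= (p_i ^ carry) << i   # sum bit: S_i = P_i xor C_i
        carry = g_i | (p_i & carry)    # carry into the next stage
    return result | (carry << width)


def exhaustive_error_stats(width, modules):
    """Return the set of observed errors and the largest relative error."""
    errors = set()
    worst_relative = 0.0
    for a in range(1 << width):
        for b in range(1 << width):
            exact = a + b
            error = ccb_add_or_cut(a, b, width, modules) - exact
            errors.add(error)
            if error and exact:
                worst_relative = max(worst_relative, abs(error) / exact)
    return errors, worst_relative
```

With the assumed configuration `modules = [(2, 5, 5)]` on an 8-bit adder, `ccb_add_or_cut(32, 0, 8, modules)` returns 36 instead of 32: bit 5 is in propagate mode, so the carry entering bit 2 is forced to 1 although the real carry is 0. An exhaustive sweep with `exhaustive_error_stats(8, modules)` finds only the errors 0 and 4, with a largest relative error of 12.5 %.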
[Figure 2 diagram: carry chain from MSB to LSB through ADDlast, ADD2, ADD1,
ADD3, ADD2, ADD1 and ADDfirst, with two PROP blocks generating cut2 and
cut1. Longest chains are shown for Case 1 (cut2 = 0, cut1 = 0), Case 2
(cut2 = 1, cut1 = 1), Case 3 (cut2 = 1, cut1 = 0) and Case 4 (cut2 = 0,
cut1 = 1).]
Figure 2: Diagram of the longest carry chains and resulting effective
critical paths in the example of an implementation of CCB adder with two
OR-cuts.

Cases 3 and 4 both contain one naturally broken chain (cut = 0) and one
intentional cut (cut = 1).
Despite the fact that the full carry chain physically exists in the design,
no input combination can activate it from start to end. It is a false path
and can therefore be excluded from the timing analysis. The effective
critical paths in Fig. 2 summarize the longest propagate chains that can
occur in the circuit among the different cases. Insertion of more carry
cut-back modules, possibly overlapping each other, would lead to shorter
effective critical paths.

3.2 Arithmetic and Errors
The CCB addition arithmetic is illustrated in Fig. 3. Errors only occur with
the concurrence of three factors:
• Sequence of propagate signals spanning the entire PROP bit-width,
  triggering the cut.
• Sequence of propagate signals spanning the entire SPEC bit-width, making
  the exact carry prediction impossible with the SPEC bits only.
• Wrong guess of the carry that inputs the SPEC (Fig. 3a) or that directly
  substitutes for the real carry (Fig. 3b). For a binary guess, this occurs
  with a probability of 50 %.
An error occurs in the right-hand path of Fig. 3a because of the
simultaneous occurrence of the three aforementioned conditions. In the
OR-cut implementation of Fig. 3b (without SPEC), the cut signal is also the
guessed carry. The first condition of error occurrence is met for the two
right-hand paths. The guess happens to match the real carry and leads to a
correct sum in the central path, but it is wrong and leads to a faulty sum
in the right-hand path.
Occurrence of an error implies that one or both operands have non-zero bits
at the PROP position. As the error occurs at the carry cut, at a
lower-significance position, the expected sum is necessarily much larger
than the introduced error. In the computation of Fig. 3a, the absolute error
is 16 while the expected sum is 43,265, so the relative error is 0.04 %. In
the example of Fig. 3b, the relative error is only 0.006 %. Such low
relative errors are typical in speculative adders for calculations involving
large-value operands. However, it is the worst case that gives the
upper-bound relative error and defines the minimum floating-point precision
of the adder.

[Figure 3 panels: (a) PROP and SPEC with input guess; (b) PROP and OR-cut.
Each panel lists the operands, the inexact sum and the expected sum.]
Figure 3: Example of CCB addition arithmetic with (a) 2-bit PROP and SPEC,
(b) 1-bit PROP OR-cut.

3.3 Floating-point Precision
Error propagation
It is interesting to note from Fig. 3 that the error caused by the cut can
propagate over many bits, but seems to keep the magnitude of the carry
cut-back position, that is, of the first wrong bit. This statement, which
seems straightforward in the example, has to be demonstrated carefully.
Indeed, a successive series of erroneous sum bits can result in different
errors. Let $S_i$, $C_i$ and $P_i$ denote the sum, carry-in and propagate
signals of the $i$-th stage addition, respectively. The sum and carry
propagation are defined by:

\[ S_i = P_i \oplus C_i \tag{1} \]
\[ P_i = 1 \implies C_{i+1} = C_i \tag{2} \]

Assume a carry error at the $i$-th bit of the adder, with an erroneous carry
of value $C_{err}$. The sum bit and the carry-out depend on the value of
$P_i$. If $P_i = 1$, (1) gives $S_i = \overline{C_{err}}$ and (2) propagates
the wrong carry $C_{err}$ to the next stage, where the same formulae apply
again. If $P_i = 0$, (1) gives $S_i = C_{err}$ and the wrong carry is not
propagated, so the next stage addition is correct. Assuming that the
erroneous sum spreads from the $m$-th to the $p$-th stage, the error pattern
appears as shown in Fig. 4:

Erroneous sum bits, from stage $p$ down to stage $m$:
$C_{err}$ (stage $p$), $\overline{C_{err}}$ (stage $p-1$),
$\overline{C_{err}}$ (stage $p-2$), ..., $\overline{C_{err}}$ (stage $m$)
Figure 4: Balanced error pattern

Just as in Fig. 3, where the error patterns are shown in red, the last
faulty bit counterbalances the first ones and the absolute error value is
reduced to:

\[ 2^p - 2^{p-1} - 2^{p-2} - \cdots - 2^m = 2^m . \tag{3} \]

This result is valid if the carry propagates normally. But there can be more
than one cut-back module, and if all the stages between two cut-backs
propagate, this could disrupt the normal propagation driven by (2). Thus,
the previous result needs to be recomputed for that case. Assume the same
carry error ($C_i = C_{err}$) in a propagating stage ($P_i = 1$, else there
would be no carry-chain perturbation). If another cut-back happens to guess
the same faulty carry $C_{err}$, then it transparently follows (2) and the
previous result holds. But if the carry-cut is in the opposite direction
$\overline{C_{err}}$, as it runs
against (2), it reverses the error: the carry, which was false until now,
returns to the value of the expected addition, so the next stage is correct.
But the current sum, determined by (1), is $S_i = \overline{C_{err}}$. The
error pattern appears this time as:

Erroneous sum bits, from stage $p$ down to stage $m$:
$\overline{C_{err}}$ (stage $p$), $\overline{C_{err}}$ (stage $p-1$),
$\overline{C_{err}}$ (stage $p-2$), ..., $\overline{C_{err}}$ (stage $m$)
Figure 5: Unbalanced error pattern

All the erroneous bits are in the same direction and the absolute error is
simply their sum:

\[ 2^p + 2^{p-1} + 2^{p-2} + \cdots + 2^m = 2^{p+1} - 2^m . \tag{4} \]

This error is of much higher magnitude than in the first case, but can only
occur if several carry-cuts happen in opposite directions. To avoid such
dramatic errors, the SPEC guess or the straight carry-cut must be chosen in
the same direction for all the CCB modules of the adder.

Worst-case relative error
Having validated the fact that any error has the magnitude of the cut-back
bit that caused it, the low impact of the error on the expected sum should
be demonstrated.
The worst case happens when the largest error magnitude coincides with the
smallest expected sum. Occurrence of an error implies that the three factors
mentioned in Section 3.2 are met, which means that the PROP and SPEC
intercept propagate signals only. All the non-zero operand bits producing
those propagates add up to the expected sum:
• Standing at higher-significance positions than the carry error, the PROP
  non-zero bits significantly contribute to maximizing the expected result
  and thus to minimizing the worst-case relative error.
• Positioned directly before the carry-cut, the SPEC non-zero bits
  contribute to a lesser extent to increasing the sum, attenuating a portion
  of the magnitude of the error. However, they participate equally with the
  PROP bits in reducing the rate of errors.
• When the SPEC guess or the straight carry-cut is 0, i.e. speculating a low
  carry, an error replaces a real carry at state 1 coming from a generate
  stage. Added to the SPEC bits, this stage further increases the sum to
  $2^m$.
Whenever a carry-cut error occurs, while it keeps the magnitude of the cut
bit significance, i.e. an arithmetic error of value $2^m$, the sum is always
expected to be greater than:

\[
2^m + \sum_{k \in PROP} 2^k
\quad \text{and} \quad
\sum_{k \in SPEC} 2^k + \sum_{k \in PROP} 2^k ,
\tag{5}
\]

leading to a relative error lower than:

\[
\frac{2^m}{2^m + \sum_{k \in PROP} 2^k}
\quad \text{and} \quad
\frac{2^m}{\sum_{k \in SPEC} 2^k + \sum_{k \in PROP} 2^k} ,
\tag{6}
\]

in the cases where the carry guess is at 0 and 1, respectively. This result
holds if multiple errors occur in different carry-cut modules, as the ratio
of error over sum is preserved.
A floating-point precision is thus configurable at design time by sizing and
positioning PROP and SPEC and selecting the carry guess. It is easy to
verify that the worst-case relative error is 7.7 % for the example in
Fig. 3a and 12.5 % in Fig. 3b, corresponding to precisions between 4 and
5 bits.

3.4 Design and Implementation Strategy
The CCB technique allows considerable simultaneous improvements in circuit
performance and error control. This section describes how to exploit its
architectural advantages.

Design implementation
Both PROP and SPEC can be implemented in a carry-lookahead approach and
should have very short bit-widths to limit overheads. Their areas can
fortunately be balanced, as the adder segments that they overlay can be cut
down to simple sum generators. Moreover, the delay overhead is limited to
that of the slower of PROP and SPEC, since they operate in parallel.
The CCB adder physically contains the entire adder carry chain, but the
cut-back scheme prevents its activation and splits the critical path into
multiple shorter paths. However, the long-established Static Timing Analysis
(STA) used in synthesis tools cannot easily identify those timing
exceptions. It is thus necessary to provide the tools with additional timing
constraints to manually exclude from the timing analysis all the false paths
generated by the CCB modules. This additional information prevents the
synthesis tools from unnecessarily trying to meet delay constraints on them.

Delay optimization
Assuming identical cut-back modules inserted in the adder and with the
notation of Fig. 2, the effective critical-path delays are:

\[
\begin{cases}
t_{first} = t_{ADD1} + t_{ADD2} + t_{ADDfirst} \\
t_{mid} = 2\,(t_{ADD1} + t_{ADD2}) + t_{ADD3} + t_{mux}
          + \max(t_{SPEC}, t_{PROP}) \\
t_{last} = t_{ADD1} + t_{ADD2} + t_{ADDlast} + t_{mux}
          + \max(t_{SPEC}, t_{PROP})
\end{cases}
\tag{7}
\]

where $t_{first}$ and $t_{last}$ are the two boundary-path delays, and
$t_{mid}$ is the delay of all the intermediate paths. The multiplexer delay
$t_{mux}$ can optionally be replaced by the delay of the straight-cut gate.
In order to optimize the timing budget, it is possible to equalize those
paths by sizing ADDfirst and ADDlast, and to equalize the SPEC and PROP
bit-widths. Note that the PROP and ADD2 blocks are proportionate as they
cover the same adder segment, the first as carry-lookahead and the latter as
sum generator.

Error optimization
The CCB adder makes it possible to dissociate the precision from the dynamic
range of the adder, which is fixed by the total adder bit-width. It offers a
large design space to minimize the application quality loss and maximize the
savings by trading off mean, maximum and rate of errors, configurable by
choosing positions and bit-widths of the CCB modules. The error rate depends
on the number of cut-back modules and on the PROP and SPEC bit-widths. The
maximum error can be adjusted mainly by sizing the PROP bit-width and
positioning the carry-cut (i.e. sizing ADD1), and to a lesser extent by
modifying the SPEC bit-width and input guess. Optimum trade-offs to adjust
Signal-to-Noise Ratio (SNR), Root Mean Square (RMS) error or any other
accuracy metric can be achieved using the same models as those built for
speculative adders [12, 13].
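The bounds of Eq. (6) and the path delays of Eq. (7) are simple enough to evaluate numerically when exploring the design space. The sketch below is a minimal illustration, not the authors' design flow: the function and parameter names are hypothetical, delays are in arbitrary units, and the bit positions used for the Fig. 3 examples (cut at bit 4, PROP at bits 6 and 7 for Fig. 3a, PROP at bit 7 for Fig. 3b) are assumptions chosen because they reproduce the 7.7 % and 12.5 % figures quoted in Section 3.3, not values read from the figure.

```python
def worst_case_relative_error(m, prop_bits, spec_bits=(), guess=0):
    """Upper bound of Eq. (6) for a carry-cut at bit m (illustrative sketch).

    prop_bits / spec_bits: bit positions covered by PROP and SPEC.
    guess: value of the SPEC input guess or of the straight carry-cut.
    """
    prop_sum = sum(2 ** k for k in prop_bits)
    if guess == 0:
        # A guess at 0 replaces a real carry at 1: the sum exceeds 2^m + PROP.
        smallest_sum = 2 ** m + prop_sum
    else:
        # A guess at 1: only the SPEC and PROP bits are guaranteed.
        smallest_sum = sum(2 ** k for k in spec_bits) + prop_sum
    return 2 ** m / smallest_sum


def effective_critical_paths(t_add1, t_add2, t_add3, t_add_first,
                             t_add_last, t_mux, t_prop, t_spec=0):
    """Effective critical-path delays of Eq. (7), identical cut-back modules."""
    overhead = t_mux + max(t_spec, t_prop)  # PROP and SPEC run in parallel
    return {
        "first": t_add1 + t_add2 + t_add_first,
        "mid": 2 * (t_add1 + t_add2) + t_add3 + overhead,
        "last": t_add1 + t_add2 + t_add_last + overhead,
    }


# Assumed positions for the Fig. 3a example: 16 / (16 + 64 + 128) = 1/13.
bound_3a = worst_case_relative_error(4, prop_bits=(6, 7), spec_bits=(2, 3), guess=0)
# Assumed positions for the Fig. 3b example (OR-cut, guess at 1): 16 / 128.
bound_3b = worst_case_relative_error(4, prop_bits=(7,), guess=1)
```

Under these assumptions, `bound_3a` is 1/13 (about 7.7 %) and `bound_3b` is 0.125 (12.5 %). Calling `effective_critical_paths` with candidate block delays gives a quick way to see which of the three paths dominates before resizing ADDfirst and ADDlast.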
[Figures 6 and 7 plots: normalized costs (PDAP, Energy) and relative errors
(REMAX and RERMS, in %) for the exact adder and various CCB adder
configurations on the horizontal axis; the configuration labels are not
recoverable.]
Figure 6: Relative errors and normalized costs of various 32-bit CCB adders
(on the horizontal axis) synthesized at 3.3 GHz in a 65 nm technology.
Figure 7: Relative errors and normalized costs of various 32-bit CCB adders
(on the horizontal axis) synthesized at 0.8 GHz in a 65 nm technology.