Review of Adders
(LSBs) (k < n). A speculative design makes an adder significantly faster than the conventional design. Segmented adders are proposed in [7,19,27]. An n-bit segmented adder is implemented by several smaller adders operating in parallel. Hence, the carry propagation chain is truncated into shorter segments. Segmentation is also utilized in [1,5,8,10,12,25], but their carry inputs for each sub-adder are selected differently. This type of adder is referred to as a carry select adder. Another method for reducing the critical path delay and power dissipation of a conventional adder is by approximating the full adder [2,17,20,24]; the approximate adder is usually applied to the LSBs of an accurate adder. In the sequel, the approximate adders are divided into four categories.

Figure 1: The almost correct adder (ACA), showing the carry propagation path of the sum bit.
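As a rough illustration of the segmentation idea described above, the following Python sketch models an n-bit segmented adder at the behavioural level. The function name `segmented_add`, the carry-in of 0 for every segment, and the choice to keep only the top segment's carry-out are assumptions made for this example; the designs in [7,19,27] differ in how segments overlap and how carries are predicted.

```python
def segmented_add(a: int, b: int, n: int = 16, k: int = 4) -> int:
    """Add two n-bit unsigned integers using independent k-bit segments.

    Assumption: every segment sees carry-in 0 and its carry-out is dropped,
    except for the top segment, whose carry-out becomes bit n of the result.
    """
    result = 0
    for low in range(0, n, k):
        width = min(k, n - low)              # last segment may be narrower
        seg_mask = (1 << width) - 1
        seg_sum = ((a >> low) & seg_mask) + ((b >> low) & seg_mask)
        if low + width < n:
            seg_sum &= seg_mask              # truncate the carry chain here
        result |= seg_sum << low
    return result


if __name__ == "__main__":
    # The carry out of the lowest segment is lost, so 15 + 1 yields 0.
    print(segmented_add(0x000F, 0x0001))
```

With n = 16 and k = 4, adding 0x000F and 0x0001 returns 0 instead of 16, because the carry generated in the lowest segment is never passed on. Operands that generate no carry across a segment boundary are added exactly.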
in a smaller overall error variance.

2.3 Carry Select Adders
In the carry select adders, several signals are commonly used: generate $g_j = a_j b_j$, propagate $p_j = a_j \oplus b_j$, and $P^i = \prod_{j=0}^{k-1} p_j$. $P^i = 1$ means that all k propagate signals in the ith block are true.

2.3.1 The Speculative Carry Select Adder (SCSA)
The SCSA is proposed in [1]. An n-bit SCSA consists of $m = \lceil n/k \rceil$ sub-adders (window adders). Each sub-adder is made of two k-bit adders: adder0 and adder1, as shown in Fig. 4. Adder0 has carry-in “0” while the carry-in of adder1 is “1”; then the carry-out of adder0 is connected to a multiplexer to select the addition result as a part of the final result. Thus, the critical path delay of SCSA is $t_{adder} + t_{mux}$, where $t_{adder}$ is the delay of the sub-adder (O(log(k))), and $t_{mux}$ is the delay of the multiplexer. SCSA and ETAII achieve the same accuracy for the same parameter k, because the same function is used to predict the carry for every sum bit. Compared with ETAII, SCSA uses an additional adder and multiplexer in each block and thus, the circuit of SCSA is more complex than that of ETAII.

2.3.2 The Carry Skip Adder (CSA)
Similar to SCSA, an n-bit carry skip adder (CSA) [8] is divided into $\lceil n/k \rceil$ blocks, but each block consists of a sub-carry generator and a sub-adder. The carry-in of the (i + 1)th sub-adder is determined by the propagate signals of the ith block: the carry-in is the carry-out of the (i − 1)th sub-carry generator when all the propagate signals are true ($P^i = 1$), otherwise it is the carry-out of the ith sub-carry generator. Therefore, the critical path delay of CSA is O(log(2k)). This carry select scheme enhances the carry prediction accuracy.

2.3.3 The Gracefully-Degrading Accuracy-Configurable Adder (GDA)
An accuracy-configurable adder, referred to as the gracefully-degrading accuracy-configurable adder (GDA), is presented in [25]. Control signals are used to configure the accuracy of GDA by selecting the accurate or approximate carry-in using a multiplexer for each sub-adder. The delay of GDA is determined by the carry propagation and thus by the control signals to the multiplexers.

2.3.4 The Carry Speculative Adder
Different from SCSA, the carry speculative adder (CSPA) in [12] contains one sum generator, two internal carry generators (one with carry-0 and one with carry-1) and one carry predictor in each block. The output of the ith carry predictor is used to select carry signals for the (i + 1)th sum generator. Only l input bits (rather than k, l < k) in a block are used in a carry predictor. Therefore, the hardware overhead is reduced compared to SCSA.

global speculative carry (referred to as a consistent carry). In CCA, the carry prediction depends not only on its LSBs, but also on the higher bits. The critical path delay and area complexity of CCA are similar to SCSA.

2.3.6 The Generate Signals Exploited Carry Speculation Adder (GCSA)
In [5], the generate signals are used for carry speculation. GCSA has a structure similar to that of CSA. The only difference between them is the carry selection; the carry-in for the (i + 1)th sub-adder is selected by its own propagate signals rather than those of its previous block. The carry-in is the most significant generate signal $g^i_{k-1}$ of the ith block if $P^i = 1$, or else it is the carry-out of the ith sub-carry generator. The critical path delay of GCSA is O(log(2k)) due to the carry propagation. This carry selection scheme effectively controls the maximal relative error.

2.4 Approximate Full Adders

2.4.1 The Lower-Part-OR Adder (LOA)
LOA [17] divides an n-bit adder into an (n − l)-bit more significant sub-adder and an l-bit less significant sub-adder. For the less significant sub-adder, its inputs are simply processed by using OR gates (as a simple approximate full adder). The more significant (n − l)-bit sub-adder is an accurate adder. An extra AND gate is used to generate the carry-in signal for the more significant sub-adder by AND-ing the most significant input bits of the less significant sub-adder. The critical path of LOA is from the AND gate to the most significant sum bit of the accurate adder, i.e., approximately O(log(n − l)). LOA has been utilized in a recently-proposed approximate floating-point adder [14].

2.4.2 Approximate Mirror Adders (AMAs)
In [2], five AMAs are proposed by reducing the number of transistors and the internal node capacitance of the mirror adder (MA). The AMA adder cells are then used in the LSBs of a multiple-bit adder. However, the critical paths of AMA1-4 are longer than that of LOA because the carry propagates through every bit. As for AMA5, the carry-out is one of the inputs; thus, no carry propagation exists in the LSBs of an approximate multiple-bit adder.

2.4.3 Approximate Full Adders using Pass Transistors
Three approximate adders (AXAs) based on XOR/XNOR gates and multiplexers (implemented by pass transistors) have been presented in [24]. Several approximate complementary pass transistor logic (CPL) adders have been proposed by reducing the number of transistors in the accurate CPL adder [20]. Significant area and power savings have been obtained for both types of approximate designs.

3. COMPARATIVE EVALUATION
In the evaluation, SCSA is selected as a typical carry select adder and LOA is considered as a representative design using approximate full adders. All adders and sub-adders are implemented as CLA in this paper.

3.1 Error Characteristics
To evaluate the accuracy of approximate adders, analytical and simulation-based approaches have been proposed [6, 11, 13, 18, 22]. In this paper, Monte Carlo simulation is performed. The error rate (ER, the probability of producing an incorrect result), the normalized mean error distance (NMED, the normalization of mean error distance (MED) by the maximum output of the accurate adder) and the mean relative error distance (MRED, the average value of all possible relative error distances (REDs)) are used to assess the error characteristics of the approximate designs. Error distance (ED) and RED are calculated as $ED = |M' - M|$ and $RED = ED / M$, where $M'$ is the approximate result and $M$ is the accurate result [11]. MED is the mean of all possible EDs.

The functions of the approximate adders (16-bit) are implemented in MATLAB and simulated with $10^8$ random input combinations. The error measures in ER, NMED and MRED are obtained. Fig. 5 shows the simulation results, where each adder's name is followed by the value of its parameter k. For ACA, ETAII and ESA, k is the size of the sub-adder, while k is the size of the less significant adder for LOA (as implemented by OR gates).

The NMED and MRED values of the approximate adders with data sorted by the MRED are shown in Fig. 5(a). The logarithms (base 10) of the NMED and MRED are plotted, and the vertical axis is labeled by negative numbers. In this figure, ETAII-k represents ETAII-k, SCSA-k and ACAA-k because they have the same carry propagation chain for each sum bit and hence, the same error characteristics (ER, NMED and MRED). The NMED and MRED show the same trend, so we only consider MRED in the comparison. Fig. 5(b) shows the comparison of ER and MRED of the approximate adders with data sorted by ER.

Among these approximate adders, ETAII-6 has the smallest MRED, while ESA-3 has the largest. LOA (shown in different patterns in Fig. 5(b)) has a structure different from the other approximate adders. Its higher part is totally accurate, while the approximate part is less significant. Therefore, the MRED of LOA is rather small but its ER is very large. The information used to predict each carry in ESA is rather limited, so its MRED and ER are the largest (excluding LOA). In general, a lower ER usually indicates a smaller MRED; a larger k normally means a lower ER and MRED for all the approximate adders (except for LOA). Compared with ETAII, SCSA and ACAA (represented by ETAII in Fig. 5), ACA gives slightly higher ER and MRED for the same k due to the shorter carry propagation chain (thus, less information is used for predicting the carry bits). In summary, ETAII-6, SCSA-6 and ACAA-6 are the most accurate adders among the compared designs since they have the smallest ER and MRED. ESA-3 is the least accurate design in terms of ER and MRED.

Figure 5: A comparison of error characteristics of approximate adders with data sorted on (a) MRED and (b) ER.

3.2 Circuit Characteristics
To assess circuit characteristics, the considered 16-bit approximate adders and the accurate CLA are implemented in VHDL and synthesized using the Synopsys Design Compiler based on an STM 65-nm process; delay, area and power are then obtained. Among ETAII, SCSA and ACAA (with the same error characteristics when the same parameter k is selected), SCSA incurs the largest power dissipation because two sub-adders and one multiplexer are utilized in each block, and ACAA is the slowest because its critical path (2k) is twice that of the other two adders. The block of ETAII (a carry generator and a sum generator) is significantly simpler than those of SCSA and ACAA. Therefore, ETAII has a shorter delay and consumes less power and area than SCSA and ACAA, so ETAII is selected for the circuit comparisons.

A circuit with larger area is likely to consume more power, so only power and delay are considered in the comparison. Fig. 6 shows the delay and power of the approximate adders with ascending delay (Fig. 6(a)) and power (Fig. 6(b)) from left to right. Obviously, the accurate CLA has the longest delay among all adders, but not the highest power dissipation. LOA (shown in different patterns) is the slowest of the approximate adders, but it is very power efficient compared with the others. With the same k, ESA is the fastest (when k is 3 or 4) and most power efficient due to its simple segment structure. ETAII is the slowest (except for k = 3) excluding LOA, and ACA incurs the largest power consumption due to its complex speculation circuit. Among all the approximate adders shown in Fig. 6, ESA-3 is the fastest, while LOA-8 is the slowest. The delays of all ACAs are very close (less than 400 ps), and all the LOAs have a delay larger than 600 ps. For ESA and ETAII, a smaller k leads to a smaller delay and power dissipation, while a larger k shows significantly larger values of these metrics.

Since a smaller delay does not always imply lower power dissipation, the power-delay-product (PDP) is used as a joint metric to evaluate the circuit characteristics of the approximate adders. Fig. 7 shows in ascending order the PDPs of the approximate adders from left to right. ESA-3 has the smallest PDP, while the accurate CLA has the largest value. LOA-10 and LOA-9 have moderate PDPs (due to large delay and low power dissipation). In terms of PDP, the approximate adders can be classified into three classes: ESA-3, ESA-4 and ETAII-3 have the smallest PDP with less than 10 fJ, then ESA-5, LOA-10, ETAII-4 and ACA-3 with around 20 fJ, and the PDPs of the other approximate adders are larger than 20 fJ and less than 45 fJ.

Figure 6: A comparison of delay and power of approximate adders with data sorted on (a) delay and (b) power.

4. DISCUSSION AND CONCLUSION
In this paper, current approximate adders are reviewed; their error and circuit characteristics are evaluated. Fig. 8 shows the MRED and PDP of the approximate adders in a two-dimensional (2-D) plot. ESA-3 and ESA-4 have rather small PDP but a considerably large MRED; LOA-8, LOA-7 and ETAII-6 are the opposite, i.e., they have a small MRED but a large PDP. These approximate adders do not show the best tradeoff, but they can be used for special applications where either hardware efficiency or high accuracy is required. The best tradeoff is met by ETAII-3 with a small PDP and MRED. ESA-5, ACA-3 and ACA-4 show moderate MRED and PDP.

Overall, ESA is the most hardware-efficient design but it is also the least accurate. ETAII, SCSA and ACAA have the same accuracy when their parameters are the same, whereas ETAII shows the smallest PDP among them. ACA is the most power consuming design with a moderate accuracy, while LOA is the slowest but it is highly power efficient among all approximate adders.

5. REFERENCES
[1] K. Du, P. Varman, and K. Mohanram. High performance reliable variable latency carry select addition. In DATE, pages 1257–1262, 2012.
[2] V. Gupta, D. Mohapatra, A. Raghunathan, and K. Roy. Low-power digital signal processing using approximate adders. IEEE Trans. CAD, 32(1):124–137, 2013.
[3] J. Han and M. Orshansky. Approximate Computing: An Emerging Paradigm For Energy-Efficient Design. In ETS, pages 1–6, Avignon, France, 2013.
[4] R. Hegde and N. Shanbhag. Soft digital signal processing. IEEE Trans. VLSI Syst., 9(6):813–823, 2001.
[5] J. Hu and W. Qian. A new approximate adder with low relative error and correct sign calculation. In DATE, in press, 2015.
[6] J. Huang, J. Lach, and G. Robins. A methodology for energy-quality tradeoff using imprecise hardware. In DAC, pages 504–509, 2012.
[7] A. B. Kahng and S. Kang. Accuracy-configurable adder for approximate arithmetic designs. In DAC, pages 820–825, 2012.
[8] Y. Kim, Y. Zhang, and P. Li. An energy efficient approximate adder with carry skip for error resilient neuromorphic VLSI systems. In ICCAD, pages 130–137, 2013.
[9] I. Koren. Computer Arithmetic Algorithms, 2nd edition. A K Peters/CRC Press, Natick, 2001.
[10] L. Li and H. Zhou. On error modeling and analysis of approximate adders. In ICCAD, pages 511–518, 2014.
[11] J. Liang, J. Han, and F. Lombardi. New metrics for the reliability of approximate and probabilistic adders. IEEE Trans. Comput., 62(9):1760–1771, 2013.
[12] I. Lin, Y. Yang, and C. Lin. High-performance low-power carry speculative addition with variable latency. IEEE Trans. VLSI Syst., in press, 2014.
[13] C. Liu, J. Han, and F. Lombardi. An analytical framework for evaluating the error characteristics of approximate adders. IEEE Trans. Comput., 2014.
[14] W. Liu, L. Chen, C. Wang, M. O'Neill, and F. Lombardi. Inexact floating-point adder for dynamic image processing. In IEEE-NANO, pages 239–243, 2014.
[15] Y. Liu, T. Zhang, and K. Parhi. Computation error analysis in digital signal processing systems with overscaled supply voltage. IEEE Trans. VLSI Syst., 18(4):517–526, 2010.
[16] S.-L. Lu. Speeding up processing with approximation circuits. Computer, 37(3):67–73, 2004.
[17] H. R. Mahdiani, A. Ahmadi, S. M. Fakhraie, and C. Lucas. Bio-Inspired Imprecise Computational Blocks for Efficient VLSI Implementation of Soft-Computing Applications. IEEE Trans. Circuits Syst., 57(4):850–862, 2010.
[18] J. Miao, K. He, A. Gerstlauer, and M. Orshansky. Modeling and synthesis of quality-energy optimal approximate adders. In ICCAD, pages 728–735, 2012.
[19] D. Mohapatra, V. Chippa, A. Raghunathan, and K. Roy. Design of voltage-scalable meta-functions for approximate computing. In DATE, pages 1–6, 2011.
[20] D. Nanu, R. P. K., D. Sowkarthiga, and K. S. A. Ameen. Approximate adder design using CPL logic for image compression. International Journal of Innovative Research and Development, 3(4):362–370, 2014.
[21] B. Parhami. Computer Arithmetic: Algorithms and Hardware Designs, 2nd edition. Oxford University Press, New York, 2010.
[22] R. Venkatesan, A. Agarwal, K. Roy, and A. Raghunathan. Macaco: Modeling and analysis of circuits for approximate computing. In ICCAD, pages 667–673, 2010.
[23] A. K. Verma, P. Brisk, and P. Ienne. Variable latency speculative addition: A new paradigm for arithmetic circuit design. In DATE, pages 1250–1255, 2008.
[24] Z. Yang, A. Jain, J. Liang, J. Han, and F. Lombardi. Approximate XOR/XNOR-based adders for inexact computing. In IEEE-NANO, pages 690–693, 2013.
[25] R. Ye, T. Wang, F. Yuan, R. Kumar, and Q. Xu. On reconfiguration-oriented approximate adder design and its application. In ICCAD, pages 48–54, 2013.
[26] N. Zhu, W. L. Goh, G. Wang, and K. S. Yeo. Enhanced low-power high-speed adder for error-tolerant application. In SOCC, pages 323–327, 2010.
[27] N. Zhu, W. L. Goh, and K. S. Yeo. An enhanced low-power high-speed adder for error-tolerant application. In ISIC 2009, pages 69–72, 2009.
[28] N. Zhu, W. L. Goh, and K. S. Yeo. Ultra low-power high-speed flexible Probabilistic Adder for Error-Tolerant Applications. In SOCC, pages 393–396, 2011.
[29] N. Zhu, W. L. Goh, W. Zhang, K. S. Yeo, and Z. H. Kong. Design of Low-Power High-Speed Truncation-Error-Tolerant Adder and Its Application in Digital Signal Processing. IEEE Trans. VLSI Syst., 18(8):1225–1229, 2010.

Figure 7: Sorted PDP of the approximate adders.

Figure 8: MRED and PDP of the approximate adders.
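The error metrics of Section 3.1 and the LOA structure of Section 2.4.1 can be sketched in a few lines of Python. This is only an illustrative model, not the MATLAB code used for the reported results: the names `loa_add` and `error_metrics` are hypothetical, the default sample count is far smaller than $10^8$, and input pairs whose accurate sum is zero are skipped when averaging RED (an assumption, since RED is undefined there).

```python
import random


def loa_add(a: int, b: int, n: int = 16, l: int = 8) -> int:
    """Behavioural model of a lower-part-OR adder (assumes 0 < l < n)."""
    low_mask = (1 << l) - 1
    low = (a | b) & low_mask                        # OR gates replace the lower adder
    carry_in = (a >> (l - 1)) & (b >> (l - 1)) & 1  # AND of the lower part's MSBs
    high = (a >> l) + (b >> l) + carry_in           # accurate (n - l)-bit adder
    return (high << l) | low


def error_metrics(approx_add, n: int = 16, samples: int = 100_000, seed: int = 0):
    """Monte Carlo estimate of ER, MED, NMED and MRED for an n-bit adder."""
    rng = random.Random(seed)
    max_output = 2 * ((1 << n) - 1)   # largest sum of two n-bit operands
    wrong = 0
    ed_total = 0
    red_total = 0.0
    red_count = 0
    for _ in range(samples):
        a, b = rng.getrandbits(n), rng.getrandbits(n)
        exact = a + b
        ed = abs(approx_add(a, b) - exact)   # ED = |M' - M|
        wrong += ed != 0
        ed_total += ed
        if exact != 0:                       # RED = ED / M is undefined for M = 0
            red_total += ed / exact
            red_count += 1
    med = ed_total / samples
    return {
        "ER": wrong / samples,
        "MED": med,
        "NMED": med / max_output,
        "MRED": red_total / red_count if red_count else 0.0,
    }


if __name__ == "__main__":
    print(error_metrics(loa_add))
```

Passing an exact adder such as `lambda a, b: a + b` to `error_metrics` gives zero for every metric, which is a quick sanity check before comparing approximate designs. The estimates depend on the seed and sample count, so they are not expected to reproduce the values plotted in Fig. 5.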