
Microprocessors and Microsystems 100 (2023) 104847

Contents lists available at ScienceDirect

Microprocessors and Microsystems

journal homepage: www.elsevier.com/locate/micpro

Hardware design and implementation of high-efficiency cube-root of complex numbers✰
Elias Rajaby *, Sayed Masoud Sayedi, Ehsan Yazdian
Department of Electrical and Computer Engineering, Isfahan University of Technology, Isfahan 84156-83111, Iran

ARTICLE INFO

Keywords: Complex cube root; Computational modeling; Computer architecture; Hardware implementation; Field programmable gate arrays; Digital signal processing

ABSTRACT

This study presents an algorithm for fast hardware execution of the complex cube root. In this algorithm, which is based on the Laurent series of the ∛z function, the numbers in the z-plane are first mapped to a pre-specified limited region by a rapid scaling and rotation operation, and then the terms of the series are computed. The parameters of the algorithm are thoroughly analyzed and selected to achieve high precision. The algorithm has been implemented on a field programmable gate array (FPGA)-based platform using the Simulink HDL Coder tool and Xilinx ISE 14.7. In addition, the resource usage and speed are carefully examined for the implementation of each step of the algorithm. The hardware was implemented in two versions, 56-bit and 32-bit (the latter for comparison). The 32-bit version occupies 140 slice registers, 421 slice LUTs, and 5 DSP48s. The hardware, which is capable of computing complex cube roots, has specifications comparable with those of previous FPGA implementations of the real cube root.

✰ Simulink and VHDL files of the designed hardware are available at https://disk.yandex.com/d/GicXvykrtTlpuA.
* Corresponding author.
E-mail address: [email protected] (E. Rajaby).

https://doi.org/10.1016/j.micpro.2023.104847
Received 26 January 2023; Received in revised form 29 March 2023; Accepted 26 April 2023
Available online 16 May 2023
0141-9331/© 2023 Elsevier B.V. All rights reserved.

1. Introduction

Cube root calculation, a fundamental operation in solving cubic and quartic equations, is a complicated operation used in some digital signal processing applications [1–6]. Different algorithms and implementations have been proposed for this calculation [7–21]. For example, FPGA-based hardware was presented for the cube root calculation of 32-bit floating point input numbers in the IEEE 754–2008 standard format [19]. This hardware separates the 8-bit exponent and the 23-bit mantissa of the input number, and the quotient and remainder of dividing the exponent by three are obtained without calculation from a read-only memory (ROM). The cube root of the mantissa is then calculated by applying the Newton-Raphson relationship for the cube root function, which also requires a Newton-Raphson approximation of the reciprocal function. After implementation on a Virtex 5 FPGA, all of the steps take a latency of 19 clock cycles.

In another work [20], based on the mathematical relationship for the third power of a two-digit number, $(pq)^3$, first 'p' and then 'q' is obtained. Mathematical expressions were then provided that relate a 32-bit real fixed-point number to the bits of its cube root. In that study the input number was extended to 33 bits by adding a zero bit in its most significant position and then divided into eleven 3-bit sections. These sections were used to calculate each bit of the cube root, from the most significant bit (MSB) to the least significant bit, by solving a conditional first-order equation. Implementation of this algorithm needs 1 multiplier, 5 adders, and several multiplexers and registers. After implementation on FPGA, the resulting hardware performs the computation in 13 clock cycles for 32-bit input numbers.

In [21] the cube root of a fixed-point 32-bit binary-coded decimal (BCD) real number was computed on FPGA based on the long division method. Like the previous work, its presentation of the mathematical theory begins with the third power of a two-digit number. Unlike the previous work, however, which focused on bit-by-bit recovery, the authors focused on obtaining mathematical relations for recovering the decimal digits of the cube root. The cube root is treated as a three-digit decimal number (Y1 Y2 Y3), whose digits are calculated from left to right. Y1 is obtained directly from the eight MSBs of the input by using some conditional relations. Y2 and Y3 are then obtained in order by solving two second-degree algebraic equations through a trial-and-error method. The coefficients of these equations are obtained from the previously discovered digits of the cube root and the remaining bits of the input number. The design was implemented on a Virtex 5 device, and the hardware needs at least 10 clock cycles to complete the computation.
Some applications require the cube root of complex numbers; for instance, executing the syndrome decoding algorithm of the Sparse Fast Fourier Transform (SFFT) requires solving cubic equations and, within them, calculating complex cube roots [5,6].
In this paper, a fast algorithm and its hardware implementation are presented for the cube root calculation of complex numbers. To improve the speed and FPGA area usage of the design, each mathematical relationship involved in the algorithm has been implemented through a low-level design approach that carefully manages its resources and timing.
The remaining sections of the paper are organized as follows. Section 2 describes the proposed algorithm and the related mathematical relationships. A detailed implementation of the relationships and the architecture of the designed hardware are discussed in Section 3. Section 4 presents the results of the FPGA implementation and compares them with those of some previous works. Finally, the main findings are provided in Section 5.

Fig. 1. Convergence zone C and subregion S.

2. The proposed algorithm

The algorithms used in previous works for computing the real cube root are generally not suitable for computing the complex cube root. For instance, Newton-Raphson-based methods [7,10] require a division or reciprocal operation in the complex domain, which consumes considerable computational resources. In addition, the Newton-Raphson algorithm requires carefully chosen initial values; otherwise, it may not converge at all or may converge only after many iterations. Several other algorithms [8,9,14] are not compatible with the complex domain or need major modifications to be used with complex values.
The proposed method in this work is a division-free algorithm that converges reliably and is based on the Laurent series expansion of the function $z^{1/3}$ in the complex domain. The function is holomorphic at point 'a', and its expansion is expressed as:

$$\tilde{z}_n^{1/3} = a^{1/3} + \frac{1}{3}\,\frac{z-a}{a^{2/3}} - \frac{1}{9}\,\frac{(z-a)^2}{a^{5/3}} + \dots + \binom{1/3}{n} a^{1/3-n}(z-a)^n, \qquad n \in [0,\infty) \quad (\text{converges when } |a-z| < |a|) \tag{1}$$

To avoid redundant computations, the relationship between each term of the series and the previous one is used instead of the explicit formula for each term:

$$r_t = r_{t-1}\cdot\frac{\left(\frac{1}{3} - t + 1\right)(z-a)}{t\cdot a}, \qquad r_0 = a^{1/3} \tag{2}$$

Further, to avoid long words in the hardware implementation, the input number is first scaled around a specific constant value of 'a', and after the series is computed, the scaling is compensated by a reverse operation. If 'a' is far from the input value 'z', the scaling operations take longer and the latency of the system increases. Furthermore, the value of 'a' in Eqs. (1) and (2) affects the complexity of the computation. With these considerations, 'a' is set to 1, and the input value 'z' is required to lie in the convergence zone C expressed by:

$$|z - 1| < 1 \tag{3}$$

A mapping operation moves the input values that are not inside the convergence zone into the zone. The evaluation of Eq. (3) in the complex domain is equivalent to the evaluation of Eq. (4) in the real domain:

$$\left((\operatorname{real}(z) - 1)^2 + \operatorname{imag}(z)^2\right)^{1/2} < 1 \tag{4}$$

Fig. 2. Mapping operation on the input number.

To keep the computation simple, a part of region C, namely S, is used as the target of the mapping operations. It is characterized by the following relationships:

$$\operatorname{real}(z) \ge \operatorname{imag}(z) \tag{5}$$

$$\operatorname{real}(z) \ge -\operatorname{imag}(z) \tag{6}$$

$$\frac{h}{8} < \|z\|_1 < h \tag{7}$$

where $\|z\|_1 = |\operatorname{real}(z)| + |\operatorname{imag}(z)|$, and h and h/8 are the two boundary values of region S located on the real axis. Region S, which resembles a square rhombus with a cut corner, is shown in Fig. 1. The region has two main properties. First, with a scaling factor of 8, each point outside the region is mapped into the region exactly once. Second, point zero (a divergence point) and all the points near zero, where the series converges slowly, are not located in the region.
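To make Eqs. (1), (2), and (5)–(7) concrete, here is a minimal floating-point sketch in Python; it is an illustration, not the authors' fixed-point hardware, and the function names `cbrt_series` and `in_region_s` are hypothetical. The recursion uses the ratio of consecutive binomial coefficients, $\binom{1/3}{t} / \binom{1/3}{t-1} = (1/3 - t + 1)/t$, which is where Eq. (2) comes from. For $|z - a| < |a|$ the binomial series converges to the principal cube root, which is what the comparison below relies on. The 1-norm in `in_region_s` is the sum of the absolute real and imaginary parts.

```python
def cbrt_series(z: complex, n: int = 40, a: complex = 1.0) -> complex:
    """Partial sum of Eq. (1) up to the (z - a)**n term, built with the ratio of Eq. (2)."""
    r = a ** (1.0 / 3.0)  # r_0 = a^(1/3)
    total = r
    for t in range(1, n + 1):
        # Eq. (2): r_t = r_{t-1} * (1/3 - t + 1) * (z - a) / (t * a)
        r = r * (1.0 / 3.0 - t + 1) * (z - a) / (t * a)
        total += r
    return total


def in_region_s(z: complex, h: float = 1.93) -> bool:
    """Eqs. (5)-(7): wedge around the positive real axis with h/8 < ||z||_1 < h."""
    norm1 = abs(z.real) + abs(z.imag)  # ||z||_1
    return z.real >= z.imag and z.real >= -z.imag and h / 8 < norm1 < h


if __name__ == "__main__":
    z = 0.5 + 0.3j
    print(in_region_s(z))                      # True
    print(abs(cbrt_series(z) - z ** (1 / 3)))  # small truncation error
```

With a = 1, the truncation error after n terms shrinks roughly like $|z - 1|^{n}$, which is why region S keeps mapped points away from the circle $|z - 1| = 1$ and away from zero.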


The mapping operation for a sample input number 'z' outside region S is illustrated in Fig. 2. As shown, scaling and rotation operations are performed for the mapping. These operations, and their reverse operations applied after computing the cube root of the mapped data, must be simple to implement in hardware. Hence, the two operations are performed by Eqs. (8) and (9), and their reverse operations by Eqs. (10) and (11), respectively:

$$z_1 = z\cdot 8^{m}, \qquad m \in \mathbb{Z} \tag{8}$$

$$z_2 = z_1\cdot e^{\frac{jk\pi}{2}}, \qquad k \in \{0, 1, 2, 3\} \tag{9}$$

$$\tilde{z}_1^{1/3} = \tilde{z}_2^{1/3}\cdot e^{-\frac{jk\pi}{6}} \tag{10}$$

$$\tilde{z}^{1/3} = \tilde{z}_1^{1/3}\cdot 2^{-m} \tag{11}$$

The value of parameter 'm' in Eqs. (8) and (11) is determined by the constraint defined in (7). The value of 'h' in Eq. (7) can lie in the interval 0–2. This value affects the precision of the calculated result (the estimation error), so finding its optimal value is necessary. The error is estimated by the absolute difference between the exact root and the estimated value:

$$R_n(z) = \left|\tilde{z}_n^{1/3} - z^{1/3}\right| \tag{12}$$

To determine the average relative remainder (error) on region S for a specified value of 'h', the following formula must be computed:

$$AR_n = \frac{\iint_S \frac{R_n(z)}{|z|^{1/3}}\,ds}{\iint_S ds} \tag{13}$$

This formula cannot be computed in the usual way, with the uniform gridding of Eq. (14):

$$\frac{\iint_S \frac{R_n(z)}{|z|^{1/3}}\,dx\,dy}{\iint_S ds}, \qquad (z = x + iy) \tag{14}$$

The reason is that the input values are not uniformly scaled to their mapped values in region S; therefore, a geometrical grid is needed, in which the spacing of the grid lines used for the integration varies accordingly. In other words, the number of points considered for the integration should not change with the value of 'h'. Hence, to compute the average relative remainder of selected uniformly distributed points (here $2^{18}$ points) over the entire input range, the points are first mapped to region S, the mapped points are then placed on a geometrical grid for the integration, and finally the relative error of each point is computed. This process is expressed as follows:

$$AR_n = \frac{\int_{-2^{15}}^{2^{15}-1}\int_{-2^{15}}^{2^{15}-1} \frac{R_n(\operatorname{map}(x+iy))}{|\operatorname{map}(x+iy)|^{1/3}}\,dx\,dy}{\int_{-2^{15}}^{2^{15}-1}\int_{-2^{15}}^{2^{15}-1} dx\,dy} \tag{15}$$

Fig. 3 displays the average error for five different values of 'n' in Eq. (1) and for different values of 'h'. Based on these results, n = 40 and h = 1.93 are chosen for the implementation.

Fig. 3. Average error obtained by Eq. (15) for different values of n and h.

To fulfill condition (7), one way is to multiply or divide $\|z\|_1$ by eight repeatedly until the condition holds. However, this method is time-consuming, so the following method is employed instead. From Eq. (7), we have:

$$\frac{1}{8} \le \|z\|_1\cdot\frac{1}{h}\cdot 8^{m} < 1 \tag{16}$$

or

$$-1 \le \log_8\!\left(\|z\|_1\cdot\frac{1}{h}\right) + m < 0 \tag{17}$$

By defining $\|z\|_1'$ as:

$$\|z\|_1' = \|z\|_1\cdot\frac{1}{h} \tag{18}$$

(17) can be rewritten as:

$$-\log_8\!\left(\|z\|_1'\right) - 1 \le m < -\log_8\!\left(\|z\|_1'\right) \tag{19}$$

By using (19), the scaling factor 'm' can be calculated as:

$$m = -\left\lfloor \log_8\!\left(\|z\|_1'\right)\right\rfloor - 1 = -\left\lfloor \frac{\left\lfloor \log_2\!\left(\|z\|_1'\right)\right\rfloor}{3}\right\rfloor - 1 \tag{20}$$

The value of $\lfloor \log_2(\|z\|_1') \rfloor$ can be obtained directly from the binary representation of $\|z\|_1'$: it equals the most significant '1' position (MS1P()) in the binary representation of the number minus the fractional length, which is 40 here. Hence, 'm' can be expressed as:

$$m = -\left\lfloor \frac{\operatorname{MS1P}\!\left(\|z\|_1'\right) - 40}{3}\right\rfloor - 1 \tag{21}$$

For the rotation step, Eqs. (5) and (6) give the approximate angular position of input 'z' in the complex plane, and parameter 'k' in Eqs. (9) and (10) is determined from them. The implementation of Eq. (10), with its term $e^{-jk\pi/6}$, is more complicated than that of Eq. (9), where the terms $e^{j\pi/2}$, $e^{j2\pi/2}$, and $e^{j3\pi/2}$ are simply j, −1, and −j, respectively. To implement Eq. (10), considering the periodicity of the phase, parameter k' is defined as in Eq. (22), and the reverse rotation is applied by using Eq. (23):

$$\left(e^{\frac{jk'\pi}{2}}\right)^3 = \left(e^{-\frac{jk\pi}{6}}\right)^3 \;\Rightarrow\; e^{\frac{j3k'\pi}{2}} = e^{-\frac{jk\pi}{2}} \;\Rightarrow\; \frac{3k'\pi}{2} \equiv -\frac{k\pi}{2} \pmod{2\pi} \tag{22}$$

$$\tilde{z}_1^{1/3} = \tilde{z}_2^{1/3}\cdot e^{\frac{jk'\pi}{2}} \tag{23}$$

Because Eq. (22) equates only the cubes of the two rotation factors, Eq. (23) yields a cube root of $z_1$ that can differ from the result of Eq. (10) by a cube root of unity.

Table 1
Values of k and k' for different evaluations of (5) and (6).

| Evaluation of (5) | False | False | True | True |
|---|---|---|---|---|
| Evaluation of (6) | False | True | False | True |
| k | 2 | 3 | 1 | 0 |
| k' | 2 | 3 | 1 | 0 |

The values of k and k' determined by Eqs. (5) and (6) are listed in Table 1. After the scaling and rotation, the cube root is estimated by Eqs. (1) and (2), and the reverse mapping is then applied to the result of the series by Eqs. (10) and (11).

The steps of the proposed approach are summarized in Algorithm 1.

3. Implementation and architecture

The proposed design is a fixed-point architecture. After a range analysis in floating-point mode, a 56-bit word length with a 40-bit fraction length was chosen for the input of the architecture (input 'z' in Fig. 4). Counting both their real and imaginary parts, most of the signals during the calculations are 112-bit numbers.

The block diagram of the proposed hardware and the algorithm steps performed by each block are depicted in Fig. 4. For efficient resource usage, the implementation of each step of the algorithm was optimized through a low-level design approach. Each step of the algorithm (block of the architecture) and its corresponding circuit are explained below.

Fig. 4. Block diagram of the proposed hardware; details of step 2 are shown in Fig. 5, step 4 in Fig. 6, steps 5 and 7 in Fig. 7, and step 6 in Fig. 8.

Step 1: In the first step of the algorithm, an R_adder¹ circuit computes the sum of the absolute values of the imaginary and real parts.

¹ Prefixes R and C in the text denote real and complex calculation, respectively.

Step 2: To compute $\|z\|_1'$ in Eq. (18) without a multiplication, the value of 1/h is chosen such that only a few shift and addition operations are needed; in other words, the binary representation of 1/h contains only a few nonzero bits. As mentioned earlier, the optimal value of h in our design is around 1.93, so 1/h is nearly 0.5181. Thus, the value 0.5313 ($\frac{1}{2} + \frac{1}{32}$) is selected for 1/h, requiring two R_shifts and one R_addition to implement Eq. (18). To calculate parameter 'm', the most significant '1' position (MS1P) function in Eq. (21) is implemented by a priority circuit. It consists of 56 serially connected 6-bit 2-to-1 multiplexers (Fig. 5). Each bit of the 56-bit input data drives the select signal of one multiplexer, and the 6-bit data inputs of the multiplexers encode the position of the connected select bit in the data word. The output range of the MS1P function is [0, 55]. To avoid implementing division by 3 and floor rounding, two 54-point lookup tables, given in Eq. (24), provide the shift values directly; LUT2, whose entries are one-third of those of LUT1, serves the reverse operation. This unit is illustrated in Fig. 5.

$$\mathrm{LUT1} = \{-39, -39, -36, -36, -36, \ldots, 15\}, \qquad \mathrm{LUT2} = \{-13, -13, -12, -12, -12, \ldots, 5\} \tag{24}$$

Fig. 5. Circuit diagram of step 2 of the algorithm.

Step 3: A dynamic shifter with a rounding procedure implements Eq. (8). It receives the output of the previous step (the number of shifts) and performs the shift accordingly.

Step 4: The angular position of the number in the z-plane is detected by two comparators (Fig. 6) that evaluate Eqs. (5) and (6) on the real and imaginary components of 'z'.

Fig. 6. The angular position detector unit.

Step 5: To implement Eq. (9), that is, to rotate numbers in the z-plane by integer multiples of π/2, two multiplexers controlled by the outputs of the comparators in Fig. 6 are used instead of a complex multiplier. The circuit is shown in Fig. 7.

Fig. 7. The rotation unit.

Step 6: The recursive formula (2) is employed to compute the first 40 terms (8 for the 32-bit version) of the series in Eq. (1). To optimize the implementation, the relation is decomposed into the four terms given in Eqs. (25) to (28). Eq. (25) is a constant. Eq. (26) is the previously calculated term of the series, which is stored in a register. The values in Eq. (27) do not depend on 'z', so they are precomputed and stored as a real vector in a LUT. Finally, Eq. (28) is computed only once, at the beginning of the calculation after the mapping step.

Fig. 8. The unit for step 6 of the algorithm (a Laurent series generator).

$$r_0 = a^{1/3} = 1 \tag{25}$$

$$r_{t-1} \tag{26}$$

$$x_t = \frac{\frac{4}{3} - t}{t}, \quad t \in \{1, 2, \ldots, 40\} \;\Rightarrow\; \mathrm{LUT3} = \left\{\frac{1}{3}, -\frac{1}{3}, \ldots, -\frac{116}{120}\right\} \tag{27}$$

$$z - 1 \tag{28}$$

Using these four values, the series is computed as follows. Eq. (26) is multiplied by Eq. (27) using two R_multipliers, as expressed in Eq. (29). Then, Eq. (28) is multiplied by the result of Eq. (29) through a C_multiplier, as expressed in Eq. (30). To increase efficiency, as expressed in Eq. (31), the C_multiplier is built from 3 R_multipliers and 5 R_adders instead of the conventional 4 R_multipliers and 2 R_adders.
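The arithmetic of step 2 can be sketched in Python on integers that stand in for the 56-bit, 40-bit-fraction format. This is a hedged model, not the authors' VHDL: `int.bit_length()` replaces the priority circuit of Fig. 5, the shifts truncate, the LUTs of Eq. (24) are not modeled (the shift count is computed with Eq. (21) instead), and the names `to_fixed`, `scaled_norm`, `ms1p`, and `scale_exponent` are hypothetical.

```python
FRAC_BITS = 40  # fraction length of the assumed 56-bit input format


def to_fixed(x: float) -> int:
    """Quantize a non-negative real value to the assumed fixed-point format."""
    return round(x * (1 << FRAC_BITS))


def scaled_norm(norm1_fx: int) -> int:
    """Eq. (18) with 1/h ~ 1/2 + 1/32 = 0.53125: two right shifts and one addition."""
    return (norm1_fx >> 1) + (norm1_fx >> 5)


def ms1p(x_fx: int) -> int:
    """Most significant '1' position (0-based), standing in for the priority circuit."""
    if x_fx <= 0:
        raise ValueError("MS1P is undefined for non-positive inputs")
    return x_fx.bit_length() - 1


def scale_exponent(norm1_fx: int) -> int:
    """Eq. (21): m = -floor((MS1P(||z||'_1) - 40) / 3) - 1; Python's // floors."""
    return -((ms1p(scaled_norm(norm1_fx)) - FRAC_BITS) // 3) - 1


if __name__ == "__main__":
    for v in (0.0046892, 1.0, 7090.81):
        fx = to_fixed(v)
        m = scale_exponent(fx)
        # After scaling by 8**m, ||z||'_1 should satisfy condition (16): [1/8, 1).
        print(v, m, scaled_norm(fx) / (1 << FRAC_BITS) * 8.0 ** m)
```

For example, $\|z\|_1 \approx 7090.81$ gives m = −4 in this sketch, i.e. a division by $8^4 = 4096$ in step 3 and a multiplication by $2^4$ in step 8.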
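The datapath of step 6 can be modeled on real/imaginary pairs to show how Eqs. (25)–(28) combine: a precomputed LUT3, a real-by-complex product (two real multipliers), a complex product with three real multipliers as in Eq. (31), and an accumulator. This is a floating-point sketch under stated assumptions (the input is already mapped into region S; word lengths and rounding are not modeled), and the names `LUT3`, `cmul3`, and `series_generator` are illustrative.

```python
from fractions import Fraction
from typing import Tuple

N_TERMS = 40  # 40 terms in the 56-bit version (8 in the 32-bit core)

# Eq. (27): x_t = (4/3 - t) / t for a = 1, precomputed once (LUT3).
LUT3 = [float((Fraction(4, 3) - t) / t) for t in range(1, N_TERMS + 1)]


def cmul3(a: float, b: float, c: float, d: float) -> Tuple[float, float]:
    """Eq. (31): (a + bi)(c + di) using 3 real multiplications and 5 real additions."""
    q1 = d * (a - b)
    q2 = a * (c - d)
    q3 = b * (c + d)
    return q1 + q2, q1 + q3


def series_generator(zr: float, zi: float, n_terms: int = N_TERMS) -> Tuple[float, float]:
    """Step 6 for a mapped input z = zr + j*zi; returns the partial sum of Eq. (1)."""
    wr, wi = zr - 1.0, zi  # Eq. (28): z - 1, computed once
    rr, ri = 1.0, 0.0      # Eq. (25): r_0 = 1
    sr, si = rr, ri        # running sum s_0 = r_0
    for x in LUT3[:n_terms]:
        yr, yi = x * rr, x * ri         # Eq. (29): two real multipliers
        rr, ri = cmul3(yr, yi, wr, wi)  # Eq. (32): r_t = y * (z - 1)
        sr, si = sr + rr, si + ri       # Eq. (32): s_t = s_{t-1} + r_t
    return sr, si
```

Since $x_t \cdot (z - 1)$ equals the ratio in Eq. (2) for a = 1, the loop produces the same partial sums as the direct recursion; only the order of the multiplications follows the hardware.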

5

Table 2
Some sample complex cube root calculation results obtained by the proposed hardware.

| Number | Exact cube root | Cube root by the 56-bit hardware | Relative error |
|---|---|---|---|
| −3255.365024682178 + 7128.25509509333i | 2.826762700162849 − 19.660566429183490i | 2.82677118011569 − 19.6605760468843i | 6.45e-07 |
| 1.04995984450020 + 0.674768219654003i | 1.05721150890496 + 0.203760817022829i | 1.057211508972549 + 0.203760817164240i | 1.45e-10 |
| −0.004689200353823 + 0.00000000000000i | −0.167378471036465 + 0.000000000000i | −0.167378471639658 + 0.0000000000000i | 3.60e-09 |
| −1401.327899706519 − 3068.48008295311i | 2.13436837450103 + 14.8448581158965i | 2.13436837450342 + 14.8448581158943i | 2.16e-13 |
| −0.329488528351652 − 0.21174941676904i | −0.718429929994387 − 0.138466019596150i | −0.718429932013664 − 0.138466018477094i | 3.15e-09 |
| 821.614695155512 + 9026.624548114502i | −5.389113344235370 − 22.214227764551840i | −5.38911130328913 − 22.2142286939108i | 9.81e-08 |
| 1021.56803888605 − 2236.92198047282i | −1.92093153705092 + 13.3603723043068i | −1.92093153704819 + 13.3603723043047i | 2.55e-13 |
| 0.0159305391625821 − 0.00467762834432i | 0.253956056031554 − 0.0242498631143139i | 0.253956056037006 − 0.0242498631180101i | 2.58e-11 |
| −5481.34114634651 + 1609.46697740049i | −17.7955204147214 + 1.69926616794021i | −17.7955204147214 + 1.69926616794021i | 2.04e-16 |

$$y = x_t\cdot\operatorname{Re}(r_{t-1}) + x_t\cdot\operatorname{Im}(r_{t-1})\,i \tag{29}$$

$$(a + bi)\cdot(c + di) = (a\cdot c - b\cdot d) + (a\cdot d + b\cdot c)\,i \tag{30}$$

$$(a + bi)\cdot(c + di) = (q_1 + q_2) + (q_1 + q_3)\,i, \qquad q_1 = d\cdot(a - b),\; q_2 = a\cdot(c - d),\; q_3 = b\cdot(c + d) \tag{31}$$

A counter unit controls the iterations until 40 terms of the series have been generated and summed, as expressed in Eq. (32). The circuit diagram of step 6 is displayed in Fig. 8. The critical path, shown by a dashed line, includes two multiplexers, an adder, and two multipliers.

$$r_t = y\cdot(z - 1), \qquad s_t = s_{t-1} + r_t \tag{32}$$

Step 7: The reverse rotation of the series output is implemented in the same way as the rotation of step 5, with multiplexers that exchange and negate the real and imaginary components.

Step 8: In the final step of the algorithm, the output of the reverse rotation step is scaled by a dynamic shifter. Considering Eqs. (8) and (11), the number of shifts is one-third of that applied in step 3, in the reverse direction. The shift values are obtained from a look-up table.

The design was implemented by using the Matlab HDL Coder toolbox.

Algorithm 1
The steps of the proposed complex cube root calculation.
Input: z
Output: $\tilde{z}_n^{1/3}$
1 Compute $\|z\|_1$.
2 Obtain $\|z\|_1'$ and 'm' by (18) and (21).
3 Rescale 'z' by (8).
4 Detect the position of the input 'z' in the complex plane by (5) and (6).
5 Rotate the scaled 'z' into region S by (9) according to Table 1, using simple arithmetic operations.
6 Compute forty terms of the Laurent series (1) in a loop using (2).
7 Rotate the series output back using (23), compensating for step 5.
8 Rescale the output by (11), compensating for step 3. The result is $\tilde{z}^{1/3}$.

4. Results

The functional verification of the design was performed by applying various complex numbers with different magnitudes and angles to the circuit and measuring the relative error. Some sample results are presented in Table 2; for these samples, the relative errors range from 2.04e-16 to 6.45e-07. The hardware was synthesized and implemented on an Artix 7 FPGA (XC7A100T) using Xilinx ISE 14.7. The resource usage is provided in Table 3. The maximum throughput of the hardware is 925,925 samples per second at a 37.037 MHz clock frequency, i.e., one result every 40 clock cycles. To compare our design with previous ones in terms of precision and hardware usage, a 32-bit version of the hardware was also implemented, containing only the cube root core without the scaling units at the front and back ends of the system. The previous works, which were 32-bit designs, did not scale numbers and only accepted input numbers in a specified range. They also computed the cube root with a specific precision.

Table 3
The resource utilization of the 56-bit cube root hardware after place and route (device utilization summary).

| Logic utilization | Used | Available | Utilization |
|---|---|---|---|
| Number of slice registers | 722 | 126,800 | 0.6% |
| Number of slice LUTs | 4346 | 63,400 | 6.9% |
| Number of DSP48E1s | 60 | 240 | 25% |
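The functional verification described above compares hardware outputs with exact roots. As a hedged illustration of how such reference values can be produced, here is a floating-point model of Algorithm 1 in Python. It is our sketch, not the authors' Simulink model: it ignores fixed-point word lengths, shifter rounding, and the LUTs of Eq. (24); it uses 1/h = 0.53125 from step 2, 40 terms, and Table 1 with k' = k; the treatment of z = 0 is an assumption; and all helper names are hypothetical. Because Eq. (23) only matches the cube of the rotation factor, the model returns one of the three cube roots, and with these choices it lands on the same branch as the Table 2 samples, e.g. the first row, whose root is not the principal one.

```python
import math

INV_H = 0.5 + 1 / 32  # 1/h as approximated in step 2 (0.53125)
N_TERMS = 40          # number of series terms in the 56-bit version


def _series(w: complex, n_terms: int = N_TERMS) -> complex:
    """Step 6: Eqs. (1)-(2) with a = 1, i.e. r_t = r_{t-1} * x_t * (w - 1)."""
    r = s = 1 + 0j
    for t in range(1, n_terms + 1):
        r *= (4 / 3 - t) / t * (w - 1)
        s += r
    return s


def _rot90(z: complex, k: int) -> complex:
    """Multiply by j**k using only swaps and negations (steps 5 and 7)."""
    for _ in range(k % 4):
        z = complex(-z.imag, z.real)
    return z


def _k_from_position(z: complex) -> int:
    """Step 4 and Table 1: k from the evaluations of Eqs. (5) and (6)."""
    table1 = {(False, False): 2, (False, True): 3, (True, False): 1, (True, True): 0}
    return table1[(z.real >= z.imag, z.real >= -z.imag)]


def cube_root_model(z: complex) -> complex:
    """Floating-point model of Algorithm 1 (steps 1-8); returns one cube root of z."""
    if z == 0:
        return 0j  # assumption: zero is special-cased; the source does not specify it
    norm1 = abs(z.real) + abs(z.imag)  # step 1: ||z||_1
    _, e = math.frexp(norm1 * INV_H)   # step 2: floor(log2 ||z||'_1) = e - 1
    m = -((e - 1) // 3) - 1            # Eq. (20)
    z1 = z * 8.0 ** m                  # step 3, Eq. (8)
    k = _k_from_position(z1)           # step 4
    root = _series(_rot90(z1, k))      # steps 5-6, Eqs. (9), (1), (2)
    root = _rot90(root, k)             # step 7, Eq. (23) with k' = k
    return root * 2.0 ** (-m)          # step 8, Eq. (11)
```

For instance, `cube_root_model(complex(-5481.34114634651, 1609.46697740049))` agrees with the last row of Table 2 to roughly machine precision, while for the first row the 40-term truncation leaves a relative error on the order of 1e-6, similar to the hardware figure.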

Table 4
Comparison of the present work with some similar works.

| Ref. | Year | Type | Device | Resource utilization | Computation time (ns) | Max error | Input range and precision |
|---|---|---|---|---|---|---|---|
| [11] | 2009 | Real | N/A | 12 Reg, 1966 LUT, 15 DSP48 | N/A | N/A | N/A |
| [19] | 2014 | Real | Virtex 5 | 439 Reg, 576 LUT, 12 DSP48 | 127.3 | 0.0001% | IEEE 754–2008 standard (single precision floating point format) |
| [20] | 2015 | Real | Virtex 5 | 119 Reg, 380 LUT, 0 DSP48 | 174 | 2% | [0, 2^32); 1 |
| [21] | 2016 | Real | Virtex 5 | 7 Reg, 2320 LUTs, 4 DSP48 | N/A | N/A | [0, 10^8]; 1 |
| Present work: 32-bit complex cube root core | 2022 | Complex | Virtex 5 | 140 Reg, 421 LUT, 5 DSP48 | 122.8 | 0.29% | Real and imaginary parts in [−2, 2); 2^−14 |


Additionally, the 32-bit version circuit computes eight terms of the Laurent series, which keeps the relative error below 0.29% in this study.

The timing and resource usage of the 32-bit version and of some similar works that compute the real cube root are presented in Table 4. The 32-bit version can work at a 65.14 MHz clock frequency and needs 8 clock cycles (about 122.8 ns) to complete the operation. Our design has comparatively reasonable resource usage, speed, and precision while also being able to compute complex roots. The hardware presented in [20] is superior to our work only in terms of resource utilization. The hardware reported in [19] is a suitable choice when the numbers are real, but at the cost of higher resource usage. For a fair comparison, our 32-bit version, like the other works, was implemented on a Virtex 5 FPGA using ISE 14.7.

5. Conclusion

The proposed hardware computes complex and real cube roots by detecting the approximate absolute value and angular position of the input, applying a mapping (shift and rotation) process, and computing the Laurent series. It is fast and uses resources efficiently thanks to techniques such as computational reuse, conversion of multiplications into add-shift operations, and precomputed data. The design can be used in different applications; for example, the bit width of the signals and the number of terms of the Laurent series can be adjusted to the run time and precision that an application requires. In this work, two cases, 56-bit and 32-bit, were implemented. With the proposed design, higher-order roots can be computed by changing the precomputed coefficients of the Laurent series.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Data will be made available on request.

Supplementary materials

Supplementary material associated with this article can be found in the online version at doi:10.1016/j.micpro.2023.104847.

References

[1] D. Trofimowicz, T.P. Stefański, Multimodal particle swarm optimization with phase analysis to solve complex equations of electromagnetic analysis, in: 2020 23rd International Microwave and Radar Conference (MIKON), 2020, pp. 44–48.
[2] V. Maslennikov, Method for approximate determination of the roots of a cubic equation with positive coefficients and complex conjugate roots, Vestnik Natsional'nogo Issledovatel'skogo Yadernogo Universiteta MIFI 4 (2) (2015) 179–183.
[3] X. Liu, Amount of log-square-Hoyt fading in satellite optical communications, IEEE Commun. Lett. 16 (5) (2012) 666–669.
[4] D.M. Kipping, Investigations of approximate expressions for the transit duration, Mon. Not. R. Astron. Soc. 407 (1) (2010) 301–313.
[5] B. Ghazi, H. Hassanieh, P. Indyk, D. Katabi, E. Price, L. Shi, Sample-optimal average-case sparse Fourier transform in two dimensions, in: 2013 51st Annual Allerton Conference on Communication, Control, and Computing (Allerton), 2013, pp. 1258–1265.
[6] S.-H. Hsieh, C.-S. Lu, S.-C. Pei, Sparse fast Fourier transform for exactly and generally k-sparse signals by downsampling and sparse recovery, arXiv preprint arXiv:1407.8315, 2014.
[7] Y. Zhang, Z. Ke, D. Guo, F. Li, Solving for time-varying and static cube roots in real and complex domains via discrete-time ZD models, Neur. Comput. Appl. 23 (2) (2013) 255–268.
[8] O. Ahmadi, F.R. Henriquez, Low complexity cubing and cube root computation over $\mathbb{F}_{3^m}$ in polynomial basis, IEEE Trans. Comput. 59 (10) (2010) 1297–1308.
[9] Y. Li, W. Chu, On the improved implementations and performance evaluation of digit-by-digit integer restoring and non-restoring cube root algorithms, in: 2016 International Conference on Computer, Information and Telecommunication Systems (CITS), 2016, pp. 1–5.
[10] S. Yammen, J. Ieamsaard, Newton's cube root finding data sequence, in: 2021 9th International Electrical Engineering Congress (iEECON), 2021, pp. 405–407.
[11] V. Pieterse, P. Black, cube root, Dictionary of Algorithms and Data Structures, 2009.
[12] L. Moroz, V. Samotyy, C.J. Walczyk, J.L. Cieśliński, Fast calculation of cube and inverse cube roots using a magic constant and its implementation on microcontrollers, Energies 14 (4) (2021) 1058.
[13] M.S.B. Mohamad, An algorithms for finding the cube roots in finite fields, Procedia Comput. Sci. 179 (2021) 838–844.
[14] G.H. Cho, S. Kwon, H.-S. Lee, A refinement of Müller's cube root algorithm, in: Finite Fields and Their Applications, 67, 2020, 101708.
[15] C. Zhou, H. Geng, P. Wang, C. Guo, Ten-input cube root logic computation with rational designed DNA nanoswitches coupled with DNA strand displacement process, ACS Appl. Mater. Interfaces 12 (2) (2019) 2601–2606.
[16] J. Jo, I.-C. Park, Low-latency low-cost architecture for square and cube roots, IEICE Trans. Fundam. Electr. Commun. Comput. Sci. 100 (9) (2017) 1951–1955.
[17] A. Pineiro, J.D. Bruguera, F. Lamberti, P. Montuschi, A radix-2 digit-by-digit architecture for cube root, IEEE Trans. Comput. 57 (4) (2008) 562–566, https://doi.org/10.1109/TC.2007.70848.
[18] G.H. Cho, N. Koo, E. Ha, S. Kwon, New cube root algorithm based on the third order linear recurrence relations in finite fields, Designs Codes Cryptogr. 75 (3) (2015) 483–495.
[19] C.M. Guardia, E. Boemo, FPGA implementation of a binary32 floating point cube root, in: 2014 IX Southern Conference on Programmable Logic (SPL), Nov. 2014, pp. 1–6, https://doi.org/10.1109/SPL.2014.7002202.
[20] R.V.W. Putra, T. Adiono, Optimized hardware algorithm for integer cube root calculation and its efficient architecture, in: 2015 International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS), 2015, pp. 263–267.
[21] S.K. Padhan, S. Gadtia, B. Bhoi, FPGA based implementation for extracting the roots of real number, Alexandria Eng. J. 55 (3) (Sep. 2016) 2849–2854, https://doi.org/10.1016/j.aej.2016.07.003.

Elias Rajaby received his B.Sc. from the Electrical Engineering Department, Yazd University, Yazd, Iran, in 2014, and the M.Sc. degree in Digital Electronic Engineering from Amirkabir University of Technology, Tehran, Iran, in 2016. He is currently pursuing his Ph.D. degree at Isfahan University of Technology, Isfahan, Iran. His research interests include digital system implementation, image processing, computer vision, and evolutionary algorithms.

Sayed Masoud Sayedi received B.Sc. and M.Sc. degrees in electrical engineering from Isfahan University of Technology (IUT), Isfahan, Iran, in 1986 and 1988, respectively. He also received his Ph.D. degree in electronics from Concordia University, Montreal, QC, Canada, in 1996. From 1988 to 1992, and then since 1997, he has been with IUT, where he is currently a full professor at the Department of Electrical and Computer Engineering. His current research interests include VLSI fabrication processes, image sensors, low-power VLSI circuits, and data converters.

Ehsan Yazdian received his B.Sc. degree in electrical engineering from Isfahan University of Technology (IUT), Isfahan, Iran, in 2004. Furthermore, he received his M.Sc. and Ph.D. degrees in electrical engineering from Sharif University of Technology, Tehran, Iran, in 2006 and 2012, respectively. Since 2013, he has been with the Electrical Engineering Department, IUT. His current research interests include statistical array signal processing, wireless communications, digital communication systems, software defined radio, and design and implementation of signal processing algorithms on FPGA.

the input number. The design was implemented on a Virtex 5 device.

2. The proposed algorithm

Fig. 5. Circuit diagram of step 2 of the algorithm.

′ was optimized through a low-level design approach. Each step of the

$\mathrm{LUT1} = \{-39, -39, -36, -36, -36, \ldots, 15\}$,

−3255.365024682178 + 7128.25509509333i    2.826762700162849 − 19.660566429183490i    2.82677118011569 − 19.6605760468843i    6.45e-07
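The row above appears to list an input z, a reference cube root, the hardware output, and an error figure; these column meanings are inferred from the values and are not stated in this excerpt. The short Python check below assumes the last column is the relative error |r̂ − r|/|r|. Under that assumption it reproduces an error of about 6.46 × 10⁻⁷, close to the reported 6.45e-07, and confirms that the reference value cubes back to z. It also shows that the reference is not the principal cube root returned by `z ** (1/3)` but the principal root multiplied by e^(−2πi/3), so a software comparison must use the same root branch as the hardware.

```python
import cmath

# Values copied from the table row above (column meanings are an assumption).
z = -3255.365024682178 + 7128.25509509333j     # input
ref = 2.826762700162849 - 19.660566429183490j  # reference cube root
hw = 2.82677118011569 - 19.6605760468843j      # hardware output


def rel_err(approx: complex, exact: complex) -> float:
    """Relative error |approx - exact| / |exact| (assumed metric for the last column)."""
    return abs(approx - exact) / abs(exact)


# 1) The reference really is a cube root of z.
cube_residual = rel_err(ref ** 3, z)

# 2) It is not the principal root: it equals the principal root rotated by -2*pi/3.
principal = z ** (1 / 3)  # Python returns the principal branch for complex powers
rotated = principal * cmath.exp(-2j * cmath.pi / 3)

# 3) Relative error of the hardware output against the reference.
hw_error = rel_err(hw, ref)

print(f"cube residual   = {cube_residual:.1e}")
print(f"|ref - rotated| = {abs(ref - rotated):.1e}")
print(f"hardware error  = {hw_error:.3e}")
```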

Number of slice registers: 722 used of 126,800 available (0.6%)
Output: z3

3 ‘z’ is rescaled by (8).
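Eq. (8) itself is not reproduced in this excerpt, so the sketch below is not the paper's rescaling rule. It only illustrates, under the assumption that scaling is done by powers of 8, why rescaling is cheap for a cube root: because ∛(8^k·w) = 2^k·∛w, dividing each component of z by 8^k is an exact 3k-bit shift, and the root is corrected by a k-bit shift. The function names are hypothetical, and `cmath.exp(cmath.log(w) / 3)` merely stands in for the series evaluation performed on the reduced operand.

```python
import cmath
import math


def reduce_by_powers_of_8(z: complex) -> tuple[complex, int]:
    """Hypothetical range reduction: return (w, k) with z == w * 8**k and 0.5 <= |w| < 4.

    Dividing by 8**k is a shift of 3k bits on each component, so it is exact
    in binary floating point (barring underflow or overflow).
    """
    if z == 0:
        return 0j, 0
    _, e = math.frexp(abs(z))  # abs(z) = m * 2**e with 0.5 <= m < 1
    k = e // 3                 # floor division, so e - 3k is 0, 1 or 2
    w = complex(math.ldexp(z.real, -3 * k), math.ldexp(z.imag, -3 * k))
    return w, k


def cbrt_with_scaling(z: complex) -> complex:
    """Principal cube root using cbrt(w * 8**k) = cbrt(w) * 2**k."""
    w, k = reduce_by_powers_of_8(z)
    if w == 0:
        return 0j
    # Stand-in for the core approximation on the reduced operand (assumption).
    r = cmath.exp(cmath.log(w) / 3)
    # Undo the scaling with a k-bit shift on each component.
    return complex(math.ldexp(r.real, k), math.ldexp(r.imag, k))
```

Scaling by a positive real factor leaves the argument of z unchanged, so the branch of the root returned for the reduced operand carries over directly to the original input.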

$(a + bi)\cdot(c + di) = (q_1 + q_2) + (q_1 + q_3)i$
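The identity above computes a complex product from three partial products instead of four, which saves a hardware multiplier (for example a DSP48 slice) at the cost of extra additions. The definitions of q₁, q₂ and q₃ are not part of this excerpt; the sketch below assumes one standard decomposition that fits the identity, q₁ = c(a + b), q₂ = −b(c + d), q₃ = a(d − c), and the paper's exact choice may differ.

```python
def cmul3(a: float, b: float, c: float, d: float) -> tuple[float, float]:
    """(a + bi)(c + di) using three real multiplications.

    Assumed decomposition (one of several that fit the identity):
        q1 = c*(a + b), q2 = -b*(c + d), q3 = a*(d - c)
    so that real = q1 + q2 = ac - bd and imag = q1 + q3 = ad + bc.
    """
    q1 = c * (a + b)
    q2 = -b * (c + d)  # in hardware the negation folds into a subtraction
    q3 = a * (d - c)
    return q1 + q2, q1 + q3


# (1 + 2i)(3 + 4i) = -5 + 10i
print(cmul3(1, 2, 3, 4))
```

The trade is three multiplications plus five additions or subtractions, against four multiplications plus two additions for the schoolbook form, which pays off on FPGAs where multipliers are the scarcer resource.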

[11] 2009 Real N/A 12 Reg N/A N/A N/A

Elias Rajaby received his B.Sc. from Electrical Engineering

Supplementary material associated with this article can be found, in
