
Proceedings of NILES2020:

2nd Novel Intelligent and Leading Emerging Sciences Conference

Low power and area SHA-256 hardware accelerator on Virtex-7 FPGA
Ali H. Gad1,*, Seif Eldeen E. Abdalazeem1,*, Omar A. Abdelmegid1,*, and Hassan Mostafa1,2,†
1 University of Science and Technology, Nanotechnology and Nanoelectronics Engineering Program, Zewail City of Science and Technology, October Gardens, 6th of October, Giza 12578, Egypt
2 Electronics and Communication Department, Faculty of Engineering, Cairo University, Giza 12613, Egypt
* These authors have contributed equally to this work.
† Author to whom correspondence should be addressed. Email: [email protected]
{[email protected], [email protected], [email protected]}

Abstract— Lately, there have been many technological developments in communication, especially in online transactions, so the demand for highly secure systems and cryptographic algorithms has increased. Cryptographic hash functions are used to protect and authenticate information and transactions. SHA-256 (Secure Hash Algorithm-256) is a one-way hash function characterized by being highly secure and fast while having a high collision resistance. This paper presents a new hardware architecture of SHA-256 with low power consumption and area, based on a sequential computation of the message scheduler and the working variables of SHA-256. The hardware was described in HDL and implemented on a Virtex-7 FPGA, which offers high efficiency and speed. Different optimization techniques were used to further reduce the power and area, such as gated clock conversion, arithmetic resource sharing, and structural modeling of small building blocks. The proposed design ran with a maximum frequency of 83.33 MHz. The implementation reports indicated a dynamic power consumption of 13 mW and an area utilization of 275 slices while maintaining a good throughput of 0.637 Gbits/s and a relatively high efficiency of 2.32 Mbits/s per slice. Such a design with low power and area can be used to hash messages on a portable device, opening a whole new area for different applications and opportunities.

Keywords—SHA-256, low power, low area, hardware acceleration, FPGA, hash, blockchain.

I. INTRODUCTION

Due to the rapid development and huge growth of the digital world of technology and the increase in internet speed, information security has become a necessity. Cryptographic algorithms are widely used in different applications such as banking, shopping, and other money-related activities, and recently they have been utilized in blockchain technologies and Internet of Things applications due to the extensive research and projects focused on these two fields. These algorithms, especially hash functions, have become an essential part of securing data communication [1], [2].

Hash functions are used to protect data, verify a digital signature, or authenticate a wide range of online processes and transactions, using complex logical operations. Hash functions have unique properties that make them highly secure. They are one-way functions, meaning that for a given hash output $h$, it is computationally infeasible to retrieve an input message $x$ such that $H(x) = h$. They also have a very high collision resistance, meaning that it is computationally infeasible to find two inputs with the same hashed output, so that in practice $x \neq y$ implies $H(x) \neq H(y)$ [3]. Besides, hash functions are highly sensitive: a slight change in the input message completely changes the output hash.

SHA-256 takes any message of length up to $2^{64}$ bits and outputs a hash with a fixed length of 256 bits, with 128 bits of security against collision attacks. Other hash functions provide different security levels and output lengths, such as SHA-224, SHA-384, and SHA-512. Although SHA-512 provides better security, it consumes more memory and takes a longer computational time, making it unsuitable for some applications. Because of its high security, high collision resistance, and computational time, SHA-256 is used in various important applications such as blockchain. The blockchain utilizes SHA-256 to authenticate and verify the transactions included in a block before it joins the blockchain. It also prevents any alteration of the contents of the block after it has been introduced to the chain, as any trivial change in the data significantly alters the hash of the block and of the following blocks connected to it, which is easily detected.

An FPGA (field-programmable gate array) is used for the implementation to exploit hardware acceleration. It is suitable for implementing cryptographic algorithms and performs the tasks more efficiently while allowing design optimization. Encryption on an FPGA has been reported to be nearly 20 times faster than on a dual-core processor while reducing the CPU usage by 85%. Moreover, an FPGA offers shorter design time, more flexibility, and lower costs [4].

In this paper, a proposed design for SHA-256 is implemented along with optimization techniques to achieve low power and area on a Xilinx Virtex-7 FPGA (xc7v2000t fhg1761-2), using Vivado Design Suite, so that the design can be of future use for hardware-accelerated blockchains on FPGAs. Most papers focus only on increasing the SHA-256 throughput, while this paper's main target is to reduce the power consumed and the area utilized by the architecture while maintaining good throughput.

The paper is organized as follows. Section II describes the SHA-256 algorithm and how it works. Section III discusses the implementation of the proposed architecture and the optimization methods used. Section IV presents our results and comparisons with other papers. Finally, future recommendations and the conclusion are given in Section V.
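As a quick software illustration of the sensitivity property mentioned in the introduction, the following sketch hashes two messages that differ in a single bit and counts how many of the 256 output bits change. It uses Python's standard `hashlib` module, not the proposed hardware, and the helper name `hamming_distance` is introduced here only for illustration.

```python
import hashlib

def hamming_distance(a: bytes, b: bytes) -> int:
    """Count the differing bits between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

# 'c' (0x63) and 'b' (0x62) differ in exactly one bit.
d1 = hashlib.sha256(b"abc").digest()
d2 = hashlib.sha256(b"abb").digest()

print(d1.hex())
print(d2.hex())
print("differing output bits:", hamming_distance(d1, d2), "of 256")
```

For a well-behaved hash function, roughly half of the output bits are expected to differ.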

978-1-7281-8226-1/20/$31.00 ©2020 IEEE
Authorized licensed use limited to: Zhejiang University. Downloaded on June 13,2025 at 10:12:55 UTC from IEEE Xplore. Restrictions apply.
II. SHA-256 ALGORITHM


The SHA-256 algorithm aims to encrypt an intermediate hash value using the message as a key. To hash a certain message, it has to go through three distinct operations: padding, message scheduler, and compression function [3].

A. Padding

In padding, the message has to be padded so that its length becomes a multiple of 512 bits, parsed into 512-bit blocks, and an initial hash value must be set. To start padding, for a message $M$ of length $\ell$ bits, the bit "1" is appended to its end, followed by $k$ zero bits, where $k$ is the smallest non-negative solution to $\ell + 1 + k \equiv 448 \pmod{512}$. Then the message length $\ell$ is expressed in binary in the last 64 bits.

After obtaining a padded message whose length is a multiple of 512, it is parsed into $N$ 512-bit blocks $M^{(i)}$, $i = 1, \ldots, N$, so that each block goes through the message scheduler and the compression function. Before starting to compute the hash, an initial hash value $H^{(0)}$ is set to the first 32 bits of the fractional parts of the square roots of the first eight primes.

B. Message scheduler

In the message scheduler, sixty-four 32-bit message words $W_t$ ($t = 0, \ldots, 63$) are generated for each of the $N$ 512-bit padded blocks. $W_t$ equals $M_t^{(i)}$ for the first 16 words, and equals $\sigma_1(W_{t-2}) + \sigma_0(W_{t-15}) + W_{t-7} + W_{t-16}$ for the remaining 48 words.

C. Compression Function

The hash computation proceeds as follows (all additions are modulo $2^{32}$):

For $i = 1$ to $N$
{
    The working variables, registers $a, b, \ldots, h$, are initialized with the $(i-1)$-th intermediate hash value (the initial hash value when $i = 1$). Apply the SHA-256 compression function to update $a, b, \ldots, h$:
    For $t = 0$ to $63$
    {
        Compute $W_t$, $Ch(e, f, g)$, $Maj(a, b, c)$, $\Sigma_0(a)$, $\Sigma_1(e)$ (their definitions are below)
        $T_1 = h + K_t + W_t + \Sigma_1(e) + Ch(e, f, g)$
        $T_2 = Maj(a, b, c) + \Sigma_0(a)$
        $h = g$
        $g = f$
        $f = e$
        $e = d + T_1$
        $d = c$
        $c = b$
        $b = a$
        $a = T_1 + T_2$
    }
    Compute the intermediate hash value $H^{(i)}$ as the sum of the previous hash $H^{(i-1)}$ and the registers $a, b, \ldots, h$
}

After the $N$ iterations, the hash of the message is $H^{(N)} = (H_0^{(N)}, H_1^{(N)}, \ldots, H_7^{(N)})$.

The definitions of the logical functions are:

$\sigma_0(x) = ROTR^{7}(x) \oplus ROTR^{18}(x) \oplus SHR^{3}(x)$
$\sigma_1(x) = ROTR^{17}(x) \oplus ROTR^{19}(x) \oplus SHR^{10}(x)$
$\Sigma_0(x) = ROTR^{2}(x) \oplus ROTR^{13}(x) \oplus ROTR^{22}(x)$
$\Sigma_1(x) = ROTR^{6}(x) \oplus ROTR^{11}(x) \oplus ROTR^{25}(x)$
$Ch(x, y, z) = (\neg x \,\&\, z) \oplus (x \,\&\, y)$
$Maj(x, y, z) = (x \,\&\, y) \oplus (x \,\&\, z) \oplus (y \,\&\, z)$

where $ROTR^{n}(x)$ is the rotation of the 32-bit word $x$ to the right by $n$ bits, $SHR^{n}(x)$ is the right shift of $x$ by $n$ bits, $\oplus$ is the bitwise XOR, $\&$ is the bitwise AND, and $\neg$ is the bitwise complement.

The sequence of constant words $K_0, \ldots, K_{63}$ consists of the first thirty-two bits of the fractional parts of the cube roots of the first sixty-four primes.

III. IMPLEMENTATION

To design a low power and low area SHA-256 architecture, the proposed design was implemented along with various optimization techniques to further decrease consumption.

The proposed design is based on partitioning modules and hierarchies into small building blocks to further optimize them and keep the area usage to a minimum, as shown in Fig.1.

A. Padding

In our design, in the padding block, on the rising edge of the first clock, the input message is padded and parsed into a 512-bit block. It then goes through the bit selection block to be sent to the message scheduler over the course of the next 16 clock cycles in the form of 32-bit words.

B. Message Scheduler

The main role of the message scheduler block is to determine $W_t$ in order to compute the working variables and the intermediate hash values. In normal designs, the 64 message words are computed first and are then used to determine the 64 sets of working variables. In contrast, in our proposed design, only one message word is computed and used to determine its corresponding working variables, all in one cycle. This in turn had a large effect on throughput, area, and power. Instead of computing all working variables in 128 clock cycles (64 for the message words and 64 for the actual working variables), the whole operation now takes only 64 cycles, nearly doubling the throughput. It also decreased the area used by the $W_t$ register and its switching activity, resulting in a large power reduction.

C. Compression Function

The compression block contains combinatorial functions used to calculate the variables, and a register memory storing the values of the $K$ constants, which reduces the access time to $K_t$.

The block receives one $W_t$ per cycle, along with values from $K_t$, Ch, Maj, sum1, and sum2. It then computes the 32-bit temporary variables, which in turn update the working variables at the rising edge of each clock. At the end of the 64th clock cycle, the working variables are added to the intermediate hash values to output the final hash.
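The algorithm of Section II, combined with the one-word-per-cycle scheduling idea of Section III.B, can be sketched as a small software model. This is not the authors' SystemVerilog design: it is a Python sketch under the assumption that a 16-word sliding window is enough to produce each $W_t$ on the fly, and the names `first_primes`, `icbrt`, `pad`, and `sha256_model` are hypothetical helpers introduced here. The constants are derived from the square and cube roots of the primes as described in the text, and the result is compared against Python's `hashlib`.

```python
import hashlib
import math

MASK = 0xFFFFFFFF  # keep every word at 32 bits (additions are modulo 2**32)

def first_primes(n):
    """Return the first n primes by trial division."""
    primes, cand = [], 2
    while len(primes) < n:
        if all(cand % p for p in primes):
            primes.append(cand)
        cand += 1
    return primes

def icbrt(n):
    """Largest integer r with r**3 <= n, found by binary search."""
    lo, hi = 0, 1 << (n.bit_length() // 3 + 2)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

PRIMES = first_primes(64)
# First 32 fractional bits of the square roots (H0) and cube roots (K).
H0 = [math.isqrt(p << 64) & MASK for p in PRIMES[:8]]
K = [icbrt(p << 96) & MASK for p in PRIMES]

def rotr(x, n):
    """Rotate the 32-bit word x right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK

def pad(message: bytes) -> bytes:
    """Append '1', k zero bits, and the 64-bit length (byte-aligned input)."""
    bit_len = len(message) * 8
    k = (448 - (bit_len + 1)) % 512
    # 0x80 holds the '1' bit plus 7 zero bits; the other k - 7 zeros follow.
    return message + b"\x80" + b"\x00" * ((k - 7) // 8) + bit_len.to_bytes(8, "big")

def sha256_model(message: bytes) -> bytes:
    state = list(H0)
    data = pad(message)
    for off in range(0, len(data), 64):
        block = data[off:off + 64]
        # 16-word window; for t >= 16 it holds W[t-16] .. W[t-1].
        w = [int.from_bytes(block[i:i + 4], "big") for i in range(0, 64, 4)]
        a, b, c, d, e, f, g, h = state
        for t in range(64):  # one iteration models one clock cycle
            if t < 16:
                wt = w[t]
            else:
                s0 = rotr(w[1], 7) ^ rotr(w[1], 18) ^ (w[1] >> 3)       # sigma0(W[t-15])
                s1 = rotr(w[14], 17) ^ rotr(w[14], 19) ^ (w[14] >> 10)  # sigma1(W[t-2])
                wt = (s1 + s0 + w[9] + w[0]) & MASK  # + W[t-7] + W[t-16]
                w = w[1:] + [wt]                     # slide the window
            ch = (~e & g) ^ (e & f)
            maj = (a & b) ^ (a & c) ^ (b & c)
            big_s0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
            big_s1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
            t1 = (h + K[t] + wt + big_s1 + ch) & MASK
            t2 = (maj + big_s0) & MASK
            h, g, f, e, d, c, b, a = g, f, e, (d + t1) & MASK, c, b, a, (t1 + t2) & MASK
        state = [(x + y) & MASK for x, y in zip(state, (a, b, c, d, e, f, g, h))]
    return b"".join(x.to_bytes(4, "big") for x in state)

# The two messages used in the paper's timing simulation (Section IV).
for msg in (b"abc", b"aaa"):
    digest = sha256_model(msg)
    print(msg, digest.hex(), digest == hashlib.sha256(msg).digest())
```

Because each $W_t$ is consumed in the same loop iteration that produces it, the model never stores all 64 message words, which mirrors the reduction in register area described above.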


Fig.1. Simplified Schematic Architecture of SHA-256

We used component inference in constructing our modules. This allowed our design to reach a low power consumption, to be optimized further by the synthesis tool, and to be portable onto various architectures without modifications.

To further reduce and optimize the power consumption, we used resource sharing for the arithmetic components, due to the large number of operations performed, which contributed to area and power reduction. Moreover, by decreasing the logical resources and introducing registers in between, the critical path also decreased, leading to an increase in the maximum allowed frequency and the overall efficiency. Further, a power gating technique was implemented to achieve a lower power consumption; however, the area usage increased because of the added logic gates, and a clock skew was introduced. As a result, a gated clock conversion from logic to flops was introduced to reduce the area while keeping the power unchanged. Also, clock-tree synthesis was added to compensate for the skew by balancing clock buffers and routing.

The results of the mentioned optimization techniques differed when implemented on different FPGAs.

IV. THE RESULTS

The proposed SHA-256 design was described using SystemVerilog. The design was synthesized and implemented on a Xilinx Virtex-7 FPGA (xc7v2000t fhg1761-2), using Vivado Design Suite.

The proposed SHA-256 architecture processes a 512-bit block within 67 clock cycles. The message padding takes 1 cycle, the hash computation takes 64 cycles, and outputting the final hash takes 2 cycles. Fig.2 shows the post-implementation timing simulation results of hashing the messages abc (616263) and aaa (616161).

The implemented hardware achieved a maximum frequency of 83.33 MHz with a static power of 148 mW and a dynamic power of 13 mW, using 275 slices, 763 LUTs, and 568 flip-flops (FFs). A throughput of 0.637 Gbits/s and an efficiency of 2.32 Mbits/s per slice were calculated using equations (1) and (2), as shown below [5]:

$$\text{Throughput} = \frac{\text{Block size} \times \text{Maximum frequency}}{\text{Number of clock cycles}} = \frac{512 \times 83.33\ \text{MHz}}{67} = 0.637\ \text{Gbits/s} \quad (1)$$

$$\text{Efficiency} = \frac{\text{Throughput}}{\text{Number of slices}} = \frac{0.637\ \text{Gbits/s}}{275} = 2.32\ \text{Mbits/(s.slice)} \quad (2)$$
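The arithmetic behind equations (1) and (2) can be reproduced with a few lines of Python. This is only a worked check using the figures reported in this section; the constant names are chosen here for illustration.

```python
# Figures reported in Section IV.
BLOCK_BITS = 512        # bits hashed per block
F_MAX_HZ = 83.33e6      # maximum clock frequency
CYCLES = 1 + 64 + 2     # padding + hash computation + output
SLICES = 275            # occupied slices

throughput_bps = BLOCK_BITS * F_MAX_HZ / CYCLES   # equation (1)
efficiency_bps_per_slice = throughput_bps / SLICES  # equation (2)

print(f"cycles per block: {CYCLES}")
print(f"throughput: {throughput_bps / 1e9:.3f} Gbits/s")
print(f"efficiency: {efficiency_bps_per_slice / 1e6:.2f} Mbits/s per slice")
```

Both printed values round to the reported 0.637 Gbits/s and 2.32 Mbits/s per slice.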

Fig.2. The post-implementation timing simulation results of hashing the messages 616263 and 616161


Table I. Detailed Utilization Report

| Resources | Utilization | Available | Percentage of Utilization |
|-----------|-------------|-----------|---------------------------|
| LUT       | 763         | 1221600   | 0.062%                    |
| FF        | 568         | 2443200   | 0.023%                    |
| Slices    | 275         | 305400    | 0.09%                     |

For accurate power estimation, a switching activity interchange format (SAIF) file was generated during a post-implementation timing simulation and used to generate a power report under maximum condition. The average dynamic power consumed for hashing a 512-bit block was 13 mW. The clocks, signals, logic, and I/O consumed 3 mW, 5 mW, 4 mW, and 1 mW, respectively. Table I summarizes the proposed design results.

Throughout the literature, there were inconsistencies in reporting the results of proposed designs. Some papers estimated their area utilization using slices, while others used LUTs and flip-flops. Thus, the comparison includes different factors and parameters to make it as fair as possible with respect to other papers. Power consumption was rarely mentioned in the literature, as most papers focused on optimizing speed, represented by achieving high frequency and throughput. As a result, some of the previous work we compare our design with is relatively old, as we could not find more recent publications.

Only [6] and [7] mentioned the estimated power. [6] compared the power consumption at a fixed throughput of 0.2 Gbits/s between ASIC and FPGA, thus their FPGA results were chosen. [7] applied different techniques and estimated the power consumption of each one, so their minimum power was chosen for the comparison. Nevertheless, each of the two papers reported a higher power consumption than our proposed design.

In [8], the main target was decreasing power by decreasing the area. They hashed two 512-bit blocks, but no power results were mentioned, only their area utilization, which is larger than that of our proposed design. [4] implemented and compared an FPGA encryption engine with SHA-256 and AES IP. Their SHA-256 was implemented on different FPGA devices, thus the Virtex platform was chosen for the comparison. Though having a better throughput, they had a larger area and a lower efficiency than our proposed design. [9] implemented their SHA-256 on Virtex-5 operating at 179.08 MHz using 2796 slices. They did not mention any power consumption results.

Table II summarizes the comparison between our work and other existing work. Most referenced papers reported higher operating frequencies than our design. That is because most papers targeted throughput optimization and speed, whereas we focused on power optimization, which results in lower speed, as expected.

V. CONCLUSION & FUTURE RECOMMENDATIONS

A new hardware implementation of SHA-256 on a Xilinx FPGA has been proposed in this paper. SHA-256 was chosen because of its complex security, high collision resistance, and acceptable computation time, along with its utilization in different applications. Optimization techniques were used based on gated clock conversion, arithmetic resource sharing, and structural modeling of small building blocks.

The proposed design along with the optimization techniques used have resulted in a significant power and area reduction with a relatively large efficiency while maintaining a decent maximum frequency and throughput in comparison with other related work. The hardware operated at a frequency of 83.33 MHz with a throughput of 0.637 Gbits/s while utilizing 275 slices and consuming 13 mW of dynamic power. Such a design with low power and area can be used to hash messages on a portable device, opening a whole new area for different applications and opportunities.

For future work, we recommend implementing the code on actual FPGA hardware and measuring the dynamic power consumption to get a more accurate power estimation. Also, implementing the same optimization techniques on different hardware is recommended, as the effect of different techniques on the power and area may depend on the hardware used.
and compared an FPGA encrypt engine with SHA-256 and

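Table II. Implementation results and comparison with previous work

| Paper                       | Our Work | [4]      | [6]      | [7]      | [8]      | [9]      |
|-----------------------------|----------|----------|----------|----------|----------|----------|
| Slices                      | 275      | 973      | 740      | -        | -        | 2796     |
| LUTs                        | 763      | -        | -        | -        | 1803     | -        |
| Flip Flops                  | 568      | -        | -        | -        | 714      | -        |
| Frequency (MHz)             | 83.333   | 184.46   | 201.1    | 66       | -        | 179.08   |
| Throughput (Gbits/s)        | 0.637    | 1.388    | 0.2      | -        | -        | -        |
| Efficiency (Mbits/(s.slice))| 2.32     | 1.43     | 0.27     | -        | -        | -        |
| Power (mW)                  | 13       | -        | 73.47    | 88.5     | -        | -        |
| Platform                    | Virtex-7 | Virtex-5 | Virtex-5 | Virtex-2 | Virtex-6 | Virtex-5 |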
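As a consistency check on Table II, the efficiency row can be recomputed from the throughput and slice rows, and the reported power figures can be expressed relative to this work. The sketch below only re-uses numbers from the table; the dictionary layout and function name are illustrative choices.

```python
# (throughput in Gbits/s, slices) for the designs that report both in Table II.
TABLE_II = {
    "Our work": (0.637, 275),
    "[4]": (1.388, 973),
    "[6]": (0.2, 740),
}

def efficiency_mbps_per_slice(gbps, slices):
    """Throughput per slice in Mbits/s, as in the efficiency row of Table II."""
    return gbps * 1000.0 / slices

EFF = {name: efficiency_mbps_per_slice(g, s) for name, (g, s) in TABLE_II.items()}
for name, eff in EFF.items():
    print(f"{name}: {eff:.2f} Mbits/(s.slice)")

# Reported power of [6] and [7] relative to the 13 mW dynamic power of this work.
OUR_POWER_MW = 13.0
POWER_RATIO = {"[6]": 73.47 / OUR_POWER_MW, "[7]": 88.5 / OUR_POWER_MW}
for name, ratio in POWER_RATIO.items():
    print(f"{name}: {ratio:.1f}x the power of this work")
```

Note that the designs target different Virtex generations, so these ratios compare reported figures rather than like-for-like measurements.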


VI. ACKNOWLEDGEMENT

This work was supported by the Egyptian Information Technology Industry Development Agency (ITIDA) under ITAC Program.

REFERENCES

[1] M. Bahnasawi, A., K. Ibrahim, A. Mohamed, M. Khalifa, A. Moustafa, K. Abelmonim, Y. Ismail, and H. Mostafa, “ASIC-Oriented Comparative Review of Hardware Security Algorithms for the Internet of Things Applications,” IEEE International Conference on Microelectronics (ICM 2016), Cairo, Egypt, pp. 285-288, 2016.
[2] N. Samir, A. S. Hussein, M. Khaled, A. N. ElZeiny, M. Osama, H. Yassin, A. Abdelbaky, O. Mahmoud, A. Shawky, and H. Mostafa, “ASIC and FPGA Comparative Study for IoT Lightweight Hardware Security Algorithms,” Journal of Circuits, Systems, and Computers (JCSC), vol. 28, no. 12, pp. 1-13, 2019.
[3] Q. Dang, “Federal Information Processing Standard (FIPS) 180-4, Secure Hash Standard,” Aug. 2015.
[4] C. Li, Q. Zhou, Y. Liu, and Q. Yao, “Cost-Efficient Data Cryptographic Engine Based on FPGA,” 2011 Fourth International Conference on Ubi-Media Computing, 2011.
[5] R. García, I. Algredo-Badillo, M. Morales-Sandoval, C. Feregrino Uribe, and R. Cumplido, “A compact FPGA-based processor for the Secure Hash Algorithm SHA-256,” Computers & Electrical Engineering, vol. 40, no. 1, pp. 194–202, 2014.
[6] X. Guo, et al., “On The Impact of Target Technology in SHA-3 Hardware Benchmark Rankings,” IACR Cryptology ePrint Archive, vol. 2010, pp. 536, Jan. 2010.
[7] R. G. Dimond, et al., “Combining Instruction Coding and Scheduling to Optimize Energy in System-on-FPGA,” 2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, 2006.
[8] M. Thakur, “Low Power Implementation of Secure Hashing Algorithm (SHA-2) using VHDL on FPGA of SHA-256,” International Journal for Research in Applied Science and Engineering Technology, vol. 6, no. 5, pp. 2298-2303, 2018.
[9] C. Jeong and Y. Kim, “Implementation of efficient SHA-256 hash algorithm for secure vehicle communication using FPGA,” 2014 International SoC Design Conference (ISOCC), 2014.
