2016 International Symposium on Signal, Image, Video and Communications (ISIVC)

HIGH THROUGHPUT PIPELINED HARDWARE IMPLEMENTATION OF THE KECCAK HASH FUNCTION

Hassen MESTIRI, Fatma KAHRI, Mouna BEDOUI, Belgacem BOUALLEGUE, Mohsen MACHHOUT

Electronics and Micro-Electronics Laboratory, Faculty of Sciences of Monastir, University of Monastir, Tunisia
[email protected]

ABSTRACT

Cryptographic hash algorithms have been developed by designers with the goal of improving their performance in terms of frequency, throughput, power consumption and area. They are implemented in many embedded systems to ensure security, and they have become the default choice for ensuring information integrity in numerous applications. In this paper, we propose a pipelined architecture for the new SHA-3 algorithm (KECCAK). The proposed KECCAK architecture has been implemented on a Xilinx FPGA platform (Virtex-5), and its frequency, efficiency, throughput and area have been compared and discussed. The FPGA implementation results show that the proposed architecture achieves good performance in terms of frequency and throughput.

Index Terms: Hash Functions, SHA-3 KECCAK, Security, Pipeline Architecture, FPGA.

1. INTRODUCTION

In August 2015, the National Institute of Standards and Technology (NIST) published the SHA-3 standard, which adopts the KECCAK cryptographic algorithm [1]. SHA-3 was standardized as an alternative to the SHA-2 family of hash functions, which remains in use. The KECCAK algorithm is presently used in a wide variety of applications, the most common being financial transactions and e-commerce, which have high security requirements [2].
Many implementations for the efficient VLSI realization of KECCAK have been proposed, and their performance has been evaluated using FPGA and ASIC libraries [2-6].
In [7], Gholipour et al. presented a review of the KECCAK cryptographic algorithm and applied several techniques to enhance its performance with respect to timing, frequency and throughput. They conducted a comparative study of different architectures implemented on FPGAs, in terms of throughput, frequency and area.
The authors of [8] presented a new implementation of the KECCAK function on a Virtex-5 FPGA platform. They show that their implementation has better hardware performance.
In this paper, we propose a new pipelined architecture for the KECCAK-1600 cryptographic hash algorithm. We present the detailed implementation of each operation. Moreover, we compare the proposed architecture with some previous KECCAK designs, using common evaluation criteria such as frequency, throughput, implementation cost, area and efficiency.
The rest of the paper is organized as follows. The specification of the KECCAK cryptographic algorithm is given in Section 2. The details of the KECCAK implementation are presented in Section 3. In Section 4, the performance of the KECCAK design and the experimental synthesis results are compared and discussed in terms of area, frequency, efficiency and throughput. Section 5 concludes the paper.

2. KECCAK HASH FUNCTION

2.1. Specification

The KECCAK algorithm is based on the sponge construction. The core of the KECCAK hash function is the permutation f, which is applied to a fixed-length state of b bits, with b = r + c, where c is the capacity and r is the bitrate. Higher values of c and r correspond to a higher security level and a higher speed, respectively. The hash procedure is as follows: first, the input message is padded so that its length is a multiple of the block size. Then, five internal steps are applied in each round. Finally, the squeezing phase occurs. The sponge function is composed of two phases.
• The Absorbing phase: the first input block of length r is XORed with the first r bits of the state, and the transformation function is applied to the resulting state. In the same way, the next block is added modulo 2 to the result. This continues until the whole input has been processed [2].

978-1-5090-3611-0/16/$31.00 ©2016 IEEE 282 ISIVC’2016


• The Squeezing phase: the first r bits of the state are returned as output blocks, and the transformation continues to be applied until enough blocks for the desired output length have been obtained. It should be noted that the last c bits of the state are never directly affected by the input or taken as output [2].

Figure 1 illustrates the sponge function.

Figure 1. Sponge Function

The state is composed of an array of 5×5 lanes, where w is the lane length, with w ∈ {1, 2, 4, 8, 16, 32, 64} and b = 25w.
Applying the sponge construction to KECCAK-f, with padding applied to the input message, yields KECCAK[r,c], where c is the capacity and r is the bitrate.
All operations on the indices are done modulo 5. A denotes the complete permutation state array, and A[x,y] denotes a particular lane in that state.
The intermediate variables are B[x,y], C[x] and D[x]. RC[i] are the round constants, and r[x,y] are the rotation offsets. The bitwise cyclic shift operation is denoted rot(W, r): the bit at position i is moved to position i + r (modulo the lane size). The rotation offsets r[x,y] are specified in Table 1.

Table 1. Rotation offsets r[x,y] of the KECCAK algorithm
        x=3   x=4   x=0   x=1   x=2
y=2      25    39     3    10    43
y=1      55    20    36    44     6
y=0      28    27     0     1    62
y=4      56    14    18     2    61
y=3      21     8    41    45    15

Table 2 shows the round constants RC[i], as specified in [1]. These values are given in hexadecimal notation for a lane size of 64.

Table 2. Values of the round constants RC[i]
RC[0]   0x0000000000000001    RC[12]  0x000000008000808B
RC[1]   0x0000000000008082    RC[13]  0x800000000000008B
RC[2]   0x800000000000808A    RC[14]  0x8000000000008089
RC[3]   0x8000000080008000    RC[15]  0x8000000000008003
RC[4]   0x000000000000808B    RC[16]  0x8000000000008002
RC[5]   0x0000000080000001    RC[17]  0x8000000000000080
RC[6]   0x8000000080008081    RC[18]  0x000000000000800A
RC[7]   0x8000000000008009    RC[19]  0x800000008000000A
RC[8]   0x000000000000008A    RC[20]  0x8000000080008081
RC[9]   0x0000000000000088    RC[21]  0x8000000000008080
RC[10]  0x0000000080008009    RC[22]  0x0000000080000001
RC[11]  0x000000008000000A    RC[23]  0x8000000080008008

The KECCAK-f permutation consists of 24 rounds, which are identical except for the round constant. Each round has five steps: Theta, Rho, Pi, Chi and Iota. They feature simple logical operations and permutations of the state bits. It should be noted that the initial state is all zero and that in each round the introduced data is mixed with the current state.

θ step:
C[x] = A[x,0] ⊕ A[x,1] ⊕ A[x,2] ⊕ A[x,3] ⊕ A[x,4]
D[x] = C[x-1] ⊕ rot(C[x+1], 1)                        (1)
A[x,y] = A[x,y] ⊕ D[x]

ρ and π steps:
B[y, 2x+3y] = rot(A[x,y], r[x,y])                      (2)

χ step:
A[x,y] = B[x,y] ⊕ ((not B[x+1,y]) and B[x+2,y])        (3)

ι step:
A[0,0] = A[0,0] ⊕ RC[i]                                (4)

2.2. Sponge function

A sponge construction, also referred to as a sponge function, is the closest approximation to a random oracle, except for the side effects of finite memory, namely internal state collisions, which are absent in a random oracle. The bitrate r and the capacity c are the two input parameters. The state is split into two parts: the first contains the r bits of the state and is called the outer part, and the second contains the remaining c bits, with c = b - r, and is called the inner part. To process the input message, it is first padded and cut into r-bit blocks, and the b-bit state is initialized to zero.
The sponge construction processes the input message in two phases. In the first phase, all the blocks are processed iteratively by XORing each block into the first r bits of the current state and then applying a fixed permutation to the b-bit state. In the second phase, after all the blocks have been processed, the first r bits of the state are returned as output, and then the permutation is

applied. This operation is repeated until n output bits are produced.
In the absorbing phase, addition modulo 2 is applied between each r-bit input message block and the first r bits of the state, and these additions are interleaved with applications of the function f. Once all message blocks have been processed, the sponge construction switches to the second phase.
In the squeezing phase, the first r bits of the state are returned as output blocks, again interleaved with applications of the function f. The number of output blocks is arbitrary and is chosen by the user.

3. PROPOSED KECCAK COPROCESSOR

3.1. KECCAK pipelined hash architecture

The proposed KECCAK pipelined architecture is shown in Figure 2. It consists of the Control Unit, the Input/Output Buffer, the Padder Unit and the KECCAK Round.
• The Control Unit synchronizes the flow of data in the architecture, as well as the data communication between the Input/Output Buffer, the Padder Unit and the KECCAK Round.
• The Input/Output Buffer is implemented to communicate efficiently with the external modules.
• The Padder Unit implements the padding operation and the per-byte inversion procedure. It has a 1600-bit output, matching the state width of the KECCAK sponge function. A 2-to-1 multiplexer then drives the output data from the Padder to the primary components of KECCAK.
• The KECCAK Round is the main data path component of the system architecture. The KECCAK process requires 25 clock cycles to produce the 512-bit message digest. Each clock cycle requires the output of the previous round, as well as the constant value RC at the start of each round.

3.2. Proposed KECCAK round

The proposed KECCAK Round is shown in Figure 3. The round transformation includes five modules, which realize the five operations Theta, Rho, Pi, Chi and Iota. At the beginning of the round transformation, there is a 2-to-1 multiplexer for the round's feedback.
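The round constants RC[i] consumed by the Iota module need not be entered by hand. FIPS 202 [1] defines them through an 8-bit linear feedback shift register, so a hand-typed constant table can be checked against a regenerated one. The following Python sketch is a software illustration only, not part of the proposed hardware; the function name round_constants is hypothetical.

```python
from typing import List


def round_constants(rounds: int = 24) -> List[int]:
    """Regenerate the 64-bit Keccak-f[1600] round constants from the LFSR."""
    constants = []
    lfsr = 1  # 8-bit LFSR state, carried over from one round to the next
    for _ in range(rounds):
        rc = 0
        for j in range(7):
            # One LFSR step with feedback polynomial x^8 + x^6 + x^5 + x^4 + 1.
            lfsr = ((lfsr << 1) ^ ((lfsr >> 7) * 0x71)) & 0xFF
            if lfsr & 2:
                rc |= 1 << ((1 << j) - 1)  # only bits 2^j - 1 can be set
        constants.append(rc)
    return constants


if __name__ == "__main__":
    for i, rc in enumerate(round_constants()):
        print(f"RC[{i}] = 0x{rc:016X}")
```

Only the seven bit positions 0, 1, 3, 7, 15, 31 and 63 of each constant can be non-zero, which is why the Iota stage costs very little logic.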

Figure 2. Proposed KECCAK Pipelined Architecture
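In software terms, the data flow from the Padder Unit into the KECCAK Round follows the absorbing and squeezing phases of Section 2.2. The Python sketch below is a minimal byte-oriented model under stated assumptions: the names pad and sponge are hypothetical, the permutation f is passed in as a parameter rather than implemented here, and the 0x06 suffix is the SHA-3 domain separation from FIPS 202 [1], which may differ from the padding variant used in the proposed Padder Unit.

```python
from typing import Callable


def pad(message: bytes, rate: int, suffix: int = 0x06) -> bytes:
    """Append the domain suffix and pad10*1 up to a multiple of `rate` bytes."""
    padded = bytearray(message)
    padded.append(suffix)                      # suffix bits + first '1' of pad10*1
    padded.extend(bytes(-len(padded) % rate))  # zero fill to the block boundary
    padded[-1] ^= 0x80                         # closing '1' bit of pad10*1
    return bytes(padded)


def sponge(message: bytes, rate: int, out_len: int,
           f: Callable[[bytes], bytes], width: int = 200) -> bytes:
    """Sponge over a `width`-byte state whose outer part is `rate` bytes."""
    state = bytes(width)                       # the b-bit state starts at zero
    padded = pad(message, rate)
    # Absorbing phase: XOR each block into the outer part, then apply f.
    for offset in range(0, len(padded), rate):
        block = padded[offset:offset + rate]
        outer = bytes(s ^ m for s, m in zip(state, block))
        state = f(outer + state[rate:])        # the inner part is left untouched
    # Squeezing phase: emit the outer part, applying f between output blocks.
    output = state[:rate]
    while len(output) < out_len:
        state = f(state)
        output += state[:rate]
    return output[:out_len]
```

With f set to a KECCAK-f[1600] implementation, a rate of 72 bytes (r = 576, c = 1024) and out_len = 64 would correspond to a 512-bit digest. The inner part of the state never receives message bits and is never output, as noted for the squeezing phase.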

The KECCAK-round process is as follows: the 24 constant values are pre-calculated and stored in registers. A buffer then holds the pre-calculated constant values and feeds the corresponding value to the Iota operation in each round.
Theta component θ: it takes the input bits and applies addition modulo 2 between the lanes of each column. As a result, there are five XORed columns. Then, those columns are left-rotated by one bit and added modulo 2 to the results of the previous operation. Finally, the data from the last XOR operations are driven to a final XOR stage with the θ input lanes.
Rho component ρ: the Rho component rotates each lane to the left. The rotations are different for each lane. The rotation amount per lane is the remainder of the division of the fixed offset by the lane length.
Pi component π: the Pi component is a simple rewiring operation, used instead of logic operations, that rearranges the positions of the lanes according to the specification.
Chi component χ: logic operations (AND, XOR and NOT) between the lanes are used by this component, and they are applied to each entire row of lanes. As there are five rows of five lanes, the Chi component implements 25 NOT, 25 AND and 25 XOR operations with 64-bit logic gates.
Iota component ι: the final component performs an addition modulo 2 between the round constant value and the first lane (bits 1599 down to 1536).
For the pipeline technique, we inserted two registers in the round transformation. The first register is inserted between the Pi operation and the Chi operation, so as to divide the critical path almost in half. The second register is placed at the end of the KECCAK round. It should be noted that adding registers in the combinational path increases the maximum frequency.

Figure 3. Proposed KECCAK round pipelined architecture
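The split of the round into two register-bounded stages can be modelled in software. The sketch below is a functional model only, not a cycle-accurate description of the proposed design. It groups Theta, Rho and Pi into one function and Chi and Iota into another, mirroring the position of the first pipeline register. The names stage1, stage2 and keccak_f1600 are hypothetical; the rotation offsets are those of Table 1 and the round constants are those specified in FIPS 202 [1].

```python
MASK = (1 << 64) - 1

# Rotation offsets indexed as R[x][y] (Table 1).
R = [[0, 36, 3, 41, 18],
     [1, 44, 10, 45, 2],
     [62, 6, 43, 15, 61],
     [28, 55, 25, 21, 56],
     [27, 20, 39, 8, 14]]

# Round constants RC[0..23] from FIPS 202.
RC = [0x0000000000000001, 0x0000000000008082, 0x800000000000808A,
      0x8000000080008000, 0x000000000000808B, 0x0000000080000001,
      0x8000000080008081, 0x8000000000008009, 0x000000000000008A,
      0x0000000000000088, 0x0000000080008009, 0x000000008000000A,
      0x000000008000808B, 0x800000000000008B, 0x8000000000008089,
      0x8000000000008003, 0x8000000000008002, 0x8000000000000080,
      0x000000000000800A, 0x800000008000000A, 0x8000000080008081,
      0x8000000000008080, 0x0000000080000001, 0x8000000080008008]


def rot(lane, r):
    """Cyclic left shift of a 64-bit lane by r positions."""
    r %= 64
    return ((lane << r) | (lane >> (64 - r))) & MASK


def stage1(A):
    """Theta, Rho and Pi: the logic in front of the first pipeline register."""
    C = [A[x][0] ^ A[x][1] ^ A[x][2] ^ A[x][3] ^ A[x][4] for x in range(5)]
    D = [C[(x - 1) % 5] ^ rot(C[(x + 1) % 5], 1) for x in range(5)]
    A = [[A[x][y] ^ D[x] for y in range(5)] for x in range(5)]
    B = [[0] * 5 for _ in range(5)]
    for x in range(5):
        for y in range(5):
            B[y][(2 * x + 3 * y) % 5] = rot(A[x][y], R[x][y])
    return B


def stage2(B, rc):
    """Chi and Iota: the logic between the two pipeline registers."""
    A = [[B[x][y] ^ ((~B[(x + 1) % 5][y] & MASK) & B[(x + 2) % 5][y])
          for y in range(5)] for x in range(5)]
    A[0][0] ^= rc
    return A


def keccak_f1600(A):
    """Apply the 24 rounds to a 5x5 array of 64-bit lanes A[x][y]."""
    for rc in RC:
        A = stage2(stage1(A), rc)
    return A
```

In hardware the two stages work on different data in the same clock cycle; here they are simply called one after the other, which gives the same result for a single message.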

Figure 4. Timing diagram for hashing process sequence

Figure 4 presents the proposed methodology used to perform the KECCAK round, which is to perform the padding and the processing simultaneously. Applying this methodology increases the data processing speed and decreases the number of clock cycles.

4. EXPERIMENTAL RESULTS AND COMPARISONS

The proposed KECCAK design has been described in VHDL, simulated with the ModelSim 6.6 simulator and synthesized with Xilinx ISE 13.1. The target FPGA platform was the XC5VFX70T from the Xilinx Virtex-5 family.
The implementation results of the proposed architecture are reported in Table 3. The performance metrics studied are area, frequency, efficiency and throughput. The data throughput is calculated by equation 5:

Throughput = (# bits × frequency) / # clock cycles      (5)

The efficiency is calculated by using equation 6:

Efficiency = Throughput / Area                          (6)

Table 3. FPGA implementation of the proposed design: Results
KECCAK design   Area (Slices)   Frequency (MHz)   Throughput (Gbps)   Efficiency (Mbps/Slice)
Proposed        4793            317.11            12.68               2.71

The hardware implementation of KECCAK takes 4793 slices at a frequency of 317.11 MHz. The proposed design achieves a throughput of 12.68 Gbps and an efficiency of 2.71 Mbps/slice.
Table 4 shows a comparison between the proposed design and some previous FPGA implementations.

Table 4. FPGA implementation of the proposed design: Comparison
KECCAK design   Area (Slices)   Frequency (MHz)   Throughput (Gbps)   Efficiency (Mbps/Slice)
[9]             275             259.2             0.075               0.58
[10]            393             159               0.846               2.19
[11]*           2640            122               5.2                 -
[11]**          3117            452               7.7                 -
Proposed        4793            317.11            12.68               2.71

Compared to [9] and [10], the proposed architecture has the highest frequency and throughput. In terms of hardware resources, the proposed architecture takes 4793 slices at a frequency of 317.11 MHz, while the KECCAK design in [11]* occupies 2640 slices with a 122 MHz operating frequency. Although the design in [11]** achieves a higher frequency than our work, the proposed design is more efficient in terms of throughput. The implementation results also show that the proposed design has the highest efficiency among the designs that report it.

5. CONCLUSION

In this paper, we proposed a new architecture for the KECCAK hash algorithm. The proposed architecture has been implemented on a Xilinx FPGA platform. Its area, throughput, frequency and efficiency have been derived and compared with previous designs. The comparison shows that the proposed architecture has the highest throughput and efficiency among the compared designs, at a frequency exceeded only by the design in [11]**.

REFERENCES

[1] M. J. Dworkin, "SHA-3 Standard: Permutation-Based Hash and Extendable-Output Functions", Federal Information Processing Standards (NIST FIPS) 202, August 2015.
[2] F. Kahri, H. Mestiri, B. Bouallegue and M. Machhout, "High Speed FPGA Implementation of Cryptographic KECCAK Hash Function Crypto-Processor", Journal of Circuits, Systems, and Computers, Vol. 25, No. 4, 2016.
[3] G. S. Athanasiou, G.-P. Makkas and G. Theodoridis, "High throughput pipelined FPGA implementation of the new SHA-3 cryptographic hash algorithm", 2014 6th Int. Symp. Communications, Control and Signal Processing, Athens, 21–23 May 2014, pp. 538–541.
[4] G. Bertoni, J. Daemen, M. Peeters and G. Van Assche, "The KECCAK SHA-3 submission", Submission to NIST (Round 3), 2011, http://keccak.noekeon.org/Keccak-submission-3.pdf [Also see NIST, Keccak hash function, 2014, http://csrc.nist.gov/groups/ST/hash/sha-3].
[5] D. Barbara Nicholas and A. Sivasankar, "Design of FPGA based encryption algorithm using KECCAK hashing functions", Int. J. Eng. Trends Technol., pp. 2438–2441, 2013.
[6] Y. Jararweh, L. Tawalbeh, H. Tawalbeh and A. Moh'd, "Hardware performance evaluation of SHA-3 candidate algorithms", J. Inf. Secur., pp. 69–76, 2012.
[7] A. Gholipour and S. Mirzakuchaki, "High-Speed Implementation of the KECCAK Hash Function on FPGA", International Journal of Advanced Computer Science, Vol. 2, No. 8, pp. 303-307, Aug. 2012.
[8] G. Provelergios, P. Kitsos, N. Sklovas and C. Koulames, "FPGA-Based Design Approaches of Keccak Hash Function", Digital System Design (DSD), 2012 15th Euromicro Conference, pp. 648-653, Sept. 2012.
[9] J.-P. Kaps, P. Yalla, K. K. Surapathi, B. Habib, S. Vadlamudi and S. Gurung, "Lightweight Implementations of SHA-3 Finalists on FPGAs", SHA-3 Conference, March 2012.
[10] B. Jungk, "Evaluation of Compact FPGA Implementations for All SHA-3 Finalists", Third SHA-3 Candidate Conference, March 2012.
[11] F.-D. Pereira, E.-D. Moreno, I.-D. Sakai and A.-M. Souza, "Exploiting Parallelism on Keccak: FPGA and GPU Comparison", Parallel & Cloud Computing, Vol. 2, Iss. 1, pp. 1-6, Jan. 2013.
