INTEGRATION, The VLSI Journal: H.E. Michail, G.S. Athanasiou, G. Theodoridis, C.E. Goutis
ARTICLE INFO

Article history:
Received 23 December 2012
Received in revised form 4 February 2014
Accepted 7 February 2014

Keywords:
Hash
Authentication
Multi-mode
FPGA

ABSTRACT

In this paper, area-efficient and high-throughput multi-mode architectures for the SHA-1 and SHA-2 hash families are proposed and implemented in several FPGA technologies. Additionally, a systematic flow for designing multi-mode architectures (implementing more than one function) of these families is introduced. Compared to the corresponding architectures that are produced by a commercial synthesis tool, the proposed ones are better in terms of both area (at least 40%) and throughput/area (from 32% up to 175%). Finally, the proposed architectures outperform similar existing ones in terms of throughput and throughput/area, from 4.2× up to 279.4× and from 1.2× up to 5.5×, respectively.

© 2014 Elsevier B.V. All rights reserved.
Corresponding author.
E-mail addresses: [email protected] (H.E. Michail), [email protected] (G.S. Athanasiou), [email protected] (G. Theodoridis), [email protected] (C.E. Goutis).
URL: http://www.antcor.com (G.S. Athanasiou).
http://dx.doi.org/10.1016/j.vlsi.2014.02.004
0167-9260/© 2014 Elsevier B.V. All rights reserved.
Please cite this article as: H.E. Michail, et al., On the development of high-throughput and area-efficient multi-mode cryptographic hash designs in FPGAs, INTEGRATION, the VLSI journal (2014), http://dx.doi.org/10.1016/j.vlsi.2014.02.004

1. Introduction

Due to the dramatic increase of electronic communications and transactions worldwide, security has become an indispensable feature of all systems and applications. A vital feature of the security schemes that are used nowadays is authentication, which is achieved using cryptographic hash functions. Hash functions are used as single modules or they are included in hash-based authentication mechanisms such as the Hashed Message Authentication Code (HMAC) [1].

Furthermore, hash functions are used in the Public Key Infrastructure (PKI) [2], Secure Electronic Transactions (SET) [3], and digital signature algorithms like DSA [4], which are used to provide authentication services in commercial applications such as data interchange, electronic mail, and fund transfer. Additionally, hash functions are used in Web protocols such as the Secure Sockets Layer (SSL) and Transport Layer Security (TLS) [5].

The importance of hash functions has been further increased in recent years due to their inclusion in the Internet Protocol Security (IPSec) [6]. IPSec is a compulsory feature of the forthcoming Internet Protocol version 6 (IPv6) [6] and includes encryption and authentication schemes. Encryption is achieved through the block cipher algorithm AES [7], whereas authentication is provided through HMAC, which is built on top of a hash function (e.g. MD-5 [8] or SHA-1 [9]).

However, security problems have been discovered in the MD-5 and SHA-1 functions. Specifically, the MD-5 class of hash functions has been totally broken [10]. On the other hand, concerning SHA-1, although its collision resistance has been reduced [11], the security problems are non-critical. For these reasons, besides SHA-1, the SHA-2 hash family is expected to be adopted as a secure solution in security schemes (e.g. IPSec/IPv6) in the coming years. Nevertheless, the US National Institute of Standards and Technology (NIST) has established a competition for developing the new hash function standard (SHA-3), which was finalized in November 2012 [12]. As the transition to a new standard does not happen immediately, the SHA-1 and SHA-2 functions are expected to continue being used in near- and medium-future applications. In fact, NIST itself reports that SHA-3 is not meant to replace SHA-1 and SHA-2 but to co-exist with them, while many administrators have not yet made the jump from SHA-1 to SHA-2 [13,14].

A crucial issue arising from the above transition from SHA-1 to SHA-2 is that different systems/applications have different needs in terms of the type of authentication. Hence, systems have to be flexible enough to support more than one hash function. Although this requirement can be met by developing systems in which each hash function is implemented by a separate design, a more area-efficient solution is the development of a multi-mode architecture that supports more than one hash function with a single design. Such architectures significantly increase the flexibility of the whole security module, allowing its use in a wide range of applications spreading from
servers (which have to support a set of different hash-type interactions) to end-users that communicate through many communication channels, each of which may employ a different hash function for authentication purposes.

In this paper, two multi-mode architectures, namely SHA-256/512 and SHA-1/256/512, are introduced. They achieve high throughput rates, outperforming all the existing similar ones in terms of the throughput/area cost factor. At the same time, they are area-efficient. Specifically, they occupy less area compared to the corresponding architectures that are derived by simply designing the sole hash cores together (two/three separate cores designed as one module having the same inputs and a multiplexer for selecting the preferred output) and feeding them to a commercial FPGA synthesis/P&R/mapping tool. The introduced designs are able to perform hashing on one message using the SHA-512 algorithm or on two different messages when the SHA-256 or SHA-1 hash function is executed. Several older (Xilinx Virtex, Virtex-II, Virtex-II Pro, and Virtex-4) and modern (Xilinx Virtex-5, Virtex-6, and Virtex-7) FPGA families are used for implementing the above architectures.

Moreover, a systematic design flow for producing multi-mode architectures that implement more than one hash function is proposed. The inputs of the flow are the separate designs and the algorithmic descriptions of the targeted hash functions; following a set of well-defined steps, the individual designs are properly merged to produce the final multi-mode architecture. To keep the area overhead low, extensive resource sharing is performed in the applied steps. Also, exploiting special features of the hash functions, novel techniques are introduced to keep the delay increase low.

Because any architecture for the above hash families is composed of similar functional blocks (adders, non-linear functions, rotations, and logic modules), the proposed flow can be applied to any RTL architecture of the SHA-1 and SHA-2 families. Thus, we apply the proposed flow to both base and optimized designs of the hash functions, collect the corresponding results, and perform the related comparisons. However, it must be stressed that the goal is not to introduce general techniques for developing optimized designs in terms of area, delay, or throughput. It is assumed that the individual designs of each hash function have already been optimized and are then properly merged by the proposed flow. Furthermore, the flow exploits specific features that appear in the SHA-1 and SHA-2 families, and for that reason it is tailored to produce optimized multi-mode architectures for them. It is not a general design flow that could be applied to general-purpose designs.

To the best of our knowledge, this is the first time that (a) an FPGA implementation of a multi-mode design that includes the SHA-1 and SHA-2 hash families, and (b) a systematic design flow for producing multi-mode architectures that implement a set of hash functions are presented.

The paper is organized as follows. In Section 2 the related work is briefly stated. The basic background concerning the SHA-1 and SHA-2 hash families is presented in Section 3, along with a short description of their base and optimized designs. In Section 4 the proposed design flow, along with the production of the above-mentioned base multi-mode architectures, is presented in detail. Subsequently, in Section 5, the corresponding optimized multi-mode architectures (constructed with the introduced flow) are presented. FPGA implementation results, along with the corresponding comparisons with existing implementations, are provided and discussed in Section 6. Finally, Section 7 concludes the paper.

2. Related work

Currently, there are many hardware implementations of hash functions targeting high throughput rates, so as to meet the strict timing constraints of modern applications. In the literature there are numerous hardware implementations for the SHA-1 and SHA-2 families, such as those presented in [15,16–18,31]. In these works, novel architectures have been introduced to design each hash function on its own. However, there are significantly fewer works dealing with the development of multi-mode architectures for the above hash families.

Specifically, in [19–23,29,30,32] multi-mode architectures have been proposed for implementing the SHA-1 and/or SHA-2 family in FPGA and ASIC technology. Besides the above works that focus on the SHA-1/SHA-2 hash family, multi-mode architectures that target different hash functions have also been proposed. In [24] a multi-mode architecture for implementing the MD-5, SHA-1, and RIPEMD160 hash functions was proposed, while in [25] a unified reconfigurable HMAC unit that supports the MD-4, MD-5, SHA-1, and RIPEMD160 hash functions has been introduced. In [26] an HMAC unit that integrates MD-5 and SHA-1 is presented and implemented in FPGA and ASIC technologies. Finally, in [27,28], ASIC implementations of multi-mode architectures that include SHA-1 and SHA-2 hash functions are reported.

Although novel multi-mode architectures have been introduced in the above works, they suffer from the following problems. First, they include algorithms such as MD-4 or MD-5 that have been totally broken [10,11], or they include the RIPEMD160 hash algorithm, whose commercial use is very limited and which is not proposed by NIST. Additionally, to the best of the authors' knowledge, there are no FPGA implementations of multi-mode architectures that support both the SHA-1 and SHA-2 hash families, which are widely used by current applications and are expected to be used by future ones. Such designs are presented only in two works [27,28], which perform only ASIC implementations. Moreover, no systematic approach has been proposed for producing a multi-mode architecture that is efficient in terms of area or delay. Instead, all the introduced architectures are produced in an ad-hoc manner, making it difficult for another designer, who has already developed sole hash cores, to follow a similar procedure and create multi-mode designs having the sole architectures as a starting point.

3. SHA-1/SHA-256 hash function background

A hash function, H(M), operates on an arbitrary-length message, M, and returns a fixed-length output, h, which is called the hash value or message digest of M. The aim of a hash function is to provide a “signature” of M that is unique. Given M, it is easy to compute h if H is known. However, given h, it is hard to compute M such that H(M) = h, even when H is known. Hash functions are iterative in nature and include two stages: pre-processing and computation [9].

The main characteristics of the targeted hash families, namely SHA-1 and SHA-2 (SHA-224/256/384/512), are shown in Table 1. They include simple processing elements, such as arithmetic computations (e.g. additions and non-linear functions) and bit-level shifts/rotations [9].

As reported in the standard [9], the SHA-224 and SHA-384 hash functions are computed in the same way as SHA-256 and SHA-512, respectively, using different initial hash values and having their outputs truncated. Hence, they are omitted from the paper's analysis. However, they are considered both in the design flow and in the final designs, where a simple truncation of the output is performed for these two functions. Additionally, it should be noted that throughout the paper the terms hash function and hash algorithm are used interchangeably.

A number of non-linear functions are applied on the w-bit words that are represented as x, y, and z in the following.
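The block length k and digest length n that Table 1 lists for each function can be checked in software. The following is a minimal sketch that assumes Python's standard `hashlib` module as a reference implementation; the helper name `block_and_digest_bits` is introduced here for illustration only and is not part of the paper's hardware flow.

```python
import hashlib

# (block length k, digest length n) in bits, as listed in Table 1 of the paper.
TABLE_1 = {
    "sha1": (512, 160),
    "sha256": (512, 256),
    "sha224": (512, 224),
    "sha512": (1024, 512),
    "sha384": (1024, 384),
}


def block_and_digest_bits(name, message):
    """Return (k, n) in bits for hash function `name` applied to `message`."""
    h = hashlib.new(name, message)
    return h.block_size * 8, len(h.digest()) * 8


if __name__ == "__main__":
    for name, expected in TABLE_1.items():
        # The digest length stays fixed although the message length varies.
        for message in (b"", b"abc", b"x" * 10_000):
            assert block_and_digest_bits(name, message) == expected
        print(name, expected)
```

The check only confirms the sizes; it says nothing about the area or throughput of a hardware implementation.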
Table 1
SHA-1 and SHA-2 characteristics.

Function   j   Message block   Word length   Hash value   Iterations
               (k bits)        (w bits)      (n bits)     (t_max^j)
SHA-1      1   512             32            160          80
SHA-256    2   512             32            256          64
SHA-224    –   512             32            224          64
SHA-512    3   1024            64            512          80
SHA-384    –   1024            64            384          80

\[
\Sigma_0^{j}(x) = \mathrm{ROTR}^{\alpha_{1,j}}(x) \oplus \mathrm{ROTR}^{\alpha_{2,j}}(x) \oplus \mathrm{ROTR}^{\alpha_{3,j}}(x), \quad j = 2, 3, \qquad
(\alpha_{1,j}, \alpha_{2,j}, \alpha_{3,j}) =
\begin{cases}
(2, 13, 22), & j = 2 \\
(28, 34, 39), & j = 3
\end{cases}
\tag{4}
\]

\[
\Sigma_1^{j}(x) = \mathrm{ROTR}^{\alpha_{4,j}}(x) \oplus \mathrm{ROTR}^{\alpha_{5,j}}(x) \oplus \mathrm{ROTR}^{\alpha_{6,j}}(x), \quad j = 2, 3, \qquad
(\alpha_{4,j}, \alpha_{5,j}, \alpha_{6,j}) =
\begin{cases}
(6, 11, 25), & j = 2 \\
(14, 18, 41), & j = 3
\end{cases}
\tag{5}
\]

3.2. Constants $K_t^j$ and initial values $H^{(0),j}$

Constant values, $K_t^j$ ($0 \le t \le t_{max}^j - 1$), are used in each transformation round. Thus, the SHA-1 algorithm uses 80 32-bit constants, $K_0^1, K_1^1, \ldots, K_{79}^1$, which take only four distinct values, one for each group of 20 consecutive iterations. Similarly, the SHA-256 function uses 64 32-bit constants, $K_0^2, K_1^2, \ldots, K_{63}^2$ (the first 32 bits of the fractional parts of the cube roots of the first 64 primes), whereas the SHA-512 function employs 80 64-bit constants, $K_0^3, K_1^3, \ldots, K_{79}^3$ (the first 64 bits of the fractional parts of the cube roots of the first 80 primes).

Concerning the initial values, the SHA-1 algorithm employs five 32-bit initial values, $H_0^{(0),1}, H_1^{(0),1}, \ldots, H_4^{(0),1}$, which are used in the first iteration (t = 0) of the transformation round. On the other hand, SHA-256 and SHA-512 use eight w-bit initial values, $H_0^{(0),j}, H_1^{(0),j}, \ldots, H_7^{(0),j}$ (j = 2 and w = 32 for SHA-256; j = 3 and w = 64 for SHA-512). For all algorithms, the constant and initial values are provided by the standard [9].

3.3. Pre-processing stage

The pre-processing stage includes the padding and parsing of the initial message M. Padding is a procedure in which additional bits are appended to M so that its size in bits becomes a multiple of 512 for SHA-1 and SHA-256, or of 1024 for the SHA-512 algorithm. Since padding is a simple procedure, it is usually implemented in software without affecting the security level of the implementation. For more information about padding the reader is referred to the standard [9].

During parsing, the padded message is separated into N k-bit blocks denoted as $M^{(1)}, M^{(2)}, \ldots, M^{(N)}$, with k taken according to Table 1. Concerning the SHA-1 and SHA-256 algorithms, since the 512-bit block can be expressed as 16 32-bit words, the first 32 bits of message block i are denoted as $M_0^{(i)}$, the next 32 bits as $M_1^{(i)}$, and so on up to $M_{15}^{(i)}$. A similar procedure is followed for SHA-512, with the exception that each $M_t^{(i)}$ is a 64-bit word.

Step 1. Message schedule preparation.

\[
W_{t,j} =
\begin{cases}
M_t^{(i)}, & 0 \le t \le 15 \quad (j = 1, 2, 3) \\
\mathrm{ROTL}^{1}(W_{t-3} \oplus W_{t-8} \oplus W_{t-14} \oplus W_{t-16}), & 16 \le t \le t_{max}^j - 1 \quad (j = 1) \\
\sigma_1^{j}(W_{t-2}) + \sigma_0^{j}(W_{t-15}) + W_{t-7} + W_{t-16}, & 16 \le t \le t_{max}^j - 1 \quad (j = 2, 3)
\end{cases}
\tag{8}
\]

\[
[a \,\|\, b \,\|\, c \,\|\, d \,\|\, e] = [T_1 \,\|\, a \,\|\, \mathrm{ROTL}^{30}(b) \,\|\, c \,\|\, d]
\tag{10}
\]

where || denotes the concatenation operation. According to the iteration number t, the function $f_t(b, c, d)$ equals $Ch(b, c, d)$ when $0 \le t \le 19$, $Parity(b, c, d)$ when $20 \le t \le 39$ and $60 \le t \le 79$, and $Maj(b, c, d)$ when $40 \le t \le 59$.

The computations of SHA-256 and SHA-512 (given below) are the same for both algorithms, with the exception that SHA-256 performs 32-bit data processing whereas SHA-512 operates on 64-bit words.

For t = 0 to $t_{max}^j - 1$ (j = 2, 3)

\[
T_1 = h + \Sigma_1^{j}(e) + Ch(e, f, g) + K_t^j + W_{t,j}
\tag{11}
\]

\[
T_2 = \Sigma_0^{j}(a) + Maj(a, b, c)
\tag{12}
\]

\[
[a \,\|\, b \,\|\, c \,\|\, d \,\|\, e \,\|\, f \,\|\, g \,\|\, h] = [T_1 + T_2 \,\|\, a \,\|\, b \,\|\, c \,\|\, d + T_1 \,\|\, e \,\|\, f \,\|\, g]
\tag{13}
\]

Step 4. Computation of the i-th intermediate hash value $H^{(i)}$.

\[
H^{(i),j} = a + H_0^{(i-1),j} \,\|\, b + H_1^{(i-1),j} \,\|\, \ldots \,\|\, e + H_4^{(i-1),j} \quad (j = 1)
\tag{14}
\]

\[
H^{(i),j} = a + H_0^{(i-1),j} \,\|\, b + H_1^{(i-1),j} \,\|\, \ldots \,\|\, h + H_7^{(i-1),j} \quad (j = 2, 3)
\tag{15}
\]
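As a software cross-check of the SHA-256 case (j = 2), the sketch below strings together padding, parsing, the message schedule of Eq. (8), the round of Eqs. (11)–(13) and the final addition of Eq. (15). It is a minimal illustration, not the paper's hardware design. Several parts are assumptions taken from the standard [9] rather than from the text above: the padding layout, the σ0/σ1 rotation amounts, the update e = d + T1, and the derivation of the constants and initial values from prime roots. All function names are chosen here for illustration.

```python
import hashlib
import struct
from math import isqrt

MASK = 0xFFFFFFFF  # w = 32 bits for SHA-256 (j = 2)


def rotr(x, n):
    """Rotate the 32-bit word x right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK


def first_primes(count):
    """First `count` primes by trial division."""
    primes, n = [], 2
    while len(primes) < count:
        if all(n % p for p in primes):
            primes.append(n)
        n += 1
    return primes


def icbrt(n):
    """Floor of the cube root of a non-negative integer (binary search)."""
    lo, hi = 0, 1 << (n.bit_length() // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


# Assumption from the standard [9]: fractional bits of prime cube/square roots.
K = [icbrt(p << 96) & MASK for p in first_primes(64)]
H0 = [isqrt(p << 64) & MASK for p in first_primes(8)]


def big_sigma0(x):  # Eq. (4), j = 2
    return rotr(x, 2) ^ rotr(x, 13) ^ rotr(x, 22)


def big_sigma1(x):  # Eq. (5), j = 2
    return rotr(x, 6) ^ rotr(x, 11) ^ rotr(x, 25)


def small_sigma0(x):  # rotation amounts taken from the standard [9]
    return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)


def small_sigma1(x):  # rotation amounts taken from the standard [9]
    return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10)


def ch(x, y, z):
    return (x & y) ^ (~x & z)


def maj(x, y, z):
    return (x & y) ^ (x & z) ^ (y & z)


def pad(message):
    """Append a 1 bit, zeros and the 64-bit length: a multiple of 512 bits."""
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded)) % 64)
    return padded + struct.pack(">Q", 8 * len(message))


def sha256_sketch(message):
    h = list(H0)
    data = pad(message)
    for i in range(0, len(data), 64):  # parsing into 512-bit blocks
        w = list(struct.unpack(">16I", data[i:i + 64]))
        for t in range(16, 64):  # Eq. (8), j = 2
            w.append((small_sigma1(w[t - 2]) + small_sigma0(w[t - 15])
                      + w[t - 7] + w[t - 16]) & MASK)
        a, b, c, d, e, f, g, hh = h
        for t in range(64):
            t1 = (hh + big_sigma1(e) + ch(e, f, g) + K[t] + w[t]) & MASK  # Eq. (11)
            t2 = (big_sigma0(a) + maj(a, b, c)) & MASK                    # Eq. (12)
            # Eq. (13): only a and e are newly computed, the rest are shifted.
            a, b, c, d, e, f, g, hh = ((t1 + t2) & MASK, a, b, c,
                                       (d + t1) & MASK, e, f, g)
        h = [(x + y) & MASK for x, y in zip((a, b, c, d, e, f, g, hh), h)]  # Eq. (15)
    return b"".join(struct.pack(">I", x) for x in h)
```

Comparing `sha256_sketch(m)` with `hashlib.sha256(m).digest()` for a few messages is a quick way to confirm that the equations have been transcribed consistently.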
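The SHA-1 case (j = 1) can be sketched in the same software form: the message schedule of Eq. (8), the selection of $f_t$ by iteration range, the working-variable update of Eq. (10) and the final addition of Eq. (14). This is a minimal illustration under stated assumptions. The expression for $T_1$, the four round constants, the five initial values and the padding layout are taken from the standard [9] and do not appear in the text above. The function names are hypothetical.

```python
import hashlib
import struct

MASK = 0xFFFFFFFF  # w = 32 bits for SHA-1 (j = 1)

# Assumption from the standard [9]: four distinct round constants, one per
# group of 20 iterations, and five initial hash values.
K1 = (0x5A827999, 0x6ED9EBA1, 0x8F1BBCDC, 0xCA62C1D6)
H0 = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0)


def rotl(x, n):
    """Rotate the 32-bit word x left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK


def f(t, b, c, d):
    """Non-linear function selected by the iteration number t."""
    if t <= 19:
        return (b & c) ^ (~b & d)            # Ch
    if 40 <= t <= 59:
        return (b & c) ^ (b & d) ^ (c & d)   # Maj
    return b ^ c ^ d                         # Parity (20..39 and 60..79)


def pad(message):
    """Append a 1 bit, zeros and the 64-bit length: a multiple of 512 bits."""
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded)) % 64)
    return padded + struct.pack(">Q", 8 * len(message))


def sha1_round(t, state, w_t):
    """One transformation round; T1 as defined in the standard [9]."""
    a, b, c, d, e = state
    t1 = (rotl(a, 5) + f(t, b, c, d) + e + K1[t // 20] + w_t) & MASK
    # Eq. (10): only a and c are newly computed, the rest are shifted.
    return (t1, a, rotl(b, 30), c, d)


def sha1_sketch(message):
    h = H0
    data = pad(message)
    for i in range(0, len(data), 64):  # parsing into 512-bit blocks
        w = list(struct.unpack(">16I", data[i:i + 64]))
        for t in range(16, 80):  # Eq. (8), j = 1
            w.append(rotl(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))
        state = h
        for t in range(80):
            state = sha1_round(t, state, w[t])
        h = tuple((x + y) & MASK for x, y in zip(state, h))  # Eq. (14)
    return b"".join(struct.pack(">I", x) for x in h)
```

Because `sha1_round` returns the old a, c and d unchanged in the b, d and e positions, it also makes visible the property used later by the flow: only two working variables need new logic in each round.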
not that complex. Hence, we choose to demonstrate the introduced flow from the point that the base SHA-256/512 design is merged with the base SHA-1 design. The construction of the corresponding optimized designs is accomplished similarly, following the introduced flow. The resulting optimized architectures are presented in Section 5.

4.1. Special features of SHA-1, SHA-256, and SHA-512 hash algorithms

Taking into account Eqs. (8)–(13), which describe the transformation rounds and message scheduling procedures, it is easily derived that the major features of the targeted hash functions are the similarity and the relative simplicity of the performed computations. The similarity of the computations is a general feature that holds for both families, while the simplicity of the computations results in designs of low complexity, which reduces the overall complexity of the flow, as will be explained in the following.

Regarding the transformation rounds, it is easily derived that only two working variables are modified during each iteration, whereas the remaining ones stay intact. Particularly, in the SHA-1 algorithm only the variables a and c are modified, and in the SHA-256 and SHA-512 algorithms only the variables a and e are computed. This feature is important because it considerably reduces the complexity of the flow and the area of the final multi-mode architectures. Hence, the effort is mainly focused on merging a small number of sub-circuits, particularly those that correspond to the modified variables.

Additionally, the operations of the transformation rounds include simple modulo $2^w$ additions, simple bitwise logical computations in the non-linear functions, and rotation/shift operations with a fixed number of rotated/shifted bits. Hence, exploiting the associativity and commutativity laws of addition and the low delay
of the non-linear functions and shifts/rotations, the merging of more than one function becomes more flexible. In other words, the above features allow the reuse of additions and/or non-linear functions to compute the modified variables of the other hash functions. In that way, the delay overhead on the multi-mode design is reduced and resource sharing is easily applied. The above facts are also valid in the case of the message schedule procedures of the targeted algorithms, because they have similar features to those of the transformation rounds.

4.2. Overall structure of the systematic design flow

In Fig. 4 the overall structure of the proposed flow is illustrated. The inputs are the targeted hash algorithms and their individual designs in the form of dependency graphs (block diagrams), where each node denotes a hardware resource and an edge corresponds to an interconnection among nodes or between primary I/Os and nodes. The output is a multi-mode architecture that implements a set of hash functions.

[Fig. 4: flow diagram. Stage 1: Initialization. Stage 2: Multi-mode transformation round (construction of the multi-mode paths). Stage 3: Multi-mode W_t unit (construction of the multi-mode paths). Stage 4: Multi-mode control unit.]

It must be mentioned that the goal of the introduced flow is not to propose techniques for developing an optimized design for each one of the targeted hash functions. We assume that the design of each hash function has already been developed and we use it as input to the flow.

The flow consists of four main stages. In the first stage, an initialization takes place. Specifically, the number of pipeline stages of the input designs is identified, so that the next stages are repeated the same number of times. Additionally, the input designs are compared and the bigger graph, in terms of the employed I/Os and computational resources, is selected to be used as the base design.

In the second stage the development of the multi-mode transformation round of each pipeline stage takes place. To achieve this, an iterative procedure is followed, considering two functions each time. In particular, we use the design of the most complex transformation round as the base design and we try to merge the design of the second function with it. After the development of the multi-mode transformation round for the first two algorithms, the latter is used as the base, and the round of a third function is chosen for merging. This procedure is repeated for all remaining functions.

Next, the development of the multi-mode message schedule unit takes place, where an approach similar to that applied for producing the multi-mode transformation rounds is followed.

Finally, at the fourth stage, the development of the control unit is performed. Based on the developed multi-mode round and message schedule units, the required control signals are specified. Thus, the FSMs of each design are properly modified to develop the FSM of the multi-mode architecture.

In the following sections, the above stages are presented in detail through a running example using the SHA-1 and SHA-256/512 algorithms. The similarities between the SHA-256 and SHA-512 functions are many, so the creation of their multi-mode design is not that complex. Hence, we choose to demonstrate the introduced flow from the point that the SHA-256/512 design is merged with the SHA-1 design.

The inputs of the flow are the SHA-1 and SHA-256/512 designs with 4 pipeline stages. As mentioned above, in order to clearly describe the flow, the individual designs (base designs) are derived straightforwardly from the description of the corresponding algorithms. Based on Eqs. (9)–(14) and Sub-section 4.1, the designs of the transformation rounds can be easily derived. The corresponding graph for SHA-1 is shown in Fig. 1(a). The round's graph of SHA-256/512 is almost the same as the one shown in Fig. 1(b). As mentioned above, SHA-256 and SHA-512 are extremely similar. Thus, the multi-mode transformation round of SHA-256/512 is almost identical to the one of SHA-256, having however a double-sized data-path (64 bits). Additionally, the Σ0 and Σ1 boxes include the non-linear functions of both functions. More details are given in the following sub-sections.

As shown in Fig. 4, the initialization stage includes three steps. Firstly, the input graphs are compared and the bigger one, in terms of the employed I/Os and computational resources, is selected to be used as the base. It is obvious that the more complex one is the SHA-256/512 graph. Hence it is considered as the base design. Then the number of pipeline stages of the designs is identified. Hence, the whole procedure of the design flow will be repeated 4 times, equal to the number of pipeline stages. However, it must be stressed that the introduced flow can be applied to any architecture independently of the number of pipeline stages.
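The initialization stage and the iterative pairing of Stage 2 can be summarized as a small planning routine. The sketch below is only an illustration of the ordering logic. The `DesignGraph` record, its size metric (resources first, then I/Os), the requirement that all inputs share the same pipeline depth, and the resource counts used in the usage example are assumptions made for this example and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesignGraph:
    """Hypothetical summary of one input design's dependency graph."""
    name: str
    io_count: int         # primary inputs and outputs
    resource_count: int   # adders, non-linear blocks, rotators, ...
    pipeline_stages: int


def plan_flow(designs):
    """Return (base design name, ordered merge steps, repetitions of the flow)."""
    stage_counts = {d.pipeline_stages for d in designs}
    if len(stage_counts) != 1:
        raise ValueError("this sketch assumes equal pipeline depth for all inputs")
    # Stage 1: the biggest graph becomes the base design.
    ordered = sorted(designs, key=lambda d: (d.resource_count, d.io_count),
                     reverse=True)
    base = ordered[0].name
    # Stage 2: merge one further function at a time into the current base.
    merges, current = [], base
    for design in ordered[1:]:
        merges.append((current, design.name))
        current = f"{current}+{design.name}"
    # The flow is repeated once per pipeline stage.
    return base, merges, stage_counts.pop()


if __name__ == "__main__":
    inputs = [
        DesignGraph("SHA-1", io_count=7, resource_count=9, pipeline_stages=4),
        DesignGraph("SHA-256/512", io_count=10, resource_count=14, pipeline_stages=4),
    ]
    print(plan_flow(inputs))
```

With the illustrative numbers above, SHA-256/512 is selected as the base and the flow is repeated four times, which matches the running example of the text.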
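Sub-section 4.1 relies on the associativity and commutativity of modulo $2^w$ addition to regroup adders during merging. The following minimal sketch checks that property numerically for a five-operand sum shaped like $T_1$ in Eq. (11). The particular regrouping shown is an arbitrary example chosen here, not the grouping used in the paper's figures, and the function names are hypothetical.

```python
import random


def add_mod(w, *terms):
    """Addition modulo 2**w of any number of w-bit operands."""
    return sum(terms) & ((1 << w) - 1)


def t1_chain(w, h, sigma1_e, ch_efg, k_t, w_t):
    """Operands accumulated strictly left to right, one adder after another."""
    acc = h
    for term in (sigma1_e, ch_efg, k_t, w_t):
        acc = add_mod(w, acc, term)
    return acc


def t1_regrouped(w, h, sigma1_e, ch_efg, k_t, w_t):
    """Same operands, different adder grouping and operand order."""
    return add_mod(w, add_mod(w, k_t, w_t),
                   add_mod(w, h, add_mod(w, sigma1_e, ch_efg)))


def groupings_agree(w, trials=1000, seed=0):
    """Compare both groupings on random w-bit operands."""
    rng = random.Random(seed)
    for _ in range(trials):
        operands = [rng.getrandbits(w) for _ in range(5)]
        if t1_chain(w, *operands) != t1_regrouped(w, *operands):
            return False
    return True
```

Since every grouping yields the same w-bit result, a merging step is free to route operands through whichever existing adders keep the added delay and steering logic small.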
pipeline stage. This procedure is divided into three steps. First, the development of the sub-circuits (circuit paths) required to realize the computations of the transformation rounds takes place, ignoring the assignment of primary I/Os. Then, the assignment of the primary I/Os is performed, and finally the possibly different word lengths of the involved algorithms are handled (Fig. 5).

Fig. 5. Stage 2: multimode transformation round. [Flowchart with three phases. Circuit paths development: identical paths in the bigger round? similar paths in the bigger round? create new path; create candidate multi-path(s) and check for resource sharing with the bigger round; establish resource sharing (steering logic); are there paths in the list? I/Os assignment: for every candidate multi-path identify the I/O positions on the initial paths; same I/Os on the initial paths? apply in the candidate multi-paths; steering logic; adopt the most efficient multi-path of each group. Word-length handling: different width? possible multiple instances of the smaller hash graph; enable feature; add appropriate resources.]

4.4.1.1. Multi-mode paths development. Initially, the circuit paths of the smaller graph (i.e. SHA-1) are ranked in terms of their contribution to the delay and occupied area. Then, the first path of the list is selected and we try to merge it with the resources existing in the base design, which in our case is that of SHA-256/512. Afterwards, the procedure is repeated, choosing the next path of the list, until all paths of the small graph are included in the base design.

The merging of the circuit paths is performed by examining whether there is an identical or similar path in the base graph and adding the required steering logic (e.g. multiplexers). If there is no similar path, a new circuit path is inserted in the base design with the additional required circuitry. Two circuit paths are considered similar when there is a one-to-one correspondence between the circuit modules and interconnections of the paths; in other words, the graphs of the two paths are isomorphic. To avoid an overlong presentation, only the development of the first multi-mode transformation round is used as a running example.

Studying the two input transformation rounds, it is easily derived that the first selected path of SHA-1 is that shown in Fig. 6(a), whereas investigating the graph of SHA-256/512 (base graph) two identical paths exist, as shown in Fig. 6(b) and (c). The split signal is omitted for clarity. Thus, there are two candidate multi-paths for merging the first selected path of SHA-1 with the current base design. Because the assignment of the inputs and outputs of the candidate paths is ignored during the first sub-stage of the flow, the two candidate multi-paths are shown in Fig. 7 using general names for the I/Os. It should be mentioned that in Fig. 6(a) the Ch(d, c, b) function is used as the non-linear function because this is the one executed in the first 20 iterations, which are assigned to the first pipeline stage.

Then, the second path (Fig. 8(a)) of SHA-1 is selected from the list. Studying the graph of SHA-256/512, we also find an identical path (Fig. 8(b)), and by combining it with the currently selected path of SHA-1 a third candidate multi-path is produced, as shown in Fig. 8(c). Although this procedure is repeated for all paths of the SHA-1 graph, we will use the above paths to present the next steps of the flow. The next step is the study of the generated multi-paths together with the paths existing in the base design, in order to perform resource sharing and keep the delay penalty low.

4.4.1.2. Resource sharing with the base design. The two selected paths of the SHA-1 graph (Figs. 6(a) and 8(a)) are used together for the computation of the $a_{SHA-1}$ variable, as shown in Fig. 1. In addition, they are used for the production of the three candidate multi-paths shown in Figs. 7 and 8(c). Specifically, the first path of the SHA-1 graph (Fig. 6(a)) is included in the first two multi-paths (Fig. 7), whereas the second path of the SHA-1 graph (Fig. 8(a)) is included in the third candidate multi-path (Fig. 8(c)).

Therefore, the computation of the $a_{SHA-1}$ variable can be implemented by combining either the first and third multi-paths or the second and third multi-paths. To keep the delay and area overheads low, the pair of multi-paths that is selected has to be properly merged with the resources existing in the base design (SHA-256/512 transformation round).
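The path-matching step of Stage 2 (look for identical or similar paths in the base graph, otherwise insert a new path) can be sketched with a brute-force isomorphism test on very small labeled graphs. This is a minimal sketch under stated assumptions: the path encoding (a list of module labels plus a set of directed edges), the function names, and the example paths are hypothetical and do not reproduce the actual graphs of Figs. 6 to 8.

```python
from itertools import permutations


def isomorphic(path_a, path_b):
    """True if a one-to-one mapping preserves module types and interconnections.

    A path is (labels, edges): labels[i] is the module type of node i and
    edges is a set of (source, destination) node pairs. Brute force, so it is
    only meant for the handful of nodes found in a single circuit path.
    """
    labels_a, edges_a = path_a
    labels_b, edges_b = path_b
    if sorted(labels_a) != sorted(labels_b) or len(edges_a) != len(edges_b):
        return False
    n = len(labels_a)
    for perm in permutations(range(n)):
        same_modules = all(labels_a[i] == labels_b[perm[i]] for i in range(n))
        if same_modules and {(perm[u], perm[v]) for u, v in edges_a} == set(edges_b):
            return True
    return False


def merge_options(base_paths, new_path):
    """List the candidate multi-paths for `new_path`, or request a new path."""
    matches = [i for i, p in enumerate(base_paths) if isomorphic(p, new_path)]
    if matches:
        # Each match is one candidate multi-path that needs steering logic.
        return [("share_with_mux", i) for i in matches]
    return [("new_path", len(base_paths))]


if __name__ == "__main__":
    # Hypothetical example: a chain of three adders in the smaller graph.
    small_path = (["add", "add", "add"], {(0, 1), (1, 2)})
    base_paths = [
        (["add", "add", "add"], {(2, 1), (1, 0)}),   # same chain, renumbered
        (["add", "add", "add"], {(0, 1), (1, 2)}),
        (["add", "nl", "add"], {(0, 2), (1, 2)}),
    ]
    print(merge_options(base_paths, small_path))
```

In the hypothetical example two base paths match, giving two candidate multi-paths, which mirrors the situation described for the first selected SHA-1 path. Choosing between candidates (delay, area, I/O assignment) is left to the later steps of the flow.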
Fig. 6. The first selected path of SHA-1 graph (a), identical paths on SHA256/512 graph (b) and (c).
Fig. 8. The second selected path of SHA-1 graph (a), identical path on SHA256/512 graph (b), the third candidate multi-path (c).
Fig. 10. First (a) and second (b) multi-paths after I/Os assignment.

Fig. 11. Merged SHA-1/256/512 multi-paths with resource sharing and I/Os assignment.

Fig. 12. Multi-mode transformation round of the first pipeline stage.

the first pipeline stage of the architecture of Fig. 2 is produced and it is depicted in Fig. 12.

Repeating the above procedure, the multi-mode transformation rounds for the second, third, and fourth pipeline stages are produced and illustrated in Fig. 13.

It should be mentioned that the multi-mode transformation rounds of the second and fourth pipeline stages are identical. This happens because in these pipeline stages the assigned transformation rounds of the SHA-1 function are identical. Additionally, due to the fact that the Parity(x,y,z) non-linear function is used in these
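When two SHA-1 or SHA-256 messages are hashed concurrently, the paper stores two 32-bit W values in each 64-bit position of the message schedule shift register, the first message in the 32 most significant bits and the second in the 32 least significant bits. The sketch below illustrates that packing and a pairwise next-W computation for SHA-1. It is a software illustration only. The register indexing (position 0 holds the oldest value), the function names, and the choice of SHA-1 for the example are assumptions made here.

```python
MASK32 = 0xFFFFFFFF


def pack_pair(w_first, w_second):
    """First message in the 32 MSBs, second message in the 32 LSBs."""
    return ((w_first & MASK32) << 32) | (w_second & MASK32)


def unpack_pair(word64):
    """Split a 64-bit register position back into its two 32-bit W values."""
    return (word64 >> 32) & MASK32, word64 & MASK32


def interleave_blocks(block_a, block_b):
    """Fill the 16 64-bit positions from two blocks of 16 32-bit words each."""
    if len(block_a) != 16 or len(block_b) != 16:
        raise ValueError("each block must hold 16 words")
    return [pack_pair(a, b) for a, b in zip(block_a, block_b)]


def rotl32(x, n):
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def next_w_pair_sha1(reg):
    """Next packed W pair for SHA-1, one value per message.

    Assumed layout: reg[0] holds W(t-16) and reg[15] holds W(t-1). XOR acts
    independently on both halves, but the rotation is applied per half.
    """
    mixed = reg[13] ^ reg[8] ^ reg[2] ^ reg[0]
    first, second = unpack_pair(mixed)
    return pack_pair(rotl32(first, 1), rotl32(second, 1))
```

After each step the register would drop position 0 and append the new pair, so both schedules advance by one W value per clock cycle.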
12 H.E. Michail et al. / INTEGRATION, the VLSI journal ∎ (∎∎∎∎) ∎∎∎–∎∎∎
inserted sequentially (i.e. first the message block of the first message and then the message block of the second one). This way, concurrent hash computation would be impossible. Instead, the 16 message-dependent W values of each block are interleaved. Specifically, in each of the 16 64-bit positions of the shift register, two 32-bit W values are stored: the 32-bit W value of the first message block in the 32 most significant bits of the position, and the corresponding 32-bit W value of the second message block in the remaining 32 (less significant) bits. Hence, at each clock cycle, two W values are fed into the transformation round, one from the first message block and one from the second. Consequently, as can be observed in Fig. 13, the remaining W values are calculated by the Wnext Logic block in pairs (one for the first hash computation and one for the second), so that they can be stored in the same manner as described above.
4.6. Stage 4: multi-mode control unit
The control logic for the final multi-mode hash design is constructed by taking the control unit with the greatest number of states and modifying it so that it can control all the incorporated hash functions. Beyond that, the resulting FSM, along with the 4 counters, is designed to provide on-the-fly functionality. Specifically, each of the 4 pipeline stages (including its transformation round, its W computation unit, and the corresponding counter module) can realize a different hash computation, and the stages perform these computations concurrently. For example, the first stage can perform a SHA-512 computation while, at the same time, the second one performs 2 concurrent SHA-1 or SHA-256 ones. This way, the pipeline structure, together with the multi-mode principle, is fully exploited. This is achieved by defining specific flag bits that indicate the current, previous, and forthcoming hash functions. By exploiting this information in the resulting FSM, the need to stall the hash computation of one stage until the next stage finalizes its computation (when the latter needs to iterate more times) is avoided.
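The interleaved storage of W values described in Section 4.5 can be illustrated with a small software model. The following Python sketch is only an illustration under stated assumptions, not the authors' VHDL: the names `pack_w`, `unpack_w`, and `interleave_blocks` are hypothetical, and the 16-position shift register is modelled as a plain list of integers.

```python
from typing import List, Tuple

MASK32 = 0xFFFFFFFF  # mask selecting one 32-bit word


def pack_w(w_first: int, w_second: int) -> int:
    """Pack two 32-bit W values into one 64-bit register position.

    The W value of the first message block occupies the 32 most
    significant bits; the W value of the second block occupies the
    32 least significant bits.
    """
    if not (0 <= w_first <= MASK32 and 0 <= w_second <= MASK32):
        raise ValueError("W values must fit in 32 bits")
    return (w_first << 32) | w_second


def unpack_w(position: int) -> Tuple[int, int]:
    """Split a 64-bit position back into (first-block W, second-block W)."""
    return (position >> 32) & MASK32, position & MASK32


def interleave_blocks(block_a: List[int], block_b: List[int]) -> List[int]:
    """Model the 16 x 64-bit shift register holding two 512-bit blocks."""
    if len(block_a) != 16 or len(block_b) != 16:
        raise ValueError("each 512-bit block must supply 16 32-bit words")
    return [pack_w(a, b) for a, b in zip(block_a, block_b)]


if __name__ == "__main__":
    first = list(range(16))                  # illustrative data only
    second = [MASK32 - i for i in range(16)]
    register = interleave_blocks(first, second)
    # Each read of the register yields one W value per message block.
    for pos, a, b in zip(register, first, second):
        assert unpack_w(pos) == (a, b)
    print(hex(register[1]))
```

In this model a single read of one position delivers both W values, which mirrors how the transformation round receives one value from each message block per clock cycle.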
over-length text, in this section we present only the more complex SHA-1/256/512 architecture.
5.1. Multi-mode transformation rounds
As the design flow dictates, we first constructed the multi-mode transformation rounds. The input designs were 4-stage pipelined. Hence, the flow was repeated 4 times, once for each pipeline stage. The resulting multi-mode rounds are depicted in Figs. 15–17. Similarly to the sole optimized architectures, each one of the multi-mode rounds includes a Pre- and a Post-Computation stage, separated by a register. It must be stressed that in all the above figures, the split signal (responsible for choosing between two 32-bit operations or one 64-bit operation) is omitted from the detailed presentation of the rounds for clarity. However, it exists and its role is the same as described in Section 3.
A key point of the resulting optimized multi-mode transformation rounds is the splitting of the Carry-Save Adders (CSAs) of the second and fourth rounds (Fig. 17). This happens during the “resource-sharing” sub-stage of the flow and it is accomplished by
Fig. 17. Second and fourth multi-mode round of optimized SHA-1/256/512 architecture.
splitting the carry-save tree from the final addition of the CSA, so that the former can be used as the Par non-linear function when the SHA-1 function is selected. This choice slightly affects the overall round's delay. However, it increases the area gains because (a) no extra logic is added for the non-linear function, and (b) otherwise the output of that path would have had to be assigned to different output registers (depending on the selected function) during the flow's “I/O assignment”, leading to more steering logic that increases not only the area but the delay as well. For the same reasons, in the third multi-mode round (Fig. 16), an extra Maj box is included.
5.2. Multi-mode message scheduling units
Similarly to the procedure followed in the construction of the multi-mode base designs, after the multi-mode transformation rounds, the multi-mode Wt units are constructed. Due to the parametric loop unrolling exploited in the initial optimized designs, two W values are fed simultaneously per clock cycle into a transformation round. Thus, in the initial optimized Wt units, two W values are produced at the same time. Therefore, the corresponding Wnext logic is doubled.
The above fact does not affect the application of the introduced design flow. Following the stages as reported in Section 4, we constructed a Wt unit able to support the processing of either two independent 512-bit message blocks (when SHA-1 or SHA-256 is selected) or one 1024-bit message block (when SHA-512 is selected). The unit (Fig. 18), in any case, produces two 64-bit Wt quantities per clock cycle. If the SHA-512 function is selected, each such quantity represents one 64-bit Wt value. If the SHA-1 or the SHA-256 function is selected, each quantity holds two 32-bit Wt values, one for each of the two independent 512-bit message blocks.
5.3. Multi-mode control units
Finally, the control logic for the final multi-mode hash design is constructed. Similarly to the base designs, the control unit with the greatest number of states is considered and is modified so as to be able to control all the incorporated hash function algorithms.
6. Experimental results
The introduced SHA-256/512 and SHA-1/256/512 multi-mode architectures were captured in VHDL and implemented in a wide range of Xilinx FPGAs. Specifically, six Xilinx families are considered, three older ones: Virtex (xcv1000-6FG680), Virtex-II (xc2v6000-6FF1517), and Virtex-4 (xc4vlx100-12FF1148), and three modern ones: Virtex-5 (xc5vlx155t-3FF1136), Virtex-6 (xc6vlx240t-3FF784), and Virtex-7 (xc7v855t-3FFG1157). It should be mentioned that the Virtex and Virtex-II technologies were selected only for comparison with existing similar works.
The XST synthesis tool of Xilinx ISE Design Suite (version 13.1) was used for mapping the designs to the FPGA devices. The functionality of the implementations was initially verified via Post-Place-and-Route simulations using the Mentor Graphics ModelSim simulator. A large set of test vectors, in addition to those provided by the standard, was used. Additional functional and timing verification was also performed by downloading the designs to development boards.
The quality of the introduced architectures was measured in terms of frequency, throughput, area, and the throughput/area cost factor. The throughput of each hash function is calculated as
$$\text{Throughput} = \frac{\#\text{bits} \times f}{\#\text{cycles}} \qquad (16)$$
where #bits is the number of bits of the incoming message (512 bits for SHA-1 and SHA-256, and 1024 for SHA-512), f is the operating frequency, and #cycles is the number of clock cycles spent in each case (20/10 for SHA-1 and SHA-512, and 16/8 for SHA-256, for the base/optimized architectures).
As mentioned in Section 4, there is a word-length mismatch between the SHA-1, SHA-256, and SHA-512 functions. Specifically, the SHA-512 function operates on 64-bit data, while the SHA-1
Table 2
Frequency of the base and optimized architectures.
and SHA-256 ones operate on 32 bits. This mismatch is handled, to the benefit of the proposed multi-mode architectures, by merging two SHA-1 (or SHA-256, depending on the initial selection) operations in the data buses of SHA-512. This is achievable because the majority of the operations included in the hash functions (Section 4.1) are bit-wise. For the cases where this does not hold, appropriate modifications are made (see Sub-sections 4.4 and 4.5). As a result, two independent message blocks can be processed when either the SHA-1 or the SHA-256 function is selected. Hence, the throughput of the SHA-1 and SHA-256 functions in the proposed multi-mode architectures is calculated by doubling the value given by Eq. (16).
Below, the experimental results are presented in detail. First, we present and discuss the results for the proposed architectures in terms of throughput and area, then a comparison between the designs produced by the proposed flow and the designs produced by the ISE design suite is presented, and finally comparisons with existing similar works are offered.
6.1. Evaluation of the proposed architectures
Table 2 presents the frequency of the individual and multi-mode designs in representative Xilinx technologies. The second, third, and fourth columns correspond to the separate designs of the SHA-1, SHA-256, and SHA-512 functions, respectively, whereas the last two columns correspond to the multi-mode SHA-256/512 and SHA-1/256/512 architectures. As expected, the following can be observed:
• The frequency of the multi-mode architectures is lower than the frequency of the separate designs due to the additional logic that is inserted in the multi-mode architectures.
Fig. 19. Area comparisons: (a) base designs and (b) optimized designs.
• The optimized multi-mode designs outperform the corresponding base ones. This means that although the separate optimized designs, which are used as input in the proposed flow, are more complex than the base ones, the flow works efficiently without inserting large logic that could make the optimized multi-mode designs worse than the base ones.
• The frequency is improved substantially when modern technologies are used.
However, to evaluate the quality of the produced multi-mode architectures, the area and throughput factors must be studied. Fig. 19 depicts the area of the base and optimized designs. For each FPGA family, the first three bars correspond to the area of the individual (single-mode) designs, the fourth and fifth bars correspond to the sum of the area of the considered sole designs, whereas the last two bars correspond to the area of the proposed SHA-256/512 and SHA-1/256/512 multi-mode architectures.
Studying the Virtex-6 implementations (Fig. 19), it is derived that the area of the base SHA-1/256/512 architecture (3787 slices) is reduced by 28.6% compared to the total area of the three separate designs (5302 slices). Concerning the optimized designs, the area of the SHA-1/256/512 multi-mode architecture is 4129 slices, whereas the total area of the separate designs is 6177 slices, which corresponds to a 33.2% area reduction. Also, the area of the base SHA-256/512 multi-mode architecture (3179 slices) is reduced by 34.9% compared to the total area of the SHA-256 and SHA-512 designs (4290 slices), while in the optimized SHA-256/512 design the area reduction is 36.9%. Similar conclusions are derived regarding the implementations in the other families.
Thus, the first outcome is that remarkable resource sharing is achieved and the area of the produced multi-mode architectures is reduced by about 30% compared to the total area of the individual designs.
However, as shown in Table 2, the frequency of the multi-mode architectures is lower than the frequency of the individual designs, thus a comparison in terms of throughput is required.
Figs. 20 and 21 illustrate this comparison for the proposed SHA-256/512 and SHA-1/256/512 multi-mode architectures. Concerning the base SHA-256/512 multi-mode design (Fig. 20(a)) in Virtex-6 technology, the throughput of the SHA-256 function (7.6 Gbps) is improved by 65.3% over the individual implementation of this hash function (4.6 Gbps). This happens because in the multi-mode architecture two SHA-256 message blocks are processed concurrently. The throughput of the SHA-512 function in the base SHA-256/512 multi-mode design equals 6.1 Gbps, while the individual SHA-512 design achieves 6.3 Gbps, which corresponds to a 3.5% reduction. However, as mentioned above, the area of the base multi-mode SHA-256/512 design is improved by 34.9%. Consequently, the proposed multi-mode SHA-256/512 architecture improves the area factor by 34.9%, while the throughput of the SHA-256 function is improved by 65.3% and the throughput of the SHA-512 function is slightly reduced by 3.5%.
For the case of the optimized SHA-256/512 multi-mode design in Virtex-6 technology, the area is improved by 36.9%, as mentioned above. Also, the throughput of the SHA-256 function is improved by 64.7%, whereas the throughput of the SHA-512 function is reduced by 14.3%. The throughput reduction of SHA-512 is larger than in the base designs because the optimized designs are more complex than the base ones. Thus, more extra logic is required to develop the multi-mode architecture, which results in more frequency degradation. Note that the SHA-512 design was used as the initial design by the proposed flow. Studying the implementation results of the SHA-256/512 multi-mode architecture in the other FPGA families, similar outcomes are derived.
Concluding, the major outcome is that in most cases the proposed flow and multi-mode implementations achieve an area reduction of more than 30% compared to the total area of the individual designs. Also, the throughput of the SHA-1 function is improved by up to 29% (base SHA-1/256/512 in Virtex-4), whereas for the SHA-256 function the throughput improvement is up to 67% (base SHA-256/512 in Virtex-5) and up to 45% (optimized SHA-1/256/512 in Virtex-6). Finally, the throughput of the SHA-512
Fig. 20. Multi-mode SHA-256/512 architecture – throughput comparisons: (a) base designs, and (b) optimized designs.
Fig. 21. Multi-mode SHA-1/256/512 architecture – throughput comparisons: (a) base designs, and (b) optimized designs.
function is reduced by between 3% (base SHA-256/512 in Virtex-4) and 27% (optimized SHA-1/256/512 in Virtex-7). Although there is a throughput reduction for the SHA-512 function, the final SHA-1/256/512 architecture is able to support all three hash functions with one low-area design.
6.2. Comparison with commercial synthesis tool (Xilinx ISE)
A major question that arises regarding the efficiency of the proposed flow and the produced multi-mode designs is whether it is possible to produce better multi-mode designs if the individual
Fig. 22. Area comparisons: (a) base SHA-256/512, (b) base SHA-1/256/512, (c) optimized SHA-256/512, and (d) optimized SHA-1/256/512.
Fig. 23. Throughput/area comparisons – base designs: (a) SHA-256/512 and (b) SHA-1/256/512.
concern SHA-1/256, SHA-1/256/512, SHA-256/512, and SHA-384/512 multi-mode architectures implemented in FPGA technology. However, these designs belong to the category of the optimized designs, since techniques such as pipelining and loop unrolling have been applied to improve performance. Thus, the proposed optimized multi-mode architectures are used for comparisons.
Tables 7–10 present the comparisons in terms of frequency, area, throughput, and throughput/area metrics. As can be seen, the proposed architecture outperforms all the existing ones in both throughput and throughput/area metrics.
Indicatively, for SHA-256/512, regarding throughput, the improvements are from 4.2× (SHA-512 – Virtex-II) to 23.8× (SHA-256 – Virtex-II), while concerning throughput/area, the improvements are from 1.2× (SHA-512 – Virtex-II) to 5.5× (SHA-256 – Virtex).
The reason why the introduced architectures achieve significantly better results in terms of throughput compared to the others is twofold. Firstly, the proposed multi-mode architectures have their transformation round unrolled-by-2 and, in parallel, they are 4-stage pipelined. Hence, the denominator of Eq. (16) is divided by 4, compared to the initial number of the function's
Fig. 24. Throughput/area comparisons – optimized designs: (a) SHA-256/512 and (b) SHA-1/256/512.
iterations. Secondly, regarding SHA-256, as mentioned before, two independent input blocks can be processed concurrently, which doubles the throughput calculated by Eq. (16).
Finally, in Fig. 25, the throughput/area comparisons between the proposed optimized SHA-256/512 architecture and the similar existing ones are provided, illustrating the fact that the proposed designs outperform the competition.
Table 7
Comparison of proposed optimized SHA-1/256 architecture to similar existing ones.
Techn.      Ref.    Freq. (MHz)   Area (slices)   Throughput (Gbps)
                                                  SHA-1     SHA-256
Virtex-5    [29]    227           371             1.4       1.8
            Prop.   159.4         2104            8.2       10.2
Table 8
Comparison of proposed optimized SHA-1/256/512 architecture to similar existing ones.
Techn.      Ref.    Freq. (MHz)   Area (slices)   Throughput (Gbps)
                                                  SHA-1     SHA-256   SHA-512
Virtex-6    [30]    271           251             0.067     0.064     0.046
            Prop.   124.7         4129            12.8      15.9      12.9
Table 9
Comparison of proposed optimized SHA-256/512 architecture to similar existing ones.
Techn.      Ref.    Freq. (MHz)   Area (slices)   Throughput (Mbps)
                                                  SHA-256   SHA-512
Virtex      [20]    50            2951            400       320
            [23]    53            2530                 848
            Prop.   40.3          6911            5158      4127
Virtex-II   [19]    74            2384            291       467
            [21]    81            1938                 1296
            Prop.   54.1          6968            6925      5540
Table 10
Comparison of proposed optimized SHA-384/512 architecture to similar existing ones.
Techn.      Ref.    Freq. (MHz)   Area (slices)   Throughput (Gbps)
Virtex-4    [32]    182.3         1731            2.3
            Prop.   119.6         7224            12.2
Fig. 25. Throughput/area comparisons between proposed optimized SHA-256/512 architecture and similar existing ones. (Bar chart in Mbps/slice for SHA-256 and SHA-512: [20], [23], and the proposed design on Virtex; [19], [21], and the proposed design on Virtex-II.)
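The throughput entries of the proposed design in Table 9 can be cross-checked against Eq. (16). The following Python sketch is a minimal illustration and not part of the original work: the function names are hypothetical, the frequencies and areas are copied from Table 9, and the cycle counts (8 for SHA-256, 10 for SHA-512) are the ones stated for the optimized architectures in Section 6. The factor of two for SHA-256 is the assumption, taken from the text, that two message blocks are processed concurrently.

```python
def throughput_mbps(bits: int, freq_mhz: float, cycles: int, blocks: int = 1) -> float:
    """Eq. (16): (#bits * f) / #cycles, in Mbps when f is given in MHz.

    `blocks` models the number of message blocks processed concurrently
    (2 for SHA-256 in the multi-mode design, 1 for SHA-512).
    """
    return blocks * bits * freq_mhz / cycles


def throughput_per_area(mbps: float, slices: int) -> float:
    """Throughput/area cost factor in Mbps per slice."""
    return mbps / slices


# Proposed optimized SHA-256/512 design: (frequency in MHz, area in slices),
# values copied from Table 9.
PROPOSED = {"Virtex": (40.3, 6911), "Virtex-II": (54.1, 6968)}

if __name__ == "__main__":
    for tech, (freq, area) in PROPOSED.items():
        sha256 = throughput_mbps(512, freq, cycles=8, blocks=2)
        sha512 = throughput_mbps(1024, freq, cycles=10)
        print(tech,
              round(sha256), round(sha512),
              round(throughput_per_area(sha256, area), 2),
              round(throughput_per_area(sha512, area), 2))
```

Under these assumptions the rounded throughputs agree with the "Prop." rows of Table 9, and the throughput/area values agree with the data labels of the proposed design in Fig. 25.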
7. Conclusions
In this paper, area-efficient and high-throughput multi-mode architectures for the SHA-1 and SHA-2 families were proposed and implemented in several FPGA technologies. These architectures are able to realize more than one function, while their frequency and throughput degradation (caused by merging separate designs) is kept low. Compared to the corresponding architectures produced by a commercial synthesis tool (Xilinx ISE), the proposed ones are significantly more area-efficient and at the same time significantly better in terms of throughput/area. Additionally, a systematic design flow for producing multi-mode architectures of the above two families was introduced. Finally, the proposed multi-mode architectures significantly outperform the previously proposed ones in terms of throughput and throughput/area.
References
[26] M.-Y. Wang, C.-P. Su, C.-T. Huang, C.-W. Wu, An HMAC processor with integrated SHA-1 and MD5 algorithms, in: Proceedings of Asia and South Pacific Design Automation Conference, ASP-DAC 04, 2004, pp. 456–458.
[27] R. Ramanarayanan, S. Mathew, F. Sheikh, S. Srinivasan, A. Agarwal, S. Hsu, H. Kaul, H. Anders, V.M.M. Erraguntla, R. Krishanurthy, 18 Gbps, 50 mW reconfigurable multi-mode SHA Hashing accelerator in 45 nm CMOS, in: Proceedings of the ESSCIRC, 2010, pp. 210–213.
[28] J. Docherty, A. Koelmans, A flexible hardware implementation of SHA-1 and SHA-2 hash functions, in: Proceedings of 2011 IEEE International Symposium on Circuits and Systems, 2011, pp. 1932–1935.
[29] Helion Tech., Fast hash core family for Xilinx FPGA. Data sheet available from: 〈http://www.heliontech.com/hash.htm〉 (accessed December 2013).
[30] Helion Tech., Tiny hash core family for Xilinx FPGA. Data sheet available from: 〈http://www.heliontech.com/hash.htm〉 (accessed December 2013).
[31] A.T. Hoang, K. Yamazaki, S. Oyanagi, Three-stage pipeline implementation for SHA2 using data forwarding, in: Proceedings of 2008 International Conference on Field Programmable Logic and Applications, FPL 2008, 8–10 September 2008, pp. 29, 34.
[32] A.T. Hoang, K. Yamazaki, S. Oyanagi, Pipelining a multi-mode SHA-384/512 core with high area performance rate, IEICE Trans. Inf. Syst. E92.D (10) (2009) 2034–2042.