Sensors 22 05028 v2
Article
A SHA-256 Hybrid-Redundancy Hardware Architecture for
Detecting and Correcting Errors
Ignacio Algredo-Badillo 1,† , Miguel Morales-Sandoval 2,† , Alejandro Medina-Santiago 1,† ,
Carlos Arturo Hernández-Gracidas 3,† , Mariana Lobato-Baez 4,† and Luis Alberto Morales-Rosales 5, *,†
1 Department of Computer Science, CONACYT-National Institute for Astrophysics, Optics and Electronics,
Puebla 72840, Mexico; [email protected] (I.A.-B.); [email protected] (A.M.-S.)
2 Center for Research and Advanced Studies of the IPN-CINVESTAV, Unidad Tamaulipas,
Ciudad Victoria 87130, Mexico; [email protected]
3 Facultad de Ciencias Físico Matemáticas, CONACYT-Benemérita Universidad Autónoma de Puebla,
Puebla 72570, Mexico; [email protected]
4 Instituto Tecnológico Superior de Libres, Libres, Puebla 73780, Mexico; [email protected]
5 Facultad de Ingeniería Civil, CONACYT-Universidad Michoacana de San Nicolás de Hidalgo,
Morelia 58030, Mexico
* Correspondence: [email protected]
† These authors contributed equally to this work.
may modify, replace, or damage data, causing accidents, financial losses, or different kinds
of problems. Besides, faults are inherent in some hardware architectures, or they can be
generated by their environment, and they are classified as transient or permanent. On the one
hand, transient faults are considered single-event upsets (SEUs) when there is a change
of logic state, or multiple transient faults when they are triggered by several SEUs. On the other hand,
permanent faults cause defects in system behavior and produce halts or wrong results [1].
Nowadays, security services for integrity, confidentiality, and authentication must
consider faults, which are more critical when cryptographic algorithms are involved,
because these algorithms perform several iterations and process all the message data.
In this case, the cryptographic algorithms known as hash functions (because they
generate a hash from a message made up of bits, bytes, words, images, videos,
text pages, etc.) present high diffusion of all the bits during processing. The desirable
characteristics of these algorithms can be greatly affected by faults, since changing a single
bit will produce a totally different hash.
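The diffusion property can be observed with a short software experiment. The following is a minimal sketch, not part of the proposed hardware: it uses Python's standard hashlib module, and the payload and the helper names (hamming_distance, flip_bit) are assumptions chosen for illustration.

```python
import hashlib


def hamming_distance(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with one bit inverted (bit 0 is the LSB of byte 0)."""
    out = bytearray(data)
    out[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(out)


message = b"hybrid redundancy"   # arbitrary example payload (assumption)
faulty = flip_bit(message, 0)    # emulate a single-bit fault in the input

digest_ok = hashlib.sha256(message).digest()
digest_faulty = hashlib.sha256(faulty).digest()

# Exactly one input bit differs, yet the two 256-bit digests disagree
# in many positions.
print(hamming_distance(message, faulty))
print(hamming_distance(digest_ok, digest_faulty))
```

The same effect is what makes an internal fault so damaging: a single upset in an intermediate round value spreads over the final digest in the same way.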
Moreover, providing an autonomous-decision mechanism to determine if the data
integrity calculation has been appropriately carried out, to assure system consistency, is a
key design element of digital communication systems [2].
In this sense, faults and errors caused by hardware noise or data transmission alterations,
generated either by the channel quality or by external attacks, should be detected
quickly and efficiently for a fault-detection module to be considered suitable for
real-time applications. Importantly, when the input and output data differ,
an error has occurred. For instance, in message-passing communications,
noise can affect the channel during the transmission of bits, which can then be contaminated
by errors. Besides, errors can arise from a miscalculation in the microprocessors or
from adverse scenarios such as attackers or physical damage. An example is a bit
with value 0 being changed to 1, or vice versa, in the payload.
Secure Hash Algorithm (SHA) functions offer integrity, although they can also be employed
to provide authentication. Besides, the algorithm does not include verification mechanisms
for its internal operations or for future failures at the system level.
Noise and possible failures in the system are critical within the design of hardware
implementations [3–5]. For instance, several works have demonstrated that noise and
interference affect the operation of FPGAs, as happens in the presence of magnetic
noise in the white noise region [6]. When the voltage supply is unstable, this can negatively
affect the immunity of the I/O buffer by about 16.8 dB, according to Toan et al. [7].
Therefore, we considered developing hardware modules capable of assuring integrity,
security and data quality with a system-level fault detection mechanism.
Traditional electronic systems designed for fault tolerance are based on redundancy [8,9].
Applications where failure is critical use redundant pieces of hardware and software.
In such systems, time redundancy, information redundancy, and hardware redundancy can
also be considered.
Best practices in data integrity consider validation to prevent the entrance of invalid
data, perform error detection in transmissions, and execute security measures against data
modifications or for loss prevention [8]. SHA-2 is a broad family of solutions to assure data
integrity, and the standard [9] defines several configurations for these cryptographic hash
functions, generating hashes with 256, 384, and 512 bit sizes. The SHA-256 algorithm is
widely used to balance security and fast implementation for contexts that require a real-time
response based on hardware. Current hardware implementations of SHA-2 are focused
on satisfying those characteristics, but there are open issues regarding the real-time balance among
processing, efficiency, and cost, so further research and analysis should be pursued.
Cryptographic algorithms can be used for different security related tasks such as
confidentiality, authentication, integrity, and non-repudiation. Among the most important
algorithms are hash functions, which are used in blockchain, digital signatures, digital
certificates, and other applications. SHA-2, designed by the National Security Agency
(NSA), is a set of cryptographic hash functions, which includes important changes with
Sensors 2022, 22, 5028 3 of 23
respect to its predecessor (SHA-1). It consists of six hash functions and digests of 224,
256, 384, or 512 bits. While SHA-256 is a hash function with 32-bit words, SHA-512
processes 64-bit words instead. The two variants have identical structures and differ
only in the shift amounts, additive constants, and number of rounds. On the other hand,
we can consider SHA-224 and SHA-384 as truncated versions of SHA-256 and SHA-512,
respectively, which are calculated using different initial values. Other truncated versions of
SHA-512, also with different initial values, are SHA-512/224 and SHA-512/256.
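The relation between the family members can be checked in software. The sketch below is only an illustration, assuming Python's standard hashlib module and the arbitrary test message "abc": it prints the digest sizes of four SHA-2 functions and shows that SHA-224 is not just the leading 224 bits of SHA-256, because the initial values differ.

```python
import hashlib

message = b"abc"  # arbitrary test message (assumption)

# Digest sizes, in bits, of four SHA-2 members that hashlib always provides.
digest_bits = {}
for name in ("sha224", "sha256", "sha384", "sha512"):
    digest_bits[name] = hashlib.new(name, message).digest_size * 8
    print(name, digest_bits[name])

# SHA-224 shares the SHA-256 structure but starts from different initial
# values, so its digest is not a prefix of the SHA-256 digest.
sha224_digest = hashlib.sha224(message).digest()
sha256_digest = hashlib.sha256(message).digest()
truncation_matches = sha224_digest == sha256_digest[:28]
print(truncation_matches)
```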
In this work, we identify an open problem: How to ensure a fast and reliable fault
detection architecture even in adverse situations, which can be integrated into several
environments without representing an overload to the system? To answer this question,
first, we should be able to solve two main issues: (i) How to efficiently determine whether
an error occurs when data integrity is calculated?; and (ii) how to prevent the diffusion of
an error caused by a miscalculation of data integrity?
Traditionally, state-of-the-art techniques intended to answer these questions base their
approach on redundancy by including only one of the following main elements: hardware,
time, or information [10,11]. Hence, in hardware redundancy, the physical ciphering
modules (processors, memories, multipliers, etc.) are replicated, generating several hashes
at the same time. Time-redundancy architectures execute the same operation for providing
multiple copies of the hashes at different times. Finally, information redundancy adds
check data to verify hash correctness before using it and, in some cases, even allowing the
correction of some erroneous hash [12–14]. Each architecture implementing some kind
of redundancy presents advantages and disadvantages concerning power consumption,
amount of hardware resources, and performance (throughput and efficiency). Hence, to
provide a fault-detection architecture capable of being used in several environments, it is
essential to balance system overload in terms of energy consumption, processing delays,
and performance, among other parameters.
In this paper, we propose an innovative hybrid-redundancy hardware architecture for
the cryptographic algorithm SHA-2. This architecture is designed to improve error and
fault detection. In the following, we list our main contributions:
• To the best of our knowledge, pipelining has not been previously used to perform
different hash calculations in a redundancy context. Pipelining is traditionally used
to process several instructions or operations at once, and its use in redundancy and
cryptography is a relevant part of the innovation of this work. Therefore, we present
a hybrid architecture with a 3-stage pipeline structure, implementing
hardware and time redundancy. We analyzed hardware resources and performance in
the distribution of the processed elements within each stage and the balance of critical
path. Besides, we increased CPU performance at a certain clock speed, defining a
transformation in data flow in several sequential phases where the input of each stage
is the output of the previous one;
• From a pipeline design perspective, data transmission is improved by eliminating
related hazards and conflicts, using registers in each stage for processing the hash functions,
which perform multiple passes over the data to obtain their final hash. By adding an
error verification module, based on a voting scheme, we were able to determine whether
an error occurred or whether the output data were correct. Designing an architecture
with a pipeline scheme and redundancy processes is not straightforward since several
design problems, related to data dependencies, must be solved along the way;
• Another contribution is the creation, analysis and demonstration of both fault-tolerant
and non-tolerant architectures, which lays the groundwork for future research and
studies towards understanding the advantages and disadvantages they present, also
allowing comparison with other works. The selection of the architecture, along with
its design and implementation, was evaluated and analyzed using various FPGA
technologies (Spartan-7, Artix-7, Kintex-7, and Virtex-7). The Virtex-7 implementation
was the one that achieved the best results of all the alternatives, with a throughput
of 441.72 Mbps and 2255 LUTs, presenting an efficiency of 195.8 Kbps/LUT. It is true
that the examined technologies are produced by the same company; however, it must
be emphasized that each of them is different from the rest, as mentioned in [15].
Hence, regardless of whether they are from the same family or not, their logic blocks
can be quite different in their structure.
The content of this paper is divided as follows. Section 2 presents a background
about the SHA-2 hash function, fault-detection schemes and pipeline. In Section 3, several
related works are presented and explained. Next, Section 4 presents the proposed hardware
architecture for fault-detection. We present the results and their comparison with other
techniques in Section 5. The discussion of the results is presented in Section 6. Finally, the
main conclusions drawn from the results of this paper are presented in Section 7.
2. Background
This section presents some definitions about the SHA-2 algorithm and error detection/
correction. The description of the fault-detection context for applications such as cyber-
physical systems, vehicular communication networks, and Industry 4.0, to assure higher
performance, is presented as well. Besides, we describe FPGA optimization techniques for
hardware design.
2.2. Fault-Detection
Fault tolerance can be understood as the capability of a system to perform regularly,
even in the presence of errors, for example, a crash in the resources or the presence of
faults [16]. A fault-detection mechanism, in general, should have the following features:
• A reliable integrity calculation process, based on hash assessment, capable of determining
whether the information is faulty (no matter the reason) or reliable (i.e., if the
computed hash is valid). The reliability of the result of the fault-detection mechanism
is essential for decision making;
• A light computation process that avoids excessive energy consumption. This is
important, especially when the implementation is made on a device with limited
resources, such as a wireless sensor or a mobile phone;
• A fast assessment of data validity, which makes the mechanism applicable in real-life
situations, where a fast response to errors is mandatory. An in-time error-detection
mechanism allows taking fast actions like re-fetching the corrupted data or flagging
the possibility of a compromised communication channel.
3. Related Work
Several works have been proposed to prevent and detect transient errors or faults
for applications at the hardware level. In this sense, implementations of hash functions
have been used to provide integrity and reliability for the data transmitted and generated
for communications. Nevertheless, given the way SHA algorithms work (a high-diffusion
iterative structure), the presence of small errors in the hash calculation will
result in many errors in the final hash [1]. In the following, some outstanding works that
tackle this problem are presented.
Ahmad and Das proposed, in [1], an error detection approach using parity codes and
hardware redundancy. Their performance metrics indicated that their limitations were
hardware overhead and short delay overhead. They did not introduce a pipeline method
to optimize the architecture. Besides, their scheme did not carry out the correction of the
detected error.
Bahramali et al. proposed, in [17], a fault diagnosis method for SHA-1 and SHA-512,
focused on detecting transient and permanent faults. The proposal used time redundancy
and a pipelining method. Their work included, in the critical path of the SHA hash
functions, the addition operation, then they used a subtraction as the inverse function.
This process was used for redundancy, detecting both permanent and temporary faults.
They observed that traditional time redundancy approaches do not have this capability
because they only detect transient faults.
A Totally Self-Checking (TSC) design was introduced for the SHA-256 hash function
by Michail et al., in [18], suitable for harsh environments, focused on detecting erroneous
bits (odd and even numbers), if they are appropriately propagated. They observed
that ASIC technologies are more adequate than reconfigurable devices, such as
FPGAs, for TSC. Besides, in [19], they presented a comparison between two hash functions,
SHA-1 and SHA-256. In this case, they analytically described the development process for
each component of TSC as compared to the Duplicated-with-Checking (DWC) alternative.
Their conclusion was that, in comparison with DWC architectures, TSC cores have more
benefits in terms of area, throughput/area, and power consumption.
Kahri et al. proposed, in [20], an FPGA fault detection scheme to protect the SHA
algorithm against attacks based on hybrid (hardware and time) redundancy. The trade-off
between security and cost of implementation is an advantage of this approach.
Algredo-Badillo et al. [21] proposed a pipeline scheme for the AES symmetric cryptographic
algorithm, unlike the architecture proposed in this paper, which is focused on
the SHA-2 cryptographic hash function. Additionally, while the AES architecture uses a
structure with 5 pipeline stages, our architecture uses 3 pipeline stages.
algorithm, where the key idea is to compute three times the message digest. Additional clock
cycles are necessary for initialization, data storing and output generation.
[Figure 1 diagram: Message Schedule (32-bit WT), Functions (CH, MAJ, BG1, BG0), Adders, NewBuffer (256-bit BUFF and NBUFF), and Memory64x32 (32-bit KT, 6-bit ADDKT) blocks, with the selS1, enQ, and selH0 control signals and a 256-bit Message Digest output.]
Figure 1. Iterative hardware architecture diagram without fault detection or pipeline structure.
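[Figure: 3-stage hardware architecture, composed of a Processing Module (32-bit word input; 256-bit Message Digest and Error outputs) and a Control Module (Hashing, RST, and Ready signals), connected through selH0, enQ, the 6-bit ADDKT bus, and a 3-bit control bus.]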
On the one hand, the processing module takes 32-bit words from the given 512-bit
message block and executes several computations at the same time throughout the pipeline
structure. At the end of the 196 or 195 clock cycles (depending on the case), this module has
three registers with the hashes of the three pipelines, which are compared for determining
whether an error occurred or not. The proposed hardware architecture takes into account
three cases: (a) if the three hashes are different, then an error is produced, the error signal
is set, the message digest is incorrect, and it must be ignored; (b) if two hashes are equal,
then a fault is detected, but the architecture corrects and generates the exact output, the
error signal is reset (indicating there is no error) and the message digest is correct; and (c) if
the three hashes are equal, then there is no error, the error signal is reset, and the message
digest is correct. In summary, if an error is detected, this can be solved or not; if this is
not solved, then the 256-bit output hash of this module must be ignored; otherwise, the
output hash is correct. On the other hand, the Control Module enables the dataflow of the
first module. In the next subsections, more details are provided for both modules.
[Figure 3 diagram: processing module with the message schedule (Reg, SmallSigma0, SmallSigma1, Adders), the MAJ_Function and BigSigma0 blocks, Memory64x32 for KT, NewBuffer, the stage hash registers (1S-HASH, 2S-HASH, 3S-HASH), the Voting Module, and the internal control logic.]
Figure 3. Block diagram of the processing module for the hybrid–redundancy hardware architecture.
The general process for a message block can be explained as follows (see Figure 4):
• In the first clock cycle, the block is fed to the processing module and the first stage,
Functions&Adders1 , is computed (six functions and the sums);
• In the second clock cycle, the output of the Functions&Adders1 stage is input to the
Adders2 stage and the first stage is fed again with the same block (replicated block);
• In the third clock cycle, the output of the Adders2 stage is input to the Adders3&VM
stage, the output of the Functions&Adders1 stage goes to the Adders2 stage, and the
first stage is fed finally with the replicated block;
• From the fourth cycle to the 194th cycle, the three replicas of the same message
block move from the first to the second stage, from the second to the third
stage, and from the third back to the first stage, in order to calculate the 64 rounds
of the algorithm for each replica of the message block;
• Finally, in the 195th clock cycle, the three hashes are obtained and the outputs can be
generated by comparing, evaluating and selecting the registers.
The previous process is executed for each message block whose hash value must
be obtained. The replicas of a second block (or of the remaining blocks) of the message can be
entered so as to keep the pipeline busy, improving performance and efficiency with
a latency of only 192 clock cycles, which represents the best case for metrics and results.
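The cycle counts quoted above can be reproduced with a small timing model. This is a sketch under stated assumptions, not the authors' implementation: it assumes that replica r enters the first stage in cycle r + 1, that every replica advances exactly one stage per clock cycle, and that one extra cycle is spent on voting and output. The names pipeline_schedule, ROUNDS, STAGES, and REPLICAS are illustrative.

```python
ROUNDS = 64    # SHA-256 rounds per message block
STAGES = 3     # Functions&Adders1, Adders2, Adders3&VM
REPLICAS = 3   # the same block is fed to the pipeline three times


def pipeline_schedule():
    """Map each clock cycle to {stage: (replica, round)} under the assumed model."""
    schedule = {}
    for replica in range(REPLICAS):
        for rnd in range(ROUNDS):
            for stage in range(STAGES):
                # Cycles are 1-based; a replica wraps from stage 2 back to stage 0
                # when it starts its next round.
                cycle = 1 + replica + rnd * STAGES + stage
                slot = schedule.setdefault(cycle, {})
                if stage in slot:
                    raise RuntimeError("two replicas collide in one stage")
                slot[stage] = (replica, rnd + 1)
    return schedule


schedule = pipeline_schedule()
last_round_cycle = max(schedule)         # last cycle in which round logic runs
total_cycles = last_round_cycle + 1      # one more cycle to vote and output
steady_state_latency = ROUNDS * STAGES   # per block when the pipeline stays full

print(last_round_cycle, total_cycles, steady_state_latency)  # 194 195 192
```

In this model the third cycle already has all three stages occupied, which matches the description of the pipeline filling during the first three cycles.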
Next, each stage is described to explain the inputs, outputs and detailed behavior of
the architecture, where each stage executes different tasks (see [9]):
In this stage, for message scheduling (at the top left of Figure 3), a register is used for
storing 512 bits (sixteen 32-bit words) during sixteen clock cycles, for the first sixteen rounds.
Later, for the remaining forty-eight rounds, it is necessary to compute new 32-bit words,
for which the functions SmallSigma0 ($\sigma_0^{\{256\}}$) and SmallSigma1 ($\sigma_1^{\{256\}}$) and adders are implemented.
In this same stage, several temporary variables are computed (CH, Maj, SG0, and
SG1), which are used in the next stage for computing the new state buffer (NBUFF).
The input for these functions is the initial hash or intermediate hash from the previous round.
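For reference, the functions computed in this stage can be written in software as follows. This is a sketch based on the SHA-256 definitions in the standard [9], not a description of the hardware datapath; the Python function names are illustrative, and all values are kept within 32 bits by masking.

```python
MASK = 0xFFFFFFFF  # keep every intermediate value within 32 bits


def rotr(x: int, n: int) -> int:
    """Rotate a 32-bit word right by n positions."""
    return ((x >> n) | (x << (32 - n))) & MASK


def small_sigma0(x: int) -> int:
    return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)


def small_sigma1(x: int) -> int:
    return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10)


def big_sigma0(x: int) -> int:
    return rotr(x, 2) ^ rotr(x, 13) ^ rotr(x, 22)


def big_sigma1(x: int) -> int:
    return rotr(x, 6) ^ rotr(x, 11) ^ rotr(x, 25)


def ch(x: int, y: int, z: int) -> int:
    """Choose: each bit of x selects the corresponding bit of y (1) or z (0)."""
    return (x & y) ^ (~x & MASK & z)


def maj(x: int, y: int, z: int) -> int:
    """Majority: each output bit is the value held by at least two inputs."""
    return (x & y) ^ (x & z) ^ (y & z)


def message_schedule(block_words):
    """Expand sixteen 32-bit words into the sixty-four words W_0 .. W_63."""
    w = list(block_words)
    if len(w) != 16:
        raise ValueError("a 512-bit block holds exactly sixteen 32-bit words")
    for t in range(16, 64):
        w.append((small_sigma1(w[t - 2]) + w[t - 7]
                  + small_sigma0(w[t - 15]) + w[t - 16]) & MASK)
    return w


# Example: a block whose first word is 1 and whose other words are 0.
w = message_schedule([1] + [0] * 15)
print(hex(w[16]), hex(w[18]))  # 0x1 0xa000
```

The first sixteen words are taken directly from the block, which corresponds to the register used during the first sixteen rounds; the remaining forty-eight words need the two small sigma functions and the adders.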
[Figure: Voting Module. Registers store the stage message digests (1S-MD, 2S-MD); equality comparators generate the compA and compB signals; the initial state buffer (constants) and the selH0, selFH, and selMux signals select the H0R and finalMD buses.]
The Voting Module generates: (1) the H0F bus for the intermediate or initial
message digest; (2) the final Message Digest bus for the digest of the complete message;
and (3) the Error signal, a flag indicating whether the final message digest is
correct or not. This flag is generated by comparing pairs of message digests from
each stage (see Table 1). The compA signal compares 1S-MD and 2S-MD
(message digests from the first stage and the second stage, respectively), and it is high if
they are equal and low otherwise. Likewise, compB compares 1S-MD against 3S-MD and compC
compares 2S-MD against 3S-MD.
Table 1. Resulting signals of the comparators for each pair of stage message digests.
The truth table for generating the Error and selMux signals is shown in
Table 2, where the first signal is low when two or three stage message digests are equal, and
the second one selects 1S-MD or 2S-MD as the output bus for the Message Digest
of the proposed hybrid architecture. 3S-MD is ignored because the possible outputs of
the message digest indicate that selecting 1S-MD or 2S-MD is sufficient for generating the
correct output bus. This analysis shows that the logic is simplified by selecting among only
two buses (the multiplexer and control logic are simpler).
compA  compB  compC  Error  Possible Output of the Message Digest  selMux  Message Digest
0      0      0      1      X                                      X       Don’t care
0      0      1      0      2S-MD or 3S-MD                         1       2S-MD
0      1      0      0      1S-MD or 3S-MD                         0       1S-MD
0      1      1      1      X                                      X       Don’t care
1      0      0      0      1S-MD or 2S-MD                         0       1S-MD
1      0      1      1      X                                      X       Don’t care
1      1      0      1      X                                      X       Don’t care
1      1      1      0      1S-MD, 2S-MD or 3S-MD                  0       1S-MD
The architecture proposed in this work provides some level of inner fault tolerance.
This means that if two hashes are equal and the other one is different, then a fault has occurred;
in this case, the architecture is able to catch this fault and correct it; hence, the
Error output signal is reset and Message Digest holds a validated hash. This situation
arises in three cases, “001”, “010”, and “100”, when exactly one comparator is high.
If the three hashes are different (case “000”), then a fault is discovered, the architecture
cannot recover, the Error output signal is set, and Message Digest must be ignored.
Moreover, there are three states that cannot be reached: cases “011”, “101”, and “110”,
because they indicate contradictions; for example, case “011” means that 1S-MD and 2S-MD
are not equal, while both the pair 1S-MD and 3S-MD and the pair 2S-MD and
3S-MD are equal, which cannot be true. In these situations, the Error signal is set, and the output
Message Digest is a don’t-care value (X denotes a don’t-care condition). Finally, if the three hashes
are equal (case “111”), no fault is discovered, the Error signal is reset, and the
Message Digest is correctly computed.
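The behavior of the truth table can be mirrored in a few lines of software. The following is a behavioral sketch only, not the VHDL of the Voting Module: the function names voting_logic and vote are hypothetical, the don't-care value is modeled as None, and the fault is emulated by flipping one bit of a digest computed with Python's hashlib.

```python
import hashlib


def voting_logic(comp_a: int, comp_b: int, comp_c: int):
    """Return (error, sel_mux) for one row of the truth table.

    sel_mux is None where the table marks a don't-care (X).
    """
    ones = comp_a + comp_b + comp_c
    if ones in (1, 3):  # rows 001, 010, 100 and 111: at least two digests agree
        sel_mux = 1 if (comp_a, comp_b, comp_c) == (0, 0, 1) else 0
        return 0, sel_mux
    return 1, None      # row 000 and the unreachable rows 011, 101, 110


def vote(md1: bytes, md2: bytes, md3: bytes):
    """Compare the three stage digests and return (error, message_digest)."""
    comp_a = int(md1 == md2)
    comp_b = int(md1 == md3)
    comp_c = int(md2 == md3)
    error, sel_mux = voting_logic(comp_a, comp_b, comp_c)
    if error:
        return 1, None                      # the output must be ignored
    return 0, (md2 if sel_mux else md1)     # only 1S-MD or 2S-MD is ever selected


good = hashlib.sha256(b"message block").digest()   # arbitrary example input
bad = bytes([good[0] ^ 0x01]) + good[1:]           # one flipped bit
worse = bytes([good[0] ^ 0x02]) + good[1:]         # a different flipped bit

print(vote(good, good, good)[0])   # 0: no fault
print(vote(bad, good, good)[0])    # 0: fault in the first digest, corrected
print(vote(good, bad, worse)[0])   # 1: three different digests, not recoverable
```

Because the third digest is never routed to the output, the selection needs a two-input multiplexer only, which is the simplification noted for Table 2.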
[Figure: Control Module state diagram. The states shown include R17to63A to R17to63C, R64A to R64C, MoreBlocks1 to MoreBlocks3, Full Message1, Full Message2, and END; a counter (Count, 0 to 63) provides the 6-bit ADDKT address (ADDKT = Count); the 3-bit control bus is formed as selR | selFH | enR; the default values are selFH=0, selH0=0, selS1=1, selR=0, and rstC=0.]
The Control Module first preloads the data (configuration values)
and loads the input message (512-bit blocks), tasks executed by the states from
Pre-loadA to LoadC. Afterwards, the first round is executed (states R1A, ..., R1C), followed by rounds
2 to 16 (states R2to16A, ..., R2to16C), rounds 17 to 63 (states R17to63A, ..., R17to63C),
and the final round (states R64A, ..., R64C). Next, two situations can happen: (a) there are
more 512-bit blocks from the same message, hence the process starts again using an intermediate
hash (states MoreBlocks1, ..., MoreBlocks3), or (b) the complete message has been
processed and there are no more 512-bit blocks (states Full Message1 and Full Message2),
thus the automaton is initialized and the new state is Begin.
The Control Module manages: (a) the memory that stores the round constants through
ADDKT addresses; (b) the multiplexer, which selects the initial hash or the intermediate
hash through selH0 and enQ signals; (c) the multiplexer and registers for selecting and
saving the three computed hashes through selFH, H0R, and enR signals; (d) the counter
for the 64 rounds defined by the algorithm through the rstC signal; and (e) the multiplexer
for obtaining external or internal 16-bit words through the selS1 signal.
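The round-related part of the automaton can be summarized with a short software model. This is a rough sketch under explicit assumptions, not the authors' state machine: it assumes that each round is visited through three states with suffixes A, B, and C (one per pipeline slot), that the round counter runs from 0 to 63, and that the counter value is used as the ADDKT address. The preloading, loading, MoreBlocks, and Full Message states are left out, and the function names are hypothetical.

```python
def round_state_group(rnd: int) -> str:
    """Return the state group that handles a given round (1 to 64)."""
    if not 1 <= rnd <= 64:
        raise ValueError("SHA-256 has rounds 1 to 64")
    if rnd == 1:
        return "R1"
    if rnd <= 16:
        return "R2to16"
    if rnd <= 63:
        return "R17to63"
    return "R64"


def round_states():
    """Yield (round, state, addkt) for the round-processing part of the automaton."""
    for count in range(64):        # assumed counter value, reused as ADDKT
        rnd = count + 1
        for suffix in "ABC":       # assumed: one state per pipeline slot
            yield rnd, round_state_group(rnd) + suffix, count


trace = list(round_states())
print(len(trace))    # 192 state visits for one message block
print(trace[0])      # (1, 'R1A', 0)
print(trace[-1])     # (64, 'R64C', 63)
```

Under these assumptions the 192 state visits agree with the 64 rounds times three pipeline stages that each replicated block needs.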
brid architecture with error correction. Once the iterative architecture presents optimal
values (which occurs when there is a trade-off between a minimum number of clock
cycles/hardware resources and high throughput) the critical paths and long paths are
analyzed to decide where to place the registers of the pipelined stages. This process is not
obvious or straightforward since the algorithm has an iterative process at the data level
(data are operated in different branches). At the block level, each block is processed during
64 rounds as defined by the standard. At the message level, the full message is split into
blocks, where each block result is chained to generate a fixed-length hash, regardless of the
size of the original message. Additionally, the architecture is designed and modified to implement
two redundancy schemes: time and hardware (hence the term hybrid architecture)
using an error-correction scheme. Then, several designs are required to find the optimum,
focused on the same type of trade-off analysis of the iterative design.
Different architectures were designed to achieve the optimal trade-off, which could be determined
only after each of them was implemented in VHDL and evaluated on FPGA devices, as shown
in Figure 8.
Figure 8. Methodology for developing and evaluating the hybrid hardware architecture.
$$P_1(t) = p, \qquad W_1(t) = \int P_1(t)\,dt = \int p\,dt = pt + c_1. \tag{1}$$
At this point, power consumption is constant for an instant and over time, while
energy consumption is variable ($pt + c_1$) according to time. B requires a latency l to obtain
a correct hash value, which corresponds to a message block. The replicas of this
architecture B are labeled as $B_1, B_2, B_3, \ldots$, etc. Additionally, the SHA-256 hybrid-redundancy
hardware architecture (see Figure 3) is labeled as $B^{(h)}$ and its replicas are labeled
This proposal, as can be seen, sacrifices energy and power consumption because all N
modules $B_i$ are present; hence, power consumption and energy are high and depend on N,
as can be verified in Equation (2).
$$P_{hw}(t) = Np, \qquad W_{hw}(t) = \int P_{hw}(t)\,dt = \int Np\,dt = Npt + c_{hw}. \tag{2}$$
Case (B) Architecture for time redundancy. As opposed to the previous case, this
proposal processes a single data block, using a single module B, N times
(N represents the degree of time redundancy). Module $B_1$ is therefore required from
iteration 1 to iteration N (each time segment represents the latency needed to process
the data block), which can be verified in Figure 10. This proposal requires fewer
hardware resources; however, throughput decreases as a consequence.
The advantage is that a module or instance does not consume a high power p, although
the required energy is high as time goes by, since it requires more usage time, depending
on latency and the number of redundant computations (this is modeled in Equation (3)).
This situation represents a drawback, because throughput is decreased.
$$P_{ti}(t) = p, \qquad W_{ti}(t) = \int P_{ti}(t)\,dt = \int p\,dt = pt + c_{ti}. \tag{3}$$
Figure 10. Time redundancy requires one module ($B_1$) to be sequentially executed in N clock cycles.
Case (C) Architecture for hybrid redundancy. The proposed hybrid architecture defines
an instance $B_1^{(h)}$ that operates several times with the same data block (for example, N times),
reporting a greater latency $l^{(h)}$ when compared against l. Unlike case A, where N blocks
are used, this architecture requires only one instance ($B_1^{(h)}$) to be used during a given time;
and as opposed to case B, where a block is used several times, block $B_1^{(h)}$ is occupied just
once in this architecture (with its corresponding latency). The corresponding latency is
proportional to N, which can be verified in Figure 11.
Figure 11. Hybrid redundancy in our proposal needs one block $B_1^{(h)}$, requiring $l^{(h)}$ clock cycles (latency).
In this case, there is a power consumption p required by the device. Energy consumption
can be inferred from the time a data block takes to be processed.
$$P_{hy}(t) = p, \qquad W_{hy}(t) = \int P_{hy}(t)\,dt = \int p\,dt = pt + c_{hy}. \tag{4}$$
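The power and energy expressions for the three cases can be evaluated side by side. The sketch below is illustrative only: the values of p, l, and N are arbitrary, the integration constants are set to zero, and the hybrid latency is assumed to be exactly N times l, which is one possible reading of the statement that it is proportional to N.

```python
import math


def energy(power: float, time: float, c: float = 0.0) -> float:
    """Energy for a constant power draw: W(t) = power * t + c."""
    return power * time + c


p = 2.0      # power of one module B, arbitrary units (assumption)
l = 4.0      # latency of one module B, arbitrary units (assumption)
N = 3        # redundancy degree
l_h = N * l  # assumed hybrid latency, proportional to N

# Case A, hardware redundancy: N modules draw N * p during one latency l.
w_hw = energy(N * p, l)
# Case B, time redundancy: one module draws p during N consecutive latencies.
w_ti = energy(p, N * l)
# Case C, hybrid redundancy: one instance draws p during its latency l_h.
w_hy = energy(p, l_h)

print(w_hw, w_ti, w_hy)  # 24.0 24.0 24.0
```

With these idealized numbers the three energies coincide, so the differences between the cases come from instantaneous power (N p against p) and from the factors that the constant model leaves out.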
Many factors affect energy and power consumption, such as resources for placing,
routing, temperature, designs, devices, etc. In a constant scheme, if those factors have
no effect, consumption is very similar to case B. Nonetheless, this alternative presents an
advantage: because the internal processes are shorter, the data
travel shorter paths (it is easy to verify this behavior by observing the critical path and the
minimum clock period). To summarize, this architecture, $B^{(h)}$, has an energy consumption
similar to case B but with shorter critical paths. On the other hand, case A generally
shows minimum energy consumption and poor performance; however, passive and active
hardware resources require the same energy, regardless of whether the instances or blocks
have a standby process. It must be noted that, if they are put in a standby state, then
there is no need to have several instances of the block; hence, there would be a noticeable
performance gain for most of the applications. It is important to highlight that an unrolled
architecture can improve the results of this proposal, although this must be further analyzed.
$$\mathrm{Throughput} = \frac{\mathrm{Data\_block\_size}}{(\mathrm{Clock\_time})(\mathrm{Clock\_cycles})}; \tag{5}$$
$$\mathrm{Efficiency} = \frac{\mathrm{Throughput}}{\mathrm{Number\_of\_Slices}}. \tag{6}$$
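Equations (5) and (6) translate directly into code. In the sketch below, the function and parameter names are illustrative, and the 10 ns clock period is a hypothetical value used only to exercise the formula. The divisor of Equation (6) is called area_units because the equation names slices, while the efficiency reported for the Virtex-7 implementation is given per LUT.

```python
import math


def throughput_bps(data_block_size_bits: int, clock_period_s: float,
                   clock_cycles: int) -> float:
    """Equation (5): bits processed per second."""
    return data_block_size_bits / (clock_period_s * clock_cycles)


def efficiency(throughput: float, area_units: int) -> float:
    """Equation (6): throughput per unit of occupied hardware."""
    return throughput / area_units


# Hypothetical case: 512-bit block, 195 clock cycles, 10 ns clock period.
example = throughput_bps(512, 10e-9, 195)
print(round(example / 1e6, 2))  # about 262.56 (Mbps)

# Figures reported for the Virtex-7 implementation: 441.72 Mbps and 2255 LUTs.
virtex7 = efficiency(441.72e6, 2255)
print(round(virtex7 / 1e3, 1))  # about 195.9 (Kbps/LUT)
```

The second result agrees, up to rounding, with the efficiency of 195.8 Kbps/LUT stated for the Virtex-7 device.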
Next, two types of analysis of results are presented: (1) the comparison between the
simple SHA-256 architecture and the hybrid SHA-256 architecture with fault detection and
(2) the comparison with related works.
The iterative structure in the simple SHA-256 architecture (see Figure 1) processes
a 512-bit message block during 64 clock cycles. This architecture does not detect faults
and is a basic element for the design and development of the fault detection SHA-256
hybrid-redundancy structure, which has a 3-stage pipeline architecture.
For the comparison between the two SHA-256 architectures, i.e., the simple and
the hybrid one with fault detection, implementation results appear in Table 3 for the
former, and Table 4 for the latter. These tables show the hardware resources required for
the corresponding implementation (LUT, LUTRAM, FF, IOB, and BUFG), along with the
physical resources needed in terms of clock period (the tool itself provides them all); used
data size and latency, both of them characteristics of the design, are also shown. Finally,
the performance measures, throughput and efficiency, shown in Equations (5) and (6),
respectively, are presented as well. The critical path time corresponds to the clock period,
i.e., the minimum required clock period used to determine the maximum clock frequency,
which is, by definition, the inverse of the minimum clock period; this establishes the
throughput and efficiency values of the diverse implementations.
There are two points worth mentioning regarding design and implementation.
On the one hand, the proposed designs have three constant parameters: (1) latency is
195 clock cycles for the tolerant architecture and 64 clock cycles for the non-tolerant architecture;
(2) data size is 512 bits for both architectures; and (3) the tolerant architecture has 293 input/output
ports (I/Os) and the non-tolerant architecture has 292 I/Os. On the other hand, the
implementations of those architectures show three parameters: (1) the number of FFs
is 2516 for the tolerant architecture and 753 for the non-tolerant architecture; (2) 64 LUTRAMs are
used in both architectures; and (3) the tolerant architecture requires 2255 LUTs and the non-tolerant
architecture needs between 1338 and 1350. The results of the implementation depend on
the manufacturer’s FPGA architecture and the tool (and algorithms) for synthesis, place
and route.
Table 3. Implementation results for the simple SHA-256 hardware architecture with iterative structure.
Table 4. Requirements on FPGA for the SHA-256 hybrid-redundancy hardware architecture for
detecting faults using a 3-stage pipeline structure.
The hybrid architecture occupies approximately 68% more LUTs than the iterative
architecture. It also uses about 3.34 times as many FFs (2516 versus 753, i.e., 2.34 times
more); they are required for the registers of each pipeline stage and for the extra modules
needed for error correction. Both architectures use the same amount of LUTRAM because
they store the 64 round constants, and the hybrid architecture requires one additional IOB
(for the Error signal) and one more BUFG.
The design of the hybrid architecture tries to balance the paths of each stage of the
pipeline structure, and several modules are added to the simple architecture, where the
critical path time is shorter except for the implementation in Spartan-7. Even though
both architectures process 512-bit message blocks, they differ in latency since the simple
architecture requires 65 clock cycles, while the hybrid one uses 195.
Table 4 presents an unreal (nominal) throughput, which is computed for an input of
1536 bits (three 512-bit message blocks), although the real bus width is 512 bits.
This throughput would apply if the hybrid architecture were used as a 3-stage pipeline
structure fed by three communication lines (three 512-bit message blocks, i.e., an input
data size of 1536 bits); in the proposed scheme, however, the hybrid architecture is used for
detecting and correcting faults and processes a single 512-bit message block in the 3-stage
pipeline. The performance of the iterative architecture is very similar to the unreal throughput
and efficiency of the hybrid proposal, which is achieved by the design of the simple
architecture (see Figure 1), requiring fewer modules and presenting a balanced critical path.
In the real situation, the throughput of the simple architecture is almost three
times higher than that of the hybrid architecture, and the same holds for both efficiencies
(Mbps/LUT and Mbps/FF).
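The relation between the real and the unreal throughput can be illustrated with a minimal Python sketch of Equations (5) and (6). The block size and the latencies come from the text; the 6 ns clock period is a purely hypothetical value chosen for illustration (the measured periods are those of Tables 3 and 4), and the function and variable names are ours.

```python
def throughput_bps(data_block_bits, clock_period_s, clock_cycles):
    """Equation (5): bits processed per second."""
    return data_block_bits / (clock_period_s * clock_cycles)


def efficiency(throughput, resource_count):
    """Equation (6): throughput per hardware resource (slice, LUT, or FF)."""
    return throughput / resource_count


# Values stated in the text.
BLOCK_BITS = 512       # message block size for both architectures
SIMPLE_CYCLES = 64     # latency of the non-tolerant (simple) architecture
HYBRID_CYCLES = 195    # latency of the tolerant (hybrid) architecture

# ASSUMPTION: hypothetical clock period, identical for both architectures.
CLOCK_PERIOD_S = 6e-9

simple_bps = throughput_bps(BLOCK_BITS, CLOCK_PERIOD_S, SIMPLE_CYCLES)
hybrid_real_bps = throughput_bps(BLOCK_BITS, CLOCK_PERIOD_S, HYBRID_CYCLES)
# Unreal throughput: three 512-bit blocks counted over the same latency.
hybrid_unreal_bps = throughput_bps(3 * BLOCK_BITS, CLOCK_PERIOD_S, HYBRID_CYCLES)

print(f"simple / hybrid (real): {simple_bps / hybrid_real_bps:.3f}")
print(f"hybrid unreal / real:   {hybrid_unreal_bps / hybrid_real_bps:.3f}")
```

Under the equal-clock assumption, the ratio between the simple and the real hybrid throughput reduces to the ratio of latencies, 195/64, which is why the simple architecture appears almost three times faster, while the unreal figure is exactly three times the real one.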
The results of the comparison with related works are shown in Table 5. We are
aware that, since different FPGA technologies are used, this comparison cannot be
considered fair. These results, however, provide useful insights into the design
techniques as well as the implementation results. All the works reported architectures on
Virtex technology, except [1], which uses Altera technology. The implemented functions
are SHA-1, SHA-256, and SHA-512. Most works present two architectures
(with and without fault detection). Except for [20], the architectures with fault detection
implement hybrid redundancy (each work uses two types of redundancy), where hardware
redundancy is a common technique. In our case, these architectures are named simple
(without fault detection) and hybrid (with fault detection).
algorithm as in our proposal, even though they used Virtex-5 technology. They required
more than 4 times the hardware resources and their implementation operated at lower
frequency, although they reported the highest throughput. While similar throughput would
be expected, this was not the case, and details about latency were not provided, hence no
comparisons could be made on this point.
Figure 12. Modifications made to the hardware architecture in FPGA, only to evaluate, in a synthetic
context, by injecting errors.
The test vectors were applied to the proposed hardware architecture without modifications
and to the three modified architectures in four different runs, i.e., 20,000 tests
were evaluated. The hybrid architecture without modifications (the one implemented
in an FPGA) was tested in the first epoch; for this purpose, one complete epoch of the
dataset was applied and analyzed by the test bench (in this case, the 32-bit sub-vector
was not used). The three remaining epochs were applied to the three variations of the
hybrid architecture; in these cases, 15,000 communication faults were simulated using the
32-bit sub-vectors. The results showed that all faults (i.e., 100% of them) were detected
by the hybrid architecture.
The four runs were essential for analyzing error injection since cryptographic algorithms
have high diffusion. This means that a slight modification (from 0 to 1 or vice versa) in
a single bit can generate a very different value on the output bus (roughly half of the output
bits are expected to change). Therefore, two cases of error injection were analyzed:
stationary and permanent cases.
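The diffusion property can be observed in software with a minimal sketch based on Python's standard `hashlib` module. This is only an illustration of the avalanche effect, not the test bench used in this work: the all-zero 512-bit test vector and the helper names are hypothetical, and `hashlib.sha256` hashes the padded message rather than a single raw compression step.

```python
import hashlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    out = bytearray(data)
    out[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(out)


def hamming_distance(a: bytes, b: bytes) -> int:
    """Number of bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# ASSUMPTION: hypothetical 512-bit all-zero test vector.
message = bytes(64)
corrupted = flip_bit(message, 0)   # single-bit fault in the input

digest_ok = hashlib.sha256(message).digest()
digest_bad = hashlib.sha256(corrupted).digest()

changed = hamming_distance(digest_ok, digest_bad)
print(f"{changed} of 256 output bits differ after a single-bit fault")
```

Because a one-bit fault spreads over the whole digest, comparing redundant hashes is a sensitive way of detecting that something went wrong, although it does not reveal where the fault occurred.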
For the stationary case, the results of this analysis showed that when one or two registers
were altered, the architecture recovered in 100% of the cases (the majority prevailed).
However, when four registers were altered, the architecture was not able to recover (this
happened because most of the registers held different contents). Hence, the output
signals indicated “ready but with an error”, which meant precisely that the system had not
recovered; in such a case, the result must be recalculated or simply cannot be trusted.
For the permanent case, the same logical operators were used to generate permanent
errors, which were maintained over time. Here, the test vectors were changed as in real contexts.
In some situations, the output signals indicated that the architecture could not
recover from the error and that, although the output was ready, it was not correct, which
also meant that it had to be recalculated. In three runs, 500 test
vectors were used, and the architecture recovered whenever three or more registers at
the end of the pipeline matched (majority). Persistent failures occurred while there was
a signal introducing errors (this changed once the architecture was able to generate the
required majority, since this was the only way it could recover from the error).
The results of these evaluations showed that both stationary and permanent errors
were detected in 100% of the cases thanks to the hybrid architecture, although the
architecture may or may not recover, depending on how many registers hold matching
contents. Because the effect of an injection depended on the high cryptographic diffusion as
well as on which bit was corrupted (which, in turn, depended on whether that bit was
affected by the logical operation), an injection did not always result in a
modification of the output.
In the stationary (temporary) injection experiments, the test vectors with errors were
applied during one or two clock cycles, whereas in the permanent injections they were applied
during three or more clock cycles. As a consequence, in the former case, these vectors altered
no more than two stage registers in the pipeline structure; this change depended on the high
diffusion, on whether the local operation modified the bits, and on whether the injection
affected the stored hashes at the end of the pipeline (see Table 6).
| Type of Error | Number of Test Vectors | Epochs | Detection Ratio | Correction Ratio |
|---|---|---|---|---|
| Without error injection | 5000 | 1 | 100% | N.A. |
| Stationary (fixed values applied in 1 or 2 clk cycles) | 5000 | 3 | 100% | 100% |
| Permanent (random values applied in ≥3 clk cycles) | 500 | 3 | 100% | 34.2% |
Fault detection reached 100%, but the correction ratio for permanent errors was about 34.2%.
The correction ratio increased when the fault was injected in just three clock cycles and
in different modules of each pipeline stage; conversely, as the duration of the injection
increased, the correction ratio decreased. The novel approach proposed in this work allowed
us to obtain these results, providing a more reliable strategy for detecting and correcting faults.
6. Discussion
We emphasize that the proposed architecture maintains the complete calculation of
the SHA-2 algorithm, without modification; the hardware resources of the architecture
are used iteratively with feedback, maintaining the standardized security levels established
by different state-of-the-art analyses. The architecture proposed in this manuscript is not
intended to be lightweight since it consists of a completely unrolled round, which is used
64 times in the case of the non-tolerant architecture (64-cycle latency) and 192 times in the
case of the tolerant architecture (included in the 195-cycle latency). The former architecture
is iterative and obtains the hash in the usual way, while the latter computes
three hashes using the same modules and checks whether they are correct or an error
occurred; if an error occurred, it checks whether it is possible to recover from it (by means of
majority vote); otherwise, the Error output signal is set.
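The voting step described above can be sketched in Python as follows. This is a behavioral illustration under our own assumptions, not the hardware voter of the proposed architecture: it applies a strict-majority rule to an arbitrary list of stored hashes, and the function name, the returned flags, and the example values are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def majority_vote(hashes: Sequence[bytes]) -> Tuple[Optional[bytes], bool, bool]:
    """Vote over redundant hash values.

    Returns (voted_hash, fault_detected, error):
      - fault_detected is True when at least one copy disagrees;
      - error is True when no strict majority exists, i.e., recovery
        is not possible and the result must be recalculated.
    """
    if not hashes:
        raise ValueError("at least one hash value is required")
    value, count = Counter(hashes).most_common(1)[0]
    fault_detected = count < len(hashes)
    if 2 * count > len(hashes):          # strict majority: recoverable
        return value, fault_detected, False
    return None, fault_detected, True    # no majority: Error is set


# Hypothetical 256-bit values standing in for the three computed hashes.
good = bytes.fromhex("aa" * 32)
bad = bytes.fromhex("bb" * 32)
other = bytes.fromhex("cc" * 32)

print(majority_vote([good, good, good]))   # no fault
print(majority_vote([good, bad, good]))    # fault detected and corrected
print(majority_vote([good, bad, other]))   # fault detected, not recoverable
```

The three calls correspond to the three possible outcomes: all copies agree, a majority outvotes the corrupted copy, or no majority exists and the error flag is raised.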
Figure 13 shows the critical paths in each stage; because these paths are long, it is not
possible to reach high clock frequencies. Red lines indicate a path where hardware resources
appear for both the place process (hardware for the logical and arithmetic calculations of the
algorithm) and the route process (routing hardware). Long paths can be shortened
if more stages are used, which would require more pipeline registers and proportionally more
registers to store the final hashes, depending on the number of stages; the complexity of the
voting module would also increase. For example, [21] describes a tolerant architecture for the
symmetric AES-128 algorithm, showing how complex the voting module becomes
when a 5-stage pipeline is used and that there are options to reduce this combinatorial
complexity. Increasing the number of pipeline stages is not the intention of this work and
can be part of future work, where exploring the relationship between the number of stages and
the performance results can be an important part of the analysis. The intention of this paper is
to explore, lay the groundwork for, and innovate in the use of the pipeline technique as a
redundancy process for the SHA-256 algorithm, which, to the best of our knowledge, has not
been done for hash functions. Most pipelined systems achieve higher frequencies because the
dataflow can be divided into a large number of stages, which improves throughput and frequency;
however, when prediction problems, jump/no-jump situations, stalls, control hazards, data
hazards, etc. appear, much of the power of pipelining is lost. In cryptographic cases, such as that
of SHA-256, there is a high data dependency [23], which cancels the advantages of
pipelining; in this situation, and in our case, it is not possible to apply pipelining to
increase throughput, because the high data dependency returns the system to the worst-case
scenario of pipelining, a data stall (i.e., the processing of a block must be finished
before the processing of the next one starts; in other words, the 64 rounds must be completely
applied to one block before starting the 64 rounds needed for the next one).
Figure 13. Critical paths in the architecture for each pipeline stage. Red lines indicate a path where
hardware resources appear for both place and route processes.
The maximum frequency for some Virtex-7 devices is 700 MHz; however, a design
will hardly reach that frequency, since the designs for specific applications
use different configurations and numbers of gates between intermediate registers.
In practice, FPGA designs differ in complexity, the algorithms that can be
implemented are very diverse, and the implemented architectures reduce the maximum
frequency with respect to the range of the device [24]. Cryptographic solutions commonly
represent a bottleneck; hence the need for research such as that presented here, where the
analysis shows the scope of our proposed architecture in terms of its maximum
frequency, performance, hardware requirements, and efficiency.
Traditionally, a pipeline structure has many stages, which means that high frequencies
and high processing capabilities are achieved. Pipelining can hardly be used in cryp-
7. Conclusions
In this work, we present a novel hybrid architecture for SHA-256 implemented on a
3-stage pipeline structure that provides fault tolerance with iterative and feedback processes.
Pipelining is traditionally used to process several lines of communication or several
instructions at the same time, but for cryptographic algorithms it cannot be applied in
a general way because of three problems: (1) there is a high dependency on data;
(2) there are iterative processes requiring several rounds; and (3) one block must be
completely processed before the next one can be processed (hash functions, CBC mode,
blockchain, etc.), so the pipeline does not give a clear advantage. In our proposal, we used
pipelining to process a single block, addressing the three previous problems within the same
communication line and waiting for one block to finish before starting the next one, with the
pipeline stages devoted to redundancy tasks. In addition, we analyzed hardware resources and
performance to balance the critical path. It is important to highlight that fault tolerance
increases the hardware resources required, because pipeline registers are added, and that
latency is higher because several stages, at different iterations, must be executed, which
reduces throughput and efficiency.
Our architecture reported a throughput of 441.72 Mbps using 2255 LUTs, which corresponds to an
efficiency of 195.8 Kbps/LUT.
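As a quick consistency check on these figures, Equation (6) can be applied with the number of LUTs as the resource count:

$$\mathrm{Efficiency} = \frac{441.72\ \mathrm{Mbps}}{2255\ \mathrm{LUTs}} \approx 0.1959\ \mathrm{Mbps/LUT} \approx 195.9\ \mathrm{Kbps/LUT},$$

which agrees with the reported 195.8 Kbps/LUT up to rounding. If we further assume that this throughput refers to a 512-bit block and the 195-cycle latency (our reading, not a value restated in this section), solving Equation (5) for the clock frequency gives an implied operating frequency of

$$f = \frac{1}{\mathrm{Clock\_time}} = \frac{\mathrm{Throughput} \times \mathrm{Clock\_cycles}}{\mathrm{Data\_block\_size}} = \frac{441.72 \times 10^{6} \times 195}{512} \approx 168.2\ \mathrm{MHz}.$$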
8. Future Work
As future work, we suggest exploring a lightweight architecture, where the round
is compacted into a Von Neumann architecture. One example is the architecture
shown in [26], where latency is considerably increased and performance is reduced, but the
use of hardware resources is improved and energy consumption is also reduced. In that
future work, redundancy should be explored along with pipelining and lightweight
architectures, in order to study the different design and implementation
parameters and to explain the behavior that results from combining these techniques.
In addition, we will focus on reducing power consumption by considering the idea developed
by Pal and Mukherjee in [27]. They proposed a scheme in which, when there is no fault in
the system, only two of the modules are activated instead of all of them.
When a mismatch is found between these two modules, the spare modules are activated;
the aim is to reduce overall power consumption while maintaining system reliability.
We will also evaluate the idea proposed by Liu, Wang, and Liu [28] to improve the voting module.
They recognized the cost of hardware redundancy in their work, so they proposed a scheme
that requires less hardware. They used an algorithm to calculate the weight of each
sensor signal based on the difference between the sensor signal and the analytical R-value,
so that the voting value is closer to the actual value.
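The power-saving idea summarized above (two active modules, with spares woken only on a mismatch) can be sketched behaviorally in Python. This is a loose illustration of the description given here, not the design of [27]: the function name, the policy of accepting the first spare that agrees with one of the two active modules, and the software "modules" built on `hashlib` are all assumptions of ours.

```python
import hashlib
from typing import Callable, Optional, Sequence, Tuple

Module = Callable[[bytes], bytes]


def power_gated_compute(modules: Sequence[Module],
                        block: bytes) -> Tuple[Optional[bytes], int]:
    """Return (result, number_of_modules_activated).

    Only the first two modules run in the fault-free case; spares are
    activated one at a time when the two active results disagree.
    """
    if len(modules) < 3:
        raise ValueError("two active modules and at least one spare are needed")
    first = modules[0](block)
    second = modules[1](block)
    activated = 2
    if first == second:                  # no fault: spares stay off
        return first, activated
    for spare in modules[2:]:            # mismatch: wake spares as needed
        third = spare(block)
        activated += 1
        if third in (first, second):     # a spare breaks the tie
            return third, activated
    return None, activated               # no agreement: result untrusted


# Hypothetical software stand-ins for hardware modules.
def healthy(block: bytes) -> bytes:
    return hashlib.sha256(block).digest()


def faulty(block: bytes) -> bytes:
    return bytes(32)                     # stuck-at-zero output


print(power_gated_compute([healthy, healthy, healthy], b"abc")[1])  # 2 modules used
print(power_gated_compute([healthy, faulty, healthy], b"abc")[1])   # 3 modules used
```

In the fault-free case only two modules are exercised, which is where the power saving would come from; the spare is paid for only when a mismatch occurs.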
Author Contributions: Conceptualization, I.A.-B. and L.A.M.-R.; Data curation, A.M.-S.; Formal
analysis, I.A.-B., M.M.-S., A.M.-S., C.A.H.-G., M.L.-B. and L.A.M.-R.; Funding acquisition, I.A.-B.,
A.M.-S., C.A.H.-G. and L.A.M.-R.; Investigation, M.M.-S., M.L.-B. and L.A.M.-R.; Methodology, I.A.-B.
and L.A.M.-R.; Software, I.A.-B. and M.M.-S.; Supervision, I.A.-B. and L.A.M.-R.; Validation, I.A.-B.,
M.M.-S., C.A.H.-G., M.L.-B. and L.A.M.-R.; Visualization, A.M.-S. and C.A.H.-G.; Writing—original
draft, I.A.-B. and L.A.M.-R.; Writing—review & editing, M.M.-S., A.M.-S., C.A.H.-G. and M.L.-B. All
authors have read and agreed to the published version of the manuscript.
Funding: This research was funded by the Mexican National Council for Science and Technology
(CONACYT) through the Research Projects 882, 278, and 613.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: Not applicable.
Conflicts of Interest: The authors declare no conflict of interest. The funders had no role in the
study’s design, in the collection, analyses, or interpretation of data, in the writing of the manuscript,
or in the decision to publish the results.
Abbreviations
The following abbreviations are used in this manuscript:
A, B, ..., H Working variables
ADDKT Address of round constant
AES Advanced Encryption Standard
B SHA-256 simple hardware architecture
$B^{(h)}$ SHA-256 hybrid redundancy hardware architecture
$B_k$ Replica k of $B$
$B_k^{(h)}$ Replica k of $B^{(h)}$
BigSigma0 $\Sigma_0^{(256)}$ function
BigSigma1 $\Sigma_1^{(256)}$ function
BG1 Output of $\Sigma_1^{(256)}$ function
BG0 Output of $\Sigma_0^{(256)}$ function
BUFF Buffer (state) for working variables A, B, ..., H
CH_Function Ch function
CH Output of Ch function
CBC Cipher block chaining (CBC)
ECB Electronic codebook (ECB)
CFB Cipher feedback (CFB)
OFB Output feedback (OFB)
CTR Counter (CTR)
DES Data Encryption Standard
ECC Elliptic-curve cryptography
enQ Enable for storing final hash value
enR Enable register for initial hash
enSB1 Enable for stage block 1
enSB2 Enable for stage block 2
enSB3 Enable for stage block 3
HOF New initial hash value
H0R Initial hash register (it changes if there is more than one message block)
l Latency of B
$l^{(h)}$ Latency of $B^{(h)}$
N Number of redundancies, where N ∈ Z+
NBUFF New buffer value
Kbps/LUT Kilobits per second per LUT
KT Round constant value
MAJ_Function Maj function
MAJ Output of Maj function
Mbps Megabits per second
MD5 Message-Digest Algorithm 5
PEA Processing Element A
PEB Processing Element B
Reg Pipeline registers
RSA Rivest, Shamir, and Adleman algorithm
RST Reset
selFH Selector of final hash
selH0 Selector of Initial Hash
selMUX Selector of Output hash value
selS1 Selector of external/internal message block
SG1 Output of $\sigma_1^{(256)}$ function
SG0 Output of $\sigma_0^{(256)}$ function
SHA Secure Hash Algorithm
SHA-256 SHA with 256-bit hash
SmallSigma1 $\sigma_1^{(256)}$ function
SmallSigma0 $\sigma_0^{(256)}$ function
TA Temporary working variable A
TE Temporary working variable E
THASH Temporary intermediate hash value
VHDL Very high speed integrated circuit hardware description language
WT Word for each round
1S-HASH 1-stage final temporary hash value
2S-HASH 2-stage final temporary hash value
3S-HASH 3-stage final temporary hash value
References
1. Ahmad, I.; Das, A.S. Analysis and detection of errors in implementation of SHA-512 algorithms on FPGAs. Comput. J. 2007, 50,
728–738. [CrossRef]
2. European Commission. High-Level Expert Group on Artificial Intelligence—Ethics Guidelines for Trustworthy AI; European Commission:
Brussels, Belgium, 2019; 41p. Available online: https://www.aepd.es/sites/default/files/2019-12/ai-ethics-guidelines.pdf
(accessed on 14 June 2022).
3. Benfica, J.; Green, B.; Porcher, B.C.; Poehls, L.B.; Vargas, F.; Medina, N.H.; Added, N.; de Aguiar, V.A.P.; Macchione, E.L.A.;
Aguirre, F.; et al. Analysis of FPGA SEU sensitivity to combined effects of conducted EMI and TID. In Proceedings of the 2016
Asia-Pacific International Symposium on Electromagnetic Compatibility (APEMC), Shenzhen, China, 17–21 May 2016; Volume 1,
pp. 887–889. [CrossRef]
4. Shum, W. Glitch Reduction and CAD Algorithm Noise in FPGAs. Ph.D. Thesis, University of Toronto, Toronto, ON, Canada, 2011.
Available online: https://tspace.library.utoronto.ca/bitstream/1807/31442/3/Shum_Warren_201111_MASc_thesis.pdf (accessed
on 14 June 2022).
5. Kotipalli, S.; Kim, Y.B.; Choi, M. Asynchronous Advanced Encryption Standard Hardware with Random Noise Injection for
Improved Side-Channel Attack Resistance. J. Electr. Comput. Eng. 2014, 2014, 837572. [CrossRef]
6. Traoré, P.S.; Asfour, A.; Yonnet, J.P. Noise analysis of a high sensitivity GMI sensor based on a Field-Programmable-Gate-Array.
Sens. Actuators Phys. 2021, 331, 112972. [CrossRef]
7. Nguyen, V.T.; Dam, M.T.; So, J.; Lee, J.G. Immunity Characterization of FPGA I/Os for Fault-Tolerant Circuit Designs against
EMI. Adv. Electr. Comput. Eng. 2019, 19, 37–44. [CrossRef]
8. Aljawarneh, S.; Alkhateeb, F.; Maghayreh, E.A. A Semantic Data Validation Service for Web Applications. J. Theor. Appl. Electron.
Commer. Res. 2010, 5, 39–55. [CrossRef]
9. National Technical Information Service. FIPS 180-2—Secure Hash Standard; U.S. Department of Commerce/NIST: Springfield, VA, USA,
2002. Available online: http://csrc.nist.gov/publications/fips/fips180-2/fips180-2.pdf (accessed on 10 October 2021).
10. Qasim, S.M.; Abbasi, S.A.; Almashary, B. A review of fpga-based design methodology and optimization techniques for efficient
hardware realization of computation intensive algorithms. In Proceedings of the 2009 International Multimedia, Signal Processing
and Communication Technologies, Aligarh, India, 14–16 March 2009; pp. 313–316.
11. Singh, R.; Rajawat, A. A Review of FPGA-based design methodologies for efficient hardware Area estimation. IOSR J. Comput.
Eng. 2013, 13, 1–6. [CrossRef]
12. Dubrova, E. Time Redundancy, Fault-Tolerant Design; Springer: New York, NY, USA, 2013; pp. 137–155, ISBN 978-1-4614-2113-9.
[CrossRef]
13. Johnson, B.W. The Design and Analysis of Fault Tolerant Digital Systems; Addison-Wesley: Boston, MA, USA, 1989; ISBN 9780201075709.
14. Koren, I.; Krishna, C.M. Fault-Tolerant Systems; Elsevier: Amsterdam, The Netherlands, 2010; ISBN 9780080492681.
15. Cofer, R.C.; Harding, B.F. Rapid System Prototyping with FPGAs: Accelerating the Design Process; Elsevier: Amsterdam,
The Netherlands, 2006.
16. Harikrishna, B.; Ravi, S.F. A survey on fault tolerance in FPGAs. In Proceedings of the 2013 7th International Conference on
Intelligent Systems and Control (ISCO), Coimbatore, India, 4–5 January 2013; pp. 265–270. [CrossRef]
17. Bahramali, M.; Jiang, J.; Reyhani-Masoleh, A. A fault detection scheme for the FPGA implementation of SHA-1 and SHA-512
round computations. J. Electron. Test. Theory Appl. 2011, 27, 517–530. [CrossRef]
18. Michail, H.E.; Kotsiolis, A.; Kakarountas, A.; Athanasiou, G.; Goutis, C. Hardware implementation of the Totally Self-Checking
SHA-256 hash core. In Proceedings of the IEEE EUROCON 2015—International Conference on Computer as a Tool (EUROCON),
Salamanca, Spain, 8–11 September 2015; pp. 1–5. [CrossRef]
19. Michail, H.E.; Athanasiou, G.S.; Theodoridis, G.; Gregoriades, A.; Goutis, C.E. Design and implementation of totally-self checking
SHA-1 and SHA-256 hash functions’ architectures. Microprocess. Microsyst. 2016, 45, 227–240. [CrossRef]
20. Kahri, F.; Mestiri, H.; Bouallegue, B.; Machhout, M. An efficient fault detection scheme for the secure hash algorithm SHA-512.
In Proceedings of the 2017 International Conference on Green Energy Conversion Systems (GECS), Hammamet, Tunisia,
23–25 March 2017; pp. 1–5. [CrossRef]
21. Algredo-Badillo, I.; Ramírez-Gutiérrez, K.A.; Morales-Rosales, L.A.; Pacheco Bautista, D.; Feregrino-Uribe, C. Hybrid Pipeline
Hardware Architecture Based on Error Detection and Correction for AES. Sensors 2021, 21, 5655. [CrossRef]
22. Kuon, I.; Tessier, R.; Rose, J. FPGA Architecture: Survey and Challenges. Found. Trends Electron. Des. Autom. 2008, 2, 135–253.
[CrossRef]
23. Algredo-Badillo, I.; Feregrino-Uribe, C.; Cumplido, R.; Morales-Sandoval, M. FPGA-based implementation alternatives for the inner
loop of the Secure Hash Algorithm SHA-256. Microprocess. Microsyst. 2013, 37, 750–757. [CrossRef]
24. Manjikian, N. A study of maximum frequency in FPGA chips using mesh and toroid circuit topologies. In Proceedings of the 2017
IEEE 30th Canadian Conference on Electrical and Computer Engineering (CCECE), Windsor, ON, Canada, 30 April–3 May 2017;
pp. 1–6. [CrossRef]
25. Guzman, I.C.; Nieto, R.D.; Bernal, Á. FPGA implementation of the AES-128 algorithm in non-feedback modes of operation. Dyna
2016, 83, 37–43. [CrossRef]
26. García, R.; Algredo-Badillo, I.; Morales-Sandoval, M.; Feregrino-Uribe, C.; Cumplido, R. A compact FPGA-based processor for
the Secure Hash Algorithm SHA-256. Comput. Electr. Eng. 2014, 40, 194–202. [CrossRef]
27. Pal, S.; Mukherjee, A. A New Power-Gated Hybrid Defect Tolerant Approach Based on Modular Redundancy. In Proceedings of
the 2021 Asian Conference on Innovation in Technology (ASIANCON), Pune, India, 27–29 August 2021; pp. 1–4. [CrossRef]
28. Liu, C.; Wang, X.; Liu, X. Research on Hybrid Redundancy Voting Algorithm Based on Fuzzy Theory. J. Phys. Conf. Ser. 2021,
2035, 012016. [CrossRef]