In-Memory Acceleration of McEliece

This document summarizes a paper presented at the 2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI) that proposes iMACE, an in-memory hardware accelerator for the Classic McEliece encryption algorithm using resistive RAM (ReRAM) crossbar arrays. The paper presents an in-memory design for the Classic McEliece encryption that includes two pipelined versions to trade off throughput and energy efficiency. Simulation results show iMACE achieves 18.8-94x better throughput and 46%-97% reduction in energy compared to an FPGA implementation.


2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)

iMACE: In-Memory Acceleration of Classic McEliece Encoder
(Invited Paper)
Karthikeyan Nagarajan, Sina Sayyah Ensan, Swagata Mandal*, Swaroop Ghosh, and Anupam Chattopadhyay**
School of EECS, Pennsylvania State University, University Park, USA
School of ECE, Jalpaiguri Government Engineering College, India*
School of CSE, Nanyang Technological University, Singapore**
{kxn287, sxs2541, szg212}@[Link] [Link]@[Link]* anupam@[Link]**

Abstract: Asymmetric code-based crypto-systems have been developed in the last decade due to the rapid evolution of quantum computing, which can potentially compromise RSA and ECC based crypto-systems. The McEliece crypto-system, based on the general decoding problem, is one of the front-runner candidates for post-quantum cryptography, but its energy efficiency is limited by the heavy data traffic between the processing elements and the memory. In-memory computing (IMC) architectures can remove the energy-efficiency barriers posed by von Neumann computing due to movement of data between the processor and the memory. Emerging non-volatile memories (NVM) such as Resistive RAM (ReRAM) implemented in a crossbar array are promising substrates to realize IMC due to excellent High Resistance State (HRS) to Low Resistance State (LRS) ratios and high densities. Therefore, McEliece can benefit substantially from in-memory acceleration. We propose iMACE, a high-performance and area-efficient hardware implementation of the core encoding function of McEliece that exploits ReRAM-based IMC. Simulation results show 18.8X-94X better throughput and 46%-97% reduction in energy consumption compared to the FPGA-based implementation.

I. INTRODUCTION

The security of traditional security primitives such as Rivest Shamir Adleman (RSA) [1], Digital Signature Algorithm (DSA) [2], Elliptic Curve Digital Signature Algorithm (ECDSA) [3], and other Elliptic Curve Cryptography (ECC)-based crypto-systems relies on the assumed difficulty of various problems in number theory, such as the Integer Prime Factorization Problem or the Discrete Logarithm Problem. Recent developments have shown that quantum computers can break public-key crypto-systems based on such hard number theory problems. Therefore, it is essential to explore alternate quantum-resistant crypto-systems. The Classic McEliece code-based crypto-system is a candidate for the Post-Quantum Cryptography Standardization project by NIST [4]. Conventional von Neumann computing separates processing and memory elements, resulting in energy and performance bottlenecks when the Classic McEliece algorithm is executed in the processor. In-Memory Computing (IMC) architectures have gained significant attention due to features such as a high degree of parallelism, elimination of external data movement, and the flexibility of partitioning the memory resources between computation and storage. Many works have been proposed exploring various IMC architectures such as native realization of neural network architectures [5], programmable architectures [6], matrix multiplications [7], neuromorphic computing [8][9], and approximate computing [10]. These techniques have shown improvement in performance and power efficiency compared to conventional computing.

Hardware acceleration of the McEliece crypto-system is proposed in [11] using alternate codes (QC-MDPC) to achieve a smaller resource footprint while ensuring reasonable performance for various applications. However, this implementation deviates from the traditional usage of Goppa codes [12] for McEliece. A smart-card implementation of McEliece has been proposed [13] on an Infineon SLE76 chip. However, this implementation poses the problem of transmission and storage of the large public keys required for McEliece encryption. There has been little work on exploiting IMC for security-oriented applications. In [14], a hardware-based hash function is proposed using crossbar memristive technology. It exploits the write disturb phenomenon and process variations in the crossbar array for implicit key embedding. A ReRAM-based in-memory implementation of the SHA-3 hash algorithm is proposed in [15].

In this work, we propose iMACE: In-memory acceleration of the Classic McEliece encoder using ReRAM-based in-memory computing. iMACE employs a dynamic in-memory computing (DCIM) architecture [16] that is energy-efficient compared to other contemporary IMC architectures. The following contributions are made in this paper. We (a) propose an in-memory implementation for the Classic McEliece crypto-system's encryption algorithm including its complex mathematical operations on DCIM arrays; (b) propose two versions of pipelining the array operations to create a trade-off surface between throughput and energy; (c) perform process variation analysis on iMACE designs; and (d) provide comparative analysis against an FPGA implementation.

The paper is organized as follows: Section II describes the basics of Classic McEliece and the DCIM architecture. Section III presents the in-memory design of the encryption algorithm including two pipelined versions: iMACE-I and iMACE-II. Section IV presents process variation analysis and comparison with FPGA implementations. Finally, Section V draws the conclusion.

II. BACKGROUND ON MCELIECE AND IMC

A. Classic McEliece Encoder

The McEliece crypto-system [17] is a public key crypto-system that uses a public and private key pair to encrypt and decrypt a message.

Public key crypto-system: Assuming that the message receiver is A and the sender is B, A publishes its public key to everyone. B uses A's public key to encrypt the message and transmit it to A. Finally, A uses its private key to decrypt

978-1-7281-3391-1/19/$31.00 ©2019 IEEE

DOI 10.1109/ISVLSI.2019.00098
Fig. 1: Representative illustration of calculation of G and plaintext encryption in the McEliece crypto-system with n = 12
and k = 8.
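The flow summarized in the Fig. 1 caption (forming the public matrix from S, G, and P, then encrypting a plaintext) can be sketched in Python with n = 12 and k = 8. This is a minimal sketch under stated assumptions, not the paper's implementation: G is a random binary stand-in rather than a true Goppa generator matrix, t = 1 is an assumed error weight, and all helper names (matmul_gf2, random_invertible, random_permutation, encrypt) are hypothetical.

```python
import random

def matmul_gf2(a, b):
    """Multiply binary matrices over GF(2): AND for products, XOR (sum mod 2) for sums."""
    return [[sum(x & y for x, y in zip(row, col)) % 2 for col in zip(*b)] for row in a]

def random_invertible(k, rng):
    """Random invertible k x k binary matrix, built as (unit lower) x (unit upper) triangular."""
    lower = [[1 if i == j else (rng.randint(0, 1) if j < i else 0) for j in range(k)]
             for i in range(k)]
    upper = [[1 if i == j else (rng.randint(0, 1) if j > i else 0) for j in range(k)]
             for i in range(k)]
    return matmul_gf2(lower, upper)

def random_permutation(n, rng):
    """Random n x n permutation matrix (exactly one 1 per row and per column)."""
    cols = list(range(n))
    rng.shuffle(cols)
    return [[1 if cols[i] == j else 0 for j in range(n)] for i in range(n)]

def encrypt(m, g_pub, t, rng):
    """Return (c, e) with c = m * g_pub + e over GF(2) and wt(e) = t."""
    n = len(g_pub[0])
    e = [0] * n
    for pos in rng.sample(range(n), t):
        e[pos] = 1
    codeword = matmul_gf2([m], g_pub)[0]
    return [c ^ b for c, b in zip(codeword, e)], e

rng = random.Random(2019)
n, k, t = 12, 8, 1  # n and k follow Fig. 1; t = 1 is an assumption for illustration
G = [[rng.randint(0, 1) for _ in range(n)] for _ in range(k)]  # stand-in, NOT a Goppa code
S = random_invertible(k, rng)
P = random_permutation(n, rng)
G_pub = matmul_gf2(matmul_gf2(S, G), P)  # public matrix: S * G * P

m = [1, 0, 1, 1, 0, 0, 1, 0]  # example 8-bit plaintext block
c, e = encrypt(m, G_pub, t, rng)
print("ciphertext:", c)
```

Because the stand-in G has no decoding structure, the sketch only exercises the encryption side, which is the part iMACE accelerates.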

Fig. 2: (a) XOR implementation using DCIM in ReRAM crossbar array; (b) Timing diagram of logical XOR operation. [16]
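At the logic level, the two-array XOR of Fig. 2 is a sum-of-products: one array forms the two minterms and a second array combines them. The sketch below models only the Boolean behavior, in both AND-OR form and the equivalent NAND-NAND form; it does not model precharge, sense amplifiers, or any electrical behavior, and the function names are hypothetical.

```python
def nand(*bits):
    """NAND of any number of binary inputs."""
    return 0 if all(bits) else 1

def xor_and_or(in0, in1):
    """XOR as a sum of products: (in0 AND NOT in1) OR (NOT in0 AND in1)."""
    n0, n1 = 1 - in0, 1 - in1  # complemented inputs, one extra wordline each
    minterm0 = in0 & n1        # first AND-array bitline
    minterm1 = n0 & in1        # second AND-array bitline
    return minterm0 | minterm1 # OR array combines the two minterms

def xor_nand_nand(in0, in1):
    """Same function with two NAND stages (De Morgan applied to the AND-OR form)."""
    n0, n1 = 1 - in0, 1 - in1
    return nand(nand(in0, n1), nand(n0, in1))

# Truth table rows: (in0, in1, AND-OR result, NAND-NAND result)
truth_table = [(a, b, xor_and_or(a, b), xor_nand_nand(a, b))
               for a in (0, 1) for b in (0, 1)]
for row in truth_table:
    print(row)
```

Both forms need each input together with its complement, which is why a 2-input gate occupies four wordlines.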

the received message. An adversarial interceptor, C, would be unable to decode the transmitted message with just the public key. This is because it is computationally infeasible to compute the private key based on the public key.

Classical Goppa Codes: Goppa codes [12], a class of linear error correcting codes, are used in the McEliece crypto-system. Specifically, it uses a class of irreducible binary Goppa codes. The code defined has a length of n, a dimension of k, and can correct up to t errors. A binary Goppa code uses a polynomial $g(x)$ over $GF(2^m)$ of degree t, chosen so that the Goppa polynomial is irreducible. Note that an irreducible Goppa code $\Gamma(L, g(x))$ has a minimum distance $d \geq 2t + 1$. Similar to other block codes, Goppa codes use a parity check matrix (H) such that $Hc^T = 0$ for all code vectors c in $GF(2^m)$. This satisfies the Goppa code requirement. Encoding requires the plaintext message (m) to be multiplied by a generator matrix of the Goppa code. This generator matrix is defined to be a $k \times n$ matrix G, where the rows of G form the basis of the Goppa code. Any $k \times n$ matrix G with rank k such that $GH^T = 0$ is a generator matrix.

McEliece Key Generation: In McEliece, A constructs the public key by selecting a Goppa polynomial $g(z)$ of degree t and computing a generator matrix G of the Goppa code. A then chooses a random invertible matrix S and a random permutation matrix P. These matrices (S, G, and P) are then used to compute $G' = SGP$. The public key of A consists of $(G', t)$ and the private key consists of $(S, G, P)$.

McEliece Encryption: In order to encrypt a plaintext $m \in \{0, 1\}^k$, where k refers to the dimensionality, the sender (B) has to create a random binary vector (e) of length n with a Hamming weight of $wt(e) \leq t$. The ciphertext, c, is then calculated as follows: $c = mG' + e$. The overview of the calculation of the $G'$ matrix and plaintext encryption is shown in Fig. 1.

McEliece Decryption: Once A receives the encrypted signal c, it computes $\hat{c} = cP^{-1}$. This is followed by computing a value known as the syndrome ($S_z = zH^T$) and the application of an error correction algorithm that uses $\hat{c}$ and $S_z$ as its inputs. Once the error vector, e, is calculated, the message is recovered as the first k bits of $z \oplus e$.

In this paper, we focus on the implementation of McEliece encryption using the ReRAM-based in-memory computing architecture described in Section II-B.

B. Logic operations using In-Memory Computing

Various IMC architectures have been proposed in the literature. iMACE uses DCIM [16], which is a ReRAM crossbar based architecture shown in Fig. 2(a). Each bitcell in DCIM is composed of a bidirectional diode in series with a ReRAM. We have used the ASU ReRAM Verilog-A model [18] along with PTM 65nm technology for the analysis. The ReRAM is a bipolar HfOx-based resistive switching memory [18]. Oxide thickness = 5nm, gap = 0.1nm/1.7nm, atomic distance for oxide = 0.25nm, and atomic energy for vacancy generation/recombination = 1.501eV/1.5eV are used here.

By implementing the functions in the Sum-of-Product (SoP) form, DCIM executes in-memory computation. DCIM dedicates different arrays for AND and OR. Wordlines (WL) and bitlines (BL) serve as inputs and outputs to

Fig. 3: Division of the S and G matrices in sub-matrices. Note that n = 12 and k = 8 and the numbers shown are only
examples.

Fig. 4: Division of the SG and P matrices to sub-matrices. Note that n = 12 and k = 8 and the numbers shown are only
examples.
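The partitions in Figs. 3 and 4 amount to a tiled matrix product over GF(2), where every output element is a set of 2-input ANDs followed by an XOR reduction done in groups of four. The sketch below is a software model of that decomposition, assuming random binary matrices as inputs; the function names are hypothetical and nothing here models the array hardware itself.

```python
import random

def xor_reduce(bits, group=4):
    """XOR bits in groups of `group`, then XOR the per-group results together."""
    partial = []
    for start in range(0, len(bits), group):
        acc = 0
        for bit in bits[start:start + group]:
            acc ^= bit
        partial.append(acc)
    out = 0
    for bit in partial:
        out ^= bit
    return out

def blocked_matmul_gf2(a, b, row_blocks, col_blocks):
    """Multiply a (r x q) by b (q x c) over GF(2), one (row block, column block) tile at a time."""
    rows, cols = len(a), len(b[0])
    rstep, cstep = rows // row_blocks, cols // col_blocks
    b_cols = [list(col) for col in zip(*b)]  # columns of b as lists
    out = [[0] * cols for _ in range(rows)]
    for r in range(0, rows, rstep):
        for c in range(0, cols, cstep):
            for i, row in enumerate(a[r:r + rstep]):
                for j, col in enumerate(b_cols[c:c + cstep]):
                    and_bits = [x & y for x, y in zip(row, col)]  # 2-input ANDs
                    out[r + i][c + j] = xor_reduce(and_bits)      # grouped XOR
    return out

def matmul_gf2(a, b):
    """Reference (untiled) GF(2) product used to check the tiled version."""
    return [[sum(x & y for x, y in zip(row, col)) % 2 for col in zip(*b)] for row in a]

rng = random.Random(0)
S = [[rng.randint(0, 1) for _ in range(8)] for _ in range(8)]
G = [[rng.randint(0, 1) for _ in range(12)] for _ in range(8)]
P = [[rng.randint(0, 1) for _ in range(12)] for _ in range(12)]  # any binary matrix works here

SG = blocked_matmul_gf2(S, G, row_blocks=2, col_blocks=3)    # Fig. 3: S1-S2 by G1-G3
SGP = blocked_matmul_gf2(SG, P, row_blocks=4, col_blocks=4)  # Fig. 4: SG1-SG4 by P1-P4
and_ops_sg = 8 * 8 * 12  # one AND per (row, column, inner index) triple
print(and_ops_sg)
```

Tiling changes only how the work is grouped, not the result, so the tiled and untiled products agree for any choice of block counts that divide the matrix dimensions.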
the AND and OR functions, respectively. Arrays are pre-programmed to perform the desired functions. For instance, in order to implement $in_0 \cdot \overline{in_1}$, the bitcells connected to $in_0$ and $\overline{in_1}$ are programmed to the Low Resistance State (LRS) while the bitcells connected to $\overline{in_0}$ and $in_1$ are programmed to the High Resistance State (HRS) (Fig. 2(a)). All other ReRAMs are programmed to HRS (e.g., the ReRAMs connected to inputs $in_n$ and $\overline{in_n}$).

An XOR function implemented in DCIM is shown in Fig. 2. At first, the $Pre$ signal that precharges the AND array's BLs to VDD is asserted. Then, inputs ($in_0$ and $in_1$) are applied by asserting the $EN_{AND}$ signal. As shown in Fig. 2(b), when $in_0 = in_1 = 1$ both BL0 and BL1 drop below the reference voltage ($V_{Ref-AND}$). As a result, the Sense Amplifier (SA) outputs that determine the results of the $in_0 \cdot \overline{in_1}$ and $\overline{in_0} \cdot in_1$ functions are pulled down to 0 at the edge of $SE_{AND}$. Next, the AND array SA outputs are provided as inputs to the OR array. Since the inputs of the OR array are '0', the BL ($BL0_{OR}$) remains discharged and results in $in_0 \oplus in_1 = 0$. If $in_0 = 0$ and $in_1 = 1$, BL0 discharges while BL1 remains precharged, which results in $in_0 \cdot \overline{in_1} = 0$ and $\overline{in_0} \cdot in_1 = 1$. Therefore, $BL0_{OR}$ starts charging at the edge of $EN_{OR}$. Finally, the voltage of $BL0_{OR}$ is compared against $V_{Ref-OR}$ at the edge of $SE_{OR}$ and produces '1' at the output of the SA. Note that the OR array sense enable ($SE_{OR}$) is an active low signal. The overall XOR operation shown in Fig. 2 also depicts the AND and OR operations that are required for iMACE's implementation.

From simulation results it is determined that the XOR operation takes 1.96ns to 3ns (delay increases when ReRAM HRS/LRS resistances change with time). iMACE uses a simpler version of DCIM called FPCAS [19] that uses NAND-NAND memory arrays instead of AND-OR arrays. NAND arrays are the same as AND arrays, but the $\overline{out}$ node of the SA is taken as the final result instead of the SA's $out$ node. NAND is a complete function and any logical function can be implemented using two stages of NAND functions (e.g. $a\bar{b} + \bar{a}b = \overline{\overline{a\bar{b} + \bar{a}b}} = \overline{\overline{a\bar{b}} \cdot \overline{\bar{a}b}}$).

III. IMACE ARCHITECTURE

A. Preliminaries

The key parameters considered for the in-memory implementation of McEliece are the message length (n) and the dimensionality (k). In this paper we have chosen a representative example of n = 12 and k = 8. Additionally, for a fixed n and k, the dimensions of the matrices S, G, and P are fixed as (8 × 8), (8 × 12), and (12 × 12) respectively. Note that iMACE allows other values of n and k as well (Section IV-C). iMACE performs the computation of the public key $G'$ (calculated as $G' = SGP$). This operation is split into generation of the S × G matrix and the SG × P matrix. This is followed by the multiplication of the plaintext message by $G'$, the addition of an error vector (e), and subsequently the generation of the encrypted message c.

B. S × G Computation

The first step involves the multiplication of the (8 × 8) S matrix and the (8 × 12) G matrix. The computation of each element in the SG matrix requires eight 2-input ANDs and one 8-input XOR. Each of the 8 elements of a row ($i \in \{1, \dots, 8\}$) of S is AND'ed with its corresponding element for all columns ($j \in \{1, \dots, 12\}$) in G. The 8 AND'ed outputs for a particular i and j are then XOR'ed with each other to compute the SG element at location {i, j}.

AND operation: The total number of AND operations required is 8 × 8 × 12 = 768. We perform the calculation after splitting the matrices S and G into smaller matrices as shown in Fig. 3. Note that this division of matrices is just a representative example and is done to ensure sufficient sense margin in all DCIM arrays. S is split into two matrices by rows as S1 (4 × 8) and S2 (4 × 8). G is split into 3 matrices by columns as G1 (8 × 4), G2 (8 × 4), and G3 (8 × 4). Both S1 and S2 are multiplied by each of the G1, G2, and G3 matrices. This operation requires a total of 8 ReRAM DCIM arrays with each array having 64 inputs (8 × 4 + 8 × 4) and 128 outputs (4 × 8 × 4).

XOR operation: In order to perform the 8-bit XOR for each SG element, the 8 bits are divided into 2 sets of 4 bits each. The 4 bits in each set are XOR'ed with each other and

Fig. 5: (a) iMACE-I pipeline architecture with a latency of 2.5ns and a throughput of 3.2Gbps; (b) Size and number of
arrays required for each operation for both iMACE-I & -II.

Fig. 6: (a) iMACE-II pipeline architecture with a latency of 8.5ns and a throughput of 16Gbps; (b) Area, aggregate power,
and delay of each of the McEliece encryption operations.
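The throughput figures quoted in the captions of Figs. 5 and 6 follow from dividing the 8-bit block size by the interval between successive encrypted outputs. The short calculation below is a worked check of that arithmetic, assuming the 2GHz clock and the output intervals stated in the text (2.5ns for iMACE-I, one 500ps cycle for iMACE-II); the function name is hypothetical.

```python
def throughput_gbps(bits_per_output, output_period_ns):
    """Bits per nanosecond is numerically equal to Gbps."""
    return bits_per_output / output_period_ns

BLOCK_BITS = 8               # one plaintext block (k = 8)
CLOCK_GHZ = 2.0              # clock frequency stated for the pipelines
cycle_ns = 1.0 / CLOCK_GHZ   # 0.5 ns per cycle

imace_1_gbps = throughput_gbps(BLOCK_BITS, 2.5)       # one block every 2.5 ns
imace_2_gbps = throughput_gbps(BLOCK_BITS, cycle_ns)  # one block every cycle
cycles_per_block_imace_1 = 2.5 / cycle_ns             # cycles between outputs in iMACE-I

print(imace_1_gbps, imace_2_gbps, cycles_per_block_imace_1)
```

The fivefold throughput gap between the two configurations comes entirely from the output interval shrinking from five cycles to one.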
then the resultant 2 bits are XOR'ed again to obtain the SG element.

4-bit XOR: The 4-bit XOR is divided into 2 stages. The first stage has a total of 8 × 8 × 12 = 768 inputs. This translates to a total of 192 4-bit sets. Each 4-input XOR gate has 8 intermediate minterms and therefore the XOR array requires 8 BLs per gate. Since each input requires 2 WLs (input and its complement), a 4-input XOR gate requires 8 WLs. Therefore, the operation requires a total of 12 ReRAM DCIM arrays with each array taking 128 inputs and resulting in 128 outputs (128 × 128). In the second stage of the 4-bit XOR operation, we have a total of 12 × 128 = 1536 inputs as generated by the first stage. Each set of 8 input bits requires 1 output BL. Therefore, this stage requires 24 arrays each taking 128 inputs and resulting in 8 outputs (8 × 128). A total of 24 × 8 = 192 outputs are generated.

2-bit XOR: The 2-bit XOR is divided into 2 stages as well. The first stage has a total of 192 inputs as generated by the 4-bit XOR's final output. Each set of 2 bits requires 2 BLs for its 2 minterm outputs and requires 4 WLs for its inputs and complements. A total of 3 arrays are required, each taking 128 inputs and resulting in 64 outputs (64 × 128). A total of 3 × 64 = 192 outputs are generated. The second stage takes in the 192 intermediate minterm inputs. A set of 2 minterm inputs requires 1 BL for its output and 4 WLs for its inputs and complements. Three arrays each taking 128 inputs and resulting in 32 outputs (32 × 128) are required. A total of 3 × 32 = 96 outputs are generated. These 96 outputs form the 8 × 12 SG matrix. Fig. 5b gives an overview of all the designed array sizes and numbers.

Simulation Results: The overall area consumed by the DCIM arrays for this step is 25.94 KGEs, where 1 GE (0.25 μm² / 60F²) is the area of a minimum sized NAND gate in a 65nm process [20]. The aggregate power consumed by each of the arrays is 15.28mW. Note that the energy consumed will depend on the configuration of the pipeline as discussed in Sections III-F and III-G. The sum of individual delays (each < 500ps) of all operations is 2.33ns. Fig. 6b shows the simulation results for each operation.

C. SG × P Computation

This step involves the multiplication of the (8 × 12) SG matrix with the (12 × 12) P matrix to generate an (8 × 12) SGP (also $G'$) matrix. The SG matrix is split into 4 (2 × 12) matrices by rows as SG1, SG2, SG3, and SG4 (Fig. 4). The P matrix is split into 4 (12 × 3) matrices by columns as P1, P2, P3, and P4 (Fig. 4). Similar to the previous SG computation, the operation is divided into two stages: AND and XOR. The XOR is further divided into 4-input XORs and 3-input XORs. The number of inputs, outputs, BLs and WLs required for SGP computation is calculated using the rules of DCIM as shown in Section III-B. For the sake of brevity, we only report the choices on the number and size of arrays required for the above mentioned operations.

AND operation: AND requires a total of 16 DCIM arrays each taking 128 inputs and resulting in 72 outputs. In order to ensure that the array sizes are a power of 2, we round

up the number of outputs to 128 (with 56 unused output BLs). The 16 (128 × 128) AND arrays generate a total of 16 × 72 = 1152 outputs.

XOR operation: In order to perform the 12-bit XOR for each SGP element, the 12 bits are divided into 3 sets of 4 bits each. The 4 bits in each set are XOR'ed and the resultant 3 bits are XOR'ed again to obtain the SGP element.

4-bit XOR: The first stage requires 18 arrays each taking 128 inputs and resulting in 128 outputs (128 × 128). The second stage requires 36 arrays each taking 128 inputs and resulting in 8 outputs (8 × 128).

3-bit XOR: The first stage requires 5 arrays each taking 128 inputs and resulting in 128 outputs (128 × 128). Note that some of the BLs and WLs are not used in the arrays since we are rounding up the array sizes to the nearest power of 2. This stage generates 384 outputs. The second stage requires 6 arrays each taking 128 inputs and resulting in 16 outputs (16 × 128). It generates a total of 6 × 16 = 96 outputs that form the 8 × 12 SGP matrix.

Simulation Results: Fig. 6b shows the simulation results.

D. Plaintext encryption

The plaintext message (m), which is represented as a (1 × 8) matrix, is multiplied with the (8 × 12) $G'$ matrix (also SGP). This intermediate 1 × 12 matrix ($c'$) is computed similarly to the previously discussed matrix multiplications. It consists of 2-input AND, 4-input XOR, and 2-input XOR gates implemented using DCIM arrays. The size and number of arrays are listed as follows.

AND: AND requires 2 arrays each taking 128 inputs and resulting in 64 outputs (64 × 128).

XOR Operation: The 8 AND outputs for each element are divided into 2 sets of 4 bits each. The 4 bits in each set are XOR'ed and the resultant 2 bits are XOR'ed again to obtain the $c'$ element.

4-bit XOR: The first stage requires 1 (128 × 128) array and 1 (64 × 64) array. The second stage requires 3 (8 × 128) arrays.

2-bit XOR: The first stage requires 1 (32 × 64) array and the second stage requires 1 (16 × 64) array.

Simulation Results: Fig. 6b shows the simulation results.

E. Error vector addition

The final step of the encryption process is a bit-wise XOR of an error vector of length 12 with the $c'$ generated in the previous step. It only requires 12 2-input XOR gates.

XOR Operation: The first stage of the XOR operation requires 1 (32 × 64) array and the second stage requires 1 (16 × 64) array.

Simulation Results: Fig. 6b shows the simulation results.

F. iMACE-I Pipelining

The above discussed operations to encrypt the plaintext message can be pipelined to maximize the throughput. We have proposed two different configurations. The first configuration, iMACE-I, has each of the stages (i.e. $S \times G$, $SG \times P$, $m \times G'$, and $mG' + e$) pipelined as shown in Fig. 5a. Since the delay of all mathematical DCIM array operations for McEliece is < 500ps, the clock frequency for this operation is chosen to be 2GHz. The configuration concurrently works on 4 plaintext messages based on the array sizes and numbers shown in Fig. 5b. The pipeline allows the encryption of 8 bits of plaintext message every 2.5ns. Therefore, iMACE-I has a throughput of 3.2Gbps and a power consumption of 6.28mW.

G. iMACE-II Pipelining

iMACE-II is an aggressively pipelined configuration where each single mathematical operation (e.g. 4-bit XOR, AND, etc.) is pipelined as shown in Fig. 6a. This ensures that no array lies dormant in any cycle. iMACE-II concurrently works on 10 plaintext messages as compared to the 4 in iMACE-I. This configuration produces an encrypted output every 500ps and has a throughput of 16Gbps. The power consumed is 31.4mW. It is geared towards applications that are throughput sensitive.

IV. RESULTS, ANALYSIS, AND DISCUSSION

A. Process Variation Analysis

Process variations can lead to worst case scenarios where the bitline voltage change may not provide a sufficiently high sense margin (SM) for the SA to detect the change in bitline voltage. We conduct a 1000-point Monte-Carlo analysis at −10°C, 25°C, and 90°C on the DCIM NAND arrays. The process variation is modeled by changing the key metrics of the crossbar design including the ReRAM low resistance gap (GIL), high resistance gap (GIH), oxide thickness of the MOSFETs, and the channel length of all the transistors. GIL and GIH are modeled with a 3σ variation of 7% of their nominal values of 0.1nm and 1.7nm respectively. The oxide thickness and channel lengths of the MOSFETs are modeled with a 3σ variation of 10% of their nominal values of 1.2nm and 65nm respectively. In DCIM arrays it is observed that the SM is inversely proportional to the number of inputs per output. Our implementation utilizes 2, 4, and 8-input NAND gates. The distributions of SM observed for each of these cases under different temperatures are shown in Fig. 7. The lowest SM is 17mV (at 90°C). Note that the number of sense amplifiers in iMACE designs is 8344, which corresponds to 3.38 sigma. The sense amplifiers should be upsized to keep the sense amplifier offset to ∼5mV per sigma.

B. Comparative Analysis of iMACE

We have compared iMACE with an FPGA-based implementation. Since this paper focuses on the encryption portion of the McEliece encoder, we have implemented the encryption portion alone (Section III) on an FPGA device (xc7k325t-2ffg900 was used as the device during synthesis). The FPGA implementation has the following characteristics: No. of slice registers = 64, No. of slice LUTs = 401, No. of LUT-FF pairs = 48, No. of bonded IOBs = 112, and No. of BUFG/BUFGCTRL/BUFHCEs = 2. The IMC design of the McEliece encryption requires a total area of 74.88 KGE. It is not possible to compare the resource utilization of an FPGA with the area requirements of a CMOS-based implementation. Therefore, our comparison uses energy and throughput for each implementation as shown in Table I.

The FPGA implementation mimics the von Neumann computing model with separate processing and memory elements. Therefore, the throughput and energy requirements for the encryption will worsen due to the delay and energy consumption of transmitting the plaintext bits to the processor. The FPGA implementation would require >0.21nJ (assuming 40pJ/bit for LPDDR2 memory) [21] and 42ns latency (RAS timing) just to move 1 block of message (8 bits of plaintext message) from the memory to the processor. It is seen that iMACE-I and iMACE-II have 1.15X and 5.76X better throughput than the FPGA implementation when the data transmission is not considered. The throughput

Fig. 7: Process variation analysis of DCIM arrays for 2-input, 3-input, 4-input and 8-input NAND gates at (a) −10°C; (b) 25°C; and (c) 90°C.
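The parameter sampling behind a Monte-Carlo run like the one in Fig. 7 can be sketched as follows. This is only an illustration of drawing the 1000 parameter sets: it assumes each parameter is Gaussian with its 3σ spread equal to the stated percentage of the nominal value, it does not compute sense margins (that requires circuit simulation), and the dictionary keys and function name are hypothetical.

```python
import random

# Nominal values and 3-sigma spreads as fractions of nominal, taken from the text.
NOMINAL = {
    "GIL_nm": 0.1,              # ReRAM low-resistance gap
    "GIH_nm": 1.7,              # ReRAM high-resistance gap
    "tox_nm": 1.2,              # MOSFET oxide thickness
    "channel_length_nm": 65.0,  # transistor channel length
}
THREE_SIGMA_FRACTION = {
    "GIL_nm": 0.07,
    "GIH_nm": 0.07,
    "tox_nm": 0.10,
    "channel_length_nm": 0.10,
}

def sample_parameters(rng):
    """Draw one parameter set; sigma = (3-sigma fraction * nominal) / 3 (Gaussian assumed)."""
    return {
        name: rng.gauss(nominal, nominal * THREE_SIGMA_FRACTION[name] / 3.0)
        for name, nominal in NOMINAL.items()
    }

rng = random.Random(7)           # fixed seed so the sketch is repeatable
samples = [sample_parameters(rng) for _ in range(1000)]
temperatures_c = (-10, 25, 90)   # each parameter set would be simulated at these corners

mean_gil = sum(s["GIL_nm"] for s in samples) / len(samples)
print(round(mean_gil, 4))
```

Each sampled set would then be handed to the circuit simulator at every temperature corner to obtain a sense-margin distribution.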
performance increases to 18.8X and 94X when the data transmission delay is included. iMACE-I and iMACE-II reduce energy consumption by 95% and 18% (without FPGA data transfer) and by 97% and 46% (with data transfer) respectively. In summary, the comparison shows superior throughput and energy consumption in all scenarios for both flavors of iMACE against a traditional FPGA-based implementation of the same encryption algorithm.

TABLE I: Comparison of iMACE implementations
Work          Tech.   f (MHz)   Lat. (ns)   Tput. (Mbps)   Energy (nJ)
iMACE-I       DCIM    2K        8.5         3.2K           0.02
iMACE-II      DCIM    2K        8.5         16K            0.34
FPGA          -       1K        2.88        2.78K          0.41
Data Trans.   -       1K        42          0.19K          0.21
FPGA Total    -       1K        44.88       0.17K          0.62

C. Scalability

The proposed iMACE design can be scaled to higher values of n and k by increasing the number of DCIM arrays to a feasible quantity based on the area and power constraints of the application. For example, an alternative selection of McEliece parameters such as n = 16 and k = 12 would result in an increase of array area by 5.9X and of energy consumption by 2.6X. Similarly, higher values of n and k required for post-quantum cryptography can be accommodated by iMACE. Under resource constraints, it is also possible to implement resource sharing to accommodate larger key sizes but with a relatively lower throughput.

V. CONCLUSIONS

We proposed 2 flavors of McEliece encryption implementation using dynamic in-memory computing. The salient features include (i) elimination of large amounts of data movement between the processor and the memory, saving orders of magnitude in energy consumption, (ii) a re-configurable design that allows optimization between energy and throughput, and (iii) a low memory footprint.

Acknowledgements: This work is supported by SRC (2847.001), NSF (CNS-1814710, CNS-1722557, CCF-1718474, DGE-1723687 and DGE-1821766) and DARPA Young Faculty Award (D15AP00089).

REFERENCES

[1] R. Rivest et al., “A Method for Obtaining Digital Signatures and Public-key Cryptosystems,” Commun. ACM, vol. 21, pp. 120–126, Feb. 1978.
[2] D. W. Kravitz, “Digital Signature Algorithm,” US Patent 5,231,668, July 27, 1993.
[3] D. Johnson et al., “The elliptic curve digital signature algorithm (ECDSA),” International Journal of Information Security, vol. 1, no. 1, pp. 36–63, 2001.
[4] National Institute of Standards and Technology (NIST), “Post-Quantum Cryptography Standardization.” Online: [Link] quantum-cryptography-standardization.
[5] M. Hu et al., “Dot-product Engine for Neuromorphic Computing: Programming 1T1M Crossbar to Accelerate Matrix-vector Multiplication,” in Proceedings of the 53rd Annual Design Automation Conference, DAC ’16, (New York, NY, USA), pp. 19:1–19:6, ACM, 2016.
[6] Y. Zha et al., “Reconfigurable In-memory Computing with Resistive Memory Crossbar,” in Proceedings of the 35th International Conference on Computer-Aided Design, ICCAD ’16, pp. 120:1–120:8, 2016.
[7] L. Ni et al., “An energy-efficient matrix multiplication accelerator by distributed in-memory computing on binary RRAM crossbar,” in 2016 21st Asia and South Pacific Design Automation Conference (ASP-DAC), pp. 280–285, Jan 2016.
[8] G. W. Burr et al., “Experimental Demonstration and Tolerancing of a Large-Scale Neural Network (165000 Synapses) Using Phase-Change Memory as the Synaptic Weight Element,” IEEE Transactions on Electron Devices, vol. 62, pp. 3498–3507, Nov 2015.
[9] S. Yu et al., “A neuromorphic visual system using RRAM synaptic devices with Sub-pJ energy and tolerance to variability: Experimental characterization and large-scale modeling,” in 2012 International Electron Devices Meeting, pp. 10.4.1–10.4.4, Dec 2012.
[10] B. Li et al., “Memristor-based Approximated Computation,” in Proceedings of the 2013 International Symposium on Low Power Electronics and Design, ISLPED ’13, pp. 242–247, 2013.
[11] I. von Maurich and T. Güneysu, “Lightweight code-based cryptography: QC-MDPC McEliece encryption on reconfigurable devices,” in 2014 Design, Automation Test in Europe Conference Exhibition (DATE), pp. 1–6, March 2014.
[12] E. Berlekamp, “Goppa codes,” IEEE Transactions on Information Theory, vol. 19, no. 5, pp. 590–592, 1973.
[13] F. Strenzke, “A smart card implementation of the McEliece PKC,” in Information Security Theory and Practices. Security and Privacy of Pervasive Systems and Smart Devices (P. Samarati, M. Tunstall, J. Posegga, K. Markantonakis, and D. Sauveron, eds.), (Berlin, Heidelberg), pp. 47–59, Springer Berlin Heidelberg, 2010.
[14] L. Azriel et al., “Towards a Memristive Hardware Secure Hash Function (MemHash),” in 2017 IEEE International Symposium on Hardware Oriented Security and Trust (HOST), pp. 51–55, May 2017.
[15] K. Nagarajan et al., “SHINE: A Novel SHA-3 Implementation Using ReRAM-based In-Memory Computing,” in ISLPED, 2019.
[16] H. Motaman et al., “Dynamic Computing in Memory (DCIM) in Resistive Crossbar Arrays,” ICCD, Oct 2019.
[17] R. J. McEliece, “A public-key cryptosystem based on algebraic coding theory,” Technical report, NASA, DSN Progress Report 42-44, pp. 114–116, 1978.
[18] P. Y. Chen et al., “Compact Modeling of RRAM Devices and Its Applications in 1T1R and 1S1R Array Design,” IEEE Transactions on Electron Devices, vol. 62, pp. 4022–4028, Dec 2015.
[19] Sayyah Ensan et al., “FPCAS: In-memory Floating Point Computations for Autonomous Systems,” The International Joint Conference on Neural Network (IJCNN), Jun 2020.
[20] P. Pessl et al., “Pushing the Limits of SHA-3 Hardware Implementations to Fit on RFID,” in Cryptographic Hardware and Embedded Systems - CHES 2013, pp. 126–141, Springer Berlin Heidelberg, 2013.
[21] K. T. Malladi et al., “Towards Energy-Proportional Datacenter Memory with Mobile DRAM,” in 2012 39th Annual International Symposium on Computer Architecture (ISCA), pp. 37–48, June 2012.

518
