A Decoder For Short BCH Codes With High Decoding Efficiency and Low Power For Emerging Memories

This article has been accepted for inclusion in a future issue of this journal.

Content is final as presented, with the exception of pagination.

IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS 1

A Decoder for Short BCH Codes With High Decoding Efficiency and Low Power for Emerging Memories

Sara Choi, Hong Keun Ahn, Byung Kyu Song, Jung Pill Kim, Seung H. Kang, and Seong-Ook Jung, Senior Member, IEEE

Abstract: In this paper, a double-error-correcting and triple-error-detecting (DEC-TED) Bose–Chaudhuri–Hocquenghem (BCH) code decoder with high decoding efficiency and low power for error correction in emerging memories is presented. To increase the decoding efficiency, we propose an adaptive error correction technique for the DEC-TED BCH code that detects the number of errors in a codeword immediately after syndrome generation and applies a different error correction algorithm depending on the error conditions. With the adaptive error correction technique, the average decoding latency and power consumption are significantly reduced owing to the increased decoding efficiency. To further reduce the power consumption, an invalid-transition-inhibition technique is proposed to remove the invalid transitions caused by glitches of syndrome vectors in the error-finding block. Synthesis results with an industry-compatible 65-nm technology library show that the proposed decoders for the (79, 64, 6) BCH code require only 37%–48% of the average decoding latency and achieve more than 70% power reduction compared to the conventional fully parallel decoder at raw bit-error rates of $10^{-4}$–$10^{-2}$.

Index Terms: Adaptive error correction, Bose–Chaudhuri–Hocquenghem (BCH) code, double-error-correcting and triple-error-detecting (DEC-TED), emerging memories, error-correcting code (ECC), invalid transition inhibition.

I. INTRODUCTION

EMERGING memories, such as spin-transfer torque magnetoresistive random access memory (STT-MRAM), phase change RAM (PRAM), and resistive random access memory (ReRAM), have been investigated to fill the gaps in performance and density between DRAM and NAND flash memory; such memories are referred to as storage class memories (SCMs). They are of interest for building a flexible and efficient memory hierarchy, owing to their nonvolatile, high-density, and low-latency characteristics [1]. In addition to SCMs, some emerging memories, such as STT-MRAM, are also considered promising candidates for embedded memories due to their fast read and write latencies, low leakage power, and logic-friendly compatibility [2], [3].

As technology scales down, these emerging memories also struggle with reduced reliability, and as a solution, error-correcting code (ECC) and its encoder/decoder circuits have been applied. While NAND flash requires a powerful ECC capable of correcting up to 100 errors, most of the emerging memories can reach the required chip yield using an ECC capable of correcting two or three errors because of new developments in storage physics [2]–[8]. In addition to simply increasing the memory yield, ECC can be used to optimize memory performance regarding density [9], [10] and energy consumption [11], [12]. In this manner, ECC has become an essential part of emerging memories.

To correct two or three errors, the Bose–Chaudhuri–Hocquenghem (BCH) code is widely adopted for emerging memories [2]–[8]. However, the standard iterative and sequential decoding processes, which require multiple cycles, are not compatible with emerging memories. This is because the latency of the BCH code decoder should be a few nanoseconds, considering the short read or write access time in emerging memories. To achieve a double-error-correcting (DEC) BCH code decoder with a latency of a few nanoseconds, a fully parallel decoder structure that uses combinational logic gates has been proposed in [13]–[17]. However, it still incurs a 50%–80% latency penalty and consumes 6–8 times more power than the single-error-correcting and double-error-detecting (SEC-DED) decoder. As non-error or single-bit error cases are considerably more likely than multibit (double-bit or triple-bit) errors despite the increased raw bit-error rate (RBER) in nanoscale technologies, it is inefficient to handle non-error or single-bit error cases with a DEC-TED decoder in terms of latency and power, which leads to reduced decoding efficiency. Moreover, the fully parallel decoders consume large dynamic power owing to the invalid transitions in the error-finding block. Since most emerging memories have been widely researched for use in low-power applications, such as wearable devices and IoT devices, the power of fully parallel BCH decoders should also be reduced to maximize the benefits of emerging memories.

Manuscript received May 21, 2018; revised August 27, 2018; accepted October 10, 2018. This work was supported by the Graduate School of Yonsei University Research Scholarship Grants in 2017. (Corresponding author: Seong-Ook Jung.)
S. Choi, H. K. Ahn, B. K. Song, and S.-O. Jung are with the School of Electrical and Electronic Engineering, Yonsei University, Seoul 120-749, South Korea (e-mail: [email protected]).
J. P. Kim and S. H. Kang are with Qualcomm Inc., San Diego, CA 92121 USA.
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TVLSI.2018.2877147
1063-8210 © 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

In this paper, we propose a high-decoding-efficiency and low-power BCH decoder with DEC and triple-error-detecting (DEC-TED) capability for emerging memories. To reduce the average delay and power consumption, an adaptive error correction technique for the DEC-TED BCH code is proposed. In addition, an invalid transition inhibition technique using flip-flops (FFs) and a specific ECC clock is applied to reduce the power consumption further. The synthesis results using 65-nm technology show that the proposed DEC-TED BCH decoder with 64-bit data words achieves more than 50% average latency reduction and 70%–75% average power saving in comparison to the conventional decoder with an insignificant area overhead.

The remainder of this paper is organized as follows. In Section II, an overview of BCH codes and the fully parallel structure is given. In Section III, the problems in conventional fully parallel BCH decoders are discussed. The proposed decoder with high decoding efficiency and low power is presented in Section IV. Section V presents the synthesis and comparison results of conventional decoders and the proposed decoder. Finally, Section VI concludes this paper.

II. BCH CODES AND FULLY PARALLEL BCH DECODERS FOR EMERGING MEMORIES

A. Primitive Binary BCH Code and Decoding Algorithm

In general, a primitive binary BCH code is defined over a binary Galois field with degree $m$, denoted by GF($2^m$). The $(n, k, d)$ BCH code over GF($2^m$) is represented as follows [18]:

Codeword length: $n = 2^m - 1$
Number of information bits: $k \ge 2^m - mt - 1$
Minimum distance: $d \ge 2t + 1$.

This code is capable of correcting any combination of $t$ or fewer errors in a block of $n$ digits and is called a $t$-error-correcting BCH code. Since the number of information bits is not a power of two, a shortened binary BCH code is applied in a memory system by eliminating $p$ information bits, which gives an $(n - p, k - p, d)$ code.

The RBER of the memory cell varies widely depending on design goals such as memory density, read or write latency, and energy consumption. For emerging memories, the RBERs of STT-MRAM, ReRAM, and PRAM are distributed over a range of $10^{-10}$–$10^{-3}$ [5]–[18], [19]–[21]. These RBERs can be reduced by appropriate device, circuit, and architecture design techniques [5], [6], [8], [20]. When the RBER is lower than $10^{-5}$, the target block failure rate (BFR) can be achieved with an ECC capable of correcting two errors [2]–[8]. If the TED option is added to DEC, the BFR can be improved further. Thus, the DEC-TED BCH code is adopted in this paper, and the following decoding processes are described based on the primitive binary DEC-TED BCH code [22], [23].

1) Computing Syndrome: For the $(n, k, 6)$ DEC-TED BCH code, the parity-check matrix $H$ is given by

$$H = \begin{bmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & \alpha & \alpha^{2} & \cdots & \alpha^{n-1} \\ 1 & \alpha^{3} & \alpha^{6} & \cdots & \alpha^{3(n-1)} \end{bmatrix} = \begin{bmatrix} \mathbf{1} \\ H_1 \\ H_3 \end{bmatrix} \tag{1}$$

where $\alpha$ is the primitive element in GF($2^m$).

To determine whether the received codeword, $v$, has errors, the syndrome vector $S$ is calculated as

$$S = v \cdot H^{T} = \left[ v \cdot \mathbf{1}^{T},\; v \cdot H_1^{T},\; v \cdot H_3^{T} \right] = [S_0, S_1, S_3] \tag{2}$$

where $S_0$ is a 1-bit vector, and $S_1$ and $S_3$ are $m$-bit vectors for the code generated in GF($2^m$). A single-bit error can be corrected using only the $S_1$ vector because $H_1$ can be used as the parity-check matrix for the Hamming code. For double-bit error correction, the $S_1$ and $S_3$ vectors are utilized together. It is worth noting that the syndrome vector can be used to detect the number of errors in the received codeword using the specific relationships among the $S_0$, $S_1$, and $S_3$ vectors, as shown in Table I. This relationship is applied in the proposed decoder, as will be explained in Section IV.

TABLE I
RELATIONSHIP BETWEEN THE NUMBER OF ERRORS AND SYNDROME VECTORS FOR THE BCH DEC-TED CODE

2) Determining the Error Location Polynomial: The next decoding step is to determine the error location polynomial (ELP) based on the calculated syndrome vectors. For a DEC-TED BCH code, the ELP can be represented by

$$\sigma(x) = 1 + \sigma_1 x + \sigma_2 x^{2}. \tag{3}$$

Notice that each coefficient of the ELP is an $m$-bit vector if the codeword is constructed on GF($2^m$). Conventionally, the Berlekamp–Massey (BM) algorithm [24] is widely applied to compute the coefficients of the ELP.

3) Finding the Error Locations: After computing the coefficients ($\sigma_1$ and $\sigma_2$), the Chien search is performed to find the roots of the ELP by substituting the $n$ elements of GF($2^m$), $\{\alpha^{0}, \alpha^{1}, \ldots, \alpha^{n-1}\}$, into (3).

4) Correcting Errors: Through step 3, an error vector, $e$, is obtained, and the corrected codeword, $v^{*}$, can be represented as $v^{*} = v + e$. This can be implemented using XOR gates.

B. Fully Parallel BCH Decoders for Emerging Memories

Long BCH codes are already adopted in NAND flash memories to correct tens of errors in thousands of data bits [25]–[27]. For long BCH codes, conventional iterative BCH decoding algorithms are applied, and the decoder is usually implemented with a linear feedback shift register, which takes $2n + 2t$ cycles to finish the error correction. However, this decoding algorithm is not compatible with low-latency emerging memories, so a fully parallel decoding architecture has been employed to achieve a decoding latency of a few nanoseconds [13]–[17]. The fully parallel decoding architecture is implemented entirely in combinational logic, which can significantly reduce the decoding latency at the expense of hardware overhead. However, the hardware overhead caused by the fully parallelized
implementation is not significant in emerging memories. This is because a short BCH code [28]–[30] can be used owing to the low required error-correcting capability and the relatively small memory array compared to NAND flash memories [2]–[8].

In the fully parallel structure, the syndrome vector $S$ can be obtained by separate XOR trees with inputs taken from the received code vector, as shown in Fig. 2(a) [31]. According to the decoding algorithm and implementation methods used to determine the ELP and find its roots, fully parallel BCH decoders can be divided into two categories: Peterson's algorithm (PA)-based and lookup table (LUT)-based decoders.

Fig. 2. (a) Syndrome generator and (b) Chien search blocks [31] for 64-bit codeword in the fully parallel DEC-TED BCH decoder.

1) Peterson's Algorithm-Based Decoder: As an alternative to the BM algorithm, PA was proposed in [32] to eliminate the time-consuming iterations. By applying PA, the ELP of the DEC-TED BCH code is given by

$$\sigma(x) = 1 + \sigma_1 x + \sigma_2 x^{2} = 1 + S_1 x + \left( S_1^{2} + \frac{S_3}{S_1} \right) x^{2}. \tag{4}$$

However, complex finite-field dividers are required to compute the coefficients. Thus, a reverse ELP (RELP) was proposed in [16] to alleviate the complexity of coefficient evaluation and computation during the Chien search, and it is expressed as

$$\tilde{\sigma}(x) = \tilde{\sigma}_0 + \tilde{\sigma}_1 x + \tilde{\sigma}_2 x^{2} = \left( S_1^{3} + S_3 \right) + S_1^{2} x + S_1 x^{2}. \tag{5}$$

The overall structure of a PA-based decoder is shown in Fig. 1(a). A coefficient calculator determines the bit components of the $\tilde{\sigma}_0$, $\tilde{\sigma}_1$, and $\tilde{\sigma}_2$ vectors in (5), and each component of the coefficients is obtained from the syndrome vector bit components with only modulo-2 addition and multiplication [14]. In the Chien search block, the computations of $\tilde{\sigma}(\alpha^{i})$ for $0 \le i \le n - 1$ are conducted in parallel using simple logic operations [13]. The test circuit for checking whether $\tilde{\sigma}(\alpha^{i})$ is 0 requires a multiplication by a constant ($\alpha^{i}$), and it can be implemented by XOR trees, as shown in Fig. 2(b) [31].

Fig. 1. Block diagrams of (a) PA-based decoder and (b) LUT-based decoder.

2) LUT-Based Decoder: In [17], an LUT-based decoder is proposed by replacing the coefficient calculator and Chien search blocks in the PA-based decoder with an LUT. The LUT contains all the possible pairs of syndromes and their corresponding error patterns. In this decoder, the error positions can be determined directly from the syndromes after a syndrome vector is computed.

3) Comparison of PA-Based and LUT-Based Decoders: In the LUT-based decoder, the error vector can be determined immediately after the syndrome vector is computed. Thus, the LUT-based decoder has a shorter decoding latency at the cost of increased area overhead in comparison to the PA-based decoder. However, as the number of correctable errors ($t$) or the number of information bits ($k$) increases, the table size increases exponentially, resulting in an inefficient realization in both area and delay. In comparison to the LUT-based decoder, the area increase of the PA-based decoder with increased $t$ or $k$ is much smaller. Therefore, the PA-based decoder is more appropriate for area-constrained applications.

In terms of dynamic power consumption, the PA-based decoder consumes more power than the LUT-based decoder. In the PA-based decoder, the computed syndrome vectors are continuously used in both the coefficient calculator and Chien search blocks until the error vector is determined. Thus, whenever the syndrome vectors are newly computed in response to a newly received codeword, most of the nodes in the blocks following the syndrome generator are toggled, leading to high dynamic power consumption. On the other hand, only one circuit path in the LUT block is activated by the corresponding input syndrome vector due to the inherent LUT characteristic, leading to low dynamic power consumption. Therefore, the LUT-based decoder is favorable in power-constrained applications.

III. PROBLEMS IN CONVENTIONAL FULLY PARALLEL BCH DECODERS

A. Decoding Efficiency Issue Based on Error Probability

Although the RBER of memories increases in nanoscale technologies, non-error or single-bit error cases in a codeword are still much more likely than multibit errors (double- or triple-bit errors), as shown in Fig. 3. In the case of 10-ppm RBER, most codewords (∼99.1%) will have no errors; thus, error correction is required for only a few codewords (∼0.9%). Moreover, among the codewords having errors, 99.6% are corrected using single error correction (SEC), and only 0.4% require DEC. Thus, probabilistically, the DEC-TED decoder mostly handles non-error and single-bit error cases, and double-bit errors are corrected very infrequently.

In general, the decoder for DEC-TED BCH codes has longer latency, higher area complexity, and much higher power consumption than the decoder for SEC-DED codes. In Table II, the performance of the decoder for the SEC-DED Hamming code and fully parallel decoders for the DEC-TED BCH code, synthesized with 65-nm technology, are obtained for 64 bits
of codeword when the latency constraints of the SEC-DED and DEC-TED decoders are 1.5 and 3 ns, respectively. It should be noted that the PA-based decoder has about one quarter of the area of the LUT-based decoder, but it consumes more power under the same latency constraint, as discussed in Section II-B. When the latency of the SEC-DED decoder is half that of the DEC-TED decoder, the SEC-DED decoder requires only 40% (10%) of the area and consumes 14% (20%) of the power in comparison to the PA-based (LUT-based) DEC-TED decoder, as shown in Table II.

Fig. 3. Probability depending on the RBER with various types of errors (non-, single-bit, and double-bit errors) for 64 bits of codeword.

TABLE II
PERFORMANCE COMPARISON OF SEC-DED AND FULLY PARALLEL DEC-TED DECODERS

Thus, handling all non-error, single-bit, and double-bit error cases with the DEC-TED decoder is inefficient in terms of latency and power consumption, and this reduces the decoding efficiency. If the proper decoder between the SEC-DED and DEC-TED decoders can be adaptively selected depending on the error conditions, decoding latency and power consumption can be significantly reduced on average.

B. Dynamic Power Problem in Fully Parallel BCH Decoders

Most of the previous studies on a fully parallel architecture for the BCH decoder have focused on circuit optimization methods to reduce the latency while minimizing the complexity of implementation. However, considering that the read or write power of emerging memories is generally at the microwatt level, the much higher power consumption of conventional fully parallel decoders undermines the low-power advantage of emerging memories.

The high dynamic power consumption in the fully parallel BCH decoders can mainly be attributed to the syndrome vector dependence in decoder blocks following the syndrome generator. As discussed in Section II-B, the syndrome vector is a key factor in finding errors in both PA- and LUT-based decoders. Whenever a new input is entered into the decoder, the syndrome generator produces invalid glitches before its outputs settle down. The glitches cause undesired transitions at internal nodes in the blocks that follow the syndrome generator, such as the coefficient calculator, the parallel Chien search (LUT), and the error corrector in the PA-based (LUT-based) decoder, and increase dynamic power consumption.

The effect of invalid transitions on power is severe, especially when consecutive non-error codewords are received. When a non-error codeword is entered into the decoder, most of the key generated vectors, such as syndromes, coefficients of the RELP, and error vectors, settle to 0. If the next non-error codeword is immediately entered, then the syndrome vector toggles several times until it settles down to 0. This causes invalid transitions in other blocks following the syndrome generator, as shown in Fig. 4. This invalid transition problem can be prevented if stable syndrome vectors are delivered to subsequent blocks. Implementation of stable syndrome vector delivery will be discussed in detail in Section IV.

Fig. 4. Invalid transitions in the coefficient calculator and parallel Chien search blocks due to the transitions on syndrome vectors in the case of consecutive non-error input codewords.

Fig. 5. Conceptual block diagram of DEC-TED BCH decoder with adaptive error correction.

IV. PROPOSED HIGH-DECODING-EFFICIENCY AND LOW-POWER DEC-TED BCH DECODER

In this section, a DEC-TED BCH decoder using an adaptive error correction technique and an invalid transition inhibition technique is proposed to achieve high decoding efficiency and low power consumption.
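The case for adapting the correction effort to the error count rests on the error statistics discussed in Section III-A. As a rough numerical check, the sketch below computes the probability of zero, one, and two bit errors in a codeword and a probability-weighted average of a per-case quantity. It is a minimal sketch under stated assumptions: bit errors are taken to be independent (binomial model), the codeword length is assumed to be n = 79 for the (79, 64, 6) code, the function names are hypothetical, and the relative latencies are placeholders rather than synthesis results.

```python
from math import comb


def error_count_probabilities(n, rber, max_errors=2):
    """Return [P(0 errors), ..., P(max_errors errors)] for an n-bit codeword.

    Assumes independent bit errors, each occurring with probability `rber`
    (binomial model). This is an illustrative assumption.
    """
    return [
        comb(n, k) * rber**k * (1.0 - rber) ** (n - k)
        for k in range(max_errors + 1)
    ]


def weighted_average(probabilities, per_case_values):
    """Probability-weighted average of a per-case quantity (latency or power)."""
    return sum(p * v for p, v in zip(probabilities, per_case_values))


if __name__ == "__main__":
    n = 79  # assumed codeword length, taken from the (79, 64, 6) code
    # Hypothetical relative latencies for the 0-, 1-, and 2-error cases,
    # normalized to a conventional DEC-TED decoder. Placeholders only.
    relative_latency = [0.4, 0.6, 1.0]
    for rber in (1e-5, 1e-4, 1e-3, 1e-2):
        probs = error_count_probabilities(n, rber)
        avg = weighted_average(probs, relative_latency)
        print(f"RBER={rber:.0e}  P0={probs[0]:.5f}  P1={probs[1]:.5f}  "
              f"P2={probs[2]:.5f}  weighted latency={avg:.3f}")
```

Because the non-error case dominates at low RBER, the weighted average stays close to the value assigned to that case, which is why shortening the non-error path has the largest effect on the average.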

A. DEC-TED BCH Decoder With Adaptive Error Correction

Fig. 5 shows a conceptual block diagram of the proposed adaptive error correction technique. After the syndrome vectors are generated, the number of errors in the received codeword is classified in an error counter block, and a 2-bit flag signal that represents the number of errors is generated. Then, different error correction algorithms are applied depending on the generated flag signal to improve the decoding efficiency, and the proper error vector is added to the received codeword through the 3:1 MUX.

The 2-bit flag signal can be generated from the syndrome vectors, as shown in Table III. For odd numbers of errors (single- or triple-bit errors), $S_0$ is “1,” whereas for non-error and double-bit error cases, $S_0$ is “0.” The multiple-bit error (MBE) signal, a logical OR of all the bit components of $\tilde{\sigma}_0$ ($\tilde{\sigma}_0[m-1] \,|\, \tilde{\sigma}_0[m-2] \,|\, \cdots \,|\, \tilde{\sigma}_0[0]$), is “0” in the case of non-error and single-bit error cases because $S_1^{3} + S_3 = 0$. For double- and triple-bit errors, MBE is “1” because $\tilde{\sigma}_0$ is a nonzero vector.

TABLE III
FLAG CONDITION FOR THE NUMBER OF ERRORS CLASSIFICATION

Based on the generated flag signal, we can choose between the single-error (SE) corrector and the double-error (DE) corrector. In the proposed design, the SE corrector uses the Hamming SEC code and the DE corrector uses the DEC BCH code. Since error correction algorithms are not required for non-error or triple-bit error cases (flag = “00” or “11”), all-zero vectors go directly to the MUX without being processed in the error correction blocks, which consume most of the delay and power. Thus, the latency and power consumption can be minimized for non-error or triple-bit error cases. Since the most common non-error case has minimum latency and power, the average decoding latency and power consumption can be greatly reduced. When a single-bit error occurs (flag = “01”), the SE corrector, which compares each column of the $H_1$ matrix with the $S_1$ vector, carries out single-bit error correction. Thus, when there is a single-bit error in the received codeword, the proposed decoder has similar latency and slightly larger power consumption in comparison to the conventional SEC-DED code decoder. In the case of double-bit errors (flag = “10”), the DE corrector performs error correction, and the latency and power consumption are similar to those of conventional fully parallel DEC-TED BCH decoders.

Thus, the delay and power consumption of the DEC-TED BCH decoder with adaptive error correction vary according to the types of errors in the codeword. Based on the synthesized results in Table II, Table IV summarizes the estimated delay and power consumption of the PA-based decoder that employs the proposed adaptive error correction technique for each error case, where $T_{synd}$ ($P_{synd}$), $T_{EC}$ ($P_{EC}$), $T_{SEC}$ ($P_{SEC}$), $T_{DEC}$ ($P_{DEC}$), $T_{MUX}$ ($P_{MUX}$), $T_{cor}$ ($P_{cor}$), and $T_{D}$ ($P_{D}$) represent the delay (power consumption) of the syndrome generator, error counter, SE corrector, DE corrector for the PA-based decoder, MUX, error correction block, and the total latency (power consumption) of the conventional PA-based DEC-TED decoder, respectively.

TABLE IV
ESTIMATED DELAY AND POWER OF THE PA-BASED DECODER WITH THE ADAPTIVE ERROR CORRECTION TECHNIQUE ACCORDING TO THE NUMBER OF ERRORS FOR 64 BITS OF CODEWORD

The total estimated average latency and power for 64-bit data words can be calculated with the probabilities of each error case (shown in Fig. 3). Using a 100-ppm RBER, the average latency ($T_{average}$) and power ($P_{average}$) are calculated as follows:

$$T_{average} \approx Pr_0 \cdot T_0 + Pr_1 \cdot T_1 + Pr_2 \cdot T_2 = 0.371\,T_D \tag{6}$$

$$P_{average} \approx Pr_0 \cdot P_0 + Pr_1 \cdot P_1 + Pr_2 \cdot P_2 = 0.341\,P_D \tag{7}$$

where $Pr_0$ ($T_0$), $Pr_1$ ($T_1$), and $Pr_2$ ($T_2$) are the probabilities (latencies) of the non-error, single-bit, and double-bit error cases, respectively, and $P_0$, $P_1$, and $P_2$ are the corresponding power consumptions. These results show that the average latency and power consumption are largely determined by those of the non-error case, which occurs most frequently. The total average latency of the proposed decoder is only 37% of that of the conventional PA-based decoder, while it requires only 34% of the total average power consumed by the conventional PA-based decoder. Note that, when the adaptive error correction technique is applied to the LUT-based decoder, the average latency is the same as that of the PA-based decoder, and the average power consumption is reduced to half that of the conventional LUT-based DEC-TED decoder. Thus, we can conclude that the proposed DEC-TED decoder with adaptive error correction improves the decoding efficiency in terms of latency and power consumption. The average power consumption can be reduced further by solving the invalid transition problem. This will be discussed in detail in Section IV-B.

To realize the adaptive error correction technique, additional error counter, SE corrector, and MUX blocks are added to the conventional fully parallel DEC-TED decoder. In the PA-based DEC-TED decoder, the coefficient calculator of the conventional decoder can serve as a part of the error counter because it already generates the $\tilde{\sigma}_0$ vector. The error counter additionally requires an $m$-bit OR gate for the BCH code in GF($2^m$) to evaluate the flag[1] value. Thus, the costs of area, delay, and power in the error counter for the PA-based DEC-TED decoder with adaptive error correction are minor. In addition, the area cost of the additional SE corrector is insignificant because the area of the error location block in the SEC decoder is much smaller than that of the coefficient calculator and parallel Chien search blocks in the conventional DEC-TED
decoder. The hardware structures of the error counter and the SE corrector blocks are shown in Fig. 6.

Fig. 6. (a) Error counter and (b) SE corrector blocks in the fully parallel DEC-TED BCH decoder for the 64-bit codeword.

For the LUT-based DEC-TED decoder, the area and power costs of the SE corrector are negligible. The pairs of syndrome vectors and corresponding error patterns in the LUT of the conventional DEC-TED decoder can be divided into two parts. One is for single-bit error cases, and the other is for double-bit error cases. Thus, these parts can be replaced by the SE corrector and the DE corrector, respectively, in the proposed LUT-based decoder. Since the number of errors is already classified in the error counter block, the $S_0$ vector is no longer necessary in either the SE or DE corrector of the proposed LUT-based decoder. Also, for the SE corrector in particular, only the $m$-bit $S_1$ vector is required to determine the corresponding error vector. Thus, the total area of the SE and DE correctors is smaller than that of the LUT block in the conventional LUT-based decoder. Unlike in the PA-based decoder, increased area, delay, and power consumption due to the additional error counter are inevitable. However, the increase would be insignificant or compensated by the reduced area of the LUT block owing to the smaller required size of the syndrome vector. In the hardware implementation of the LUT-based decoder, only the DE corrector block differs from the PA-based decoder. The DE corrector for the LUT-based decoder is implemented with AND gates, similar to the SE corrector block.

B. Invalid Transition Inhibition Technique for DEC-TED Decoder

As mentioned in Section III-B, settled syndrome vectors should be transferred to the SE or DE corrector to prevent invalid transitions. Furthermore, the SE and DE correctors should not operate simultaneously in the proposed decoder to ensure lower power consumption. FFs are used between the syndrome generator and the SE and DE correctors to satisfy these two constraints. A block diagram of the proposed DEC-TED decoder with the adaptive error correction and invalid transition inhibition techniques is shown in Fig. 7. Note that positive-edge-triggered FFs are used in this design. For brevity, the FFs connected to the SE corrector (DE corrector) are called SEC-FFs (DEC-FFs).

To make sure that both sets of FFs transfer the settled syndrome vectors, their control signals should be activated after the syndrome vector and flag bits become stable. To achieve this, a specific clock for the decoder (called the ECC clock) is used to generate the control signal of the FFs, as shown in Fig. 8. For positive-edge-triggered FFs, an inverted ECC clock (FF clock in Fig. 8) is used, and the pulsewidth of the ECC clock should be larger than the sum of the worst-case delay of the syndrome generator ($T_{synd}$) and that of the error counter ($T_{EC}$). In addition, to prevent the simultaneous operation of SEC-FFs and DEC-FFs, a clock-gating technique is applied to the FF clock signal and flag bits using simple INV and AND gates. Note that for the non-error and triple-bit error cases, neither set of FFs transfers the vectors to the following blocks.

The SEC-FFs convey the settled $S_1$ vector to the SE corrector only when a single-bit error occurs. In that case, the DEC-FFs do not transfer the syndrome vectors to the DE corrector; thus, the power consumption is significantly reduced in the single-bit error case. Similarly, when a double-bit error occurs, only the DEC-FFs transfer the $S_1$ and $\tilde{\sigma}_0$ vectors ($S_1$ and $S_3$ vectors) to the DE corrector in the PA-based (LUT-based) decoder.

C. Comparison With the Previous Works

As a way of improving the decoding efficiency, several ECC structures that utilize more than one error-correcting strength have been well researched. These ECC structures for memories can be categorized into two types based on the ECC selection mechanism. The first type is an “adaptive ECC based on RBER estimation,” and the other type is a “hierarchical ECC.” In this paper, the adaptive ECC based on RBER estimation and the hierarchical ECC are called “type 1 ECC” and “type 2 ECC,” respectively.

In the case of the type 1 ECC, the ECC correction ability is determined based on the memory RBER estimation. If the estimated RBER increases, then a stronger error-correcting algorithm is used [12], [33]–[37]. The parameters for RBER prediction differ according to the target memory. In the case of NAND flash, ECC types are usually determined based on the number of program and erase (P/E) cycles and the retention time [33]–[36]. In [37], the reliability of SRAM is predicted by the threshold voltage ($V_{TH}$) variation. Then, based on the estimated reliability, ECC with appropriate error-correcting ability is performed. For the STT-MRAM, the number of bits flipping from “0” to “1” in a write operation is used to estimate the write failure rate [12]. Then, the code rate of SEC-DED is changed to reduce the write error rate, especially for writing “1” from “0.” In fact, type 1 ECC can be applied only to memories for which the RBER or some target error rate can be predicted. That is why NAND flash is a good target memory for applying type 1 ECC: the memory controller counts the P/E cycles and measures the retention time [33]–[35]. On the other hand, for other memories such as SRAM and STT-MRAM, an additional RBER estimation block is required, which causes area and power overhead [12], [37]. In addition, in most papers, the encoder and decoder must be implemented separately for each error-correcting ability, which leads to significant area overhead [12], [33]–[35], [37]. Moreover, type
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

CHOI et al.: DECODER FOR SHORT BCH CODES WITH HIGH DECODING EFFICIENCY AND LOW POWER FOR EMERGING MEMORIES 7

Fig. 7. Block diagram of the proposed high-decoding-efficiency and low-power DEC-TED decoder.

TABLE V
C OMPARISON OF THE C ONVENTIONAL PA-BASED AND LUT-BASED
D ECODERS AND THE P ROPOSED PA-BASED
AND LUT-B ASED R EALIZATIONS

Fig. 8. Timing requirement of ECC clock.

1 ECC cannot fundamentally prevent the situation that single-


bit errors are corrected by the multibit error decoder because
only one error-correcting algorithm is applied to each decoding
process.
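To make this limitation concrete, the following is a minimal Python sketch of a type 1 style selection, assuming a NAND-like controller that tracks P/E cycles. The names `BlockState`, `select_type1_decoder`, and `decoders_used`, as well as the threshold `PE_THRESHOLD_DEC`, are hypothetical and do not come from the cited works. The sketch only illustrates that the decoder is chosen from the reliability estimate and not from the errors actually present in a codeword.

```python
from dataclasses import dataclass

# Hypothetical P/E-cycle threshold; the source gives no numeric value.
PE_THRESHOLD_DEC = 3_000


@dataclass
class BlockState:
    pe_cycles: int  # program/erase cycles counted by the memory controller


def select_type1_decoder(state: BlockState) -> str:
    """Pick one decoder strength from the reliability estimate alone."""
    return "DEC" if state.pe_cycles >= PE_THRESHOLD_DEC else "SEC"


def decoders_used(state: BlockState, errors_per_codeword: list[int]) -> list[str]:
    """Every codeword goes through the selected decoder, whatever its error count."""
    chosen = select_type1_decoder(state)
    return [chosen for _ in errors_per_codeword]
```

Under these assumptions, `decoders_used(BlockState(5000), [0, 1, 2])` selects the DEC decoder for all three codewords, including the one with a single-bit error, which is the inefficiency described above.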
Compared to type 1 ECC, the proposed ECC does not require the additional RBER estimation block since the proposed decoder can detect the number of errors in a codeword using the error counter. Thus, theoretically, the proposed ECC can be applied to all memory types. In addition, the same encoder and syndrome generator are used regardless of the error-correcting strength, minimizing area overhead. It can also eliminate all cases of correcting single-bit errors with the multibit error decoder, regardless of reliability. Therefore, the proposed ECC achieves higher decoding efficiency than type 1 ECC.

For type 2 ECC, SEC-DED is first performed using a Hamming decoder to determine whether the number of errors is 0, 1, or more than 1. In [39] and [40], the code is decoded hierarchically such that SEC-DED is always conducted first, and then DEC is performed to correct two errors whenever double-bit errors are detected. Thus, for a non- or single-bit error case, the latency and power can be reduced in comparison to using only the DEC-TED decoder. However, this decoder uses more time and power, especially for the double-bit error cases, because two error correction processes (SEC and DEC) are performed. On the other hand, in [38], to avoid the latency overhead in the double-bit error case, the SEC and DEC decoders operate concurrently, and the output is determined based on the detected number of errors. However, since both SEC and DEC decoders consume power simultaneously regardless of the number of errors, the average power is considerably higher than that of the conventional DEC decoders. In addition, most type 2 ECC requires separate SEC and DEC encoder and decoder circuits.

Contrary to type 2 ECC, the proposed decoder can be implemented with latency and power consumption similar to those of the conventional SEC-DED and DEC-TED decoders in the single-bit and double-bit error cases, respectively. This is because the number of errors is detected before the actual error-correcting algorithm is applied, and an appropriate error-correcting algorithm is performed. Also, the decoding latencies for non- and triple-error cases are shorter than that of the SE case because the error-correcting algorithm is not performed in the proposed decoder. Furthermore, only one of the error correction algorithms operates depending on the detected number of errors, thus eliminating the power overhead.

V. SYNTHESIS RESULTS AND COMPARISON

This section presents the synthesis results for the proposed DEC-TED BCH decoder. The proposed adaptive error correction and invalid transition inhibition techniques are applied to both conventional PA- and LUT-based (79, 64, 6) DEC-TED BCH decoders synthesized using a 65-nm technology cell library. Both proposed PA- and LUT-based decoders are designed with the same delay constraint and are compared in terms of area and power consumption. The delay for the double-bit error case (maximum) is constrained as the maximum delay for the conventional DEC-TED BCH decoder (3 ns), and for the single-bit error case, the delay constraint is half that for the double-bit error case. The synthesis results for the conventional PA- and LUT-based DEC-TED BCH decoders and the proposed PA- and LUT-based DEC-TED decoders in the double-bit error case are summarized in Table V.

Fig. 9. Average decoding latency for the proposed decoder and conventional decoder according to the RBER.

Fig. 10. Area comparison of conventional decoders and the proposed decoders.

Fig. 11. Area overhead of the proposed decoders according to the delay constraint of double-bit error.
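The error-count-dependent behavior of the proposed decoder can be sketched as a small latency model. The Python sketch below is illustrative only: it uses the delay constraints stated above (3 ns for the double-bit case, half of that for the single-bit case) and the 1.1 ns no-correction latency reported in Section V-A, and it assumes independent bit errors with probability equal to the RBER. Error counts above three are lumped into the detect-only path, which is a simplifying assumption. The function names are hypothetical, and the model is not a reproduction of (6).

```python
from math import comb

N_BITS = 79  # codeword length of the (79, 64, 6) BCH code

# Latencies in ns, taken from the figures quoted in this section.
LATENCY_NO_CORRECTION = 1.1   # non- and triple-bit error cases
LATENCY_DE = 3.0              # double-bit error delay constraint
LATENCY_SE = LATENCY_DE / 2   # single-bit error delay constraint
LATENCY_CONVENTIONAL = 3.0    # conventional DEC-TED decoder


def select_path(num_errors: int) -> str:
    """Choose the datapath from the error counter's result."""
    if num_errors == 1:
        return "SE"
    if num_errors == 2:
        return "DE"
    return "none"  # 0 errors: nothing to fix; 3 errors: detect only


def path_latency(num_errors: int) -> float:
    """Latency in ns of the path selected for this error count."""
    table = {"SE": LATENCY_SE, "DE": LATENCY_DE, "none": LATENCY_NO_CORRECTION}
    return table[select_path(num_errors)]


def error_count_probability(k: int, rber: float, n: int = N_BITS) -> float:
    """Binomial probability of exactly k bit errors (independence assumed)."""
    return comb(n, k) * rber**k * (1.0 - rber) ** (n - k)


def average_latency(rber: float) -> float:
    """Expected latency in ns; counts above 2 all take the no-correction path."""
    p = [error_count_probability(k, rber) for k in range(3)]
    p_rest = 1.0 - sum(p)  # three or more errors (assumption: detect only)
    return (
        (p[0] + p_rest) * LATENCY_NO_CORRECTION
        + p[1] * LATENCY_SE
        + p[2] * LATENCY_DE
    )
```

In this model, `average_latency` approaches 1.1 ns as the RBER goes to zero and grows with the RBER while staying below `LATENCY_CONVENTIONAL`, which matches the qualitative trend described for Fig. 9. The exact values depend on the assumptions listed above.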
A. Latency Comparison

Due to the adaptive error correction technique, the latency of the proposed decoder varies according to the error cases in the received codeword. In the non- and triple-bit error cases, neither the SE nor the DE corrector operates because no error correction algorithm needs to be run. Thus, the non- and triple-bit error cases have the shortest delay (1.1 ns), which is almost one-third that of the conventional decoder. For the single-bit error case, the decoding latency is only half that of the conventional decoder.

Fig. 9 shows the average decoding latency under various RBERs. When the RBER is very small, the proposed decoder takes only 1.1 ns to finish decoding on average, which is similar to the estimated latency in (6). As the RBER increases, the latency increases, but it is still less than half the latency of the conventional decoder, even when the RBER increases up to 0.01. In this regard, the proposed decoder significantly improves the decoding efficiency in terms of delay.

B. Area Comparison

Compared to the conventional PA-based decoder, the area of the proposed PA-based decoder increases by 18%, as shown in Fig. 10. As previously mentioned, the area overhead in the error counter for the proposed PA-based decoder is almost negligible. The increased area in the proposed PA-based decoder is mainly due to the added blocks, such as the SE corrector, FFs, and MUX. Although the increased area is the same for both of the proposed decoders, the relative area overhead of the PA-based decoder is much larger than that of the LUT-based decoder because the PA-based decoder is originally implemented with a much smaller area than the LUT-based decoder. However, the proposed PA-based decoder is still implemented with an area about three times smaller than that of the proposed LUT-based decoder.

For the proposed LUT-based decoder, the area of the DE corrector is reduced because of the reduced contents of the LUT, as discussed in Section IV-A. The increased area due to the added blocks can be compensated; thus, the total area of the proposed decoder is very similar to that of the conventional one.

The area overhead of the proposed decoders can be reduced as the latency constraint for the double-bit error case of the decoder increases, as shown in Fig. 11. The area overhead in the proposed PA-based decoder is reduced to 14% when the latency constraint increases to 4 ns. For the proposed LUT-based decoder in particular, the area is smaller than that of the conventional one once the latency constraint is relaxed beyond 3 ns. This is because the area reduction of the LUT in the proposed decoder is greater than the area increase due to the added blocks.

C. Power Comparison

1) Power Measurement Method: In our proposed decoder design, the signal path differs according to the number of errors in the received codeword, leading to power consumption variation. Thus, we evaluated the average power consumption for the specific input vectors in relation to the error cases. Input codewords were entered into the decoder every 10 ns because we assumed that the decoder inputs change with 10-ns cycles, based on the required read or write operation time and ECC decoder operating time in most emerging memories for real applications. Then, the average power was measured over 20k cycles. Note that the input codeword vectors were generated based on the generator matrix of the BCH code.

To measure the power consumption of non-error cases, successive input vectors that had no errors were used. Unlike non-error cases, the probability of continuous error occurrence is quite low. Thus, to measure single-, double-, and triple-bit error cases, erroneous and non-error input vectors were applied alternately to the decoder. This sequence is practical because it can maximize the transitions in operating blocks.

Fig. 12. Power consumption comparison of the conventional decoders and the proposed decoders. (a) PA-based decoder. (b) LUT-based decoder.

Fig. 13. Average power consumption for the proposed PA- and LUT-based decoders and conventional decoders according to the RBER.
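The input sequence used for the power measurement can be sketched as follows. This Python sketch is a hedged illustration, not the authors' testbench: it assumes the caller supplies valid codewords (in the paper these come from the generator matrix of the BCH code, which is not reproduced here), and the names `inject_errors` and `stimulus` are hypothetical. For the erroneous cases it alternates corrupted and error-free vectors, one vector every 10 ns.

```python
import random
from itertools import cycle
from typing import Iterator, Sequence

N_BITS = 79          # codeword length of the (79, 64, 6) code
CYCLE_NS = 10        # one input vector every 10 ns
NUM_CYCLES = 20_000  # length of the measurement window


def inject_errors(codeword: Sequence[int], num_errors: int,
                  rng: random.Random) -> list[int]:
    """Return a copy of codeword with num_errors distinct bits flipped."""
    corrupted = list(codeword)
    for pos in rng.sample(range(len(codeword)), num_errors):
        corrupted[pos] ^= 1
    return corrupted


def stimulus(codewords: Sequence[Sequence[int]], num_errors: int,
             num_cycles: int = NUM_CYCLES,
             seed: int = 0) -> Iterator[tuple[int, list[int]]]:
    """Yield (time_ns, vector) pairs for one error case.

    num_errors == 0 gives successive error-free vectors; otherwise
    erroneous vectors (even cycles) alternate with error-free ones.
    """
    rng = random.Random(seed)
    source = cycle(codewords)
    for step in range(num_cycles):
        word = next(source)
        if num_errors > 0 and step % 2 == 0:
            vector = inject_errors(word, num_errors, rng)
        else:
            vector = list(word)
        yield step * CYCLE_NS, vector
```

As a quick check, the all-zero word is a codeword of any linear code, so `stimulus([[0] * N_BITS], num_errors=2, num_cycles=4)` yields vectors of weight 2, 0, 2, 0 at 0, 10, 20, and 30 ns. A realistic run would pass a list of distinct codewords so that the error-free vectors also toggle the decoder inputs.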
2) Power Comparison: Fig. 12 shows the power reduction of the proposed PA-based and LUT-based decoders in relation to the number of errors. Similar to latency, the power consumption for non-error cases was the lowest because the most power-consuming blocks (the DE corrector and error correction blocks) did not operate. Moreover, the power consumption for the error correction block was greatly reduced (almost three times lower) because invalid transitions are eliminated. Thus, the power of the proposed PA- and LUT-based decoders was reduced by 75% and 70% compared to the conventional decoders, respectively. Note that the power reduction rate in the proposed LUT-based decoder was lower than that of the PA-based decoder in all error cases, due to the added power consumption in the error counter. For the single-bit error case, the power consumption was very similar to that of the non-error case in both proposed decoders because the power consumption of additional blocks (SE-FFs and SE corrector) was relatively small. Even with the double-bit error case, the total power consumption was greatly reduced. The significant reduction in power consumption occurred in the DE corrector because settled syndrome vectors were transferred, preventing invalid transitions. In addition, the power of the error correction block was also reduced due to the elimination of invalid transitions in the DE corrector. Finally, the power consumption in the added blocks, such as the error counter in the LUT-based decoders, DE-FFs, and MUX, was not significant. In the case of triple-bit error, even though the operating blocks were the same as in the non-error case, the power consumption was slightly higher than that of the non-error cases because of nonzero syndrome vectors in the error counter.

Fig. 13 shows the average power consumption of the proposed and conventional PA- and LUT-based decoders according to RBER. When the RBER was very small, the power consumption was reduced by 75% and 70% in the PA- and LUT-based decoders, respectively. Since the invalid transition issue is not considered in (7), the synthesis result of the average power consumption was much lower than the value estimated by (7). Even when the RBER increased to 10 000 ppm, the power consumption of both proposed decoders was three times lower than that of the conventional decoders. In this regard, it is concluded that the proposed decoders operate with much lower power even though they are implemented with fully parallel structures.

The power consumption in the two proposed decoders is similar because the power of the DE corrector and error correction blocks in the PA-based decoder is significantly reduced by the elimination of invalid transitions. Note that the proposed LUT-based decoder consumes slightly less power than the proposed PA-based decoder because fewer changes occur in the internal nodes: only one path, corresponding to the input syndrome vector, is activated in the LUT. Based on the area and power results of the proposed decoders, the proposed PA-based decoder can be implemented with three times less area and similar power consumption in comparison to the proposed LUT-based decoder. Therefore, unlike in the conventional case, the proposed PA-based decoder is a more favorable option than the LUT-based decoder in any application.

VI. CONCLUSION

This paper presented a (79, 64, 6) BCH DEC-TED decoder with high decoding efficiency and low power for emerging memories. We proposed an adaptive error correction technique that chooses a different decoding algorithm depending on the error conditions in a codeword, to improve the decoding efficiency regarding delay and power consumption. Also, to avoid the high dynamic power consumption problem in conventional fully parallel BCH decoders, the invalid transition inhibition technique was adopted by using FFs and a specific ECC clock. In comparison to the conventional PA- and LUT-based DEC-TED decoders, the average decoding latency of the proposed decoders is less than half that of the conventional decoders with an RBER of 100 ppm. Due to the added blocks, the area increases by 18% and 1.4% in the PA- and LUT-based decoders, respectively. However, this overhead can be decreased with an increased latency constraint of the decoder. The proposed PA- and LUT-based decoders achieve 75% and 70% power reduction on average in comparison to the conventional decoders, respectively. This paper provides a promising ECC decoder solution to achieve the target yield even with a high RBER, especially for high-performance and low-power applications using emerging memories.

REFERENCES

[1] P. Amato, S. Bellini, M. Ferrari, C. Laurent, M. Sforzin, and A. Tomasoni, “Fast decoding ECC for future memories,” IEEE J. Sel. Areas Commun., vol. 34, no. 9, pp. 2486–2497, Sep. 2016.
[2] S. H. Kang, “Embedded STT-MRAM for energy-efficient and cost-effective mobile systems,” in IEEE Symp. VLSI Circuits Dig. Tech. Papers, Jun. 2014, pp. 36–37.
[3] H. Noguchi, K. Ikegami, N. Shimomura, T. Tetsufumi, J. Ito, and S. Fujita, “Highly reliable and low-power nonvolatile cache memory with advanced perpendicular STT-MRAM for high-performance CPU,” in IEEE Symp. VLSI Circuits Dig. Tech. Papers, Jun. 2014, pp. 1–2.
[4] P. Amato, C. Laurent, M. Sforzin, S. Bellini, M. Ferrari, and A. Tomasoni, “Ultra fast, two-bit ECC for emerging memories,” in Proc. 6th IEEE Int. Memory Workshop (IMW), May 2014, pp. 79–82.
[5] Y. Emre, C. Yang, K. Sutaria, Y. Cao, and C. Chakrabarti, “Enhancing the reliability of STT-RAM through circuit and system level techniques,” in Proc. IEEE Workshop Signal Process. Syst., Oct. 2012, pp. 125–130.
[6] D. Niu, Y. Xiao, and Y. Xie, “Low power memristor-based ReRAM design with error correcting code,” in Proc. 17th Asia South Pacific Design Autom. Conf., Jan./Feb. 2012, pp. 79–84.
[7] M. Mao, Y. Cao, S. Yu, and C. Chakrabarti, “Optimizing latency, energy, and reliability of 1T1R ReRAM through cross-layer techniques,” IEEE J. Emerg. Sel. Topics Circuits Syst., vol. 6, no. 3, pp. 352–363, Sep. 2016.
[8] C. Yang, M. Mao, Y. Cao, and C. Chakrabarti, “Cost-effective design solutions for enhancing PRAM reliability and performance,” IEEE Trans. Multi-Scale Comput. Syst., vol. 3, no. 1, pp. 1–11, Jan./Mar. 2017.
[9] B. Del Bel, J. Kim, C. H. Kim, and S. S. Sapatnekar, “Improving STT-MRAM density through multibit error correction,” in Proc. IEEE/ACM Conf. Design, Autom. Test Eur. (DATE), Mar. 2014, pp. 1–6.
[10] Z. Pajouhi, X. Fong, and K. Roy, “Device/circuit/architecture co-design of reliable STT-MRAM,” in Proc. IEEE/ACM Conf. Design, Autom. Test Eur. (DATE), Mar. 2015, pp. 1437–1442.
[11] Y. Alkabani, Z. Koopmans, H. Xu, A. K. Jones, and R. Melhem, “Write pulse scaling for energy efficient STT-MRAM,” in Proc. IEEE Comput. Soc. Annu. Symp. VLSI (ISVLSI), Jul. 2016, pp. 248–253.
[12] X. Wang, M. Mao, E. Eken, W. Wen, H. Li, and Y. Chen, “Sliding basket: An adaptive ECC scheme for runtime write failure suppression of STT-RAM cache,” in Proc. IEEE/ACM Conf. Design, Autom. Test Eur. (DATE), Mar. 2016, pp. 762–767.
[13] X. Wang, D. Wu, C. Hu, L. Pan, and R. Zhou, “Embedded high-speed BCH decoder for new-generation NOR flash memories,” in Proc. IEEE Custom Integr. Circuits Conf. (CICC), Sep. 2009, pp. 195–198.
[14] W. Xueqiang, P. Liyang, W. Dong, H. Chaohong, and Z. Runde, “A high-speed two-cell BCH decoder for error correcting in MLC NOR flash memories,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 56, no. 11, pp. 865–869, Nov. 2009.
[15] Y. Yoo and I.-C. Park, “A search-less DEC BCH decoder for low-complexity fault-tolerant systems,” in Proc. IEEE Workshop Signal Process. Syst. (SiPS), Oct. 2014, pp. 1–6.
[16] C.-C. Chu, Y.-M. Lin, C.-H. Yang, and H.-C. Chang, “A fully parallel BCH codec with double error correcting capability for NOR flash applications,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Mar. 2012, pp. 1605–1608.
[17] R. Naseer and J. Draper, “Parallel double error correcting code design to mitigate multi-bit upsets in SRAMs,” in Proc. Eur. Solid-State Circuits Conf. (ESSCIRC), Sep. 2008, pp. 222–225.
[18] S. Lin and D. J. Costello, “BCH codes,” in Error Control Coding: Fundamentals and Applications, 2nd ed. Englewood Cliffs, NJ, USA: Prentice-Hall, 2004, pp. 141–177.
[19] D. H. Yoon, J. Chang, R. S. Schreiber, and N. P. Jouppi, “Practical nonvolatile multilevel-cell phase change memory,” in Proc. Int. Conf. High Perform. Comput., Netw., Storage Anal., Nov. 2013, pp. 1–12.
[20] B. L. Ji et al., “In-line-test of variability and bit-error-rate of HfOx-based resistive memory,” in Proc. IEEE Int. Memory Workshop (IMW), May 2015, pp. 1–4.
[21] S. Sills et al., “Challenges for high-density 16 Gb ReRAM with 27 nm technology,” in IEEE Symp. VLSI Circuits Dig. Tech. Papers, Jun. 2015, pp. 106–107.
[22] J. V. D. Horst and T. Berger, “Complete decoding of triple-error-correcting binary BCH codes,” IEEE Trans. Inf. Theory, vol. 22, no. 2, pp. 138–147, Mar. 1976.
[23] F. Gulliang, “An algebraic complete decoding for double-error-correcting binary BCH codes,” J. Electron., China, vol. 1, no. 1, pp. 12–17, 1984.
[24] S. B. Wicker, Error Control Systems for Digital Communication and Storage. Englewood Cliffs, NJ, USA: Prentice-Hall, 1995.
[25] X. Zhang and K. K. Parhi, “High-speed architectures for parallel long BCH encoders,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 13, no. 7, pp. 872–877, Jul. 2005.
[26] W. Liu, J. Rho, and W. Sung, “Low-power high-throughput BCH error correction VLSI design for multi-level cell NAND flash memories,” in Proc. IEEE Workshop Signal Process. Syst. Design Implement., Oct. 2006, pp. 303–308.
[27] J. Freudenberger and J. Spinner, “A configurable Bose–Chaudhuri–Hocquenghem codec architecture for flash controller applications,” J. Circuits, Syst. Comput., vol. 23, no. 2, p. 1450019, 2014.
[28] A. Fahrner, H. Griesser, R. Klarer, and V. V. Zyablov, “Low-complexity GEL codes for digital magnetic storage systems,” IEEE Trans. Magn., vol. 40, no. 4, pp. 3093–3095, Jul. 2004.
[29] J. Spinner, M. Rajab, and J. Freudenberger, “Construction of high-rate generalized concatenated codes for applications in non-volatile flash memories,” in Proc. IEEE 8th Int. Memory Workshop (IMW), May 2016, pp. 1–4.
[30] I. Zhilin and A. Kreschuk, “Generalized concatenated code constructions with low overhead for optical channels and NAND-flash memory,” in Proc. 15th Int. Symp. Problems Redundancy Inf. Control Syst. (REDUNDANCY), Sep. 2016, pp. 177–180.
[31] D. Strukov, “The area and latency tradeoffs of binary bit-parallel BCH decoders for prospective nanoelectronic memories,” in Proc. Asilomar Conf. Signals, Syst. Comput., Oct./Nov. 2006, pp. 1183–1187.
[32] W. W. Peterson, Error-Correcting Codes. Cambridge, MA, USA: MIT Press, 1972.
[33] S. Tanakamaru, Y. Yanagihara, and K. Takeuchi, “Error-prediction LDPC and error-recovery schemes for highly reliable solid-state drives (SSDs),” IEEE J. Solid-State Circuits, vol. 48, no. 11, pp. 2920–2933, Nov. 2013.
[34] J. Xiao-Bo, T. Xue-Qing, and H. Wei-Pei, “Novel ECC structure and evaluation method for NAND flash memory,” in Proc. IEEE Int. Syst.-Chip Conf. (SOCC), Sep. 2015, pp. 100–104.
[35] L. Yuan, H. Liu, P. Jia, and Y. Yang, “Reliability-based ECC system for adaptive protection of NAND flash memories,” in Proc. Int. Conf. Commun. Syst. Netw. Technol., Apr. 2015, pp. 897–902.
[36] L. Yuan, H. Liu, P. Jia, and Y. Yang, “An adaptive ECC scheme for dynamic protection of NAND Flash memories,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Apr. 2015, pp. 1052–1055.
[37] A. Basak, S. Paul, J. Park, J. Park, and S. Bhunia, “Reconfigurable ECC for adaptive protection of memory,” in Proc. IEEE Int. Midwest Symp. Circuits Syst. (MWSCAS), Aug. 2013, pp. 1085–1088.
[38] P. P. Ankolekar, R. Isaac, and J. W. Bredow, “Multibit error-correction methods for latency-constrained flash memory systems,” IEEE Trans. Device Mater. Rel., vol. 10, no. 1, pp. 33–39, Mar. 2010.
[39] R. Ramaraju, R. Rock, E. J. Gieske, C. Park, and D. F. Greenberg, “Hierarchical error correction for large memories,” U.S. Patent 8 677 205 B2, Mar. 18, 2014.
[40] Z. Wang, “Hierarchical decoding of double error correcting codes for high speed reliable memories,” in Proc. 50th ACM/EDAC/IEEE Design Autom. Conf. (DAC), May/Jun. 2013, pp. 1–7.

Sara Choi was born in Gwangju, South Korea, in 1990. She received the B.S. degree in electrical and electronic engineering from Yonsei University, Seoul, South Korea, in 2013, where she is currently working toward the combined Ph.D. degree.
Her current research interests include the peripheral circuit and memory architecture design for STT-RAM.

Hong Keun Ahn was born in Seoul, South Korea, in 1993. He received the B.S. degree in electrical and electronic engineering from Yonsei University, Seoul, in 2017, where he is currently working toward the combined Ph.D. degree.
His current research interests include the memory architecture design for STT-RAM.

Byung Kyu Song was born in Seoul, South Korea, in 1988. He received the B.S. degree in electrical and electronic engineering from Dankook University, Yongin, South Korea, in 2013. He is currently working toward the combined Ph.D. degree at Yonsei University, Seoul.
His current research interests include the bit-cell structure and peripheral circuit design for STT-RAM.

Jung Pill Kim received the B.S. degree in electronic engineering from Hanyang University, Seoul, South Korea, in 1988, and the M.S. and Ph.D. degrees in computer science and electrical engineering from Harvard University, Cambridge, MA, USA, in 2000 and 2003, respectively.
From 1988 to 1998, he was with Hynix Semiconductor, Icheon, South Korea, where he was involved in the development and research of many DRAM products from 1-MB DRAM to 64-MB EDO and SDR DRAM. From 2001 to 2008, he was a Design Team Leader and a Principal Design Engineer with Qimonda, Research Triangle Park, NC, USA, focusing on high-density 1-GB SDR DRAM products, low-power mobile DRAM and LPDDR1 products, and high-performance 1-GB GDDR3 products. Since 2008, he has been with Qualcomm Incorporated, San Diego, CA, USA, where he is involved in the development of STT-MRAM-related IPs and macros. He has authored or coauthored several technical papers and holds over 20 U.S. patents. His current research interests include low-power circuit design, high-speed memory design, and advanced future memory design and technologies.

Seung H. Kang received the B.S. and M.S. degrees from Seoul National University, Seoul, South Korea, in 1989 and 1991, respectively, and the Ph.D. degree in materials science and engineering from the University of California at Berkeley, Berkeley, CA, USA, in 1996.
He was with the Lawrence Berkeley National Laboratory, Berkeley, CA, USA, where he was involved in the fields of SQUID sensors and VLSI interconnects. From 1998 to 2005, he was a Distinguished Member of Technical Staff with the Lucent Technologies Bell Laboratories, Murray Hill, NJ, USA, and led advanced device reliability projects. In 2006, he joined Qualcomm Inc., San Diego, CA, USA, and has pioneered embedded STT-MRAM and spintronic devices for mobile systems. He is currently the Director of Engineering, Corporate Research and Development, Qualcomm Inc., and leads an emerging memory technology group for mobile systems, including Internet of Things and wearables. He has authored or coauthored over 70 papers and delivered over 35 keynotes and invited speeches at international conferences. He holds over 350 patents granted globally.
Dr. Kang currently serves as an IEEE Electron Device Society Distinguished Lecturer. He has served on numerous technical committees.

Seong-Ook Jung (M’00–SM’03) received the B.S. and M.S. degrees in electrical and electronic engineering from Yonsei University, Seoul, South Korea, in 1987 and 1989, respectively, and the Ph.D. degree in electrical engineering from the University of Illinois at Urbana–Champaign, Urbana, IL, USA, in 2002.
From 1989 to 1998, he was with Samsung Electronics Co., Ltd., Hwaseong, South Korea, where he was involved in the specialty memories, such as video RAM, graphic RAM, window RAM, and merged memory logic. From 2001 to 2003, he was with T-RAM Inc., Mountain View, CA, USA, where he was the Leader of the Thyristor-Based Memory Circuit Design Team. From 2003 to 2006, he was with Qualcomm Inc., San Diego, CA, USA, where he focused on high-performance low-power embedded memories, process variation tolerant circuit design, and low-power circuit techniques. Since 2006, he has been a Professor with Yonsei University. His current research interests include process variation tolerant circuit design, low-power circuit design, mixed-mode circuit design, and future generation memory and technology.
Dr. Jung is currently a Board Member of the IEEE SSCS Seoul Chapter.
