
Computing-In-Memory Aware Model Adaption For Edge Devices

Ming-Han Lin, and Tian-Sheuan Chang, Senior Member, IEEE

Abstract—Computing-in-Memory (CIM) macros have gained popularity for deep learning acceleration due to their highly parallel computation and low power consumption. However, limited macro size and ADC precision introduce throughput and accuracy bottlenecks. This paper proposes a two-stage CIM-aware model adaptation process. The first stage compresses the model and reallocates resources based on layer importance and macro size constraints, reducing model weight loading latency while improving resource utilization and maintaining accuracy. The second stage performs quantization-aware training, incorporating partial sum quantization and ADC precision to mitigate quantization errors in inference. The proposed approach enhances CIM array utilization to 90%, enables concurrent activation of up to 256 wordlines, and achieves up to 93% compression, all while preserving accuracy comparable to previous methods.

Keywords: Computing-in-memory, AI accelerator, Pruning framework, Network architecture search, Quantization-aware training

I. INTRODUCTION

The proliferation of complex deep learning models has spurred the development of specialized hardware accelerators for edge devices, where power and latency are critical constraints. Computing-in-Memory (CIM) has emerged as a highly promising architecture, offering massive parallelism and reduced data movement by performing computations directly within the memory array. However, the practical deployment of CIM is hindered by two fundamental and interconnected challenges rooted in its physical limitations.

First, Hardware Mapping and Throughput Bottlenecks arise from the constrained physical size of CIM macros. Modern deep neural networks are often too large to be stored entirely on-chip, necessitating that model weights be repeatedly loaded from off-chip memory. This frequent reloading incurs significant latency and energy overhead, negating many of CIM's intrinsic benefits.

Second, Computational Fidelity and Accuracy Degradation are direct consequences of the precision-limited analog-to-digital converters (ADCs) inherent to CIM design. When convolutions are segmented due to hardware size limits, multiple analog partial sums are generated. Each of these sums must be quantized by the ADC, causing quantization errors to accumulate and severely degrade model accuracy. A common workaround is to severely restrict the number of concurrently activated wordlines to match the ADC precision (e.g., activating only 16 wordlines for a 4-bit ADC). However, this drastically underutilizes the available parallelism of the CIM array and throttles performance.

To overcome these obstacles, researchers have proposed various model adaptation strategies. One line of work focuses on CIM-aware model compression and architecture search. For instance, E-UPQ [1] enhances model sparsity through pruning and mixed-precision quantization but suffers from low macro utilization. XPert [2] co-searches the neural architecture and peripheral circuits, but its rigid optimization constraints can limit flexibility. Similarly, CIMNet [3] uses a device-aware accuracy predictor for neural architecture search but overlooks the significant performance penalty caused by weight reloading.

Another line of work targets mitigating ADC quantization effects. These methods aim to increase the effective number of bits (ENOB) by mapping the multiply-accumulate (MAC) distribution to the ADC's input range. Approaches include optimizing quantization ranges based on MAC statistics [4], using input-conditioned subrange reduction techniques [5], or learning analog scaling factors [6], [7]. While effective, these methods often do not account for the large number of partial sums generated when many wordlines are activated in parallel, or are designed for smaller CIM macros [6].

The existing literature reveals a critical gap: a holistic approach that simultaneously optimizes the model architecture for dense mapping onto the CIM array while also making the model inherently robust to the partial sum quantization errors that arise from maximizing parallelism. To bridge this gap, this paper proposes a tailored model adaptation method that adjusts the model architecture and recalibrates weights to mitigate quantization errors. Our approach reallocates limited resources, such as bitlines per convolutional layer, to enhance efficiency while maintaining or improving accuracy. We implement a two-stage quantization-aware training process that quantizes both weights and partial sums, simulating CIM behavior and reducing the impact of quantization on model accuracy.

The rest of the paper is organized as follows: Section II details the proposed methods, Section III presents the experimental results, and Section IV concludes the paper.

This work was supported by the National Science and Technology Council, Taiwan, under Grants 111-2622-8-A49-018-SB, 110-2221-E-A49-148-MY3, 113-2221-E-A49-078-MY3, and 113-2640-E-A49-005. The authors are affiliated with the Institute of Electronics, National Yang Ming Chiao Tung University, Taiwan (e-mail: [email protected], [email protected]). Cited as: M.-H. Lin and T. S. Chang, "Computing-in-memory aware model adaption for edge devices," to be published in IEEE Transactions on Circuits and Systems for Artificial Intelligence, 2026. Manuscript received XXXX XX, 2025; revised XXXX XX, XXXX.
II. PROPOSED CIM-AWARE MODEL ADAPTION

A. The Target Multibit CIM Architecture

Fig. 1. 4-bit CIM macro architecture

Fig. 1 illustrates the configuration of the CIM macro used in this paper. The workflow involves the following steps: a line buffer transfers 4-bit input data to a Digital-to-Analog Converter (DAC), converting it into an analog signal that enters the CIM weight array's wordlines. Each weight cell multiplies the input data, and the products are accumulated in each bitline. A multiplexer selects the processed signals, which are then converted into 5-bit digital partial sums by an ADC.

In terms of precision, each weight cell uses 4 bits, with parallel inputs converted to voltage by the DAC. The ADC then transforms the analog signal into a 5-bit digital format. This system requires only one ADC conversion for the multiply-accumulate operation, reducing the number of conversions by a factor of 16 compared to a bit-by-bit method, which helps minimize quantization errors, especially in the most significant bits (MSB).

The CIM array consists of 256 wordlines and 256 bitlines, along with 64 ADCs. Each weight cell stores 4 bits of data. The bitlines include positive (PBL) and negative (NBL) lines. The multiplexer selects different bitlines, and the ADCs operate in rotation to convert the analog signals into digital sums.

Fig. 2. The digital circuits that assist our CIM macro

In Fig. 2, 64 5-bit partial sums are accumulated using an adder tree and then multiplied by a scaling factor. Since the 64 ADCs are not used simultaneously, a multiplexer at each ADC output selects the appropriate ADC for accumulation. The final scaling factor combines both the weight scaling factor and the ADC step size, addressing the need to reverse the effects of scaling. This is necessary because the weights, initially in decimal form, are quantized into 4-bit integers, and the partial sums from the ADC also undergo scaling during conversion.

Fig. 3. Mapping convolution weights into a CIM macro

Fig. 3 illustrates the weight mapping for convolution. Due to the limited number of wordlines in the memory array, the multiply-accumulate operation cannot be completed in a single pass. Instead, the convolution kernel is divided into multiple parts based on the number of wordlines, processed in batches, and accumulated for the final result. For instance, with 256 wordlines and a 3x3 filter size, one bitline can handle up to 28 input channels, necessitating that any excess data be placed in the next bitline.

In the example, three filters are split into two parts, indicated by different colors, and stored in separate bitlines. The DAC inputs to the CIM macro include the orange section of the feature map, representing the first half of the input channels, which perform dot products with the corresponding darker sections of the filters. Consequently, only outputs from three bitlines are valid at this stage, while the remaining data will be processed subsequently.
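To make the mapping arithmetic above concrete, here is a minimal Python sketch of the bookkeeping for a 256-wordline macro: how many input channels of a k x k kernel fit on one bitline, and how many bitlines a layer's filters occupy once they are split into wordline-limited segments. The constants and helper names (channels_per_bitline, bitlines_for_layer) are illustrative assumptions, not code from the paper.

```python
import math

WORDLINES = 256   # rows of the CIM weight array
BITLINES = 256    # columns of 4-bit weight cells
ADC_BITS = 5      # each bitline readout is a 5-bit partial sum

def channels_per_bitline(kernel_size: int, wordlines: int = WORDLINES) -> int:
    """Input channels whose k x k weights fit on one bitline."""
    return wordlines // (kernel_size * kernel_size)

def bitlines_for_layer(in_channels: int, out_channels: int, kernel_size: int) -> int:
    """Bitlines needed when each filter is split into wordline-limited segments."""
    segments = math.ceil(in_channels / channels_per_bitline(kernel_size))
    return segments * out_channels

# Example from the text: 3x3 kernels and 256 wordlines -> 28 channels per bitline,
# so three 56-input-channel filters occupy two segments each (two partial sums).
print(channels_per_bitline(3))        # 28
print(bitlines_for_layer(56, 3, 3))   # 3 filters x 2 segments = 6 bitlines
```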
B. Overall Two-Stage Model Adaption Flow for CIM

Fig. 4. Model adaption flow for CIM

Fig. 4 outlines the overall model adaptation flow, consisting of two stages: CIM Aware Morphing to align models with the macro size, and ADC Aware Learned Scaling to scale weights based on the quantization precision of both the weights and the ADC.

CIM Aware Morphing adapts MorphNet [8] for CIM by adjusting channel numbers to fit macro size constraints such as the numbers of bitlines and wordlines, instead of the model size or FLOPs used in the original MorphNet. This iterative adjustment, typically converging in about three iterations, ensures that the model meets accuracy and resource requirements.

After roughly determining the model's shape and size, the next step involves quantizing the weights and partial sums according to the CIM weight cell's bit width, the ADC precision, and the ADC step size. ADC Aware Learned Scaling performs quantization-aware training in two steps:
• Quantization-aware training for the weights, including training the quantization step size to minimize weight quantization errors, and
• Quantization-aware training for the partial sums.

With this processing, the final model not only benefits from reduced redundancy through model morphing, which eliminates unnecessary filters and computations, but also addresses quantization errors through quantization-aware training, mitigating any significant accuracy drops caused by weight and partial sum quantization.

C. Stage 1: CIM Aware Morphing

CIM Aware Morphing, based on MorphNet [8], adapts the number of channels in convolutional layers to account for the constraints on wordline and bitline quantities in CIM macros by iteratively shrinking and expanding layers within a predefined architecture. In the shrinking phase, it prunes each layer based on sparsity, varying the pruning ratio across layers. During the expansion phase, layers are proportionally scaled up according to predefined constraints, focusing on reducing computational complexity or parameter count. This targeted approach can efficiently optimize network structures without extensive architectural redesign or architecture search.

Fig. 5. Model morphing flow

The details of the method are described below. In the "Shrinking Stage" of a deep learning network, the loss function for channel pruning consists of two parts: the cross-entropy loss L_CE(θ) and the regularization term λF(θ), as shown in Eq. 1, where λ is a hyper-parameter that controls the weight of the regularization term, and θ represents the model parameters.

Loss(θ) = L_CE(θ) + λF(θ)   (1)

To minimize redundancy, a regularization term related to the parameter count is designed as in MorphNet [8] to identify redundant parameters (see Eq. 2). The convolution filter dimensions are denoted as x and y. Filter importance is determined by the γ of the BN layer, with small γ values being zeroed out to prune unimportant filters. After pruning, the remaining input and output channels, denoted as A_L and B_L, correspond to the number of non-zero weights in the preceding and subsequent BN layers. The pruned parameter count is then calculated by multiplying A_L and B_L with x and y. Here, I_L and O_L represent the number of input and output channels of convolution layer L, while γ_{L−1} and γ_L denote the BN weights before and after the convolutional layer L, respectively.

F(layer L) = x × y × ( A_L · Σ_{i=1}^{O_L} |γ_{L,i}| + B_L · Σ_{j=1}^{I_L} |γ_{L−1,j}| )   (2)

To address CIM macro size constraints and identify redundancy, we use the parameter count as a regularization term when adjusting channels. This approach targets deeper layers, which typically contain more redundant parameters, helping to maintain model accuracy during compression.

For the "Expanding Phase", it is not possible to derive the expansion ratio for CIM macros directly using an equation, as is done with parameter expansion ratios, because of the array-based structure of CIM macros. Therefore, we first list the constraint equations for the model's expansion ratio in the CIM macro as follows:

⌈ 3 × kernel_size² / wordlines ⌉ × round(C_1 × R)   (3)

+ Σ_{i=1}^{n−1} [ ⌈ round(C_i × R) / channels_per_bl ⌉ × round(C_{i+1} × R) ] ≤ target_bl   (4)

channels_per_bl = ⌊ wordlines / kernel_size² ⌋   (5)

where R is the desired expansion ratio, n is the total number of convolutional layers, C_i is the number of output channels of the i-th convolutional layer, and channels_per_bl represents the maximum number of input channels that a single bitline can accommodate.

Since solving the above inequality is very complex, we use an exhaustive search here. By incrementing the ratio from 1 by 0.001 until the condition is no longer satisfied, we can find the desired expansion ratio. Additionally, only one exhaustive search is needed per morphing process, making the search very efficient. Note that the expansion ratio is applied proportionally across all layers, not as a separate ratio for each layer. This makes the optimization a simple one-dimensional search for a single scalar value.
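As a reader aid, the following Python sketch implements the one-dimensional search just described: it evaluates the bitline cost of Eqs. 3-5 for a candidate expansion ratio R and increments R by 0.001 until the target bitline budget would be exceeded. The layer list, the assumed 3-channel input of the first layer, and the function names are illustrative assumptions rather than the authors' code.

```python
import math

def bitline_cost(channels, R, kernel_size=3, wordlines=256):
    """Bitlines consumed when every channel count in `channels` is scaled by R.

    channels[i] is the output-channel count of convolution layer i; the first
    layer is assumed to take a 3-channel (RGB) input, as in Eq. 3.
    """
    ch_per_bl = wordlines // (kernel_size ** 2)                                   # Eq. 5
    cost = math.ceil(3 * kernel_size ** 2 / wordlines) * round(channels[0] * R)   # Eq. 3
    for c_in, c_out in zip(channels[:-1], channels[1:]):                          # Eq. 4
        cost += math.ceil(round(c_in * R) / ch_per_bl) * round(c_out * R)
    return cost

def find_expansion_ratio(channels, target_bl, step=0.001):
    """Increase R from 1.0 in 0.001 steps while the bitline budget still holds."""
    R = 1.0
    while bitline_cost(channels, R + step) <= target_bl:
        R += step
    return R

# Usage on a toy pruned backbone (hypothetical channel counts).
pruned_channels = [24, 48, 96, 96]
print(find_expansion_ratio(pruned_channels, target_bl=1024))
```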
D. Stage 2: ADC Aware Learned Scaling

Based on the above model adjustment, the next steps involve two rounds of quantization-aware training, as shown in Fig. 6. First, we combine convolutional and BN weights and quantize them to 4 bits to fit within a 4-bit weight macro. Second, partial sum quantization is applied to obtain the final quantized model.

Fig. 6. Quantization types for models mapped to the CIM macro

A convolution layer undergoes three types of quantization:
1) Weight Quantization: BN weights and convolutional weights from the morphed model are combined and quantized to 4 bits according to the precision of the weight cells in the CIM macro.
2) Partial Sum Quantization: The partial sums are quantized to 5 bits based on the precision of the given ADC.
3) Activation Quantization: This is included in the original seed model and will be quantized to 4 bits based on the DAC precision.

Fig. 7. Forwarding flow of Phase-1 training

1) Phase-1: Weight Quantization Training: Fig. 7 illustrates the Phase-1 weight quantization process for the model. During the forward computation of the model training, we reduce the number of parameters by combining the BN parameters with the convolutional kernel weights. These combined weights are then scaled by dividing them by the corresponding weight quantization step size, followed by clipping and rounding based on the weight bit-width. After performing the convolution with quantized activations, the results are scaled back by multiplying with the scaling factors.

In the above process, the step size of weight quantization is learned by the LSQ method [9]. The weight quantization equation is presented in Eq. 6. Here, W represents the weight, S_W is the weight quantization step size, and −Q_N and Q_P represent the minimum and maximum clipping values, respectively. These values are related to the number of bits being quantized; for instance, if quantizing to n bits, then Q_N = Q_P = 2^(n−1) − 1. This process allows the quantization error to be reflected in the floating-point representation.

output = [ round( clip( W / S_W, −Q_N, Q_P ) ) ] * Input × S_W   (6)

For our target macro, to produce 4-bit weights, the weights are first divided by S_W for scaling (where S_W is the weight quantization step size, typically less than 1). Then, based on the maximum and minimum values of the stored weight, the weights are clipped and rounded to obtain 4-bit weights that can be stored in the CIM macro. After performing convolution in the CIM macro with the 4-bit quantized weights, the output is multiplied by S_W to scale it back down.

During the Phase-1 training, we optimize the BN and convolution weights, along with the quantization step size S_W. The goal is to complete BN weight folding and quantize the weights, as detailed in Fig. 8.

In the backward pass, gradient computation bypasses the scaling and the non-differentiable rounding to maintain stability. The straight-through estimator (STE) is applied to the skipped rounding: gradients exceeding the clipping range are set to zero, while those within the range pass through unchanged. Additionally, since amplified weights and output gradients are used to compute the input gradients, these input gradients are inversely scaled down according to the weight amplification.

Fig. 8. Forward and backward data flow of weight quantization
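The forward and backward behavior just described, Eq. 6 with an LSQ-style learned step size and a straight-through estimator, can be sketched as a custom PyTorch autograd function. This is a minimal illustration under my own naming; it assumes BN has already been folded and omits LSQ's gradient-scale factor, so it should not be read as the authors' implementation.

```python
import torch

class WeightFakeQuant(torch.autograd.Function):
    """Fake-quantize folded weights to n bits with a learned step size (Eq. 6).

    Forward:  w_q = round(clip(w / s, -Q_N, Q_P)) * s
    Backward: straight-through estimator; gradients outside the clipping range
    are zeroed, and the step size s receives an LSQ-style gradient
    (the 1/sqrt(N*Q_P) gradient scale of LSQ [9] is omitted for brevity).
    """

    @staticmethod
    def forward(ctx, w, s, n_bits):
        q = 2 ** (n_bits - 1) - 1            # symmetric range, e.g. +/-7 for 4 bits
        ctx.save_for_backward(w, s)
        ctx.q = q
        return torch.round(torch.clamp(w / s, -q, q)) * s

    @staticmethod
    def backward(ctx, grad_out):
        w, s = ctx.saved_tensors
        q = ctx.q
        w_scaled = w / s
        inside = (w_scaled >= -q) & (w_scaled <= q)
        grad_w = grad_out * inside                       # STE for the weights
        # Step-size gradient: quantization residual inside the range,
        # clipping level outside it.
        grad_s_elem = torch.where(inside,
                                  torch.round(w_scaled) - w_scaled,
                                  torch.clamp(w_scaled, -q, q))
        grad_s = (grad_out * grad_s_elem).sum()
        return grad_w, grad_s, None

# Usage: fold BN into the convolution weights first, then fake-quantize them
# before running the convolution with quantized activations.
w = torch.randn(64, 28, 3, 3, requires_grad=True)
step = torch.tensor(0.05, requires_grad=True)
w_q = WeightFakeQuant.apply(w, step, 4)
```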
Fig. 9. Partial Sum Formation

2) Phase-2: Partial Sum Quantization Training: Due to the limited wordlines, larger convolutions must be processed in segments, leading to accumulated ADC quantization errors with each partial sum. To mitigate this, we incorporate partial sum quantization during the Phase-2 training to simulate the ADC behavior, which helps the model adapt to the quantization process. For example, as shown in Fig. 9, with 256 wordlines, a 3x3 kernel can accommodate up to 28 input channels per bitline, requiring additional channels to be assigned to another bitline. Therefore, for a feature map and filter with 56 input channels, we divide them into two groups, denoted in blue and purple in the figure. The blue feature map convolves with the blue filters, while the purple feature map convolves with the purple filters, resulting in two partial sums that can be added point by point to obtain the final result.

Fig. 10. Forwarding flow of Phase-2 training

Fig. 10 illustrates the forwarding flow of the Phase-2 training. Compared to the Phase-1, the Phase-2 includes additional steps for the segmented convolution, the quantization of partial sums, and the summation of partial sums. The model output from the Phase-1 training serves as the baseline model for the Phase-2 training.

Since the Phase-2 training involves the quantization of partial sums, even minor variations in S_W can directly affect the size of the 4-bit quantized weights if S_W is not fixed. This, in turn, can cause significant fluctuations in the partial sums, hindering model convergence. Therefore, in the Phase-2 training, S_W is fixed, and the BN and convolution weights are trained to adapt to the partial sum quantization.

By slightly modifying Eq. 6, we obtain the partial sum quantization formula, as shown in Eq. 7. This formula primarily incorporates the ADC step size and sets the maximum and minimum clipping values according to the ADC precision, represented as −Q_N_ADC and Q_P_ADC.

output = round( clip( Q_w · Input / S_ADC, −Q_N_ADC, Q_P_ADC ) ) · S_W · S_ADC   (7)

Q_w = [ round( clip( W / S_W, −Q_N, Q_P ) ) ]   (8)

During the Phase-2 training process, only the BN and convolution weights are trained. The main goal is to adapt the weights to the quantization of partial sums. The detailed forward and backward methods are shown in Fig. 11.

Compared to the Phase-1 training, the Phase-2 includes scaling the partial sums according to the ADC step size, followed by rounding and summing. Finally, the scaling effect of the ADC step size is inversely scaled back at the output. In the backward pass, the gradient computation similarly skips all scaling and non-differentiable rounding operations to ensure that the gradients do not experience sudden scaling up or down, thus maintaining stability.

Fig. 11. Forward and backward data flow of partial sum quantization

Finally, the trained 4-bit weights can be directly used in the CIM macro for convolution operations with 4-bit inputs. After each convolution, the output only needs to be scaled by the product of the weight step size S_W and the ADC step size S_ADC. For further simplification, this product can be approximated as a power of two, allowing the output to be adjusted with a simple digital shift operation.
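Below is a minimal sketch of the Phase-2 forward path of Eq. 7 for one output position of a segmented convolution: the input channels are processed in wordline-limited groups, each group's partial sum is quantized to the ADC range, and the accumulated result is rescaled by S_W · S_ADC. The symmetric 5-bit ADC range and the function name are my assumptions for illustration.

```python
import torch

def phase2_partial_sum(x_int, w_int, s_w, s_adc, ch_per_group=28, adc_bits=5):
    """Eq. 7 for one output position: x_int is a (C_in,) vector of 4-bit activation
    codes and w_int a (C_out, C_in) matrix of 4-bit weight codes Q_w (Eq. 8).
    Each group of ch_per_group channels is one CIM pass whose analog sum is
    quantized by the ADC before accumulation (symmetric range assumed)."""
    q = 2 ** (adc_bits - 1) - 1                                   # +/-15 for 5 bits
    acc = torch.zeros(w_int.shape[0])
    for start in range(0, x_int.numel(), ch_per_group):
        grp = slice(start, start + ch_per_group)
        partial = w_int[:, grp].float() @ x_int[grp].float()      # one macro pass
        acc += torch.round(torch.clamp(partial / s_adc, -q, q))   # ADC quantization
    return acc * s_w * s_adc                                      # undo both scalings

# Toy usage: 56 input channels split into two groups of 28 -> two partial sums per filter.
x = torch.randint(-8, 8, (56,))
w = torch.randint(-8, 8, (8, 56))
y = phase2_partial_sum(x, w, s_w=0.05, s_adc=16.0)
```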
III. EXPERIMENTAL RESULTS

A. Experimental Setup

The experimental settings for our model training are shown below. We adopt the ADAM optimizer for all training. The seed models used in model morphing are trained with a learning rate of 0.01 over 2000 epochs. The CIM aware morphing phase uses a learning rate of 0.05 over 100 epochs for the shrinking stage and a learning rate of 0.01 over 100 epochs for the following fine-tuning stage. The ADC aware learned scaling adopts a learning rate of 0.001 over 100 epochs for Phase-1 and a learning rate of 0.01 over 300 epochs for Phase-2.

B. Analysis of Parameter Selection for the Model Morphing

The CIM aware model morphing has shown how to morph the model under the macro constraints. However, how to select the compression and expansion ratios is crucial for model performance and hardware utilization of the CIM macro.

As an example of the effect of the compression ratio, Table I shows the accuracy of models with different compression ratios after being expanded to the same parameter count and fine-tuned. The baseline model has 9.218M parameters and an accuracy of 90.71%. The target for expansion is set at 50% of the baseline parameters, totaling 4.609M. The table shows that excessive compression (e.g., pruning ratio > 0.9) decreases performance due to a loss of important features. However, insufficient compression (e.g., pruning ratio < 0.1) limits the effectiveness of expansion and thus decreases performance as well. In addition to performance concerns, these ratios also lead to different macro usage due to the macro constraints.

TABLE I
MODEL COMPRESSION LIMIT

Parameters (Pruned) | Parameters (Expanded) | Accuracy
0.429M | 4.611M | 87.66%
0.501M | 4.607M | 88.94%
0.691M | 4.608M | 89.70%
1.014M | 4.605M | 90.70%
1.262M | 4.609M | 90.90%
1.993M | 4.609M | 90.90%
2.445M | 4.604M | 90.70%
2.848M | 4.610M | 90.76%
3.791M | 4.607M | 90.62%
4.049M | 4.610M | 90.32%

Table II shows the accuracy differences after expansion and fine-tuning for models with varying macro utilization rates, obtained by a grid search on the parameters of the model morphing flow. In this table, the top two rows are the best and worst macro usage when λ = 5E−8, and the bottom two rows are the best and worst macro usage when λ = 3E−8. The baseline model has 9.218M parameters and an accuracy of 90.71%. The target for model expansion is set at 8192 bitlines and 256 wordlines, using the ADAM optimizer for both compression and fine-tuning. During the 150-epoch compression phase, the learning rate is 0.05, and λ is gradually increased from 0 over the first 100 epochs before being fixed for the last 50 epochs. Compressed models with the highest and lowest macro usage are compared. After expansion, models are fine-tuned for 300 epochs at a learning rate of 0.01.

TABLE II
RESULTS OF DIFFERENT CIM MACRO USAGE MODELS FOR THE VGG-9 MODEL ON CIFAR-10

Parameters (Pruned) | Parameters (Expanded) | Macro Usage | Accuracy
1.154M | 1.960M | 93.46% | 91.16%
1.203M | 1.867M | 88.53% | 90.97%
1.255M | 1.929M | 92.00% | 91.01%
1.413M | 1.833M | 87.41% | 90.88%

Table I shows that model performance declines when the compression ratio falls below a certain threshold, e.g., 0.1 in Table I. Below this threshold, it is crucial to select a model that retains feature representation rather than focusing solely on CIM macro utilization after expansion. Thus, if the target macro size is less than 0.1 times the baseline model's parameter count, it is better to choose the model with higher accuracy during compression. In contrast, if the target macro size exceeds 0.1 times the baseline count, there is less risk of losing feature representation, making it acceptable to select the model with higher CIM macro utilization. This strategy can help achieve higher accuracy through resource reallocation.

C. End-to-End Performance

This subsection presents the main results for latency, accuracy, and compression across different models.

1) Settings: To show the effectiveness of the proposed approach, the model adaptation has been applied to different models, VGG9, VGG16, and ResNet18, as shown in Tables III to V, tailored to the constraints of four CIM macros, focusing on wordline and bitline limitations as well as quantization restrictions for weight cells and ADCs. These tables display accuracy based on CIFAR-10 test performance, where BLs denotes the number of bitlines in the CIM macro architecture (256 wordlines), and MACs represents the multiply-accumulate operations required for inference (equivalent to ADC activations). The baseline model features 4-bit quantized activations and was trained on CIFAR-10 for 2000 epochs. Four models are created under varying bitline constraints, each undergoing three morphing rounds: a 150-epoch compression phase and a 300-epoch fine-tuning phase, both using the ADAM optimizer (with learning rates of 0.05 and 0.01, respectively).

In the tables, Morphed Model Accuracy indicates the model's accuracy after compression. P1 Train shows the accuracy after batch normalization (BN) folding and 4-bit weight quantization, while P2 Train reflects the accuracy after further 5-bit partial sum quantization. The partial sum storage and latency columns assume model weights allocated in a CIM macro with 256 bitlines and 256 wordlines, each featuring 4-bit weight cells. Due to the limited wordlines, 5-bit partial sums are generated, necessitating additional storage, with Partial Sum Storage indicating the maximum space required for these sums. Load Weight Latency estimates the clock cycles needed to load weights; loading one full CIM macro requires 256 cycles. Lastly, Computing Latency denotes the clock cycles required for model inference. Convolution filters are divided into smaller chunks that convolve with the input channels sequentially, necessitating multiple passes through the wordlines. With only 64 ADCs available (4 bitlines per ADC), exceeding 64 simultaneous computations requires additional passes. The tables provide the clock cycles needed for a CIM macro to perform model inference.

2) Results: Tables III to V present the results for VGG9, VGG16, and ResNet18 after model morphing and weight adaptation, respectively. VGG9 comprises 8 convolutional layers and 1 fully connected layer. VGG16 features 13 convolutional layers and 1 fully connected layer. ResNet18 has 17 convolutional layers and 1 fully connected layer. For simplicity, only the convolutional layers are accelerated by the CIM macros.

The model morphing results indicate that for models utilizing over 4096 bitlines (i.e., more parameters), reallocating resources improves accuracy (91.33% and 91.07% in VGG9, 92.98% and 92.66% in VGG16, and 92.17% in ResNet18) compared to the baseline, even with fewer bitlines and total MAC operations. This enhancement stems from pruning redundant filters and reallocating excess bitline resources to critical convolutional layers, resulting in more meaningful and efficient weight storage and operations within the CIM macro. The proposed morphing can also achieve high macro usage, up to 94.54%, with small accuracy loss under the CIM aware constraints. The macro usage for ResNet18 is lower compared to the VGG models due to its higher number of convolutional layers. Consequently, with a bitline limit of 4096, the model's accuracy declines slightly. When the limit is reduced to 512, macro usage drops to just 25% and accuracy decreases further, resulting in lower accuracy than the VGG models.
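Before turning to the quantization and latency results, the bookkeeping behind the Macro Usage and Load Weight Latency columns can be approximated with the short sketch below: count the bitlines each layer occupies, round up to whole 256x256 macro loads, and charge 256 cycles (one per wordline) per macro load. This is my reading of the setup described above, not the authors' exact cost model, and the layer shapes in the example are hypothetical.

```python
import math

WORDLINES, BITLINES, ADCS = 256, 256, 64

def layer_bitlines(c_in, c_out, k):
    ch_per_bl = WORDLINES // (k * k)
    return math.ceil(c_in / ch_per_bl) * c_out

def macro_stats(layers):
    """layers: list of (c_in, c_out, kernel) tuples for the convolutional layers."""
    used_bls = sum(layer_bitlines(*l) for l in layers)
    macro_loads = math.ceil(used_bls / BITLINES)     # how many full-macro loads are needed
    usage = used_bls / (macro_loads * BITLINES)      # fraction of allocated columns used
    load_latency = macro_loads * WORDLINES           # 256 cycles per macro load
    return used_bls, usage, load_latency

# Toy example (hypothetical layer shapes).
print(macro_stats([(3, 32, 3), (32, 64, 3), (64, 64, 3)]))
```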
Additionally, as the number of parameters is decreased, quantization significantly impacts accuracy, causing an extra 3.75% drop when the bitline limit is 512.

The proposed quantization (P1 Train and P2 Train in the tables) achieves low accuracy loss for bitline constraints above 4096. The quantization loss increases for smaller bitline constraints, which is reasonable since a small model size has low tolerance to quantization effects. The tables also show that the proposed partial sum quantization (P2 Train) introduces negligible loss compared to the weight quantization (P1 Train).

In the tables, the partial sum storage is reduced by model morphing in all but one case. With a bitline constraint of 8192, the partial sum storage for VGG16 is increased. This occurs because the additional bitlines from pruning are allocated to earlier layers, which are critical for accuracy. These layers require more partial sum storage as their feature maps have not yet undergone significant pooling.

The computing latency is reduced in all cases (by 26% to 86% for VGG9, 30% to 89% for VGG16, and 3% to 81% for ResNet18), which is proportional to the reduction of MACs due to the model morphing. The latency to reload weights due to the limited macro size is also reduced (by 79% to 99% for VGG9, 87% to 99% for VGG16, and 82% to 99% for ResNet18), a higher reduction ratio than that of the computing latency because of the CIM constraint. These ratios are proportional to the reduction of the parameters and the used BLs.

Figs. 12 and 13 illustrate the mapping of the VGG9 model, morphed under bitline constraints of 512 and 1024, onto a 256x256 CIM macro. Different colors in the figures represent different convolutional layers.

Fig. 12. Mapping convolution weights into a CIM macro (model: VGG9, BL constraint: 512)

Fig. 13. Mapping convolution weights into a CIM macro (model: VGG9, BL constraint: 1024)

D. Comparisons with Other Approaches

Table VI compares three model adaptation methods using a model with a 4096-bitline constraint. E-UPQ [1] employs mixed precision (8, 4, 2, 1, 0) for the weights, resulting in an average precision of around 1 due to extensive pruning. It uses a 16x16 operation unit (OU), activating 16 wordlines at a time, and achieves about 87% weight reduction. XPert [2] uses full floating-point operations in its baseline model, while its compressed model adopts mixed precision for activations and ADCs, averaging 4.0 and 5.4 bits, respectively, with weights fixed at 8 bits. It activates 64 wordlines simultaneously, reducing parameters by 68.41% with 92.46% accuracy.

Compared to the previous approaches, our method begins with 4-bit quantized activations and floating-point weights, achieving over 90% compression through morphing and quantization while maintaining comparable accuracy. This approach outperforms the other methods in three aspects:
1) Parallelism: By using 4-bit parallel input and activating 256 wordlines simultaneously, our method leverages ADC-aware training to handle the higher quantization errors from concurrent operations. This achieves up to 64x speedup compared to E-UPQ and 16x compared to XPert (see the sketch after this list).
2) CIM Macro Utilization: Our method achieves nearly 90% utilization in VGG9 and VGG16, and 78.77% in ResNet18 with a 4096-bitline constraint, compared to just 13% in E-UPQ. This is due to directly pruning inefficient weights instead of storing them, making more efficient use of the CIM macro space.
3) Compression Rate: Through pruning and resource reallocation, our method improves accuracy and compensates for quantization-induced losses, achieving over 90% model compression.
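The sketch referenced in item 1 above: one way to arrive at the quoted 64x and 16x figures is to scale throughput by the ratio of concurrently activated wordlines and by a factor of 4 for applying all four input bits in parallel rather than bit-serially. The bit-serial assumption for the baselines is mine; the paper does not spell this arithmetic out.

```python
# Assumed speedup model: throughput ~ (activated wordlines) x (input bits per cycle).
OUR_WORDLINES, OUR_INPUT_BITS_PER_CYCLE = 256, 4

def speedup(their_wordlines, their_input_bits_per_cycle=1):
    return (OUR_WORDLINES / their_wordlines) * \
           (OUR_INPUT_BITS_PER_CYCLE / their_input_bits_per_cycle)

print(speedup(16))   # vs. E-UPQ (16 activated wordlines)  -> 64.0
print(speedup(64))   # vs. XPert (64 activated wordlines)  -> 16.0
```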
TABLE III
COMPREHENSIVE RESULTS FOR VGG9 WITH DIFFERENT BL CONSTRAINTS

BL Constraint | Param (M) | BLs | MACs | Macro Usage | Morphed Model Acc. | P1 Train | P2 Train | Partial Sum Storage | Load Weight Latency | Computing Latency
Baseline | 9.218 | 38592 | 724992 | - | 90.71% | - | - | 163840 | 38656 | 14696
8192 | 1.971 (-79%) | 8186 (-79%) | 489248 (-33%) | 93.98% | 91.33% (+0.62%) | 90.01% | 89.83% | 133056 (-19%) | 8192 (-79%) | 10928 (-26%)
4096 | 0.924 (-90%) | 3907 (-90%) | 358888 (-50%) | 88.12% | 91.07% (+0.36%) | 89.77% | 89.17% | 107520 (-34%) | 4096 (-89%) | 9116 (-38%)
1024 | 0.210 (-98%) | 1024 (-97%) | 123792 (-83%) | 80.11% | 89.24% (-1.47%) | 87.58% | 87.39% | 41984 (-74%) | 1024 (-97%) | 3020 (-80%)
512 | 0.098 (-99%) | 511 (-99%) | 85756 (-88%) | 74.77% | 87.71% (-3.00%) | 85.47% | 85.40% | 39936 (-76%) | 512 (-99%) | 2108 (-86%)

TABLE IV
COMPREHENSIVE RESULTS FOR VGG16 WITH DIFFERENT BL CONSTRAINTS

BL Constraint | Param (M) | BLs | MACs | Macro Usage | Morphed Model Acc. | P1 Train | P2 Train | Partial Sum Storage | Load Weight Latency | Computing Latency
Baseline | 14.710 | 61440 | 1443840 | - | 92.02% | - | - | 196608 | 61440 | 31300
8192 | 1.983 (-87%) | 8148 (-87%) | 986784 (-32%) | 94.54% | 92.98% (+0.96%) | 92.73% | 92.25% | 245760 (+25%) | 8192 (-87%) | 21996 (-30%)
4096 | 0.952 (-94%) | 3963 (-94%) | 622032 (-57%) | 90.83% | 92.66% (+0.64%) | 92.49% | 91.88% | 174080 (-11%) | 4096 (-93%) | 16192 (-48%)
1024 | 0.203 (-99%) | 1021 (-98%) | 259420 (-82%) | 77.58% | 89.96% (-2.06%) | 88.66% | 88.55% | 106496 (-46%) | 1024 (-98%) | 6028 (-81%)
512 | 0.088 (-99%) | 510 (-99%) | 117408 (-92%) | 67.07% | 86.45% (-5.57%) | 83.03% | 84.50% | 35840 (-82%) | 512 (-99%) | 3532 (-89%)

TABLE V
COMPREHENSIVE RESULTS FOR RESNET18 WITH DIFFERENT BL CONSTRAINTS

BL Constraint | Param (M) | BLs | MACs | Macro Usage | Morphed Model Acc. | P1 Train | P2 Train | Partial Sum Storage | Load Weight Latency | Computing Latency
Baseline | 10.987 | 46400 | 690176 | - | 91.44% | - | - | 65536 | 46592 | 16860
8192 | 1.804 (-84%) | 8188 (-82%) | 674344 (-2%) | 86.01% | 92.17% (+0.73%) | 91.34% | 90.99% | 97280 (+48%) | 8192 (-82%) | 16296 (-3%)
4096 | 0.829 (-92%) | 4088 (-91%) | 411848 (-40%) | 78.77% | 91.37% (-0.07%) | 90.40% | 90.21% | 66560 (+2%) | 4096 (-91%) | 12092 (-28%)
1024 | 0.132 (-99%) | 997 (-98%) | 145888 (-79%) | 50.71% | 86.16% (-5.28%) | 84.37% | 84.68% | 57344 (-13%) | 1024 (-98%) | 3940 (-77%)
512 | 0.033 (-99.6%) | 512 (-99%) | 79760 (-88%) | 25.37% | 81.01% (-10.43%) | 78.74% | 77.26% | 40960 (-38%) | 512 (-99%) | 3128 (-81%)

TABLE VI
COMPARISON TABLE

                            | E-UPQ [1]     | E-UPQ [1]     | XPert [2]      | This work      | This work      | This work
Model                       | ResNet18      | ResNet20      | VGG16          | VGG9           | VGG16          | ResNet18
Dataset                     | CIFAR-100     | CIFAR-10      | CIFAR-10       | CIFAR-10       | CIFAR-10       | CIFAR-10
Baseline accuracy           | 74.4%         | 91.3%         | 94.0%          | 90.7%          | 92.0%          | 91.4%
Compressed accuracy         | 73.2% (-1.2%) | 90.5% (-0.8%) | 92.46% (-1.5%) | 89.17% (-1.5%) | 91.88% (-0.8%) | 90.21% (-1.23%)
Bit (Weight/Activation/ADC) | 1.0/8.0/4.0   | 1.1/8.0/4.0   | 8.0/4.0/5.4    | 4.0/4.0/5.0    | 4.0/4.0/5.0    | 4.0/4.0/5.0
Memory cell                 | 1 bit         | 1 bit         | 1 bit          | 4 bits         | 4 bits         | 4 bits
Compression ratio           | -87.50%       | -86.30%       | -68.41%        | -89.98%        | -93.53%        | -92.45%
Macro usage                 | 12.50%        | 13.70%        | -              | 88.12%         | 90.83%         | 78.77%
Activated wordlines         | 16            | 16            | 64             | 256            | 256            | 256
Pruning                     | ✓             | ✓             | ×              | ✓              | ✓              | ✓
Adjustable after pruning    | ×             | ×             | ×              | ✓              | ✓              | ✓
ADC aware training          | ×             | ×             | ×              | ✓              | ✓              | ✓

IV. CONCLUSION

CIM brings the benefits of highly parallel computation and low power consumption, but it suffers from throughput and performance bottlenecks due to the extra weight loading required by the limited memory array size and the ADC quantization errors of the partial sums. Addressing this problem, this paper has presented a two-stage process to adapt models to CIM constraints. The first stage compresses and reallocates the weights to maximize macro utilization and minimize weight loading while retaining accuracy under the CIM array size constraints. The second stage quantizes the model with a learned quantization step size and ADC aware training to reduce the impact of quantization errors on partial sum accumulation. Compared to the previous approaches, the presented method achieves higher macro utilization, up to 90%, a higher compression ratio, up to 93%, and more activated wordlines, up to 256, with lower accuracy loss.

REFERENCES

[1] C.-Y. Chang, K.-C. Chou, Y.-C. Chuang, and A.-Y. Wu, "E-UPQ: Energy-aware unified pruning-quantization framework for CIM architecture," IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 13, no. 1, pp. 21–32, 2023.
[2] A. Moitra, A. Bhattacharjee, Y. Kim, and P. Panda, "XPert: Peripheral circuit & neural architecture co-search for area and energy-efficient xbar-based computing," in 60th ACM/IEEE Design Automation Conference (DAC), 2023, pp. 1–6.
[3] X.-J. Chen and C.-L. Yang, "CIMNet: Joint search for neural network and computing-in-memory architecture," IEEE Micro, pp. 1–12, 2024.
[4] C. Sakr and N. R. Shanbhag, "Signal processing methods to enhance the energy efficiency of in-memory computing architectures," IEEE Transactions on Signal Processing, vol. 69, pp. 6462–6472, 2021.
[5] A. B. Sundar, J. Viraraghavan, and B. Vijayakumar, "Input-conditioned quantisation for ENOB improvement in CIM ADC columns targeting large-length partial sums," IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 71, no. 6, pp. 2971–2975, 2024.
[6] J. Bai, W. Xue, Y. Fan, S. Sun, and W. Kang, "Partial sum quantization for computing-in-memory-based neural network accelerator," IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 70, no. 8, pp. 3049–3053, 2023.
[7] Y. Kim, H. Kim, and J.-J. Kim, "Extreme partial-sum quantization for analog computing-in-memory neural network accelerators," ACM Journal on Emerging Technologies in Computing Systems (JETC), vol. 18, no. 4, pp. 1–19, 2022.
[8] A. Gordon, E. Eban, O. Nachum, B. Chen, H. Wu, T.-J. Yang, and E. Choi, "MorphNet: Fast & simple resource-constrained structure learning of deep networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 1586–1595.
[9] S. K. Esser, J. L. McKinstry, D. Bablani, R. Appuswamy, and D. S. Modha, "Learned step size quantization," arXiv preprint arXiv:1902.08153, 2019.

Ming-Han Lin received the M.S. degree in electronics engineering from National Yang Ming Chiao Tung University, Hsinchu, Taiwan, in 2024. He is currently working at NovaTek, Hsinchu, Taiwan. His research interests include deep learning and computing-in-memory.

Tian-Sheuan Chang (S'93–M'06–SM'07) received the B.S., M.S., and Ph.D. degrees in electronic engineering from National Chiao-Tung University (NCTU), Hsinchu, Taiwan, in 1993, 1995, and 1999, respectively.
From 2000 to 2004, he was a Deputy Manager with Global Unichip Corporation, Hsinchu, Taiwan. In 2004, he joined the Department of Electronics Engineering, NCTU (renamed National Yang Ming Chiao Tung University (NYCU) in 2021), where he is currently a Professor. In 2009, he was a visiting scholar at IMEC, Belgium. His current research interests include system-on-a-chip design, VLSI signal processing, and computer architecture.
Dr. Chang received the Excellent Young Electrical Engineer Award from the Chinese Institute of Electrical Engineering in 2007 and the Outstanding Young Scholar Award from the Taiwan IC Design Society in 2010. He has been actively involved in many international conferences as an organizing committee or technical program committee member.
