SVM-Driven Modulation Classification in NOMA Systems with CNN-Attention based Deep Feature Learning
Ashok Parmar, Kamal Captain and Suresh Dahiya
Department of Electronics Engineering, Sardar Vallabhbhai
National Institute of Technology, Surat, 395007, Gujarat, India.

*Corresponding author(s). E-mail(s): [email protected];


Contributing authors: [email protected];
[email protected];

Abstract
Non-orthogonal multiple access (NOMA) is a promising solution to
the spectrum deficiency problem. NOMA has been recognized as one
of the enabling technologies for beyond-5G networks. Unlike conventional
orthogonal multiple access (OMA), where users share the spectrum
orthogonally in the frequency, time, or spatial domain, NOMA systems
serve multiple users within the same frequency, time, and spatial
resources. To separate each user's signal, successive interference
cancellation is carried out at the receivers, which requires information
about the modulation type of the signals. This information can be sent
with the signal, but it adds overhead. To overcome this issue, automatic
modulation classification (AMC) can be used, which detects the
modulation type of the received signal blindly. In this work, we propose
a dual-stream convolutional neural network with bidirectional long
short-term memory and multi-head attention (CNN-BiLSTM-MHA) model for
AMC. We use the deep learning model for feature extraction and a support
vector machine as the classifier. We compare the performance of the
proposed model with
some state-of-the-art models used for OMA and NOMA methods. The
proposed model is found to outperform recently proposed AMC models.

Keywords: Non-orthogonal multiple access (NOMA), Automatic Modulation
Classification (AMC), Convolutional Neural Network (CNN), Long Short-Term
Memory (LSTM), Multi-Head Attention (MHA), Support Vector Machines (SVM).


1 Introduction
The escalating need for mobile internet and the Internet of Things (IoT)
presents demanding prerequisites for the next-generation wireless network
communications. These prerequisites include high spectral efficiency, enhanced
fairness, increased reliability, reduced latency, and extensive connectivity. To
meet these demands, advanced technologies have been proposed as potential
solutions for the challenges. Examples of such technologies include massive
multiple input multiple output (MIMO), millimeter wave communications,
ultra-dense networks, and non-orthogonal multiple access (NOMA) [1]. This
paper focuses on NOMA. In power-domain NOMA, users are allocated different
power levels based on their channel conditions. The user with the poor
channel condition is called the far user, and the one with the good channel
condition is called the near user. Higher power is allocated to the far user
and lower power to the near user. The far user treats the near user's signal
as noise and recovers its intended signal directly. The near user must instead
perform successive interference cancellation (SIC) to separate and remove the
far user's interference. To perform SIC, the near user has to extract the far
user's signal, which requires knowledge of the modulation scheme used by the
far user. The modulation information can be sent as overhead in the
transmitted signal, but this reduces the spectral efficiency. Automatic
modulation classification (AMC) is a technique that detects the modulation
used in the received signal blindly, that is, without any marker or prior
information about the signal. AMC plays an important role in reducing the
overhead and improving the spectral efficiency, and it is therefore receiving
increasing attention from researchers.
Multiple approaches have been proposed by researchers for AMC, demon-
strating that modulation recognition can extract the digital baseband informa-
tion without significant prior knowledge about the device type and transmis-
sion schemes. AMC serves as an intermediate step between signal detection and
demodulation at the receiver’s end. The AMC problem can be categorized into
likelihood-based (LB) and feature-based (FB) methods. LB methods employ
a probabilistic approach for classification, while FB methods utilize feature
extraction techniques to address the classification problem. LB methods have
shown favorable results but come with increased computational complexity.
On the other hand, the FB method serves as a suboptimal classifier that is
suitable for practical implementation. It relies on extracting relevant features
from the received signal and subsequently classifying them using a classifier.
The FB method offers a suboptimal solution with relatively low latency since
it doesn’t require extensive prior knowledge. An apparent challenge in machine
learning (ML) methods is feature engineering, which typically requires expert
knowledge and experience. However, with the rapid progress in deep learning
(DL) technologies, various approaches have emerged that enable the automatic
learning of features. DL has gained popularity in communication systems due
to its ability to leverage large datasets, which are readily available. This advan-
tage allows DL models to autonomously learn and extract relevant features
from the data without significant manual intervention, making it a preferred
choice in communication system applications.
In [2], the author developed a convolutional neural network (CNN) model that
effectively distinguished between ten distinct modulation techniques with a
high level of classification accuracy. Another study [3] evaluated the efficiency
of the ResNet network in discriminating among 24 different modulation types.
By incorporating both its past and present states, the Long Short-Term
Memory (LSTM) network enhances a model's capacity to learn temporal data,
such as handwriting and speech. In [4], the LSTM network was found to be
well suited for AMC, achieving a recognition accuracy of approximately
90% for signal-to-noise ratios (SNRs) above 0 dB. A hybrid parallel network
approach for AMC was proposed in [5], utilizing the AM-SoftMax activation
function at the output layer. The maximum achieved classification accuracy
by this model was 93% at 8 dB SNR. In [6], a robust CNN architecture was
introduced for classifying 24 different modulation techniques. The simulation
results demonstrated a classification accuracy exceeding 90% at SNR levels
of 8 dB and higher. In [7], the eye diagram of the raw signal is utilized as
input for a Lenet-5-based classifier [8], drawing a subtle connection between
the problem of AMC and the extensively researched field of image recogni-
tion. Furthermore, [9] investigates the analysis of long symbol-rate signals,
and excellent results are achieved through the use of stacked auto-encoders,
with increased simulation runs. In [10], an AMC algorithm is proposed for
adaptive modulation in time division duplex - orthogonal frequency division
multiplexing (TDD-OFDM) systems, utilizing channel reciprocity and receiver
knowledge of transmission data rate. An AMC method is proposed in [11] by
leveraging the finite alphabet property of information symbols and the equiv-
alent parallel model of OFDM systems. A two stage network is proposed in
[12] to effectively tackle the problem of inter-class confusion. In [13],
depthwise separable convolution was used to reduce model complexity, since
[13] focused on modulation classification for low-power IoT devices, which
cannot process as many parameters as conventional neural networks require.
Depthwise separable CNN architectures were used to achieve better
performance with fewer parameters. Authors
in [14] investigated the impact of hyperparameters in a CNN model on the
accuracy of modulation classification. The study found that to achieve high
accuracy in modulation classification, the size of the convolution kernel should
be large enough, but within an appropriate range, while keeping the other
hyperparameters fixed for a given dataset.
In [15], a cost-efficient and high-performing CNN called MCNet is intro-
duced. MCNet achieves effective modulation classification by utilizing parallel
asymmetric kernels in each convolutional block. By incorporating skip con-
nections in a block-wise manner, MCNet can capture spatiotemporal signal
correlations comprehensively. This design allows MCNet to balance accuracy
and latency effectively. A CNN based architecture having two convolutional
layers called CNN-2 is proposed in [16] for AMC. A similar architecture called
CNN-4 having four convolutional layers is proposed in [17]. The work in [18]
analyzed fast deep learning algorithms for distinguishing between 10 different
modulation types. The study considered two deep neural network architectures:
the Convolutional Long Short-Term Deep Neural Network (CLDNN) [19] and the
Residual Network (ResNet) [20]. CLDNN and ResNet exhibit superior per-
formance at low SNR, while Long Short Term Memory (LSTM) and ResNet
architectures perform better at high SNR conditions. These architectures are
optimized to achieve efficient modulation classification while considering the
specific characteristics of different SNR levels. In [21], the signals are converted
to feature matrix and this feature matrix is fed to a ResNet-50 image classifier.
Recently, [22] performed modulation classification by combining the cumulants
and image constellation features and then using the ResNet-50 as classifier.
These two methods work well for the AMC in OMA in terms of classification
performance but the classifier used is very complex and may not be suitable for
applications in wireless communications. Note that all the works reported in
[2]-[22] consider orthogonal multiple access systems, where spectral resources
are shared orthogonally between users in time, frequency, or space. In NOMA,
however, the signals of multiple users are intentionally superimposed to
improve spectral efficiency, which makes the modulation recognition task more
challenging. Only two works reported in the literature address the problem of
modulation classification in NOMA systems: [23] and [24]. In [23], a
fourth-order cumulant-based algorithm is proposed, which is simpler but may
not capture the nuances effectively. Subsequently, the authors in [24]
introduced a fresh perspective by employing a modified residual convolutional
neural network (MR-CNN) for modulation classification in NOMA systems.
This method outperformed the traditional cumulant-based approach, marking
a notable advancement. However, despite this progress, there remains a dis-
cernible scope for improvement in classification performance. This motivates
our current research to further enhance the efficacy of modulation classification
in NOMA systems. In this work, we propose a novel deep learning based archi-
tecture for modulation classification in NOMA system. The key contributions
of the paper are as follows:
• We begin by assessing the effectiveness of current automatic modulation
classification algorithms originally designed for orthogonal multiple access
(OMA) systems in the context of NOMA systems.
• We propose a novel deep learning architecture called dual stream CNN-
BiLSTM with multi-head attention (MHA) for the purpose of modulation
classification in NOMA systems. Our approach utilizes the deep learning
architecture as a feature extractor, enabling it to extract valuable features
from the input data. To perform the actual classification, we employ a
support vector machine (SVM) classifier.
• We evaluate the performance of our proposed model by comparing it with
existing models designed for OMA systems, as well as recently proposed
models specifically developed for NOMA systems. This comprehensive analysis
allows us to assess the effectiveness of existing models in NOMA systems
and to analyze the superiority of our model in the NOMA scenario.
• We demonstrate the superiority of our proposed model over state-of-the-art
(SOTA) models through the use of comparative tables, plots, and confusion
matrices.
The rest of the paper is organized as follows: Section 2 describes the sys-
tem model. In section 3, we discuss the proposed model. Experimental dataset
and training-testing details are given in section 4. Section 5 presents the
experimental results and section 6 concludes the paper.

2 System Model
This paper focuses on downlink NOMA, in which the signals of multiple users
are superimposed at different power levels. As the number of co-scheduled
users grows, automatic modulation classification becomes considerably more
difficult, although the basic modulation classification theory does not
change with the number of concurrent users. To streamline the analysis, this
work restricts the number of co-scheduled users to two. In the two-user NOMA
system, the received signal at the near user for time instants
m = 1, 2, ..., N is given by
$$z_n(m) = h\left[\sqrt{P_f}\, y_f(m) + \sqrt{P_n}\, y_n(m)\right] + w_n(m), \tag{1}$$

where $y_f(m)$ and $y_n(m)$ represent the unit-energy signals sent by the base
station (BS) to the far and near users, respectively, $h$ is the Rayleigh
fading channel gain between the BS and the near user, and $w_n(m)$ represents
additive white Gaussian noise (AWGN) with mean 0 and variance $\sigma^2$. The
far and near users are allocated powers $P_f$ and $P_n$, respectively, based
on the channel conditions they experience. When two users with a large
difference in channel gain are co-scheduled, the far user, which has the
lower channel gain, is typically given more power, i.e., $P_f > P_n$ and
$P_f + P_n = 1$. Since the far user is allotted more power than the near
user, it retrieves its signal by treating the near user's signal as noise.
The near user, which has less power, uses successive interference
cancellation (SIC): it first detects the far user's signal and then subtracts
it from the received signal [25]. There are two commonly used interference
cancellation techniques: symbol-level interference cancellation (SLIC) and
codeword-level interference cancellation (CWIC). Of the two, CWIC is more
widely adopted and preferred. CWIC requires knowledge of the modulation type
employed for the far user's signal. Consequently, in this study, CWIC is
chosen to carry out the intra-cell interference cancellation. The modulation
type of each user is drawn from the set M = {8-PSK, 16-QAM, BPSK, QPSK} with
equal probability. The received signal has in-phase and quadrature-phase
(I/Q) components, which are described as

Fig. (1) Proposed Model: CNN-BiLSTM-MHA dual stream. The raw NOMA dataset
(800000 × N × 2) is pre-processed into I/Q and A/P inputs. Each stream
applies Conv1 (256 filters of size 1 × 3), Conv2 (256 filters of size 2 × 3),
Conv3 (80 filters of size 1 × 3), a BiLSTM (100 units), and MHA (heads = 5).
The two stream outputs are concatenated and passed to either the
Dense+Softmax classifier or the SVM classifier to produce the output.

$$z^I(m) = \Re[z(m)] \quad \text{and} \quad z^Q(m) = \Im[z(m)], \tag{2}$$

where $\Re[\cdot]$ and $\Im[\cdot]$ represent the real and imaginary parts of
a complex number, respectively. The received signal is represented in the
form of a 2 × N matrix as

$$Z = \begin{bmatrix} z^I(0) & z^I(1) & \cdots & z^I(N-1) \\ z^Q(0) & z^Q(1) & \cdots & z^Q(N-1) \end{bmatrix} = \begin{bmatrix} Z^I \\ Z^Q \end{bmatrix}, \tag{3}$$

where the row vectors $Z^I$ and $Z^Q$ have dimensions of 1 × N.
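As an illustration of Eqs. (1) to (3), the following minimal sketch simulates one received signal at the near user and stacks it into the 2 × N I/Q matrix. It is not the generator used for the datasets: the block-fading channel (one gain per signal), the SNR definition relative to unit transmit power, and the names `noma_received_signal`, `BPSK` and `QPSK` are assumptions made only for this example.

```python
import cmath
import math
import random

# Unit-energy constellations for two of the schemes in the set M.
BPSK = [1 + 0j, -1 + 0j]
QPSK = [cmath.exp(1j * (math.pi / 4 + k * math.pi / 2)) for k in range(4)]


def noma_received_signal(n_samples, power_ratio, snr_db, far_const, near_const, rng):
    """Simulate Eq. (1) at the near user and return the 2 x N matrix of Eq. (3)."""
    # Power split: P_f / P_n = power_ratio with P_f + P_n = 1.
    p_n = 1.0 / (1.0 + power_ratio)
    p_f = 1.0 - p_n
    # Assumption: one complex Gaussian gain h per signal (block fading),
    # so |h| is Rayleigh distributed with E[|h|^2] = 1.
    h = complex(rng.gauss(0, math.sqrt(0.5)), rng.gauss(0, math.sqrt(0.5)))
    # Assumption: SNR is defined against unit transmit power, so the complex
    # noise variance is sigma^2 = 10^(-SNR/10), split over two real dimensions.
    sigma = math.sqrt(10 ** (-snr_db / 10) / 2)
    z = []
    for _ in range(n_samples):
        y_f = rng.choice(far_const)   # far-user symbol
        y_n = rng.choice(near_const)  # near-user symbol
        w = complex(rng.gauss(0, sigma), rng.gauss(0, sigma))
        z.append(h * (math.sqrt(p_f) * y_f + math.sqrt(p_n) * y_n) + w)
    # Eq. (2): split into in-phase and quadrature rows.
    return [[s.real for s in z], [s.imag for s in z]]


rng = random.Random(0)
Z = noma_received_signal(100, 4, 10, QPSK, BPSK, rng)
print(len(Z), len(Z[0]))  # 2 100
```

Here the far user uses QPSK and the near user uses BPSK, so with a power ratio of 4 each noiseless sample is one of eight superimposed constellation points scaled by the channel gain.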


In signal preprocessing, the raw data is converted to the Amplitude/Phase
(A/P) format, which is obtained from the real and imaginary components of
the signal. The A/P matrix comprises two separate vectors, namely the
amplitude vector $Z^A$ and the phase vector $Z^P$, and is denoted as

$$Z^{A/P} = \begin{bmatrix} z^A(0) & z^A(1) & \cdots & z^A(N-1) \\ z^P(0) & z^P(1) & \cdots & z^P(N-1) \end{bmatrix} = \begin{bmatrix} Z^A \\ Z^P \end{bmatrix}. \tag{4}$$

The amplitude and phase vectors are obtained from the in-phase and quadrature
components as

$$z^A(m) = \sqrt{\left(z^I(m)\right)^2 + \left(z^Q(m)\right)^2}, \tag{5}$$

$$z^P(m) = \arctan\left(\frac{z^Q(m)}{z^I(m)}\right). \tag{6}$$
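The I/Q to A/P conversion of Eqs. (5) and (6) can be sketched as follows. The function name `iq_to_ap` is hypothetical, and the sketch uses the four-quadrant `atan2` in place of the plain arctangent of the ratio, which is an assumption on our part: it avoids division by zero when the in-phase component is 0 and keeps the full phase range.

```python
import math


def iq_to_ap(iq):
    """Convert a 2 x N I/Q matrix into a 2 x N amplitude/phase matrix."""
    z_i, z_q = iq
    # Eq. (5): amplitude is the Euclidean norm of each (I, Q) pair.
    amplitude = [math.hypot(i, q) for i, q in zip(z_i, z_q)]
    # Eq. (6), with atan2(Q, I) swapped in for arctan(Q / I) (assumption).
    phase = [math.atan2(q, i) for i, q in zip(z_i, z_q)]
    return [amplitude, phase]


ap = iq_to_ap([[3.0, 0.0, -1.0], [4.0, 2.0, 0.0]])
print(ap[0])  # [5.0, 2.0, 1.0]
print(ap[1])  # phases in radians
```

The second sample has a zero in-phase component, which the ratio form of Eq. (6) cannot evaluate directly but `atan2` maps to π/2.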

3 Proposed Model
In the proposed work, we utilize deep learning architecture for feature extrac-
tion and an SVM classifier for the actual classification task. We first train the
complete deep architecture using training data and once training is done, the
dense+softmax classifier is removed and the remaining architecture is used as
the feature extractor. Using this, features are extracted for the training data
and are used to train the support vector machine classifier. We next discuss
the proposed deep learning based architecture. The dual-stream architecture
involving CNN, LSTM, and multi-head attention has been explored in the
computer vision domain [26], where it outperformed the state of the art. The
method in [19], which is proposed for AMC in OMA systems, also reports that
combining CNN and LSTM layers gives better results. Considering this, we
propose a dual-stream architecture in which each stream comprises three
convolution layers followed by a BiLSTM layer, with a multi-head attention
(MHA) layer at the end of each stream, as shown in Fig. 1. After the features
learned by both streams are concatenated, they are fed to three dense layers
having 128, 64 and 4 nodes, followed by a softmax layer having four nodes. In
Eq. (1), $y_f(m)$ and $y_n(m)$ can use different modulation types from the
predefined set of modulation techniques, so each signal can contain a
combination of two different modulations. For example, we can have BPSK
modulation for the near user and QPSK for the far user. In this case, every
sample is the sum of one of BPSK's two possible constellation points and one
of QPSK's four possible constellation points. Since every received NOMA
signal is a combination of the constellations used by the near and far users,
and every modulation technique has a different set of constellation points,
the signal contains spatial information. CNNs are very good at learning
spatial features, so we use a CNN architecture to extract the spatial
features present in the input data. The value of each sample in the received
NOMA signal changes according to the modulations used by the near and far
users. The signal therefore also contains temporal information, and we use a
BiLSTM to extract the temporal features. After extracting the features with
the CNN-BiLSTM architecture, we add a multi-head attention layer, which
allows the model to focus on different aspects of the input data at the same
time. This is essential for capturing a wide range of complex patterns in the
data. Each "head" in the multi-head attention works independently, focusing
on different positions and representations, whereas single-head attention
might give an averaged representation. The multi-head attention layer is also
responsible for encoding the temporal information present in the input
features and capturing their local temporal relationships.
We discuss only the structure of Stream 1, because both streams have the
same structure. In each stream, three convolution layers are used with the
following specifications:
• 1st Convolutional Layer- 256 filters each of size 1 × 3
• 2nd Convolutional Layer- 256 filters each of size 2 × 3
• 3rd Convolutional Layer- 80 filters each of size 1 × 3
The rectified linear unit (ReLU) is used as the activation function for all
the convolution layers. The input shape is N × 2, considering N samples for
each signal; it is obtained by concatenating the real and imaginary vectors
in the first stream and the amplitude and phase vectors in the second stream.
The convolution layers extract spatial features from the varied
representations used in modulation signals. An (N − 2) × 2 × 256 feature map
is obtained after the first convolution layer. Subsequently,
(N − 4) × 1 × 256 and (N − 6) × 1 × 80 feature maps are obtained after the
second and third convolution layers, respectively. The output of the third
convolution layer is reshaped to (N − 6) × 80 and fed to the BiLSTM layer of
100 units. In the BiLSTM, the training sequence is passed forward and
backward through two independent LSTM networks, and the outputs of the
forward and backward LSTMs are concatenated at each time step.
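The feature-map sizes quoted above can be checked with a few lines of shape arithmetic. This is only a sketch: it assumes unpadded, stride-1 convolutions (which is what the stated sizes imply) and reads a filter of size a × b as spanning a entries across the two input rows and b entries along the N time samples. The helper names are hypothetical.

```python
def valid_conv_shape(in_shape, kernel, n_filters):
    """Output (length, width, channels) of an unpadded, stride-1 2-D convolution."""
    length, width, _ = in_shape
    k_width, k_length = kernel  # e.g. (1, 3): 1 across the rows, 3 along time
    return (length - k_length + 1, width - k_width + 1, n_filters)


def stream_shapes(n):
    """Feature-map shapes after the three convolution layers of one stream."""
    shape = (n, 2, 1)  # N x 2 input with a single channel
    shapes = []
    for kernel, n_filters in [((1, 3), 256), ((2, 3), 256), ((1, 3), 80)]:
        shape = valid_conv_shape(shape, kernel, n_filters)
        shapes.append(shape)
    return shapes


print(stream_shapes(800))
# [(798, 2, 256), (796, 1, 256), (794, 1, 80)]
```

For N = 800, the BiLSTM therefore receives a sequence of 794 time steps with 80 features each. If the 100 units are per direction, the concatenated BiLSTM output has 200 features per time step.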
To leverage the temporal correlations learned by the BiLSTM layer, an MHA
layer is introduced. The self-attention layer captures the temporal
dependency between the extracted features and learns long-term relationships
within the data. By performing multiple attention computations in parallel,
each head of the MHA layer can attend to different parts of the input
sequence, allowing the model to capture diverse temporal information.
Multi-head attention applies self-attention to associate each individual
sample in the input with the other samples in the input. We replicate the
output of the BiLSTM and pass it as the Query ($Q_i$), Key ($K_i$) and Value
($V_i$) of the self-attention layer. The self-attention for the $i$th head is
computed as described in Eq. (7):

$$\mathrm{Attention}_i(Q_i, K_i, V_i) = \mathrm{Softmax}\left(\frac{Q_i K_i^T}{\sqrt{d_k}}\right) V_i, \tag{7}$$

where $d_k$ is the dimension of the key vectors. In multi-head attention,
multiple self-attention operations run in parallel, and each self-attention
process is called a head. After the individual attention computations are
performed in each head, the resulting H representations (corresponding to the
attended values) are concatenated to form the final output of the MHA layer,
as described in Eq. (8):

$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_H), \tag{8}$$

where H represents the number of heads. The outputs of the MHA layers from
both streams are concatenated to form the final feature matrix, represented
mathematically as

$$f^{output} = \mathrm{Concat}\left(f_{s1}(X^{I/Q}),\, f_{s2}(X^{A/P})\right), \tag{9}$$

where $f_{s1}$ and $f_{s2}$ represent the feature functions of streams 1 and
2, and $X^{I/Q}$ and $X^{A/P}$ are the input I/Q matrix and A/P matrix,
respectively. Next, $f^{output}$ is flattened and applied as input to the
dense+softmax classifier. We use the Adam optimizer [27] for training the
model.
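To make Eqs. (7) and (8) concrete, the sketch below implements scaled dot-product self-attention and head concatenation on plain Python lists. It is a simplified illustration, not the trained layer: the learned per-head projections of Q, K and V are replaced by fixed feature slices (an assumption), and the function names are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def attention(q, k, v):
    """Scaled dot-product attention of Eq. (7) for one head."""
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]  # K^T
    scores = matmul(q, k_t)               # Q K^T
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)


def multi_head(x, n_heads):
    """Eq. (8): run each head on its own feature slice, then concatenate."""
    d_head = len(x[0]) // n_heads
    heads = []
    for h in range(n_heads):
        # Assumption: plain feature slices stand in for learned projections.
        part = [row[h * d_head:(h + 1) * d_head] for row in x]
        heads.append(attention(part, part, part))  # self-attention: Q = K = V
    return [sum((head[t] for head in heads), []) for t in range(len(x))]


x = [[1.0, 0.0, 0.5, 0.5], [0.0, 1.0, 0.5, 0.5], [1.0, 1.0, 0.0, 1.0]]
out = multi_head(x, n_heads=2)
print(len(out), len(out[0]))  # 3 4
```

Each output row is a weighted average of the value rows, so the sequence length is preserved and only the mixing across time steps changes from head to head.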
The features extracted from the pre-trained CNN-BiLSTM-MHA dual-stream model
are used to train an SVM classifier with a radial basis function (RBF)
kernel. Finally, the SVM classifier output predicts the class to which the
signal belongs. The model's output is given by

$$Y = F(X^{I/Q}, X^{A/P}, \beta), \tag{10}$$

where F(·) denotes the overall function of the classifier and β denotes the
model parameters obtained by training.

Fig. (2) Multi-Head Attention
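For reference, the RBF kernel mentioned above has the standard form shown below. The kernel width γ is not specified in the text and appears here only as a symbol; $\mathbf{x}$ and $\mathbf{x}'$ stand for two flattened feature vectors produced by the feature extractor.

```latex
K(\mathbf{x}, \mathbf{x}') = \exp\left(-\gamma \,\lVert \mathbf{x} - \mathbf{x}' \rVert^{2}\right), \qquad \gamma > 0
```

The kernel equals 1 for identical feature vectors and decays toward 0 as their Euclidean distance grows, so the SVM separates classes according to how close signals lie in the learned feature space.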

4 Experimental Dataset
This section delves into the procedure of dataset generation, which involves
employing the system model outlined in section 2 followed by the training and
testing details.

4.1 Dataset
Initially, the raw bit stream is modulated by randomly choosing modulation
formats from the predefined set M for both the far and near users.
Subsequently, the modulated symbols of both users are normalized to have
unit energy. Eq. (1) represents the signal received at the near user side.
Rayleigh fading and AWGN are employed to replicate real-world scenarios. To
facilitate experimentation, the values of Pf/Pn (the far-to-near user power
ratio) and N are systematically varied. For each pair of values, a dataset is
generated by varying the SNR within the range of −10 dB to 20 dB with a step
size of 2 dB. Every dataset has three dimensions: 800,000 × N × 2. The first
dimension represents the number of signals, where each signal comprises N
time samples of the I (in-phase) and Q (quadrature-phase) components. The
datasets were originally introduced in [24], and we use them in this work to
evaluate the classification performance of our proposed model. The details of
the datasets are summarized in Table 1.

Table (1) Dataset details

Parameter                               Details
Modulation schemes                      BPSK, QPSK, 8PSK, 16QAM
Signal format                           In-phase and quadrature components
Sampling rate                           1 MHz
Channel                                 Rayleigh fading channel
Noise                                   AWGN
SNR range (in dB)                       -10:2:20
Power ratio                             2, 4
Number of samples in a signal (N)       100, 200, 400, 800
Total number of signals in a dataset    800,000
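The SNR grid in Table 1 can be expanded with a short calculation. The per-level signal count below assumes the 800,000 signals are spread evenly over the SNR levels, and the noise variance assumes unit signal power; neither is stated in the text, so both should be read as illustrative assumptions.

```python
# SNR grid -10:2:20 dB from Table 1
snr_levels = list(range(-10, 21, 2))
signals_total = 800_000

print(len(snr_levels))                   # 16 SNR levels
print(signals_total // len(snr_levels))  # 50000 signals per level, if evenly spread

# Assumption: unit signal power, so noise variance = 10^(-SNR_dB / 10).
noise_var = {snr: 10 ** (-snr / 10) for snr in snr_levels}
print(noise_var[0])  # 1.0
```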

4.2 Training and Testing Details


The NOMA dataset is divided into training and testing sets with a ratio of
80:20. The dataset contains a total of 800,000 signals, so the proposed model
is trained using 640,000 signals and evaluated on 160,000 signals. During
training, 20% of the training set is used as the validation set. The labels
are one-hot encoded, as required for comparison with the probabilities
generated by the softmax function in the last layer of the dense classifier.
The model is compiled using the Adam optimizer with a learning rate of 0.001.
Since our classification problem is a multiclass classification problem, the
categorical cross-entropy loss function is employed. The training process
spans 50 epochs with a batch size of 512. Once the model is trained, the
output features of the flatten layer are extracted for the training set and
used to train the SVM classifier. The features of the test set are then
extracted from the same flatten layer, and the performance of the trained SVM
classifier is evaluated on them.
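The split sizes and the loss described above can be worked through as follows. This is a sketch for illustration only: the class ordering in `CLASSES` and the example probability vector are hypothetical, and the helper names are not taken from the original implementation.

```python
import math

total = 800_000
train = total * 80 // 100               # signals in the training set
test = total - train                    # signals in the test set
val = train * 20 // 100                 # validation signals held out of training
fit = train - val                       # signals used for weight updates
steps_per_epoch = math.ceil(fit / 512)  # batches per epoch at batch size 512
print(train, test, val, fit, steps_per_epoch)
# 640000 160000 128000 512000 1000

CLASSES = ["BPSK", "QPSK", "8PSK", "16QAM"]  # hypothetical ordering


def one_hot(label):
    """Encode a class label as a one-hot target vector."""
    return [1 if c == label else 0 for c in CLASSES]


def categorical_cross_entropy(target, probs):
    """Cross-entropy between a one-hot target and softmax probabilities."""
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)


loss = categorical_cross_entropy(one_hot("QPSK"), [0.1, 0.7, 0.1, 0.1])
print(round(loss, 4))  # 0.3567
```

With a one-hot target, the loss reduces to the negative log-probability assigned to the true class, here −ln(0.7).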

5 Results and Discussion


In this section, we first analyze the performance of the proposed architec-
ture for different parameter settings. Next, we compare the classification
performance of existing AMC approaches with the proposed architecture.

5.1 Ablation Study


We systematically examine the performance of the proposed model under dif-
ferent setups, shedding light on the impact of individual components and input
modalities. The signals with length of 800 and power ratio 4 are used for train-
ing the models in the study. The results of the ablation study are given in
Table 2 and the analysis for each setup is discussed individually as follows:

5.1.1 One-stream Vs Dual-stream


The proposed model takes IQ in one stream and AP in another stream. To
assess the impact of the dual stream, we removed one of the streams and
trained the model separately for each input type (IQ/AP). With the IQ input
in one stream, the model achieved a classification accuracy of 75.89%, while
with the AP input, it reached 75.14%. In contrast, the proposed dual-stream
model achieved a higher accuracy of 82.15%, which is about 6 to 7 percentage
points higher than the one-stream model with IQ or AP input. Although the
dual-stream model is more complex, its classification performance justifies
its use.

5.1.2 With BiLSTM Vs without BiLSTM


The BiLSTM layers from both streams are removed to assess the impact of
the BiLSTM layer in the proposed model. Without BiLSTM layers, the model
classifies the validation set with 78.48% accuracy, which is about 4% less than
the proposed model with BiLSTM layers. With BiLSTM, the proposed model
has 1,261,618 parameters, while without BiLSTM, the model has 17,181,938
parameters, which is 13.6 times higher. Hence, using the BiLSTM layers also
reduces the complexity.

5.1.3 With MHA Vs without MHA


Without MHA, the model achieves an accuracy of 77.55%, which is approximately
4.6 percentage points lower than the accuracy of the proposed model with MHA.
Additionally, using MHA increases the number of parameters by only 0.0113%,
which is negligible.

5.1.4 Number of convolution layers


When a convolution layer with 256 kernels is used in both streams, the model
achieves an accuracy of 74.47%. Adding another layer of 256 kernels just after
the first convolution layer in both streams increases the accuracy to 76.12%.
When four convolution layers are employed in both streams, with the first two
having 256 kernels and the last two having 80 kernels, the model achieves an
accuracy of 82.17%. In comparison, the proposed model with three convolution
layers attains an accuracy of 82.15%, which is 7.68 and 6.03 percentage
points higher than the models with one and two convolution layers,
respectively. The model with four convolution layers yields almost the same
accuracy as the proposed model, so three convolution layers are used in the
proposed model to keep the complexity lower.
Hence, the ablation study results emphasize the crucial role of incorporating
both IQ and AP inputs, the benefits of employing BiLSTM layers, the positive
influence of Multi-Head Attention (MHA), and the optimal complexity achieved
with three convolution layers. These insights contribute to a nuanced
understanding of the model's components, guiding the selection of an
effective architecture for modulation classification tasks.

Table (2) Ablation study results

Setup                     Accuracy in %
One-Stream (IQ input)     75.89
One-Stream (AP input)     75.14
w/o BiLSTM                78.48
w/o MHA                   77.55
1 convolution layer       74.47
2 convolution layers      76.12
4 convolution layers      82.17
Proposed model            82.15

Fig. (3) Accuracy Across SNRs: Proposed vs. SOTA. Accuracy (0.2 to 1.0) is
plotted against SNR (−10 to 20) for CB [23], CNN-2 [16], CNN-4 [17],
CLDNN [19], MRCNN [24], SigNet [21], MM-Net [22], and the proposed model.
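The gains quoted in the ablation study can be recomputed directly from the Table 2 accuracies and the two parameter counts given in Section 5.1.2. The snippet only restates the reported numbers; the dictionary and variable names are chosen for this example.

```python
# Accuracies in % copied from Table 2
ablation = {
    "One-Stream (IQ input)": 75.89,
    "One-Stream (AP input)": 75.14,
    "w/o BiLSTM": 78.48,
    "w/o MHA": 77.55,
    "1 convolution layer": 74.47,
    "2 convolution layers": 76.12,
    "4 convolution layers": 82.17,
}
proposed = 82.15

# Gain of the proposed model over each ablated setup, in percentage points.
gains = {setup: round(proposed - acc, 2) for setup, acc in ablation.items()}
for setup, gain in gains.items():
    print(f"{setup:24s} {gain:+.2f}")

# Parameter counts without and with the BiLSTM layers.
ratio = 17_181_938 / 1_261_618
print(f"{ratio:.1f}")  # 13.6
```

The only negative entry is the four-layer variant (−0.02 points), which is why the fourth convolution layer is not considered worth its additional complexity.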

5.2 Performance Evaluation


Fig. 3 shows the accuracy versus SNR for all the models at N = 800 and
Pf/Pn = 4. The accuracy of the proposed model approaches 100% as the SNR
increases to 20 dB. Signals with an SNR exceeding 2.5 dB are classified with
an accuracy exceeding 95%. For low SNRs ranging from −5 dB to 5 dB, the
accuracy of the proposed model increases gradually from 60% to 97.8%. Beyond
7.5 dB, the accuracy of the proposed model saturates at 99.99%. For N = 800
and a power ratio of Pf/Pn = 4, an overall accuracy of 82.15% is achieved
across all SNR levels.

Table (3) Overall Accuracy of all Methods on Different Datasets (in %)

S.No.  N    Pf/Pn  CB method  CNN-2  CNN-4  CLDNN  MR-CNN  SigNet  MMNet  Proposed model
1      200  2      62.86      66.30  67.79  65.92  69.19   55.66   61.11  70.83
2      800  2      72.96      71.39  73.07  67.59  76.77   64.61   67.82  77.61
3      100  4      65.76      68.00  69.16  69.51  70.36   58.95   51.15  71.54
4      200  4      71.23      69.85  73.83  73.08  74.96   61.02   64.59  75.87
5      400  4      74.85      75.62  76.65  75.32  78.36   64.85   68.14  79.16
6      800  4      77.93      77.60  71.39  71.27  81.30   67.83   71.48  82.15

The overall accuracy of a classifier is evaluated by testing it on a dataset
that comprises signals at various SNR levels, which gives a single assessment
of how well the classifier performs across all SNR conditions. Table 3
tabulates this accuracy for six scenarios with different Pf /Pn ratios and
numbers of samples N . Classification performance improves as N and the
Pf /Pn ratio increase. For higher values of N , the proposed model performs
better because more samples of each signal are fed to the model. With a high
Pf /Pn , the power separation between the far and near users increases, making
their signals easier to separate. The better performance of the proposed model
can be attributed to better feature extraction and to capturing the time
correlation of the signals.
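The distinction between per-SNR accuracy (as plotted in Fig. 3) and overall accuracy (as tabulated in Table 3) can be sketched as follows. This is a minimal illustration, not the authors' evaluation code; the function name `accuracy_by_snr` and the toy labels are assumptions made for the example.

```python
from collections import defaultdict


def accuracy_by_snr(true_labels, predicted_labels, snrs_db):
    """Return ({snr: accuracy}, overall_accuracy) for one test set."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, guess, snr in zip(true_labels, predicted_labels, snrs_db):
        total[snr] += 1
        if truth == guess:
            correct[snr] += 1
    # One accuracy value per SNR level (one point on an accuracy-vs-SNR curve).
    per_snr = {snr: correct[snr] / total[snr] for snr in sorted(total)}
    # Overall accuracy pools every test signal regardless of its SNR.
    overall = sum(correct.values()) / sum(total.values())
    return per_snr, overall


# Toy data (hypothetical, not taken from the paper's dataset).
true_labels = ["BPSK", "QPSK", "8PSK", "16QAM", "BPSK", "QPSK"]
predicted = ["BPSK", "8PSK", "8PSK", "8PSK", "BPSK", "QPSK"]
snrs_db = [0, 0, 0, 0, 10, 10]
per_snr, overall = accuracy_by_snr(true_labels, predicted, snrs_db)
```

Because the overall figure pools all SNR levels, low-SNR errors pull it well below the high-SNR plateau, which is consistent with an overall accuracy of 82.15% alongside near-perfect accuracy at high SNR.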
Fig. 4 displays three confusion matrices for the proposed model with
Pf /Pn = 4 and N = 800. At SNR = 10 dB the entries are concentrated on the
diagonal, so most of the signals are correctly classified at high SNRs; this
remains the case down to an SNR of about 6 dB.
The confusion matrix in Fig. 4c corresponds to the entire NOMA dataset.
When the far user at the base station adopts BPSK modulation, there is a
97.37% probability that it will be accurately predicted at the near user end.
The classification accuracies for QPSK, 8PSK, and 16QAM are 90.67%, 86.72%,
and 88.79%, respectively. Our model could not predict about 11% to 14% of the
signals because, at low SNRs, correctly identifying modulation schemes such
as QPSK, 8PSK, and 16QAM becomes challenging for the model. This difficulty
arises because these modulation schemes share certain constellation points,
which makes them harder to distinguish in noisy environments. The confusion
matrix for 0 dB SNR is shown in Fig. 4a. At 0 dB SNR, about 22% of the 8PSK
signals are classified as 16QAM, and vice versa. Fig. 4b shows the confusion
matrix for signals with an SNR of 10 dB. At higher SNRs, all modulation
schemes can be classified with a perfect accuracy of 100%.
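The row-normalised confusion matrices discussed above can be computed as in the following sketch, where each row is a true label and each entry is the fraction of that class assigned to a predicted label. The helper name `confusion_matrix` and the toy predictions are assumptions for illustration only, not the authors' code or data.

```python
CLASSES = ["BPSK", "QPSK", "8PSK", "16QAM"]


def confusion_matrix(true_labels, predicted_labels, classes=CLASSES):
    """Row-normalised confusion matrix: rows are true labels, columns predictions."""
    index = {name: i for i, name in enumerate(classes)}
    counts = [[0] * len(classes) for _ in classes]
    for truth, guess in zip(true_labels, predicted_labels):
        counts[index[truth]][index[guess]] += 1
    normalised = []
    for row in counts:
        row_total = sum(row)
        # A class with no test samples yields an all-zero row.
        normalised.append([c / row_total if row_total else 0.0 for c in row])
    return normalised


# Toy predictions (hypothetical): 8PSK and 16QAM are confused with each other.
true_labels = ["8PSK"] * 4 + ["16QAM"] * 4 + ["BPSK"] * 2
predicted = ["8PSK", "8PSK", "8PSK", "16QAM",
             "16QAM", "16QAM", "8PSK", "16QAM",
             "BPSK", "BPSK"]
matrix = confusion_matrix(true_labels, predicted)
```

Each non-empty row sums to 1, so the diagonal entry of a row is the per-class accuracy for that modulation.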

(a) 0 dB SNR
True \ Predicted  BPSK  QPSK  8PSK  16QAM
BPSK              1.00  0.00  0.00  0.00
QPSK              0.02  0.68  0.21  0.09
8PSK              0.04  0.12  0.62  0.22
16QAM             0.07  0.08  0.23  0.62

(b) 10 dB SNR
True \ Predicted  BPSK  QPSK  8PSK  16QAM
BPSK              1.00  0.00  0.00  0.00
QPSK              0.00  1.00  0.00  0.00
8PSK              0.00  0.00  1.00  0.00
16QAM             0.00  0.00  0.00  1.00

(c) All SNRs
True \ Predicted  BPSK  QPSK  8PSK  16QAM
BPSK              0.97  0.01  0.01  0.01
QPSK              0.02  0.74  0.16  0.08
8PSK              0.02  0.06  0.81  0.11
16QAM             0.03  0.05  0.16  0.76

Fig. (4) The proposed model confusion matrices with Pf /Pn = 4 and N = 800: (a)
for 0 dB SNR, (b) for 10 dB SNR, and (c) for all SNRs.

5.3 Comparative Analysis with existing architectures

Fig. 3 compares the accuracy of the proposed CNN-BiLSTM-MHA model with the
cumulant-based (CB) method [23], MR-CNN [24], CNN-2 [16], CNN-4 [17],
CLDNN [19], SigNet [21], and MMNet [22] at different SNR levels. Note
that CNN-2, CNN-4, CLDNN, SigNet, and MMNet were developed for OMA
systems, whereas CB and MR-CNN are the only reported works on AMC
in NOMA systems. The accuracy-versus-SNR curve of the proposed model lies
above all the other curves, indicating the superiority of the proposed
architecture. The proposed method performs almost 15% to 20% better than
the CB method and the CLDNN model, and achieves accuracies similar to those
of the CNN-4 and CNN-2 models at low SNRs. The proposed model outperforms
all other models for SNRs higher than 0 dB. At SNRs from −5 dB to 0 dB, its
accuracy lies between 60% and 75%. Between 0 dB and 5 dB, its accuracy
rises steeply from 75% to 96%, and beyond 10 dB it saturates at about 100%.
The SigNet and MMNet models performed well for AMC in OMA systems, but
Fig. 3 shows that they could not classify the NOMA signals effectively. As
given in Table 3, the proposed model outperforms the SigNet and MMNet models
by 14.32 and 10.67 percentage points, respectively, when the signal length
is 800 and the power ratio is 4. Table 3 also gives the classification
accuracies of the proposed CNN-BiLSTM-MHA dual-stream model and the other
models for the remaining scenarios. The proposed architecture outperforms
all the other models across the power ratios and sample sizes considered.
When the power ratio is reduced, the disparity in power levels between the
far and near users is smaller, which makes it more challenging to
differentiate between the modulations.

Table (4) Accuracy of Proposed Model with Dense and SVM Classifiers.

S.No.  N    Pf/Pn  Dense Classifier (in %)  SVM Classifier (in %)
1      200  2      69.70                    70.83
2      800  2      76.24                    77.61
3      100  4      70.22                    71.54
4      200  4      70.48                    75.87
5      400  4      78.12                    79.16
6      800  4      80.64                    82.15

As shown in Table 4, using an SVM as the classifier instead of a dense
classifier improves accuracy on all six datasets, typically by about 1 to
1.5 percentage points. This improvement can be attributed to SVMs generally
being better at handling high-dimensional feature spaces and aiding in
mapping the input features to a higher-dimensional space. SVMs are also more
robust to noise and outliers in the data and less prone to overfitting, which
makes them more suitable for real-world applications.
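The idea of replacing a dense output layer with an SVM trained on extracted features can be illustrated with a minimal sketch. It assumes a binary, linear, soft-margin SVM trained by per-sample subgradient descent on the hinge loss; the paper's task has four classes, and the kernel and hyperparameters used by the authors are not stated in this section, so every name and value below is hypothetical.

```python
def train_linear_svm(features, labels, epochs=50, learning_rate=0.1, reg=0.01):
    """Fit (weights, bias) by subgradient descent on the L2-regularised hinge loss.

    labels must be +1 or -1.
    """
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            # The gradient of the L2 penalty shrinks the weights on every step.
            weights = [w * (1.0 - learning_rate * reg) for w in weights]
            # The hinge term contributes only when the margin is violated.
            if y * score < 1.0:
                weights = [w + learning_rate * y * xi for w, xi in zip(weights, x)]
                bias += learning_rate * y
    return weights, bias


def predict(weights, bias, x):
    """Classify one feature vector as +1 or -1 from the sign of the score."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score >= 0.0 else -1


# Hypothetical 2-D "deep features" for two classes (not data from the paper).
features = [[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]]
labels = [1, 1, -1, -1]
weights, bias = train_linear_svm(features, labels)
```

In a pipeline of the kind described here, the feature vectors would come from the trained feature extractor with its dense output layer removed, and a multi-class scheme such as one-vs-rest would be needed for the four modulation classes.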

5.4 Complexity Analysis


The computational complexity of the proposed model and the SOTA methods, in
terms of FLOPs and number of trainable parameters, is given in Table 5. The
complexity analysis is performed for signals with N = 800 and a power ratio
of 4. The number of trainable parameters, which indicates the model’s capacity
to learn, varies significantly across the methods. This number should be
neither too low nor too high: too few trainable parameters may reduce the
model’s capacity to learn, while too many lead to a more complex network
that may be prone to overfitting.
The computational complexity, measured in millions of floating-point
operations (FLOPs), provides insight into the computational demand
of each method. The proposed method requires 968.61 million FLOPs,
which is considerably fewer than the FLOPs required by CNN4,
CLDNN, SigNet, and MMNet. The proposed method achieves an accuracy of
82.15%, outperforming all other methods. Notably, MRCNN and CNN2, with
accuracies of 81.30% and 77.60% respectively, also demonstrate competitive
performance. The proposed method appears to strike a balance between
computational demand and model performance, making it a promising candidate
for AMC.
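As a rough illustration of how trainable parameters and FLOPs are counted for individual layers, the following sketch applies the usual closed-form counts for a 1-D convolution and a dense layer. The convention of two FLOPs per multiply-accumulate and the layer sizes are assumptions chosen for the example; they are not the layer dimensions or the counting tool used in the paper.

```python
def conv1d_cost(in_channels, out_channels, kernel_size, output_length):
    """Return (trainable parameters, FLOPs) for one 1-D convolution layer."""
    # Each output channel has in_channels * kernel_size weights plus one bias.
    params = out_channels * (in_channels * kernel_size + 1)
    # One multiply-accumulate per weight per output position.
    macs = out_channels * in_channels * kernel_size * output_length
    return params, 2 * macs  # assumed convention: 1 MAC = 2 FLOPs


def dense_cost(in_features, out_features):
    """Return (trainable parameters, FLOPs) for one fully connected layer."""
    params = out_features * (in_features + 1)
    return params, 2 * in_features * out_features


# Hypothetical layer sizes: a 2-channel input of length 800 and a 4-class output.
conv_params, conv_flops = conv1d_cost(2, 64, 3, 800)
dense_params, dense_flops = dense_cost(128, 4)
```

Convolution FLOPs grow with the signal length while the parameter count does not, which is one reason a model can have few parameters yet a comparatively high FLOP count.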

Table (5) Computational Complexity Comparison

S.No.  Model        No. of Trainable Parameters  No. of FLOPs (in Millions)  Accuracy (%)
1      CNN2 [16]    818,916                      9.01                        77.60
2      CNN4 [17]    33,247,812                   1,595.5                     71.31
3      CLDNN [19]   51,725,652                   1,529.76                    71.27
4      SigNet [21]  24,110,208                   4,146.50                    67.83
5      MMNet [22]   20,267,123                   43,355.44                   71.48
6      MRCNN [24]   62,900                       31.90                       81.30
7      Proposed     1,201,774                    968.61                      82.15

6 Conclusion
In this paper, we have proposed a CNN-BiLSTM-MHA dual-stream architecture
for automatic modulation classification and compared it with several existing
deep-learning neural network architectures on the NOMA dataset. The good
performance of the proposed architecture arises because the combination of
CNN and BiLSTM can extract more powerful features. The signals are
preprocessed to convert them into temporal I/Q format and A/P representation,
which enhances the extraction of useful features. In our model, the CNN,
BiLSTM, and MHA are combined (i.e., CNN-BiLSTM-MHA) to extract the spatial
and temporal features from each signal representation. The CNN helps in
spatial feature extraction, the BiLSTM extracts time-domain features, and the
MHA facilitates the retention of lengthy data sequences and assists in
capturing long-term relationships within the time series. This mechanism aids
in learning patterns that extend over long periods and hence in capturing the
time correlation of the signals. We have used an SVM as the classifier
instead of a dense classifier, leading to a boost of more than 1% in the
overall accuracy of the model. The superior results of the SVM classifier
arise because SVMs are generally better at handling high-dimensional feature
spaces, which makes them well suited to tasks such as image classification,
and are also more robust to noise and outliers in the data. Our proposed
model achieved 100% classification accuracy at high SNR even with fewer
samples for each signal. The proposed model also performs quite well at low
SNRs, so its accuracy remains consistent as the SNR varies.

One possible extension of this work is to consider a larger number of
modulation types while generating the dataset and to develop a classifier
that works well for that scenario. Also, we have not considered inter-channel
interference between the NOMA signals while generating the dataset. Such a
dataset can be developed to evaluate the performance of the architectures in
more realistic scenarios.
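The preprocessing step mentioned above, converting I/Q samples into a second representation, can be sketched as follows. The sketch assumes that A/P denotes amplitude/phase and that the conversion is the standard polar transform; the function name `iq_to_ap` and the sample values are hypothetical and not taken from the paper's dataset.

```python
import math


def iq_to_ap(i_samples, q_samples):
    """Convert in-phase/quadrature sequences to amplitude and phase (radians)."""
    amplitude = [math.hypot(i, q) for i, q in zip(i_samples, q_samples)]
    phase = [math.atan2(q, i) for i, q in zip(i_samples, q_samples)]
    return amplitude, phase


# Three hypothetical unit-energy samples at 0, 90, and 180 degrees.
i_samples = [1.0, 0.0, -1.0]
q_samples = [0.0, 1.0, 0.0]
amplitude, phase = iq_to_ap(i_samples, q_samples)
```

In a dual-stream design, one stream would take the I/Q sequence and the other the amplitude/phase sequence of the same signal.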

Author contributions: All authors contributed equally to this work.


Funding: Not applicable
Data availability: Data will be made available on request.

Declarations
Conflict of interest: Not applicable

References
[1] Dai, L., Wang, B., Yuan, Y., Han, S., Chih-lin, I., Wang, Z.:
Non-orthogonal multiple access for 5G: solutions, challenges, opportunities,
and future research trends. IEEE Communications Magazine 53(9), 74–81
(2015). https://doi.org/10.1109/MCOM.2015.7263349

[2] O’Shea, T.J., Corgan, J., Clancy, T.C.: Convolutional radio modulation
recognition networks. In: International Conference on Engineering
Applications of Neural Networks, pp. 213–226 (2016). Springer

[3] O’Shea, T.J., Roy, T., Clancy, T.C.: Over-the-air deep learning based
radio signal classification. IEEE Journal of Selected Topics in Signal
Processing 12(1), 168–179 (2018)

[4] Rajendran, S., Meert, W., Giustiniano, D., Lenders, V., Pollin, S.: Deep
learning models for wireless signal classification with distributed low-cost
spectrum sensors. IEEE Transactions on Cognitive Communications and
Networking 4(3), 433–445 (2018)

[5] Zhang, R., Yin, Z., Wu, Z., Zhou, S.: A novel automatic modulation
classification method using attention mechanism and hybrid parallel neural
network. Applied Sciences 11(3), 1327 (2021)

[6] Kim, S.-H., Kim, J.-W., Nwadiugwu, W.-P., Kim, D.-S.: Deep
learning-based robust automatic modulation classification for cognitive radio
networks. IEEE Access 9, 92386–92393 (2021)

[7] Wang, D., Zhang, M., Li, Z., Li, J., Fu, M., Cui, Y., Chen, X.: Modulation
format recognition and OSNR estimation using CNN-based deep learning.
IEEE Photonics Technology Letters 29(19), 1667–1670 (2017)

[8] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning
applied to document recognition. Proceedings of the IEEE 86(11),
2278–2324 (1998)

[9] Ali, A., Yangyu, F., Liu, S.: Automatic modulation classification of digital
modulation signals with stacked autoencoders. Digital Signal Processing
71, 108–116 (2017)

[10] Häring, L., Chen, Y., Czylwik, A.: Automatic modulation classification
methods for wireless OFDM systems in TDD mode. IEEE Transactions
on Communications 58(9), 2480–2485 (2010).
https://doi.org/10.1109/TCOMM.2010.080310.090228

[11] Huang, Q.-S., Peng, Q.-C., Shao, H.-Z.: Blind modulation classification
algorithm for adaptive OFDM systems. IEICE Transactions on
Communications 90(2), 296–301 (2007)

[12] Wu, X., Wei, S., Zhou, Y., Liao, F.: TSN-A: An efficient deep learning
model for automatic modulation classification based on intra-class
confusion reduction of modulation families. IEEE Communications Letters
26(12), 2964–2968 (2022). https://doi.org/10.1109/LCOMM.2022.3210586

[13] Usman, M., Lee, J.-A.: AMC-IoT: Automatic modulation classification
using efficient convolutional neural networks for low powered IoT devices.
In: 2020 International Conference on Information and Communication
Technology Convergence (ICTC), pp. 288–293 (2020).
https://doi.org/10.1109/ICTC49870.2020.9289261

[14] Song, G., Jang, M., Yoon, D.: CNN-based automatic modulation
classification in OFDM systems. In: 2022 International Conference on Computer,
Information and Telecommunication Systems (CITS), pp. 1–4 (2022).
https://doi.org/10.1109/CITS55221.2022.9832989

[15] Huynh-The, T., Hua, C.-H., Pham, Q.-V., Kim, D.-S.: MCNet: An efficient
CNN architecture for robust automatic modulation classification.
IEEE Communications Letters 24(4), 811–815 (2020).
https://doi.org/10.1109/LCOMM.2020.2968030

[16] West, N.E., O’Shea, T.: Deep architectures for modulation recognition.
In: 2017 IEEE International Symposium on Dynamic Spectrum Access
Networks (DySPAN), pp. 1–6 (2017).
https://doi.org/10.1109/DySPAN.2017.7920754

[17] Liu, X., Yang, D., Gamal, A.E.: Deep neural network architectures for
modulation classification. In: 2017 51st Asilomar Conference on Signals,
Systems, and Computers, pp. 915–919 (2017).
https://doi.org/10.1109/ACSSC.2017.8335483

[18] Ramjee, S., Ju, S., Yang, D., Liu, X., Gamal, A.E., Eldar, Y.C.: Fast Deep
Learning for Automatic Modulation Classification. arXiv (2019).
https://doi.org/10.48550/ARXIV.1901.05850. https://arxiv.org/abs/1901.05850

[19] Sainath, T.N., Vinyals, O., Senior, A., Sak, H.: Convolutional, long
short-term memory, fully connected deep neural networks. In: 2015 IEEE
International Conference on Acoustics, Speech and Signal Processing
(ICASSP), pp. 4580–4584 (2015).
https://doi.org/10.1109/ICASSP.2015.7178838

[20] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image
recognition. In: 2016 IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), pp. 770–778 (2016).
https://doi.org/10.1109/CVPR.2016.90

[21] Chen, Z., Cui, H., Xiang, J., Qiu, K., Huang, L., Zheng, S., Chen, S.,
Xuan, Q., Yang, X.: SigNet: A novel deep learning framework for radio
signal classification. IEEE Transactions on Cognitive Communications and
Networking 8(2), 529–541 (2021)

[22] Triaridis, K., Doumanidis, C., Chatzidiamantis, N.D., Karagiannidis,
G.K.: MM-Net: A multi-modal approach towards automatic modulation
classification. IEEE Communications Letters (2023)

[23] Li, T., Li, Y., Dobre, O.A.: Modulation classification based on
fourth-order cumulants of superposed signal in NOMA systems. IEEE
Transactions on Information Forensics and Security 16, 2885–2897 (2021).
https://doi.org/10.1109/TIFS.2021.3068006

[24] Parmar, A., Captain, K., Satija, U., Chouhan, A.: Modulation
classification for non-orthogonal multiple access system using a modified
residual-CNN. In: 2023 IEEE Wireless Communications and Networking
Conference (WCNC), pp. 1–6 (2023). IEEE

[25] Yan, C., Harada, A., Benjebbour, A., Lan, Y., Li, A., Jiang, H.: Receiver
design for downlink non-orthogonal multiple access (NOMA). In: 2015
IEEE 81st Vehicular Technology Conference (VTC Spring), pp. 1–6
(2015). IEEE

[26] Yin, X., Liu, Z., Liu, D., Ren, X.: A novel CNN-based Bi-LSTM parallel
model with attention mechanism for human activity recognition with
noisy data. Scientific Reports 12(1), 7878 (2022)

[27] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv
preprint arXiv:1412.6980 (2014)
