
Hindawi

Security and Communication Networks


Volume 2022, Article ID 3718340, 11 pages
https://doi.org/10.1155/2022/3718340

Research Article
Bio-Optimization of Deep Learning Network Architectures

Shanmugavadivu P,1 Mary Shanthi Rani M,1 Chitra P,1 Lakshmanan S,1 Nagaraja P,1 and Vignesh U2

1Gandhigram Rural Institute, Dindigul, India
2Information and Communication Technology Department, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka, India

Correspondence should be addressed to Vignesh U; [email protected]

Received 4 June 2022; Accepted 24 August 2022; Published 20 September 2022

Academic Editor: C. Venkatesan

Copyright © 2022 Shanmugavadivu P et al. This is an open access article distributed under the Creative Commons Attribution
License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is
properly cited.

Deep learning is reaching new heights as a result of its cutting-edge performance in a variety of fields, including computer vision, natural language processing, time series analysis, and healthcare. Deep learning models are commonly trained with batch and stochastic gradient descent and only a handful of optimizers, which often leads to subpar model performance, and considerable effort is now being devoted to improving deep learning performance through gradient-based optimization methods. The proposed work analyses convolutional neural networks (CNN) and deep neural networks (DNN) with several state-of-the-art optimizers to enhance the performance of these architectures. Specific optimizers (SGD, RMSprop, Adam, Adadelta, etc.) are applied to different types of datasets, and their results are compared. The study concludes with a thorough report on the optimizers' performance across a variety of architectures and datasets, which should help researchers choose appropriate optimizers for their frameworks and architectures. The proposed work evaluates eight optimizers on CNN and DNN architectures, and the experimental results demonstrate improved efficiency of the CNN and DNN architectures on various datasets.

1. Introduction

Current technology-driven applications increasingly rely on AI-based approaches to solve practical problems across varied tasks. Deep learning is a crucial technology for applications that must process extensive amounts of information, and deep learning methods rely on optimization to enhance their performance [1, 2]. CNNs form a category of deep neural networks (DNN) that can recognize and cluster specific components of images and are widely used for visual imaging tasks. Their applications range from image and video recognition, image classification, medical image analysis, and computer vision to natural language processing [3–5]. In general, AI-based systems use neural configurations that mimic biological neurons, achieving high accuracy with low time complexity. Neural networks, also known as artificial neural networks or simulated neural networks, are a subset of machine learning and form the core of deep learning algorithms. The human brain inspires the neural network structure, in which neurons pass signals to one another. The architecture of a neural network is shown in Figure 1.

The operation of a neural network is characterized by the following key elements:

(i) Input: the set of features fed into the model for the learning process. For example, the input in object detection may be an array of pixel values of an image.
(ii) Weight: weights give significance to the features that contribute to learning. A weight introduces scalar multiplication between the input value and the weight matrix. For instance, a negative word would influence the decision of a sentiment analysis model more than a pair of neutral words.

(iii) Transfer function: the job of the transfer function is to combine multiple inputs into one output value so that the activation function can be applied. It does this by summing the information passed to it.
(iv) Activation function: it introduces nonlinearity into the operation of the perceptron so that it can model nonlinear relationships with the inputs. Without it, the output would be a linear combination of the input values and the network could not capture nonlinearity.

Figure 1: The architecture of the neural network.

Deep learning systems are large, complex, and frequently involve numerous layers and nonlinearities, which makes them difficult to optimize. Optimizers must steer a complex system that is difficult to understand. Some deep learning systems expose only a small number of parameters that may be modified, which reduces their usefulness. Nevertheless, deep learning models can still be improved and built more easily in some principled ways.

1.1. Overview of Optimizers. Optimizers are techniques or algorithms used to reduce a loss function (error function) or to increase training efficiency. Optimizers are mathematical operations that depend on the weights and biases. The attributes of a neural network, such as its weights and learning rate, are modified using optimization algorithms and techniques, which lower the losses that occur during training. Typically, optimizers carry out the optimization task by minimizing the loss function. The weights are initialized using one of several initialization procedures and are then updated at each step:

Wnew = Wold − lr ∗ ∇W L(Wold). (1)

Equation (1) updates the weights so as to reach the most accurate result. The best effect can be achieved using optimization strategies or algorithms called optimizers, and various optimizers have been examined, each with its advantages and disadvantages. Optimizers, which are themselves mathematical functions, are defined in terms of the model's learnable parameters, such as weights and biases.
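The update rule in equation (1) maps directly to code. The following is a minimal NumPy sketch of one gradient descent step, assuming the gradient of the loss with respect to the weights has already been computed; the variable names are illustrative and are not taken from the paper's implementation.

```python
import numpy as np

def gradient_descent_step(w_old, grad_w, lr=0.01):
    """One update of equation (1): W_new = W_old - lr * dL/dW."""
    return w_old - lr * grad_w

# Illustrative usage with a random weight vector and a placeholder gradient.
w = np.random.randn(10)
grad = np.random.randn(10)   # stands in for the true gradient of the loss
w = gradient_descent_step(w, grad, lr=0.01)
```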
1.1.1. Loss Function. The foundation of machine learning algorithms is the loss function. It is the model assessment measure that determines how useful the model is for prediction. The performance of the model is improved through algorithmic adjustments, and the loss function determines whether this improvement was successful or not. Evaluated over all samples, the loss function is used to determine the total loss on the dataset.

1.1.2. The Learning Rate. Updating the weights by adding or subtracting too much can destabilize the loss function, causing the update to jump past the optimal value for a given weight. This is controlled by the learning rate mechanism: the gradients are scaled by a small number, such as 0.001, before they are applied.

1.1.3. Regularization Process. Practitioners in machine learning are constantly wary of overfitting. Overfitting occurs when a model performs well on the data used to train it but poorly on fresh data from the real world. This typically happens when one parameter dominates the formula and is weighted excessively. To prevent this, a regularization term is introduced into the optimization process. During regularization, the loss function gains an additional component that penalizes high weight values, so even accurate predictions obtained with large weights incur a penalty. This keeps the weights on the lower side, improving the model's ability to generalize to new data.
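As a concrete illustration of the regularization idea above, the sketch below adds an L2 penalty on the weights to a mean-squared-error loss. The penalty strength lambda_reg and all variable names are illustrative assumptions, not values used in the paper.

```python
import numpy as np

def l2_regularized_mse(y_true, y_pred, weights, lambda_reg=1e-3):
    """MSE loss plus an L2 penalty that discourages large weight values."""
    mse = np.mean((y_true - y_pred) ** 2)
    penalty = lambda_reg * np.sum(weights ** 2)  # grows with the magnitude of the weights
    return mse + penalty
```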
1.2. Types of Optimizers. The fundamental optimizers used to reduce the loss function are described as follows.

1.2.1. Gradient Descent (GD) Optimizer. The most fundamental optimizer is gradient descent, which is a straightforward procedure. It reduces the loss by using the derivatives of the loss function together with the learning rate, and the parameter updates are shared among all of the different layers through backpropagation in the neural network. Because the gradient is calculated over the entire dataset, the algorithm is slow even though the weight updates are effective, and a considerable amount of RAM is required, making it a resource-hungry method. When the algorithm can be tuned appropriately, the overall strategy works well.

1.2.2. Stochastic Gradient Descent (SGD). Stochastic gradient descent is a modified version of the GD method in which the model parameters are updated on every iteration: the loss function is evaluated after every training sample and the model is updated accordingly. These frequent updates allow the model to converge towards a minimum faster, but at the cost of increased variance. The advantage of this approach is that it uses less memory than the previous one because it is not necessary to retain the loss values over the entire dataset. In the convolutional setting, SGD-based optimizers that employ various hyperparameters are regarded as competitive alternatives that complement the other optimizers.

1.2.3. Minibatch Gradient Descent. Another form of the GD method is known as minibatch gradient descent, in which the model parameters are updated on small batches of samples. To ensure that the model moves towards the minima gradually and to prevent frequent derailments, the model parameters are updated every 'n' samples (one batch). This leads to lower variance within the model and decreased memory usage.
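The difference between full-batch, stochastic, and minibatch updates is essentially how many samples contribute to each gradient step. The following NumPy sketch of a minibatch loop for a simple linear model is illustrative only; the batch size of 32 mirrors the setting used later in the experiments, but the model and data are placeholders.

```python
import numpy as np

def minibatch_sgd(X, y, w, lr=0.01, batch_size=32, epochs=5):
    """Update w on 'batch_size' samples at a time (minibatch gradient descent)."""
    n = X.shape[0]
    for _ in range(epochs):
        idx = np.random.permutation(n)                   # shuffle once per epoch
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            pred = X[batch] @ w                          # linear model prediction
            grad = 2 * X[batch].T @ (pred - y[batch]) / len(batch)  # MSE gradient
            w -= lr * grad                               # equation (1) applied to the minibatch
    return w
```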
1.2.4. Momentum-Based Gradient Descent. The parameters supplied by the first-order derivative of the loss function are updated through backpropagation. The size of previous updates to the parameters is usually ignored, even though updates are repeated for every batch or every epoch. The term "momentum" in this optimizer refers to the inclusion of this historical component in later updates, which speeds up the overall process.

1.2.5. Nesterov Accelerated Gradient (NAG). Momentum-based GD is now very widespread, but it tends to oscillate around and overshoot the minimum region, adding to the total number of iterations, so the technique still does not improve much on standard GD. NAG repairs this problem. The strategy adopted is to first apply the history (momentum) component before computing the parameter update; the derivative is then calculated at this shifted point, which may push the update forward or pull it back. This is known as the "look-ahead strategy," and it is especially effective when the update is already close to the minimum.

1.2.6. RMSProp. RMSProp is often viewed as an enhancement of the Adagrad optimizer. This optimizer uses an exponential average of the gradients to adapt the learning rate. The learning-rate adaptation remains effective because the exponential average can maintain a larger learning rate under settings with small updates and a smaller rate under extremely large update conditions.

1.2.7. Adam. The Adam optimizer combines the RMSprop and momentum-based GD methodologies. The momentum term in Adam, which recovers information from past gradients, is balanced against the adaptive learning rate gained from RMSprop; this combination accounts for the importance of the Adam optimizer. Two hyperparameters are introduced in this optimizer to fit the use case.

1.2.8. Adagrad (Adaptive Gradient Algorithm). Adagrad is an adaptive gradient optimizer that applies larger updates (high learning rates) to parameters associated with infrequent features and lowers the learning rate for parameters associated with frequent features, which is precisely what justifies its use for dealing with sparse data. Although the model parameters are the primary focus of the intended work, the learning rate is a similarly crucial dynamic component, and varying it can change the training pace. An adaptive learning rate is particularly valuable for sparse inputs, where most of the values are zero, because it counteracts the vanishing gradients arising from these infrequent features.

1.2.9. AdaDelta. AdaDelta follows a broader interpretation of AdaGrad that addresses the problem of the learning rate decaying due to the monotonically increasing sum of squared gradients. AdaDelta also accumulates past gradients, but it only takes a window of recent gradients into account rather than all of them. RMSProp is another method that, like AdaDelta, restores AdaGrad's declining learning rate.

1.2.10. Adamax. The Adamax formula extends the adaptive moment estimation (Adam) optimization algorithm, which is itself an extension of the widely used gradient descent optimization formula. The formula was defined by Jimmy Lei Ba and Diederik Kingma.

1.2.11. NAdam. NAdam extends adaptive moment estimation (Adam) with Nesterov's accelerated gradient (NAG), also known as Nesterov momentum, a refined type of momentum.

1.2.12. FTRL. To estimate click-through rates, Google created "Follow the Regularized Leader" (FTRL) in the early 2010s. According to McMahan, it works well for shallow models over large, sparse feature spaces.
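All eight optimizers discussed above are available off the shelf in common frameworks. Assuming a TensorFlow/Keras implementation (the paper states only that Python was used), they can be instantiated as follows; the learning rate of 0.01 matches the setting reported in Section 4.

```python
from tensorflow.keras import optimizers

lr = 0.01
eight_optimizers = {
    "SGD":      optimizers.SGD(learning_rate=lr),
    "RMSprop":  optimizers.RMSprop(learning_rate=lr),
    "Adam":     optimizers.Adam(learning_rate=lr),
    "Adadelta": optimizers.Adadelta(learning_rate=lr),
    "Adagrad":  optimizers.Adagrad(learning_rate=lr),
    "Adamax":   optimizers.Adamax(learning_rate=lr),
    "Nadam":    optimizers.Nadam(learning_rate=lr),
    "Ftrl":     optimizers.Ftrl(learning_rate=lr),
}
```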
2. Related Works

This section presents a review of recent literature on various optimizers and their performance with CNN and DNN architectures.
The authors of [6] proposed a framework using a DNN-based optimization strategy for predicting the true optimum; the techniques are intended to discover applications in the early stages of aerospace design. In [7], the authors described random multimodel deep learning (RMDL), an ensemble approach to the problem of finding the best deep learning structure; in principle, RMDL trains multiple randomly generated models using deep neural networks (DNN), convolutional neural networks (CNN), and recurrent neural networks (RNN) to achieve better results. A dual algorithm is defined as an optimizer for hybrid anomaly detection in intrusion detection evaluation; the research conducts the trial with the anomaly classification of an IDS using a DNN [8]. AdaSwarm is a gradient- and derivative-based optimization that applies to functions over a differentiable domain; AdaSwarm includes an exponentially weighted momentum particle swarm optimizer (EMPSO) for effective analysis [9]. The authors of [10] introduced ATMO (AdapTive Meta Optimizers), which integrates two different optimizers by weighting their contributions and produces the result of a single optimizer.

In [11], the authors presented an integrated EO-ELM model in a deep neural network setting for rainfall-runoff (R-R) modelling; the efficacy of the model is estimated using uncertainty analysis and two-tailed t-tests. The authors described the Identifier-Actor-Optimizer (IAO) policy learning architecture for applying real-time optimal control to continuous-time, nonlinear systems [12]. The authors presented a learning framework using evolutionary-based optimizers on a DNN architecture with generated samples; in this approach, the authors used a simulation of evolutionary-based combinatorial optimizers [13].
In [14], the authors proposed a system based on optimization analysis using the pseudo two-dimensional (P2D) lithium-ion battery model; the DeepChess model is described for the confluence of optimization, and a genetic algorithm is included to maximize the effectiveness of the optimization pair. The design of DNNOpt, a reinforcement learning inspired, deep neural network-based black-box optimization framework, is applied to analog circuit sizing [15]. A population-based evolutionary stochastic gradient descent (ESGD) framework has been proposed for optimizing deep neural networks; ESGD combines SGD and gradient-free evolutionary algorithms as complementary algorithms in one framework, in which the optimization alternates between the SGD step and the evolution step to improve the average fitness of the population [16]. The authors described the layerwise learning-based stochastic gradient descent method (LLb-SGD) for gradient-based optimization of objective functions in deep learning, which is simple and computationally efficient [17].
As the processing unit nearest to the sensors, the authors proposed the DeepMaker framework, which aims to automatically create several highly reliable DNN architectures while eliminating bias [18]. The authors explored hyperparameter search approaches for the CIFAR-10 dataset using various optimization techniques [19]; a local search method combined with a hybrid of genetic algorithms optimizes both the network architecture and the network training. Several reviews and analyses have been performed in studies that standardize the use of DNN architectures for classification and detection using ML and DL algorithms [20–22]. The authors of [23] describe the detection of malaria disease using the CNN technique with the SGD, RMSprop, and Adam optimizers. The authors of [16] present an analysis of various optimizers on a deep convolutional neural network model in the application of hyperspectral remote sensing image classification. The authors of [22] propose a performance analysis of different optimizers for deep learning-based image recognition. The review has been assessed using various kinds of techniques for CNN and DNN architectures. The existing research works demonstrated performance according to their selection of optimizers and architectures; the proposed work focuses on CNN and DNN architectures with various kinds of optimizers on a trial-and-error basis [10, 24–27].

3. Methodology

This section presents the methods included in the proposed work. The proposed work uses CNN and DNN architectures with eight optimizers to accelerate architecture performance. The study reveals different results for different optimizers, and each optimizer is demonstrated on its dataset and architecture. During the trials, the optimizers were tested with different learning rates to tune for better results.
Optimizers guide how the neural network's weights and learning rate are modified to minimize losses. The weights are adjusted at each epoch during deep learning model training to reduce the loss function. An optimizer is a procedure or method that alters neural network properties such as weights and learning rates; as a result, it helps decrease the total loss and raise precision. A deep learning model typically has millions of parameters, making the task of selecting the proper weights for the model challenging. This highlights the importance of selecting an optimization algorithm that is appropriate for the application; therefore, before delving deeply into the subject, it is vital to understand these algorithms.
Different optimizers are used in the proposed work to adjust the weights and learning rate. The optimal optimizer to use, though, depends on the application. The practical limitation is having to try every possibility and pick the one that yields the best results; this might not seem like a big deal at first, but when working with hundreds of gigabytes of data, even one epoch can take a while. The proposed CNN and DNN architectures with the various optimizers are shown in Figures 2 and 3, respectively.
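A minimal sketch of the comparison procedure described above is given below, assuming TensorFlow/Keras and a hypothetical `build_model()` helper that returns a freshly initialized CNN or DNN. For each optimizer, the model is recompiled, trained, and its training and testing accuracy recorded; this is an illustrative outline rather than the authors' exact script.

```python
from tensorflow.keras import optimizers

OPTIMIZERS = {
    "SGD": optimizers.SGD, "RMSprop": optimizers.RMSprop, "Adam": optimizers.Adam,
    "Adadelta": optimizers.Adadelta, "Adagrad": optimizers.Adagrad,
    "Adamax": optimizers.Adamax, "Nadam": optimizers.Nadam, "Ftrl": optimizers.Ftrl,
}

def compare_optimizers(build_model, x_train, y_train, x_test, y_test, lr=0.01):
    """Train a fresh model with each optimizer and record accuracy and loss."""
    results = {}
    for name, opt_cls in OPTIMIZERS.items():
        model = build_model()                          # fresh weights for a fair comparison
        model.compile(optimizer=opt_cls(learning_rate=lr),
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        history = model.fit(x_train, y_train, epochs=5, batch_size=32, verbose=0)
        test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)
        results[name] = {"train_acc": history.history["accuracy"][-1],
                         "test_acc": test_acc, "loss": test_loss}
    return results
```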
3.1. Convolutional Neural Network. In a CNN, the word "convolution" refers to the mathematical operation of convolution, a special kind of linear operation in which two functions are combined to produce a third function that expresses how the shape of one function is modified by the other. Two images, represented as matrices, are convolved to produce an output that is used to extract the image's key features. The basic architecture of a CNN is shown in Figure 4. A CNN architecture consists primarily of two parts:

(i) A convolutional tool, known as feature extraction, that separates and identifies the distinctive features of the image for analysis
(ii) A fully connected part that predicts the class of the image based on the features extracted in the earlier stages, using the output of the convolution process

Convolutional layers, pooling layers, and fully connected layers are the three types of layers that make up a CNN, and a CNN architecture takes shape once these layers are stacked. In addition to these three layer types, two other important components, the dropout layer and the activation function, are described below.

Figure 2: CNN architecture with various optimizers (input datasets: MNIST numeric, Fashion MNIST, and Medical MNIST; eight optimizers: SGD, RMSprop, Adam, Adadelta, Adagrad, Adamax, Nadam, and FTRL).

Figure 3: DNN architecture with various optimizers (same input datasets and eight optimizers).

Figure 4: Architecture of CNN (feature extraction by convolution and pooling, followed by fully connected layers for classification).

3.1.1. Convolutional Layer. The convolutional layer is the primary layer for extracting the various features from the input images. This layer performs the convolution operation between the input image and a filter of a chosen size M × M. The dot product is taken between the filter and the patches of the input image matching the filter's dimensions by sliding the filter over the input image (M × M).

3.1.2. Pooling Layer. A convolutional layer is typically followed by a pooling layer. The crucial step in this layer is to reduce the size of the convolved feature map to save on computational cost. This is accomplished by reducing the connections between layers and operating independently on each feature map. There are several types of pooling operations depending on the method used. In max pooling, the largest element is taken from the feature map. Average pooling computes the average of the elements in a predefined image region, and sum pooling computes the total sum of the elements in the predefined region. Most of the time, the pooling layer acts as a bridge between the convolutional layer and the FC layer.

3.1.3. Fully Connected Layer. The fully connected (FC) layer connects the neurons between two different layers by combining weights and biases with the neurons. These layers usually sit before the output layer and form the last few layers of a CNN architecture. The input from the preceding layers is flattened and passed to the FC layer, and the flattened vector then passes through a few more FC layers, where most of the mathematical operations take place. The classification process begins at this stage.

3.1.4. Dropout. In general, the training dataset can be overfitted when every feature is connected to the FC layer. Overfitting occurs when a model performs well on the training data but degrades when applied to new data. To address this problem, a dropout layer is used, which reduces the size of the model by removing a number of neurons from the network during training. On passing a dropout of 0.3, 30% of the nodes are randomly dropped from the network.

3.1.5. Activation Functions. The activation function is one of the most important parameters of the CNN model. Activation functions are used to learn and approximate any kind of continuous and complex relationship between the network's variables; in other words, they decide which information should be fired in the forward direction, and they give the network its nonlinearity. There are a few commonly used activation functions, such as the ReLU, softmax, tanh, and sigmoid functions, each with a specific use. The sigmoid and softmax functions are preferred for a CNN model with two classes, while softmax is generally used for multiclass classification.
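Putting the layers of Sections 3.1.1–3.1.5 together, a compact Keras CNN for 28 × 28 grayscale inputs such as MNIST might look as follows. This is an illustrative sketch, not the exact architecture used in the paper, whose layer counts and filter sizes are not specified.

```python
from tensorflow.keras import layers, models

def build_cnn(num_classes=10):
    """Small CNN: convolution + pooling for feature extraction, FC layers for classification."""
    model = models.Sequential([
        layers.Input(shape=(28, 28, 1)),
        layers.Conv2D(32, (3, 3), activation="relu"),    # convolutional layer (Section 3.1.1)
        layers.MaxPooling2D((2, 2)),                     # pooling layer (Section 3.1.2)
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),            # fully connected layer (Section 3.1.3)
        layers.Dropout(0.3),                             # dropout of 0.3 (Section 3.1.4)
        layers.Dense(num_classes, activation="softmax")  # softmax activation (Section 3.1.5)
    ])
    return model
```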
3.2. Deep Neural Network. To integrate AI into the daily activities of self-driving cars, smartphones, games, drones, etc., deep neural networks (DNNs) have emerged as a promising solution. Most often, DNNs have been accelerated with several computing devices, such as GPUs, but current technological trends call for energy-efficient DNN acceleration as the most advanced operations move down to mobile computing devices. Neural processing unit (NPU) architectures focused on accelerating DNNs with minimal energy consumption have therefore become necessary. Numerous experiments have shown that using lower bit precision is sufficient for inference with minimal power consumption, even though the training phase of a DNN demands precise number representations.
DNNs, with their numerous layers, outperform the more traditional ANN in terms of performance. Due to their exceptional ability to learn both the underlying structure of the input data vectors and the nonlinear input-output mapping, DNN models are currently becoming rather popular. The majority of DNNs are feedforward networks (FFNNs), in which data flow from the input layer to the output layer without going backward; the links between the layers are only ever in the forward direction and never form a loop. Through backpropagation, supervised learning is used to complete tasks on datasets with labelled information. The architectures of a simple NN and a DNN are shown in Figure 5.

Figure 5: Architecture of simple NN and DNN.
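For comparison, a simple fully connected DNN for the same 28 × 28 inputs can be sketched as below; the depth and layer widths are illustrative assumptions, since the paper does not list the exact layer configuration.

```python
from tensorflow.keras import layers, models

def build_dnn(num_classes=10):
    """Feedforward DNN: flattened input passed through stacked dense (hidden) layers."""
    model = models.Sequential([
        layers.Input(shape=(28, 28)),
        layers.Flatten(),                                # input layer
        layers.Dense(256, activation="relu"),            # hidden layers
        layers.Dense(128, activation="relu"),
        layers.Dense(num_classes, activation="softmax")  # output layer
    ])
    return model
```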
3.3. Dataset Used

(i) MNIST is a collection of handwritten digits. The collection consists of training examples and test images. The images are grayscale and 28 × 28 pixels in size. The handwritten digits in the images range from 0 to 9, giving a total of 10 classes.
(ii) Fashion MNIST is a dataset of fashion-related images. The dataset consists of training examples and test images, which are grayscale and 28 × 28 pixels in size. The items comprise training and test examples of T-shirts, trousers, pullovers, dresses, coats, sandals, shirts, sneakers, bags, and ankle boots.
(iii) The Medical MNIST dataset was used to evaluate performance on a contrasting dataset. The tasks covered include binary/multiclass classification, multilabel classification, and ordinal regression. The dataset sizes range from 100 to 100,000 images. It is as varied as possible, since the VDD and MSD fairly evaluate the performance of generalizable machine learning algorithms across a range of contexts, and real-world two- and three-dimensional medical specialist images are offered. Like an MNIST-style dataset collection, it primarily focuses on machine learning rather than the end-to-end system for performing classification tasks on small images. The modest 28 × 28 (2D) or 28 × 28 × 28 (3D) size is ideal for testing machine learning techniques. Medical image analysis, as a cross-domain research space, is challenging for researchers from various communities since it requires background knowledge.
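The MNIST and Fashion MNIST datasets ship with Keras and can be loaded directly, as sketched below. The Medical MNIST collection is not bundled with Keras and has to be obtained separately (for example, from the MedMNIST distribution), so that step is only indicated by a comment.

```python
from tensorflow.keras.datasets import mnist, fashion_mnist

# MNIST: 28 x 28 grayscale handwritten digits, 10 classes.
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Fashion MNIST: 28 x 28 grayscale clothing images, 10 classes.
(fx_train, fy_train), (fx_test, fy_test) = fashion_mnist.load_data()

# Scale pixel values to [0, 1] before training.
x_train, x_test = x_train / 255.0, x_test / 255.0
fx_train, fx_test = fx_train / 255.0, fx_test / 255.0

# Medical MNIST images must be downloaded separately and loaded into arrays
# of the same shape before being fed to the models.
```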

Table 1: The performance of CNN architecture using the Fashion MNIST dataset (epochs = 5, batch size = 32).

Optimizer   Training accuracy (%)   Testing accuracy (%)   Loss
SGD         93.42                   91.15                  0.042
RMSprop     93.37                   90.94                  0.040
Adam        94.68                   91.25                  0.076
Adadelta    93.37                   91.73                  0.048
Adagrad     91.22                   90.52                  0.046
Adamax      93.28                   91.63                  0.050
NAdam       94.72                   91.07                  0.058
Ftrl        89.88                   89.75                  0.045

Table 2: The performance of DNN architecture using the Fashion MNIST dataset (epochs = 5, batch size = 32).

Optimizer   Training accuracy (%)   Testing accuracy (%)   Loss
SGD         88.43                   76.34                  0.32
RMSprop     87.34                   77.32                  0.41
Adam        84.69                   78.63                  0.48
Adadelta    83.54                   74.64                  0.30
Adagrad     85.28                   75.85                  0.34
Adamax      86.37                   74.73                  0.38
NAdam       86.35                   76.84                  0.37
Ftrl        89.55                   79.68                  0.52

4. Results and Discussion

This research work has been conducted using three datasets, Fashion MNIST, MNIST, and Medical MNIST, testing eight optimizers. The Python language was used to develop the models with the eight optimizers. The proposed work produced 16 results per dataset (eight optimizers on each of the CNN and DNN architectures). The overall performance demonstrates the efficiency of the optimizers: without a suitable optimizer, the results degrade and the loss may increase. This approach can be a promising method for targeting better accuracy on different kinds of datasets and architectures, and the idea behind the proposed work is to raise the typical results higher. Comparative analysis of the various optimizers shows a variety of improvements that may change depending on the architecture and dataset. The performance comparison uses the eight optimizers with the CNN and DNN architectures, the comparative analysis is performed using training and testing accuracy, and the loss value describes the qualitative result of the proposed work.
Table 1 exhibits the overall performance of the CNN architecture with the eight optimizers and shows the comparative analysis between the different optimizers on the Fashion MNIST dataset. The performance report reveals good results for all the optimizers. From Table 1, Adadelta achieves the highest accuracy of 91.73% among the optimizers, the next best accuracy of 91.63% is given by the Adamax optimizer, and the Ftrl optimizer obtains the lowest accuracy. The results in Table 2 show that, on the DNN, the proposed work achieves the highest accuracy with the Ftrl optimizer; the other optimizers also reach high accuracy with slight differences, and the testing accuracy is slightly lower than the training accuracy.
Table 3 shows the results of the CNN architecture with the eight optimizers on the MNIST dataset. The experimental results demonstrate high accuracy for all the optimizers. In particular, the SGD optimizer obtains better accuracy on the MNIST dataset than the other optimizers, so SGD is well suited for the MNIST dataset. The next priority is given to the Adamax, RMSprop, and Adadelta optimizers because these optimizers reach similar results on this dataset.
Table 4 shows the efficiency of the method, which improves on the level of accuracy of the CNN architecture; the overall report shows that the DNN architecture gives a better result than the CNN architecture on this dataset. Table 5 shows the results of the CNN architecture with the eight optimizers on the Medical MNIST dataset; the experimental results demonstrate that the overall accuracy is improved with all optimizers.

Table 3: The performance of CNN architecture using the MNIST dataset (epochs = 5, batch size = 32).

Optimizer   Training accuracy (%)   Testing accuracy (%)   Loss
SGD         99.62                   98.68                  0.062
RMSprop     99.45                   98.55                  0.060
Adam        99.67                   98.43                  0.085
Adadelta    99.30                   98.53                  0.059
Adagrad     99.65                   98.26                  0.061
Adamax      99.50                   98.58                  0.060
NAdam       99.75                   98.40                  0.078
Ftrl        98.96                   98.11                  0.052

Table 4: The performance of DNN architecture using the MNIST dataset (epochs = 5, batch size = 32).

Optimizer   Training accuracy (%)   Testing accuracy (%)   Loss
SGD         99.86                   99.27                  0.046
RMSprop     99.75                   99.34                  0.047
Adam        99.67                   99.31                  0.046
Adadelta    99.73                   99.23                  0.043
Adagrad     99.67                   99.17                  0.041
Adamax      99.52                   99.19                  0.044
NAdam       99.25                   99.02                  0.038
Ftrl        99.79                   99.57                  0.056

Table 5: The performance of CNN architecture using the Medical MNIST dataset (epochs = 5, batch size = 32).

Optimizer   Training accuracy (%)   Testing accuracy (%)   Loss
SGD         99.65                   99.53                  0.034
RMSprop     99.58                   99.46                  0.032
Adam        99.78                   99.47                  0.031
Adadelta    99.43                   99.39                  0.029
Adagrad     99.63                   99.56                  0.040
Adamax      99.64                   99.57                  0.042
NAdam       99.49                   99.42                  0.039
Ftrl        99.02                   98.89                  0.029

Table 6: The performance of DNN architecture using the Medical MNIST dataset (epochs = 5, batch size = 32).

Optimizer   Training accuracy (%)   Testing accuracy (%)   Loss
SGD         99.86                   99.43                  0.031
RMSprop     99.55                   99.36                  0.028
Adam        99.75                   99.48                  0.027
Adadelta    99.45                   99.26                  0.026
Adagrad     99.68                   99.42                  0.035
Adamax      99.53                   99.16                  0.033
NAdam       99.47                   99.28                  0.028
Ftrl        99.12                   98.76                  0.039

In particular, the SGD optimizer obtained better accuracy on the Medical MNIST dataset than the other optimizers. The performance evaluation shows that the SGD optimizer is well suited for the Medical MNIST dataset, and the next priority is given to the Adamax, RMSprop, and Adadelta optimizers because these optimizers reach similar results on this dataset. The analysis of the results depicts the performance of the proposed work, exhibiting good results for both training and testing accuracy, and the results show the efficacy of the outcomes compared with existing architecture performance. The visualization report demonstrates the overall performance of the architectures with the various optimizers, and the level can be improved accordingly.
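Figures 6–8 below compare training and testing accuracy per optimizer as grouped bar charts. A minimal matplotlib sketch that reproduces this style of plot from a results dictionary (such as the one returned by the comparison loop sketched in Section 3) is given below; the data passed in are placeholders, not the published figures themselves.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_accuracy_bars(results, title):
    """Grouped bars of training vs. testing accuracy for each optimizer."""
    names = list(results.keys())
    train = [results[n]["train_acc"] * 100 for n in names]
    test = [results[n]["test_acc"] * 100 for n in names]
    x = np.arange(len(names))
    plt.bar(x - 0.2, train, width=0.4, label="Training Accuracy")
    plt.bar(x + 0.2, test, width=0.4, label="Testing Accuracy")
    plt.xticks(x, names, rotation=45)
    plt.xlabel("Optimizers")
    plt.ylabel("Accuracy (%)")
    plt.title(title)
    plt.legend()
    plt.tight_layout()
    plt.show()
```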

Figure 6: Visualization of training and testing accuracy of the CNN and DNN architectures with each optimizer on the Fashion MNIST dataset.

Figure 7: Visualization of training and testing accuracy of the CNN and DNN architectures with each optimizer on the MNIST numeric dataset.

Figure 8: Visualization of training and testing accuracy of the CNN and DNN architectures with each optimizer on the Medical MNIST dataset.

Table 6 shows the efficiency of the method, which improves on the level of accuracy of the CNN architecture; the overall report shows that the DNN architecture gives a better result than the CNN architecture. The observation of Table 6 reveals higher accuracy with the Adam optimizer on the Medical MNIST dataset than with the other optimizers, so Adam is well suited for the Medical MNIST dataset.

The performance report has been compared across the optimizers and tested over various trials to find the best-suited optimizers for the DNN architectures.
The overall result analysis, presenting the comparative performance of the CNN and DNN architectures, is given in Figures 6 to 8.
Through these observations, the various optimizers were tested on different datasets, and it is noted that each optimizer has unique attributes. The results depend on parameters such as the number of epochs, batch size, and learning rate. Finally, the number of epochs was fixed at 5, the batch size at 32, and the learning rate at 0.01 to obtain the highest accuracy values.
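With those settings fixed, a single training run reduces to the following Keras sketch (shown here with the Adam optimizer; any of the eight optimizers can be substituted). The code assumes the `build_cnn` helper and the preprocessed MNIST arrays sketched in Section 3 and is illustrative rather than the authors' exact script.

```python
from tensorflow.keras.optimizers import Adam

model = build_cnn(num_classes=10)
model.compile(optimizer=Adam(learning_rate=0.01),      # learning rate fixed at 0.01
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# epochs = 5 and batch size = 32, as reported above; [..., None] adds the channel axis.
model.fit(x_train[..., None], y_train, epochs=5, batch_size=32)
test_loss, test_acc = model.evaluate(x_test[..., None], y_test)
```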
5. Conclusion

The proposed method presents an analysis of various optimizers used for fine-tuning the performance of CNN and DNN architectures. The performance of the proposed method has been evaluated using measures that display the accuracy and loss value, and the approach has been shown to achieve comparable results on various datasets. Each phase of implementation across the datasets and the CNN and DNN architectures reveals different results accordingly. The overall performance of the proposed work has been evaluated over parameters such as batch size, number of epochs, and learning rate. In a nutshell, this research work compares the various optimizers across the different architectures and datasets, and the comprehensive report has been constructed from multiple components to improve its accuracy. The proposed work can be extended to other datasets and architectures to test a comparable accuracy range.

Data Availability

The data are available upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest to report regarding the study.

Acknowledgments

The proposed research trials were conducted in "The Advanced Image Processing DST-FIST Laboratory," Department of Computer Science and Applications, the Gandhigram Rural Institute (Deemed to be University), Dindigul, Tamil Nadu, India.

References

[1] I. H. Sarker, "Deep learning: a comprehensive overview on techniques, taxonomy, applications and research directions," SN Computer Science, vol. 2, no. 6, pp. 420–20, 2021.
[2] T. J. Sejnowski, "The unreasonable effectiveness of deep learning in artificial intelligence," Proceedings of the National Academy of Sciences, vol. 117, no. 48, pp. 30033–30038, 2020.
[3] L. Alzubaidi, J. Zhang, A. J. Humaidi et al., "Review of deep learning: concepts, CNN architectures, challenges, applications, future directions," Journal of Big Data, vol. 8, no. 1, pp. 53–74, 2021.
[4] S. Indolia, A. K. Goswami, S. P. Mishra, and P. Asopa, "Conceptual understanding of convolutional neural network - a deep learning approach," Procedia Computer Science, vol. 132, pp. 679–688, 2018.
[5] S. R. Dubey, S. Chakraborty, S. K. Roy, S. Mukherjee, S. K. Singh, and B. B. Chaudhuri, "DiffGrad: an optimization method for convolutional neural networks," IEEE Transactions on Neural Networks and Learning Systems, vol. 31, no. 11, pp. 4500–4511, 2020.
[6] R. Mohapatra, S. Saha, C. A. C. Coello, A. Bhattacharya, S. S. Dhavala, and S. Saha, "AdaSwarm: augmenting gradient-based optimizers in deep learning with swarm intelligence," IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 6, no. 2, pp. 329–340, 2022.
[7] N. Landro, I. Gallo, and R. L. Grassa, "Combining optimization methods using an adaptive Meta optimizer," Algorithms, vol. 14, no. 6, p. 186, 2021.
[8] B. Roy, M. P. Singh, M. R. Kaloop et al., "Data-driven approach for rainfall-runoff modelling using equilibrium optimizer coupled extreme learning machine and deep neural network," Applied Sciences, vol. 11, no. 13, p. 6238, 2021.
[9] L. Cheng, Z. Wang, F. Jiang, and J. Li, "An identifier-actor-optimizer policy learning architecture for optimal control of continuous-time nonlinear systems," Science China Physics, Mechanics & Astronomy, vol. 63, no. 6, pp. 264511-264512, 2020.
[10] M. Yaqub, J. Feng, M. S. Zia et al., "State-of-the-art CNN optimizer for brain tumor segmentation in magnetic resonance images," Brain Sciences, vol. 10, no. 7, p. 427, 2020.
[11] N. Dawson-Elli, S. Kolluri, K. Mitra, and V. R. Subramanian, "On the creation of a chess-AI-inspired problem-specific optimizer for the pseudo two-dimensional battery model using neural networks," Journal of the Electrochemical Society, vol. 166, no. 6, pp. A886–A896, 2019.
[12] X. Cui, W. Zhang, Z. Tuske, and M. Picheny, "Evolutionary stochastic gradient descent for optimization of deep neural networks," Advances in Neural Information Processing Systems, vol. 31, 2018.
[13] Q. Zheng, X. Tian, N. Jiang, and M. Yang, "Layer-wise learning based stochastic gradient descent method for the optimization of deep convolutional neural network," Journal of Intelligent and Fuzzy Systems, vol. 37, no. 4, pp. 5641–5654, 2019.
[14] M. Loni, S. Sinaei, A. Zoljodi, M. Daneshtalab, M. Sjodin, and D. Maker, "DeepMaker: a multi-objective optimization framework for deep neural networks in embedded systems," Microprocessors and Microsystems, vol. 73, Article ID 102989, 2020.
[15] N. M. Aszemi and P. D. D. Dominic, "Hyperparameter optimization in convolutional neural network using genetic algorithms," International Journal of Advanced Computer Science and Applications, vol. 10, no. 6, pp. 269–278, 2019.
[16] S. Bera and V. K. Shrivastava, "Analysis of various optimizers on deep convolutional neural network model in the application of hyperspectral remote sensing image classification," International Journal of Remote Sensing, vol. 41, no. 7, pp. 2664–2683, 2020.
[17] Q. Zheng, D. Fu, Y. Wang, H. Chen, and H. Zhang, "A study on global optimization and deep neural network modeling method in performance-seeking control," Proceedings of the Institution of Mechanical Engineers - Part I: Journal of Systems & Control Engineering, vol. 234, no. 1, pp. 46–59, 2020.

[18] I. Kandel, M. Castelli, and A. Popovic, "Comparative study of first order optimizers for image classification using convolutional neural networks on histopathology images," Journal of Imaging, vol. 6, no. 9, p. 92, 2020.
[19] E. Zitzler, L. Thiele, M. Laumanns, C. M. Fonseca, and V. G. Da Fonseca, "Performance assessment of multiobjective optimizers: an analysis and review," IEEE Transactions on Evolutionary Computation, vol. 7, no. 2, pp. 117–132, 2003.
[20] V. Pavan, "Types of Optimizers in Deep Learning Every AI Engineer Should Know," 2020, https://www.upgrad.com/blog/types-of-optimizers-in-deep-learning.
[21] Q. Berthet, M. Blondel, O. Teboul, M. Cuturi, J. P. Vert, and F. Bach, "Learning with differentiable perturbed optimizers," Advances in Neural Information Processing Systems, vol. 33, pp. 9508–9519, 2020.
[22] S. Postalcıoğlu, "Performance analysis of different optimizers for deep learning-based image recognition," International Journal of Pattern Recognition and Artificial Intelligence, vol. 34, no. 02, Article ID 2051003, 2020.
[23] A. Kumar, S. Sarkar, and C. Pradhan, "Malaria disease detection using CNN technique with SGD, RMSprop and Adam optimizers," in Proceedings of the Deep Learning Techniques for Biomedical and Health Informatics, pp. 211–230, Springer, Cham, 2020.
[24] A. M. Taqi, A. Ahmed, F. Al-Azzo, and M. Milanova, "The impact of multi-optimizers and data augmentation on TensorFlow convolutional neural network performance," in Proceedings of the IEEE Conference on Multimedia Information Processing and Retrieval (MIPR), pp. 140–145, IEEE, Miami, FL, USA, April 2018.
[25] M. Fradi, L. Khriji, and M. Machhout, "Real-time arrhythmia heart disease detection system using CNN architecture based various optimizers-networks," Multimedia Tools and Applications, pp. 1–22, in press, 2021.
[26] M. Agarwal, A. Rajak, and A. Kumar Shrivastava, "Assessment of optimizers impact on image recognition with convolutional neural network to adversarial datasets," in Journal of Physics: Conference Series, vol. 1998, no. 1, IOP Publishing, Article ID 012008, 2021.
[27] A. Shaf, T. Ali, W. Farooq, S. Javaid, U. Draz, and S. Yasin, "Two classes classification using different optimizers in convolutional neural network," in Proceedings of the IEEE 21st International Multi-Topic Conference (INMIC), pp. 1–6, IEEE, Karachi, Pakistan, November 2018.
