Research Article
Bio-Optimization of Deep Learning Network Architectures
Copyright © 2022 Shanmugavadivu P et al. This is an open access article distributed under the Creative Commons Attribution
License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is
properly cited.
Deep learning is reaching new heights as a result of its cutting-edge performance in a variety of fields, including computer vision, natural language processing, time series analysis, and healthcare. Deep learning is typically implemented using batch and stochastic gradient descent together with a handful of standard optimizers, yet this can still yield subpar model performance, and considerable effort is now being devoted to improving deep learning performance through gradient optimization methods. The proposed work analyses convolutional neural networks (CNN) and deep neural networks (DNN) with several state-of-the-art optimizers to enhance the performance of these architectures. Specific optimizers (SGD, RMSprop, Adam, Adadelta, etc.) are applied to different types of datasets so that the results can be compared. The study concludes with a thorough report on the optimizers' performance across a variety of architectures and datasets, which should help researchers select suitable optimizers when developing their own frameworks and architectures. The proposed work evaluates eight novel optimizers on CNN and DNN architectures, and the experimental results show clear improvements in the efficiency of these architectures across the various datasets.
Figure 1: The architecture of the neural network (inputs Xm weighted by Wkm feeding a summation stage).

(iii) Transfer function: the job of the transfer function is to combine multiple inputs into one output value so that the activation function can be applied; the available input information is passed to the transfer function.
(iv) Activation function: it introduces nonlinearity into the operation of the perceptron so that the network can capture nonlinear relationships with the inputs. Without it, the output would simply be a linear combination of the input values, and no nonlinearity could be introduced into the network.

Deep learning systems are large, complex, and frequently involve numerous layers and nonlinearity, which makes them difficult to optimize. Optimizers have to drive a complex system that is difficult to understand. Some deep learning systems expose only a small number of parameters that may be modified, which reduces their usefulness. Even so, deep learning models can be improved and built more easily in several principled ways.

1.1.2. The Learning Rate. Adjusting the weights by adding or subtracting too much can hamper the loss function, because the update may jump past the optimal value for a given weight. The mechanism that controls this is the learning rate. In practice, the gradients are multiplied by a small number such as 0.001, scaling them down before the update is applied.
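To illustrate the learning-rate scaling described above, here is a minimal, hypothetical NumPy sketch (not the authors' code): the gradient is multiplied by a small factor such as 0.001 before each weight update.

```python
import numpy as np

def gradient_descent_step(w, grad, learning_rate=0.001):
    """One plain gradient-descent update: scale the gradient, then subtract it."""
    return w - learning_rate * grad

# Toy example: minimise f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w = np.array([0.0])
for _ in range(1000):
    grad = 2.0 * (w - 3.0)
    w = gradient_descent_step(w, grad, learning_rate=0.001)
print(w)  # creeps towards 3.0; a larger rate converges faster but can overshoot
```

With a rate of 0.001 the weight approaches the optimum slowly; raising the rate speeds convergence but risks jumping past the minimum, which is exactly the trade-off described above.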
1.1.3. Regularization Process. Researchers in machine learning are constantly wary of overfitting problems. Overfitting occurs when a model performs well on the data used to train it but poorly on fresh data that arise in the real world. This typically happens when one parameter dominates the formula and is weighted excessively. To prevent this, a term called regularization is introduced into the optimization process: the loss function receives an additional component that penalizes high weight values. Even when the predictions are accurate, a penalty is incurred for achieving them with high weight values. This keeps the weights on the lower side and improves the model's ability to generalize to new data.

1.2. Types of Optimizers. The fundamental optimizers used to reduce the loss function are described as follows.
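As a baseline for the optimizer variants that follow, the sketch below combines the learning-rate-scaled update of Section 1.1.2 with the L2-style penalty described in Section 1.1.3 (hypothetical NumPy code; the penalty weight `lam` is an illustrative choice, not a value from the paper):

```python
import numpy as np

def regularized_loss(w, X, y, lam=0.01):
    """Mean-squared error of a linear model plus an L2 penalty on the weights."""
    return np.mean((X @ w - y) ** 2) + lam * np.sum(w ** 2)

def regularized_step(w, X, y, learning_rate=0.001, lam=0.01):
    """Gradient step on the penalized loss; the 2*lam*w term keeps weights small."""
    grad = 2.0 * X.T @ (X @ w - y) / len(y) + 2.0 * lam * w
    return w - learning_rate * grad
```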
1.2.3. Minibatch Gradient Descent. Another form of gradient descent is the minibatch variant, in which the model parameters are updated using small batches of data. To move the model towards the minima gradually and to prevent frequent derailments, the model parameters are updated after every batch of 'n' samples. This leads to lower variance in the updates and decreased memory usage.
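A minimal sketch of such a minibatch update loop (hypothetical NumPy code that assumes a generic `gradient(w, X_batch, y_batch)` function; the batch size of 32 and 5 epochs mirror the settings reported later in the paper):

```python
import numpy as np

def minibatch_gradient_descent(w, X, y, gradient,
                               learning_rate=0.01, batch_size=32, epochs=5):
    """Update the parameters once per batch of `batch_size` samples."""
    n = len(y)
    for _ in range(epochs):
        idx = np.random.permutation(n)              # reshuffle the data each epoch
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            w = w - learning_rate * gradient(w, X[batch], y[batch])
    return w
```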
1.2.4. Momentum-Based Gradient Descent. The parameters are updated through backpropagation using the first-order derivative of the loss function. The history of previous updates is often ignored, even though the update is repeated for every batch or every epoch. The term "momentum" in this optimizer refers to the inclusion of this historical component in later updates, which speeds up the overall process.
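The historical component can be sketched as a velocity term that accumulates past gradients (hypothetical code; `gamma` is an illustrative momentum coefficient):

```python
def momentum_step(w, grad, velocity, learning_rate=0.01, gamma=0.9):
    """Blend the previous update (history) with the current gradient."""
    velocity = gamma * velocity + learning_rate * grad
    return w - velocity, velocity   # carry the velocity into the next update
```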
1.2.5. Nesterov Accelerated Gradient (NAG). Momentum-based gradient descent is now very widely used and moves quickly towards the lowest regions of the loss surface. However, it tends to oscillate, overshoot the minimum region, and add to the total number of iterations, and standard gradient descent does not solve this problem either; NAG was introduced to repair it. The strategy adopted is to apply the history (momentum) component first and only then compute the parameter update: the derivative is evaluated at this shifted point, which can push the update forward or pull it back. This is known as the "look-ahead strategy," and it is particularly sensible when the update is already close to the minimum.
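The look-ahead strategy in code: the gradient is evaluated at the position the momentum term is about to reach (hypothetical sketch assuming a generic `gradient(w)` function):

```python
def nesterov_step(w, gradient, velocity, learning_rate=0.01, gamma=0.9):
    """Evaluate the gradient at the look-ahead point w - gamma * velocity."""
    lookahead_grad = gradient(w - gamma * velocity)
    velocity = gamma * velocity + learning_rate * lookahead_grad
    return w - velocity, velocity
```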
1.2.6. RMSProp. RMSProp improves upon the Adagrad optimizer. This optimizer uses an exponentially decaying average of the squared gradients to adapt the learning rate. The learning-rate adaptation remains effective because the method can maintain different learning rates per parameter, applying a larger rate in settings with small, infrequent updates and a smaller rate under very heavy update conditions.
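A sketch of the RMSProp update, in which the running average of squared gradients rescales the step per parameter (hypothetical NumPy code with the usual illustrative defaults):

```python
import numpy as np

def rmsprop_step(w, grad, sq_avg, learning_rate=0.001, rho=0.9, eps=1e-8):
    """Track an exponential average of squared gradients and divide by its root."""
    sq_avg = rho * sq_avg + (1.0 - rho) * grad ** 2
    w = w - learning_rate * grad / (np.sqrt(sq_avg) + eps)
    return w, sq_avg
```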
1.2.7. Adam. The Adam (adaptive moment estimation) optimizer combines the RMSprop and momentum-based gradient descent methodologies. The momentum component of Adam recovers information from the update history, which balances the adaptive learning-rate gain inherited from RMSprop; this combination is what makes the Adam optimizer so effective. Two hyperparameters are introduced in this optimizer to fit the use case.
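The combination can be sketched as follows: a momentum-style first moment, an RMSProp-style second moment, and bias correction (hypothetical NumPy code; the beta values are the common defaults, not values reported in the paper):

```python
import numpy as np

def adam_step(w, grad, m, v, t, learning_rate=0.001,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: first/second moment estimates plus bias correction."""
    m = beta1 * m + (1.0 - beta1) * grad          # momentum-like first moment
    v = beta2 * v + (1.0 - beta2) * grad ** 2     # RMSProp-like second moment
    m_hat = m / (1.0 - beta1 ** t)                # bias correction (t starts at 1)
    v_hat = v / (1.0 - beta2 ** t)
    return w - learning_rate * m_hat / (np.sqrt(v_hat) + eps), m, v
```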
1.2.8. Adagrad (Adaptive Gradient Algorithm). Adagrad is an adaptive gradient optimizer that applies larger updates (higher learning rates) to parameters associated with infrequent features and lowers the learning rate for parameters associated with frequently occurring features, which is precisely why it suits sparse data. Although the model parameters are the main focus of the intended work, they also affect training because they are assigned constant values for the duration of training. The learning rate is a similarly crucial dynamic component, and varying it changes the training speed. An adaptive, per-parameter learning rate is especially useful for sparse inputs, where most of the values are zero, because it counteracts the vanishing gradients produced by these sparse features.

1.2.9. AdaDelta. AdaDelta follows the general idea of AdaGrad while addressing its loss of learning rate caused by the monotonically increasing sum of squared gradients. AdaDelta still accumulates past gradients, but it takes only a window of recent gradients into account rather than all of them. RMSProp is another method that, like AdaDelta, restores AdaGrad's declining learning rate.

1.2.10. Adamax. The Adamax formula extends the adaptive moment estimation (Adam) optimization algorithm. It is an extension of the widely used gradient descent optimization approach, and the formula was defined by Jimmy Lei Ba and Diederik Kingma.

1.2.11. NAdam. Adaptive moment estimation (Adam) is extended to include Nesterov's accelerated gradient (NAG), also known as Nesterov momentum, which is a refined type of momentum.

1.2.12. FTRL. To estimate click-through rates, Google created "Follow the Regularized Leader" (FTRL) in the early 2010s. According to McMahan, shallow models work better over large, sparse feature spaces.

2. Related Works

This section presents a review of recent literature on the various optimizers and their performance with CNN and DNN architectures.

The authors proposed a framework using a DNN-based optimization strategy for predicting the true optimum; the techniques are intended to discover applications in the early stages of aerospace design [6]. In [7], the authors described random multimodel deep learning (RMDL), an ensemble approach to the problem of finding the best deep learning structure. Essentially, RMDL trains multiple randomly generated models using deep neural networks (DNN), convolutional neural networks (CNN), and recurrent neural networks (RNN) to achieve better results. An algorithm is also defined as an optimizer for hybrid anomaly detection in intrusion detection evaluation; that research conducts experiments on the anomaly classification of an IDS using a DNN [8]. AdaSwarm is an optimization approach based on gradient (derivative) approximation that operates on functions over a differentiable domain; it includes an exponentially weighted momentum particle swarm optimizer (EMPSO) for effective analysis [9]. The authors proposed ATMO (AdapTive Meta Optimizers), which integrates two different optimizers by weighting their contributions to produce a combined result.
Figure 2: CNN architecture with various optimizers (input datasets: MNIST Numeric, MNIST Fashion, and MNIST Medical; eight novel optimizers: SGD, RMSprop, Adam, Adadelta, Adagrad, Adamax, Nadam, and FTRL; output: per-dataset results).
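The pipeline in Figure 2 can be sketched in code: one fresh model is compiled per optimizer and trained on each dataset. This is a hypothetical tf.keras sketch, not the authors' released implementation; `build_cnn()` stands in for whatever CNN the paper uses, and the learning rate of 0.01 mirrors the value reported later in the paper.

```python
import tensorflow as tf

OPTIMIZERS = {
    "SGD": tf.keras.optimizers.SGD,
    "RMSprop": tf.keras.optimizers.RMSprop,
    "Adam": tf.keras.optimizers.Adam,
    "Adadelta": tf.keras.optimizers.Adadelta,
    "Adagrad": tf.keras.optimizers.Adagrad,
    "Adamax": tf.keras.optimizers.Adamax,
    "Nadam": tf.keras.optimizers.Nadam,
    "Ftrl": tf.keras.optimizers.Ftrl,
}

def evaluate_optimizers(build_cnn, x_train, y_train, x_test, y_test):
    """Train one fresh model per optimizer and collect train/test accuracy and loss."""
    results = {}
    for name, opt_cls in OPTIMIZERS.items():
        model = build_cnn()                               # fresh weights for each run
        model.compile(optimizer=opt_cls(learning_rate=0.01),
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        history = model.fit(x_train, y_train, epochs=5, batch_size=32, verbose=0)
        test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)
        results[name] = (history.history["accuracy"][-1], test_acc, test_loss)
    return results
```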
Figure 3: DNN architecture with various optimizers (same input datasets and eight optimizers as in Figure 2).
Figure 4: CNN layer pipeline: input, convolution, pooling, fully connected layers, and output.
… convolutional layer. This layer involves performing the convolution operation.

3.1.2. Pooling Layer. A convolutional layer is typically followed by a pooling layer. The crucial step in this layer is to reduce the size of the convolved feature map to save computational cost. This is accomplished by reducing the connections between layers and operating independently on each feature map. There are several types of pooling operations depending on the mechanism used. In max pooling, the largest element is taken from the feature map. Average pooling calculates the average of the elements in a predefined image section, and sum pooling computes the total sum of the elements in the predefined section. Most of the time, the pooling layer acts as a bridge between the convolutional layer and the FC layer.
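A max-pooling step of the kind described above can be sketched with NumPy (an illustrative 2 × 2, stride-2 pooling, not code from the paper):

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Shrink a 2D feature map by keeping the maximum of each 2x2 block."""
    h, w = feature_map.shape
    h, w = h - h % 2, w - w % 2                       # drop any odd edge row/column
    blocks = feature_map[:h, :w].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

fmap = np.arange(16).reshape(4, 4)
print(max_pool_2x2(fmap))   # the 4x4 map shrinks to 2x2, keeping the largest values
```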
3.1.3. Fully Connected Layer. The fully connected (FC) layer connects the neurons between two different layers by combining the weights and biases with the neurons. These layers usually sit before the output layer and form the last few layers of a CNN architecture. Here, the feature maps from the preceding layers are flattened and fed to the FC layer. The flattened vector then passes through a few more FC layers, where most of the mathematical operations take place; the classification process begins at this stage.
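In tf.keras terms, the flattening and fully connected stages described above look roughly as follows (hypothetical layer sizes, not the exact architecture used in the paper):

```python
import tensorflow as tf
from tensorflow.keras import layers

# Flatten the pooled feature maps and pass them through fully connected layers.
fc_head = tf.keras.Sequential([
    layers.Flatten(),                        # feature maps -> one long vector
    layers.Dense(128, activation="relu"),    # fully connected layer
    layers.Dense(10, activation="softmax"),  # output layer for 10 classes
])
```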
3.1.4. Dropout. In general, when every feature is connected to the FC layer, the training dataset can be overfitted. Overfitting occurs when a certain model performs well on the training data but degrades the model's performance when applied to new data. To solve this problem, a dropout layer is employed, which reduces the size of the model by removing a large number of neurons from the neural network during training. On passing a dropout of 0.3, 30% of the nodes are removed at random from the neural network.
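The dropout of 0.3 mentioned above corresponds directly to a Dropout layer (illustrative tf.keras snippet; the effect is only active during training):

```python
import tensorflow as tf
from tensorflow.keras import layers

dropout = layers.Dropout(0.3)      # drops 30% of the units, but only while training
x = tf.ones((1, 10))
print(dropout(x, training=True))   # about 3 of the 10 values become 0 (rest rescaled)
print(dropout(x, training=False))  # at inference time the values pass through unchanged
```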
3.1.5. Activation Functions. The activation function is one of the most significant parameters of the CNN model. Activation functions are used to learn and approximate any kind of continuous and complex relationship between the variables of the network. In other words, they decide which information of the model should fire in the forward direction, and they give the network additional nonlinearity. A few commonly used activation functions are ReLU, softmax, tanh, and sigmoid, and each of them has a specific usage. For a binary-classification CNN model, the sigmoid and softmax functions are preferred, while softmax is generally used for multiclass classification.

Figure 5: Architecture of simple NN and DNN (panels: a simple neural network and a deep learning neural network, each with input, hidden, and output layers).

3.2. Deep Neural Network. To integrate AI into the daily activities of self-driving cars, smartphones, games, drones, etc., deep neural networks (DNNs) have emerged as a promising solution. Most often, DNNs have been accelerated by host machines with several computing devices, such as GPUs, but current technological advancements call for energy-efficient DNN acceleration as the most advanced operations move down to mobile computing devices. Neural processing unit (NPU) architectures focused on accelerating DNNs with minimal energy consumption have therefore become necessary. Numerous experiments have shown that using lower bit precision is sufficient for inference with minimal power consumption, even though the training phase of a DNN demands precise number representations.

DNNs, with their numerous layers, outperform the more traditional ANN in terms of performance. Due to their exceptional ability to learn both the underlying structure of the input data vectors and the nonlinear input-output mapping, DNN models are currently becoming rather popular. The majority of DNNs are feedforward networks (FFNNs), in which data flow from the input layer to the output layer without going backward; the links between the layers only ever point in the forward direction and never form a loop. Through backpropagation, supervised learning is used to complete the tasks using datasets with labelled information. The architecture of a simple NN and a DNN is shown in Figure 5.

3.3. Dataset Used
(i) MNIST is a collection of handwritten digits. The collection consists of test images and training examples. The images are grayscale and 28 × 28 pixels in size. The handwritten digits in the images range from 0 to 9, giving 10 classes.
(ii) Fashion MNIST is a dataset of fashion-related images. The dataset consists of training examples and test images, each 28 × 28 pixels and grayscale. The training and test items cover a T-shirt, trousers, a pullover, a dress, a coat, a sandal, a shirt, sneakers, a bag, and an ankle boot; these items make up the Fashion MNIST dataset.
(iii) The Medical MNIST dataset was used to evaluate performance on a contrasting kind of dataset. The tasks covered include binary/multiclass, multilabel, and ordinal regression, and the dataset sizes range from 100 to 100,000. It is as varied as possible, since the VDD and MSD fairly evaluate the performance of generalizable machine learning algorithms across a range of contexts, and lightweight two- and three-dimensional medical images are offered. Like an MNIST-style dataset collection, it primarily focuses on machine learning rather than on end-to-end systems and supports classification tasks on small images. The modest 28 × 28 (2D) or 28 × 28 × 28 (3D) size is ideal for testing machine learning techniques. Medical image analysis as an interdisciplinary research area is challenging for researchers from different communities because it requires background knowledge. Loading these datasets is sketched below.
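The first two datasets ship with tf.keras; a Medical MNIST-style collection can be obtained from a third-party package such as `medmnist` (an assumption, since the paper does not name its exact source). A minimal loading sketch:

```python
import tensorflow as tf

# MNIST handwritten digits: 28x28 grayscale images, 10 classes (0-9).
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Fashion MNIST: 28x28 grayscale images of 10 clothing categories.
(xf_train, yf_train), (xf_test, yf_test) = tf.keras.datasets.fashion_mnist.load_data()

# Scale pixel values to [0, 1] and add a channel axis for the CNN input.
x_train = x_train[..., None] / 255.0
x_test = x_test[..., None] / 255.0

# The Medical MNIST-style data would be loaded analogously, e.g. via the
# third-party `medmnist` package (assumed source, not stated in the paper).
```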
Table 1: The performance of CNN architecture using the Fashion MNIST dataset (epochs: 5; batch size: 32).

  Optimizer   Training accuracy   Testing accuracy   Loss
  SGD         93.42               91.15              0.042
  RMSprop     93.37               90.94              0.040
  Adam        94.68               91.25              0.076
  Adadelta    93.37               91.73              0.048
  Adagrad     91.22               90.52              0.046
  Adamax      93.28               91.63              0.050
  NAdam       94.72               91.07              0.058
  Ftrl        89.88               89.75              0.045
4. Results and Discussion

This research work has been conducted using the Fashion MNIST, MNIST, and Medical MNIST datasets, testing eight novel optimizers. The Python language is used to develop the models with these eight optimizers. The proposed work produces 16 results per dataset (two architectures times eight optimizers) using the CNN and DNN architectures. The overall performance demonstrates the efficiency of the optimizers; without a suitable optimizer, the results degrade and the loss may increase. This approach could therefore be a promising way to target better accuracy for different kinds of datasets and architectures. The idea behind the proposed work is to raise the typical results higher. The comparative analysis of the various optimizers shows that the improvement varies depending on the architecture and the dataset. The performance comparison uses the eight novel optimizers with the CNN and DNN architectures, the comparative analysis is performed using training and testing accuracy, and the loss value describes the qualitative result of the proposed work.

Table 1 exhibits the overall performance of the CNN architecture using the eight novel optimizers. This result shows the comparative analysis between the different optimizers on the Fashion MNIST dataset, and the performance report reveals good results for all the optimizers. From Table 1, Adadelta achieves the highest accuracy of 91.732% among the optimizers; the next best accuracy of 91.628% is given by the Adamax optimizer, while the Ftrl optimizer obtained the minimum accuracy. The results of Table 2 show that the efficiency of the proposed work achieves higher accuracy for the Ftrl optimizer, and the other optimizers also reach high accuracy with slight differences. Also, the testing accuracy is slightly reduced compared with the training accuracy.

Table 3 shows the result of the CNN architecture with the eight novel optimizers. The experimental result demonstrates high accuracy for all the optimizers. In particular, the SGD optimizer obtained better accuracy for the MNIST dataset than the other optimizers, so the SGD optimizer is well suited for the MNIST dataset. The next priority goes to the Adamax, RMSprop, and Adadelta optimizers because these optimizers reach similar results on this dataset.

Table 4 shows the efficiency of the method, which has improved the level of accuracy relative to the CNN architecture; the overall report shows that the DNN architecture gives a better result than the CNN architecture. Table 5 shows the result of the CNN architecture with the eight novel optimizers using the Medical MNIST dataset. The experimental result demonstrates that the overall accuracy has been improved using all optimizers.
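The reported settings (5 epochs, batch size 32, learning rate 0.01) correspond to a training call of the following form; this is a hypothetical tf.keras sketch in which `model` stands for any of the compiled CNN or DNN models and the data arrays stand for one of the three datasets:

```python
def train_and_report(model, x_train, y_train, x_test, y_test):
    """Train with the settings reported in the paper and return the table metrics."""
    history = model.fit(x_train, y_train,
                        epochs=5,                      # number of epochs fixed at 5
                        batch_size=32,                 # batch size fixed at 32
                        validation_data=(x_test, y_test),
                        verbose=0)
    train_acc = history.history["accuracy"][-1]
    test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)
    return train_acc, test_acc, test_loss             # columns reported in Tables 1-6
```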
Table 5: The performance of CNN architecture using the Medical MNIST dataset (epochs: 5; batch size: 32).

  Optimizer   Training accuracy   Testing accuracy   Loss
  SGD         99.65               99.53              0.034
  RMSprop     99.58               99.46              0.032
  Adam        99.78               99.47              0.031
  Adadelta    99.43               99.39              0.029
  Adagrad     99.63               99.56              0.040
  Adamax      99.64               99.57              0.042
  NAdam       99.49               99.42              0.039
  Ftrl        99.02               98.89              0.029
Table 6: The performance of DNN architecture using the Medical MNIST dataset (epochs: 5; batch size: 32).

  Optimizer   Training accuracy   Testing accuracy   Loss
  SGD         99.86               99.43              0.031
  RMSprop     99.55               99.36              0.028
  Adam        99.75               99.48              0.027
  Adadelta    99.45               99.26              0.026
  Adagrad     99.68               99.42              0.035
  Adamax      99.53               99.16              0.033
  NAdam       99.47               99.28              0.028
  Ftrl        99.12               98.76              0.039
In particular, the SGD optimizer obtained better accuracy for the Medical MNIST dataset than the other optimizers. The performance evaluation shows that the SGD optimizer is well suited for the Medical MNIST dataset. The next priority goes to the Adamax, RMSprop, and Adadelta optimizers because these optimizers reach similar results on this dataset. The analysis of the results depicts the performance of the proposed work, exhibiting good results for both training and testing accuracy. Moreover, the results convey the efficacy of the outcomes compared with the existing architecture performance. The visualization report demonstrates the overall performance of the architectures with the various optimizers, and the level can be improved accordingly.
Figure 6: Visualization of CNN and DNN using optimizers for Fashion MNIST dataset (per-optimizer training and testing accuracy).
Figure 7: Visualization of CNN and DNN using optimizers for MNIST Numerical dataset (per-optimizer training and testing accuracy).
Figure 8: Visualization of CNN and DNN using optimizers for MNIST Medical dataset (per-optimizer training and testing accuracy).
Table 6 shows the efficiency of the method, which has improved the level of accuracy over the CNN architecture; the overall report shows that the DNN architecture gives a better result than the CNN architecture. The observation of Table 6 reveals higher accuracy using the Adam optimizer for the Medical MNIST dataset than with the other optimizers, so it is well suited for the Medical MNIST dataset.
The performance report has been compared for each optimizer and tested over various trials to find the best-suited optimizers for the DNN architectures.

The overall result analysis presents the comparative performance analysis for the CNN and DNN architectures in Figures 6 to 8. Through these observations, the various optimizers are tested using different datasets, and it is noted that each optimizer has unique attributes. The results depend on parameters such as the number of epochs, the batch size, and the learning rate. Finally, the number of epochs was fixed at 5, the batch size at 32, and the learning rate at 0.01 to obtain the higher accuracy values.

5. Conclusion

The proposed method presents an analysis of various novel optimizers used for fine-tuning the performance of CNN and DNN architectures. The performance of the proposed method has been evaluated using measures that report the accuracy and loss values. This approach has been shown to achieve comparable results across various datasets, and each phase of implementation, over the datasets and the CNN and DNN architectures, reveals different results accordingly. The overall performance of the proposed work has been evaluated using parameters such as the batch size, the number of epochs, and the learning rate. In a nutshell, this research work compares the various optimizers across the different architectures and datasets, and the comprehensive report has been constructed from multiple components to improve its accuracy. The proposed work can be extended to other datasets and architectures to test a comparable accuracy range.

Data Availability

The data are available upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest to report regarding the study.

Acknowledgments

The proposed research trials were conducted in "The Advanced Image Processing DST-FIST Laboratory," Department of Computer Science and Applications, The Gandhigram Rural Institute (Deemed to be University), Dindigul, Tamil Nadu, India.

References

[1] I. H. Sarker, "Deep learning: a comprehensive overview on techniques, taxonomy, applications and research directions," SN Computer Science, vol. 2, no. 6, p. 420, 2021.
[2] T. J. Sejnowski, "The unreasonable effectiveness of deep learning in artificial intelligence," Proceedings of the National Academy of Sciences, vol. 117, no. 48, pp. 30033–30038, 2020.
[3] L. Alzubaidi, J. Zhang, A. J. Humaidi et al., "Review of deep learning: concepts, CNN architectures, challenges, applications, future directions," Journal of Big Data, vol. 8, no. 1, pp. 53–74, 2021.
[4] S. Indolia, A. K. Goswami, S. P. Mishra, and P. Asopa, "Conceptual understanding of convolutional neural network - a deep learning approach," Procedia Computer Science, vol. 132, pp. 679–688, 2018.
[5] S. R. Dubey, S. Chakraborty, S. K. Roy, S. Mukherjee, S. K. Singh, and B. B. Chaudhuri, "DiffGrad: an optimization method for convolutional neural networks," IEEE Transactions on Neural Networks and Learning Systems, vol. 31, no. 11, pp. 4500–4511, 2020.
[6] R. Mohapatra, S. Saha, C. A. C. Coello, A. Bhattacharya, S. S. Dhavala, and S. Saha, "AdaSwarm: augmenting gradient-based optimizers in deep learning with swarm intelligence," IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 6, no. 2, pp. 329–340, 2022.
[7] N. Landro, I. Gallo, and R. L. Grassa, "Combining optimization methods using an adaptive meta optimizer," Algorithms, vol. 14, no. 6, p. 186, 2021.
[8] B. Roy, M. P. Singh, M. R. Kaloop et al., "Data-driven approach for rainfall-runoff modelling using equilibrium optimizer coupled extreme learning machine and deep neural network," Applied Sciences, vol. 11, no. 13, p. 6238, 2021.
[9] L. Cheng, Z. Wang, F. Jiang, and J. Li, "An identifier-actor-optimizer policy learning architecture for optimal control of continuous-time nonlinear systems," Science China Physics, Mechanics & Astronomy, vol. 63, no. 6, pp. 264511–264512, 2020.
[10] M. Yaqub, J. Feng, M. S. Zia et al., "State-of-the-art CNN optimizer for brain tumor segmentation in magnetic resonance images," Brain Sciences, vol. 10, no. 7, p. 427, 2020.
[11] N. Dawson-Elli, S. Kolluri, K. Mitra, and V. R. Subramanian, "On the creation of a chess-AI-inspired problem-specific optimizer for the pseudo two-dimensional battery model using neural networks," Journal of the Electrochemical Society, vol. 166, no. 6, pp. A886–A896, 2019.
[12] X. Cui, W. Zhang, Z. Tuske, and M. Picheny, "Evolutionary stochastic gradient descent for optimization of deep neural networks," Advances in Neural Information Processing Systems, vol. 31, 2018.
[13] Q. Zheng, X. Tian, N. Jiang, and M. Yang, "Layer-wise learning based stochastic gradient descent method for the optimization of deep convolutional neural network," Journal of Intelligent and Fuzzy Systems, vol. 37, no. 4, pp. 5641–5654, 2019.
[14] M. Loni, S. Sinaei, A. Zoljodi, M. Daneshtalab, M. Sjodin, and D. Maker, "DeepMaker: a multi-objective optimization framework for deep neural networks in embedded systems," Microprocessors and Microsystems, vol. 73, Article ID 102989, 2020.
[15] N. M. Aszemi and P. D. D. Dominic, "Hyperparameter optimization in convolutional neural network using genetic algorithms," International Journal of Advanced Computer Science and Applications, vol. 10, no. 6, pp. 269–278, 2019.
[16] S. Bera and V. K. Shrivastava, "Analysis of various optimizers on deep convolutional neural network model in the application of hyperspectral remote sensing image classification," International Journal of Remote Sensing, vol. 41, no. 7, pp. 2664–2683, 2020.
[17] Q. Zheng, D. Fu, Y. Wang, H. Chen, and H. Zhang, "A study on global optimization and deep neural network modeling method in performance-seeking control," Proceedings of the Institution of Mechanical Engineers, Part I: Journal of Systems and Control Engineering, vol. 234, no. 1, pp. 46–59, 2020.