Intelligent Systems Algorithms Overview
Himanshu Mittal
Satyasai Jagannath Nanda
Meng-Hiot Lim Editors
Proceedings of International Conference on Paradigms of Communication, Computing and Data Analytics
PCCDA 2024, Volume 2
Algorithms for Intelligent Systems
Series Editors
Jagdish Chand Bansal, Department of Mathematics, South Asian University,
New Delhi, Delhi, India
Kusum Deep, Department of Mathematics, Indian Institute of Technology Roorkee,
Roorkee, Uttarakhand, India
Atulya K. Nagar, School of Mathematics, Computer Science and Engineering,
Liverpool Hope University, Liverpool, UK
This book series publishes research on the analysis and development of algorithms
for intelligent systems with their applications to various real world problems. It
covers research related to autonomous agents, multi-agent systems, behavioral
modeling, reinforcement learning, game theory, mechanism design, machine
learning, meta-heuristic search, optimization, planning and scheduling, artificial
neural networks, evolutionary computation, swarm intelligence and other algo-
rithms for intelligent systems.
The book series includes recent advancements, modifications, and applications of artificial neural networks, evolutionary computation, swarm intelligence, artificial immune systems, fuzzy systems, autonomous and multi-agent systems, machine learning, and other intelligent-systems-related areas. The material will be beneficial for graduate students, post-graduate students, and researchers who want a broader view of advances in algorithms for intelligent systems. The contents will also be useful to researchers from other fields who are unfamiliar with the power of intelligent systems, e.g., researchers in bioinformatics, biochemists, mechanical and chemical engineers, economists, musicians, and medical practitioners.
The series publishes monographs, edited volumes, advanced textbooks and
selected proceedings.
Indexed by zbMATH.
All books published in the series are submitted for consideration in Web of
Science.
Editors

Himanshu Mittal
Department of Artificial Intelligence and Data Sciences
Indira Gandhi Delhi Technical University for Women
New Delhi, Delhi, India

Satyasai Jagannath Nanda
Department of Electronics and Communication Engineering
Malaviya National Institute of Technology
Jaipur, Rajasthan, India

Meng-Hiot Lim
School of Electrical and Electronic Engineering
Nanyang Technological University
Singapore, Singapore
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Singapore Pte Ltd. 2025
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
General Chairs
Dr. Himanshu Mittal, Department of Artificial Intelligence and Data Sciences, Indira
Gandhi Delhi Technical University for Women (IGDTUW), Delhi
Dr. Satyasai Jagannath Nanda, Department of Electronics and Communication
Engineering, MNIT Jaipur, India
Prof. Anita Tomar, Head, Department of Mathematics, Sridev Suman Uttarakhand
University, Pt. L. M. S. Campus Rishikesh, Uttarakhand
Prof. G. K. Dhingra, Dean, Faculty of Science, Sridev Suman Uttarakhand University,
Pt. L. M. S. Campus Rishikesh, Uttarakhand
Prof. M. H. Lim, School of Electrical and Electronics Engineering, Nanyang
Technological University, Singapore
This book contains outstanding research papers from the 4th International Conference on Paradigms of Communication, Computing and Data Analytics (PCCDA 2024), organized by Pt. Lalit Mohan Sharma Campus, Rishikesh, Sri Dev Suman Uttarakhand University, Uttarakhand, India, and technically sponsored by the Soft Computing Research Society, India. The conference was conceived as a platform for disseminating and exchanging the ideas, concepts, and results of researchers from academia and industry, with the aim of developing a comprehensive understanding of the challenges and advancements in communication, computing, and data analytics, and of innovative solutions to current problems from engineering and technology viewpoints. This book will help strengthen networking between academia and industry. The conference focused on machine learning and deep learning algorithms, models, and their applications.
We have tried our best to ensure the quality of PCCDA 2024 through a stringent and careful peer-review process. PCCDA 2024 received 443 research submissions from distinguished participants at home and abroad. After this peer-review process, only 45 high-quality papers were accepted for presentation and inclusion in the final proceedings.
This book presents the second volume of 23 research papers on communication,
computing and data analytics and serves as reference material for advanced research.
About the Editors

Dr. Meng-Hiot Lim is a faculty member in the School of Electrical and Electronic Engineering, Nanyang Technological University. He previously held an appointment as deputy director for the [Link]. in Financial Engineering and the Centre for Financial Engineering, anchored at the Nanyang Business School. He is a versatile researcher with diverse interests, with research focus
1 Introduction
Over the years, with the constant evolution of tools and technologies, there has also been a dynamic shift in the way web applications function. Web applications today are a set of interactions between various APIs, services, endpoints, and so on. These interactions cross a range of boundaries, both inside and outside the organization, which has made securing these applications more difficult because of their open and distributed nature. The attack surface is large and ever-evolving, making it difficult to defend, as explained in detail in [1].
With the advent of cloud computing and its growing prevalence, ensuring the security of data has only become more challenging. In 2019, a configuration error in a web application's firewall led to one of the worst data breaches of the previous decade (in terms of the number of exposed records), in which the credit card data of around 106 million individuals was exfiltrated. Instead of employing a zero-day exploit, the attacker exploited various well-known vulnerabilities, including Server-Side Request Forgery (SSRF) [2].
Server-Side Request Forgery is a web security vulnerability that enables an attacker to have the server make connections to services and devices within the organization's internal network and gain access to sensitive data [3].
SSRF is usually performed by using malicious URLs to manipulate the server into making requests that give the attacker unauthorized access to confidential data.
K. Kulkarni et al.
2 Background
We conducted an extensive review of SSRF reports from numerous sources, including but not limited to HackerOne [4–10], Orange Tsai [11], Gwendal Le Coguic [12], Enguerran Giller [13], and Shorebreak Security [14], which allowed us to gain insights into the multifaceted nature of SSRF attacks and their diverse manifestations in real-world scenarios.
In-Band: An In-Band attack usually occurs when the attack payload is sent over the channel where client-server HTTP communication takes place, i.e., the malicious payload is inserted into the requests sent to the server.
Out-of-Band (OOB): An Out-of-Band attack occurs when a pointer to the malicious payload is sent to the victim's server. Once the server references this pointer, the malicious message is delivered.
Bypassing Security Controls: SSRF can be exploited to bypass firewalls and other
security controls. By making requests to internal services not directly accessible from
the Internet, attackers can navigate around established security measures.
Exploiting Trusted Relationships: The trust relationships between different com-
ponents of a system can be exploited through SSRF. Attackers may impersonate
trusted internal servers, making unauthorized requests on their behalf and circum-
venting security measures.
Infrastructure and Cloud Resource Abuse: In cloud environments, SSRF can be
used to abuse cloud resources and services, resulting in financial losses for affected
organizations. The misuse of cloud infrastructure introduces a new dimension to the
potential consequences of SSRF attacks.
Reputation Damage and Regulatory Concerns: Successful SSRF attacks can lead
to reputational damage for organizations, especially if they result in data breaches
or service disruptions. Additionally, organizations may face regulatory and legal
consequences for failing to protect sensitive data.
Cloud Environment: Web applications hosted in cloud environments operate on
instances or virtual servers. Each instance stores metadata which is in turn organized
in a metadata server, accessible through its corresponding service. Attackers leverage
SSRF vulnerabilities to manipulate requests originating from the application to the
metadata service, stealing access credentials. These stolen credentials grant unau-
thorized access to services and resources, as demonstrated in high-profile incidents
such as the Capital One data breach. Prominent cloud providers have released updated versions of their metadata services with stronger request security, but vulnerabilities persist in instances still using previous versions.
On-Premise Environment: In on-premise environments, where web applications
are hosted within the internal network, SSRF vulnerabilities can have distinct conse-
quences. Internal services, usually not accessible from outside the firewall, become
vulnerable to unauthorized access. Attackers can make requests to the internal services from the application, devising payloads compatible with each service's communication protocol. While some services communicate natively over HTTP, other services might not support this protocol but can still accept input delivered via HTTP. Attackers
leverage these vulnerabilities to exploit services, as observed in incidents involving
Memcached and Redis services.
3 Related Work
The paper by Späth et al. [19] delves into the realm of Server-Side Request Forgery
(SSRF) attacks within XML parsing, shedding light on the diverse attack vectors
and countermeasures in this domain. It explores both classic and innovative SSRF
techniques, showcasing how XML parsing vulnerabilities can be exploited to send
requests on behalf of the parser to internal network endpoints. The classic SSRF
attack, exemplified through a DOCTYPE-based vector, demonstrates the ability to
invoke sensitive operations remotely. Moreover, the paper introduces an innovative
SSRF attack vector based on XInclude, emphasizing the versatility and evolving
nature of SSRF threats. Challenges such as parser feature vulnerabilities and firewall
restrictions are discussed, highlighting the complexities in mitigating SSRF risks. The
proposed countermeasures advocate for the prevention of insecure parser features
and the implementation of input validation strategies using whitelists or blacklists.
The comprehensive analysis done in the paper helped us gain valuable insights into
the intricacies of SSRF attacks and the various possible attack vectors which can be
used to achieve it.
Al-talak et al. [15] discuss the use of deep learning techniques such as LSTM to detect SSRF attacks in web applications. The dataset is sourced from the Canadian Institute for Cybersecurity at the University of New Brunswick. The machine learning model is trained and LSTM is applied to create an intelligent model capable of accurately identifying SSRF attacks. According to the paper, LSTM learns long-range dependencies by having a recurrent edge inside each cell with a weight w = 1. The proposed model achieves an accuracy rate of 96.9%. Machine learning and deep learning models tend to be highly adaptable, making them well suited to detecting evolving cyberattacks like SSRF. We incorporated a similar approach in training our machine learning model.
In article [16], an innovative approach to defending against SSRF is proposed. Incoming HTTP requests are examined, and pattern matching and detection are performed with the help of Lua to identify the presence of URLs/URIs. A server isolated from the organization's internal network is created; the address of this server is appended to the URLs, and they are redirected to it. The method provided an extra layer of security, but the redirection occasionally caused slight delays.
In article [17], the research explores a novel approach to classifying different types of URLs, including benign, defacement, spam, phishing, and malware, using supervised learning with a focus on lexical features. The study employs a selected set of features derived from URL components and applies machine learning classifiers, such as K-Nearest Neighbors, C4.5, and Random Forest, to achieve a classification accuracy of 97%. Additionally, the paper contributes a dataset that has been used in article [15] and, consequently, in our work as well.
As per the insights from [20] by Zeravan Arif Ali et al., XGBoost stands out for its ability to model complex systems effectively, offering superior prediction accuracy and versatility in classification tasks. The paper emphasizes XGBoost's design focus on computational efficiency and its grounding in machine learning principles. Its adaptive sampling and feature selection strategies reduce overfitting risks, improving model generalization and predictive performance. Its emphasis on interpretability allows researchers to identify the key factors influencing the modeled systems. XGBoost's use of multi-threading further speeds up model training, making it a preferred choice, especially for handling large and diverse datasets. Given the complexity and diversity of our dataset, with numerous parameters, we opted for XGBoost due to its proven effectiveness in addressing such challenges.
Mitigating SSRF Threats: Integrating ML …
4 Proposed Approach
Upon receiving a URL request, the IDS becomes the first line of defense. It meticu-
lously assesses the URL and ascertains whether it poses a threat to the system. This
examination involves the analysis of various attributes, including, but not limited to,
URL structure, source reputation, and payload characteristics. If the IDS determines
the URL to be benign, the reverse proxy seamlessly allows access to the requested
resource.
Using Machine Learning to classify the URLs makes the proposed solution more
flexible as a machine learning model would adapt to a new attack vector significantly
faster than a static Rule-based system would.
In cases where the IDS flags a URL as potentially malicious, the system adopts a cautious approach to avoid false positives. Instead of outright blocking the URL, the system forwards it to a dedicated helper server, which operates in isolation from the main network infrastructure. This server acts as a secondary layer of defense, executing the requested action in an environment devoid of access to internal network components.
The isolated helper server provides a controlled environment for the execution of
potentially malicious requests. By segregating this server from the in-house network
(blocked from accessing private IP addresses and local host), any harmful effects
stemming from a confirmed malicious URL are contained within the isolated envi-
ronment. This strategic segregation ensures that in case the URL executes a malicious
payload, it cannot compromise other internal network elements.
To maintain optimal system performance, only URLs deemed suspicious by the IDS
are routed to the ‘Helper’ server for further analysis and execution. This selective
approach is motivated by the imperative to minimize latency and response time. By
focusing resources on potentially malicious requests, the system can swiftly and
efficiently process genuine threats without incurring unnecessary delays.
Initially, a user inputs a URL on a website interface, which triggers the process.
The system then extracts various features from the URL, such as domain token
count, argument length, and the presence of suspicious tokens, among others. These
features are sent to a script, which utilizes an XGBoost classifier to assess if the URL
is malicious or benign.
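The feature-extraction step described above can be sketched as follows. The paper names domain token count, argument length, and the presence of suspicious tokens, but does not enumerate the full feature set or token list, so the names and tokens below are illustrative assumptions:

```python
from urllib.parse import urlparse

# The token list is an illustrative assumption; the paper does not enumerate it.
SUSPICIOUS_TOKENS = ("localhost", "127.0.0.1", "169.254.169.254", "file://", "@")

def extract_features(url: str) -> dict:
    """Extract a few of the lexical URL features mentioned in the text."""
    parsed = urlparse(url)
    return {
        # number of dot-separated tokens in the host, e.g. 2 for "example.com"
        "domain_token_count": len([t for t in parsed.netloc.split(".") if t]),
        # total length of the query string (the URL's arguments)
        "argument_length": len(parsed.query),
        # 1 if any known-suspicious token appears anywhere in the URL
        "has_suspicious_token": int(any(t in url.lower() for t in SUSPICIOUS_TOKENS)),
    }

features = extract_features("http://example.com/fetch?url=http://169.254.169.254/latest/")
```

The resulting dictionary is what would be fed, as a numeric vector, to the classifier described in the text.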
As seen in Fig. 2, if the algorithm’s prediction value for the URL is below a certain
threshold, the URL is classified as malicious. In response to a malicious request, the
workflow reroutes the request to a helper server that is isolated from the internal
network of the organization. This isolation is achieved by blocking specific ranges of
IP addresses, thereby preventing the helper server from accessing internal network
resources or local host addresses. The helper server attempts to fetch the data from the
URL, but due to the isolation, any access to sensitive internal resources is thwarted,
effectively mitigating the risk of SSRF attacks.
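The isolation described above can be approximated in code: before fetching, the helper server resolves the target host and refuses private, loopback, and link-local addresses (the last covering cloud metadata endpoints). This is a minimal sketch under those assumptions; the function name and exact blocked ranges are not from the paper:

```python
import ipaddress
import socket
from urllib.parse import urlparse

def is_internal_target(url: str) -> bool:
    """Return True if the URL resolves to an address the helper server must not reach."""
    host = urlparse(url).hostname
    if host is None:
        return True  # malformed URL: refuse by default
    try:
        addr = ipaddress.ip_address(socket.gethostbyname(host))
    except (socket.gaierror, ValueError):
        return True  # unresolvable host: refuse by default
    # block RFC 1918 ranges, localhost, and link-local (cloud metadata) addresses
    return addr.is_private or addr.is_loopback or addr.is_link_local

blocked = is_internal_target("http://127.0.0.1/admin")  # loopback: blocked
```

In practice this check belongs at the network layer (firewall rules on the helper host) rather than only in application code, since application-level checks can be bypassed via DNS rebinding.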
Figure 3 describes a scenario where the predicted value is greater than the thresh-
old, and the URL is classified as benign. The result is then displayed to the user, and
the URL is hence deemed safe. The request proceeds to the web server as usual.
5 Evaluation
The models were trained on a dataset sourced from the ‘Canadian Institute for Cyber-
security at the University of New Brunswick’. This dataset served as a resource,
providing instances of both benign and malicious URLs.
$$H(x) = -\sum_{i=0}^{n} p(x_i)\,\log_2 p(x_i) \qquad (1)$$

Here, H(x) represents the entropy of the domain, p(x_i) denotes the probability of the character x_i occurring in the domain, and log_2 is the base-2 logarithm. This calculation enables the identification of patterns indicative of attempts to conceal malicious content or obfuscate the true nature of the URL.
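Equation (1) translates directly into code; a minimal sketch:

```python
import math
from collections import Counter

def domain_entropy(domain: str) -> float:
    """Shannon entropy of a domain string, per Eq. (1): H = -sum p(x_i) * log2 p(x_i)."""
    n = len(domain)
    counts = Counter(domain)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# a string of distinct characters has maximal entropy; a repeated character has zero
h_random = domain_entropy("x7q9z2k4")  # eight distinct characters -> 3.0 bits
h_flat = domain_entropy("aaaaaaaa")    # one repeated character -> 0.0 bits
```

High entropy alone does not prove maliciousness, but, as the text notes, it is a useful signal for randomized or obfuscated domains.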
Taking only the above-mentioned columns, the dataset is partitioned into training
and testing sets using a stratified splitting strategy. This ensures a representative distri-
bution of class labels in both sets, with 80% allocated for training and 20% for testing.
To mitigate the impact of varying scales among features, a standardization step is applied. The features in the training set (X_train) are scaled using scikit-learn's StandardScaler, and the same transformation is applied to the testing set (X_test) for consistency. The fitted scaler is serialized and stored using the joblib library. This serialized scaler, denoted 'modelName_scaler.joblib', is used to standardize future data in a consistent manner during model evaluation.
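The split, scaling, and serialization steps read as standard scikit-learn usage and might look roughly like this; synthetic data stands in for the extracted URL features:

```python
import joblib
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the URL feature matrix and labels
X = np.random.rand(100, 5)
y = np.array([0, 1] * 50)

# Stratified 80/20 split keeps the class ratio identical in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)  # fit on training data only
X_test = scaler.transform(X_test)        # reuse the same transformation

# Persist the fitted scaler so future data is standardized identically
joblib.dump(scaler, "modelName_scaler.joblib")
```

Fitting the scaler on the training split only, then reusing it for the test split and for serving, avoids leaking test-set statistics into the model.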
After doing the above pre-processing on the dataset, a series of experiments were
conducted employing various ML algorithms. The evaluation aimed to assess the sys-
tem’s ability to accurately identify and classify URLs as benign or malicious. Several
algorithms were implemented and evaluated, and their corresponding accuracies are
summarized in Table 1.
In addressing the detection of Server-Side Request Forgery (SSRF) with URLs,
a combination of LSTM and Bi-LSTM models was selected due to their sequen-
tial processing capabilities, making them effective at capturing intricate patterns in
URL sequences. These models are well-suited for learning dependencies over time,
which is crucial for detecting anomalies and suspicious patterns in URL requests.
Additionally, ensemble methods such as Random Forest, Decision Tree, XGBoost,
and AdaBoost were included to enhance the overall robustness and accuracy of the
detection system.
The rationale behind incorporating ensemble methods lies in their ability to lever-
age the strengths of multiple models, thus improving the system’s resilience against
various types of attacks and data variations. Specifically, XGBoost’s exceptional
accuracy of 98.55% made it a compelling choice for integration as an intrusion
detection system (IDS) within the network architecture.
The optimization of the XGBoost algorithm involved using Bayesian Search to
fine-tune hyperparameters like Learning Rate and Max Depth. The search space
for hyperparameters was defined based on prior knowledge and empirical testing,
ensuring a thorough exploration of potential settings. For instance, a log-uniform
distribution between 0.1 and 1.0 was used for the learning rate, covering a broad range
of values and validated through empirical testing to optimize the model effectively.
6 Results
The algorithm achieved Precision, Recall, and F1-score values of 0.977, 0.994, and 0.985, respectively, as indicated in Table 2. These metrics indicate high accuracy, which is a fundamental requirement for an IDS. The high recall indicates that there are few false negatives (malicious URLs predicted to be benign). Minimizing false negatives is essential to ensure that the web server, which is connected to the private network of the organization, is protected. Figure 4 illustrates the confusion matrix, providing deeper insight into the model's performance. Table 3 details the counts of true positives, false negatives, true negatives, and false positives extracted from the confusion matrix.
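As a quick sanity check, the reported F1-score is consistent with the reported precision and recall, since F1 is their harmonic mean:

```python
# Reported metrics from Table 2
precision, recall = 0.977, 0.994

# F1 is the harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)
# rounds to 0.985, matching the reported value
```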
Alongside the minimal occurrence of false positives (benign URLs predicted to be malicious), the suggested network architecture effectively manages these cases by redirecting such requests to an isolated server, ensuring that the required data is still fetched unless the request attempts to access internal private network addresses or localhost.
7 Conclusion
In this paper, an efficient network architecture is proposed that uses machine learning for detecting and preventing SSRF attacks while also managing web traffic through load balancing, depending on the probability of a URL being malicious. Numerous machine learning models were implemented, and their respective accuracies were systematically compared and analyzed. The model with the highest accuracy (98.55%) was selected and used to classify URLs as malicious or benign. To offer a more comprehensive solution to the problem, an IPS was proposed in which malicious URLs are redirected to a dedicated 'Helper' server, which is isolated from the database and the internal network to limit
its access to sensitive information. There lies a risk of the additional server being
overwhelmed by requests classified as malicious. To mitigate this risk, the imple-
mentation of rate limiting is a viable strategy. Rate limiting controls the frequency
of client requests over a specified interval. A notable trade-off is the possibility of
erroneously classifying benign URLs as malicious, which may result in legitimate
requests being unfairly restricted. However, the low number of false positives observed when testing the classification model suggests that the likelihood of benign requests being misclassified is low enough for the trade-off to be acceptable. Since an overwhelmed additional server would imply a sudden influx of malicious requests, establishing monitoring and alerting systems would also help ensure that proactive security measures can be taken.
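The rate limiting suggested above could take the form of a simple sliding-window limiter in front of the helper server; the limits and naming in this sketch are illustrative, not from the paper:

```python
import time
from collections import defaultdict, deque

class RateLimiter:
    """Allow at most max_requests per client within a sliding time window."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window = window_seconds
        self.hits = defaultdict(deque)  # client id -> recent request timestamps

    def allow(self, client: str, now=None) -> bool:
        now = time.monotonic() if now is None else now
        q = self.hits[client]
        # discard timestamps that have fallen outside the window
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) < self.max_requests:
            q.append(now)
            return True
        return False

limiter = RateLimiter(max_requests=2, window_seconds=1.0)
```

A third request from the same client inside the window is denied, while requests arriving after the window has slid past earlier timestamps are allowed again.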
While our study has demonstrated impressive accuracies in detecting malicious
URLs, it’s essential to acknowledge that the dataset, although valuable, may have
certain limitations. One such aspect is the dataset’s potential lack of diversity, which
could contribute to the high accuracies observed. This lack of diversity, while not
undermining the dataset’s significance, is a common challenge in the field of machine
learning-based threat detection. It’s worth noting that real-world scenarios often
present a wider range of malicious URL variations and obfuscation techniques that
may not be fully represented in the dataset. Our study highlights the effectiveness
of our approach within the context of the dataset provided, and future work could
explore strategies to enhance model robustness in more diverse and dynamic threat
landscapes.
References
1. Hoffman A (2020) Web application security: exploitation and countermeasures for modern
web applications. O’Reilly, USA
2. Khan S, Kabanov I, Hua Y, Madnick S (2022) A systematic analysis of the capital one data
Breach: critical lessons learned. ACM Trans Privacy Secur 26(1):29. [Link]
3546068
3. PortSwigger (2023) Server side request forgery. [Link]
Accessed 25 Nov 2023
4. HackerOne (2023) SSRF in [Link] via ?url= parameter. [Link]
514224. Accessed 12 May 2023
5. HackerOne (2023) SSRF in exchange leads to ROOT access in all instances. [Link]
com/reports/341876. Accessed 18 May 2023
6. HackerOne (2023) SSRF on (blank) allowing internal server data access. [Link]
com/reports/326040. Accessed 23 May 2023
7. HackerOne (2023) Bypass of the SSRF protection in Event Subscriptions parameter. https://
[Link]/reports/386292. Accessed 23 May 2023
8. HackerOne (2023) SSRF in upload IMG through URL. [Link]
Accessed 1 June 2023
9. HackerOne (2023) Server-side request forgery using Javascript allows exfill data from Google
metadata. [Link] Accessed 2 Jun 2023
10. HackerOne (2023) SSRF in webhooks leads to AWS private keys disclosure. [Link]
com/reports/508459. Accessed 2 June 2023
11. Tsai O (2023) How I chained 4 vulnerabilities on GitHub enterprise, from SSRF execution chain
to RCE! [Link] Accessed 3
June 2023
12. 10degres (2023) AWS takeover through SSRF in JavaScript. [Link]
ssrf-javascript/. Accessed 4 June 2023
13. Opnsec (2023) Into the Borg – SSRF inside Google production network. [Link]
2018/07/into-the-borg-ssrf-inside-google-production-network/. Accessed 4 June 2023
1 Mitigating SSRF Threats: Integrating ML … 15
14. Shorebreak Security (2023) SURF’s up! Real world server-side request forgery
(SSRF). [Link]
forgery-ssrf/. Accessed 5 June 2023
15. Detecting server side request forgery (SSRF) attack by using deep learning. IJACSA: Int J Adv
Comput Sci Appl 12(12). [Link]
Server_Side_Request_Forgery.pdf
16. Jabiyev B, Mirzaei O, Kharraz A, Kirda E (2021) Preventing server-side request forgery attacks.
In: Proceedings of the 36th annual ACM symposium on applied computing (SAC ’21), associ-
ation for computing machinery, New York, NY, USA, pp 1626–1635. [Link]
3412841.3442036
17. Mamun M, Rathore M, Habibi Lashkari A, Stakhanova N, Ghorbani A (2016) Detecting mali-
cious URLs using lexical analysis, pp 467–482. [Link]
1_30
18. Vanin P, Newe T, Dhirani LL, O’Connell E, O’Shea D, Lee B, Rao M (2022) A study of
network intrusion detection systems using artificial intelligence/machine learning. Appl Sci
12(22):11752. [Link]
19. Späth C, Mainka C, Mladenov V, Schwenk J (2016) SoK: XML parser vulnerabilities.
In: Proceedings of the 10th USENIX conference on offensive technologies (WOOT’16).
USENIX Association, USA, pp 141–154. [Link]
woot16/[Link]
20. Ali Z, Abduljabbar Z, Tahir H, Sallow A, Almufti S (2023) Exploring the power of eXtreme
gradient boosting algorithm in machine learning: a review. Acad J Nawroz Univ 12:320–
334. [Link] [Link]
ajnu/article/view/1612
21. GitHub repository: ssrf-prevention-main. [Link]
Chapter 2
Multi-directional Molecular Communication Among Nanomachines: Modeling and Verification
1 Introduction
Nanomachines are the core operational units of nanosystems. They perform specific
functions like communication, calculation, data preservation, detection, and action at
the atomic level [1, 2]. Nanomachines working together form nanonetworks, which
enhance their individual capabilities. These networks allow nanomachines to collab-
orate on complex tasks, expanding their potential applications [3, 4]. Molecular com-
munication is a promising method for enabling nanomachines to interact. Inspired by
biological systems that use molecules to communicate, this approach involves send-
ing information-carrying molecules between nanomachines. These molecules trigger
biochemical reactions at their destination, allowing for effective communication at
the nanoscale [2, 5, 6]. Understanding how molecules move through a medium is a
key challenge in molecular communication [1, 7]. In diffusion-based molecular com-
munication, the molecular channel is the pathway between nanomachines [8–10].
Calcium signaling is a molecular communication method that employs calcium ions
as information carriers. These ions are released by a sender nanomachine and act as
messengers to convey information to one or multiple recipient nanomachines [11].
Calcium signaling is a biological process where cells communicate directly with
each other. This occurs through gap junctions in cell membranes, which permit the
transfer of molecules, including calcium ions, between adjacent cells [1].
In [12], the central objective is to establish a model for a time-slotted communica-
tion system among nanoscale machines within a one-dimensional environment. This
system incorporates bio-inspired rules assessed during each interval. To validate the
A. J. Jani (B)
Al-Nahrain University, Kadhmiya, Baghdad, Iraq
e-mail: athraajuhi@[Link]
J. J. Jani
Ministry of Labour and Social Affairs, Baghdad, Iraq
e-mail: jafaraljani@[Link]
proposed model, diverse network sizes were examined using the probabilistic model
checking tool PRISM.
The authors in [13] propose a channel model for molecular single-input multiple-output (SIMO) systems to estimate their channel response, where the model, characterized by its recursive nature, provides an analytically derived closed-form solution for the channel response of molecular 2-Rx SIMO systems. Additionally, a simplified model with lower complexity has been presented, offering a trade-off between computational efficiency and slightly reduced accuracy in channel estimation. These
models have been extended to encompass molecular SIMO systems with more than
two receivers. The performance of these methods has been evaluated across various
topologies with different parameters, and the model’s accuracy has been verified
by comparing it to computer-simulated channel estimations, employing quantitative
error metrics such as root-mean-squared error. The effectiveness of the simplified
model is affirmed by assessing the level of deviation, indicating satisfactory channel
modeling performance with reduced computational requirements.
The authors in [14] explore enhancing interaction with biological systems using
nano-scale devices that communicate through molecular signals. Micro-scale inter-
mediaries facilitate communication, with specific molecules representing informa-
tion bits. Diffusion-related interference is mitigated through statistical methods.
The authors present a transmission model and numerical simulations, evaluating
the impact of transmitter scheduling and signal strength on the bit error rate.
In [15], the authors explore molecular communication by envisioning nanonetworks
facilitating collective communication on attributes like scent, taste, light, or
chemical states. The focus has been on modeling different communication channels
within nanonetworks—molecular multiple-access, broadcast, and relay channels—
with calculated capacity expressions. Numerical results suggested that, with optimal
molecular communication parameters, multiple nanomachines can efficiently access
a single nanomachine. Molecular broadcast enabled one nanomachine to commu-
nicate with several others. The integration of molecular multiple-access and broad-
cast channels has highlighted the potential of molecular relay channels to enhance
communication capacity between two nanomachines through a relay nanomachine.
In [16], the primary focus lies in developing a capacity expression for a single
molecular channel, involving communication between a Transmitter Nanomachine
(TN) and a single Receiver Nanomachine (RN). Additionally, the authors explore
the capacity of a molecular multiple-access channel, where multiple TNs commu-
nicate with a single RN. Numerical findings have indicated that both single and
multiple-access molecular channels have the potential to achieve high molecular
communication capacities.
This paper introduces a novel multi-directional molecular communication model
for nanomachines, inspired by natural systems. The model allows nanomachines in
a network to exchange information by releasing and detecting molecular concentra-
tions. Each nanomachine operates on a synchronized schedule, taking turns to send
and receive messages. The proposed model improves reliability by considering vari-
ous transmission methods, including relay nodes, and accounting for potential com-
munication failures. Inspired by interconnected biological systems and incorporating
2 Multi-directional Molecular Communication Among Nanomachines … 19
aspects of calcium signaling and the Abelian Sandpile Model, the communication
channel is designed to simulate the complex interactions between nanomachines [17].
The paper employs the PRISM model checker to simulate and analyze the proposed
multi-directional molecular communication model across different network sizes.
By evaluating transmission and reception success rates under various conditions, the
study provides valuable insights into the strengths and weaknesses of this communi-
cation method in nanoscale environments. The findings contribute to understanding
the reliability and efficiency of molecular communication in complex networks.
2 Proposed Model
In our setup, there are n = 4 nanomachines denoted S_i, where i belongs to the
set {1, 2, . . . , n}. Each nanomachine has two functions: it can either send information
(transmission mode) or receive it (reception mode). These nanomachines communicate
with each other through a network of connected spots, or nodes, as shown in Fig. 1.
Figure 1 shows a representation of the proposed model of a multi-directional
nanonetwork consisting of four nanomachines (S_1, S_2, S_3, and S_4) connected
through channel nodes. The channel nodes are represented by squares and labeled
Ch_{xy}Z, where xy indicates the connected nanomachines and Z indicates the
node's position within the channel. For example, the node labeled Ch_{24}2 is
connected to nanomachines S_2 and S_4, and it is the second node in the channel
between them. Thus, to give another example, S_1 and S_2 are connected by a channel
containing the nodes Ch_{12}1, Ch_{12}2, and Ch_{12}3.
Then

    Ch_{xy}(Z + 1)^(τ) = Ch_{xy}(Z + 1)^(τ−1) + e_Z    (1)
Algorithm 1
repeat
  for each nanomachine S_i do
    if state flag is R then
      Receive α molecular concentration
      Update τ_local += τ
    else if state flag is T then
      if τ_local < log(q/L) then
        if random number < p then
          if C_{xi,1} ≥ q then
            Send q molecules to the connected channel nodes Ch_{xi}Z:
              C_{xi,Z} += q (with a given failure probability)
            Update C_{xi,1} −= q
          else
            Not enough concentration; remain in transmission mode
            Update τ_local += τ
          end if
        else
          Remain in transmission mode
          Update τ_local += τ
        end if
      else
        Switch state flag to R
        Reset τ_local = 0
      end if
    end if
  end for
  for each channel node Ch_{xy}Z do
    if C_{xy,Z} > L then
      Calculate excess molecules: e_Z = C_{xy,Z} − L
      Send e_Z to the neighboring node (if possible): C_{xy,Z+1} += e_Z
        (only if the buffer allows and E is not exceeded)
      Update current concentration: C_{xy,Z} = L
    end if
  end for
until the termination condition is met
The algorithm progresses through each time slot, where a time slot represents
a discrete time interval during which communication operations take place. Con-
currently, it updates the local clock of each nanomachine to ensure synchronization
across the network. Utilizing this synchronized clock, each nanomachine evaluates
whether it should engage in transmission or reception mode during the current time
slot. When a nanomachine operates in transmission mode, it computes the transmis-
sion probability ( p), determining its transmission action based on this probability.
If transmission occurs, the nanomachine computes the quantity of molecular con-
centration (q) to be transmitted. Should this transmitted concentration surpass the
predefined threshold (L), surplus molecules are distributed to neighboring nodes
within the channel. When a nanomachine operates in reception mode, it accepts
molecular concentrations from neighboring nodes within the channel and adjusts
its local molecular concentration accordingly. Special cases, such as transmitting to
both sides of a nanomachine, are considered within the algorithm, alongside checks
for transmission and reception failures. The algorithm iterates through the aforemen-
tioned steps for each time slot until either the communication process concludes or a
termination condition is satisfied. Hence, the total time complexity of the algorithm
is contingent upon both the quantity of time slots and the count of nanomachines.
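As an illustration, the per-slot sending and toppling rules described above can be sketched in Python. This is our own simplification of Algorithm 1, not the authors' implementation: the parameter values, a single channel, and the omission of reception mode and τ_local are all assumptions for clarity.

```python
import random

# Illustrative parameters; the concrete values are assumptions, not the paper's.
p, q, L, E = 0.25, 3, 5, 15   # send probability, dose q, node threshold L, buffer cap E

def topple(nodes):
    """Sandpile-style relaxation of one channel: a node holding more than L
    passes its excess e_Z to the next node if that node stays within E."""
    for z in range(len(nodes) - 1):
        if nodes[z] > L:
            e = nodes[z] - L              # excess molecules e_Z
            if nodes[z + 1] + e <= E:     # only if the buffer allows
                nodes[z + 1] += e         # next node receives e_Z, cf. Eq. (1)
                nodes[z] = L              # clamp the current node to L
    return nodes

def send(machine_conc, nodes):
    """One transmission attempt: with probability p, push q molecules into
    the first channel node, provided the sender holds at least q."""
    if random.random() < p and machine_conc >= q:
        nodes[0] += q
        machine_conc -= q
    return machine_conc, topple(nodes)

random.seed(1)
conc, chan = 8, [L, L, L]     # channel nodes start at L, as in the experiments
for _ in range(4):            # four time slots
    conc, chan = send(conc, chan)
print(conc, chan)
```

Note how the overflow rule makes excess molecules cascade down the channel in a single pass, which is the sandpile-like behavior the model borrows.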
We used the PRISM verification tool to thoroughly assess and compare the outcomes
of our model. This advanced tool not only enhances our model’s capabilities but also
helps us effectively evaluate results, especially in complex scenarios.
PRISM is a flexible tool tailored for handling probabilistic models in real-world
situations. It allows for defining probabilities within the model itself and in the prop-
erties under examination. Additionally, the software can determine the probability
of failure for specific properties once the verification process is complete [22]. This
tool operates by taking a system description as input, typically written in PRISM’s
system description language. This language is an extension of Reactive Modules,
tailored for probabilistic scenarios. Subsequently, the tool constructs a model based
on this description and generates a set of reachable states [23]. PRISM offers a
noteworthy feature: step-by-step simulation. This functionality allows users to select
system variables for manipulation and define their initial values. During simulation,
users have the option to manually guide the process by organizing or randomizing
steps. When opting for randomization, users can specify the number of random steps
the program should simulate [22]. The model was conceptualized as a discrete-time
Markov Chain (DTMC) and subjected to examination using the probabilistic model
checker PRISM. A DTMC module functions as a transition system in which each
transition is associated with a probability, guaranteeing that the cumulative proba-
bility of all outgoing transitions from a particular state equals one. This arrangement
creates a probability space encompassing infinite paths within the model, enabling
the quantitative assessment of the likelihood of specific events transpiring [23].
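The DTMC semantics can be made concrete with a toy sketch; the two-state chain and its transition probabilities below are illustrative assumptions, not the paper's actual PRISM model.

```python
import random

# A toy two-state DTMC over nanomachine modes (probabilities are illustrative).
dtmc = {
    "T": {"T": 0.75, "R": 0.25},   # keep transmitting, or switch to reception
    "R": {"T": 0.5, "R": 0.5},
}

# Well-formedness: the outgoing probabilities of every state must sum to one.
for state, out in dtmc.items():
    assert abs(sum(out.values()) - 1.0) < 1e-9

def simulate(start, steps, rng):
    """Sample one path through the chain, in the style of PRISM's simulator."""
    path, state = [start], start
    for _ in range(steps):
        r, acc = rng.random(), 0.0
        for nxt, prob in dtmc[state].items():
            acc += prob
            if r < acc:
                state = nxt
                break
        path.append(state)
    return path

print(simulate("T", 5, random.Random(0)))
```

PRISM itself goes further than such sampling: it builds the full reachable state space and computes event probabilities exactly over all infinite paths.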
Results Verification: The PRISM model was created in accordance with the guide-
lines outlined in Algorithm 1. As per the algorithm, during each time interval τlocal ,
a nanomachine has the opportunity to transmit a molecular concentration q, with a
probability of p = 1/4. In transmission mode, a nanomachine can send this concentration
to a neighbor on one side or to neighbors (i.e., channel nodes) on both sides by
transmitting q twice within the same τlocal . However, in reception mode, a nanoma-
chine cannot receive molecular concentrations sent from neighbors on both sides,
as this could lead to communication jamming, resulting in transmission or reception
failure.
Two experiments have been implemented: one on a model with a channel of two
nodes Ch_{xy}Z between each pair of nanomachines S_i, and one on a model with a
channel of five nodes Ch_{xy}Z between each pair. The initial value of the molecule
concentration in all channel nodes is L. In both experiments the thresholds were
E = 15 and L = 5; the initial molecular concentrations in the four nanomachines
were 8, 6, 7, and 9; and q = 3.
Four properties were subjected to verification, focusing on the success and failure
of both the transmission and reception processes. Flags are employed to indicate
the mode of the nanomachines, with Si_send = true signifying transmission mode and
Si_receive = true indicating reception mode.
The first property, called Send-Success, is satisfied under the following condi-
tions:
• One nanomachine is in transmission mode while all other nanomachines are in
reception mode.
• The transmitting nanomachine sends q molecular concentration to its neighbor.
Failure to meet any of these conditions satisfies the Send-Fail property, which is
the second property verified.
Similarly, the property Receive-Success holds true under the following circum-
stances:
• One nanomachine is in reception mode while all other nanomachines are in
transmission mode.
• The receiving nanomachine successfully receives q molecular concentration from
its neighbor.
Failure to meet any of these conditions satisfies the Receive-Fail property.
Hence, the properties subject to verification are Send-Success (3), Send-Fail (4),
Receive-Success (5), and Receive-Fail (6).
The figures presented below display the results of the property verification:
Figure 2 displays the outcomes of verifying two properties: (3) and (4). In this
figure, the dotted line signifies the probability of failure in sending (i.e., the result of
verifying Property (4)) in two experiments conducted on networks of different sizes.
Specifically, the square symbol represents the experiment with a network featuring a
two-node channel, while the circle symbol represents the experiment with a network
featuring a five-node channel. Conversely, the solid line in Fig. 2 illustrates the
probability of success in sending (i.e., the result of verifying Property (3)) in the two
experiments conducted on networks of different sizes.
Figure 3 illustrates the outcomes of verifying the success and failure properties
in receiving. In this figure, the dotted line indicates the probability of failure in
receiving (i.e., the result of verifying Property (6)) in two experiments conducted on
networks of different sizes. Specifically, the square symbol represents the experiment
with a two-node channel network, while the circle symbol represents the experiment
with a five-node channel network. Conversely, the solid line in Fig. 3 depicts the
probability of success in receiving (i.e., the result of verifying Property (5)) in the
two experiments conducted on networks of different sizes.
4 Conclusion
References
1 Introduction
30 K. S. Shivesh et al.
in these health records. Existing access control mechanisms helped strengthen the
shared access of EHRs through the use of blockchain technology. This approach
helped improve data confidentiality and security [1]. The introduction of Role Based
Access Control aims to enhance the existing mechanism through the use of various
roles such as doctors, nurses, lab technicians, and patients. By storing the EHR in an
InterPlanetary File System (IPFS), we ensure data integrity and efficient data acces-
sibility. This paper aims to introduce a new paradigm in EHR access management,
enabled by the immutability of blockchain networks and by the secure storage of the
IPFS.
Role Based Access Control (RBAC) is a proven access control technique that provides
a simple and efficient way to manage who has access to what within an organization’s
digital ecosystem [2]. It specifies the roles carried out by users, such as doctors,
nurses, lab technicians, and patients, and then specifies the actions they may perform
or the data they may access according to their roles. This ensures that the right
people have the right level of access to the information. It also safeguards against
data breaches, unauthorized access, and misuse of sensitive information.
This paper provides a comprehensive analysis of the performance and resource
utilization aspects of an RBAC system. Specifically, a meticulously designed script is
used to benchmark the time to perform operations for granting access and verifying
access, as well as CPU and memory usage while performing those pivotal tasks in an
RBAC system. The script compares two types of RBAC models, Flat and Hierarchy,
using a Python library. Additionally, the graphical representations demonstrate the
comparison of how the two RBAC models perform in the system. The major portion
of the graphical presentation shows that the Hierarchy model is superior to the Flat
model in terms of access time and resource consumption. The following sections
elaborate on the results of the two RBAC models.
2 Related Works
In their discussion of the security concerns with moving electronic health records
to cloud storage, Zhou et al. [2] emphasized the significance of limiting unwanted
access. To enforce policies, they suggested cryptographic access control systems and
role-based encryption mechanisms.
Sookhak et al. [1] proposed the use of blockchain-related concepts to provide
strong access control, exploring the impact of technologies like eHealth on
secure EHR management. Their work highlighted the important security requirements
in designing fault-tolerant access control.
3 Secure Sharing of Electronic Health Records Using Smart Contracts … 31
Saini et al. [3] addressed issues with centralization of health care with scattered
electronic medical records (EMRs), proposing a blockchain-based access control
framework. Smart contracts are employed to secure EMR sharing among different
entities in the smart healthcare system. Various types of smart contracts were used
for user access authorization, monitoring activities, and access revocation. Hashes
are kept in the blockchain, and patient digital health records are saved in the cloud in
an encrypted form. A private Ethereum system was used to evaluate the effectiveness
of the system.
Guo et al. [4] and Yang et al. [5] proposed the combination of blockchain and edge
nodes for controlling access to Electronic Health Record (EHR) data. Off-chain edge
nodes hold medical data while blockchain manages access to records. Identification
and access control policies are managed by a blockchain-based controller. Their anal-
ysis, which is centered on preventing unwanted data extraction, uses the Hyperledger
Composer Fabric blockchain to measure transaction processing and response time
while analyzing the effectiveness of smart contracts and policy implementation.
Ekblaw et al. [6] introduced MedRec, a blockchain-based EHR management
system facilitating secure access, authentication, confidentiality, and data sharing
for patients; Linn et al. [7] proposed a blockchain-based access control manager
for medical records, which supports precision medicine and helps with industry
interoperability. MedRec encourages stakeholders to act as "miners", promotes data
economics, and interacts with current solutions. These developments could improve
patient health accountability and research.
The attitude of patients toward blockchain-enabled health information exchange
(HIE) was examined by Esmaeilzadeh et al. [8], addressing the lack of knowledge about
how patients feel about this technology. A total of 2013 respondents participated in
sixteen web-based tests that examined various HIE scenarios. Remarkably, patients
show a propensity for blockchain-based privacy-preserving and information-sharing
systems. The report emphasizes how blockchain technology in health information
exchange (HIE) can be used to protect and commercialize health data interchange,
while also outlining its drawbacks.
Dagher et al. [9] discuss how efforts to improve the security of electronic health
records have not stopped recurrent data breaches involving patient information.
Their work showcases the Ancile blockchain-based platform, which aims to preserve patient
data privacy while offering safe and conveniently accessible medical information
for patients, providers, and other stakeholders. Ancile uses advanced cryptography
and Ethereum-based smart contracts to achieve improved access control and data
obfuscation. The purpose of the article is to examine how Ancile responds to the
demands of different stakeholders.
Mettler [10] emphasized how blockchain technology is becoming more and more
flexible in a variety of industries, including health care. Although its use has primarily
been in the financial sector, it is expanding to other industries, like health care.
The paper examines a number of blockchain-related applications in health care,
including managing public health, user-centered medical research, and combating
pharmaceutical counterfeiting.
As illustrated in Fig. 1, all the actors interact with the blockchain in two ways. The
metadata about the medical records, data about access control, and actor details are
stored in the core blockchain. The actual medical records are stored in the
InterPlanetary File System (IPFS), a peer-to-peer distributed file system designed
especially for content-addressed file storage. Actors interact with the core
blockchain and IPFS. Patients can upload
their medical record to the IPFS. The access rights of the files can be modified by
either granting or revoking access to external entities. The access rights are stored
in the core blockchain. Doctors can request access to a particular patient’s medical
record. When granted access, they can view records or update them according to
the access permission granted. Administrators can assign patients to doctors. Other
entities include lab technicians, nurses, etc., who can only view the records
and cannot write.
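The split between on-chain metadata and off-chain IPFS storage can be mimicked in a few lines; MockIPFS and the SHA-256 stand-in for a content identifier (CID) are simplifying assumptions, not the real IPFS API.

```python
import hashlib
import json

class MockIPFS:
    """Toy content-addressed store standing in for IPFS."""
    def __init__(self):
        self.blocks = {}

    def add(self, data: bytes) -> str:
        cid = hashlib.sha256(data).hexdigest()   # stand-in for a real IPFS CID
        self.blocks[cid] = data
        return cid

    def cat(self, cid: str) -> bytes:
        return self.blocks[cid]

chain_metadata = {}   # stand-in for the core blockchain: patient -> record CID
ipfs = MockIPFS()

# The actual record goes to IPFS; only its content hash goes on-chain.
record = json.dumps({"patient": "Jane Doe", "note": "routine checkup"}).encode()
chain_metadata["Jane Doe"] = ipfs.add(record)

assert ipfs.cat(chain_metadata["Jane Doe"]) == record
```

Because the on-chain entry is a content hash, any tampering with the stored record changes its address and is immediately detectable, which is the integrity property the architecture relies on.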
The user interaction and functionality of the system contain actions that ensure
secure sharing of EHRs. The login process is the initial step, requiring authentication
to guarantee the legitimacy of user access. This login action can be further refined to
provide feedback in case of invalid usernames or passwords, improving the interface.
View Medical Records, Share Medical Records, and Record Tracking and Revocation
are some of the actions found in the dashboard page, the page where the user is
navigated to after logging in.
Users have to undergo a verification process based on Role-Based Access Control
(RBAC) to access medical data. This mechanism ensures that users can only view
records for which they are authorized. The system displays the respective medical
records after authorizing the users.
As shown in Fig. 2, users can share their own medical records, or the ones that they
are authorized to share, to other users. Once the user shares to recipients, the system
generates access control rules. Then the selected recipients are notified. The access
control rules are checked when a user shares the medical records. Record Tracking
and Revocation functionality allows the users to monitor and track who has access
to their medical records. This transparency improves user awareness and helps them
control their sensitive information. Users can also revoke access granted to specific
individuals whenever necessary. This revocation feature helps in maintaining the
privacy and security of medical records. This allows users to adapt access permissions
in response to changing circumstances or preferences.
4 System Implementation
The RBAC module offers strong features that help in developing a safer system
for controlling access requests, permissions, and user roles in Python. A concrete
example is the creation of a specific domain, represented by the "MedicalRecord"
class, which is designed precisely to deal with medical records within an EHR system.
Roles define user rights, with roles such as "can_create", "can_read", and
"can_update" created to define rights with some specificity; actions such as create,
read, and update are tied to corresponding permissions and duties. The library uses
the "add_permission" method to define the correspondence between roles and
permissions, so that rather than a single broad set of rights, users gain a detailed
set of fine-grained permissions. For instance, the role "can_create" is assigned the
"create" permission, and therefore any user assigned this role will be able to start
the process of creating a medical record.
Healthcare consumers make up the largest population of the system's users, and
hence they are represented by the "User" class. Moreover, users are assigned
particular authorities and privileges, creating broad opportunities to control
their actions step by step. The "Patient" class, on the other hand, represents the
actual end users of this application and is given absolute decision-making power
over whether or not to permit doctors to access personal data. The
"request_permission" function is used to help doctors apply for permission, and a
patient who receives such an application has every right to accept or reject it
after evaluation. This shifts the decision-making structure toward the end user,
the patient in this case, facilitating a patient-centric model of care. Enforcement
is done by the "[Link]" method, which controls user permissions and keeps users
from accessing information they are not allowed to see. When a user has no
authorization for particular data, the system raises an exception informing the
user that he/she is unauthorized to access that data.
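Since the text abbreviates the library's exact API, the following is a minimal hand-rolled sketch of the same patient-centric RBAC idea; the class and method names here are illustrative assumptions, not the library's real interface.

```python
class Role:
    """A named role holding a set of fine-grained permissions."""
    def __init__(self, name):
        self.name, self.permissions = name, set()

    def add_permission(self, permission):
        self.permissions.add(permission)

class User:
    def __init__(self, name):
        self.name, self.roles = name, []

    def check_access(self, permission):
        """Raise if no assigned role grants the requested permission."""
        if not any(permission in r.permissions for r in self.roles):
            raise PermissionError(f"{self.name} is unauthorized for '{permission}'")

class Patient(User):
    def grant(self, doctor, role):
        """Patient-centric control: the patient decides who receives a role."""
        doctor.roles.append(role)

# A role maps to a fine-grained permission, e.g. "can_create" -> "create".
can_create = Role("can_create")
can_create.add_permission("create")

alice, dr_bob = Patient("alice"), User("dr_bob")
alice.grant(dr_bob, can_create)
dr_bob.check_access("create")        # authorized: no exception is raised
try:
    dr_bob.check_access("update")    # never granted: raises PermissionError
except PermissionError as err:
    print(err)
```

The exception on an unauthorized check mirrors the behavior described above: access is denied by default unless a patient has explicitly granted a role that carries the permission.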
A large-scale RBAC system is established to manage the interactions between
patients and physicians and to support these large-scale engagements. It exemplifies the
system’s extensibility and flexibility, and it presents doctors with the capability to get
permission for tasks that include viewing or altering medical information. Because
of this RBAC approach of tracking and managing user access, the RBAC library
Smart contracts are tested for their effectiveness and security. This can be done in a
local blockchain environment, with software such as Ganache used to mimic blockchain
behavior, or on a testnet, a test network that replicates real mainnet conditions but
carries hypothetical data. Proper testing has to be conducted in order to discover
potential bugs and to ascertain that the smart contract follows the right processes.
The smart contract in the proposed system's architecture contains a data structure
called "Record", which incorporates the patient's details, namely the patient's name
and related information. The contract uses two mappings, "records" and
"authorizedDoctors": the "records" mapping maps a patient's Ethereum address to their
records, and the "authorizedDoctors" mapping preserves the Ethereum addresses
authorized for these records.
Access control within the smart contract is implemented through two distinct
modifiers, labeled "onlyOwner" and "onlyDoctor". The "onlyOwner" modifier restricts
some functions so that they can only be called by the contract owner who developed
and deployed the contract. The "onlyDoctor" modifier guarantees that some functions
can only be performed by Ethereum addresses that have been approved as doctors,
preventing unauthorized modification of patients' records. Contract owners are able to add
or remove authorized doctors. The authorized doctors can create new patient records,
modify the records, and retrieve the records that are related to the Ethereum address
of the patient. This has been made possible through the blockchain-based computation
system, which guarantees security and exclusive access to the patient's data while
bringing transparency to the management of healthcare records.
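The contract itself would be written in Solidity; as a sketch of the logic only, its two mappings and the onlyOwner/onlyDoctor guards can be mimicked in Python. The structure below is an assumption based on the description above, not the actual contract code.

```python
class RecordContract:
    """Python mock of the contract: 'records' maps a patient address to a
    record; 'authorized_doctors' holds approved doctor addresses."""

    def __init__(self, owner):
        self.owner = owner
        self.records = {}              # patient address -> record data
        self.authorized_doctors = set()

    def _only_owner(self, sender):     # mimics the onlyOwner modifier
        if sender != self.owner:
            raise PermissionError("onlyOwner: caller is not the contract owner")

    def _only_doctor(self, sender):    # mimics the onlyDoctor modifier
        if sender not in self.authorized_doctors:
            raise PermissionError("onlyDoctor: caller is not an approved doctor")

    def add_doctor(self, sender, doctor):
        self._only_owner(sender)
        self.authorized_doctors.add(doctor)

    def create_record(self, sender, patient, name):
        self._only_doctor(sender)
        self.records[patient] = {"name": name}

    def get_record(self, sender, patient):
        self._only_doctor(sender)
        return self.records[patient]

contract = RecordContract(owner="0xOwner")
contract.add_doctor("0xOwner", "0xDoc")            # only the owner may do this
contract.create_record("0xDoc", "0xPatient", "Jane Doe")
print(contract.get_record("0xDoc", "0xPatient"))
```

In the real contract the guard checks run before the function body, exactly as Solidity modifiers do; the mock reproduces that by calling the guard as the first statement of each function.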
5 Evaluation Metrics
The proposed system utilizes a script to measure how long access grant and access
verification operations take and to measure resource utilization (specifically, CPU
and memory) during these operations in a Role-Based Access Control (RBAC)
system. The test program uses an rbac object to represent a hypothetical RBAC
system which has methods for creating users and authorizing different actions. The
objective of the first test is to find the time taken to grant access to a user. The
function, “measure_access_grant_time” initializes a timer and simulates a system of
creating 10,000 users and authorizing various actions among them like create, read,
and update. The total time we get after this simulation is then converted to millisec-
onds. This is labeled as “Access Grant Time”. For the next test, our objective is to
calculate the time taken to verify access for a user. The function “measure_access_
verification_time” catches all the authorization errors using the [Link] method. The
time taken to perform all the access verification operations is labeled as “Access
Verification Time”.
The final part of the script is to measure the total amount of CPU and memory
used during 1,000,000 access grant operations. The “measure_resource_utilization”
method retrieves the amount of CPU and memory used, via the psutil library. The
change in CPU usage is calculated and displayed as “CPU Usage Change” while the
change in memory usage is displayed as “Memory Usage Change”. The following
test is to compare the performance and usage of resources during access control
between the flat and the hierarchical models of the RBAC system.
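The timing part of such a benchmark can be sketched as follows. FlatRBAC is a toy stand-in for the rbac object, the measured values will differ by machine, and resource sampling via psutil is omitted to keep the sketch self-contained.

```python
import time

class FlatRBAC:
    """Toy stand-in for the script's rbac object."""
    def __init__(self):
        self.table = {}

    def grant(self, user, actions):
        self.table[user] = set(actions)

    def check(self, user, action):
        if action not in self.table.get(user, ()):
            raise PermissionError(f"{user} may not {action}")

def measure_access_grant_time(rbac, n_users=10_000):
    """Time n_users grant operations and return the elapsed milliseconds."""
    start = time.perf_counter()
    for i in range(n_users):
        rbac.grant(f"user{i}", ["create", "read", "update"])
    return (time.perf_counter() - start) * 1000.0

ms = measure_access_grant_time(FlatRBAC())
print(f"Access Grant Time: {ms:.2f} ms")
```

Access verification would be timed the same way, wrapping check calls and catching the authorization errors, and averaging over repeated runs as the evaluation below does.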
By utilizing a Python script, we simulated and compared the two RBAC models:
Flat and Hierarchy. The following evaluation metrics were obtained by averaging the
values from 10 simulations of access grant and verification along with resource
utilization. Significant differences were noted in the performance metrics of the
two models during simulation. The Access Grant Time and the Access Verification
Time for the Flat model are 25.56 ms and 11.36 ms, respectively. The change in CPU
usage is 53.90%, while the change in memory usage is 1.74%. For the Hierarchy model,
the Access Grant Time averaged 16.896 ms, whereas the Access Verification Time took
nearly 8.43 ms. The change in CPU usage averages 46.83% and the change in memory
usage averages 1.73%. These results clearly show the efficiency of the Hierarchical
model over the Flat model: the performance results give it a considerable edge in
terms of access times and resource utilization. Overall, the Hierarchy model gives
faster access times and consumes fewer resources.
The graphical comparison of the flat and hierarchical RBAC models is illustrated
below, where the blue line represents the flat model and the orange line the
Hierarchical RBAC model. The number of iterations is denoted by the x-axis, while
the y-axis represents the various performance evaluation metrics.
From the graph in Fig. 3, we can see how efficiently and quickly the Hierarchical
model grants access to the requested roles. The hierarchical model is faster than
the flat model at both low and high numbers of iterations, granting access in less
time; it is, on average, faster by 4 ms. The same graph in Fig. 3 also shows that
the Hierarchical model is slightly faster than the Flat model in access verification
time. From 0 to 3000 iterations and beyond 9000 iterations, the hierarchical model
is faster than the flat model by approximately 1 ms; between 3000 and 9000
iterations, the hierarchical model only slightly outperforms the flat model.
From the graph in Fig. 4, it can be observed that the Hierarchical model consumes
less CPU when executing the access grant and verification processes. For iteration
counts in the range of 0 to 2000, the flat model consumes less CPU than the
hierarchical model. However, once the number of iterations crosses that limit, the
CPU utilized by the hierarchical model is about 30% less than that of
the flat model. This indicates that the hierarchical model is suitable for use cases that
have a large number of users.
From the graph in Fig. 4, it can also be seen that the hierarchical model performs
slightly better than the flat model in terms of the memory utilized during the
access verification and grant processes. As the number of iterations increases,
there are some peaks and troughs in the graphs, but the average memory consumption
is lower for the hierarchical model than for the flat model.
Based on all the above aspects, the hierarchical model performs better than the flat
model, and therefore, it can be integrated with the proposed blockchain-based EHR
management system.
6 Conclusion
Blockchain technology can assure that EHRs are tamper-free, and it can provide
a trustworthy audit trail of any action performed within the application. This fault-resistant
auditability is crucial for compliance, accountability, and traceability in the health-
care ecosystem and adds trust to a blockchain-based medical records management
system. Hierarchical Role-Based Access Control plays an important role in main-
taining security and privacy. This enables fine-grained control over access permis-
sions, allowing patients to provide custom access rights according to the roles and
responsibilities of people within the system. Using access control rules, the system
ensures that only authorized users gain access to medical records. This reduces the
risk of unauthorized access and maintains confidentiality of healthcare data. Hier-
archical Role Based Access Control proves to have better performance than the
Flat Role Based Access Control in terms of time, processing power, and memory
utilization. The powerful combination of RBAC, blockchain, and IPFS in this system
establishes a new standard for healthcare data management. This provides a robust
foundation for advancing patient care, medical research, and the overall efficiency of
healthcare delivery. In this digital age, the confidentiality, integrity, and availability
of sensitive medical data can be upheld with this innovative approach.
Chapter 4
RFID Application Using Dual-Band
Monopole Antenna by Enhancing
the Bandwidth
1 Introduction
RFID technology has become indispensable for automatic identification and tracking
in various sectors like supply chain management, transportation, and health care.
The efficiency of RFID systems heavily relies on the communication between RFID
readers and antennas. However, a major challenge faced by RFID systems is limited
bandwidth, hindering their adaptability to different frequency environments. This
paper proposes a solution to this issue by introducing a dual-band monopole antenna.
Traditional RFID antennas often struggle to operate at multiple frequencies, limiting
their flexibility and performance. The dual-band monopole antenna design aims to
overcome these limitations by providing enhanced features and greater flexibility for
RFID systems, enabling compatibility with various RFID standards and frequency
distributions. By broadening the antenna’s resonance across multiple frequencies,
the system can accommodate a variety of standards, thus improving overall perfor-
mance. The flexibility offered by dual-band operation allows for increased data
throughput and reliability, making it suitable for deployment in diverse applica-
tions ranging from inventory management to access control. Additionally, the rela-
tively low cost and simplicity of dual-band monopole antennas make them attractive
options for widespread adoption, offering a cost-effective and reliable solution for
businesses and industries. In summary, the proposed dual-band monopole antenna
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025 41
H. Mittal et al. (eds.), Proceedings of International Conference on Paradigms of
Communication, Computing and Data Analytics, Algorithms for Intelligent Systems,
[Link]
42 M. Perla et al.
holds promise in addressing the evolving needs of RFID applications and contributing
to the continuous development and optimization of RFID technology.
2 Literature Survey
The paper used an ultra-wideband (UWB) antenna in which a coplanar waveguide
with ground (CPWG) was used to suppress unwanted reflection phenomena. The
shape was changed from hexagonal to a tape type to increase the frequency range
[1].
The paper used a monopole UWB antenna. The antenna is highly omnidirectional
in low frequency ranges and is used for multi-frequency purposes; it has a
symmetrical L-shaped patch on both sides [2].
The paper used a high-gain dual-band monopole antenna along with the fabrication
of an artificial magnetic conductor (AMC). The AMC is used to increase the gain of
the antenna. The model has peak gains of 5 dBi and 7.5 dBi [3].
A CPW antenna model was used which had a quarter disk for radiating purposes
and an L-shaped microstrip line, having the advantage of widening the impedance
bandwidth [4].
The paper used a compact super-wideband (SWB) antenna. This antenna was
simulated and tested in ANSYS HFSS for spectrum-sensing (intelligent radio)
applications and achieved an increased gain of 6 dB [5].
The paper used a compact fed folded dipole antenna intended for RFID applications
and operating at two frequencies. It had good impedance matching and a simple,
uncomplicated design [6].
The paper presented a novel compact dual-band hybrid monopole-annular slot
antenna (ASA). It exhibits radiation patterns similar to an ASA, and the resonance
of the second antenna can be tuned independently of the first [7].
This paper presents a flexible CPW-fed monopole antenna with a U slot in the CPW
to achieve the dual band. The substrate of this antenna is made flexible and
transparent for conformal operation [8].
The antenna design incorporates a rectangular array with multiple wrapped stubs
to enhance gain and impedance bandwidth. A linear structure positioned at the
monopole’s base, along with perpendicular metal plates, amplifies radiation from
the edges, ensuring a uniform omnidirectional radiation pattern. Additionally, the
antenna integrates a mobile frequency selective surface (FSS) layer and a top-hat
structure, enhancing directivity without compromising omnidirectional radiation.
This design offers a promising solution for wireless networks and sensor applications,
combining high gain with robust performance in various operating environments [9].
Radio frequency identification (RFID) is widely used in many applications such
as logistics, security, and access control. The performance of an RFID system is
highly dependent on the antenna design. The antenna has a return loss below
−10 dB in two frequency bands suitable for RFID applications. The measurement
results show that the proposed antenna agrees with the simulation results. The results
of this study show that the proposed dual-band monopole antenna is a reasonable
choice for RFID applications [10].
3 Design Methodology
In the design methodology, several designs were considered for the better working
of the antenna. Figure 1 is the 3D model of the dual-band monopole antenna,
designed in ANSYS Electronics Desktop software under HFSS. This 3D model is an
"F"-shaped structure on an FR4 epoxy substrate. Figure 2 shows a meshed structure
diagram, in other words the radiation boundary for the proposed antenna. Results
such as the terminal "S" parameters, radiation pattern, gain, and various other
parameters are observed within this radiation boundary. Figure 3 shows the 3D polar
plot, i.e., the gain of the antenna in every possible direction in 3D space. It is one of
the important characterizations of the antenna because the 3D polar plot visualizes
the azimuth and elevation planes of the antenna pattern and helps to catch flaws
or disturbances in it. This type of 3D pattern is used to test new antenna designs
or unknown antenna types. Lastly, Fig. 4 shows the radiation pattern at 0 degrees;
if the radiation boundary or meshed structure were absent, the signal would be
affected and the desired results of the antenna would change, which is not
acceptable.
Some analytical proposed equations:

For length:

L1 = L_(2.4 GHz) = λg / 4    (1)

λg = λo / √εreff    (2)

where
εreff = effective permittivity,
λo = wavelength corresponding to the frequency.

εreff = (εr + 1)/2 + ((εr − 1)/2) · (1 + 12h/W)^(−1/2)    (3)

where
εr = relative permittivity,
W = monopole antenna width,
h = substrate thickness.

Z = (60 / √εreff) · ln(8h/W + W/(4h))    (4)

where
Z = 50 Ω (port impedance),
h = substrate thickness,
W = width of the monopole.

L2 = L_(3.5 GHz) = λg / 4    (5)

where
εreff = effective permittivity of the dielectric substrate.

Ws = 6h + W    (6)

where
Ls = substrate length,
Ws = substrate width.
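Equations (1)–(6) can be evaluated numerically. The sketch below computes the effective permittivity, the quarter-wave monopole lengths at 2.4 GHz and 3.5 GHz, and the characteristic impedance; the FR4 parameters (εr = 4.4, h = 1.6 mm) and the strip width are assumed values for illustration, since the paper's exact dimensions are not restated here.

```python
import math

C = 299_792_458.0  # speed of light in free space (m/s)

def eff_permittivity(eps_r, h, w):
    # Eq. (3): effective permittivity of a microstrip line
    return (eps_r + 1) / 2 + (eps_r - 1) / 2 * (1 + 12 * h / w) ** -0.5

def quarter_wave_length(freq_hz, eps_r, h, w):
    # Eqs. (1), (2), (5): L = lambda_g / 4 with lambda_g = lambda_o / sqrt(eps_reff)
    lam_o = C / freq_hz
    lam_g = lam_o / math.sqrt(eff_permittivity(eps_r, h, w))
    return lam_g / 4

def impedance(eps_r, h, w):
    # Eq. (4): characteristic impedance (form valid for W/h <= 1)
    return 60 / math.sqrt(eff_permittivity(eps_r, h, w)) * math.log(8 * h / w + w / (4 * h))

# Assumed FR4 substrate and strip width (illustrative values only)
eps_r, h, w = 4.4, 1.6e-3, 1.0e-3
L1 = quarter_wave_length(2.4e9, eps_r, h, w)
L2 = quarter_wave_length(3.5e9, eps_r, h, w)
print(f"L1 (2.4 GHz) = {L1 * 1000:.2f} mm, L2 (3.5 GHz) = {L2 * 1000:.2f} mm")
print(f"Z = {impedance(eps_r, h, w):.1f} ohm")
```

In practice the width W would be tuned until Z approaches the 50 Ω port impedance stated for Eq. (4); the values printed here merely show how the chain of equations is applied.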
4 Result Obtained
4.1 Software
Figure 5 shows the designed antenna, an F-shaped structure. This design shows the
RFID antenna, which will work at different microwave frequencies. It also shows
the radiating boundary, made up of vacuum, in which the antenna radiates the
signal, having a size of λ/4. Figure 6 depicts the return loss of the designed antenna,
where two frequency ranges are achieved: one maximum value at 1.39 GHz and
another at 5.84 GHz. Figure 6 is the S11 parameter plot, which depicts the operating
range for RFID applications, specified with respect to the reference attenuation
value of below −10 dB. Further, Fig. 7 shows the 3D polar plot of the designed
antenna, which plots the negative values along the same axis. It is measured in dBi
and presents the antenna's entire directional gain characteristics in a single plot.
Finally, Fig. 8 depicts the radiation pattern of the antenna, i.e., the directional
pattern of the radio waves it emits. This radiation pattern is a graphical
representation of the antenna's radiation properties as a function of space,
describing how it will radiate and receive energy.
4.2 Hardware
Figures 9 and 10 show the front view and back view of the designed antenna after
the fabrication process, in which SMA connectors are soldered to the ground plane
and to the feeder. Figure 11 shows the S11 parameter graph on the Vector Network
Analyzer (VNA), in which the markers M1 and M3 show the RFID microwave
frequencies achieved: first at 2.88 GHz and second at 6.11 GHz.
Table 1 shows the results obtained over two different ranges. The first is from
1.39 GHz to 2.04 GHz, where the gain obtained is 8.23 dB and the return loss is
−10 dB.
The second range is from 4.63 GHz to 7.3 GHz, where the gain is 4.03 dB with a
return loss of −21.75 dB.
Table 2 compares our work with the different papers we have reviewed; the
proposed design has a moderate dimension compared to the others, along with an
increased frequency range.
5 Conclusion
6 Future Scope
The future scope for RFID applications with dual-band monopole antennas entails
expanding frequency coverage, enhancing reliability, and integrating with emerging
technologies. This can be achieved through ongoing research to optimize antenna
design, improve signal processing algorithms, and ensure compatibility with evolving
standards. Additionally, advancements in fabrication techniques may lead to smaller
and more efficient antennas, while collaborations across industries can drive
innovation and adoption in various sectors.
References
1. Park S, Jung K-Y (2022) Novel compact UWB planar monopole antenna using a ribbon-shaped
slot. IEEE Access 10:61951–61959
2. Wang Z, Wang M, Nie W (2022) A monopole UWB antenna for WIFI 7/Bluetooth and satellite
communication. Symmetry 14:1929. [Link]
3. Abdelghany MA, Fathy Abo Sree M, Desai A, Ibrahim AA (2022) Gain improvement of a
dual- band CPW monopole antenna for Sub-6 GHz 5G applications using AMC structures.
Electronics 11:2211. [Link]
4. Ma Z, Chen J, Li C, Jiang Y (2023) A monopole broadband circularly polarized antenna with
coupled disc and folded microstrip stub lines. J Wirel Commun Netw 2023:30. [Link]
5. Kumar P, Urooj S, Sahu BJR (2021) Design of compact super-wideband monopole antenna for
spectrum sensing applications. Research Gate. [Link]
6. Aggarwal I, Ranjan Tripathy M, Pandey S, Mittal A (2021) A dual-band monopole antenna for
RFID application. IEEE
7. Haskou A, Pesin A, Le Naour J-Y, Louzir A (2019) Compact, dual-band, hybrid monopole-ASA
antenna. IEEE, pp 1123–1124
8. Saraswat K, Harish AR (2019) Flexible dual band dual polarized CPW fed monopole antenna
with discrete frequency reconfigurability, IET
9. Danuor P, Moon J-I, Jung Y-B (2023) High-gain printed monopole antenna with dual-band
characteristics using FSS-loading and top-hat structure. Sci Rep. [Link]
10. Kaur S, Aquib Jameel Khan M, Wali Khan F, Alam M, Singh M (2023) Designing and analysis
of dual-band monopole antenna for RFID applications. IJAEM 5:1029–1035
Chapter 5
The Determinants of Ethical Governance
Policies for Artificial Intelligence
in the Financial Sector of Moroccan
Companies
1 Introduction
Artificial intelligence (AI) has emerged as a major innovation in a relatively short
period of time, driving an ongoing revolution in many sectors, including finance. In
this context, financial services companies in Morocco are quick to adopt AI to speed
up operations and make timely, informed decisions. In the Moroccan financial
sphere, the applications of AI comprise business process automation, advanced data
analysis, and the provision of novel solutions to the hardest problems. The role that
AI plays in the Moroccan financial system has enhanced the smooth running of
processes, saving time and delivering accurate outcomes. It also enables systems
that adapt to different circumstances. The role of AI in financial data processing is
increasingly evident: AI systems are becoming essential tools for sifting through
streams of financial data, allowing firms to identify patterns, assess risk, and
optimize portfolios. AI will be heavily involved in the creation of highly
personalized financial services to meet
A. Hama
Economic and Social Sciences of Souissi, Université Mohammed V, Rabat, Morocco
e-mail: [Link]@[Link]
M. Mkik
Higher Institute of Nursing Professions and Health Techniques of Rabat, Rabat, Morocco
K. Khaddouj
National School of Arts and Trades of Rabat, Université Mohammed V, Rabat, Morocco
e-mail: [Link]@[Link]
A. Hebaz (B)
National Higher School of Electricity and Mechanics, Hassan II University, Casablanca, Morocco
e-mail: Hebaz.a@[Link]
such potential customer needs and to provide the user with a better experience.
Furthermore, the quick implementation of central bank digital currencies (CBDCs)
in the banking and financial sectors will raise the same ethical and governance
issues. Companies ought to implement recovery procedures so that mistakes are
identified and corrected, hold themselves accountable for the outcomes of their AI
services, and establish redress paths for complainants in the event of likely damage.
On the other hand, privacy becomes a central issue in the implementation of AI. An
ethical framework ensures the adoption of good data protection rules within
companies, so that their customers' information is handled well and according to
security and privacy standards. In this light, is the healthy and conscientious
practice of ethical governance within the organizational system the catalyst for a
culture of integrity, accountability, and transparency?
Beyond this, equity and diversity in AI development are emphasized as moral
obligations. Organizations are advised not to use artificial intelligence algorithms
that embed discrimination or produce discriminatory outcomes, and to promote
diversity of ideas in the creation of Artificial Intelligence (AI) systems so as to
avoid harming parts of the population. The ethical-management framework for AI
in the Moroccan financial sector is designed to provide a set of guiding rules that
covers emerging ethical dilemmas within the main ethical principles. The
integration of AI into the financial sector of Morocco could be achieved by adopting
these ethical rules, as they not only lead to responsible AI development but also
embody society's commitment to corporate social responsibility, stakeholder trust,
and the successful incorporation of AI in the financial sector.
2 Literature Review
The core of ethical governance lies in the combination of training and awareness
programs. The purpose of those programs is to enlighten the company employees
about the ethical principles, ethical standards, and actual consequences of business
ethics. Training is an effective tool that serves to teach employees the organiza-
tion’s code of ethics and highlight that ethical decisions should be made and every
employee is responsible for their actions [13]. It could include the topics of business
ethics, anti-corruption, human rights protection, and sustainability. Awareness,
however, is different in that it goes beyond merely gaining knowledge to create a
deeper understanding of ethical issues [2, 14, 15].
It fosters critical analysis of the potential ethical dilemmas and establishes
a climate where employees are encouraged to communicate their ethical issues,
actively use the grievance mechanisms, and take part in the continuous improvement
of the company’s ethical procedures. These training and awareness initiatives include
not only those who are in the company, but also go outside of employees, creating
new relationships between customers, suppliers and other partners. Ultimately, an
ethical corporate culture should be achieved, where ethics are the foundation of daily
decision-making and professional communications [16–18]. Training and aware-
ness are done regularly to develop and improve an ethical corporate culture that is
constantly evolving, and which is affected by social and technological changes as
well as the environment. They reinforce the organization’s reputation by showing its
dedication to ethical practices and building trust among stakeholders which are the
parties that have an impact on the accomplishment of the organization’s goals. In the
end, these initiatives will play a part in building an ethical governance framework
that is strong and provides direction to the company in realizing an economic and
ethical future [19, 20].
Table 1 presents the explanatory variables and the variable to be explained, listing
the items of each variable with its symbol and supporting authors.
This research is based on several hypotheses aimed at assessing the impact of
various factors on ethical governance within organizations. First, Hypothesis 1 posits
that Responsible Actors have a positive effect on ethical governance, suggesting
that engaged and responsible leaders contribute to high ethical standards. Similarly,
Hypothesis 2 argues that transparency in decision-making processes positively
influences ethical governance, emphasizing the importance of clearly and openly
disclosing decision-making mechanisms within the organization. Hypothesis 3
indicates that Ethics and Corrective Mechanisms positively influence ethical governance.
Table 1 (continued)

Variables | Variable contents | Items of the variable | Symbol | Authors
| | 3. Understanding the ethical implications | FS3 | [2, 15]
Ethical governance | Ethical governance means the creation of ethics principles and practices that will be the basis for decision-making and actions of the organization. They will ensure accountability, transparency, and respect for rights and values. All of this contributes to a sustainable and ethical corporate culture | Principles of Ethical Governance | GE1 | [10, 14]
3 Methodology of Research
In order to answer the central problem of our research, we chose a basic sample of
financial managers working in financial companies, namely banks and insurance
companies (7 conventional banks and 5 insurance companies), with a total of 50
responses. The applied research methodology is quantitative (confirmatory) in
nature, using a multiple regression method that provides relevant results in order to
accept or reject the research hypotheses presented at the end of the literature review.
The rigor of this methodology provides the opportunity to accurately assess the
research hypotheses, which contributes to a deeper understanding of the factors
related to ethical governance in the financial context, thus providing valuable
insights for decision-making and the implementation of ethical practices within
these organizations.
Table 3 presents the descriptive statistics using the kurtosis, the Shapiro–Wilk
p-value, the skewness coefficient, and the standard deviations.
As far as normality is concerned, the Shapiro–Wilk statistic (W) is 0.81 with a
p-value of 0.004. The value of W is not close to 1, and the low p-value suggests a
rejection of the normality hypothesis. The table provides detailed statistical
information on the different variables, including the standard deviation, skewness
coefficient, kurtosis, Shapiro–Wilk statistic (W), and the corresponding p-value. To
illustrate, take the ED1 variable: its standard deviation is 2.31, indicating a
moderate dispersion of the data around the mean; the skewness coefficient of
−0.92 suggests a slight leftward tilt in the distribution of the data; and the kurtosis
of 1.48 signals some concentration around the mean.
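The descriptive checks described above can be reproduced on any response column. A minimal sketch using SciPy, with made-up data standing in for the survey responses (the actual dataset is not reproduced in the paper):

```python
import numpy as np
from scipy import stats

# Made-up responses standing in for a survey item such as ED1 (n = 50)
rng = np.random.default_rng(1)
x = rng.normal(loc=3.0, scale=2.3, size=50)

sd = x.std(ddof=1)                  # standard deviation
skew = stats.skew(x)                # asymmetry (skewness) coefficient
kurt = stats.kurtosis(x)            # excess kurtosis
w_stat, p_value = stats.shapiro(x)  # Shapiro-Wilk normality test

# A small p-value (e.g. < 0.05) would lead to rejecting normality,
# as the paper does for W = 0.81, p = 0.004.
print(f"sd={sd:.2f} skew={skew:.2f} kurt={kurt:.2f} W={w_stat:.2f} p={p_value:.3f}")
```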
The table above uses the multiple regression method for estimating the fit of the
statistical model. The correlation coefficient (R) is 0.762, which means that there is
a positive association between the variables in the model. The coefficient of
determination (R2) of 0.580 implies that 58% of the variance of the dependent
variable is explained by the model, which is considered significant but leaves room
for further improvement. The Akaike Information Criterion (AIC) is 553, an
indicator of model quality adjusted for complexity; a lower AIC indicates a better
model fit. The Bayesian Information Criterion (BIC) of 574 is also taken into
account for the assessment of model quality and is generally used to compare
different models. The root mean squared error (RMSE) is 0.666, representing the
mean deviation between the values predicted by the model and the observed
values; the lower the RMSE, the better the predictive accuracy of the model. The
F-statistic of 90.6 with 4 degrees of freedom for the numerator and a p-value less
than 0.001 indicates that the model explains a significant share of the variance in
the dependent variable. In summary, this model has a positive correlation, a
moderate explanatory ability with an R2 of 0.580, a good fit according to the AIC
and BIC, and a high statistical significance according to the F test. However,
improvements could be considered to enhance predictive accuracy.
Table 4 reports the goodness-of-fit indicators, namely R, R-squared, AIC, BIC, and
RMSE, together with the model's probability of significance (it is highly
significant).
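The goodness-of-fit indicators discussed above can be computed for any ordinary least squares fit. The sketch below uses synthetic data in place of the survey responses (which are not reproduced here); the AIC is computed only up to an additive constant, as noted in the comment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data standing in for the survey (n = 50 responses, 4 predictors)
n, k = 50, 4
X = rng.normal(size=(n, k))
y = X @ np.array([0.5, 0.3, -0.2, 0.4]) + rng.normal(scale=0.7, size=n)

# Ordinary least squares with an intercept column
A = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
resid = y - A @ beta

# Goodness-of-fit indicators of the kind reported in Table 4
ss_res = np.sum(resid ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1 - ss_res / ss_tot                       # coefficient of determination
rmse = np.sqrt(ss_res / n)                     # root mean squared error
f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))   # overall F-statistic
aic = n * np.log(ss_res / n) + 2 * (k + 1)     # AIC up to an additive constant
print(f"R2={r2:.3f}  RMSE={rmse:.3f}  F={f_stat:.1f}  AIC~{aic:.1f}")
```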
The table shows the results of an ANOVA omnibus test for the different variables.
The columns include the sum of squares, the mean squares, the F-statistic, and the
associated p-value. The test assesses whether the group means of each variable are
statistically different. The results are interpreted as follows: the sum of squares
represents the total variability in the data, and the mean squares are calculated by
dividing the sum of squares by the number of degrees of freedom. The higher the
sum of squares relative to the mean squares, the more variability there is between
groups. The F-statistic compares between-group variability with within-group
variability; high F values indicate a significant difference between groups. In this
case, all variables (ED1, ED2, ED3, etc.) have high F values, suggesting significant
differences between groups. The associated p-values indicate the probability of
obtaining results as extreme as those observed if the null hypothesis were true. The
very low p-values (< 0.001) in most cases suggest a clear rejection of the null
hypothesis, indicating that the group means are indeed different. We can thus state
that the results of this ANOVA omnibus test suggest that the group means for each
variable are statistically different, which reinforces the validity of the analysis of
variance and highlights significant variations between groups.
Thus, Table 5 presents the second step of the omnibus ANOVA test, which is
significant for all items except ED1, ED3, and ECIE2.
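The quantities involved in the omnibus test (sums of squares, mean squares, F) can be sketched directly. The groups below are made up for illustration; they stand in for responses split by category, not the study's data.

```python
import numpy as np

def anova_oneway(*groups):
    # One-way ANOVA: compare between-group and within-group variability.
    all_x = np.concatenate(groups)
    grand = all_x.mean()
    ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between = len(groups) - 1
    df_within = len(all_x) - len(groups)
    ms_between = ss_between / df_between  # mean squares = SS / df
    ms_within = ss_within / df_within
    return ms_between / ms_within         # F-statistic

# Made-up groups with clearly different means
rng = np.random.default_rng(2)
g1 = rng.normal(3.0, 1.0, 20)
g2 = rng.normal(4.5, 1.0, 20)
g3 = rng.normal(2.0, 1.0, 20)
print(f"F = {anova_oneway(g1, g2, g3):.1f}")
```

A large F, as here, corresponds to the "high F values" the paper reports; the associated p-value would then be obtained from the F distribution with the stated degrees of freedom.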
The table presents the results of the GE1 model with estimates, standard errors,
t-values, p-values, and standardized estimates for each predictor. The intercept has an
estimate of 0.495 with a standard error of 0.2079, a t of 2.38, and a p-value of 0.018.
The Q1O2 predictor has an estimate of 0.241 with a standard error of 0.0570, a
t of 4.22, and a p-value of less than 0.001. Similarly, the other predictors (Q1O3,
Q1Q4, Q1O5, ED1, ED2, ED3, TPD1, TPD2, TPD3, MRC1, MRC2, MRC3, ECIE1,
ECIE2, ECIE3, FS1, FS2, and FS3) have associated estimates, standard errors,
t-values, and p-values. In general, the predictors appear significant with p-values
less than 0.05, indicating a statistically significant relationship with the dependent
variable of the GE1 model. The standardized estimates provide an indication of the
relative strength of each predictor in the model. Consequently, this table is of great
importance as it supplies data that can be used to analyze the effect of each
predictor in the GE1 regression model.
Table 6 presents the regression coefficients with respect to the variable to be
explained; most are significant, except for a few items.
The autocorrelation hypothesis testing table enables one to assess the stability of
the model. For Hypothesis H1, where autocorrelation is assessed, the value of 2.50
and the p-value of 0.400 indicate low autocorrelation; therefore, this hypothesis
should be accepted. This result suggests that the successive measurements of the
model are weakly related to one another. The second hypothesis, H2, according to
the Durbin-Watson (DW) test with a value of 1.80 and a p-value of 0.740, does not
show the presence of significant positive autocorrelation, meaning its acceptance is
validated. Hypothesis 3, with a DW value of −0.020 and a p-value of 0.850, further
indicates that the time-series data does not exhibit negative autocorrelation, and so
it should be accepted. For Hypothesis 4, the DW value of 2.20 and the p-value of
0.500 confirm the rejection of the existence of a significant positive
autocorrelation. Lastly, Hypothesis 5, with a DW value of −0.045 and a p-value of
0.680, confirms the absence of negative autocorrelation, which validates its
acceptance.
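The Durbin-Watson statistic used in these tests has a simple closed form over the regression residuals. A minimal sketch, with made-up residuals standing in for those of the fitted model:

```python
import numpy as np

def durbin_watson(residuals):
    # DW statistic: values near 2 indicate no first-order autocorrelation,
    # values toward 0 positive autocorrelation, values toward 4 negative.
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# Made-up residuals standing in for the regression residuals (n = 50)
rng = np.random.default_rng(3)
resid = rng.normal(size=50)
print(f"DW = {durbin_watson(resid):.2f}")
```

Note that by construction the statistic lies in [0, 4], so DW values such as 1.80 or 2.20 in the table are directly interpretable against the benchmark value of 2.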
5 Conclusion
Ethical governance is far more than just following the law; it aims to build an
organizational framework that incorporates a high ethical code of practice into its
culture. The aim is to build a culture where accountability, transparency, and
human rights are the company's main mission. Highlighting the importance of
adherence to stringent ethical principles, the organization aims at building an entity
that is very conscious of its societal, environmental, and moral effects [23]. The
implementation of ethical governance is based on direct tools to make its theories
come true. Introducing transparency in decision-making is viewed as a critical
foundation, where open and complete communication regarding relevant decisions,
their criteria, data, and debates drives stakeholder trust. Redress and correction
mechanisms, formal and informal, create avenues through which ethical issues can
be raised, cases reported, and redress sought, and this further enhances
accountability and transparency in the organization. Another important feature is
the continuous evaluation of ethical impact, which drives the organization to check
its practices, policies, and decisions on a constant basis. This appraisal goes deeper
than simple compliance with the rules, involving a thorough consideration of the
broader ethical implications of all the steps undertaken. These mechanisms were
corroborated by a quantitative study, thus adding to the credibility of ethical
governance in the financial context. Training and awareness are the most important
factors in this process; the members of the company are taught ethical principles,
conduct standards, and ethical governance practices. Such approaches, in turn,
cultivate an ethical corporate culture that is flexible and responsive to social,
technological, and environmental changes. The results of the study indicate that the
ethical governance model remains stable in the absence of significant autocorrelation.
64 A. Hama et al.
References
1. Manimuthu A, Venkatesh VG, Shi Y, Sreedharan VR, Koh SL (2022) Design and development
of automobile assembly models using federated artificial intelligence with smart contracts. Int
J Prod Res 60(1):111–135
2. Naz F, Karim S, Houcine A, Naeem MA (2022) Fintech growth during COVID-19 in the
MENA region: current challenges and future prospects. Electron Commer Res 1–22
3. Allioui H, Mourdi Y (2023) Unleashing the potential of AI: Investigating cutting-edge
technologies that are transforming businesses. Int J Comput Eng Data Sci (IJCEDS) 3(2):1–12
4. Ben Youssef A, Dahmani M (2023) Examining the drivers of E-commerce adoption by
Moroccan firms: A multi-model analysis. Information 14(7):378
5. Owens E, Sheehan B, Mullins M, Cunneen M, Ressel J, Castignani G (2022) Explainable
artificial intelligence (xai) in insurance. Risks 10(12):230
6. el Kadiri K, Bentoumia S, McHich H, Marouane MKIK (2023) Les concepts durables et
technologiques menant à l’attractivité des investissements étrangers au Maroc: cas du secteur
touristiques. Int J Account, Financ, Audit, Manag Econ 4(3–2):805–824
7. Khan HU, Malik MZ, Alomari MKB, Khan S, Al-Maadid AAS, Hassan MK, Khan K (2022)
Transforming the capabilities of artificial intelligence in GCC financial sector: A systematic
literature review. Wirel Commun Mob Comput
8. Mkik M, Mkik S (2023) Acceptability aspects of artificial intelligence in Morocco: Managerial
and theoretical contributions. In: The International Conference on Digital Technologies and
Applications (pp 65–74). Cham: Springer Nature Switzerland
9. Marouane M, Salwa M, Hebaz A (2023) The acceptance of artificial intelligence in the commer-
cial use of crypto-currency and Blockchain systems. In: The International Conference on Infor-
mation, Communication and Computing Technology (pp 163–177). Singapore: Springer Nature
Singapore
10. Nuseir MT, Al Kurdi BH, Alshurideh MT, Alzoubi HM (2021) Gender discrimination at work-
place: Do artificial intelligence (AI) and machine learning (ML) have opinions about it. In: The
international conference on artificial intelligence and computer vision (pp. 301–316). Cham:
Springer International Publishing
11. Secinaro S, Calandra D, Secinaro A, Muthurangu V, Biancone P (2021) The role of artificial
intelligence in healthcare: a structured literature review. BMC Med Inform Decis Mak 21:1–23
12. Karim M, Ktit J, Soussi NO, Sobhi K (2021) Potential areas for investment in Morocco. An
analysis using inputs-outputs model. Adv Manag Appl Econ 11(1):47–72
13. Enholm IM, Papagiannidis E, Mikalef P, Krogstie J (2022) Artificial intelligence and business
value: A literature review. Inf Syst Front 24(5):1709–1734
14. Mohsen SE, Hamdan A, Shoaib HM (2024) Digital transformation and integration of artificial
intelligence in financial institutions. J Financ Report Account
15. Hicham N, Nassera H, Karim S (2023) Strategic framework for leveraging artificial intelligence
in future marketing decision-making. J Intell Manag Decis 2(3):139–150
16. Kiemde SMA, Kora AD (2022) Towards an ethics of AI in Africa: rule of education. AI and
Ethics, 1–6
5 The Determinants of Ethical Governance Policies for Artificial … 65
17. Albastaki YA, Razzaque A, Sarea AM (Eds.) (2020) Innovative strategies for implementing
FinTech in banking. IGI Global
18. Singh C, Dash MK, Sahu R, Kumar A (2023) Artificial intelligence in customer retention: a
bibliometric analysis and future research framework. Kybernetes
19. Gwagwa A, Kraemer-Mbula E, Rizk N, Rutenberg I, De Beer J (2020) Artificial Intelligence
(AI) deployments in Africa: benefits, challenges and policy dimensions. Afr J Inf Commun
26:1–28
20. Yuen MK, Ngo T, Le TD, Ho TH (2022) The environment, social and governance (ESG)
activities and profitability under COVID-19: evidence from the global banking sector. J Econ
Dev 24(4):345–364
21. Nejjari Z, Aamoum H (2020) The role of ethics, trust, and shared values in the creation
of loyalty: Empirical evidence from the Moroccan University. Bus, Manag Econ Eng
18(1):106–126
22. Gwagwa A, Kachidza P, Siminyu K, Smith M (2021) Responsible artificial intelligence in
Sub-Saharan Africa: landscape and general state of play
23. Al-Sartawi AMM (Ed.) (2022) Artificial intelligence for sustainable finance and sustainable
technology. In: Proceedings of ICGER 2021 (Vol. 423). Springer Nature
24. Bhardwaj B, Sharma D, Dhiman MC (Eds.) (2023) AI and emotional intelligence for modern
business management. IGI Global
Chapter 6
An Ensemble Machine Learning-Based
Approach Toward Accuracy in Bitcoin
Price Prediction
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025 67
H. Mittal et al. (eds.), Proceedings of International Conference on Paradigms of
Communication, Computing and Data Analytics, Algorithms for Intelligent Systems,
[Link]
68 S. Puja et al.
2 Literature Survey
Qing et al. (2022) [21] integrate deep learning with autoencoder algorithms for
effective dimensionality reduction [22]. However, the challenge lies in finding
algorithms that preserve fluctuating features while improving the signal-to-noise ratio.
The study by Lyu (2022) [23] focuses on short-term trading and compares 10
algorithms for time series forecasting, with Gradient Boosting showing promising
results. Its limitations include the need for larger datasets and more attributes to
enhance model performance.
The effort by Encean and Zinca (2022) [24] toward cryptocurrency price prediction
uses Gated Recurrent Unit (GRU) and long short-term memory (LSTM) networks,
which prove effective in predicting cryptocurrency evolution [25, 26]. Yet the study's
reliance solely on historical price and social media sentiment data limits its predictive
scope. LSTM networks achieve higher prediction accuracy by considering external
factors influencing cryptocurrency fluctuations; however, the model's efficiency is
limited because it uses data from a specific time frame.
3 Methodology
After careful and rigorous research, we determined that Random Forest and Gradient
Boosting are ideal choices for forecasting Bitcoin prices in the volatile cryptocurrency
market, owing to their ability to handle large datasets and their predictive accuracy.
We therefore began by developing separate prediction models. The dataset we utilized
consisted of 7 columns and 1449 rows, covering January 11, 2019, to January 11, 2023.
Each row records the date and the day's open price, highest value, lowest value, close
price, adjusted close price, and trading volume. The input features for training included
date, open, high, low, adjusted close, and volume, along with newly engineered features
that are discussed later in the paper.
Random Forest and Gradient Boosting are powerful ensemble learning strategies
widely used in machine learning to improve predictive performance. Random Forest
is a collection of decision trees grown during training, with predictions made by
averaging or voting over the individual tree forecasts. Each tree is trained on a random
subset of the training data and considers a random subset of features at each split.
This randomness reduces overfitting and strengthens the model's robustness. Random
Forest is known for its versatility, handling both classification and regression tasks
effectively.
In contrast, Gradient Boosting is a boosting method that builds trees sequentially,
with each tree correcting the errors of the previous one. In this technique, the model is
trained stage by stage, and each new tree focuses on minimizing the errors of the
combined ensemble. Gradient Boosting is especially effective at handling complex
relationships within the data and excels in predictive tasks where high accuracy is
essential. Common boosting implementations include Adaptive Boosting (AdaBoost),
as well as the more refined Extreme Gradient Boosting (XGBoost) and Light Gradient
Boosting Machine (LightGBM).
While both Random Forest and Gradient Boosting aim to improve predictive accuracy
through ensembling, they differ in their methodologies. Random Forest builds
independent trees in parallel, while Gradient Boosting builds trees sequentially, each
correcting the mistakes of its predecessors. The choice between them frequently
depends on the characteristics of the dataset and the desired balance between
interpretability and predictive power.
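The sequential error correction that distinguishes Gradient Boosting from Random Forest's parallel averaging can be illustrated with a toy one-dimensional regressor built from depth-1 trees (stumps); this is a didactic sketch, not the implementation used in this work:

```python
# Toy data: a noise-free quadratic, to show boosting's residual fitting.
xs = [i / 10 for i in range(-20, 21)]
ys = [x * x for x in xs]

def fit_stump(xs, ys):
    """Fit a depth-1 regression tree: one split point, two leaf means."""
    best = None
    for split in xs:
        left = [y for x, y in zip(xs, ys) if x <= split]
        right = [y for x, y in zip(xs, ys) if x > split]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((y - (lm if x <= split else rm)) ** 2
                  for x, y in zip(xs, ys))
        if best is None or err < best[0]:
            best = (err, split, lm, rm)
    _, split, lm, rm = best
    return lambda x: lm if x <= split else rm

def boost(xs, ys, rounds=20, lr=0.5):
    """Boosting for squared loss: each new stump fits the current residuals."""
    preds = [0.0] * len(xs)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + lr * stump(x) for p, x in zip(preds, xs)]
    return lambda x: sum(lr * s(x) for s in stumps)

def mse(model, xs, ys):
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)
```

Here the boosted ensemble drives the training error far below what any single stump achieves; Random Forest would instead average many independently grown trees.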
To understand the relationship between Bitcoin price and the engineered features,
various visualizations were used. These visualizations included line plots showing
the daily price range, change, and range change ratio over time. They are useful for
understanding historical price movements and identifying patterns such as trends,
seasonality, and volatility. Figure 1 shows the volatile surges that make the market
very unstable.
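The engineered features mentioned above (daily price range, change, and range change ratio) can be derived from the raw OHLC columns roughly as follows; the exact feature definitions here are assumptions for illustration, and the sample values are invented:

```python
# Hypothetical two days of OHLC data (values invented for illustration).
days = [
    {"open": 100.0, "high": 110.0, "low": 95.0, "close": 105.0},
    {"open": 105.0, "high": 112.0, "low": 101.0, "close": 108.0},
]

for d in days:
    d["range"] = d["high"] - d["low"]        # daily price range
    d["change"] = d["close"] - d["open"]     # daily change
    # ratio of the day's range to its change (assumed definition)
    d["range_change_ratio"] = d["range"] / d["change"] if d["change"] else None
```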
6 An Ensemble Machine Learning-Based Approach Toward Accuracy … 71
Candlestick charts are frequently employed in financial markets for technical
analysis. They offer a graphic depiction of price changes over a certain time frame.
As Fig. 2 illustrates, these charts help spot pricing patterns and trends as well as
forecast future price changes. Here they were primarily used to assess the varied
surges and depreciations that occurred during the year.
The correlation heatmap as shown in Fig. 3 was created to analyze the relation-
ships between Bitcoin price and the engineered features. The heatmap visualizes the
correlation between different attributes in the dataset. This can be very significant
for feature selection in modeling and evaluating the impact of different attributes on
the target variable. Open, high, low, and close were the attributes with the highest
correlation among themselves.
The relationship between the price and trading volume of Bitcoin is visualized in
Fig. 4 by plotting the price and volume trend of Bitcoin. It helps figure out possible
trends and patterns as well as comprehend how price changes and trade activity are
related.
Figure 5’s box plot illustrates how the price of Bitcoin varies by the day in a
week. It helps comprehend how prices fluctuate on different days and spot any trends
or unusual price movements on a particular day of the week. The 50-day moving
average price of Bitcoin superimposed upon the actual price is visualized in Fig. 6.
It highlights and helps to smooth out short-term pricing data swings and identify
long-term trends.
Using Random Forest for Bitcoin price prediction. Random Forest combines
several decision trees to produce predictions of the Bitcoin price. In the current work,
the Random Forest algorithm was used for regression analysis to predict the next
day's "Close" price of Bitcoin, as seen in Fig. 7. Upon deploying the model, the mean
squared error obtained was approximately 6.38. This metric measures the average of
the squared errors, providing a measure of the quality of the model's predictions.
Overall, the Random Forest Regressor showed solid predictive performance, as
indicated by the low mean squared error and an R-squared score of 0.87. These
outcomes suggest that the model performed satisfactorily in predicting the following
day's "Close" price of Bitcoin given the chosen features.
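The two reported metrics, mean squared error and the R-squared score, are computed as follows (a minimal sketch):

```python
def mse(actual, predicted):
    """Mean squared error: average of squared prediction errors."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def r_squared(actual, predicted):
    """Coefficient of determination: 1 minus residual over total variation."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot
```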
Using Gradient Boosting for Bitcoin price prediction. Another approach to
predicting the price of Bitcoin is Gradient Boosting, which is also an ensemble machine
learning algorithm. Gradient Boosting is known for its ability to deliver highly accurate
predictions, achieved by exploiting the strengths of weak models and iteratively
refining them, and it can frequently outperform other machine learning algorithms in
predictive precision.
It is powerful at detecting non-linear relationships and complex patterns in the data.
This makes it well suited to modeling the unpredictable dynamics of financial markets,
such as the price movements of Bitcoin, which frequently follow non-linear and
complex paths.
After incorporating the feature-engineered variables, the mean squared error (MSE)
was observed to be 2.43. This demonstrates that the Gradient Boosting model shown
in Fig. 8 produced predictions extremely close to the actual "Close" price of Bitcoin
for the following day, further indicating the viability of the model.
4 Ensemble of Models
While analyzing the results of both models, a disparity was seen between them.
Approximately 45% of the close price values predicted by Random Forest were
lower than the actual close price, while 56% of the values predicted by Gradient
Boosting were higher than the actual closing price. When Random Forest predicted a
lower value, Gradient Boosting often predicted a higher one, and vice versa.
We realized that the best approach was to combine the predictions from both the
Random Forest and Gradient Boosting models to obtain a more precise forecast of
the price of Bitcoin, which represents the novelty of this work. Moving forward, we
employed four different ensemble techniques: a stacking model, a bagging model,
a blending model, and a voting model.
The stacking model is an ensemble learning strategy that combines multiple base
models to improve predictive performance. It involves training several distinct base
models on the dataset, such as a Gradient Boosting Regressor and a Random Forest
Regressor, for Bitcoin price prediction. Rather than using their forecasts directly,
the stacking model combines these predictions through a meta-learner (e.g., an
Extreme Gradient Boosting (XGBoost) Regressor) to make the final prediction.
We used stacking because it aims to capture various aspects of the underlying data
and exploits the diversity of predictions made by the different base models. By
combining the strengths of multiple models, stacking can overcome the limitations of
individual models and achieve better prediction accuracy. Stacking also provides
greater flexibility in capturing complex relationships within the Bitcoin price data,
leading to more accurate predictions. The parameters used are explained below.
Base Models: The variety of predictions and their performance are determined by the
selection of base models. We have used Gradient Boosting Regressor and Random
Forest Regressor as base models.
Meta-Learner: The meta-learner is responsible for learning how to combine the base
model predictions effectively. An Extreme Gradient Boosting (XGBoost) Regressor
was used as the meta-learner for this project.
Stacking Architecture: The stacking architecture determines how predictions from
base models are integrated to train the meta-learner. This involves deciding whether
to use the already defined probabilities or other modified representations as input
features for the meta-learner. We independently trained the two base models on the
training data. After training the base models, predictions were obtained on a holdout
validation set using both base models. Subsequently, the predictions from both base
models were stacked horizontally to create a new feature matrix. Each row of the
matrix contains the predictions made by both base models for a specific sample
in the validation set. The meta-learner, an Extreme Gradient Boosting (XGBoost)
Regressor, was then trained on the stacked predictions along with the corresponding
actual values. The meta-learner learns to effectively combine the predictions from
the base models to determine the final prediction. Finally, the meta-learner predicts
the target variable (Bitcoin price) using the stacked predictions obtained from the
test set. The predictions made by the meta-learner represent the final prediction of
the stacking model, as depicted in Fig. 9.
Strategy for Cross-Validation: Stacking generally includes a cross-validation
methodology to prevent overfitting.
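The steps described above (base predictions on a holdout set, a meta-learner fit on the stacked predictions, and a final combined prediction) can be sketched in miniature. The base models below are simple stand-ins for the trained Random Forest and Gradient Boosting regressors, and a coarse grid search over blend weights stands in for the XGBoost meta-learner:

```python
# Stand-ins for the two trained base models (assumed simple rules here).
base_rf = lambda x: 2.0 * x          # plays the role of Random Forest
base_gb = lambda x: 2.0 * x + 0.5    # plays the role of Gradient Boosting

# Step 1: collect both base models' predictions on a holdout set.
holdout_x = [2.0, 4.0, 6.0]
holdout_y = [4.2, 8.1, 12.2]
stacked = [(base_rf(x), base_gb(x)) for x in holdout_x]  # feature matrix

# Step 2: the meta-learner learns how to combine the stacked predictions;
# here a coarse grid search over blend weights stands in for XGBoost.
w_rf, w_gb = min(
    ((w / 10, 1 - w / 10) for w in range(11)),
    key=lambda w: sum((w[0] * a + w[1] * b - y) ** 2
                      for (a, b), y in zip(stacked, holdout_y)),
)

# Step 3: final prediction = meta-learner applied to the base predictions.
def stacked_predict(x):
    return w_rf * base_rf(x) + w_gb * base_gb(x)
```

By construction, the blended predictor does at least as well on the holdout set as either base model alone, which is the motivation for stacking given in the text.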
Figure 10 provides a more detailed view of the results, and Fig. 11 depicts the
differences between the actual and predicted prices.
Fig. 11 Stacking the difference between the actual and predicted prices
Averaging over bootstrap samples dampens the effect of individual trees' overfitting,
and bagging reduces the variance of individual Gradient Boosting Regressor models,
thereby increasing their robustness. Figure 12 depicts the basic workings of a bagging
model.
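The variance-reduction mechanism behind bagging can be sketched with a toy estimator; the data and base model below are illustrative assumptions, not the paper's setup:

```python
import random

random.seed(0)
# Toy data: y = 2x plus noise (illustrative, not the paper's dataset).
data = [(float(x), 2.0 * x + random.gauss(0, 1)) for x in range(1, 21)]

def fit_slope(sample):
    """Trivial base model: least-squares slope through the origin."""
    return sum(x * y for x, y in sample) / sum(x * x for x, y in sample)

def bagged_slope(data, n_models=30):
    """Bagging: fit the base model on bootstrap resamples, then average."""
    slopes = []
    for _ in range(n_models):
        boot = [random.choice(data) for _ in data]  # resample with replacement
        slopes.append(fit_slope(boot))
    return sum(slopes) / len(slopes)
```

Each bootstrap fit wobbles with the resampled data; averaging the fits cancels much of that wobble, which is the variance reduction the text describes.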
Figure 13 provides a more detailed view of the results, and Fig. 14 depicts the
differences between the actual and predicted prices.
Fig. 12 Bagging
Fig. 14 Bagging with difference between the actual and predicted prices
Voting, in the context of ensemble learning, means calculating the arithmetic mean of
the predictions given by several individual models. To get the final prediction, this
straightforward ensemble technique averages the predictions of each base model.
Furthermore, we tried altering the weights given to each model's predictions and
found that weights of 0.6 for Gradient Boosting and 0.4 for Random Forest were the
best combination. Figure 17 provides a more detailed view of the results, and Fig. 18
depicts the differences between the actual and predicted prices.
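The weighted voting step itself reduces to a one-line weighted average, using the weights reported above:

```python
def weighted_vote(pred_gb, pred_rf, w_gb=0.6, w_rf=0.4):
    """Weighted average of the two base models' next-day price predictions."""
    return w_gb * pred_gb + w_rf * pred_rf

# With equal weights (0.5, 0.5) this reduces to plain arithmetic-mean voting.
```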
5 Result Analysis
Fig. 16 Blending the difference between the actual and predicted prices
Fig. 18 Voting with difference between the actual and predicted prices
compared to the Random Forest model and by 16% compared to Gradient Boosting
on average. These computations further show that Voting (0.6, 0.4) performs better
than the other ensemble methods, with the lowest maximum error and mean squared
error and the highest R-squared value. Hence, a voting ensemble is the best approach
for predicting the next day's Bitcoin price.
6 Conclusion
References
1. Nakamoto S (2008) Bitcoin: A peer-to-peer electronic cash system. Decentralized Bus Rev
21260
2. Tomov YK (2019) Bitcoin: Evolution of Blockchain technology. In: 2019 IEEE XXVIII Inter-
national Scientific Conference Electronics (ET), Sozopol, Bulgaria, pp 1–4. [Link]
1109/ET.2019.8878322
3. Ma J, Zhu Y, Xu J, Li Y, Zhang Y, Wang J (2022) Research on Bitcoin price prediction based
on Support Vector Regression and its variant combination model. In: 2022 18th International
21. Qing Y, Sun J, Kong Y, Lin J (2022) Fundamental multi-factor deep-learning strategy for
cryptocurrency trading. In: 2022 IEEE 20th International Conference on Industrial Informatics
(INDIN), Perth, Australia, pp 674–680. [Link]
22. Nan L, Tao D (2018) Bitcoin mixing detection using deep autoencoder. In: 2018 IEEE Third
International Conference on Data Science in Cyberspace (DSC), Guangzhou, China, pp 280–
287. [Link]
23. Lyu H (2022) Cryptocurrency price forecasting: A comparative study of machine learning
model in short-term trading. In: 2022 Asia Conference on Algorithms, Computing and Machine
Learning (CACML), Hangzhou, China, pp 280–288. [Link]
2022.00054
24. Encean A-A, Zinca D (2022) Cryptocurrency price prediction using LSTM and GRU
networks. In: 2022 International Symposium on Electronics and Telecommunications (ISETC),
Timisoara, Romania, pp 1–4. [Link]
25. Khan MS, Bazai SU, Ghafoor MI, Marjan S, Ameen M, Shah SAA (2023) Forecasting cryp-
tocurrency prices using a gated recurrent unit neural network. In: 2023 International Conference
on Energy, Power, Environment, Control, and Computing (ICEPECC), Gujrat, Pakistan, pp 1–6.
[Link]
26. Li T (2022) Prediction of bitcoin price based on LSTM. In: 2022 International Conference
on Machine Learning and Intelligent Systems Engineering (MLISE), Guangzhou, China, pp
19–23. [Link]
Chapter 7
Examining Transfer Learning Models
to Classify Brain Tumors from MRI
Images: A Comparative Analysis
1 Introduction
88 U. Sachdev et al.
2. The work demonstrates the efficacy of these models in achieving high accuracy
in tumor classification, even with a relatively limited dataset of 5712 images.
3. It offers a comparative study of the models, highlighting the strengths and
weaknesses of each in terms of accuracy, precision, recall, and F1-score.
4. It suggests the application of these models in early tumor detection, which could
significantly improve patient outcomes.
The remainder of the paper is organized as follows: Sect. 2 provides a review of
the literature, while Sect. 3 outlines the research methodology, including details on
the dataset and its preprocessing, the evaluation metrics used for assessing model
performance, the experimental setup, and a concise overview of the transfer learning
methods employed. Section 4 discusses the obtained results, and Sect. 5 concludes.
2 Literature Review
Convolutional neural networks are widely used for image classification tasks because
their design is inspired by the biological neurons of the human visual cortex. As a
result, they learn the features responsible for accurate classification of images [4].
Abdalla and Esmail [5] proposed a method for brain tumor detection using arti-
ficial neural networks (ANNs). This method uses ANNs to classify normal and
tumorous MRI images of the brain. The method uses statistical feature analysis
to extract features from the images and detect tumors. Another automatic brain
tumor detection method was proposed by Pereira et al. [6] using CNNs with 3 ×
3 kernels. This architecture was used to classify brain MRI images from the BRaT
dataset. Because the dataset was very large (almost 7 GB), training the architecture
on multi-core GPU systems took 30 h.
Abd-Ellah et al. [7] proposed a two-step multi-model automatic brain tumor diag-
nosis where CNNs were used to classify MRI images into normal and abnormal
images. This method uses CNN models Alex-Net, VGG16, and VGG19, and the
method is tested on images from the RIDER neuro MRI database. This database
contains 349 images, including 240 healthy and 109 unhealthy images. In this study,
the CNN was trained with transfer learning because the dataset used was insufficient
to train a model from scratch.
Noreen et al. [8] outline a methodology for extracting and merging multi-level
features to facilitate early detection of brain tumors. The efficacy of this approach
is evaluated using two pre-trained deep learning models, namely Inception-v3 and
DenseNet201. Two distinct scenarios for brain tumor detection and classification
are explored utilizing these models. Khan et al. [9] propose an automated brain
tumor detection model employing pre-trained VGG16 and Inception-V3, utilizing
a dataset comprising 253 images, encompassing 155 tumor images and 98 healthy
images. However, the dataset size proves insufficient for fine-tuning the CNNs, and
the test dataset lacks adequacy for accurately assessing the model’s performance.
Amin et al. [10] introduce a brain tumor detection model utilizing VGG16 with the
BRaTs dataset, achieving 84% accuracy through transfer learning, and fine-tuning
over 50 epochs.
Srivastava et al. [11] introduce a dropout technique aimed at mitigating overfitting
in neural networks by randomly deactivating units and their connections. Dvorák et al.
[12] opt for convolutional neural networks as the learning algorithm due to their
aptitude in handling feature correlation. Their technique is validated on the public
BRATS2014 dataset, yielding state-of-the-art results for brain tumor segmentation
tasks with 254 multimodal volumes, each processed in only 13 s.
Irsheidat et al. [13] construct a model based on Artificial Convolutional Neural
Networks, which utilizes mathematical formulas and matrix operations to analyze
magnetic resonance images, predicting the likelihood of brain tumor presence.
Trained on MRI images of 155 normal and 98 tumorous brains, the model demon-
strates its predictive capability based on a collection of 253 magnetic resonance
images.
Sravya et al. [14] investigated brain tumor detection and presented some important
challenges and techniques. An automated brain tumor detection system was proposed
and studied by Dipu et al. [15] using the YOLO model and the deep learning library
FastAi with the BRATS 2018 dataset, which contained 1,992 MRI scans of the
brain. The authors achieved 85.95% accuracy for YOLO and 95.78% for the FastAi
classification model. A brain tumor detection application was proposed by Gaikwad
et al. [16] to classify MRI images as tumorous and non-tumorous using the VGG16
model. The authors used the Kaggle dataset for training and showed an improvement
in accuracy.
Monirul et al. [17] utilized four transfer learning architectures—InceptionV3,
VGG19, DenseNet121, and MobileNet—with a dataset sourced from three bench-
mark databases, figshare, SARTAJ, and Br35H, to validate the models. These
databases encompass four classes: pituitary, no tumor, meningioma, and glioma.
Image augmentation techniques were employed to ensure class balance. Exper-
imental findings indicate that MobileNet surpasses other methods, achieving an
accuracy of 99.60%.
Kasi et al. [18] examined various deep transfer learning methods, including
InceptionResNet-V2, ResNet50, MobileNet-V2, and VGG16, to determine the
optimal model for detecting brain tumors from a publicly available MRI dataset.
Additionally, CLAHE was applied as an image enhancement technique to enhance
the quality of the image dataset before serving as input for the models. Consequently,
the proposed approach achieved a prediction accuracy of up to 100%.
Anirudh et al. [19] explore the application of transfer learning in detecting brain
tumors using publicly available MRI images of the brain. Initially, they train a deep
learning model on a large dataset of natural images (ImageNet), a task facilitated by
Keras. They then fine-tune this model using a smaller dataset containing brain MRI
images. A comparison between this approach and traditional deep learning methods
indicates that transfer learning succeeds at comparable performance with a signifi-
cantly reduced dataset size. This research underscores the effectiveness of transfer
learning in training deep learning models for brain tumor detection, particularly in
7 Examining Transfer Learning Models to Classify Brain Tumors … 91
scenarios where data availability is limited. Such an approach shows promise for
improving brain tumor detection in clinical settings.
Vinod et al. [20] assessed three fundamental models in computer vision: AlexNet,
VGG16, and ResNet-50. The outstanding performance of both the VGG16 and
ResNet-50 models prompted their integration into a novel hybrid VGG16-ResNet-
50 model. This model was then employed on the dataset, resulting in remarkable
accuracy, sensitivity, specificity, and F1 score. Comparative analysis with alterna-
tive models indicates that the proposed framework demonstrates a high degree of
reliability in effectively identifying various cerebral neoplasms.
Reviewing literature reveals the promising prospects of Convolutional Neural
Networks (CNNs) in accurate image classification tasks. CNNs’ adeptness in feature
extraction using kernels and feature maps contributes to dimension reduction in
inputs, thereby enhancing model efficiency within time and memory constraints.
Insufficient training data often leads to diminished accuracy. In such scenarios,
transfer learning emerges as a potent technique, requiring fewer computations and
parameters compared to training from scratch, thereby enabling higher accuracy even
with limited data. Additionally, fine-tuning stands as a valuable strategy to enhance
model accuracy, particularly when dealing with classification problems differing
from the source of transfer learning.
3 Research Methodology
This section outlines the comprehensive process flow employed for assessing the
transfer learning models covered in this research. In addition, it also presents a brief
description of the dataset used for model training and the evaluation metrics adopted
to assess the model performance.
The investigations conducted in this study were performed using a publicly available
brain tumor dataset acquired from Kaggle. This research studies the performance of
four transfer learning models to classify three different types of brain tumors namely
Glioma, Meningioma and Pituitary, and normal cases, i.e., no tumor.
The distribution of data within each class of training and testing datasets is
illustrated in Fig. 1.
To standardize the dataset, images of varying dimensions were uniformly resized to
200 × 200 pixels, a fundamental step for implementing the proposed method. To
address the common issue of noise in MRI images, cropping techniques were used to
refine the images, focusing on the regions of interest for training. Furthermore, to
augment the dataset and increase its diversity, random transformations such as
rotation and flipping were applied.
Fig. 1 Class distribution of the training and testing (unseen) datasets: Glioma 1321/300,
Meningioma 1339/306, Pituitary 1457/300, No Tumor 1595/405
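The resizing and augmentation operations act on pixel grids; a pure-Python sketch on a toy image follows (an actual pipeline would typically use PIL or NumPy, which is an assumption about tooling here):

```python
def flip_horizontal(img):
    """Mirror each row of a 2-D pixel grid."""
    return [row[::-1] for row in img]

def rotate_90(img):
    """Rotate a 2-D pixel grid 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def resize_nearest(img, h, w):
    """Nearest-neighbour resize, the simplest analogue of resizing to 200x200."""
    src_h, src_w = len(img), len(img[0])
    return [[img[r * src_h // h][c * src_w // w] for c in range(w)]
            for r in range(h)]
```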
The assessment of the proposed brain tumor detection methods hinges on four pivotal
performance metrics: True Positives (TP), True Negatives (TN), False Positives
(FP), and False Negatives (FN). These metrics form the foundation for gauging
the effectiveness of the experimental method.
1. Accuracy (ACC) serves as a metric indicating the model's effectiveness in accurately
identifying brain tumors. It is calculated as the ratio of correctly identified instances
to the total number of instances, as depicted in Eq. (1):

   Accuracy = (TP + TN) / (TP + TN + FP + FN)    (1)

2. Precision indicates the correctness of positive predictions, and signifies the
percentage of correctly identified positive instances among those projected as
positive. This measure is determined by the formula represented in Eq. (2):

   Precision = TP / (TP + FP)    (2)

3. Recall or sensitivity measures the proportion of accurately classified instances
within each classification category. This metric is calculated according to the formula
depicted in Eq. (3):

   Recall = TP / (TP + FN)    (3)

4. F1-score is calculated as the harmonic mean of precision and recall. Its calculation
is based on Eq. (4) as provided below:

   F1-score = 2 × TP / (2 × TP + FP + FN)    (4)
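Equations (1) to (4) translate directly into code, as the following minimal sketch shows:

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, and F1-score from the four counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return accuracy, precision, recall, f1
```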
This section presents a brief description of the four transfer learning models employed
in this study.
3.4.1 InceptionResNet-V2
followed by a dropout layer. The final dense layer has 4 units, corresponding to the
classes in the classification.
3.4.2 MobileNet
3.4.3 ResNet50
3.4.4 VGG16
4.1 InceptionResnet-V2
4.2 MobileNet
The evaluation metrics obtained by MobileNet are shown in Table 2. The model
demonstrates strong performance in all classes, with the No_tumor class showing
high scores, indicating effective identification. The Glioma, Meningioma, and Pitu-
itary classes also have high scores, contributing to a balanced overall performance.
4.3 ResNet50
4.4 VGG16
The metrics obtained by the model are shown in Table 4. The model shows good
performance across all classes, with high precision, recall, and F1-score values. The
No_tumor class stands out with perfect precision and a high F1-score, indicating that
the model is very accurate in identifying No_tumor cases. The Glioma class has a
lower recall, suggesting that the model might have missed some instances of Glioma.
The Pituitary class has a lower precision, indicating a higher false positive rate.
The comparison of the four models based on their testing accuracy is presented
in Table 5.
Comparative Insights
Table 5 Comparison of proposed models based on their testing accuracies

Model                 Test accuracy (%)
InceptionResNet-V2    98.78
MobileNet             98.25
ResNet50              98.78
VGG16                 96.26
• InceptionResNet-V2 and ResNet50 emerge as the top performers with equal test
accuracy, indicating their superior capability in brain tumor classification.
• While slightly less accurate, MobileNet’s performance is commendable, espe-
cially considering its potentially lower computational demands, making it suitable
for resource-constrained environments.
• Despite its lower accuracy, VGG16 could be a viable option in scenarios with
specific requirements and computational constraints.
• The comparable accuracies of ResNet50 and InceptionResNet-V2 highlight
the effectiveness of transfer learning, particularly in limited dataset scenarios,
achieving higher accuracy more rapidly than training models from scratch.
• The implementation of fine-tuning alongside transfer learning allows for model
weight adjustments to better fit specific problems, enhancing accuracy levels.
Chapter 8
Supervised Machine Learning
for Recognition of Gujarati Handwritten
Characters with Modifiers
1 Introduction
One of the primary problems in pattern recognition and image processing is hand-
written character recognition (HCR) [1]. One method that can turn text included in a
digital image into editable text is optical character recognition. It makes use of optical
systems to enable character recognition in machines. Ideally, the OCR’s output and
the formatting input should match. The procedure entails pre-processing the image
file and then learning crucial information regarding written text [2].
Character recognition can be divided into two primary categories: online and
offline. An offline character recognition system generates a document first, scans it
optically, digitises it, stores it on a computer, and then takes it for processing and
testing. In online systems, by contrast, characters are handled while they are being
created, so the pattern's points depend on various factors such as time, pressure,
speed, slant, and strokes [3, 4].
A system that transforms incoming text into a machine-encoded format is called
optical character recognition (OCR) [5]. These days, optical character recognition
(OCR) aids in both the digitisation of handwritten mediaeval manuscripts [6] and the
conversion of typewritten materials into digital format [7]. Because of this, obtaining
the necessary information is now simpler because it is no longer necessary to sift
through mountains of papers and files in order to find it. Organisations are meeting
demands for legal records [8], historical data [9], educational persistence [10], and
other digital preservation needs.
It is necessary to create optical character recognition in order to recognise
various languages. The language spoken in Gujarat is called Gujarati. There are
many handwritten Gujarati documents available. To convert handwritten documents
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025 99
H. Mittal et al. (eds.), Proceedings of International Conference on Paradigms of
Communication, Computing and Data Analytics, Algorithms for Intelligent Systems,
[Link]
100 S. Shukla and P. Tanna
2 Related Work
Kajale [11] trained a model with 36 alphanumeric characters and special symbols
such as . , ' " ? / \ | ! # across 200 samples. Models can be trained using a supervised
machine learning approach. The model first identifies whether a character is
handwritten or printed, then matches the character ID against the corresponding
handwritten or printed character dataset, achieving an accuracy of 90% in
recognising nameplates written in English.
Ehsan Shirzadi [12] implemented LVQ as a supervised machine learning
algorithm to identify lower-case machine-written English characters. Within the field
of machine learning and artificial neural networks, the Learning Vector Quantisation
(LVQ) algorithm can be used to solve supervised classification problems [13]. The
researcher used Matlab as a platform and, after applying LVQ, achieved near-100%
accuracy.
In 2014, Mahajan et al. [14] designed and demonstrated an optical correlator for
character recognition using a neural network architecture.
for an optical character recognition (OCR) system trained using the back propagation
algorithm, where each typed English character is encoded as a binary value. These
binary numbers are entered into a feature extraction system. The ANN is then given
both the system’s output and its input. After implementation of the Feed Forward
8 Supervised Machine Learning for Recognition of Gujarati Handwritten … 101
now able to create algorithms and strategies that can more accurately recognise hand-
written manuscripts, thanks to advancements in machine learning and deep learning
[21].
Researchers have also increasingly used Convolutional Neural Networks (CNNs)
to recognise handwritten and machine-printed characters, because recognition tasks
whose input is an image are a good fit for CNN-based systems. The ImageNet Large
Scale Visual Recognition Challenge (ILSVRC) is one example of an image problem
to which CNNs were first applied [21]. For
visual recognition tasks, some of the popular CNN-based designs are AlexNet [22],
GoogLeNet [23], and ResNet [24].
3 Classification Process
3.2 Segmentation
Each character image is segmented, stored in a labelled folder, and then converted
to a grayscale image. An RGB image is composed of pixels combining three colour
channels, which makes it heavy to load; converting it to grayscale yields an image
with fewer dimensions, which is easier to use in the system.
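The RGB-to-grayscale conversion can be sketched with the standard ITU-R BT.601 luminance weights (an illustrative NumPy version; the chapter does not name the library it used):

```python
import numpy as np

def to_grayscale(rgb):
    """Convert an (H, W, 3) RGB image to an (H, W) grayscale image
    by a weighted sum of the three colour channels."""
    weights = np.array([0.299, 0.587, 0.114])  # BT.601 luminance weights
    return rgb @ weights
```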
Machine learning is the process of enabling machines to learn from data and predict
results based on the classification used. Machine learning allows computers
to learn automatically from data and generate predictions. One of the two main
categories of machine learning is supervised learning, which enables a model to
forecast future results after it has been trained using historical data. In supervised
learning, the model is trained using pairs of input and output or labelled data with the
aim of generating a function that is sufficiently approximated to be able to predict
outputs for given inputs when they are introduced [26].
Supervised machine learning learns the relationship between input and output.
If features are extracted from the input and compared with the training dataset,
then a prediction can be made. There are many algorithms using which we can
implement supervised machine learning, like Linear Regression, Support Vector
Machine (SVM), Decision Tree, Random Forest, K-Nearest Neighbour (kNN), Naive
Bayes, Gaussian Naive Bayes, and Logistic Regression.
For the presented research study, the dataset is ready, and it is converted to a
numpy array. The prepared numpy array can be used directly as input for model
building. The following steps are performed to prepare a model:
• Step 1: Convert the dataset into a numpy array of data and label.
• Step 2: Load data and target. [[Link]() helps load data from a numpy array.]
• Step 3: Data Preprocessing. [All the elements of the array are checked; if it finds
unlabeled data, it will be deleted from the array.]
• Step 4: Train and test split. [x_train, x_test, and y_train, y_test are generated. In
the study, 80% of the data is used as training, 10% for validation, and 10% for
testing.]
• Step 5: Reshape the arrays x_train and x_test. [This can be used to flatten an
array.]
• Step 6: Setting Hyperparameters for Hyperparameter Tuning. [Grid search can be
used to set the hyperparameters.]
• Step 7: Building a Model and Evaluation.
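Steps 1–5 above can be sketched in plain NumPy (an illustrative sketch: the 80/10/10 split ratio comes from the text, while the function name and array shapes are assumptions):

```python
import numpy as np

def split_dataset(data, labels, seed=0):
    """Shuffle, split 80/10/10 into train/validation/test, and flatten."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(data))               # shuffle the samples
    n_train = int(0.8 * len(data))
    n_val = int(0.1 * len(data))
    train, val, test = np.split(idx, [n_train, n_train + n_val])
    # Step 5: reshape each image into a flat feature vector
    x_train = data[train].reshape(len(train), -1)
    x_val = data[val].reshape(len(val), -1)
    x_test = data[test].reshape(len(test), -1)
    return (x_train, labels[train]), (x_val, labels[val]), (x_test, labels[test])
```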
The process of determining the ideal collection of hyperparameters for a machine
learning model to achieve optimal performance is called hyperparameter tuning.
Decision Trees are a type of supervised learning algorithm: they learn from a
labelled dataset, connecting the input features to the target labels. [Link] provides
DecisionTreeClassifier(), which can be used to implement a decision tree
classification algorithm. We use the max_depth, min_samples_leaf, and
min_samples_split parameters as hyperparameters and achieve an accuracy of
64.7%. The classification report is presented in Table 1.
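A hedged sketch of this setup with scikit-learn (the parameter-grid values and the stand-in digits dataset are illustrative assumptions, not the study's actual data or values):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in data; the study uses handwritten Gujarati character images.
X, y = load_digits(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Grid search over the three hyperparameters named in the text.
param_grid = {
    "max_depth": [10, 20, None],
    "min_samples_leaf": [1, 5],
    "min_samples_split": [2, 10],
}
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=3)
search.fit(x_train, y_train)
test_accuracy = search.score(x_test, y_test)
```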
Gaussian Naive Bayes is a special version of the Naive Bayes algorithm, and it is
intended for data whose features are thought to have a Gaussian (normal) distribution.
It is particularly well-suited for classification tasks involving continuous features.
This algorithm gives an accuracy of 34%. The classification report is described
in Table 3.
3.10 KNN
The Random Forest algorithm trains a large number of decision trees and
aggregates their results: the class given by the mode of the individual trees'
predictions is returned. The stability and efficacy of the random forest method make
it one of the most widely used and adaptable ones. We use the max_depth,
min_samples_leaf, and min_samples_split parameters as hyperparameters and
achieve an accuracy of 77%. The classification report is presented in Table 7.
All the machine learning algorithms give different accuracies for our study. Table 8
presents a comparison of the accuracies of all the supervised machine learning
algorithms. The comparison shows that maximum accuracy is achieved using the
Random Forest and Support Vector Machine algorithms; Fig. 2 compares the
implemented classification algorithms.
4 Conclusion
Using eight different classification algorithms, we find that the Random Forest and
Support Vector Machine algorithms perform the best for our needs; the other
algorithms do not deliver satisfactory results. The three classes of modifiers in the
set of handwritten Gujarati letters can be identified with 75% and 77% accuracy
using SVM and Random Forest, respectively.
This study focuses on three Gujarati character classes. In Gujarati, there are a
total of 476 classes that combine consonants with modifiers. It took three to four
hours to train and assess the model in the experiment that was given. The accuracy
will decrease, and the training duration of the model will increase with the number
of classes. This dataset is likewise constrained. It is possible to increase the dataset
by acquiring more data from individuals. We can employ a deep learning algorithm
with a larger dataset and a large number of classes for classification.
References
25. Suthar SB, Thakkar AR (2023) Dataset generation for Gujarati language using handwritten
character images. PREPRINT (Version 1) available at Research Square. [Link]
21203/[Link]-3041349/v1
26. Sen PC, Hajra M, Ghosh M (2020) Supervised classification algorithms in machine learning:
a survey and review. In: Mandal J, Bhattacharya D (eds) Emerging technology in modelling
and graphics. Advances in intelligent systems and computing, vol 937. Springer, Singapore.
[Link]
Chapter 9
Gujarati Handwritten Conjunct
Consonant Recognition Using Deep
Learning
1 Introduction
The Gujarati script is derived from the Devanagari script. Over 50 million individuals,
primarily Gujarati people from Gujarat, India, and around the world, speak
Gujarati. Today, the majority of document management systems in government
buildings are primarily text-based. There is a vast quantity of publications and books
available in printed or scanned formats. A solution is needed that can accurately
identify characters in scanned paper documents. An optical character recognition
system is a computer system that can recognize character types from an image or
document and process them automatically.
Researchers are focusing on developing several OCR models for different
languages. Utilizing Artificial Neural Networks for pattern and image recognition
can enable the creation of a self-learning model that can identify raw scanned images
based on existing models.
Handwritten words encompass a range of writing styles, dimensions, and curves,
making interpretation challenging. Performing OCR in the Gujarati language is
challenging because sensitivity to small variations in writing style can lead to
inaccurate character identification.
114 R. Chaudhari and P. Tanna
2 Literature Review
3 Proposed Method
Deep learning techniques rely on Artificial Neural Networks (ANN) that mimic the
brain’s information processing to offer a self-learning representation feature. Deep
learning, similar to a trained robot with self-learning, utilizes several algorithms to
construct models. Deep learning models can autonomously identify and prioritize
relevant features with minimal programmer intervention. During the training process,
the algorithm uses unfamiliar data as input to categorize objects, perform feature
extraction, and provide meaningful information. Deep learning models provide both
feature extraction and classification as shown in Fig. 1.
When dealing with a high volume of inputs as well as outputs, Deep Learning
can be utilized. It is used to imitate human behavior. Deep Learning is executed by
Neural Networks, which are composed of Neurons. Deep learning models consist of
several algorithms, and it is essential to choose the best suitable one for a specific
task.
CNN is a deep learning algorithm specifically created for processing data with a grid-
like layout, like photographs. It is created using the visual brain structure of animals
to gather spatial data hierarchies that include basic and complex patterns. CNN is a
mathematical model consisting of three main types of layers: convolution, pooling,
and fully connected layers, as shown in Fig. 2. The
initial two layers, convolution and pooling, identify features, whereas the third layer,
a layer that is completely interconnected, transforms these characteristics into the
ultimate output, like classification. A convolution layer is a fundamental component
of CNNs that consists of a series of mathematical operations, including convolution,
which is a specific sort of linear operation.
The CNN structure uses Forward propagation as the process of transforming input
data into output across layers, while backward propagation is the reverse process.
9 Gujarati Handwritten Conjunct Consonant Recognition Using Deep … 117
A pooling layer reduces the size of feature maps, creating translation invariance to
small shifts and distortions, and reducing the number of trainable parameters in subse-
quent layers. The layers of pooling do not possess any trainable parameters. The filter
size, stride, and padding are hyperparameters used in pooling procedures, similar to
convolution processes. Max pooling is a frequently utilized pooling technique that
retains just the maximum value inside the specified kernel size.
Max pooling: Max pooling is the most commonly used pooling procedure. It
involves selecting patches from the provided feature maps, identifying the highest
value for each patch, and discarding the rest.
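Max pooling as described can be sketched in NumPy (an illustrative 2 × 2, stride-2 version; the actual kernel size used in the model is not specified):

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2: keep only the maximum of each
    non-overlapping 2x2 patch. H and W are assumed even."""
    h, w = feature_map.shape
    patches = feature_map.reshape(h // 2, 2, w // 2, 2)
    return patches.max(axis=(1, 3))
```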
The final convolutions or pooling layer’s output feature maps are usually flattened into
a one-dimensional array of numbers and then linked to a number of fully connected
layers, also called dense layers, where each input is linked to every output through a
trainable weight.
The suggested model, which is depicted in Fig. 3, explains how the data is processed
and which technique is applied for recognition using the stages that are listed below.
Steps:
(1) In this paper, we use a custom database in which we create more than 960 classes
and more than 15 k images in the lab.
(2) After loading the data, apply normalization and pre-processing to remove noise
and filter the images.
(3) Divide the data into training, testing, and validation sets for the CNN-based
classification approach.
(4) Data augmentation is a method that creates additional training information by
applying different modifications to the images already present in the dataset.
ImageDataGenerator is used for the image augmentation approach, scaling
and reshaping images as required.
(5) The CNN-based EfficientNetB3 architecture is used for classification, and
accuracy is calculated.
and its parameters were created and implemented in the proposed model, along with
other parameters that affect the performance of the system's final outcome.
Figure 5 shows the model result for each epoch: the number of epochs, the accuracy
achieved after running, and graphs of time, loss, and accuracy against epochs.
According to the result, the model stops at epoch 5 because early stopping is used,
so there is no risk of over-fitting. The training and validation accuracy and loss
curves are shown in Fig. 6.
We achieved superior performance by implementing the proposed model using
CNN with EfficientNetB3, resulting in high precision and recall in classifying all
tested input samples. Figures 4, 5, 6 and 7 display the parameters from Table 2 used
to execute the model, leading to improved outcomes. Table 2 shows classification
reports for some of the conjunct consonants from the 960 classes, with precision,
recall, F1-score, and support obtained from the proposed model.
4 Conclusion
This research showcases the application of a deep learning model for identifying
handwritten Gujarati joint characters. The main objective was to develop algorithms
for identifying joint characters in handwritten Gujarati text and then evaluating their
results. Thorough research is conducted on identifying handwritten joint characters.
The models for implementation have been chosen accordingly. To achieve the goals,
a dataset consisting of 80,000 photos was developed first. The dataset was divided
into training, testing, and validation sets. The photographs’ dimensions were altered
according to the requirements of the algorithms. The CNN-based EfficientNetB3
model was used to test and validate handwritten Gujarati joint characters, achieving
an accuracy of over 83.88%.
Chapter 10
Multimodal Deep Learning for Enhanced
Prediction of Molecular Binding
Affinities Integrating Chemical
Structures and Protein Sequences
1 Introduction
Deep learning was utilized in the Improved Molecular Binding Affinity technique to
evaluate the binding strength between protein sequences and drug structures. This
approach took into account protein sequences represented as amino acid sequences
and drug structures expressed as SMILES. The term “drug structure” describes the
molecular makeup and atomic arrangement of a pharmaceutical molecule.
This data is critical for the design, optimization, and prediction of drug action in
human tissue, including bioavailability, efficacy, and potential side effects. SMILES,
or the Simplified Molecular Input Line Entry System, is a human-readable notation
system that uses ASCII letters to describe chemical structures. A number of characters
were used to represent bonds; these characters represented ring closures, branches,
and single, double, or triple bonds. The atomic symbols were used to identify the
elements. SMILES may have applications in cheminformatics, bioinformatics, and
chemistry. In the SMILES analysis of imatinib in the DAVIS dataset, character-level
tokenization is applied to the provided SMILES sequence using the SMILES notation
CN1CCN(CC1)C2…. This can only be accomplished by breaking the sequence up
into distinct tokens, which yield tokens such as “C,” “N,” “1,” “C,” “C,” “N,” and
126 L. Prasika et al.
A. Dataset
Two datasets were used to evaluate the performance of the deep learning framework:
Davis and KIBA. These datasets provided a collection of drug-target pairs with
experimentally or computationally determined binding affinities.
10 Multimodal Deep Learning for Enhanced Prediction of Molecular … 127
The first dataset was DAVIS, which captures the interactions between 68 kinase
inhibitors and 442 kinases, covering over 80% of the human catalytic protein kinome.
It consisted of 30,056 Drug Target Interaction pairs as shown in Table 1. The binding
affinity between kinase inhibitors and protein kinases was to be predicted. The dataset
includes information about the target amino acid sequence and compound SMILES
string. This dataset was a valuable resource for developing models in drug discovery,
particularly for understanding and predicting kinase inhibitor interactions.
The second dataset is the KiBA (Kinase inhibitor Bio-Activity) dataset. Kinase
is a biocatalyst enzyme which transfers the phosphate group from ATP to a specific
molecule. The dataset consisted of 118,257 drug-target interaction pairs involving
2,111 drugs and 229 proteins. The goal was to predict the binding affinity between
drugs and target proteins based on their amino acid sequence or compound SMILES
string. The dataset contained a lot of information about how kinase inhibitors
(compounds that influence proteins) interact with proteins. The dataset helped models
to predict how well certain drugs might work in the process of discovering new
medicines.
B. Concatenation and Fusion
Various features obtained from SMILES and protein sequence processing were
successfully merged using the concatenation approach. Accurate representation of
the interactions between chemical and biological components was achieved by this
methodical integration. Molecular structures and protein compositions interact intri-
cately, and the model could comprehend these connections by creating a feature
map. This combination allowed the model to function with both kinds of data, which
Table 1 Datasets

Dataset  Drugs  Proteins (targets)  Interactions
DAVIS    68     442                 30,056
KiBA     2,111  229                 118,254
improved results and increased the accuracy of binding affinity predictions—a crit-
ical component of the deep learning-based Improved Molecular Binding Affinity
technique.
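The concatenation step amounts to joining the feature vectors produced by the SMILES branch and the protein branch into one fused vector (an illustrative NumPy sketch; the 128- and 256-dimensional sizes are assumptions):

```python
import numpy as np

def fuse_features(drug_features, protein_features):
    """Concatenate drug- and protein-derived feature vectors into a
    single fused representation for the affinity predictor."""
    return np.concatenate([drug_features, protein_features])

drug_vec = np.random.rand(128)     # e.g. output of the SMILES branch
protein_vec = np.random.rand(256)  # e.g. output of the protein branch
fused = fuse_features(drug_vec, protein_vec)
```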
In the early stages of SMILES processing, sequences and chemical structures in the
Simplified Molecular Input Line Entry System (SMILES) underwent the necessary
processing steps. The SMILES strings were divided into separate tokens using a
character-level tokenizer. A structured numerical representation of the chemical data
was then produced by numerically converting these tokens. The SMILES model’s
capacity to identify minute patterns within the chemical structure was improved
by the addition of embedding layers, which encode associations between various
characters.
Protein sequences were processed in parallel in the same way. They were split into individual tokens with a character-level tokenizer, and the tokens were then converted to integers, producing an ordered numerical representation of the protein data. Embedding layers encoded the connections between the different characters in the protein sequences, enhancing the model's capacity to identify subtle patterns within them.
C. Neural Network Architecture
Using a deep learning framework, the design started with two input layers, each representing a distinct kind of data, in order to improve the molecular binding affinity method. In the SMILES sequence processing branch, the initial stage was tokenization and numerical conversion using a character-level tokenizer. The
SMILES strings were divided into individual characters by this procedure, and
those characters were subsequently numerically encoded to provide a structured
representation. The correlations between various characteristics were then recorded
using an embedding layer, which produced a numerical depiction of the chemical
data. Character-level tokenization created tokens like ‘C’, ‘1’, ‘=’, ‘C’, and so on
in the SMILES sequence “C1=CC2=C(C=C1C3=NC(=NC=C3)N)NN=C2N,” for
example. A predefined mapping was then applied to each character to transform it
to a numerical representation.
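The character-level tokenization and integer encoding described above can be sketched as follows; the vocabulary built here is a hypothetical stand-in for the paper's predefined mapping.

```python
# Character-level tokenization of the SMILES string quoted in the text,
# then integer encoding; the vocabulary below is a hypothetical stand-in
# for the paper's predefined mapping (0 is reserved for padding).
smiles = "C1=CC2=C(C=C1C3=NC(=NC=C3)N)NN=C2N"

tokens = list(smiles)  # character-level tokens: 'C', '1', '=', ...
vocab = {ch: i + 1 for i, ch in enumerate(sorted(set(tokens)))}
encoded = [vocab[ch] for ch in tokens]

print(tokens[:5])   # ['C', '1', '=', 'C', 'C']
print(encoded[:5])
```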
Tokenization was also performed at the character level on the protein sequence MARTTSQLYDAVP….. This sequence was split into discrete tokens such as “M,” “A,” “R,” and so on, and the tokens were then numerically encoded using a predefined mapping. Conv1D layers processed the embedded protein and SMILES sequences in
numerous ways. They contributed to the prediction outcome by preserving translation
invariance for pattern recognition, identifying local patterns in sequential data, and
acting as filters to extract advanced features. Conv1D layers also improved training
efficiency by reducing dimensionality through pooling operations.
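A minimal NumPy sketch of what a Conv1D layer does to an embedded sequence: sliding filters that detect local patterns, followed by max pooling to reduce dimensionality. The shapes and filter counts here are illustrative, not the paper's.

```python
import numpy as np

def conv1d(x, kernels):
    """Valid 1D convolution: x is (seq_len, emb_dim),
    kernels is (n_filters, width, emb_dim)."""
    n_filters, width, _ = kernels.shape
    out_len = x.shape[0] - width + 1
    out = np.zeros((out_len, n_filters))
    for i in range(out_len):
        window = x[i:i + width]  # local pattern seen by every filter
        out[i] = np.tensordot(kernels, window, axes=([1, 2], [0, 1]))
    return out

rng = np.random.default_rng(0)
embedded = rng.normal(size=(40, 8))    # toy embedded sequence
filters = rng.normal(size=(16, 4, 8))  # 16 filters of width 4
features = conv1d(embedded, filters)
pooled = features.max(axis=0)          # max pooling reduces dimensionality
print(features.shape, pooled.shape)    # (37, 16) (16,)
```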
In conjunction with BiLSTM layers, they yielded a thorough comprehension of the input sequences. Processing the SMILES and protein sequences used bidirectional long short-term memory (BiLSTM) networks. BiLSTM captured long-range relationships and affiliations in sequential data and successfully represented patterns and contextual dependencies within the sequences, enabling
10 Multimodal Deep Learning for Enhanced Prediction of Molecular … 129
A. Evaluation Criteria
Reliability and accuracy of a model that predicts affinity relations for molecular
interaction are critical to computational molecular biology and drug development.
This section describes the assessment criteria used to assess the model’s execution. To
increase computing efficiency and quicken the assessment process, GPU acceleration
is used. In order to quantify accuracy, mean absolute error, or MAE, calculates the
average absolute difference between actual and projected affinity connections.
A lower MAE indicates that forecasts and the ground truth are more closely aligned. Root Mean Squared Error (RMSE), derived from the MSE, is presented as a metric that can be interpreted in the same units as the predicted variable. A lower RMSE implies increased accuracy and dependability.
The R-squared (R²) score measures the percentage of binding affinity variance that the model explains. A greater R² denotes a better fit, indicating that the model can adequately describe and capture the variation in molecular affinity. Mean Squared Error (MSE) quantifies the squared average difference and highlights the impact of larger errors, offering a thorough view of overall efficiency.
Better accuracy and dependability are indicated by a lower MSE. The Concordance Index (CI), a popular statistic used specifically for drug-target interaction prediction, evaluates how well the model can rank possible drug-target pairings. This section assesses the dependability of CI values in predicting drug-target affinity using comparisons between two widely used datasets, DAVIS and KIBA.
Through these assessment criteria, we obtain a solid grasp of the model's dependability for drug discovery and development, covering the accuracy, precision, and overall efficacy of its binding affinity predictions. In our study, we use these assessment measures to see how well the prediction models perform in regression. For regression we employ the commonly used Mean Squared Error (MSE), a metric that expresses how far actual values differ from predicted values.
MSE = (1/a) Σ_{c=1}^{a} (E_c − Ê_c)²    (1)

The MSE, as in (1), is calculated as the average of the squared differences across all a samples, where E_c denotes the actual value and Ê_c the predicted value of sample c. A smaller MSE signifies heightened accuracy in our regression predictions.
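The regression metrics above can be computed directly; the affinity values here are toy numbers for illustration only.

```python
import numpy as np

# Toy actual (E_c) and predicted affinity values, illustrative only.
actual = np.array([5.0, 6.2, 7.1, 4.8])
pred = np.array([5.3, 6.0, 6.8, 5.0])

mse = np.mean((actual - pred) ** 2)   # Eq. (1)
mae = np.mean(np.abs(actual - pred))  # mean absolute error
rmse = np.sqrt(mse)                   # same units as the target
r2 = 1 - np.sum((actual - pred) ** 2) / np.sum((actual - actual.mean()) ** 2)
print(round(mse, 4), round(mae, 4), round(rmse, 4))  # 0.065 0.25 0.255
```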
CI = (1/Z) Σ_{∂s > ∂t} W(b_s − b_t)    (2)

The Concordance Index (CI) is calculated as in (2) by determining whether the predicted affinity scores match the actual values in the same order. The formula incorporates ∂s and ∂t as affinity scores with ∂s > ∂t, b_s as the predicted value associated with the larger affinity score ∂s, b_t as the predicted value associated with the smaller affinity score ∂t, Z as a normalization term over the comparable pairs, and W(u) as a piecewise function defining thresholds for alignment as in (3).
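A minimal sketch of the Concordance Index, assuming the commonly used step weights W(u) = 1 for u > 0, 0.5 for u = 0, and 0 for u < 0, with normalization by the number of comparable pairs (the exact W in (3) is not shown in this excerpt).

```python
def concordance_index(actual, pred):
    """CI over all comparable pairs (actual[s] > actual[t], i.e. ∂s > ∂t):
    W(u) = 1 for a correctly ordered pair, 0.5 for a tie, 0 otherwise."""
    num, den = 0.0, 0
    n = len(actual)
    for s in range(n):
        for t in range(n):
            if actual[s] > actual[t]:  # comparable pair
                den += 1
                if pred[s] > pred[t]:
                    num += 1.0         # correctly ordered
                elif pred[s] == pred[t]:
                    num += 0.5         # tie in the predictions
    return num / den

actual = [5.0, 6.2, 7.1, 4.8]
pred = [5.3, 6.0, 6.8, 5.0]
print(concordance_index(actual, pred))  # 1.0 (predictions preserve the order)
```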
These assessment criteria function as strong predictors of our models’ predic-
tive success, offering insightful information on the precision, accuracy, and ranking
stability of our methodology in the complex field of drug-target interaction predictions. With a CI of 0.832 and an MSE of 0.155, this model achieves good concordance and reasonable accuracy in binding affinity prediction.
B. Settings of Hyperparameters
Optimizing the performance of the model is an essential part of enhancing its operation, requiring accurate adjustments to its configuration. The values chosen for these settings, listed in Table 2, have a substantial impact on the nature of the learning procedure and, as a result, on the neural network's predictive power.
A batch size of 256 was selected for all training and validation stages. This configuration guaranteed representative samples for model updates while optimizing memory use and processing speed. The model was trained for a total of 100 epochs, a choice based on a careful balance between preventing overfitting and achieving an acceptable fit. Dropout, a regularization technique, was applied at a rate of 0.1; by randomly omitting a portion of nodes during training, it helps to mitigate overfitting.
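The hyperparameters reported in this section can be collected in a single configuration; the dict keys below are illustrative names, not the paper's code.

```python
# Hyperparameters reported in this section, collected in one place;
# the dict keys are illustrative names, not the paper's code.
config = {
    "batch_size": 256,                 # training and validation batches
    "epochs": 100,
    "dropout": 0.1,                    # regularization rate
    "embedding_dim": 128,              # SMILES and protein embeddings
    "conv_filter_widths": [4, 6, 8],   # three Conv1D layers per branch
    "dense_units": [1024, 1024, 512],  # fully connected head
}
print(config["dense_units"])  # [1024, 1024, 512]
```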
The embedding layers for the SMILES and biological sequences were set to 128 dimensions, a decision that strikes a balance between computing efficiency and depth of representation. Convolutional layers were utilized to extract local patterns from the input sequences: for the SMILES and the alternative biological branches, three convolutional layers with filter widths of 4, 6, and 8 were used, respectively. The fully connected dense layers have 1024, 1024, and 512 dimensions.
Table 4 The average CI and MSE scores on the test set for the DAVIS and KIBA datasets for our model

Model        Target rep     Drug rep       CI      MSE
DTABinding   SeqM & StruM   SeqM & StruM   0.832   0.155
Furthermore, the low Mean Squared Error (MSE) of 0.155 suggests that the
model is better at making precise predictions about the strength of these interac-
tions. Our model will be a valuable tool for predicting and understanding drug-target
interactions.
4 Conclusion
The research presented a bioinformatics strategy for DTI estimation that combined multiple network architectures and encoding approaches. Two popular datasets, DAVIS and KIBA, were tokenized, encoded, and used to evaluate the architecture. Concatenating the outputs of the network configurations produced the best results. These investigations mark substantial progress in computational molecular biology toward predicting molecular binding affinities.
Combining information from biological and chemical sources improved the model's ability to identify complex patterns and supports drug development strategies. With the development of more productive computational methods, the deep learning-based Improved Molecular Binding Affinity technique represents a significant step toward broad and accurate predictions in biochemical affinity research.
The integration of chemical and biological data enhances the model's capability to recognize complex patterns, supporting drug discovery processes. As computational methods continue to evolve, this multimodal deep learning approach represents a significant step toward accurate and extensive predictions in molecular interaction studies.
References
1. Pu Y, Li J, Tang J, Guo F (2022) Deep fusion DTA: drug-target binding affinity prediction with
information fusion and hybrid deep-learning ensemble model. IEEE/ACM Trans Comput Biol
Bioinform 19(1)
2. Zhao Y, Yang Z et al (2023) Improving protein function prediction by adaptively fusing information from protein sequences and biomedical literature. IEEE J Biomed Health Inform 27(2)
3. Jiang Y, Quan L, Li K, Li Y et al (2023) DGCddG: deep graph convolution for predicting
protein-protein binding affinity changes upon mutations. IEEE/ACM Trans Comput Biol Bioinf
20(3)
4. Dhanuka R, Singh JP, Tripathi A (2023) A comprehensive survey of deep learning techniques
in protein function prediction. IEEE/ACM Trans Comput Biol Bioinf 20(4)
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025 137
H. Mittal et al. (eds.), Proceedings of International Conference on Paradigms of
Communication, Computing and Data Analytics, Algorithms for Intelligent Systems,
[Link]
138 N. Udaya Kumar et al.
in turn, operating speed, are continuing. In digital computers, basic arithmetic oper-
ations are implemented using gates like AND, OR, NOR, NAND, etc. Multiplica-
tion is achieved through repeated addition, subtraction by negating, and division by
repeated subtraction. Hence, the adder plays a major role in performing arithmetic
operations. Additions can be done with various adders, and one of those adders is
the ripple carry adder, where each full adder waits for the carry from the previous
one. Lamani et al. [2] compared various adder topologies such as the Ripple Carry Adder (RCA), Carry Save Adder (CSA), Carry Skip Adder (CSkA), and Carry Select Adder (CSLA), concluding that while the RCA has a basic architecture, carry propagation causes significant latency. For faster computation, more advanced adder designs such as the Carry Save Adder (CSA), Carry Increment Adder (CIA), Carry-Lookahead Adder (CLA), or Carry Select Adder (CSLA) are used.
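The ripple carry behaviour described above, in which each full adder waits for the previous stage's carry, can be sketched bit by bit (a functional model, not the gate-level design):

```python
def full_adder(a, b, cin):
    """One-bit full adder built from the gate-level equations."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

def ripple_carry_add(a_bits, b_bits, cin=0):
    """LSB-first ripple carry adder: each stage waits for the previous carry."""
    out, carry = [], cin
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        out.append(s)
    return out, carry

# 0110 (6) + 0111 (7) = 1101 (13), bits given LSB first
s, cout = ripple_carry_add([0, 1, 1, 0], [1, 1, 1, 0])
print(s, cout)  # [1, 0, 1, 1] 0
```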
Raju and Kumari [3] focused on modifying the Carry Save Adder (CSA) for lower power consumption. While discussing the benefits of the CSA in faster computation, they mention the requirement for an extra adder stage to obtain the final sum, which can be disadvantageous in some applications. The CSA developed by Bennet et al. [4] is power-efficient but occupies more area. Varshney and Arya [5] proposed an improved Carry Increment Adder (CIA) by replacing the Ripple Carry Adder (RCA) with a Carry Look-Ahead Adder (CLA), Kogge-Stone adder, and Han-Carlson block. Although there is a reduction in complex circuitry, the modified CIA is less efficient in terms of area and power utilization. They also discussed the performance metrics of the Han-Carlson adder, which provides lower delay and improved area utilization.
Akbari et al. [6] mentioned that increasing the width of the Carry Look-Ahead Adder (CLA) increases the delay, area usage, and power consumption of the carry generator units, so the CLA suffers from design complexity as the number of variables increases. Tapadar et al. [7] acknowledged that, to avoid this latency, the Carry Select Adder (CSLA) is suggested over all the other adders, although it consumes more area. The CSLA excels at reducing carry propagation delay through parallel carry generation but is considered less area-efficient because of the cascading of Ripple Carry Adders (RCAs).
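The carry-select principle, precomputing both candidate results in parallel and letting the real carry pick one, can be sketched functionally (block width and count here are illustrative):

```python
def carry_select_block(a, b, width, cin):
    """One CSLA block: both candidate results are precomputed in parallel
    and the real carry-in acts as a multiplexer select line."""
    mask = (1 << width) - 1
    sum0, sum1 = a + b, a + b + 1  # candidates for cin = 0 and cin = 1
    chosen = sum1 if cin else sum0
    return chosen & mask, chosen >> width

def csla_add(a, b, width=4, blocks=4):
    """16-bit carry-select adder built from 4-bit blocks."""
    mask = (1 << width) - 1
    result, carry = 0, 0
    for i in range(blocks):
        s, carry = carry_select_block((a >> (i * width)) & mask,
                                      (b >> (i * width)) & mask, width, carry)
        result |= s << (i * width)
    return result, carry

s, cout = csla_add(0xABCD, 0x1234)
print(hex(s), cout)  # 0xbe01 0
```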
Swetha and Reddy [8] designed a Carry Select Adder using D-latches and multiplexers. Compared with high-speed adders such as the Brent-Kung adder, it consumes more power and incurs more delay. The major constraints of the above adder are its area and power consumption. Hence, focusing on the area and power utilization of the adder, a new adder is proposed using a hybrid approach, in which the adder is designed by employing multiple logic circuits. Different adders were therefore studied from various reference papers to determine their performance metrics.
Suganya et al. [9] mentioned that Ling adders achieve lower latency and use less area in contrast to the Carry-Lookahead Adder. Also, based on the study by Gaur et al. [10], the Weinberger adder reduces logic stages and final carry generation duration in comparison with other adders. According to Refs. [11–13], topologies based on Binary to Excess-1 Converters (BECs) minimize logic redundancy, which lowers power and area usage.
11 A Comprehensive Performance Analysis of Area and Power-Efficient … 139
The hybrid adder combines these modules synergistically. The Ling adder handles long-distance carry propagation, while the Weinberger adder segments and optimizes local delays. The Han-Carlson adder's area-efficient logic balances the complexity of the other modules, and the Ripple Carry Adder provides a cost-effective option where appropriate. This strategic
combination addresses each bottleneck with the most suitable module, leading to
overall performance gains in terms of area, power, and delay. While further investi-
gation is required to explore limitations and wider applicability, the hybrid approach
demonstrates substantial potential for enhancing VLSI circuit efficiency. Therefore,
the previously mentioned adders are integrated as separate modules to enhance the
adder’s speed and area usage, and finally an area and power-efficient hybrid adder is
developed.
This paper presents a hybrid adder incorporating various adders to improve crucial
performance metrics. The subsequent sections are structured as follows: Sect. 2
outlines the VLSI architecture of the proposed hybrid adder, while Sects. 3 and 4
delve into discussions regarding the obtained results.
The hybrid full-adder design aims to balance speed, area, and power consumption. It
incorporates multiple addition schemes, low-power logic gates, and level restoration
carry logic for efficient performance. The goal is to reduce power consumption,
minimize area requirements, and enhance overall efficiency in digital circuit design.
The 16-bit hybrid adder, designed based on the SQRT CSLA architecture and depicted in Fig. 1, comprises five distinct groups of adders: the Ripple Carry Adder (RCA), Han-Carlson Adder, Binary to Excess-1 Converter (BEC), Weinberger Adder, and Ling Adder. The RCA is simple and easy to implement, suitable for small
bit-width additions. Han-Carlson, Weinberger, and Ling adders are designed for
efficient parallel computation, enhancing the speed of addition operations. They are
optimized for specific bit ranges and have a conditional carry selection mechanism
that optimizes functionality. In summary, each type of adder in the 16-bit hybrid
design contributes to overall efficiency by addressing specific bit ranges and utilizing
specialized designs.
Binary to Excess-1 Converter (BEC) is used for converting binary numbers to
excess-1 notation, playing a specific role in addressing and optimizing the addition
process. It participates in the conditional selection mechanism, dynamically influ-
encing the choice of output based on the carryout status. In the architecture of the
proposed hybrid adder shown in Fig. 1, the 4-bit Binary to Excess-1 Converter is
referred to as BEC4. Similarly, the 5-bit Binary to Excess-1 Converter is denoted as
BEC5, and the 6-bit Binary to Excess-1 Converter is labeled as BEC6.
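A BEC simply outputs its input plus one; a functional sketch of the LSB-first logic (b0' = NOT b0, b_i' = b_i XOR (b0 AND ... AND b_{i-1})):

```python
def bec(bits_in):
    """Binary to Excess-1 Converter, LSB first: b0' = NOT b0 and
    bi' = bi XOR (b0 AND b1 AND ... AND b(i-1)), i.e. input + 1."""
    out, chain = [], 1
    for b in bits_in:
        out.append(b ^ chain)  # flip the bit while the AND chain is still 1
        chain = b & chain      # chain stays 1 only over a run of 1 bits
    return out

# BEC4 example: 0101 (5, LSB first) -> 0110 (6)
print(bec([1, 0, 1, 0]))  # [0, 1, 1, 0]
```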
The initial two groups in Fig. 1, each comprising an RCA, process two 2-bit inputs along with a 1-bit carry. The first RCA group takes A[1:0] and B[1:0] along with cin as inputs and generates the sum s[1:0] and the carry-out cout1. The second RCA group takes A[3:2] and B[3:2], along with cin (i.e., cout1 from the previous RCA), as inputs and generates the sum s[3:2] and the carry-out cout2. In the third group of the hybrid adder architecture, a 3-bit
Han-Carlson adder is incorporated, along with Binary to Excess-1 Converter (BEC)
and Multiplexer (MUX) units. This module takes A[6:4] and B[6:4] as inputs and
operates dynamically by selecting the carryout (cout3) and sum bits (s[6:4]) based
on the carryout bit, i.e., cout2, from the preceding module. Specifically, if cout2 is
0, the output of the Han-Carlson adder is chosen. Conversely, if the cout2 is 1, the
output of the BEC4 is selected. This conditional selection mechanism based on the
carryout status optimizes the functionality of the third group in handling inputs from
augend and addend bits 4–6. Similar to the third group, the fourth group utilizes a
4-bit Weinberger adder for the addition of A[10:7] and B[10:7], and based upon the
previous carryout, i.e., cout3, the output sum (s[10:7]) and carryout (cout4) are chosen
between the Weinberger adder and BEC5. In the fifth group, a 5-bit Ling adder is used
for the addition of A[15:11] and B[15:11], and based upon the previous carryout, i.e.,
cout4, the output sum (s[15:11]) and carryout (cout) are chosen between the Ling
adder and BEC6. Each adder type is strategically chosen to match the characteristics
and requirements of the bit range it processes, contributing to the overall efficiency
of the 16-bit hybrid adder, with the primary objective of minimizing the required
area and power consumption.
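The group-wise selection scheme described above can be modelled functionally: each group's parallel adder computes the cin = 0 result, the BEC supplies the +1 result, and a multiplexer selects on the incoming carry. This is a behavioural sketch, not the gate-level design:

```python
def group_add(a, b, width, cin):
    """One conditional group: the group's parallel adder (Han-Carlson,
    Weinberger, or Ling in hardware) produces a+b for cin = 0, the BEC
    produces (a+b)+1, and a MUX driven by the incoming carry selects."""
    mask = (1 << width) - 1
    base = a + b                        # parallel adder output (cin = 0)
    chosen = base + 1 if cin else base  # BEC output selected when cin = 1
    return chosen & mask, chosen >> width

def hybrid_add16(a, b, cin=0):
    """16-bit hybrid adder modelled at group level: RCA groups for bits
    [1:0] and [3:2], then conditional groups for [6:4], [10:7], [15:11]."""
    widths = [2, 2, 3, 4, 5]  # group sizes from the architecture
    result, carry, pos = 0, cin, 0
    for w in widths:
        mask = (1 << w) - 1
        s, carry = group_add((a >> pos) & mask, (b >> pos) & mask, w, carry)
        result |= s << pos
        pos += w
    return result, carry

s, cout = hybrid_add16(0xF0F0, 0x0F11)
print(hex(s), cout)  # 0x1 1
```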
The Xilinx Vivado 2019.1 tool is utilized for the simulation and synthesis of the
proposed 16-bit hybrid adder. The simulation results, demonstrating the performance
of the hybrid adder with various input combinations, are presented in Fig. 2.
The proposed 16-bit hybrid adder is tested with various input combinations,
and for each combination of inputs, the output sum and carryout are generated,
respectively, as shown in Fig. 2.
The schematic diagram of the proposed hybrid adder obtained using the Xilinx
Vivado 2019.1 tool is represented in Fig. 3, which serves as a simplified illustration
of a system or circuit employing components and their interconnections. The performance comparison of the proposed hybrid adder is made with other adders such as the Linear Han-Carlson CSLA, Linear Ling CSLA, CLA_CSLA_D-Latch [8], and CLA_CSLA_MUX [8]. The analysis is based on area, power, and delay reports generated using Cadence software. The area, power consumption, and delay after synthesis are tabulated in Tables 1 and 2, which represent values for 90 and 180 nm technology, respectively.
For 90 nm technology, with reports generated using the Cadence Genus tool (Table 1), the Linear Han-Carlson CSLA utilizes 4.23% more area than the proposed hybrid adder, while the Linear Ling CSLA has 15.65% more area than the proposed adder.
[Fig. 3: RTL schematic of the proposed hybrid adder, showing the RCA groups (rca_g1), the Han-Carlson (hancarlson_g3) and Weinberger (weinberger_g4) groups, the BEC units (bec3, bec4), and the selection multiplexers (RTL_MUX).]
Table 1 Practical comparison of area, power, and delay for various adders and the proposed hybrid adder based on 90 nm technology

Adders (90 nm)            Area (µm²)   Power (mW)   Delay (ns)
Linear Han-Carlson CSLA   527.99       0.0237098    1799
Linear Ling CSLA          599.465      0.0212       2235
CLA_CSLA_D-Latch [8]      965.804      0.0381       2220
CLA_CSLA_MUX [8]          772.038      0.04         2228
Proposed hybrid adder     505.609      0.0211       1975
Table 2 Practical comparison of area, power, and delay for various adders and the proposed hybrid adder based on 180 nm technology

Adders (180 nm)           Area (µm²)   Power (mW)   Delay (ns)
Linear Han-Carlson CSLA   1683.158     0.0994716    2809
Linear Ling CSLA          2092.306     0.0987       3156
CLA_CSLA_D-Latch [8]      3113.51      0.01495      2996
CLA_CSLA_MUX [8]          2464.862     0.13896      2420
Proposed hybrid adder     1679.82      0.0964       3172
The CLA_CSLA_D-Latch [8] utilizes 47.64% more area than the proposed hybrid
adder, while the CLA_CSLA_MUX [8] has 34.50% more area than the proposed
adder. Similarly, the Linear Han-Carlson CSLA utilizes 3.01% more power than the
proposed hybrid adder, while the Linear Ling CSLA utilizes 2.33% more power than
the proposed adder. The CLA_CSLA_D-Latch [8] utilizes 44.61% more power than
the proposed hybrid adder, while the CLA_CSLA_MUX [8] utilizes 47.25% more
power than the proposed adder.
For 180 nm technology (Table 2), the Linear Han-Carlson CSLA utilizes 0.19% more area than the proposed hybrid adder, while the Linear Ling CSLA has 19.71% more area than the proposed adder. The CLA_CSLA_D-Latch [8] utilizes 46.04% more area than the proposed hybrid adder, while the CLA_CSLA_MUX [8] has 31.84% more area than the proposed adder. Similarly, the Linear Han-Carlson CSLA utilizes 11% more power than the proposed hybrid adder, while the Linear Ling CSLA utilizes 0.47% more power than the proposed adder. The CLA_CSLA_D-Latch [8] utilizes 35.78% more power than the proposed hybrid adder, while the CLA_CSLA_MUX [8] utilizes 32.35% more power than the proposed adder.
Figure 4 presents the comparative analysis of area (µm2 ) on 90 nm technology.
From Fig. 4, it is evident that the proposed hybrid adder occupies less area when
compared to other adders. Figure 5 puts forward the comparative analysis of power
(mW) on 90 nm technology, which displays that the hybrid adder consumes less
power in comparison with the other adders. Figure 6 represents the comparative
analysis of time delay (ns) on 90 nm technology, which conveys that the speed of operation of the other adders is higher than that of the hybrid adder.
4 Conclusion
In this paper, a 16-bit hybrid adder is designed using Xilinx Vivado 2019.1, which is
efficient in terms of area and power and outperforms existing alternatives. The adder
is made to work well by incorporating different sizes of various adders. The proposed
hybrid adder is compared with other adders like Linear Hancarlson CSLA, Linear
Ling CSLA, CLA_CSLA_D-Latch [8], and CLA_CSLA_MUX [8]. Performance
metrics such as area, power, and delay reports are generated using the Cadence
Genus tool in both 90 and 180 nm technologies. From the reports, the proposed hybrid adder is 4–35% and 10–50% more efficient in terms of area and power, respectively.
5 Future Scope
Future work can focus on reducing the delay of the adder by using reduction techniques such as MGDI (Modified Gate Diffusion Input, a new low-power and area-efficient technique in which only two transistors are required to realize the gates). Power can be reduced even further by employing power gating techniques at the back end, and the resulting efficient adder can then be employed in various DSP and AI applications.
References
1. Sarkar S, Sarkar S, Mehedi J (2018) Comparison of various adders and their VLSI. In:
International conference on computer communication and informatics
2. Lamani DS, Kiran (2022) A comparative analysis on parameters of different adder topologies.
Int Res J Eng Technol (IRJET)
3. Tilak Raju D, Sravani Kumari S (2020) Design of carry save adder with low power using
modified gate diffusion input technique. J Crit Rev 7:11
4. Bennet B, Maflin S (2015) Modified energy efficient carry save adder. In: International
conference on circuit, power, and computing technologies [ICCPCT]
5. Varshney N, Arya G (2019) Design and execution of an enhanced carry increment adder
using Han-Carlson and Kogge-stone adder technique. In: Proceedings of the third international
conference on electronics communication and aerospace technology [ICECA 2019], p 8
6. Akbari O, Kamal M, Afzali-Kusha A, Pedram M (2018) RAP-CLA: a reconfigurable
approximate carry look-ahead adder. IEEE Trans Circuits Systems II Express Briefs 65:5
7. Tapadar A, Sarkar S, Dutta A, Mehedi J (2018) Power and area aware improved SQRT carry
select adder (CSlA). In: Proceedings of the 2nd international conference on trends in electronics
and informatics (ICOEI 2018), p 7
8. Swetha S, Siva Sankara Reddy N (2023) Design of FIR filter using low-power and high-speed
carry select adder for low-power DSP applications. IETE J Res 15
9. Suganya R, Meganathan D (2015) High performance VLSI adders. In: 3rd International
conference on signal processing, communication, and networking (ICSCN), p 7
10. Gaur N, Mehra A, Kumar P (2019) 16-Bit power efficient carry select adder. In: 6th International
conference on signal processing and integrated networks (SPIN), p 4
11. Munawar M, Khan T, Rehman M, Shabbir Z, Daniel K, Sheraz A, Omer M (2020) Low power
and high speed Dadda multiplier using carry select adder with binary to excess-1 converter. In:
International conference on emerging trends in smart technologies (ICETST)
12. Gudala NA, Ytterdal T, Lee JL, Rizkalla M (2021) Implementation of high speed and low
power carry select adder with BEC. In: International midwest symposium on circuits and
systems (MWSCAS)
13. Challa Ram G, Venkata Subbarao M, Varma R, Prema Kumar M (2023) Delay enhancement
of Wallace tree multiplier with binary to excess-1 converter. In: 5th International conference
on smart systems and inventive technology (ICSSIT)
Chapter 12
Performance Driven VLSI Adder
Choices in Image Processing:
A Comparative Analysis
1 Introduction
K. Bala Sindhuri (B) · N. Udaya Kumar · Ch. Gowthami · Ch. Sree Varun ·
E. S. V. S. Surya Vaishnavi · A. K. Prathardhan · G. G. Karthik
Sagi Ramakrishnam Raju Engineering College, Bhimavaram, India
e-mail: kbsinduri@[Link]
N. Udaya Kumar
e-mail: nuk@[Link]
148 K. Bala Sindhuri et al.
the most suitable and resource-efficient image processing architecture for specific
applications, a comparative analysis is conducted. This analysis serves to align high-
level image enhancement algorithms with low-level VLSI implementations, facili-
tating the emergence of expedited, resource-efficient solutions across various image
processing domains [12, 13].
The initial stage encompasses several sequential processes to obtain the image data, commencing with image acquisition from the source and resizing it to 768 × 512 pixels. The resized image is then stored in BMP format. In the subsequent phase, the image data is interfaced with MATLAB, where pixel extraction and conversion into the [Link] file format occur, as shown in Fig. 1. This
file is then processed by Verilog, delineating the spatial parallelism functional unit.
During this phase, diverse image enhancement algorithms such as Brightness and
Invert may be executed. Upon completion of algorithmic manipulation, the resulting
image is presented, showcasing the enhanced pixels. This system implementation
process is illustratively elucidated. The concluding step entails the practical real-
ization of these techniques, employing various adders based on their performance
metrics. Subsequently, the outputs of these adders are meticulously analyzed. These
evaluations are conducted to ascertain each adder’s impact on processing area,
power consumption, and delay with the ultimate goal of identifying and optimizing
the most suitable and resource-efficient image processing architecture for specific
applications.
3 Adders
The Ling Adder stands as a notable example of parallel prefix adders within the realm of digital arithmetic circuits. Positioned as an evolution of the traditional Carry Look-Ahead adder, the Ling Adder is engineered to increase the speed and efficiency of 8-bit binary addition operations. Its area, power, and delay are very small compared with other adders. An 8-bit Ling adder is therefore considered in this work to perform the image brightness and invert operations.
The Weinberger adder is renowned for its minimal area. It utilizes the Weinberger recurrence for carry computation and incorporates parallel carry computation to enhance circuit speed. Its area, power, and delay are small compared with other adders; because of these properties and its simple architecture, it is well suited to different image enhancement techniques.
Inverting an image involves reversing the intensity values, turning dark areas into bright ones and vice versa. This technique can be useful in highlighting specific features or structures in an image. Image inversion can also be employed for creative purposes in art and design: it allows for the creation of visually striking and unconventional images, adding an artistic dimension to the processing of visual content.
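The inversion itself is a single arithmetic step per pixel, 255 − v for 8-bit channels; a minimal sketch:

```python
import numpy as np

# Toy 8-bit grayscale image; for RGB the same rule applies per channel.
img = np.array([[0, 64], [200, 255]], dtype=np.uint8)
inverted = 255 - img  # dark pixels become bright and vice versa
print(inverted.tolist())  # [[255, 191], [55, 0]]
```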
The adder architectures are modeled using Verilog HDL. For simulation and synthesis, Vivado 2019.1 ISIM is used. The tool ran on an Intel Core i5 processor with a 64-bit operating system and 16 GB of RAM.
The RTL designs for both the 4-bit and 8-bit Ling Adders were developed using
Vivado 2019.1. These designs were meticulously crafted to adhere closely to the
theoretical architectures derived from computational analysis. Figures 2 and 3 illus-
trate the RTL schematics for the respective adders, ensuring fidelity to the theoretical
specifications throughout the design phase.
The RTL designs for both the 4-bit and 8-bit Weinberger Adders were implemented
using Vivado 2019.1. These designs align precisely with the theoretical architectures
derived from computational analysis. Figures 4 and 5 depict the RTL schematics for
the respective adders, ensuring fidelity to the theoretical specifications during the
design process.
Both adders were analyzed and compared based on their performance. The total area, power consumption, and delay after synthesis for the two adders at 45, 90, and 180 nm are obtained and tabulated in Tables 1, 2, and 3, respectively. These adders are then used for the image enhancement operations, namely brightness addition and the invert operation. Figures 6, 7, and 8 represent the comparative analysis of area, power, and delay for the different adders in 45 nm technology, respectively.
Table 1 Performance analysis of different adders in 45 nm

Adders (45 nm)         Area (µm²)   Power (mW)    Delay (ns)
Ling Adder [8]         76.123       0.0425802     1972
Weinberger Adder [9]   115.463      0.00641718    2034

Table 2 Performance analysis of different adders in 90 nm

Adders (90 nm)         Area (µm²)   Power (mW)    Delay (ns)
Ling Adder [8]         157.435      0.00543504    1918
Weinberger Adder [9]   213.446      0.00771417    1472

Table 3 Performance analysis of different adders in 180 nm

Adders (180 nm)        Area (µm²)   Power (mW)    Delay (ns)
Ling Adder [8]         558.835      0.0255611     2705
Weinberger Adder [9]   671.933      0.0322704     2230
154 K. Bala Sindhuri et al.
Figures 9, 10, and 11 represent the comparative analysis of Area, power, and delay
for different adders in 90 nm technology.
Figures 12, 13, and 14 represent the comparative analysis of area, power, and delay for different adders in 180 nm technology, respectively. Based on the findings presented in Figs. 9, 10, and 11, it is determined that, in comparison to the Weinberger adder, the Ling adder consumes less area and exhibits lower power consumption. Following the implementation of the image enhancement operations, there is no concern regarding the quality of the images, as both adders yield identical outputs. Consequently, the Ling adder is deemed the more efficient of the two.
Figure 15 represents the original image without applying any of the image processing techniques, such as the brightness and invert operations.
The resulting pixel values are checked to ensure they remain within the valid range
of 0 to 255. If a value exceeds 255, it is clipped to 255 to prevent overflow and vice
versa. The brightness operation modifies the intensity of pixels in the image to adjust
its overall brightness. Addition increases brightness, while subtraction decreases it.
By adjusting the pixel values of the R, G, and B components accordingly, the overall
brightness of the image is altered. The addition or subtraction is performed using the adders described above. Figures 16 and 17 represent the output images obtained for the brightness operation using the Ling Adder and the Weinberger Adder, respectively.
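As a software illustration (a Python model, not the authors' Verilog RTL; the helper names are hypothetical), the two point operations can be sketched as follows: a brightness offset added per channel with clipping to the valid 8-bit range, and the invert operation as 255 minus the pixel value.

```python
# Illustrative software model of the two point operations described above:
# brightness adjustment with clipping to [0, 255], and image inversion.

def adjust_brightness(pixel, offset):
    """Add (or, for negative offset, subtract) a brightness offset to one
    8-bit channel value, clipping the result into [0, 255]."""
    return max(0, min(255, pixel + offset))

def invert(pixel):
    """Invert one 8-bit channel value (255 - pixel)."""
    return 255 - pixel

def apply_pointwise(image, op):
    """Apply a per-channel point operation to an image stored as nested
    lists of (R, G, B) tuples."""
    return [[tuple(op(c) for c in px) for px in row] for row in image]

image = [[(10, 200, 250), (0, 128, 255)]]
brighter = apply_pointwise(image, lambda c: adjust_brightness(c, 40))
negative = apply_pointwise(image, invert)
```

Because both adders compute the same arithmetic, a model like this produces the identical outputs reported for the Ling and Weinberger implementations; only area, power, and delay differ in hardware.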
Fig. 16 Brightness operation using Ling Adder
Fig. 17 Brightness operation using Weinberger Adder
5 Conclusion
and simulation processes, with the indispensable Xilinx Vivado 2019.1 ISIM tools
being relied upon. The practical implementation of prevalent point operations for
image enhancement was elucidated, with the Verilog language being employed.
Notably, Verilog’s versatility extended to file manipulation within storage environments, amplifying its utility in our research framework. The results of our investigations unveiled notable enhancements in the area, power, and delay metrics of the
adders, underscoring their augmented energy efficiency. These insights are consid-
ered of notable significance, poised to enhance the performance and efficiency across
diverse applications within the realm of image processing.
References
1. Narula MS (2018) FPGA implementation of image enhancement using Verilog HDL. Int Res
J Eng Technol (IRJET) 5(5), e-ISSN: 2395-0056
2. Iqbal K, Salam RA, Osman A, Talib AZ (2007) Underwater image enhancement using an
integrated color model
3. Patel S (2019) Image enhancement on FPGA using Verilog. Int J Tech Innov Mod Eng Sci
(IJTIMES) Impact Factor 5(3):22 (SJIF-2017), e-ISSN: 2455-2585
4. Puneet P, Garg N (2013) Binarization techniques used for grey scale images. Int J Comput
Appl 71:8–11. [Link]
5. Menotti D, Najman L, Facon J, Araújo A (2007) Multi-histogram equalization methods for
contrast enhancement and brightness preserving. IEEE Trans Consum Electron 53:1186–1194.
[Link]
6. Sarkar S, Mehedi J, Sarkar S (2018) Comparison of various adders and their VLSI. In:
International conference on computer communication and informatics
7. Ramya AS, Mounica ACN, Ramesh Babu BSSV (2015) Performance analysis of different 8-bit
full adders. IOSR J VLSI Signal Process (IOSR-JVSP) 5
8. Tilak Raju D, Sravani Kumari S (2020) Design of carry save adder with low power using
modified gate diffusion input technique. J Crit Rev 7
9. Meganathan D, Suganya R (2015) High performance VLSI adders. In: 3rd International
conference on signal processing, communication and networking (ICSCN), p 7
10. Shetty, Saud, Serrao, Pinto R (2020) Design and implementation of 64-bit parallel prefix adder
11. Gaur N, Mehra A, Kumar P, Kallakuri S (2019) 16 Bit power efficient carry select adder. In:
6th International conference on signal processing and integrated networks (SPIN)
12. Cui X (2011) Optimized design of parallel prefix Ling adder. In: 2011 International conference
on electronics, communications and control (ICECC)
13. Thakur G (2020) FPGA-based parallel prefix speculative adder for fast computation application.
In: 2020 Sixth international conference on parallel, distributed and grid computing (PDGC)
Chapter 13
Text-to-Speech Conversion for Gujarati
Language Using Deep Learning
Technique
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025 161
H. Mittal et al. (eds.), Proceedings of International Conference on Paradigms of
Communication, Computing and Data Analytics, Algorithms for Intelligent Systems,
[Link]
162 V. Narvani and H. Arolkar
The Gujarati script evolved from the Devanagari script. The Gujarati language can be
traced back to its ancestral roots in Sanskrit. Gujarati is utilized by a global population
of over 50 million individuals for both written and spoken communication. The
earliest known document in the Gujarati script, dating back to 1592, represents its
historical origin, while the script itself made its first appearance in print through an
advertisement in 1797 [3].
The Gujarati character set comprises a total of 75 officially recognized shapes,
encompassing 59 characters and 16 diacritics, each of which is distinct and consid-
ered legal. The 59 characters can be categorized into three groups: 36 consonants,
including 2 compound consonants and 34 singular consonants that represent embel-
lished sounds; 13 vowels, representing pure sounds; and 10 numerical digits. There
are a total of 16 diacritics, which can further be categorized into 13 vowel characters
and 3 other characters. The systematic grouping of vowels and consonants according
to their respective phonetic pronunciations determines the alphabet’s arrangement
[4].
The Gujarati script employs a syllabic writing system, wherein each character
represents a distinct syllable and is conventionally written from left to right. In
linguistic terms, the phonetic units known as consonants are referred to as “Vyanjan,”
while the phonetic units known as vowels are referred to as “Swar.” The Gujarati
language features a specific set of modifier symbols associated with each vowel.
These symbols are used to alter the pronunciation of consonants and are commonly
referred to as “maatras.” Modifiers take various forms and are affixed either at the
upper, lower-right, or lower portion of a consonant, contingent upon the specific
consonant in question. A letter is considered conjunct when it is formed by the
combination of two half-consonants. In the Gujarati script, characters are created
through the combination of consonants, vowels, and diacritics. The characters in the
Gujarati language are depicted in Table 1 [4].
3 Text-to-Speech (TTS)
(Table 1, referenced above, lists the Gujarati characters with transliterations such as ka, kha, ga, gha, and cha.)
4 Literature Review
Kothari and Kumbharana in [2] outline the process of designing, developing, and
implementing a concatenation-based algorithm for text-to-speech synthesis in the
Gujarati language. When researchers endeavor to create a recognition system, they
need access to certain pre-existing data, such as a database, that is relevant to the
intended recognition system. The study specifically explores the utilization of a
database containing pre-recorded Gujarati phonemes in concatenative synthesis,
where these phonemes are combined to generate sound.
Kaveri and Ramesh in [5] delve into the concept of a Text-to-Speech (TTS) synthesizer, a computer-based system that can read text aloud. Specifically
focusing on Indian languages, the paper provides a comprehensive explanation of a
single text-to-speech (TTS) system tailored for Hindi, with the purpose of generating
speech. Typically, this technique consists of two stages: text processing and speech
creation. A Java Swings graphical user interface has been developed to translate
Hindi text into voice. India is a linguistically diverse country with several spoken
languages, each serving as the primary language for tens of millions of individuals.
The paper also acknowledges the significant dissimilarities in languages and scripts
while highlighting the substantial similarities in grammar and alphabet words across
Indian languages.
Diwakar et al. in [6] discuss the ongoing research of the authors on a Sanskrit
text-to-speech (TTS) system named ‘Samvachak’ at the Special Centre for Sanskrit
Studies, JNU. Currently, there is no existing text-to-speech (TTS) system specifically
designed for the Sanskrit language. Upon examining the relevant study, the report
centers on the advancement of several components of the text-to-speech (TTS) system
and highlights potential challenges in its development. The research for the TTS may
be categorized into two groups: TTS independent language study and TTS-related
research and development (R&D).
Sajini and Neetha in [7] investigate the utilization of speaker adaptation tech-
nology in Hidden Markov Model-based Text-to-Speech (HTS) for introducing
speaker variability in Malayalam TTS. Speaker adaptation (SA) has been success-
fully implemented within the HTS framework for foreign languages like English
and Japanese. However, it has not yet been experimented with for Indian languages.
The aim of this study is to employ the HTS framework to apply Speaker Adaptation
(SA) as a means of achieving a wider range of voices while simultaneously reducing
the costs, time, and effort typically associated with generating a new or modified
Text-to-Speech (TTS) voice. The study employs constrained maximum likelihood
linear regression (CMLLR) and maximum a posteriori (MAP) methods to produce different vocal variations. The database used in the experiments includes five speakers, each contributing one hour of speech. Within this database, a subset of
four speakers is employed to train a speaker-independent average model (SI). The
SI model underwent training with varying numbers of speakers. The average model,
equipped with three speakers, produced a comprehensible but noisy output. However,
when using four speakers, the model produced a comprehensible output of high
quality and similarity, with only occasional distortions.
Femina and Jayakumari in [8] discuss that the text-to-speech (TTS) technology
may be used in several languages, including those that have limited usage. Text-to-
speech (TTS) systems produce spoken representations based on textual input. While
the process of generating speech is somewhat intricate, the significant difficulty in
text-to-speech (TTS) is achieving a natural and authentic expression from the speaker.
This study presents a highly accurate and efficient text-to-speech (TTS) conversion
system for the Tamil language. This research study focuses on the development of a
deep learning approach called Deep Quality Speech Recognition (DQSR) specifically
tailored for Tamil language text-to-speech (TTS) applications.
5 Methodology
The pre-processing step of the text-to-speech system is when two essential tech-
niques, text analysis and text normalization, come into action. When dealing with
Gujarati phrases, it is important to note that many terms can be found in a basic dictio-
nary. As a result, the terminology included within the dataset is used to construct this
dictionary, ensuring the correctness and relevance of the linguistic elements used in
the process [9].
In the pre-processing module, a significant role is played by the translation of
abbreviations, numerical values, and acronyms into their corresponding comprehen-
sive textual representations. The purpose of this transformation is to guarantee that
the synthesized speech adheres to the desired language rules while preserving clarity
and comprehensibility. Additionally, pre-processing contributes to enhancing the
overall language analysis and synthesis process by performing the task of efficiently
segmenting incoming texts into word clusters [10].
The text analysis component is responsible for preparing the incoming text by
analyzing and organizing it into a manageable list of terms. The system incorpo-
rates numerical values, shortened forms, initialisms, and idiomatic expressions and
converts them into complete written language as necessary. One notable challenge at
the character level is the ambiguity of punctuation marks, particularly in determining
the conclusion of sentences. This issue can be resolved to some extent using basic
regular grammar [11].
Text normalization refers to the process of converting text into a form that may
be easily spoken. Prior to any text processing, it is common practice to undertake
text normalization, which involves standardizing the text. This is typically done in
preparation for tasks such as generating synthesized voices or automated language
translation. The primary goal of this technique is to recognize punctuation marks and
intervals between words. The text normalization procedure often involves removing
punctuations, accent marks, stop words (commonly used words), and other diacritics
from letters [11].
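A minimal sketch of this normalization step in Python; the abbreviation table and stop-word list below are illustrative placeholders, not the dictionary the authors build from their Gujarati dataset.

```python
import re
import unicodedata

# Illustrative placeholders for the expansion dictionary and stop words;
# a real system would load these from the language-specific dataset.
ABBREVIATIONS = {"dr.": "doctor", "no.": "number"}
STOP_WORDS = {"the", "a", "an"}

def normalize(text):
    text = text.lower()
    # Expand abbreviations before punctuation is stripped.
    for short, full in ABBREVIATIONS.items():
        text = text.replace(short, full)
    # Strip combining diacritics/accent marks from letters.
    text = "".join(ch for ch in unicodedata.normalize("NFD", text)
                   if not unicodedata.combining(ch))
    # Remove punctuation, keeping word characters and spaces.
    text = re.sub(r"[^\w\s]", " ", text)
    # Drop stop words and collapse whitespace.
    words = [w for w in text.split() if w not in STOP_WORDS]
    return " ".join(words)

print(normalize("Dr. Café visited the No. 5 clinic!"))
# → doctor cafe visited number 5 clinic
```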
During this phase, an examination of the document’s structure is conducted.
This procedure operates at two levels: sentence-level tokenization, which involves
dividing text into individual sentences, and word-level tokenization, which involves
breaking down text into individual words. Identifying the boundaries of a sentence in
sentence tokenization can be a challenging task. For instance, it is not required that
the demarcation of sentence boundaries be only determined by the use of periods;
alternative punctuation marks such as colons or double quotation marks may also
serve this purpose. Word tokenization, on the other hand, involves the process of
converting non-standard words into their standardized form [12–14].
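The two tokenization levels can be sketched roughly as below; treating colons alongside sentence-final punctuation is an assumption drawn from the observation above that periods are not the only boundary markers.

```python
import re

def sentence_tokenize(text):
    # Split after boundary punctuation (., ?, !, :) followed by whitespace.
    parts = re.split(r"(?<=[.?!:])\s+", text.strip())
    return [p for p in parts if p]

def word_tokenize(sentence):
    # Keep runs of word characters together; drop punctuation tokens.
    return re.findall(r"\w+", sentence)

text = "Speech is generated from text. Is it natural? Mostly: yes."
sentences = sentence_tokenize(text)
words = [word_tokenize(s) for s in sentences]
```

A production system would also handle abbreviations and quotation marks around boundaries, which this regex-only sketch deliberately ignores.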
Speech synthesis refers to the process of converting written text into audible
speech through the generation of corresponding waveforms, which encompasses
three distinct methodologies, namely articulatory synthesis, formant synthesis, and
concatenative synthesis [15, 16].
Formant speech synthesis relies on defining the resonance frequencies of the vocal
tract through a set of criteria. The formant method uses the source-filter voice
synthesis paradigm. The goal is to create source signals that are both periodic and
non-periodic, and then send them through a filter or resonator circuit that emulates the
characteristics of the vocal tract. Although rule-based formant synthesis is capable
of producing high-quality speech, the difficulties in precisely estimating the vocal
tract models and source variables may cause the voice to sound unnatural. The funda-
mental frequency, the relative intensities of the voiced and unvoiced source indicators,
and the degree of voicing are typically the lowest number of modifiable parameters.
During each phoneme, the settings controlling the source signal and the vocal tract
filter’s frequency response are modified. The resonators can be connected in either a
parallel or cascade pattern to create the vocal tract model [17].
In this section, we propose a speech synthesis system that operates through the following stages. A detailed description of the stages, as per the flow diagram, is shown in Fig. 1.
1. Load Library and Declare Variable: A loaded library contains files prepared to be executed, and a variable is created when a value is assigned to it. The value assigned to a variable determines that variable’s data type.
2. Load Speech and Text Dataset: Datasets have the capability to be loaded from
local files that are saved on the user’s PC.
3. Divide Data into Training and Testing: The dataset is partitioned into a training subset, used to fit the model, and a testing subset, held out so that the trained model can be evaluated on data it has not seen.
4. Apply MFCC and Log Filter for feature extraction: The Mel frequency
cepstral coefficients (MFCCs) are a representation of a sound’s short-term power
spectrum. It is a feature extraction approach for speech and audio analysis. This
converts raw audio signals into a compact representation that captures significant
frequency and temporal information. Additionally, the log filter (the logarithm applied to the Mel filterbank energies) compresses the dynamic range of the spectrum, mirroring the ear’s roughly logarithmic perception of loudness.
5. Apply Feature Extraction: The process of extracting various characteristics
from a speech signal, such as pitch, vocal tract anatomy, and power, is known
as feature extraction. Parameter transformation is the subsequent process of
converting these features into signal parameters by means of differentiation and
concatenation.
6. Apply Sequential Model (CNN): A Convolutional Neural Network (CNN) was
used to extract and train the speech characteristics. CNN was used as the
most sophisticated deep learning approach, and its performance for voice signal
identification as a multiclass classification process was explored. CNN-based
models are trained to detect patterns in text, such as important words, automati-
cally. For feature extraction, the majority of CNN-based models employ a one-
dimensional (1-D) convolution process, succeeded by an average or maximum
one-dimensional pooling procedure.
7. SoftMax Classifier on Test Data: Employ a SoftMax classifier by assigning a
label to a middle word and concatenating all word vectors surrounding it.
8. Classify Audio based on Confusion Matrix: Utilize a confusion matrix to clas-
sify audio, a matrix designed to evaluate the performance of classification models
with a given set of test data.
9. Analysis: Perform a detailed analysis of the results and conclude the process.
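Steps 6 and 7 can be illustrated with a toy NumPy forward pass: a one-dimensional convolution over a feature track, ReLU and max pooling, then a SoftMax output layer. The shapes, random weights, and input are illustrative assumptions, not the trained Gujarati model.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d(x, kernel):
    """Valid 1-D convolution (cross-correlation, as used in CNNs)."""
    k = len(kernel)
    return np.array([x[i:i + k] @ kernel for i in range(len(x) - k + 1)])

def max_pool(x, size=2):
    """Non-overlapping 1-D max pooling."""
    trimmed = x[: len(x) // size * size]
    return trimmed.reshape(-1, size).max(axis=1)

def softmax(z):
    e = np.exp(z - z.max())        # subtract max for numerical stability
    return e / e.sum()

x = rng.standard_normal(16)        # e.g. one MFCC coefficient track
kernel = rng.standard_normal(3)    # a learned 1-D filter (random here)
features = max_pool(np.maximum(conv1d(x, kernel), 0.0))  # ReLU, then pool
W = rng.standard_normal((9, len(features)))  # output layer: 9 digit classes
probs = softmax(W @ features)      # class probabilities, summing to 1
pred = int(np.argmax(probs))       # predicted digit class
```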
The diagram in Fig. 1 illustrates the fundamental stages involved in the compu-
tational procedure for deriving cepstral coefficients from a speech signal. These
procedures are outlined below.
• Pre-emphasize the input signal.
• Conduct a short-time Fourier analysis to obtain a magnitude spectrum.
• Construct a Mel-spectrum from the magnitude spectrum.
• Take the logarithm of the power spectrum (the square of the Mel-spectrum).
Utilizing the log-Mel power spectrum, apply the Discrete Cosine Transform (DCT) to extract the cepstral features.
Step 1: Pre-Emphasis: This step involves passing the isolated word sample
through a filter that amplifies higher frequencies. It will amplify the signal’s energy
at higher frequencies.
Step 2: Framing: The voice stream is segmented into frames of brief duration, typically 20–30 ms. The stream is divided into frames of N samples, with adjacent frames separated by M samples (where M is less than N). Commonly, M is set to 100 and N to 256. Speech framing becomes necessary due to the temporal variability of the signal. Nevertheless, when the signal is analyzed over a sufficiently short period, its characteristics remain relatively constant. Consequently, a short-time spectrum analysis is conducted.
Step 3: Windowing: Each frame is multiplied by a Hamming window to maintain signal continuity. To mitigate the discontinuities introduced by framing, the window function gradually decreases the amplitude of the voice sample toward zero at the frame’s beginning and end, as shown in Eq. (1). This helps reduce spectral distortion.
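Steps 1 through 3 can be sketched numerically as follows; the pre-emphasis coefficient 0.97 and the synthetic test tone are illustrative assumptions, with N = 256 and M = 100 taken from the text.

```python
import numpy as np

def pre_emphasis(signal, alpha=0.97):
    # Step 1: y[n] = x[n] - alpha * x[n-1] boosts high-frequency energy.
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])

def frame_and_window(signal, N=256, M=100):
    # Step 2: overlapping frames of N samples, shifted by M samples each.
    n_frames = 1 + max(0, (len(signal) - N) // M)
    frames = np.stack([signal[i * M : i * M + N] for i in range(n_frames)])
    # Step 3: taper each frame's edges with a Hamming window.
    return frames * np.hamming(N)

sr = 8000                                   # assumed sampling rate
t = np.arange(sr) / sr
speech = np.sin(2 * np.pi * 440 * t)        # 1 s synthetic tone stand-in
frames = frame_and_window(pre_emphasis(speech))
```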
frequency domain into a time-like domain known as the quefrency domain, gener-
ating Mel-scale cepstral coefficients. While MFCC alone is suitable for speech recog-
nition, incorporating log energy and performing delta operations can enhance overall
performance.
Step 7: Log Energy: We can also calculate the energy within a frame, which can
be another feature for MFCC. We can enhance the feature set by calculating the time
derivatives of (energy + MFCC), which provide velocity and acceleration in Eq. (4).
ΔCm(t) = [ Σ (τ = −M to M) τ · Cm(t + τ) ] / [ Σ (τ = −M to M) τ² ]    (4)
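Eq. (4) can be implemented directly in NumPy; handling the edge frames by replicating the first and last frames is an assumption here, since the text does not specify boundary treatment.

```python
import numpy as np

def delta(c, M=2):
    """Delta (velocity) coefficients per Eq. (4).
    c: array of shape (n_frames, n_coeffs); returns the same shape."""
    # Replicate edge frames so every t has +/- M neighbours (assumption).
    padded = np.pad(c, ((M, M), (0, 0)), mode="edge")
    taus = np.arange(-M, M + 1)
    denom = np.sum(taus ** 2)
    out = np.zeros_like(c, dtype=float)
    for i, tau in enumerate(taus):
        # padded[i : i + len(c)] is the track shifted by tau frames.
        out += tau * padded[i : i + len(c)]
    return out / denom

c = np.arange(10.0).reshape(-1, 1)  # a linearly increasing cepstral track
d = delta(c, M=2)
```

For a linearly increasing track the interior delta is exactly 1, which is a quick sanity check on the regression.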
We have created a dataset for audio and text with sample data as shown in Fig. 3. The
system in the suggested experiment is trained using recorded wav files that contain
the numbers 1 through 9 from nine different speakers.
The next step is to render the resulting sounds in wav format using the Python wave reader, as shown in Figs. 4 and 5.
Figure 6 shows a confusion matrix in which the nine different classes are classified according to their detection rates. A confusion matrix is
a tabular representation that provides a succinct summary of a machine learning
model’s performance on a specific dataset used for testing purposes. It is a method
of visually representing the number of correct and incorrect occurrences determined
by the model’s predictions.
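Building such a matrix and reading the per-class detection rate off its diagonal can be sketched as below; the three-class labels are toy data, not the nine-digit results reported here.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Count (actual, predicted) pairs: rows are actual classes,
    columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

y_true = [0, 0, 1, 2, 2, 2]
y_pred = [0, 1, 1, 2, 2, 0]
cm = confusion_matrix(y_true, y_pred, n_classes=3)

accuracy = np.trace(cm) / cm.sum()       # correct predictions / total
recall = cm.diagonal() / cm.sum(axis=1)  # per-class detection rate
```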
Figure 7 displays a classification output after selecting and classifying data,
choosing the most appropriate class.
Table 3 depicts the accuracy and loss ratio of 9 different classes after applying
testing input to the training dataset. Figure 8 shows the evaluation results.
After each epoch completes successfully, the model reports accuracy and other resulting parameters; while the code runs, it updates and provides a cumulative response. After applying the suggested model, Table 4 displays the accuracy obtained for digits 1 through 9.
8 Conclusion
References
10. Onaolapo JO, Idachaba FE, Badejo J, Odu T, Adu OI (2014) A simplified overview of text-to-
speech synthesis. In: Proceedings of the world congress on engineering, vol 1. ISSN: 2078-0958
11. Sasirekha D, Chandra E (2012) Text to speech: a simple tutorial. IJSCE
12. Slobodan B, Sanda M Text normalization for Croatian speech synthesis
13. Pooja MR, Manoj C (2019) Text normalization and its role in speech synthesis. In: IJEAT
14. Pooja MR, Manoj C (2019) An experimental technique on text normalization and its role in
speech synthesis. In: IJITEE
15. Yin Z (2018) An overview of speech synthesis technology. IEEE
16. Xu T, Tao Q, Frank S, Tie-Yan L (2021) A survey on neural speech synthesis. ISCSLP
17. Helal UM (2015) A comparative study of different text-to-speech synthesis techniques. IJSER
18. Kothari JJ, Kumbharana CK (2015) A phonetic study for constructing a database of Gujarati
characters for speech synthesis of Gujarati text. Int J Comput Appl 117(19)
19. Fahima K, Farha AM, Nadia AR, Aloke KS, Muhammad FM (2022) Text to speech synthesis:
a systematic review, deep learning based architecture and future research direction. J Adv Inf
Technol
20. Yishuang N, Sheng H, Zhiyong W, Chunxiao X, Liang-Jie Z (2019) A review of deep learning
based speech synthesis. MDPI
21. Pooja P, Miral P (2017) Feature extraction of isolated Gujarati digits with Mel frequency
cepstral coefficients (MFCCs). Int J Comput Appl
22. Dani PP, Deole MS (2017) Improvement of accuracy using MFCC speech recognition. IJARIIE
Chapter 14
A Comprehensive Bibliometric Study
on AI-Guided Breast Cancer Diagnosis
and Prognosis Investigating Web
of Science and Scopus from 2016 to 2023
1 Introduction
For more than a decade, the prevalence of breast cancer has been steadily increasing. It
is one of the leading cancer-related causes of death in women. Although there has been
significant progress in breast cancer diagnosis methods, the quest for early detection
remains a work in progress [1]. Numerous critical studies and clinical trials have
significantly improved breast cancer prognosis, but several others have yet to reach clinical maturity, implying the need for this bibliometric analysis [2]. In the past,
a few studies linked the most cited papers of the time in several domains under the
umbrella term of breast cancer, which motivated researchers to conduct such studies
with ease. This identification is critical because clinicians base their opinions on the
substantiation and significance of these studies. The most significant component of
research methodology is linked to higher citation counts and a significant impact on
the journal of publication. These parameters were the main focus of this study, along
with several other crucial factors for a more thorough analysis [3].
The main contribution of this study, compared to existing papers, lies in its compre-
hensive bibliometric analysis of AI-based breast cancer detection research from
180 E. Bhatti et al.
2 Literature Review
The above figure (see Fig. 1) gives the process of selecting the papers for bibliometric
analysis. The topic used, accessibility, query prompt, record extraction, and process
of bibliometric analysis are presented in this figure.
(A) Datasets: The study obtained its dataset through a comprehensive query-
based search on Scopus and Web of Science, including Science Citation Index
Expanded journals.
Table 1 Comprehensive literature analysis (for each article referenced: description; no. of papers analyzed/period under consideration; source database; method used; bibliometric parameters analyzed; research gaps identified)

[7] Description: Focus on the study of TFEB, which is pivotal in the research of neurodegenerative diseases and tumors including breast cancer. Papers/period: 1059. Source: Web of Science. Method: Excel, VOSviewer, and Citespace. Parameters analyzed: co-authorship of countries, institutes, and authors; co-citation of cited authors and references; keyword clusters. Research gaps: TFEB's role in breast cancer's molecular pathways needs additional study.

[8] Description: Focused on aging research. Papers/period: 100. Source: Web of Science. Method: VOSviewer and Histcite. Parameters analyzed: publication year, authorship, publication type, keywords, journal name, institution, country. Research gaps: need for research on novel interventions targeting age-related factors in breast cancer prevention and treatment.

[9] Description: Deep learning on breast cancer image classification. Papers/period: 2014–2021. Source: Scopus. Method: VOSviewer and Bibliometrix. Parameters analyzed: the yearly trends in publications, the networks of co-authors, countries, and scientific journals, as well as the networks of authors' keywords that appear together. Research gaps: not much clinical translation and real-world use of deep learning models for breast cancer diagnosis and prognosis.

[10] Description: Male breast cancer. Papers/period: 100. Source: Web of Science. Method: –. Parameters analyzed: parameters such as the subject, … Research gaps: potential gaps in awareness, …

[11] Description: Automated diagnosis of various diseases with the help of machine learning-based techniques. Papers/period: 1216. Source: Web of Science and Scopus. Method: –. Parameters analyzed: dissects papers to unravel different insights such as the most prolific authors, countries, and organizations as well as articles cited with the highest frequencies. Research gaps: possible limitations in knowing how automated diagnosis affects healthcare delivery and patient outcomes.

[12] Description: Literature concentrates on the computer-aided detection of cancers based on medical imaging. Papers/period: 20 years. Source: Scopus. Method: Biblioshiny, VOSviewer, and Word Cloud. Parameters analyzed: examines the increase in publications and sources, the author's wealth, keyword research, the number of times an article is cited, and other factors. Research gaps: potential gaps in the translation of research findings into clinical practice and patient care.

[13] Description: Breast cancer studies specific to Pakistan. Papers/period: –. Source: –. Method: –. Parameters analyzed: the h-index, impact factor, journal quartile, and the number of publications and citations are some of the factors that have been used. Research gaps: potential weaknesses in the Pakistani population's customized breast cancer prevention, screening, and treatment programs.

[14] Description: Breast cancer care, helping to narrow down the treatment range for patients and increasing patient survivability. Papers/period: –. Source: Scopus. Method: Biblioshiny, VOSviewer. Parameters analyzed: details about the top journals, articles that get the highest citations, most important authors, and research centers that do this kind of work; on top of that, authorship co-occurrence analysis and keyword analysis. Research gaps: possible omissions from our knowledge of differences in outcomes and access to care between various socioeconomic and demographic groups.

[15] Description: Breast cancer studies in the domain of nursing. Papers/period: 2734, from 2009 to 2018. Source: Web of Science. Method: CitespaceII. Parameters analyzed: reveals some basic insights, e.g., the year when the publications under scrutiny peaked. Research gaps: significant exclusions from an understanding of how nursing care affects treatment compliance and long-term results in breast cancer survivors.

[16] Description: Advocates the use of machine learning for the detection of breast cancer. Papers/period: 2928, from the year 2015 to 2019. Source: PubMed. Method: –. Parameters analyzed: the country, the first author, the journal, the institutional collaborations, and the number of times author keywords appear together. Research gaps: possibilities for shortcomings in addressing issues with algorithmic bias, data privacy, and the use of machine learning technologies by clinicians.

[17] Description: Breast cancer stem cells. Papers/period: –. Source: Scopus and the Web of Science. Method: Citespace. Parameters analyzed: clustering results indicate that breast cancer was identified to be the most researched and heavily cited topic. Research gaps: known gaps in the therapeutic applications of stem cells for breast cancer treatment derived from preclinical research.

[18] Description: Concentration on integrative and complementary oncology research. Papers/period: from 1976 to 2017. Source: Web of Science. Method: VOSviewer and SciMAT. Parameters analyzed: production trends, country collaboration, and leading topics. Research gaps: limitations in knowing patient preferences, safety, and efficacy of integrative therapy in breast cancer treatment.

[19] Description: Triple-negative breast cancer and nanotechnology. Papers/period: 1932, from 2012 to 2017. Source: Scopus. Method: a mix of VOSviewer, STATA, Excel, and R-tools. Parameters analyzed: citations and productivity. Research gaps: the challenges in identifying the mechanisms of action and potential toxicity of nanotechnology-based therapeutics in breast cancer patients.

[20] Description: Breast cancer. Papers/period: from the years 2007 to 2017. Source: –. Method: R tool with a bibliometric analysis package. Parameters analyzed: –. Research gaps: need for clarification on the specific aspects of breast cancer research addressed in the study.

[21] Description: Perspective of Indian breast cancer research. Papers/period: 40 years. Source: Scopus. Method: –. Parameters analyzed: performance analysis employing institutions, journals, authors, and their citation impact with the Hirsch Index. Research gaps: challenges in leveraging indigenous knowledge and resources for addressing breast cancer disparities in India.

[22] Description: Breast cancer from the Indian perspective. Papers/period: 3529, from 2005 to 2014. Source: Scopus. Method: Lotka's law and Bradford's law. Parameters analyzed: collaborative authorship. Research gaps: potential limitations in addressing regional differences in breast cancer incidence, biology, and healthcare access in India.

[23] Description: Breast cancer radiation therapy: a bibliometric analysis of the scientific literature. Papers/period: –. Source: Scopus. Method: R-Studio with the bibliometrix R package. Parameters analyzed: growth trends, author productivity, relevant journals, themes analysis (niche, emerging, hot, essential). Research gaps: there may be areas where our understanding of patient preferences and decision-making regarding radiation therapy options for breast cancer treatment is incomplete.

[24] Description: Evolution of research trends in artificial intelligence for breast cancer diagnosis and prognosis over the past two decades: a bibliometric analysis. Papers/period: –. Source: –. Method: Bibliometrix (R package). Parameters analyzed: trends in literature studies published, productivity of countries, relevant authors, affiliated institutions, leading journals, and emerging topics. Research gaps: possible deficiencies in dealing with regulatory and ethical issues in the implementation of AI technologies for breast cancer diagnosis.

[25] Description: A comprehensive bibliometric analysis of deep learning techniques for breast cancer segmentation: … Papers/period: 985, 2019–2023. Source: Web of Science Core Collection. Method: Bibliometrix (R package). Parameters analyzed: comprehensive analysis of literature on breast cancer segmentation using deep learning techniques. Research gaps: significant limitations in comprehending the influence of deep learning-based segmentation on treatment planning and patient results.
(B) Search and Extraction: This study employed basic and advanced queries in
WOS and Scopus to find breast cancer papers from 2016 to 2023 [26] (see
Fig. 2).
output. Each bar represents a single year, with 2023 exhibiting the highest number
of publications among the depicted years.
Country-wise Productivity Analysis: The bar graph shows the research production of
ten nations, based on Web of Science and Scopus papers, together with their collab-
oration types (see Fig. 4). China is the biggest contributor, with 40 documents in
Scopus and 40 in Web of Science. The US and India trail behind, producing signifi-
cant research in both databases. The figure distinguishes MCP (multiple-country
publications) from SCP (single-country publications), which sheds light on research
partnerships. Malaysia, dominated by SCP, and the Netherlands, dominated by MCP,
serve as prime examples of the varied partnership patterns that exist across nations:
collaboration in Malaysia is primarily domestic, whereas the Netherlands collabo-
rates heavily internationally, as shown by its diversified collaboration patterns.
The narrative summarizes international collaboration and comparative study
results. The image highlights nations with similar research pathways and unique
collaboration preferences. Policymakers, researchers, and institutions seeking to
understand global research, foster international cooperation, and effectively manage
resources will find the material invaluable. The map provides a global overview of
research, emphasizing the quantity and quality of current academic research.
Author Productivity Analysis: Based on Scopus and Web of Science submissions
and journal publications, the bar graph displays the most productive authors (see
Fig. 5). Notable authors include CHANG, whose 15 Scopus publications have influ-
enced academia. CHUN, CHDI MS, CHEN Y, CHEN J, CHANG JS, ARYA I, LIU ZY,
CHEN C, and WANG Y each have two Web of Science papers. The graphic depicts
each author's output and publishing venues. Every Web of Science author is linked to
this journal, while author CHANG stands out with many Scopus publications without
a named journal.

It shows journal authorship and publication concentration. This information
can help readers, institutions, and research evaluators understand authors' scholarly
impact and their different academic contributions. These visual representations also
reveal the complexity of scholarly activity by identifying productive scholars and
their preferred academic periodicals.
Annual Scientific Production: The line chart depicts the annual scientific index from
2016 to 2023 for Web of Science (WOS) and Science Citation Index (SCI). WOS
index fluctuates initially, peaking in 2017, while SCI index remains relatively stable.
Both indices converge and show similar trends from 2020 onwards, with WOS
surpassing SCI in 2023 (see Fig. 6).
The Co-citation Network: This network visually represents the connectivity between
Web of Science (WOS) and Scopus papers, indicating shared citations, and showcases
the intellectual interdependence and collaborative nature of scholarly research. For
example, papers by "Beniogo 2009," "Xie 2017," and "Wang 2014" were co-cited
with the highest frequency in WOS, while "Bray 2018," "Sudharshan 2019," "Arrival
2018," and others rank highest in the Scopus co-citation network.
Historical Direct Citation Network: The analysis of past direct mutual citations
reveals a low interlink density and transitivity, indicating limited correlation. The
most productive year was 2017, with deviations observed in 2023 (see Fig. 7).
Keyword Co-occurrences: Bibliometric analysis of keyword occurrence was
performed for the 330 WOS database papers, with the most frequent keywords
displayed in the largest size; in R this visualization is called a word cloud. Classifi-
cation has been the most common keyword, with breast cancer segmentation among
the other frequent occurrences. Keyword evaluation of the Scopus database likewise
shows that classification is utilized most often, and the survival keyword is also used.
The word cloud retrieved from this evaluation shows that researchers prefer breast
cancer classification.
Conceptual Structure Map: The conceptual structure map links keywords to the
papers' major themes (see Fig. 8). In most WOS articles, classification precedes
diagnosis; most Scopus publications establish an identical structure from classifica-
tion to diagnosis.
Topic Dendrogram: The topic dendrograms reveal that the keywords segmentation,
classification, and diagnosis are the most commonly used across all 801 papers
obtained from both databases (see Fig. 9). This means that papers on these topics are
the most commonly accepted in both the Web of Science and Scopus databases.
Factorial Maps of Most Cited Documents: Factorial mapping highlights the distinc-
tiveness of breast cancer research. Most work focuses on disease detection, with
image-based datasets being popular. Citation patterns show that 2020 papers
frequently cite 2017 publications, reflecting advances in AI methods [27] (see
Fig. 10). Breast cancer research may thus continue with better methods.
5 Research Gaps
The study finds many gaps in AI-guided breast cancer detection and prognosis liter-
ature. First, there are few studies on the ethical and social effects of AI breast
cancer diagnosis. The ethics of patient privacy, consent, and AI algorithm biases
must be examined. Second, the survey highlights China’s scientific output but also
laments a lack of international engagement. Addressing this gap could boost inter-
national cooperation and breast cancer AI applications. The study also emphasizes
6 Conclusion
References
1. Sun YS, Zhao Z, Yang ZN, Xu F, Lu HJ, Zhu ZY, Shi W, Jiang J, Yao PP, Zhu HP (2017) Risk
factors and preventions of breast cancer. Int J Biol Sci 13:1387. [Link]
21635
2. Waks AG, Winer EP (2019) Breast cancer treatment: a review. JAMA 321:288–300. https://
[Link]/10.1001/JAMA.2018.19323
3. Key TJ, Verkasalo PK, Banks E (2001) Epidemiology of breast cancer. Lancet Oncol 2:133–
140. [Link]
4. Lo PK, Sukumar S (2008) Epigenomics and breast cancer. Pharmacogenomics 9:1879–1902.
[Link]
5. Polyak K (2007) Breast cancer: origins and evolution. J Clin Invest 117:3155–3163. https://
[Link]/10.1172/JCI33295
6. Fan L, Strasser-Weippl K, Li JJ, St Louis J, Finkelstein DM, Yu KD, Chen WQ, Shao ZM,
Goss PE (2014) Breast cancer in China. Lancet Oncol 15. [Link]
5(13)70567-9
7. Zhou R, Lin X, Liu D, Li Z, Zeng J, Lin X, Liang X (2022) Research hotspots and trends
analysis of TFEB: a bibliometric and scientometric analysis. Front Mol Neurosci 15:854954.
[Link]
8. Haroon, Li Y-X, Ye C-X, Ahmad T, Khan M, Shah I, Su X-H, Xing L-X (2022) The 100 most
cited publications in aging research: a bibliometric analysis. Electron J Gen Med 19. https://
[Link]/10.29333/ejgm/11413
9. Khairi SSM, Bakar MAA, Alias MA, Bakar SA, Liong CY, Rosli N, Farid M (2022) Deep
learning on histopathology images for breast cancer classification: a bibliometric analysis.
Healthcare (Switzerland) 10. [Link]
10. Kwok HT, Van M, Fan KS, Chan J (2022) Top 100 cited articles in male breast cancer: a
bibliometric analysis. [Link]
11. Ahsan MM, Luna SA, Siddique Z (2022) Machine-learning-based disease diagnosis: a
comprehensive review. [Link]
12. Kore M, Naik DN, Chaudhari D (2021) A bibliometric approach to track research trends in
computer-aided early detection of cancer using biomedical imaging techniques. J Scientometric
Res 10:318–327. [Link]
13. Ahmad S, Ur Rehman S, Iqbal A, Farooq RK, Shahid A, Ullah MI (2021) Breast cancer
research in Pakistan: a bibliometric analysis. Sage Open 11. [Link]
40211046934
14. Joshi SA, Bongale AM, Bongale A (2021) Breast cancer detection from histopathology images
using machine learning techniques: a bibliometric analysis. Libr Philos Pract 2021
15. Özen Çınar İ (2020) Bibliometric analysis of breast cancer research in the period 2009–2018.
Int J Nurs Pract 26. [Link]
16. Salod Z, Singh Y (2020) A five-year (2015 to 2019) analysis of studies focused on breast
cancer prediction using machine learning: a systematic review and bibliometric analysis. J
Public Health Res 9. [Link]
17. Liu J, Qiu XC (2020) Research hotspots and trends of breast cancer stem cells. J Shanghai
Jiaotong Univ (Med Sci) 40:881. [Link]
18. Moral-Munoz JA, Carballo-Costa L, Herrera-Viedma E, Cobo MJ (2019) Production trends,
collaboration, and main topics of the integrative and complementary oncology research area:
a bibliometric analysis. Integr Cancer Ther 18. [Link]
ASSET/IMAGES/LARGE/10.1177_1534735419846401-[Link]
19. Handerson R, Teles G, Fernando H, Márcia M, Cominetti R, Moralles HF, Cominetti MR
(2018) Global trends in nanomedicine research on triple-negative breast cancer: a bibliometric
analysis. Int J Nanomed 13:2321–2336. [Link]
20. Vakharia PP, Kakish D, Tadros R, Riutta J (2017) Bibliometric analysis of breast cancer-related
lymphoedema research published from 2007–2016. J Lymphoedema 12
21. Ram S (2017) Indian contribution to breast cancer research: a bibliometric analysis. Ann Libr
Inf Stud 64
22. Singh N, Handa TS, Kumar D, Singh G (2013) Mapping of breast cancer research in India: a
bibliometric analysis. Curr Sci 110
23. Lin L, Liang L, Wang M, Huang R, Gong M, Song G, Hao T (2023) A bibliometric analysis of
worldwide cancer research using machine learning methods. [Link]
24. Franco P, De Felice F, Jagsi R, Nader Marta G, Kaidar-Person O, Gabrys D, Kim K, Ramiah D,
Meattini I, Poortmans P (2023) Breast cancer radiation therapy: a bibliometric analysis of the
scientific literature. Clin Transl Radiat Oncol 39. [Link]
25. Zaman S (2023) A comprehensive review: bibliometric analysis of decision tree-based
approaches for breast cancer prediction
26. Cardoso F, Kyriakides S, Ohno S, Penault-Llorca F, Poortmans P, Rubio IT, Zackrisson S,
Senkus E (2019) Early breast cancer: ESMO clinical practice guidelines for diagnosis, treatment
and follow-up. Ann Oncol 30:1194–1220. [Link]
27. Prerita, Sindhwani N, Rana A, Chaudhary A (2021) Breast cancer detection using machine
learning algorithms. In: 2021 9th International conference on reliability, Infocom technologies
and optimization (trends and future directions), ICRITO 2021. [Link]
51393.2021.9596295
28. Akram M, Iqbal M, Daniyal M, Khan AU (2017) Awareness and current knowledge of breast
cancer. Biol Res 50. [Link]
Chapter 15
Predictive Modeling of Cardiovascular
Disease Using Feedforward Neural
Networks
1 Introduction
CVD is a leading cause of morbidity and mortality and poses a major public health
burden with devastating costs in terms of human suffering, disability, and deaths.
The accurate and timely diagnosis of CVD is a critical need, which allows for the
delivery of interventions that not only reduce the burden of treatment on patients
but also maximize clinical outcomes. The increasing availability of healthcare data,
along with breakthroughs in deep learning models, has motivated many researchers
to utilize these resources for the purpose of CVD prediction and diagnosis.
This paper significantly extends existing efforts in the field of predictive health-
care by providing a thorough assessment of the performance of ML and DL models
in the context of cardiovascular disease prediction.
FNNs, as part of the deep learning family, hold immense potential for identifying
and even predicting CVDs more effectively and efficiently, which in turn promises
a shift toward a more precise and targeted healthcare management model. Constant
improvement of this area through new studies and innovation is therefore essential
for the healthcare community to better approach the complex issues caused by CVDs,
and thus improve patients' quality of life and minimize the social costs of the disease.
In summary, this research addresses a current gap in the literature by comparing
machine learning and deep learning models for predicting CVD. In addition, this
study has helped to progress the field by demonstrating that FNNs with dropout
regularization yield higher accuracy in predicting the outcome. This paper provides
the basis for future research projects to enhance the methodologies used in predicting
CVD risk, in an effort to provide health solutions that will prevent further occurrences
of the disease.

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025
H. Mittal et al. (eds.), Proceedings of International Conference on Paradigms of
Communication, Computing and Data Analytics, Algorithms for Intelligent Systems,
[Link]
D. Joshi et al.
The relevance of innovation to the contemporary paradigm of managing emerging,
interrelated public health challenges is manifested in the need to incorporate innova-
tive technologies and methodologies so that health systems can evolve and operate
more efficiently.
2 Literature Review
Over the past few years, the number of works that use deep learning models for
CVD prediction has increased; they span different architectures and methodologies
aimed at improving prediction and explaining the complex patterns behind CVD
occurrence and development.
In a related study, Garcia-Ordas et al. (2023) applied a novel model, specifically a
CNN with a sparse autoencoder network, to a dataset of 11 crucial cardiovascular
health parameters. Their results, reporting approximately 90% accuracy for the
neural-network approach, offer further proof of the effectiveness of deep learning in
CVD prognosis and point to areas where predictive models of cardiovascular health
can be enhanced [1].
Subramani et al. (2023) made a worthy contribution by employing multiple
machine learning models, including deep learning models, to identify a set of features
capturing essential aspects of cardiovascular health. Their test results reach 96%
accuracy in CVD prediction using machine learning and deep learning, and they
identify directions for further advancing CVD-prediction techniques [2].
Bhatt et al. (2023) put forward an approach that applies machine learning
algorithms, including decision trees, XGBoost, random forest, and a multilayer
perceptron, to classify a real-world dataset of 70,000 instances imported from
Kaggle, using elements of cardiovascular health. Their findings showed the ability
to diagnose up to 87% of cases accurately [3].
In the study by Ahmad et al., the authors combined a CNN and a Bi-LSTM to perform
CVD prediction, achieving a 94.5% accuracy rate. Their work strongly supports
hybrid models in cardiovascular health research, facilitating the discovery of better
predictive models [4].
Along the same lines, Mehmood et al. (2021) demonstrate the feasibility of deep
learning solutions by using a CNN to predict CVD, achieving an accuracy of 97%.
Their paper highlights the importance of applying sophisticated machine learning
techniques to cardiovascular disease, especially in building better and more accurate
predictive models, opening up further opportunities for research into deep learning
systems [5].
Sharma and Parmar (2020) proposed a distinct deep learning framework for CVD
prognostication, based on a deep neural network optimized with the Talos tool. Using
a dataset of 14 crucial features associated with cardiovascular disease, they reported
promising results toward an efficient predictive model of overall cardiovascular
health [6].
Bharti et al. (2020) proposed a novel framework combining machine learning and
deep learning approaches. Their assessment on a dataset from the University of Cali-
fornia, Irvine (UCI), which contains 14 important features relating to cardiovascular
disease, demonstrated an impressive accuracy of 94.2%. This study stresses the
importance of hybrid machine learning methods in furthering knowledge of cardio-
vascular health [7].
Pasha et al. (2020) developed a new deep-learning-based framework in which
ANNs obtained the best accuracy rate, 85%, for cardiovascular disease prediction.
Their work adds to the existing literature supporting the use of deep learning
approaches to accurately estimate cardiovascular health prognoses [8].
Altogether, these works show that interest in applying deep learning approaches to
cardiovascular disease prediction is rising.
3 Methodology
Random Forest. Random Forest, a decision-tree-based ensemble learning technique,
has gained tremendous ground in both classification and regression tasks. During
training it generates many decision trees and then aggregates their forecasts: for
classification, the final class label is the mode of the trees' predictions, while for
regression, the final output is the mean of the predicted values.
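As an illustrative sketch of this aggregation scheme (scikit-learn on a synthetic dataset, not the study's CVD data; the hyperparameters here are assumptions):

```python
# Illustrative sketch only: a scikit-learn random forest on synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=13, random_state=42)

# 100 trees; each votes, and the forest predicts the majority class (the mode).
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X, y)

pred = clf.predict(X[:5])   # mode of the 100 per-tree predictions
print(pred.shape)
```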
XGBoost with random forest as the base estimator. XGBoost (Extreme Gradient
Boosting) is another successful gradient boosting technique used primarily for clas-
sification and regression. It builds a large number of weak learners, mainly decision
trees, in a successive manner with the intention of reducing the loss function.
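A minimal sketch of this successive-weak-learner idea; it uses scikit-learn's GradientBoostingClassifier as a stand-in, since the paper's exact XGBoost configuration (including the random-forest base estimator) is not specified here:

```python
# Stand-in sketch: scikit-learn gradient boosting illustrates the same idea as
# XGBoost -- shallow trees fitted one after another to reduce the loss -- but
# it is not the XGBoost library itself.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=500, n_features=13, random_state=42)

clf = GradientBoostingClassifier(n_estimators=200, max_depth=3,
                                 learning_rate=0.1, random_state=42)
clf.fit(X, y)   # each new tree corrects the residual errors of the ensemble
print(clf.score(X, y))
```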
The outlined model is a form of Feedforward Neural Network (FNN): a multilayer-
perceptron architecture with Dropout regularization.
Data Collection and Preparation. A cross-sectional study was conducted and data
were extracted from a file containing important information on cardiovascular disease
(CVD). The dataset includes age, sex, and cholesterol along with other medical
parameters. Such detailed data support a better analytical study and help enhance
existing and new policies for managing CVD.
Data Preprocessing. To begin with, the dataset was divided into the features, repre-
sented by the symbol "X", and the target variable, symbolized by "y"; the target in
this study is the binary CVD classification label. Next, we split the data in an 80:20
ratio, giving a test size of 20%. The random state was kept at 42 for reproducibility.
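The split described above might be sketched as follows (the features and labels here are synthetic placeholders, not the study's dataset):

```python
# Sketch of the 80:20 train/test split with random_state=42, on placeholder data.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.RandomState(0).rand(1000, 13)        # features "X" (placeholder)
y = np.random.RandomState(1).randint(0, 2, 1000)   # binary CVD labels "y"

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42)
print(len(X_train), len(X_test))  # 800 200
```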
Feature Scaling. To improve model convergence and bring the features onto a
comparable scale, feature scaling was conducted. This is a vital step, since research
has shown that features with zero mean and unit variance aid model convergence. In
this way, the features are treated uniformly, improving model training.
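A sketch of this standardization step, assuming scikit-learn's StandardScaler (the paper does not name the scaling tool):

```python
# Standardization to zero mean / unit variance, fitted on the training set only
# so that no test-set information leaks into the scaler.
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(42)
X_train = rng.rand(800, 13) * 100   # placeholder features on a large scale
X_test = rng.rand(200, 13) * 100

scaler = StandardScaler().fit(X_train)   # learns per-feature mean and std
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)
print(X_train_s.mean().round(6), X_train_s.std().round(3))
```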
Model Architecture. The structure of our model is delineated as follows:
Description. The network consists of four primary layers: an input layer, a hidden
layer, a dropout layer, and an output layer. Together, these layers enable the neural
network to effectively process and learn from input data, ultimately producing
accurate and reliable outputs (shown in Fig. 1).
Output Layer (1 unit). The output layer has a single neuron for binary classification,
where "1" represents the presence of cardiovascular disease and "0" its absence.
Activation function: Sigmoid
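The text does not give the hidden-layer width, its activation, or the dropout rate, so the NumPy sketch below assumes 64 hidden units, ReLU, and a rate of 0.5; it shows one forward pass through the input → hidden → dropout → sigmoid-output stack described above:

```python
# Sketch of one forward pass through the described FNN; the hidden width (64),
# ReLU activation, and dropout rate (0.5) are assumptions, not the paper's values.
import numpy as np

rng = np.random.default_rng(0)

def forward(x, W1, b1, W2, b2, drop_rate=0.5, training=True):
    """Hidden layer -> (inverted) dropout -> single sigmoid output unit."""
    h = np.maximum(0.0, x @ W1 + b1)              # hidden layer, ReLU
    if training:                                  # dropout acts only in training
        mask = rng.random(h.shape) >= drop_rate
        h = h * mask / (1.0 - drop_rate)          # inverted-dropout rescaling
    z = h @ W2 + b2
    return 1.0 / (1.0 + np.exp(-z))               # sigmoid: P(CVD present)

n_in, n_hidden = 13, 64
W1 = rng.normal(0, 0.1, (n_in, n_hidden)); b1 = np.zeros(n_hidden)
W2 = rng.normal(0, 0.1, (n_hidden, 1));    b2 = np.zeros(1)

p = forward(rng.normal(size=(4, n_in)), W1, b1, W2, b2)
print(p.shape)  # one probability per sample
```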
Model Compilation. This model is trained with the Adam optimizer and the binary
cross-entropy loss, which is especially applicable to binary classification problems.
The Adam optimizer is known for adapting quickly to large-scale datasets, and it also
performs well with sparse gradients. The binary cross-entropy loss function is best
suited to binary-class problems, as it quantifies the dissimilarity between the predicted
class probability distribution and the actual class distribution in the data.
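The binary cross-entropy loss described here can be written out directly; the probabilities below are illustrative values, not model outputs:

```python
# Binary cross-entropy: dissimilarity between predicted probabilities and the
# true 0/1 labels, as used to train the model above.
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    y_pred = np.clip(y_pred, eps, 1.0 - eps)   # guard against log(0)
    return float(-np.mean(y_true * np.log(y_pred)
                          + (1 - y_true) * np.log(1 - y_pred)))

y_true = np.array([1, 0, 1, 1])
good = np.array([0.9, 0.1, 0.8, 0.95])   # confident, mostly correct
bad = np.array([0.5, 0.5, 0.5, 0.5])     # uninformative predictions

# Loss is lower when predicted probabilities match the labels.
print(binary_cross_entropy(y_true, good) < binary_cross_entropy(y_true, bad))  # True
```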
Model Training. The model was trained for 50 epochs with a batch size of 128
samples. Throughout training, its performance was checked on a separate validation
set to ensure it was not simply memorizing the training data.
Model Evaluation. Once training finished, the model was evaluated on a held-out test
set it had not seen before. Its predictions were compared with the actual outcomes to
measure its predictive ability, a rigorous test of whether the model can handle
real-world scenarios beyond its training data.
Result Analysis. We calculated accuracy on the test set, which gave a solid measure
of predictive performance, and examined further metrics such as precision, recall,
and F1-score for a more complete picture. A confusion matrix was used to break the
performance down further.
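A sketch of this evaluation using scikit-learn's metric functions on illustrative labels (not the study's test set):

```python
# Accuracy, precision, recall, F1, and confusion matrix on illustrative labels.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # placeholder ground truth
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # placeholder model predictions

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("f1       :", f1_score(y_true, y_pred))
print(confusion_matrix(y_true, y_pred))  # rows: true class, cols: predicted
```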
Description. The methodology outlines the steps from dataset collection to final
prediction using deep learning techniques. The approach ensures the model is well-
prepared to make accurate predictions (shown in Fig. 3).
4 Experimental Result
Source and Description. The dataset used in this study consists of 1025 entries,
carefully selected and organized to cover the cardiovascular health area in detail.
It comes from the well-known University of California, Irvine (UCI) Machine
Learning Repository. The dataset contains 14 distinct features, selected to capture
essential aspects for predicting cardiovascular disease (CVD) outcomes.
Description. The table comprises 14 carefully selected features, each essential for
predicting cardiovascular health. These features are chosen based on their relevance
and impact on cardiovascular conditions, ensuring the model has the necessary
information to make accurate predictions (shown in Table 1).
Table 2 Accuracy comparison of various models

| S. No | Accuracy (%) | Model |
|---|---|---|
| 1 | 78.68 | Logistic regression |
| 2 | 80.32 | SVM |
| 3 | 73.77 | KNN |
| 4 | 85.24 | Random forest |
| 5 | 90 | XGBoost |
| 6 | 98.5 | FNN (MLP) |
Description. The performance metrics comparison shows that the FNN (MLP)
outperforms all other models, indicating superior accuracy, balance between preci-
sion and recall, and effectiveness in identifying relevant instances (shown in
Table 3).
It reveals that the FNN model outperforms the various machine learning models.
Although each of these models has been studied in pursuit of a better predictive
model for CVD, the present study demonstrates that advanced neural networks with
added regularization techniques can improve prediction accuracy. This implies the
value of sufficient hidden layers and of flexibly adjusting the model to the complex-
ities of cardiovascular health data. The relevance of the trends and fluctuations
highlighted in our analysis lies in their applicability as an answer to the question
posed at the beginning of our study. In deriving and comparing the most effective
machine learning and deep learning models for a CVD risk prediction strategy, we
sought to determine how different training, feature selection, and hyperparameter
optimization strategies can be used to improve diagnostic precision in cardiovascular
healthcare.
Description. The bar graph visually illustrates the performance disparities among the
models, prominently showing that the FNN (MLP) achieves the highest accuracy
(shown in Fig. 4).
Description. The confusion matrix evaluates how well the Feedforward Neural
Network (FNN) classifies data into two categories. The correlation matrix represents
the relationships between 13 variables: darker red indicates stronger positive corre-
lation, while darker blue indicates stronger negative correlation (shown in Fig. 5).
Fig. 5 Confusion matrix for FNN classification performance and correlation matrix for variable
relationships
Description. The grouped bar plot compares Precision, F1-Score, and Recall across
multiple models. Each metric (Precision, F1-Score, and Recall) is represented by six
bars, each corresponding to a different model. This visualization provides a clear
comparison of how each model performs across these important evaluation metrics
(shown in Fig. 6).
Fig. 6 Grouped bar plot of precision, F1-score, and recall across models
5 Conclusion
The current study has outlined the significance of adopting innovative methodolo-
gies of advanced deep learning for predictive cardiovascular healthcare. Our efforts
have paid off through rigorous training and evaluation, achieving an accuracy of
almost 98.5% in predicting cardiovascular diseases. The present paper firmly estab-
lishes that deep learning algorithms can serve as the bedrock of model development
for cardiovascular disease. The limitations of this study include its reliance on a
single dataset and a single experimental setting, which limits the generality of the
conclusions drawn. Furthermore, actual deployment can pose considerations that
are not adequately reflected in this study. Despite these limitations, our study builds
on prior work to advance the field of predictive cardiovascular healthcare. Further
work should explore combinations of other machine learning methods, innovative
feature engineering, and datasets larger than those used to date. Future research
should also examine the generalizability of the results to various populations and
settings, as the developed framework for predictive cardiovascular healthcare has
the potential to extend its application to various settings.
References
1. Garcia-Ordas MT et al (2023) Heart disease risk prediction using deep learning techniques with
feature augmentation. Multimedia Tools Appl 82:31759–31773
2. Subramani S et al (2023) Cardiovascular diseases prediction by machine learning incorporation
with deep learning. Front Med 10:1150933. [Link]
3. Bhatt CM, Patel P, Ghetia T, Mazzeo PL (2023) Effective heart disease prediction using machine
learning techniques. Algorithms 16(2):88. [Link]
4. Ahmad S, Asghar MZ, Alotaibi FM et al (2022) Diagnosis of cardiovascular disease using deep
learning techniques. Soft Comput 27:8971–8990
5. Mehmood A, Iqbal M, Mehmood Z et al (2021) Prediction of heart disease using deep
convolutional neural networks. Arab J Sci Eng 46:3409–3422
6. Sharma S, Parmar M (2020) Heart diseases prediction using deep learning neural network model.
Int J Innov Technol Exploring Eng
7. Bharti R et al. (2020) Prediction of heart disease using a combination of machine learning and
deep learning. Hindawi Comput Intell Neurosci 2021
8. Pasha SN et al. (2020) Cardiovascular disease prediction using deep learning techniques. IOP
Conf Ser: Mater Sci Eng 981:022006
Chapter 16
Design and Implementation of Fetal
Heart Rate Measuring System
on MATLAB Simulink
Twarita Singh, Tanishq Dixit, Pragya Paliwal, Samriddhi Tiwari,
and Shivani Saxena
1 Introduction
2 Literature Review
extract time-varying ECG features and can be independent of the pathological and
physiological states of mother and fetus. The work in this project is framed according
to the above research gap and uses a DWT-based signal filtering and processing block
to analyze each beat of the fetal ECG.
4 Methodology
Figure 3 shows a workflow diagram of the proposed model, which consists of five
basic steps. Steps 1–2 perform signal acquisition; Step 3 removes various noise
components from the signal (the pre-processing stage); and Steps 4–5 detect and
measure fetal heart rate using R-peaks. Finally, from the software algorithm, a
MATLAB-based Simulink model is created as an ensemble of all the entities used
in fetal heartbeat measurement.
The detailed description of each step is as follows:
Step1: Import FECG records from Physionet to MATLAB R2023b workspace
[17].
Step 2: Normalization of ECG gain from dB to mV, using Eq. (1):

    Normalized ECG signal(n) = Original signal / Gain factor    (1)

This ensures consistent interpretation of ECG data across different recordings.
Step 3: Conduct a 4-level undecimated DWT using the sym4 wavelet, decomposing
the signal into its constituent frequency components so that various noise compo-
nents can be removed.
Step 4: Detection of R-peaks using amplitude thresholding and windowing
method.
Step 5: Calculation of heart rate using successive R-peaks, according to Eq. (2):

    HR = (Number of R-peaks / Time duration (s)) × 60    (2)

Here, HR is the heart rate in beats per minute (bpm) and the number of R-peaks is
the count of R-peaks detected within the specified time duration [18].
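Steps 4–5 can be sketched in NumPy as follows; the sampling rate, record length, spike amplitude, threshold, and window length are all illustrative assumptions, not the paper's exact parameters:

```python
# Sketch of R-peak detection (amplitude thresholding + windowing) and heart-rate
# calculation; the synthetic signal and all parameters are assumptions.
import numpy as np

FS = 250          # sampling rate in Hz (assumed)
DURATION = 10     # record length in seconds (assumed)
n = FS * DURATION

# Synthetic "FECG": unit spikes at 90 bpm (one every 2/3 s) on mild noise.
rng = np.random.default_rng(0)
ecg = 0.05 * rng.normal(size=n)
spike_period = int(FS * 60 / 90)                     # ~166 samples between beats
ecg[np.arange(spike_period // 2, n, spike_period)] += 1.0

def detect_r_peaks(sig, fs, thresh=0.5, refractory_s=0.2):
    """Keep samples above the threshold, at most one per refractory window."""
    window = int(refractory_s * fs)
    peaks, last = [], -window
    for i in np.flatnonzero(sig > thresh):
        if i - last >= window:
            peaks.append(i)
            last = i
    return np.array(peaks)

peaks = detect_r_peaks(ecg, FS)
hr_bpm = len(peaks) / DURATION * 60                  # beats per minute
print(len(peaks), hr_bpm)  # 15 90.0
```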
Statistical parameters are used to measure the performance of the proposed method.
The first is the Mean Square Error (MSE), computed to quantify the deviation between
the noisy and denoised ECG signals. The governing equation is Eq. (3):

    MSE = (1/N) Σ_{i=1}^{N} [ECG(i) − ECG̃(i)]²    (3)

Here, ECG(i) and ECG̃(i) are the noisy and denoised ECG signals, respectively.
The signal-to-noise ratio is measured using signal averaging, given by Eq. 4.
Average ECG(t) = (1/N) · Σ_{i=1..N} ECG_i(t)    (4)

Here, N is the number of ECG samples to be averaged and ECG_i(t) is the i-th denoised
ECG signal [15].
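Eqs. 3 and 4 translate directly into code; a minimal Python sketch:

```python
def mse(noisy, denoised):
    """Eq. 3: mean squared error between noisy and denoised ECG."""
    n = len(noisy)
    return sum((noisy[i] - denoised[i]) ** 2 for i in range(n)) / n


def average_ecg(signals):
    """Eq. 4: pointwise average of N denoised ECG realisations."""
    n = len(signals)
    length = len(signals[0])
    return [sum(s[t] for s in signals) / n for t in range(length)]
```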
As discussed in Sect. 4, the first step of the proposed model is signal acquisition,
which is implemented using Algorithm 1 in the MATLAB workspace to import the
FECG signal.
The second and third steps apply the DWT to the noisy and normalized FECG signals,
as shown in Algorithm 2 (Fig. 4).
The resulting FECG signals with all detected R-peaks are shown in Fig. 6.
As shown in Fig. 6, the number of detected R-peaks in record ARR_03 is 15.
Using these R-peaks, the measured value of the heart rate is given by Eq. 5.
HR = (15/10) × 60 = 90 bpm    (5)
This measured value is used as a reference for comparison of simulated values
obtained from derived Algorithms 1–3 in the proposed model.
In the same manner, simulated values of heart rate are compared with theoretical
values in all 16 FECG records and measured values of performance parameters using
Eqs. 3 and 4 are tabulated in Table 1.
The table shows that the theoretical heart-rate values agree closely with the
heart-rate values measured using the experimental methods discussed in Fig. 2.
It also quantifies the performance of the proposed DWT-based method in the form
of statistical parameters.
In the final stage of hardware implementation of the proposed model, the FECG
signal is imported into MATLAB Simulink. First, the floating-point data type is
converted into a fixed-point data type so that these FECG signals can be imported
into the MATLAB Simulink environment. Figure 7 shows the basic processing blocks
in Simulink that convert the signal into the appropriate data types.
The acquired signal is segmented into two distinct sections corresponding to the
chest and abdomen regions and pre-processed with the supported data-format
conversion. The collective fetal heart-rate signal derived from these categorized
sections is represented in integer format, i.e., as a fixed-point data type. This
fixed-point signal representation is essential for hardware implementation, as it
uses fewer hardware resources when embedded into a small device [19].
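As a hedged illustration of this conversion, the sketch below quantises floating-point samples into a signed fixed-point word. The word length (16 bits) and fraction length (8 bits) are illustrative choices; the paper does not state the exact format used in its Simulink model.

```python
def to_fixed_point(samples, frac_bits=8, word_bits=16):
    """Quantise floating-point samples to signed fixed-point integers.

    Each value is scaled by 2**frac_bits, rounded, and saturated to the
    representable range of a `word_bits`-bit two's-complement word.
    """
    scale = 1 << frac_bits
    lo = -(1 << (word_bits - 1))
    hi = (1 << (word_bits - 1)) - 1
    return [max(lo, min(hi, round(v * scale))) for v in samples]


def to_float(fixed, frac_bits=8):
    """Inverse mapping back to (quantised) floating point."""
    return [v / (1 << frac_bits) for v in fixed]
```

The saturation step mirrors what fixed-point hardware does on overflow; the quantisation error introduced here is the price paid for the smaller hardware footprint.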
214 T. Singh et al.
Table 1 Measurement of heartbeat of fetus using FECG and numerical simulation of statistical
parameters in denoised ECG signals

S. no | Records | No. of R-peaks detected | Heartbeat (theoretical) | Heartbeat count (experimental) | MSE | SNR
1 | ARR_01m | 14 | 84 | 88 | 0.0392 | 42
2 | ARR_02m | 15 | 90 | 96 | 0.0409 | 49
3 | ARR_03m | 15 | 90 | 114 | 0.0344 | 70
4 | ARR_04m | 15 | 90 | 90 | 0.0486 | 56
5 | ARR_05m | 18 | 108 | 108 | 0.0042 | 54
6 | ARR_06m | 19 | 114 | 114 | 0.0344 | 42
7 | ARR_07m | 14 | 84 | 84 | 0.0749 | 51
8 | ARR_08m | 13 | 78 | 78 | 0.0405 | 43
9 | ARR_09m | 23 | 138 | 108 | 0.031 | 39
10 | ARR_10m | 12 | 72 | 72 | 0.1217 | 42
11 | ARR_11m | 13 | 78 | 78 | 0.0875 | 48
12 | ARR_12m | 17 | 102 | 102 | 0.0704 | 64
13 | NR_01m | 15 | 90 | 138 | 0.0256 | 51
14 | NR_02m | 16 | 96 | 96 | 0.1039 | 45
15 | NR_03m | 15 | 90 | 96 | 0.237 | 47
16 | NR_04m | 17 | 102 | 108 | 0.1051 | 50
Fig. 7 Import of FECG data into Simulink using integer data type
As shown in Fig. 8, the simulated fetal heart-rate signal passes through a pipeline
of various DSP blocks, including digital filters that remove the high-frequency noise
components and the DC (zero-frequency) offset, before being sent to the display
component. In addition, the Spectrum Analyzer block in Simulink is used to identify
the frequencies of the noise components. The resulting FECG signal generated by the
above Simulink model is shown in Fig. 9 and consists of the various signal components
discussed in Fig. 1.
It is clear from Fig. 9 that the proposed Simulink model efficiently generates FECG
signals that can be used for clinical heart-rate monitoring.
6 Conclusion
References
1. Ronsmans C, Graham WJ (2006) Maternal mortality: who, when, where, and why. The Lancet
368(9542):1189–1200
2. Aggarwal G, Wei Y (2021) Non-invasive fetal electrocardiogram monitoring techniques:
potential and future research opportunities in smart textiles. Signals 2(3):392–412
3. Lin H, Liu R, Liu Z (2023) ECG signal denoising method based on disentangled autoencoder.
Electronics 12(7):1606
4. Behar J, Johnson A, Clifford GD, Oster J (2014) A comparison of single channel fetal ECG
extraction methods. Ann Biomed Eng 42:1340–1353
5. Alnuaimi SA, Jimaa S, Khandoker AH (2017) Fetal cardiac Doppler signal processing
techniques: challenges and future research directions. Front Bioeng Biotechnol 5:82
6. Ahmad AA, Nyitamen DS, Lawan S, Wamdeo CL (2019) Fetal heart rate estimation: adaptive
filtering approach vs time-frequency analysis. In: 2019 2nd international conference of the
IEEE Nigeria computer chapter (NigeriaComputConf). IEEE, pp 1–5
7. Grivell RM, Alfirevic Z, Gyte GM, Devane D (2015) Antenatal cardiotocography for fetal
assessment. Cochrane Database Syst Rev 9
8. Jabbari S (2021) Source separation from single-channel abdominal phonocardiographic signals
based on independent component analysis. Biomed Eng Lett 11(1):55–67
9. Keenan E, Udhayakumar RK, Karmakar CK, Brownfoot FC, Palaniswami M (2020) Entropy
profiling for detection of fetal arrhythmias in short length fetal heart rate recordings. In: 2020
42nd annual international conference of the IEEE engineering in medicine & biology society
(EMBC). IEEE, pp 621–624
10. Kasap B, Vali K, Qian W, Chak WH, Vafi A, Saito N, Ghiasi S (2021) Multi-detector heart rate
extraction method for transabdominal fetal pulse oximetry. In: 2021 43rd annual international
conference of the IEEE engineering in medicine & biology society (EMBC). IEEE, pp 1072–
1075
11. Farahi M, Casals A, Sarrafzadeh O, Zamani Y, Ahmadi H, Behbood N, Habibian H (2022)
Beat-to-beat fetal heart rate analysis using portable medical devices and wavelet transformation
techniques. Heliyon 8(12)
12. Escalona-Vargas D, Bolin EH, Lowery CL, Siegel ER, Eswaran H (2020) Recording and
quantifying fetal magnetocardiography signals using a flexible array of optically pumped
magnetometers. Physiol Meas 41(12):125003
13. Vullings R, Van Laar JO (2020) Non-invasive fetal electrocardiography for intrapartum
cardiotocography. Front Pediatr 8:599049
14. Chen D, Wan S, Xiang J, Bao FS (2017) A high-performance seizure detection algorithm based
on Discrete Wavelet Transform (DWT) and EEG. PLoS ONE 12(3):e0173138
15. Saxena S, Vijay R (2020) Optimal selection of wavelet transform for de-noising of ECG signal
on the basis of statistical parameters. In: Soft computing and signal processing: proceedings of
2nd ICSCSP 2019, vol 2. Springer Singapore, pp 731–739
16. Non-Invasive Fetal ECG Arrhythmia Database (nifeadb) downloaded from PhysioNet Website
17. Simulink toolbox MATLAB R2023b
18. Saxena S, Vijay R, Pahadiya P, Gupta KK (2023) Classification of ECG arrhythmia using
significant wavelet-based input features. Int J Med Eng Inf 15(1):23–32
19. Travieso-González CM, Pérez-Suárez ST, Alonso JB (2013) Using fixed point arithmetic for
cardiac pathologies detection based on electrocardiogram. In: Computer aided systems theory-
EUROCAST 2013: 14th international conference, Las Palmas de Gran Canaria, Spain, February
10-15, 2013. Revised Selected Papers, Part II 14. Springer Berlin Heidelberg, pp 242–249
Chapter 17
Exploring the Effectiveness of Artificial
Neural Networks and Regression Models
in Weather Prediction
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025 219
H. Mittal et al. (eds.), Proceedings of International Conference on Paradigms of
Communication, Computing and Data Analytics, Algorithms for Intelligent Systems,
[Link]
220 V. Singh et al.
everyday activities, particularly from a tourism perspective. WFS and early warn-
ings assist travelers in planning trips more effectively, enabling informed decisions
about specific days, locations, and times. Accurate WF is crucial in natural disas-
ters to ensure the safety of people and property. Therefore, this paper focuses on
predicting temperature data using historical data from Delhi city spanning 20 years
and 365 days. The temperature dataset from Himani et al. [16] is utilized to test NN-
based weather forecasting (WF) accuracy, revealing a significant need for improved
forecasting accuracy. The ultimate goal of the paper is to demonstrate the accurate
use of the ANN network model for temperature WF.
2 Literature Review
Pathak et al. [1] have introduced FourCastNet, a computational model with signif-
icant implications for predicting catastrophic weather phenomena, including extra-
tropical and tropical cyclones, air pathways, and wind power planning. FourCastNet
demonstrates superior forecasting accuracy at short lead periods compared to the
ECMWF Integrated Forecast System (IFS), a cutting-edge Numerical Weather
Prediction (NWP) approach. Particularly, FourCastNet outperforms IFS in small-
scale factors like rainfall, and its remarkable speed enables the generation of weekly
predictions in just a second. This velocity facilitates the cost-effective creation of
large-ensemble projections, enhancing deterministic modeling. The study under-
scores the value of data-driven computational models like FourCastNet as valuable
additions to the weather forecasting arsenal, complementing and improving NWP
modeling.
In a related development, Bi et al. [2] have presented Pangu-Weather, a deep
learning-based system for precise and rapid global weather forecasting. Utilizing
hourly global meteorological data spanning 43 years from the ECMWF reanalysis
(ERA5), the model is trained with around 256 million parameters. Pangu-Weather
achieves a spatial resolution of 0.25° × 0.25°, comparable to the ECMWF Integrated
Forecast System (IFS). Notably, the artificial intelligence-driven approach
surpasses traditional NWP techniques in precision across various variables and time
intervals.
Xia et al. [3] have introduced the ED-ConvLSTM architecture, employing a
convolutional long short-term memory network, to predict global total electron
content (TEC) with a 1-h time cycle. Using International GNSS Service (IGS) TEC data
from 2005 to 2018, the model forecasts TEC maps one to seven days in advance. The
model's effectiveness is validated through simulations and comparisons with empir-
ical frameworks such as the Bent model, International Reference Ionosphere (IRI)
models, and the NeQuick model.
Chen et al. [4] address state-dependent errors in weather forecasts resulting
from imprecise algorithms. They propose using data assimilation (DA) to incor-
porate observations and correct model mistakes. The study focuses on extracting
data embedded in the analysis increments generated by the DA procedure to
17 Exploring the Effectiveness of Artificial Neural Networks … 221
enhance numerical weather forecasts. Neural networks (NNs) are trained to predict
adjustments to systemic errors in the FV3-GFS simulation, leading to significantly
improved error correction compared to a linear baseline.
In their study, Grabar et al. [5] present a comprehensive approach to drought
forecasting, utilizing a publicly available quarterly weather dataset as input for a
spatio-temporal artificial intelligence model. The investigation, conducted across
five distinct climatic locations, assesses the accuracy of Palmer Drought Severity Index
(PDSI) forecasts using various algorithms. Notably, the study concludes that the
Transformer-based approach, EarthFormer, exhibits remarkable precision in generating
immediate (up to six months) forecasts. The use of partial differential equations
to mimic Earth's systems highlights the potency of general circulation models
(GCMs) in predicting climatic events.
Kurth et al. [6] address the challenges associated with the time-to-solution
constraints and processing costs of physics-based numerical weather forecasting
(NWP). They reveal FourCastNet, a data-driven deep neural network planetary
emulator, as a promising alternative that outperforms traditional NWP in medium-
range forecasting with significantly reduced computational time. The study employs
spectrum approaches to achieve these advancements.
Sharma et al. [7] introduce a data-driven model named U-Net, based on convolu-
tional neural networks, for meteorological weather forecasting. This model considers
environmental characteristics such as 2 m temperatures, mean ocean stress, surface
stress, wind speed, modeling topography height, sunlight intensity, and humidity
ratio to forecast weather conditions for the next six hours.
Liu et al. [8] contribute to the exploration of weather forecasting (WF) prob-
lems by presenting CAST-YOLO, an enhanced YOLO approach incorporating a
cross-attention transformer. Utilizing convolutional block attention modules
(CBAM) and a transformer encoder layer (TE-Layer), the detector employs knowl-
edge distillation techniques for cross-domain object identification, showcasing
notable advancements in convolutional neural network-inspired object recognition
techniques.
Nayak et al. [9] tackle the immediate forecasting of severe downpours, a crit-
ical aspect of flooding alert communication. Focusing on urbanized locations like
Mumbai, the study employs micro and continental-level weather observations to
develop a machine learning-based method, using the support vector machine (SVM)
and the atypical regularity methodology (AFM). This methodology leverages the
occurrence of high anomalous values of meteorological parameters to identify
predictors for SVM, addressing the challenges of predicting excessive precipitation.
Novitasari et al. [10] employ climate variables to predict rainfall quantity in their
study, utilizing the Adaptive Neuro-Fuzzy Inference System (ANFIS) and support
vector regression (SVR) techniques. This approach aims to provide rapid and reli-
able insights into upcoming weather conditions, enabling individuals to prepare
adequately. The prediction is based on synopsis data, including wind speed, temper-
ature, and humidity levels, with the ANFIS technique forecasting each variable’s
results for precise rainfall forecasts, assessed using RMSE and MSE metrics.
Hayaty et al. [11] conducted a study with the aim of assessing the capability of
support vector machines (SVM) in forecasting rain at Tanjungpinang, Kepulauan
Riau, Indonesia. The research utilizes factors such as humidity, temperature, wind
speed, and precipitation to make predictions. The gathered information undergoes
detailed analysis to determine whether the constructed SVM model meets the conver-
gence requirements of an algorithm or technique. This evaluation is crucial for vali-
dating the effectiveness and reliability of the SVM-based rain forecasting model in
the specific geographical context of Tanjungpinang.
In a study by Chandrasekara [12], an automatic SVM-inspired method for annual
weather forecasting with thirty-minute increments is proposed. This method, tested
with a dataset from the Kandy region comprising 136 samples, 20 characteristics, and
5 labels, exhibits significantly higher accuracy than models based on yearly datasets.
The framework achieves an optimal precision of 86% through thorough training, data
preparation processes, and hyperparameter optimization using the grid search tech-
nique. SVM algorithms prove effective, particularly in time series-based forecasting
approaches.
Velasco et al. [13] address the impact of climate change-related heavy rainfall
on the Southern Philippines. Using a 4-year precipitation dataset, a Support Vector
Regression Machine (SVRM) forecasts future precipitation in a tropical city. The
process involves optimizing the cost and gamma variables to enhance prediction
precision, assessing forecasting algorithms, and validating the cost and gamma
parameters to establish correlations across current and past value pairs.
Khan et al. [14] present a unique deep learning-driven approach for estimating
energy load. It involves a three-step process: feature extraction through recursive
feature removal, classification/forecasting using enhanced support vector machines
(SVM) and Extreme Learning Machines (ELM), and feature selection. Hyperparameter
adjustments are made using the Genetic Algorithm for ELM and a grid-search engine
for SVM, with the relevance of features determined through the Recursive Feature
Elimination (RFE) approach.
Álvarez-Alvarado et al. [15] review various hybrid SVM algorithms utilizing
State-of-the-Art (SOA) approaches to optimize settings for sunlight prediction.
Examining the last five years of studies on hybrid SVM-optimized algorithms, they
find that SVM using Genetic Algorithm (GA) outperforms traditional SVM models,
especially when forecasting variables are set using the Zonal-based kernel algorithm.
SVM emerges as a potent machine learning classification technique for forecasting
ultraviolet (UV) rays.
Tyag et al. [16] have focused on weather forecasting utilizing historical datasets.
Due to the intricate and nonlinear nature of atmospheric patterns, conventional
methods often fall short in terms of effectiveness and efficiency. Recognizing the
complexity of the issue, the researchers turned to Artificial Neural Network (ANN)
as a potent approach for addressing these challenges. The proposed ANN method
assesses model performance through the exploration of various parameters such as
neurons, hidden layers, and transfer functions.
Table 1 shows different types of weather forecasting models, from old methods
to new advanced ones using deep learning and hybrid techniques. Each model has
unique strengths, improving accuracy, speed, and usefulness for various weather
and climate issues. The new method we suggest uses an artificial neural network
(ANN) and is very accurate, highlighting how modern AI can improve environmental
monitoring.
This paper introduces a WFS designed using an ANN. This ANN-based
system assesses different models by predicting temperatures throughout the year,
considering various factors like transfer functions, hidden layers, and neurons. The
selection of the best model is based on the MSE. The main aim of the study is to
create a more accurate and efficient method for modeling weather data, reducing
computational costs, and achieving a lower MSE. The process involves removing
invalid values from the raw data, creating a more manageable dataset, and using it
for predictions. Regression analysis is then employed to measure the accuracy of
the data predictions. The flow chart illustrating the steps of the ANN-based weather
forecasting is provided in Fig. 1.
3.1 Pre-Processing
At the front end, raw weather data is downloaded from the Kaggle site and stored as
an Excel spreadsheet. The following steps are involved sequentially:
A. Data Cleaning
Read the columns of the weather data. The data contains month, date, year, and
temperature in °F for the Mumbai and Delhi cities. Missing values are replaced with
"Not a Number" (NaN) for i = 1 to the length of the data, and the temperature is
converted from °F to °C as follows:
Temp°C = (Tempdata − 32) × 5/9
Then the NaN is eliminated using the spline interpolation method.
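A Python sketch of this cleaning step, assuming the temperatures arrive in °F with gaps marked as NaN; plain linear interpolation stands in for the spline interpolation used in the paper:

```python
import math


def fahrenheit_to_celsius(temps_f):
    """Convert degF readings to degC via C = (F - 32) * 5/9 (NaNs pass through)."""
    return [(t - 32) * 5 / 9 if not math.isnan(t) else t for t in temps_f]


def fill_gaps(values):
    """Replace interior NaN runs by linear interpolation between the
    nearest valid neighbours (a simple stand-in for spline interpolation).
    Leading/trailing NaNs, which have only one neighbour, are left as-is."""
    out = list(values)
    n = len(out)
    i = 0
    while i < n:
        if math.isnan(out[i]):
            j = i
            while j < n and math.isnan(out[j]):
                j += 1
            if i > 0 and j < n:  # interior gap: interpolate across it
                left, right = out[i - 1], out[j]
                for k in range(i, j):
                    frac = (k - i + 1) / (j - i + 1)
                    out[k] = left + frac * (right - left)
            i = j
        else:
            i += 1
    return out
```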
B. Network Architecture
The features of the input data are divided into 19 input layers, with one output layer;
the dataset covers 20 years, comprising 7320 data samples (20 years × 366 days).
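Assuming the 19:1 input–output ratio means that 19 consecutive samples predict the next one (the paper does not print its exact data arrangement), the training pairs can be built with a simple sliding window:

```python
def make_windows(series, n_inputs=19):
    """Build (inputs, target) training pairs: each window of `n_inputs`
    consecutive samples predicts the sample that immediately follows it."""
    pairs = []
    for i in range(len(series) - n_inputs):
        pairs.append((series[i:i + n_inputs], series[i + n_inputs]))
    return pairs
```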
Table 1 (continued)

Study | Approach/Model | Main focus/result
Nayak and Ghosh [9] | SVM-based method | Focuses on immediate forecasting of severe downpours for flooding alerts. Uses SVM and the atypical regularity methodology (AFM). Addresses challenges in predicting excessive precipitation in urbanized locations like Mumbai
Novitasari et al. [10] | ANFIS and SVR | Uses climate variables to predict rainfall quantity. Employs Adaptive Neuro-Fuzzy Inference System (ANFIS) and support vector regression (SVR) techniques. Enables rapid and reliable insights into upcoming weather conditions
Hayaty et al. [11] | SVM-based rain forecasting | Assesses SVM capability in forecasting rain. Uses factors like humidity, temperature, wind speed, and precipitation. Evaluates model convergence for reliability in Tanjungpinang, Kepulauan Riau, Indonesia
Chandrasekara [12] | SVM-inspired method | Proposes an automatic SVM-inspired method for annual weather forecasting. Demonstrates higher accuracy than models based on yearly datasets. Achieves optimal precision through training and hyperparameter optimization
Velasco et al. [13] | Support Vector Regression Machine | Addresses the impact of climate change-related heavy rainfall in the Southern Philippines. Uses a Support Vector Regression Machine (SVRM). Optimizes variables for enhanced prediction precision. Contributes to understanding ionospheric thermospheric emission curves
Khan et al. [14] | Enhanced SVM and ELM | Presents a deep learning-driven approach for estimating energy load. Uses feature extraction, SVM, and Extreme Learning Machines (ELM). Employs a Genetic Algorithm and a grid-search engine for hyperparameter adjustments
Álvarez-Alvarado et al. [15] | Hybrid SVM with SOA | Reviews various hybrid SVM algorithms for sunlight prediction. Highlights SVM with Genetic Algorithm (GA) outperforming traditional SVM models. Utilizes zonal-based kernel algorithms for forecasting variables. Potent for UV-ray prediction
Tyag et al. [16] | ANN | Evaluates performance for temperature prediction across all 365 days of the year, with a 4:1 ANN accuracy of 98.6% for Delhi data. States that increasing the layers may minimize the MSE further
Kumar Abhishek et al. [17] | ANN | ANN for weather forecasting with a 10:1 ratio, utilizing either a five-layer or ten-layer architecture, achieved an accuracy of around 90%
Proposed Methodology | ANN | Designed a 19:1 model with 14 hidden layers, incorporating spline interpolation for NaN reduction, and achieved an accuracy of around 99.042% for Delhi data
The best performance is achieved with 14 hidden layers, each corresponding to the
12 months.
The training process of the suggested ANN-based WFS is depicted in Fig. 2. In
Fig. 2a, it’s evident that the system employs 19 input and 4 hidden layers. Despite
having a more extensive structure, this results in a substantial enhancement in data
correlation. Moving on to Fig. 2b, it illustrates the training process of the system.
Although the proposed system takes a bit more time to converge, the trade-off is
improved accuracy in forecasting.
It is suggested to remove NaNs from the datasets to create a smaller dataset. The paper
aims to maximize the R-square value to demonstrate improved correlation between
subjective and proximate data. The estimation MSE is used to evaluate performance,
seeking to minimize errors. The histogram of the error performance for the proposed
WF system is shown in Fig. 3.
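The two figures of merit used throughout the chapter, the estimation MSE and the R-square value, can be computed as in the following minimal sketch:

```python
def mse(actual, predicted):
    """Mean squared estimation error between observed and predicted values."""
    n = len(actual)
    return sum((actual[i] - predicted[i]) ** 2 for i in range(n)) / n


def r_square(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_tot = sum((a - mean) ** 2 for a in actual)
    ss_res = sum((actual[i] - predicted[i]) ** 2 for i in range(len(actual)))
    return 1 - ss_res / ss_tot
```

Maximizing R-square and minimizing MSE pull in the same direction: a perfect fit gives R² = 1 and MSE = 0.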
Fig. 3 Histogram comparison of the error performance for the proposed system
data. It can be clearly observed from Fig. 4 that the estimation error performance is
better for the proposed method, with the best performance achieved at lower epochs.
The comparison of MSE and epochs is provided in Table 2. Table 2 presents a
comparative analysis between reference methods [16] and the proposed method for
Delhi data based on MSE and respective epochs. The reference method achieved
a MSE of 4.3915 with a corresponding epoch of 12, while the proposed method
outperformed it with a significantly lower MSE of 0.3962 at a slightly higher epoch
of 18. This indicates the improved performance of the proposed method in terms of
minimizing error and achieving convergence over the training epochs compared to
the reference method.
Table 3 provides a parametric comparison of an existing method [16] and the
proposed ANN approach. The proposed method is evaluated on both Delhi and
Mumbai datasets, with variations in hidden layers, R-square values, and MSE. The
proposed method demonstrates superior performance in terms of R-square values
and MSE compared to the reference method in the Delhi dataset. However, when
validated on the Mumbai dataset, the R-square value decreases, and MSE increases,
suggesting potential variability in performance across different locations.
It can be observed from Table 4, which presents the analysis under different locations
and weather conditions for four cities in India, that the performance and computational
complexity vary notably for highly humid weather data. The best performance is
achieved for the Calicut data. However, the proposed method performs well in all
cases.

Table 2 MSE and Epochs comparison of proposed and existing methods for Delhi data
Parameter | Ref. [16] | Proposed method
Best MSE | 4.3915 | 0.3962
Respective epoch | 12 | 18
Figures 5 and 6 compare the regression plots of the 4:1 (input–output data ratio)
neural network and the 19:1 (input–output data ratio) neural network for Delhi and
Mumbai, respectively, over the last 20 years. It can be observed from the figures that
the fit of the 19:1 ANN is better than that of the 4:1 ANN.
Challenges
The major challenge lies in computation, particularly for very large datasets where
the approach could become slow or even impractical. When considering geographical
areas, different regions may exhibit distinct data features. A technique that works
effectively in one area may necessitate modifications in pre-processing to perform
well in another.
This paper proposes the use of an ANN-based data prioritization method after initially
validating the fundamental WF coefficient. The recommendation is to remove NaNs
from the datasets to create a smaller dataset. The paper's primary goal is to maximize
the R-square value, demonstrating improved correlation between subjective and
proximate data. Additionally, the estimation MSE is employed to assess performance,
aiming for a minimum error value. The suggested system incorporates 19 inputs and
14 hidden layers. During the training phase, the suggested system achieves peak
performance with the best possible R-square value of 0.904 and a minimum MSE of
0.3962 for Delhi and 0.2624 for Mumbai data. Looking ahead, the study suggests the
potential use of Deep Neural Networks (DNN) for improved forecasting performance
with more extensive datasets in the future. Several crucial elements of weather
datasets that may improve their reproducibility in the future include: (1) using
NetCDF to ensure compatibility; (2) incorporating indicators or flags concerning
outliers, suspicious data points, or missing values into the dataset; and (3) utilizing
hourly and mean daily temperature data.

Fig. 5 Results of the performance comparison for proposed ANN system with 20-year Delhi data
Fig. 6 Results of performance comparison for Proposed ANN for 20-year Mumbai data
References
1 Introduction
In regions where the electrical grid is inaccessible or unreliable, an energy storage
system provides constant electricity, grid stability, and frequency control [1, 2].
Nowadays, the
most prevalent kinds of storage systems implemented are those for disasters [3],
emergencies [4], and intermittent or separated operation scenarios [5, 6]. Petrol or
diesel-electric generators are often used in these instances as energy sources. In recent
days, however, some states have begun to consider large-scale battery deployments
for grid or renewable energy storage. While battery storage alone may be adequate for
a large-scale energy system, this research will concentrate on smaller-scale residential
systems, where hybrid energy storage tends to be more appropriate [7–10]. Compared
with batteries, supercapacitors (SCs) offer higher power density and a quick response
to variations, but are not suitable for long-term storage due to their limited energy
density [11–13]. Owing to these complementary characteristics, hybrid energy storage
systems combined with PV systems have become increasingly popular and suitable
for distributed systems [14].
Many governments promote the utilization of renewable energies and encourage a
more decentralized approach to power delivery systems [15]. Despite their relatively
high cost, there has been remarkable growth in installed RES capacity [11]. Solar
energy is the world's major renewable energy source: PV systems have no moving
parts, operate smoothly, and generate no emissions [16]. Another advantage is that
they are highly modular and can be easily scaled to provide the required power
for different loads [17–19]. Harvesting distributed energy from RESs plays a major
role in clean energy production and helps to overcome environmental depletion [8, 9].
236 K. R. Patel and J. J. Gadit
Considering the nature of the voltage and current, microgrids are sub-categorized
into three types, i.e., AC, DC, and hybrid [20–23]. Each alternative provides different
advantages and disadvantages. Microgrids provide three benefits: reliability,
sustainability, and economy [24–26].
2 System Configuration
3 Mathematical Models
To deliver the total amount of output power into the network and maintain the
PV output power (mostly to track the maximum power point), a boost converter
is employed as shown in Fig. 2.
The voltage–current characteristic equation of a solar cell [5] gives the module
photo-current as:

Iph = [Isc + Ki (T − 298)] × Ir / 1000    (1)

Here, Iph: photo-current (A); Isc: short-circuit current (A); Ki: temperature coefficient
of the short-circuit current at 25 °C and 1000 W/m²; T: operating temperature (K);
Ir: solar irradiation (W/m²).
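Eq. 1 evaluates directly in code; the module parameters in the example below are illustrative, not taken from the paper:

```python
def photo_current(isc, ki, temp_k, irradiance):
    """Eq. 1: Iph = [Isc + Ki * (T - 298)] * Ir / 1000."""
    return (isc + ki * (temp_k - 298)) * irradiance / 1000.0
```

At standard test conditions (T = 298 K, Ir = 1000 W/m²) the photo-current reduces to the short-circuit current, which makes Eq. 1 easy to sanity-check.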
The current output of the PV module [5] as:
When dealing with loads that fluctuate and require more immediate power, including
an SC helps to alleviate most of these challenges. Supercapacitors are high-power-
density storage devices that can be used to provide the increased current. However,
they cannot be used independently due to their poor energy density; they are hence
suited to complement BESSs. The SC is connected in parallel to increase the potential
energy storage:
ηeff = e^(−2 · RESR · CTOTAL / tdch)    (3)
Here, ηeff : energy efficiency; RESR : total equivalent series resistance; CTOTAL : total
capacitance; tdch : discharging time.
Vc(t) = Vo · e^(−t / (RT · Ctotal))    (4)

V(t) = Vo · (1 + e^(−t / (RT · Ctotal)))    (6)
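Eqs. 3 and 4 can be checked numerically; in the sketch below the component values passed in are illustrative, and Eq. 4 is read as a capacitor-discharge decay through the total resistance RT:

```python
import math


def sc_energy_efficiency(r_esr, c_total, t_dch):
    """Eq. 3: eta_eff = exp(-2 * R_ESR * C_TOTAL / t_dch)."""
    return math.exp(-2.0 * r_esr * c_total / t_dch)


def sc_voltage(t, v0, r_t, c_total):
    """Eq. 4: Vc(t) = Vo * exp(-t / (R_T * C_total)) -- exponential decay."""
    return v0 * math.exp(-t / (r_t * c_total))
```

As expected from Eq. 3, the efficiency approaches 1 for long discharge times (slow discharge) and drops as the discharge time shrinks relative to the RESR·CTOTAL time constant.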
Due to the sporadic nature of renewable energy sources, periods when solar energy
production is low or non-existent require a battery storage device. The energy stored
in the battery system may be used to provide the necessary power during peak and
non-peak hours. The battery's efficiency and performance are contingent upon several
factors such as the surrounding temperature, charge level, voltage fluctuations, and
18 Modeling and Simulation of a Hybrid Energy Storage System for DC … 239
R0 is the internal resistance ();it = idt is the removed charge (Ah);A and B are
empirical constants (V), (1/Ah).
V = E0 − K·(Q/(Q − it))·i* − K·(Q/(Q − it))·it − R0·i + A·e^(−B·it)   (8)

Vfull = E0 − R0·I + A   (9)

Vexp = E0 − K·(Q/(Q − Qexp))·(Qexp + I) − R0·I + A·e^(−B·Qexp)   (10)
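The battery relations above follow a Shepherd-type discharge model; a minimal sketch is shown below. All parameter values are illustrative placeholders, not fitted to any real cell.

```python
import math

# Hedged sketch of the Shepherd-type battery model in Eqs. (8)-(9).

def battery_voltage(e0, k, q, i_star, it, i, r0, a, b):
    """Terminal voltage per Eq. (8).

    e0: constant voltage (V); k: polarization constant; q: capacity (Ah);
    i_star: filtered current (A); it: extracted charge (Ah); i: current (A);
    r0: internal resistance (ohm); a, b: empirical constants (V, 1/Ah).
    """
    pol = k * q / (q - it)  # polarization factor K*Q/(Q - it)
    return e0 - pol * i_star - pol * it - r0 * i + a * math.exp(-b * it)

def v_full(e0, r0, i, a):
    """Fully charged voltage per Eq. (9)."""
    return e0 - r0 * i + a

print(battery_voltage(12.0, 0.05, 100.0, 10.0, 20.0, 10.0, 0.01, 0.5, 3.0))
print(v_full(12.0, 0.01, 10.0, 0.5))
```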
4 Scenario Design
In this scenario, the simulation operation of DCMG is illustrated without grid power.
Hence, it has either surplus power or sufficient charge in the storage devices to feed
the load power requirement. The MATLAB Simulink model of the DC microgrid is
shown in Fig. 3.
5 Simulation Results
This simulation analysis covers different modes, including random behavior of generation
and loads and deficits beyond the battery's discharging capability. All the various
scenarios of the DCMG without a utility power supply are discussed below:
Case 1: Ppv = minimum, PBat− + PSC− < Pload
During the evening, generation is less than the demanded power due to the low
irradiation. Therefore, the SC and battery are used to meet the required load demand,
as shown in Fig. 4.
Case 2: Ppv = Pload
As shown in Fig. 5, in the morning the PV power equals the load demand.
However, when the PV power is minimal or zero relative to the load demand in the
morning, the fuel cell is used as backup protection and satisfies
the load demand, as shown in Fig. 6.
Case 3: Ppv = 0 or minimum, PFC− < Pload
Figure 7 illustrates that when the load demand is higher than the PV power,
the required demand is fulfilled using the battery, supercapacitor, and fuel
cell. When the generated power is larger than the load demand, the battery and SC
are charged, as shown in Fig. 8.
Case 4: Ppv = 0 or minimum, PBat− + PSC− + PFC− < Pload
Case 5: Ppv + PBat+ + PSC+ > Pload
6 Conclusion
This paper has presented different scenarios in the DC microgrid, wherein stochastic
output variations of the generation power due to the variation in irradiation can lead to
uncertainty in the microgrid. However, the combined Hybrid Energy Storage System
(HESS) such as a battery and supercapacitor can solve this problem and improve the
system’s stability and reliability. Therefore, to ensure the reliability, stability, and
robustness of the energy management strategy for residential applications consider
the time of use before applying it to the real simulation system.
References
1. Ahmed M, Datta M, Vahidnia R (2020) Stability and control aspects of microgrid architectures-a
comprehensive review. IEEE Access
2. Hatziargyriou N (2014) Microgrids: architectures and control. Wiley
3. Farrokhabadi M, Cañizares CA, Simpson-Porco JW, Nasr E, Fan L, Mendoza-Araya PA,
Tonkoski R, Tamrakar U, Hatziargyriou N, Lagos D et al. (2019) Microgrid stability definitions,
analysis, and examples. IEEE Trans Power Syst
4. Hama N, Weerawoot K, Siriroj S (2017) An evaluation of voltage variation and flicker severity
in micro grid. Int J Electr Eng Congress (IEECON)
5. Patel KR, Gadit J (2024) Power management and control of hybrid energy storage system in
a standalone DC microgrid. International multidisciplinary conference on emerging trends in
sustainable development (IMCETSC)
6. Sadhu RM, Patel KR, Gadit J (2024) Energy management using hybrid energy storage system
in DC microgrid: a review. International multidisciplinary conference on emerging trends in
sustainable development (IMCETSC)
7. Xu L, Chen D (2011) Control and operation of a dc microgrid with variable generation and
energy storage. IEEE Trans Power Del
8. Eghtedarpour N, Farjah E (2014) Distributed charge/discharge control of energy storages in a
renewable-energy-based dc micro-grid. IET Renew Power Gen
9. Hong J, Yin J, Liu Y, Peng J, Jiang H (2019) Energy management and control strategy of
photovoltaic/battery hybrid distributed power generation systems with an integrated three-port
power converter. IEEE Access
10. Cabrane Z, Kim J, Yoo K, Ouassaid M (2021) HESS-based photovoltaic/batteries/
supercapacitors: energy management strategy and DC bus voltage stabilization. J Solar Energy
11. Cabrane Z, Ouassaid M, Maaroufi M (2016) Analysis and evaluation of battery-supercapacitor
hybrid energy storage system for photovoltaic installation. Int J Hydrogen Energy
12. Zheng D, Wei D, Zhang W, Meng Z (2015) The study of supercapacitor’ transient power quality
improvement on Microgrid. IEEE Eindhoven PowerTech Eindhoven Netherlands
13. El-Shahat A, Sumaiya S (2019) DC-microgrid system design, control, and analysis. J
Electronics
14. Jing W, Lai CH, Wallace Wong SH, Dennis Wong ML (2017) Battery-supercapacitor hybrid
energy storage system in standalone DC microgrids: a review. Inst Eng Technol
15. Gaeed A (2022) Study of power management of standalone DC microgrids with battery
supercapacitor hybrid energy storage system. Int J Electr Comput Eng (IJECE)
16. Campagna N, Castiglia, Miceli R, Mastromauro RA Trapanese M, Viola F (2020) Battery
models for battery powered applications: a comparative study. J Energies
17. Nguyen XH, Nguyen MP (2015) Mathematical modelling of photovoltaic cell/module/arrays
with tags in MATLAB/Simulink. Environ Syst Resources
18. Chitransh A, Kumar S (2021) The different type of MPPT techniques for photovoltaic system.
Indian J Environ Eng (IJEE)
19. Kumar H, Kumar S (2018) Smart grid integration of solar energy system using MPPT with
incremental conductance and control analysis. International conference on power energy
environment and intelligent control (PEEIC)
20. Baghaee R, Mirsalim M, Gharehpetian GB, Talebi HA (2016) Reliability/cost based multi-
objective Pareto optimal design of stand-alone wind/PV/FC generation microgrid system. J
Energy
21. Jing W, Lai CH, Wallace Wong SH, Dennis Wong ML (2017) Battery-supercapacitor hybrid
energy storage system in standalone DC microgrids: a review. The Institution of Engineering
and Technology
23. Masaud TM, El-Saadany EF (2020) Correlating optimal size, cycle life estimation, and tech-
nology selection of batteries: a two-stage approach for microgrid applications. IEEE Trans
Sustain Energy
24. Cabrane Z, Kim J, Yoo K, Ouassaid M (2021) HESS-based photovoltaic/batteries/
supercapacitors: energy management strategy and DC bus voltage stabilization, J Solar Energy
25. Zhou T, Sun W (2014) Optimization of battery–supercapacitor hybrid energy storage station
in wind/solar generation system. IEEE Trans Sustain Energy
26. Gomez-Gonzalez M, Hernandez JC, Vera D, Jurado F (2020) Optimal sizing and power
schedule in PV household-prosumers for improving PV self-consumption and providing
frequency containment reserve. J Energy
Chapter 19
Prediction of CKD: A Performance
Analysis of Six Machine Learning
Algorithms
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025 245
H. Mittal et al. (eds.), Proceedings of International Conference on Paradigms of
Communication, Computing and Data Analytics, Algorithms for Intelligent Systems,
[Link]
246 P. V. Baviskar et al.
disease, underscore the urgency of prediction and early-stage diagnosis for timely
intervention [11–13]. CKD, recognized as a progressive and irreversible pathologic
syndrome, emphasizes the essentiality of proactive measures in prediction and diag-
nosis to ameliorate its progression [14]. Early detection and the management of risk
factors emerge as crucial strategies in mitigating the impact of CKD on global public
health.
Machine learning pertains to a computer program capable of calculating
and deducing information relevant to a given task, extracting characteristics
of corresponding patterns [11]. This technique has the potential to provide accurate
and cost-effective illness diagnoses, making it a promising method for identifying
chronic kidney disease (CKD). With the advancement of information technology,
machine learning has arisen as a new medical tool [12], finding extensive application
potential, particularly with the quick growth of electronic health records [13].
In health care, machine learning has already proven useful in identifying human
body status [14], assessing significant disease components [15, 16], and diagnosing a
range of disorders. The integration of machine learning into medical practices show-
cases its transformative impact on health care, facilitating enhanced diagnostic capa-
bilities and contributing to the broader landscape of medical advancements [17–19].
In this paper, we present a deep literature survey on machine learning models for
early prediction of CKD; we also analyze the performance of six algorithms
and provide a comparative analysis.
The rest of the paper is organized as follows: Sect. 2 provides a literature survey, and
Sect. 3 discusses our proposed approach for implementing the algorithm. Section 4
shows the results and performance of our proposed model. Section 5 concludes the
paper and provides future research directions.
2 Literature Survey
Predicting Chronic Kidney Disease (CKD) using machine learning is a broad and
evolving field, as demonstrated by various in-depth studies. In the research con-
ducted by Xiao et al. [20], a thorough examination of clinical and blood biochemical
measurements from 551 patients with proteinuria was undertaken. Multiple machine
learning models, including RF, XGB, LR, elastic net (ElasNet), lasso, ridge regres-
sion, KNN, Support Vector Machine (SVM), and Artificial Neural Network (ANN),
were compared for CKD risk prediction. The study revealed that ElasNet, lasso,
ridge, and LR demonstrated superior predictive performance, with LR ranking first,
achieving an AUC of 87.3%. In [21], researchers utilized various algorithms, includ-
ing SVM, AdaBoost, Linear Discriminant Analysis (LDA), and gradient boosting
(GBoost). Notably, the gradient boosting classifier emerged as the top performer,
achieving an impressive accuracy of 99.80%.
Similarly, a study [22] utilized a CKD dataset to create three different CKD
prediction models using logistic regression (LR), decision tree (DT), and K-nearest
neighbors (KNN) algorithms. LR achieved the highest accuracy at 97%, surpassing
DT at 96.25% and KNN at 71.25%. Another study [23] assessed Naïve Bayes (NB),
random forest (RF), and LR models for CKD risk prediction, with accuracies of 93.9,
98.88, and 94.76%, respectively. Studies [24, 25] used data from 455 patients and
real-time datasets to develop CKD risk prediction systems using RF and artificial
neural networks (ANN), achieving accuracies of 97.12 and 94.5%.
In addition, research [26] developed a machine learning model to predict CKD,
testing various classifiers including ANN, C5.0, LR, linear support vector machine
(LSVM), K-nearest neighbors (KNN), and RF. The LSVM algorithm showed the
highest accuracy at 98.86% when used with the Synthetic Minority Over-sampling
Technique (SMOTE) and all features included. The combination of SMOTE and
feature selection by LASSO regression provided better results compared to using
LASSO regression alone.
A study [27] implemented and evaluated nine different machine learning algo-
rithms, such as XGBoost, logistic regression, lasso regression, support vector
machine, random forest, ridge regression, neural network, Elastic Net, and KNN, for
CKD prediction, with linear models showing the highest accuracy. Another study
[28] used a CKD dataset from UCI with 400 instances and 25 attributes, applying
algorithms like Naïve Bayes and KNN, with KNN achieving the best accuracy. Simi-
larly, [29] employed classifiers including extra-trees (ET), AdaBoost, KNN, GBoost,
XGB, DT, Gaussian Naïve Bayes (NB), and RF, with KNN and ET showing the best
performance, achieving accuracies of 99 and 98%.
Additionally, a study [30] proposed an ANN-based regression analysis for man-
aging sparse medical datasets. The researchers introduced new variables to enhance
the radial basis function (RBF) input-doubling technique for output signal calcula-
tion. Another study [31] presented an innovative input-doubling method based on
the classical iterative RBF neural network, evaluated using a small medical dataset
and performance metrics like Mean Absolute Error and Root Mean Squared Error.
In a different study [32], an inventive data augmentation approach was used to
improve disease categorization based on generative adversarial networks (GAN).
Experiments on the NIH chest X-ray image dataset resulted in a test accuracy of
60.3% for the Convolutional Neural Network (CNN) model. The GAN-augmented
CNN model showed enhanced performance with a test accuracy of 65.3%. Another
study [33] introduced a supervised learning methodology to develop efficient models
for predicting CKD risk.
Researchers in [34] presented a method for data assertion and sample diagnosis in
CKD, using KNN for data assertion. Six classification algorithms, including logistic
regression, random forest, support vector machine, K-nearest neighbor, Naïve Bayes
classifier, and feed-forward neural network, were evaluated for diagnostic accuracy,
with random forest achieving the highest accuracy at 99.75%.
Researchers in [35] developed a neural network model to predict the risk of
Chronic Kidney Disease (CKD), achieving a 95% accuracy on a dataset of 40,000
instances. Another study [36] applied three models—K-nearest neighbor (KNN),
support vector machine (SVM), and soft independent modeling of class analogy
(SIMCA)—to a UCI dataset to calculate CKD risk. Both SVM and KNN models
reached an accuracy of 99.7%, with SVM showing strong performance even in the
presence of noise. Given the invasive and costly nature of CKD, early detection is
crucial to prevent progression to advanced stages without treatment. In [37], an SVM
machine learning classifier achieved an accuracy of 93%.
In [38], researchers proposed early CKD detection for diabetic patients using
machine learning classifiers, utilizing data from a diabetes research center in Chennai.
The Naive Bayes classifier achieved the highest accuracy at 91%. Another study
[39] explored the use of Decision Tree, Random Forest, and SVM, including SVM
with various functions, on the MIMIC-II database, finding that Random Forest and
Decision Tree yielded prediction accuracies of 80 and 87%, respectively.
The authors in [40] built a model with various machine learning classifiers, con-
cluding that the Multiclass Decision Forest algorithm was best suited for the CKD
dataset, achieving an accuracy of 99.1%. Researchers [41] utilized the SVM algo-
rithm for CKD prediction, employing feature selection through two approaches:
Wrapper and filter. SVM achieved the highest accuracy, reaching 98.5%. A few
researchers from [42] worked on a CKD dataset from UCI, preprocessing data, iden-
tifying missing data, filling it with zeros, and then applying algorithms for important
attributes. The K-Nearest Neighbor (KNN) classifier achieved the highest accuracy.
Similarly, the authors in [42, 43] used machine learning models for disease prediction
as well, achieving accuracies of 98 and 93%.
Table 1 provides a comparative analysis, summarizing prediction models, dataset
collections, and accuracies. It offers insights into the diverse approaches used in
CKD prediction, emphasizing the importance of tailored solutions based on dataset
characteristics and algorithmic strengths.
3 Proposed System
Our proposed system uses SVM as its core component, leveraging its ability to outperform other classifiers in CKD
prediction tasks. Our experimental results demonstrate that the SVM-based hybrid
model surpasses individual classifiers when trained on comprehensive feature sets
under similar conditions. Furthermore, existing literature primarily addresses this
challenge through advanced feature engineering and selection techniques. Naive
Bayes and Random Forest algorithms are recognized for their capability to discern
underlying data structures compared to alternative methods.
3.1 Dataset
The Chronic Kidney Disease dataset was used in this study. It was retrieved from
the Kaggle machine learning repository. The data was collected over a 2-month
period in India and included 25 features such as RBC count, WBC count, subject ID,
diabetes, hypertension, creatinine, urea, albuminuria, age, gender, GFR, and CKD
risk evaluation. The target is the 'classification' attribute, which is either 'ckd' or
'notckd', where CKD stands for chronic kidney disease. There are 400 rows.
3.2 Architecture
The raw dataset first undergoes cleaning of missing values and inconsistencies to ensure data integrity. Relevant features are carefully selected to
enhance the predictive power of the model. Numerical variables are scaled to prevent
bias due to differing scales, while categorical variables are encoded to enable their
use in algorithms. Class imbalance is addressed through various techniques such as
oversampling or undersampling. Finally, the dataset is split into training and testing
sets, preserving class distributions to accurately evaluate model performance. These
preprocessing steps are crucial in transforming the raw data into a suitable format,
ultimately improving the accuracy and reliability of the CKD prediction model.
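The preprocessing steps above can be sketched in plain NumPy; the toy array merely stands in for the CKD dataset, and the 80/20 split ratio is an assumption.

```python
import numpy as np

# Minimal preprocessing sketch: mean imputation, standard scaling,
# and a stratified train/test split. The toy data below is NOT the
# real Kaggle CKD data.

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
X[5, 1] = np.nan                   # a missing value to impute
y = np.array([1] * 25 + [0] * 15)  # imbalanced ckd / notckd labels

# 1) Imputation: replace NaNs with the column mean.
col_mean = np.nanmean(X, axis=0)
X = np.where(np.isnan(X), col_mean, X)

# 2) Scaling: zero mean, unit variance per feature.
X = (X - X.mean(axis=0)) / X.std(axis=0)

# 3) Stratified 80/20 split: sample each class separately so the
#    class distribution is preserved in both sets.
train_idx, test_idx = [], []
for cls in np.unique(y):
    idx = np.flatnonzero(y == cls)
    n_test = max(1, int(0.2 * idx.size))
    test_idx.extend(idx[:n_test])
    train_idx.extend(idx[n_test:])

print(len(train_idx), len(test_idx))  # -> 32 8
```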
J48 Algorithm: The J48 algorithm, derived from the C4.5 algorithm, is exten-
sively applied in both categorical and continuous data analysis across diverse fields.
Notably, it finds utility in interpreting clinical data for diagnosing coronary heart
disease, classifying E-governance data, and similar tasks.
RF Algorithm: Random Forest stands out as a popular supervised learning algo-
rithm adept at handling classification and regression problems. It uses ensemble
learning, which combines numerous classifiers to solve complex issues and improve
model performance. Random Forest consists of several decision trees built on dis-
tinct subsets of the dataset, and averaging their outputs improves the model’s forecast
accuracy.
The K-Nearest Neighbor Algorithm: The K-nearest neighbor (KNN) technique,
a nonparametric method, is frequently used in classification and regression tasks. It
works by evaluating the K-nearest training instances in the feature space. In KNN
classification, an object’s class membership is determined by a majority vote among
its K-nearest neighbors. When K = 1, the object is allocated to the class of its single
nearest neighbor.
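The majority-vote rule described above can be shown with a minimal from-scratch KNN sketch; the tiny two-cluster dataset is purely illustrative, not the CKD data.

```python
from collections import Counter
import math

# Minimal KNN classifier: majority vote among the K nearest
# training points by Euclidean distance.

def knn_predict(train_X, train_y, x, k=3):
    dists = sorted(
        (math.dist(p, x), label) for p, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

train_X = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
train_y = ["notckd", "notckd", "notckd", "ckd", "ckd", "ckd"]

print(knn_predict(train_X, train_y, (0.5, 0.5)))  # -> notckd
print(knn_predict(train_X, train_y, (5.5, 5.5)))  # -> ckd
```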
Naïve Bayes Algorithm: Leveraging Bayes’ Theorem, the Naïve Bayes algorithm is
a classification technique assuming independence among predictors. It computes the
probability of an object belonging to a specific class based on the presence of various
features, selecting the class with the highest probability. The algorithm presupposes
that the presence of a particular feature in a class is unrelated to the presence of any
other feature.
SVM Algorithm: Support Vector Machines (SVMs) are supervised learning models
employed for classification and regression analysis. They construct a model by scru-
tinizing a set of training examples, assigning new examples to different categories
based on their attributes. SVMs map examples as points in a space, endeavoring to
establish a distinct gap between various categories. Subsequently, new examples are
mapped into this space and forecasted to belong to a particular category based on
their position relative to the gap.
MLP Algorithm: A Multilayer Perceptron (MLP) represents a fully connected feed-
forward artificial neural network, comprising multiple layers of perceptrons inter-
connected to subsequent layers. Often termed ‘vanilla’ neural networks, particularly
with a single hidden layer, MLPs are widely employed for tasks like classification and
regression. They possess the capability to discern intricate patterns and relationships
within the data.
After implementing these algorithms, our proposed model evaluates their performance
using the evaluation parameters precision, recall, F1-score, and accuracy.
4 Results
In this section, we present the outcomes of the developed system for predicting
chronic kidney disease. The performance of each algorithm is evaluated using metrics
such as Accuracy, Precision (P), Recall (R), and F-measure. Precision, as defined in Eq.
(1), offers a measure of the correctness of positive predictions, while Recall (Eq. 2)
indicates the proportion of true positives accurately identified. Overall performance
is also assessed through the F-measure (Eq. 3):
Precision = TP/(TP + FP)   (1)

Recall = TP/(TP + FN)   (2)

F-Measure = (2 × Precision × Recall)/(Precision + Recall)   (3)
• True Positive (TP): the test is positive and the patient has CKD.
• False Positive (FP): the test is positive although the patient does not have CKD.
• True Negative (TN): the test is negative and the patient does not have CKD.
• False Negative (FN): the patient does have CKD, but the test came back negative.
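Equations (1)–(3) translate directly into code; the confusion-matrix counts below are illustrative, not the paper's results.

```python
# Hedged sketch of Eqs. (1)-(3): precision, recall, and F-measure
# from confusion-matrix counts.

def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def f_measure(p, r):
    return 2 * p * r / (p + r)

tp, fp, fn = 48, 1, 2  # illustrative counts
p, r = precision(tp, fp), recall(tp, fn)
print(round(p, 2), round(r, 2), round(f_measure(p, r), 2))  # -> 0.98 0.96 0.97
```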
The experimentation involves the utilization of the preprocessed dataset for con-
ducting tests, exploring and applying the aforementioned techniques. Table 2 shows
the comparative analysis and results obtained on the CKD dataset.
Table 2 shows the performance of the six machine learning algorithms. SVM
performed best, with the highest Precision (98%), Recall (96%), F-Measure (97%), and
Accuracy (97%). Random Forest also performed well, with Precision (97%), Recall
(93%), F-Measure (95%), and Accuracy (91%). While J48, Naive Bayes, MLP, and
KNN showed good results, they were slightly outperformed by SVM and Random
Forest, making these two the most reliable for CKD prediction.
The result obtained for all six algorithms on the evaluation parameters is shown
in Figs. 2 and 3 and ROC Curve is depicted in Fig. 4.
From the graphs, it is observed that the SVM classifier attained the highest accuracy
of 97% compared with the other algorithms, while Random Forest and MLP showed
an accuracy of 89%.
5 Conclusion
References
1. Chen Z, Zhang X, Zhang Z (2016) Clinical risk assessment of patients with chronic kidney
disease by using clinical data and multivariate models. Int Urol Nephrol 48:2069–2075. https://
[Link]/10.1007/s11255-016-1346-4
2. Charleonnan A et al (2016) Predictive analytics for chronic kidney disease using machine
learning techniques. In: 2016 Management and innovation technology international conference
(MITicon). IEEE
3. Subasi A, Alickovic E, Kevric J (2017) Diagnosis of chronic kidney disease by using ran-
dom forest. In: Badnjevic A (eds) CMBEBIH 2017. IFMBE Proceedings, vol 62. Springer,
Singapore. [Link]
4. Zhang L et al (2012) Prevalence of chronic kidney disease in China: a cross-sectional survey.
Lancet 379(9818):815–822
5. Chen Z, Zhang Z, Zhu R, Xiang Y, Harrington PB (2016) Diagnosis of patients with chronic
kidney disease by using two fuzzy classifiers. Chemometrics Intell Lab Syst 153:140–145
6. Subasi A, Alickovic E, Kevric J (2017) Diagnosis of chronic kidney disease by using random
forest. In: Proceedings of the international conference on medical and biological engineering,
pp 589–594
7. Zhang L (2012) Prevalence of chronic kidney disease in China: a cross-sectional survey. Lancet
379:815–822
8. Singh A, Nadkarni G, Gottesman O, Ellis SB, Bottinger EP, Guttag JV (2015) Incorporating
temporal EHR data in predictive models for risk stratification of renal function deterioration.
J Biomed Inform 53:220–228
9. Cueto-Manzano AM, Cortés-Sanabria L, Martínez-Ramírez HR, Rojas-Campos E, Gómez-
Navarro B, Castillero-Manzano M (2014) Prevalence of chronic kidney disease in an adult
population. Arch Med Res 45(6):507–513
10. Polat H, Mehr HD, Cetin A (2017) Diagnosis of chronic kidney disease based on support vector
machine by feature selection methods. J Med Syst 41(4):55
11. Barbieri C, Mari F, Stopper A, Gatti E, Escandell-Montero P, Martínez-Martínez JM, Martín-
Guerrero JD (2015) A new machine learning approach for predicting the response to anemia
treatment in a large cohort of end stage renal disease patients undergoing dialysis. Comput Biol
Med 61:56–61
12. Papademetriou V, Nylen ES, Doumas M, Probsteld J, Mann JF, Gilbert RE, Gerstein HC
(2017) Chronic kidney disease, basal insulin glargine, and health outcomes in people with
dysglycemia: the ORIGIN study. Am J Med 130(12):1465.e27–1465.e39
13. Hill NR (2016) Global prevalence of chronic kidney disease: a systematic review and meta-
analysis. PLoS One 11(7), Art. no. e0158765
14. Hossain MM, Detwiler RK, Chang EH, Caughey MC, Fisher MW, Nichols TC, Merricks
EP, Raymer RA, Whitford M, Bellinger DA, Wimsey LE, Gallippi CM (2019) Mechanical
anisotropy assessment in kidney cortex using ARFI peak displacement: preclinical validation
and pilot in vivo clinical results in kidney allografts. IEEE Trans Ultrason Ferroelectr Freq
Control 66(3):551–562
15. Alloghani M, Al-Jumeily D, Baker T, Hussain A, Mustana J, Aljaaf AJ (2018) Applications of
machine learning techniques for software engineering learning and early prediction of students’
performance. In: Proceedings of the international conference on soft computing in data science,
pp 246–258
16. Gupta D, Khare S, Aggarwal A (2016) A method to predict diagnostic codes for chronic
diseases using machine learning techniques. In: Proceedings of the international conference
on computing, communication and automation (ICCCA), pp 281–287
17. Du L, Xia C, Deng Z, Lu G, Xia S, Ma J (2018) A machine learning based approach to identify
protected health information in Chinese clinical text. Int J Med Inform 116:24–32
18. Abbas R, Hussain AJ, Al-Jumeily D, Baker T, Khattak A (2018) Classification of foetal distress
and hypoxia using machine learning approaches. In: Proceedings of the international conference
on intelligent computing, pp 767–776
19. Mahyoub M, Randles M, Baker T, Yang P (2018) Comparison analysis of machine learning
algorithms to rank Alzheimer’s disease risk factors by importance. In: Proceedings of the 11th
international conference on developments in eSystems engineering (DeSE), pp 1–11
20. Xiao J, Ding R, Xu X, Guan H, Feng X, Sun T, Zhu S, Ye Z (2019) Comparison and development
of machine learning tools in the prediction of chronic kidney disease progression. J Transl Med
17:119
21. Ghosh P, Shamrat FJM, Shultana S, Afrin S, Anjum AA, Khan AA (2020) Optimization of
prediction method of chronic kidney disease using machine learning algorithm. In: Proceedings
of the 2020 15th international joint symposium on artificial intelligence and natural language
processing (iSAI-NLP), Bangkok, Thailand, pp 1–6
22. Ifraz GM, Rashid MH, Tazin T, Bourouis S, Khan MM (2021) Comparative analysis for pre-
diction of kidney disease using intelligent machine learning methods. Comput Math Methods
Med 2021:6141470
23. CKD Prediction Dataset. Available online: [Link]
chronic-kidney-disease. Accessed on 27 June 2022
24. Islam MA, Akter S, Hossen MS, Keya SA, Tisha SA, Hossain S (2020) Risk factor prediction
of chronic kidney disease based on machine learning algorithms. In: Proceedings of the 2020
3rd international conference on intelligent sustainable systems (ICISS), Palladam, India, pp
952–957
25. Yashfi SY, Islam MA, Sakib N, Islam T, Shahbaaz M, Pantho SS (2020) Risk prediction of
chronic kidney disease using machine learning algorithms. In: Proceedings of the 2020 11th
international conference on computing, communication and networking technologies (ICC-
CNT), Kharagpur, India, pp 1–5
26. Chittora P, Chaurasia S, Chakrabarti P, Kumawat G, Chakrabarti T, Leonowicz Z, Jasiński M,
Jasiński Ł, Gono R, Jasińska E et al (2021) Prediction of chronic kidney disease-a machine
learning perspective. IEEE Access 9:17312–17334
27. Xiao J, Ding R, Xu X, Guan H, Feng X, Sun T, Zhu S, Ye Z (2019) Comparison and development
of machine learning tools in the prediction of chronic kidney disease progression. J Transl Med
17:119
28. Drall S, Drall GS, Singh S, Naib BB (2018) Chronic kidney disease prediction using machine
learning: a new approach. Int J Manage Technol Eng 8:278–287
29. Baidya D, Umaima U, Islam MN, Shamrat FJM, Pramanik A, Rahman MS (2022) A deep
prediction of chronic kidney disease by employing machine learning method. In: Proceedings
of the 2022 6th international conference on trends in electronics and informatics (ICOEI),
Tirunelveli, India, pp 1305–1310
30. Izonin I, Tkachenko R, Dronyuk I, Tkachenko P, Gregus M, Rashkevych M (2021) Predictive
modeling based on small data in clinical medicine: RBF-based additive input-doubling method.
Math Biosci Eng 18:2599–2613
31. Izonin I, Tkachenko R, Fedushko S, Koziy D, Zub K, Vovk O (2021) RBF-based input doubling
method for small medical data processing. In: Proceedings of the international conference on
artificial intelligence and logistics engineering, Kyiv, Ukraine. Springer, Berlin/Heidelberg,
Germany, pp 23–31
32. Bhattacharya D, Banerjee S, Bhattacharya S, Uma Shankar B, Mitra S (2020) GAN-based
novel approach for data augmentation with improved disease classification. In: Advancement
of machine intelligence in interactive medical image analysis. Springer, Berlin/Heidelberg,
Germany, pp 229–239
33. Dritsas E, Trigka M (2022) Machine learning techniques for chronic kidney disease risk pre-
diction. Big Data Cogn Comput 6:98. [Link]
34. Qin J, Chen L, Liu Y, Liu C, Feng C, Chen B (2020) A machine learning methodology for
diagnosing chronic kidney disease. IEEE Access 8:20991–21002
35. Vasquez-Morales GR, Martinez-Monterrubio SM, Moreno-Ger P, Recio-Garcia JA (2019)
Explainable prediction of chronic renal disease in the Colombian population using neural
networks and case-based reasoning. IEEE Access 7:152900–152910
36. Chen Z, Zhang X, Zhang Z (2016) Clinical risk assessment of patients with chronic kidney
disease by using clinical data and multivariate models. Int Urol Nephrol 48(12):2069–2075
37. Amirgaliyev Y, Shamiluulu S, Serek A (2018) Analysis of chronic kidney disease dataset by
applying machine learning methods. In: Proceedings IEEE 12th international conference on
application of information and communication technologies (AICT), pp 1–4
38. Padmanaban KRA, Parthiban G (2016) Applying machine learning techniques for predicting
the risk of chronic kidney disease. Indian J Sci Technol 9(29)
39. Kilvia De Almeida L, Lessa L, Peixoto A, Gomes R, Celestino J (2020) Kidney failure detec-
tion using machine learning techniques. In: Proceedings of the 8th international workshop on
advances in ICT infrastructures and services, pp 1–8
Tamanna, Subarna Rana, Vaibhav Kant Agrawal, and Manoj Kumar Dasi
1 Introduction
258 Tamanna et al.
2 Literature Survey
Generative models have been around since the 1950s, starting with models like
HMMs [5] and GMMs [6] that produced sequential data. However, their capabilities
reached new heights with the rise of deep learning, leading to substantial improve-
ments in both performance and versatility. GenAI has gained tremendous interest
in various fields other than Computer Science for generating specific content such
as text, audio, and images [7]. The Transformer neural architecture, based on the
encoder-decoder method is the core strength of GenAI state-of-the-art models such
as GPT-2 [8], DALL-E-2 [9], and Gopher [10]. It has resolved the limitations of tradi-
tional neural networks such as Recurrent Neural Networks (RNN) using attention
mechanisms [11]. Transformer-based pre-trained language models (PTLMs) exhibit
superior performance in terms of relevant content generation. Highly efficient GPUs
such as NVIDIA A100, distributed training, and cloud computing have unleashed
immense potential for GenAI applications. The current PTLMs can be categorized
as shown in Table 1.
The model used in this study is GPT-3.5, an autoregressive language model well
suited to text generation based on user instructions. It works like a word predictor:
it looks at the words that have already been written and uses that context to guess
what the next word should be. This makes it effective for creating text, as it can
generate output one word at a time.
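As a toy illustration of this word-by-word prediction, the sketch below uses a simple bigram counter standing in for the billions of learned parameters of GPT-3.5; the corpus and function names are purely illustrative.

```python
from collections import Counter, defaultdict

# Toy corpus standing in for training data (illustrative only).
corpus = ("the patient takes the medicine daily and "
          "the patient feels better and the medicine works").split()

# Count bigrams: for each word, how often each successor follows it.
successors = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    successors[prev][nxt] += 1

def predict_next(word):
    """Greedily pick the most frequent successor of `word`."""
    if word not in successors:
        return None
    return successors[word].most_common(1)[0][0]

def generate(start, n_words):
    """Generate text one word at a time, like an autoregressive LM."""
    out = [start]
    for _ in range(n_words):
        nxt = predict_next(out[-1])
        if nxt is None:
            break
        out.append(nxt)
    return " ".join(out)
```

A real LLM conditions on the entire preceding context and samples from a probability distribution rather than picking greedily, but the loop structure is the same.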
3 Case Study
The Craft-Persona-Rx app is built using a RAG-based LLM. In this app we use the
OpenAI model GPT-3.5 [14], also known as a “writer’s best friend” because of its
capability to generate high-quality content. RAG brings dynamicity to the static
parametric knowledge of LLMs and also helps provide the correct context for
generating application-specific content.
the brand of medicine, target audience, persona of target audience, and theme of
content (Table 2).
Now we are ready with a cleaned and mapped database containing all the attributes.
For this case study we created dummy data to avoid any conflict of interest; as this
field is new, we could not find any publicly available data for it. For fine-tuning
the LLM we used live examples of medicine brands that are publicly available on
their websites and created this Database (DB). Our DB consists of 120 rows (thirty
rows for every brand) and five columns (Table 2).
This DB consists of the following headers: Target Audience, Brand Content, Brand,
Persona and Theme. We have taken four fictitious brands:
• Migraineaid (for migraine)
• Glucoreg (for diabetes)
• Mobilium (for arthritis), and
• Memora (for Alzheimer’s disease).
We have two types of target audiences: patients and doctors. As the two have different
mindsets, the marketing content also differs for each of them. Next comes persona,
meaning personality, a further categorization for understanding the behavior of
target audiences. In this DB we have four personas, two for doctors and two for
patients: Conservative (doctors who stick to proven methods only) and Pathfinder
(doctors who search for new methods of curing patients), while Naysayer (difficult
to please) and Optimistic (open to experimenting with new things) apply to patients.
Themes are the identified topics on which we want to create our content, such as
safety-oriented or cost-effective. These themes help create variations in the
generated content with a touch of personalization. Based on the headers discussed
above, we have brand content that satisfies each categorization.
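The DB’s category structure can be sketched as follows. The brand, audience, and persona names come from the description above; the two theme names are examples only, and the real DB holds thirty curated content rows per brand, whereas this sketch enumerates one placeholder row per category combination.

```python
from itertools import product

brands = ["Migraineaid", "Glucoreg", "Mobilium", "Memora"]
personas = {"Doctor": ["Conservative", "Pathfinder"],
            "Patient": ["Naysayer", "Optimistic"]}
themes = ["Safety-oriented", "Cost-effective"]  # example themes (assumed)

# Build rows carrying the five columns described above.
rows = []
for brand, (audience, plist), theme in product(
        brands, personas.items(), themes):
    for persona in plist:
        rows.append({
            "Target Audience": audience,
            "Brand": brand,
            "Persona": persona,
            "Theme": theme,
            # Placeholder; the real DB stores curated marketing copy here.
            "Brand Content": f"<content for {brand}/{persona}/{theme}>",
        })
```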
Step 3
Now the user (a healthcare marketer) who wants to create marketing content (banners
or emails) comes up with specific needs such as content type, brand, persona and
target audience.
Step 4 and 5
After taking the desired information from the user, the orchestrator fetches the
respective data (Brand Content) from the DB.
Step 6
We have two prompt [15] channels for content generation: one for emails and one for
banners. The respective prompt is selected according to user preference. Creating
and optimizing a prompt to match application specifications is not a one-time
process, as it requires multiple iterations and experimentation (e.g., zero-shot,
one-shot and multi-shot prompting). For this case study we used one-shot prompting
for prompt optimization, as it suits our dataset well.
Step 7
After prompt selection and gathering the desired set of information we are ready
with the final prompt.
Step 8
This final prompt is fed to LLM for generating the content as per specifications. In
this case study we restrict our prompt to create text content only and not images or
any other media element.
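Steps 4–8 can be sketched as a small orchestrator. The channel templates, function names, and the `call_llm` stand-in below are illustrative assumptions, not the app’s actual implementation.

```python
# Illustrative prompt channels (Step 6): one per content type.
PROMPT_CHANNELS = {
    "email": "Write a marketing email for {brand} using: {brand_content}",
    "banner": "Write banner copy for {brand} using: {brand_content}",
}

def fetch_brand_content(db, request):
    """Steps 4-5: fetch the matching Brand Content row from the DB."""
    for row in db:
        if (row["Brand"] == request["brand"]
                and row["Persona"] == request["persona"]
                and row["Target Audience"] == request["audience"]):
            return row["Brand Content"]
    raise LookupError("no matching row")

def build_final_prompt(db, request):
    """Steps 6-7: select the channel prompt and fill it in."""
    template = PROMPT_CHANNELS[request["content_type"]]
    return template.format(brand=request["brand"],
                           brand_content=fetch_brand_content(db, request))

# Step 8 would feed the result to the LLM, e.g.:
# response = call_llm(build_final_prompt(db, request))
```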
Step 9
After getting the response from the LLM, the next step is validation. Due to the
sensitive nature of marketing for healthcare brands, there are several key areas
where LLMs must be excluded from responding, such as direct diagnosis or treatment
recommendations; emotional appeals, fear tactics, or discriminatory language; and
false or misleading information. To avoid all these types of content, validating
the LLM response becomes an essential step. We have a list of exclusions to catch
this kind of generated data and also use guardrails in the prompt itself.
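The exclusion check can be sketched as a simple phrase filter; the phrases listed below are illustrative examples, not the actual curated exclusion list.

```python
# Illustrative exclusion phrases; the real list would be curated by experts.
EXCLUSIONS = [
    "we diagnose", "guaranteed cure", "you will die",
    "miracle treatment",
]

def violates_exclusions(text, exclusions=EXCLUSIONS):
    """Return the first excluded phrase found in `text`, else None."""
    lowered = text.lower()
    for phrase in exclusions:
        if phrase in lowered:
            return phrase
    return None
```

A production system would combine such filters with prompt-level guardrails, as described above.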
To assess the quality of the generated text, we used a combination of evaluation
metrics called the READ SCORE. A single performance metric cannot evaluate output
quality on its own, which is why we use a combined score covering different dimensions
of evaluation (e.g., the BERT Score assesses the goodness of the generated text,
while ARI assesses its readability complexity). The READ SCORE of the generated
text is a combination of the BERT Score [14], the Automated Readability Index [16],
and Linsear Write [17]:
READ SCORE = (Automated Readability Index / 2.8 + BERT Score × 10 + Linsear Write) / 3
The scores lie on different scales (e.g., BERT Score ranges from −1 to 1, while ARI
ranges from 1 to 28). To bring them onto the same scale, we normalize each into the
range 0 to 10. To maintain the quality of the generated text, we set a threshold of 5:
if the READ SCORE of the LLM output is below 5, the output is rejected and generated
again. If the LLM output passes all the validation checks, it is shown to the user
as the final response; otherwise it is fed back to the LLM until it passes all the
checks.
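The scoring and threshold logic can be sketched as follows; the normalization constants are inferred from the scales stated above and are an assumption, not the paper’s exact implementation.

```python
def read_score(ari, bert_score, linsear_write):
    """Combine the three metrics into one score.

    Normalization follows the scales stated in the text (ARI roughly
    1-28, BERT Score -1 to 1); the exact constants are an assumption.
    """
    ari_norm = ari / 2.8         # maps ~1-28 onto roughly 0-10
    bert_norm = bert_score * 10  # scales the BERT Score up
    return (ari_norm + bert_norm + linsear_write) / 3

def accept(score, threshold=5.0):
    """Outputs scoring below the threshold are rejected and regenerated."""
    return score >= threshold
```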
In this section we explained how GenAI-based solutions help shorten the marketing
life cycle, with less human dependency and greater resource efficiency in terms of
time and cost, while adding more layers of personalization.
Layer Based Personalization
As explained earlier, our main goal is to give users the flexibility to create
personalized content with minimal information. Craft-Persona-Rx provides personal-
ization layers that enable users to generate content with any variation they want.
We have designed a user interface (UI) for healthcare marketers through which they
can select various personalization options to generate content (Fig. 2).
If we want to generate banners for patients with an ‘optimistic’ profile and the
Migraineaid brand, these options can be selected in the UI and the output generated
by clicking the Generate Content button (Fig. 3).
The options selected by the user are embedded into a prompt via the DB, which is
then processed by the LLM. The prompt plays a vital role in content generation: it
is the set of instructions given to the LLM to generate the content. A sample prompt
is shown below in italics:
You are an expert marketing content generator in the field of Healthcare marketing.
Your task is to help the USER generate {WHAT TO GENERATE} for the brand
{BRAND}. The marketing content is to be generated for {TARGET AUDIENCE} with
a personality of {PERSONA} and content should highlight the {THEME} aspects of
the brand.
Follow the below mentioned rules strictly while generating the content:
Do not use derogatory language.
Use polite and respectful language.
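Filling the template’s placeholders from the UI selections can be sketched as follows; the theme value is an assumed example.

```python
# The sample prompt above, with format-string placeholders.
PROMPT_TEMPLATE = (
    "You are an expert marketing content generator in the field of "
    "Healthcare marketing. Your task is to help the USER generate "
    "{what_to_generate} for the brand {brand}. The marketing content is "
    "to be generated for {target_audience} with a personality of "
    "{persona} and content should highlight the {theme} aspects of the "
    "brand.\n"
    "Follow the below mentioned rules strictly while generating the "
    "content:\n"
    "Do not use derogatory language.\n"
    "Use polite and respectful language."
)

# UI selections from the Fig. 3 example: optimistic patients, Migraineaid.
prompt = PROMPT_TEMPLATE.format(
    what_to_generate="banners",
    brand="Migraineaid",
    target_audience="Patients",
    persona="Optimistic",
    theme="Safety-oriented",  # example theme (assumed)
)
```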
Fig. 2 Craft-Persona-Rx UI
References
1. Haeba Ramli A, Sjahruddin H (2015) Building patient loyalty in healthcare services. Int Rev
Manag Bus Res 4:391–401
2. Lee D, Yoon SN (2021) Application of artificial intelligence-based technologies in the health-
care industry: opportunities and challenges. Int J Environ Res Public Health 18:271. https://
[Link]/10.3390/ijerph18010271
3. Ali O, Abdelbaki W, Shrestha A, Elbasi E, Alryalat MAA, Dwivedi YK (2023) A system-
atic literature review of artificial intelligence in the healthcare sector: Benefits, challenges,
20 Unlocking the Power of Personalized Content with Generative AI … 267
1 Introduction
270 N. Bansal et al.
Today Large Language Models (LLMs) have emerged as a leading technology in the
field of artificial intelligence. Their adeptness in understanding and generating human
language, along with their strong capabilities in generalization and reasoning, has
greatly enhanced their performance. Additionally, their ability to adapt to new tasks
and domains highlights their versatility. Consequently, there is a growing interest in
utilizing LLMs to transform recommender systems, with the goal of providing users
with personalized and high-quality recommendations. Given the rapid progress in
this area, this paper examines the current landscape of LLM-powered recommender
systems in various domains.
Knowledge graphs, with nodes as entities and edges as relations, enhance recom-
mender performance as side information and act as the common format of knowledge
bases. LLMs have demonstrated remarkable proficiency in retrieving factual knowl-
edge, akin to explicit knowledge bases [29–37]. This ability opens up avenues for
constructing more comprehensive knowledge graphs within recommender systems.
Existing research [29] underscores how LLMs excel in storing not only factual infor-
mation but also common-sense knowledge, which can then be effectively applied to
downstream tasks. However, despite the promise offered by LLMs, existing methods
[38, 39] in knowledge graph construction face challenges in handling incomplete
knowledge graphs and integrating textual corpus data. Researchers [40, 41] have
commenced exploring how LLMs can address these challenges, particularly through
knowledge completion and construction tasks. In the realm of knowledge graph
completion, efforts are directed towards leveraging LLM models such as MTL-KGC
[42], MEMKGC [43], StAR [44], GenKGC [45], TagReal [46], and AutoKG [47]
to encode text or generate missing facts within knowledge graphs. This involves
enhancing the completeness of knowledge graphs by inferring and adding missing
information. On the other hand, knowledge graph construction involves the struc-
tured representation of knowledge, including entity discovery [48, 49], coreference
resolution [50, 51], and relation extraction [47, 52]. LLMs offer potential solutions
for each of these subtasks, allowing for more accurate and comprehensive knowledge
graph construction. Moreover, LLMs show promise in enabling end-to-end construc-
tion [53, 54] wherein they directly build knowledge graphs from raw text data. This
holistic approach streamlines the process of knowledge graph construction, elimi-
nating the need for intermediate steps and enhancing efficiency. However, owing to
their inherent nature, LLMs may introduce ambiguity or inaccurate information, which
can manifest as extraneous information or noise in the recommendation process [42],
leading to responses that lack informative context or relevance despite being
syntactically correct.
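As a simplified sketch of the knowledge graph completion idea, a missing fact can be phrased as a cloze-style query for an LLM. The triples and template below are illustrative; models such as MTL-KGC or GenKGC use learned text encodings rather than this plain string format.

```python
def completion_prompt(head, relation, known_triples):
    """Build a cloze-style prompt asking an LLM for a missing tail entity."""
    context = "\n".join(f"({h}, {r}, {t})" for h, r, t in known_triples)
    return (f"Known facts:\n{context}\n"
            f"Complete the triple: ({head}, {relation}, ?)")

# Illustrative knowledge-graph fragment (assumed entities and relations).
triples = [("aspirin", "treats", "headache"),
           ("aspirin", "is_a", "drug")]
prompt = completion_prompt("ibuprofen", "treats", triples)
# The LLM's answer would then be parsed back into a new triple.
```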
The advancement of large language models (LLMs) has revealed their remarkable
reasoning abilities when sufficiently scaled, akin to human intelligence in decision-
making and problem-solving [55]. Through techniques like “chain of thoughts”
prompting, LLMs can exhibit emergent reasoning skills, enabling them to draw
conclusions based on evidence or logic [56]. In the realm of recommender systems,
these reasoning capabilities empower LLMs to enhance user interest mining, thereby
improving overall performance. Additionally, LLMs demonstrate “step by step”
reasoning, leveraging prompts that include intermediate steps to tackle complex
tasks effectively [56]. For instance, Wang and Lim [57] introduce NIR, a three-
step prompt designed to capture user preferences, filter items, and re-rank recom-
mendations. Moreover, Automated Machine Learning (AutoML) is increasingly
utilized in recommender systems to streamline manual setup processes, particularly
in optimizing embedding sizes [43–46] and other facets such as feature selection and
model architecture. However, the following challenges need to be addressed effectively
by LLMs before automated learning approaches can deliver significantly better
recommendation algorithms and systems:
• Complex Search Space: The search space within recommender systems is notably
complex, encompassing diverse types and facing volume issues, making effective
exploration and optimization challenging.
• Lack of Foundation: Unlike other domains with well-established network struc-
tures, recommender systems lack a strong foundation of knowledge about the
informative components within the search space, particularly regarding effective
high-order feature interactions. Further, this knowledge gap is compounded by
the diverse and domain-specific nature of recommender systems, which operate
across various scenarios.
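The three-step NIR prompting of Wang and Lim [57] mentioned above (capture user preferences, filter candidate items, re-rank) can be sketched as a pipeline of chained prompts; `ask_llm` is a stand-in for an actual model API.

```python
def nir_pipeline(history, candidates, ask_llm):
    """Three-step prompting: preferences -> filter -> re-rank.

    `ask_llm` is any callable taking a prompt string and returning text.
    """
    # Step 1: summarize the user's preferences from interaction history.
    prefs = ask_llm(
        f"Given the items {history}, summarize the user's preferences.")
    # Step 2: filter the candidate set down to the most relevant items.
    shortlist = ask_llm(
        f"User preferences: {prefs}. From {candidates}, "
        "select the most relevant items.")
    # Step 3: re-rank the shortlisted items.
    ranking = ask_llm(
        f"Re-rank {shortlist} from most to least relevant "
        f"for a user with preferences: {prefs}.")
    return ranking
```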
presenting users with more personalized and tailored choices. For instance, users may
actively engage in the decision-making process by inputting both textual descriptions
and visual preferences into the system, enhancing the granularity and specificity of
recommendations.
3 Future Outlook
4 Conclusion
Personalization in large language models (LLMs) for user services poses several
critical challenges that need to be addressed. Firstly, achieving effective person-
alization requires a deep understanding of user preferences, which often involves
domain-specific knowledge beyond the general knowledge acquired by LLMs
through training. This suggests that adapting LLMs to effectively cater to personal-
ized services remains a significant unresolved issue. Moreover, there are concerns
regarding privacy when using LLMs for personalization. Since LLMs have the
capacity to memorize users’ confidential information to provide personalized
services, there’s a legitimate worry about safeguarding user privacy. Ensuring that
LLMs maintain privacy while delivering personalized experiences is crucial for
building trust with users. Additionally, LLMs trained on internet data are suscep-
tible to exposure bias, leading to potentially unfair predictions for minority groups.
This highlights the importance of mitigating biases in LLMs to ensure fair and
equitable outcomes for all users. To tackle these challenges, the research commu-
nity needs comprehensive benchmarks and evaluation datasets. However, the current
availability of such resources is limited, indicating a gap that needs to be addressed
through collaborative efforts within the research community. Furthermore, to fully
harness the potential of LLMs for personalization, it’s essential to establish systematic
methodological and experimental frameworks. These frameworks should encom-
pass various perspectives, including understanding user preferences, addressing
privacy concerns, mitigating biases, and evaluating model performance accurately.
In summary, addressing the challenges associated with personalization using LLMs
requires a multidimensional approach involving domain-specific knowledge, privacy
protection measures, bias mitigation strategies, and the development of robust evalu-
ation frameworks. Collaboration within the research community is key to advancing
research in this area and realizing the full potential of personalized services powered
by LLMs.
References
1. Gao Y, Sheng T, Xiang Y, Xiong Y, Wang H, Zhang J (2023) Chat-rec: towards interactive and
explainable llms-augmented recommender system. arXiv preprint arXiv:2303.14524
2. Chen J, Ma L, Li X, Thakurdesai N, Xu J, Cho JH, ... Achan K (2023) Knowledge graph
completion models are few-shot learners: An empirical study of relation labeling in e-commerce
with llms. arXiv preprint arXiv:2305.09858
3. Chen X, Fan W, Chen J, Liu H, Liu Z, Zhang Z, Li Q (2023) Fairly adaptive negative sampling
for recommendations. In: Proceedings of the ACM web conference 2023, pp 3723–3733
4. Fan W, Zhao X, Chen X, Su J, Gao J, Wang L, ... Li Q (2022) A comprehensive survey on
trustworthy recommender systems. arXiv preprint arXiv:2209.10117
5. Zhang S, Yao L, Sun A, Tay Y (2019) Deep learning based recommender system: a survey and
new perspectives. ACM Comput Surv (CSUR) 52(1):1–38
6. Fan W, Liu C, Liu Y, Li J, Li H, Liu H, ... Li Q (2023) Generative diffusion models on graphs:
methods and applications. arXiv preprint arXiv:2302.02591
21 Web Personalization with Large Language Models: Challenges … 279
30. Roberts A, Raffel C, Shazeer N (2020) How much knowledge can you pack into the parameters
of a language model? arXiv preprint arXiv:2002.08910
31. Petroni F, Lewis P, Piktus A, Rocktäschel T, Wu Y, Miller AH, Riedel S (2020) How context
affects language models’ factual predictions. In: Automated knowledge base construction
32. Jiang Z, Xu FF, Araki J, Neubig G (2020) How can we know what language models know?
Trans Assoc Comput Linguist 8:423–438
33. Wang C, Liu X, Song D (2020) Language models are open knowledge graphs. arXiv preprint
arXiv:2010.11967
34. Poerner N, Waltinger U, Schütze H (2020) E-bert: efficient-yet-effective entity embeddings for
bert. In: Findings of the association for computational linguistics: EMNLP 2020, pp 803–818
35. Heinzerling B, Inui K (2021) Language models as knowledge bases: On entity representa-
tions, storage capacity, and paraphrased queries. In: Proceedings of the 16th conference of the
european chapter of the association for computational linguistics: Main Volume (pp 1772–1791)
36. Wang C, Liu P, Zhang Y (2021) Can generative pre-trained language models serve as knowl-
edge bases for closed-book qa? In: Proceedings of the 59th annual meeting of the association
for computational linguistics and the 11th international joint conference on natural language
processing (Volume 1: Long Papers) (pp 3241–3251)
37. Guu K, Lee K, Tung Z, Pasupat P, Chang M (2020) Retrieval augmented language model
pre-training. In: the International conference on machine learning, pp 3929–3938
38. Bordes A, Usunier N, Garcia-Duran A, Weston J, Yakhnenko O (2013) Translating embeddings
for modeling multi relational data. Advances in neural information processing systems, 26
39. Zhu Y, Wang X, Chen J, Qiao S, Ou Y, Yao Y, ... Zhang N (2023) Llms for knowledge graph
construction and reasoning: Recent capabilities and future opportunities. arXiv preprint arXiv:
2305.13168
40. Zhang Z, Liu X, Zhang Y, Su Q, Sun X, He B (2020) Pretrainkge: Learning knowledge repre-
sentation from pretrained language models. In: Findings of the association for computational
linguistics: EMNLP 2020 (pp 259–266)
41. Kumar A, Pandey A, Gadia R, Mishra M (2020) Building a knowledge graph using a pre-
trained language model for learning entity-aware relationships. In: 2020 IEEE international
conference on computing, power and communication technologies (GUCON) (pp 310–315).
IEEE
42. Razniewski S, Yates A, Kassner N, Weikum G (2021) Language models as or for knowledge
bases. arXiv preprint arXiv:2110.04888
43. Liu S, Gao C, Chen Y, Jin D, Li Y (2021) Learnable embedding sizes for recommender systems.
arXiv preprint arXiv:2101.07577
44. Liu H, Zhao X, Wang C, Liu X, Tang J (2020) Automated embedding size search in deep
recommender systems. In: Proceedings of the 43rd International ACM SIGIR conference on
research and development in information retrieval (pp 2307–2316)
45. Deng W, Pan J, Zhou T, Kong D, Flores A, Lin G (2021). Deeplight: deep lightweight feature
interactions for accelerating ctr predictions in ad serving. In: Proceedings of the 14th ACM
international conference on Web search and data mining (pp 922–930)
46. Ginart AA, Naumov M, Mudigere D, Yang J, Zou J (2021) Mixed dimension embeddings
with application to memory efficient recommendation systems. In: 2021 IEEE International
symposium on information theory (ISIT) (pp 2786–2791). IEEE
47. Wang H, Focke C, Sylvester R, Mishra N, Wang W (2019) Fine-tune bert for docred with
two-step process. arXiv preprint arXiv:1909.11898
48. Yan H, Gui T, Dai J, Guo Q, Zhang Z, Qiu X (2021) A unified generative framework for various
ner subtasks. In: Proceedings of the 59th annual meeting of the association for computational
linguistics and the 11th international joint conference on natural language processing (Volume
1: Long Papers) (pp 5808–5822)
49. Li B, Yin W, Chen M (2022) Ultra-fine entity typing with indirect supervision from natural
language inference. Trans Assoc Comput Linguist 10:607–622
50. Kirstain Y, Ram O, Levy O (2021) Coreference resolution without span representations. In:
Proceedings of the 59th annual meeting of the association for computational linguistics and the
11th international joint conference on natural language processing (Volume 2: Short Papers)
(pp 14–19)
51. Cattan A, Eirew A, Stanovsky G, Joshi M, Dagan I (2021) Cross-document coreference reso-
lution over predicted mentions. In: Findings of the association for computational linguistics:
ACL-IJCNLP 2021 (pp 5100–5107)
52. Lyu S, Chen H (2021) Relation classification with entity type restriction. In: Findings of the
association for computational linguistics: ACL-IJCNLP 2021 (pp 390–395)
53. Han J, Collier N, Buntine W, Shareghi E (2023) Pive: prompting with iterative verification
improving graph-based generative capability of llms. arXiv preprint arXiv:2305.12392
54. Trajanoska M, Stojanov R, Trajanov D (2023) Enhancing knowledge graph construction using
large language models. arXiv preprint arXiv:2305.04676
55. Wei J, Tay Y, Bommasani R, Raffel C, Zoph B, Borgeaud S, ... Zhou D (2022) Emergent
abilities of large language models. arXiv preprint arXiv:2206.07682
56. Wei J, Wang X, Schuurmans D, Bosma M, Chi E, Le Q, Zhou D (2022) Chain of thought
prompting elicits reasoning in large language models. arXiv preprint arXiv:2201.11903
57. Wang L, Lim E-P (2023) Zero-shot next-item recommendation using large pretrained language
models. arXiv preprint arXiv:2304.03153
58. Wen T-H, Vandyke D, Mrksic N, Gasic M, Rojas-Barahona LM, Su P-H, Ultes S, Young S
(2016) A network-based end-to-end trainable task-oriented dialogue system. arXiv preprint
arXiv:1604.04562
59. Zhang Y, Sun S, Galley M, Chen Y-C, Brockett C, Gao X, ... Dolan B (2019) Dialogpt: Large-
scale generative pre training for conversational response generation. arXiv preprint arXiv:1911.
00536
60. Yao K, Zweig G, Hwang M-Y, Shi Y, Yu D (2013) Recurrent neural networks for language
understanding. In Interspeech (pp 2524–2528)
61. Mesnil G, He X, Deng L, Bengio Y (2013) Investigation of recurrent-neural-network
architectures and learning methods for spoken language understanding. In: Interspeech (pp
3771–3775)
62. Mrkšić N, Séaghdha DO, Wen T-H, Thomson B, Young S (2016) Neural belief tracker: Data-
driven dialogue state tracking. arXiv preprint arXiv:1606.03777
63. Cuayáhuitl H, Keizer S, Lemon O (2015) Strategic dialogue management via deep reinforce-
ment learning. arXiv preprint arXiv:1511.08099
64. Zhou H, Huang M, Zhu X (2016) Context-aware natural language generation for spoken
dialogue systems. In: Proceedings of COLING 2016, the 26th international conference on
computational linguistics: technical papers (pp 2032–2041)
65. Dušek O, Jurčíček F (2016) Sequence-to-sequence generation for spoken dialogue via deep
syntax trees and strings. arXiv preprint arXiv:1606.05491
66. Zhang X, Zou Y, Zhang H, Zhou J, Diao S, Chen J, ... Xiao Y et al. (2022) Automatic product
copywriting for e-commerce. Proc AAAI Conf Artif Intell 36(11):12 423–12 431
67. Lei Z, Zhang C, Xu X, Wu W, Niu Z-Y, Wu H, ... Li S (2022) Plato-ad: a unified advertise-
ment text generation framework with multi-task prompt learning. In: Proceedings of the 2022
conference on empirical methods in natural language processing: industry track (pp 512–520).
68. Thomaidou S, Lourentzou I, Katsivelis-Perakis P, Vazirgiannis M (2013) Automated snippet
generation for online advertising. In: Proceedings of the 22nd ACM international conference
on information & knowledge management (pp 1841–1844)
69. Bartz K, Barr C, Aijaz A (2008) Natural language generation for sponsored-search advertise-
ments. In: Proceedings of the 9th ACM conference on electronic commerce (pp 1–9)
70. Fujita A, Ikushima K, Sato S, Kamite R, Ishiyama K, Tamachi O (2010) Automatic generation
of listing ads by reusing promotional texts. In: Proceedings of the 12th international conference
on electronic commerce: roadmap for the future of electronic business (pp 179–188)
71. Hughes JW, Chang K-H, Zhang R (2019) Generating better search engine text advertisements
with deep reinforcement learning. In: Proceedings of the 25th ACM SIGKDD international
conference on knowledge discovery & data mining (pp 2269–2277)
72. Wang X, Gu X, Cao J, Zhao Z, Yan Y, Middha B, Xie X (2021) Reinforcing pretrained
models for generating attractive text advertisements. In: Proceedings of the 27th ACM SIGKDD
conference on knowledge discovery & data mining (pp 3697–3707)
73. Chen C, Wang X, Yi X, Wu F, Xie X, Yan R (2019) Personalized chit-chat generation for
recommendation using external chat corpora. In: Proceedings of the 28th ACM SIGKDD
conference on knowledge discovery and data mining (pp 2721–2731)
74. Kanungo YS, Negi S, Rajan A (2021) Ad headline generation using a self-critical masked
language model. In: Proceedings of the 2021 conference of the north American chapter of the
association for computational linguistics: human language technologies: industry papers (pp
263–271)
75. Wei P, Yang X, Liu S, Wang L, Zheng B (2022) Creater: Ctrdriven advertising text generation
with controlled pre-training and contrastive fine-tuning. arXiv preprint arXiv:2205.08943
76. Kanungo YS, Das G, Negi S (2022) Cobart: Controlled, optimized, bidirectional and auto-
regressive transformer for ad headline generation. In: Proceedings of the 28th ACM SIGKDD
conference on knowledge discovery and data mining (pp 3127–3136)
77. Chen Q, Lin J, Zhang Y, Yang H, Zhou J, Tang J (2019) Towards knowledge-based personalized
product description generation in e-commerce. In: Proceedings of the 25th ACM SIGKDD
international conference on knowledge discovery & data mining (pp 3040–3050)
78. Yu C, Liu X, Tang C, Feng W, Lv J (2023) Gpt-nas: Neural architecture search with the
generative pre-trained model. arXiv preprint arXiv:2305.05351
79. Ying C, Klein A, Christiansen E, Real E, Murphy K, Hutter F (2019) Nas-bench-101: Towards
reproducible neural architecture search. In: The international conference on machine learning
(pp 7105–7114)
80. Zheng M, Su X, You S, Wang F, Qian C, Xu C, Albanie S (2023) Can gpt-4 perform neural
architecture search? arXiv preprint arXiv:2304.10970
81. Nasir MU, Earle S, Togelius J, James S, Cleghorn C (2023) Llmatic: Neural architecture search
via large language models and quality-diversity optimization. arXiv preprint arXiv:2306.01102
82. Chen A, Dohan DM, So DR (2023) Evoprompting: Language models for code-level neural
architecture search. arXiv preprint arXiv:2302.14838
83. Koren Y, Bell RM, Volinsky C (2009) Matrix factorization techniques for recommender
systems. IEEE Comput 42(8):30–37
84. Ak KE, Kassim AA, Lim JH, Tham JY (2018) Learning attribute representations with local-
ization for flexible fashion search. In: Proceedings of the IEEE conference on computer vision
and pattern recognition, pp 7708–7717
85. Hsiao W-L, Grauman K (2018) Creating capsule wardrobes from fashion images. In:
Proceedings of the IEEE conference on computer vision and pattern recognition, 7161–7170
86. Simo-Serra E, Fidler S, Moreno-Noguer F, Urtasun R (2015) Neuroaesthetics in fashion:
Modeling the perception of fashionability. In: Proceedings of the IEEE conference on computer
vision and pattern recognition, 869–877
87. Vittayakorn S, Yamaguchi K, Berg AC, Berg TL (2015) Runway to realway: visual analysis of
fashion. In: 2015 IEEE winter conference on applications of computer vision. IEEE, 951–958
88. Zielnicki K (2019) Simulacra and selection: clothing set recommendation at stitch fix. In:
Proceedings of the 42nd International ACM SIGIR conference on research and development
in information retrieval, 1379–1380
89. Kumar S, Gupta MD (2019) c+GAN: complementary fashion item recommendation. KDD
’19, Workshop on AI for fashion, Anchorage, Alaska-USA
90. Huynh CP, Ciptadi A, Tyagi A, Agrawal A (2018) CRAFT: complementary Recommendation
by Adversarial Feature Transform. In: ECCV Workshops (3) (Lecture Notes in Computer
Science), 11131, 54–66. Springer
91. Kang W-C, Fang C, Wang Z, McAuley JJ (2017) Visually-aware fashion recommendation and
design with generative image models. In: 2017 IEEE international conference on data mining,
ICDM 2017, New Orleans, LA, USA, November 18–21, 2017, 207–216
92. Shih Y-S, Chang K-Y, Lin H-T, Sun M (2018) Compatibility family learning for item
recommendation and generation. In: AAAI. 2403–2410. AAAI Press
93. Yang Z, Su Z, Yang Y, Lin G (2018) From recommendation to generation: a novel fashion
clothing advising framework. 2018 7th International conference on digital home (ICDH), 1, 1,
180–186
94. Fan W, Zhao Z, Li J, Liu Y, Mei X, Wang Y, ... Li Q (2023) Recommender systems in the era
of large language models (llms). arXiv preprint arXiv:2307.02046
95. Hou Y, Zhang J, Lin Z, Lu H, Xie R, McAuley J, Zhao WX (2023) Large language models are
zero-shot rankers for recommender systems. arXiv preprint arXiv:2305.08845
96. Wang W, Lin X, Feng F, He X, Chua T-S (2023) Generative recommendation: towards next-
generation recommender paradigm. arXiv preprint arXiv:2304.03516
97. Hou X, Zhao Y, Liu Y, Yang Z, Wang K, Li L, ... Wang H (2024) Large language models for
software engineering: a systematic literature review. arXiv preprint arXiv:2308.10620
Chapter 22
Deep Learning-Based Gland
Segmentation for Enhanced Analysis of
Colon Histology Images
Ajay Kumar, Vivek Kumar, Jay Prakash Singh, and Ashok Patel
1 Introduction
The colon, being the longest segment of the large intestine, plays a pivotal role in
the digestive cascade. It functions as the recipient of partially digested food, aiding
in its subsequent processing and the efficient absorption of essential nutrients [1, 2].
After the absorption phase, the colon orchestrates the transport of waste materials
toward the rectum for eventual expulsion [3]. However, the onset of malignancies
within the colon can be ascribed to uncontrolled and aberrant cellular proliferation,
resulting in anomalous cell growth [4]. Oncogenic metamorphosis is predominantly
instigated by genomic alterations, commonly known as “gene mutations”, which,
also, act as pivotal drivers in the disease’s progression [5, 6]. These genetic aberra-
tions disrupt the innate cellular life cycle, enabling affected cells to evade apoptosis,
a process not observed in their healthy counterparts [6, 7]. Colorectal cancer may
manifest across various age groups, albeit it predominantly afflicts adults. With time,
cell clusters amass, forming polyps, minute growths within the colon. These polyps
evoke concern due to their typically asymptomatic nature, underscoring the impera-
tive for routine screening as a preventative measure and early detection to facilitate
more efficacious treatment modalities [8, 9]. Consequently, there is an urgent need
for comprehensive research and meticulous analysis in this field. Precise delineation
and identify malignant regions with promising results [19–22]. Another study eval-
uated artificial intelligence (AI) techniques for gland and nuclei segmentation in
histology images, analyzing 126 AI-based methodologies [23]. Traditional manual
feature extraction methods were compared with deep learning-based neural network
strategies. While acknowledging the effectiveness of R-CNN and its variants, their
limitations in histopathological contexts were highlighted, addressing challenges
such as data scarcity and color inconsistency. Promising avenues for future research
include techniques such as FCN-based Atrous spatial pyramid pooling and Encoder-
Decoder U-Nets, alongside addressing staining variations and limited training data
in deep learning models [19–23]. A novel deep learning framework, the Attention-
Guided Deep Atrous-Residual U-Net, was introduced for gland segmentation in
colon histopathology images. Leveraging Atrous-Residual units, attention units, and
transitional atrous units, the model addressed concerns like data unpredictability,
overfitting, and resolution degradation. Model evaluation using the GlaS challenge
dataset, CRAG, and a private hospital dataset (HosC) demonstrated promising results,
enhancing the accuracy of colon cancer diagnosis [24]. Transfer learning techniques
were employed to address limitations stemming from the scarcity of large datasets,
achieving remarkably high accuracy rates in colon cancer diagnosis [25].
This research presents a deep learning-based approach for precise glandular struc-
ture segmentation in colon histology images, utilizing the UNet model architec-
ture. The approach has been rigorously trained and tested using the Warwick-QU
dataset from the Gland Segmentation in Colon Histology Images challenge (GlaS).
The dataset comprises diverse samples, including Hematoxylin and Eosin (H&E)
stained tissue slide images, with corresponding ground truth annotations meticu-
lously provided by expert pathologists [26]. The study’s central focus is evaluating the
UNet architecture’s efficacy in image segmentation, particularly its feature-capturing
capabilities.
2.1 Dataset
The dataset employed in this research is derived from the Warwick-QU dataset,
which is part of the Gland Segmentation in Colon Histology Images (GlaS) chal-
lenge ([Link]). This dataset comprises
165 images, each in BMP format, extracted from 16 histological sections stained
with Hematoxylin and Eosin (H&E). These sections specifically relate to cases of
colorectal adenocarcinoma classified as stage T3 or T4. Importantly, the images rep-
resent individual patient samples. Moreover, each image in this dataset is paired with
a single ground truth object representing a label, as shown in Fig. 2.
2.2 Method
The method begins with data loading, which standardizes image and mask dimensions to 256 × 256 pixels and
normalizes pixel values within the range [0, 1]. After that, various performance met-
rics such as Intersection over Union (IoU), Dice coefficient, model loss, Dice loss,
Recall, Precision, and Accuracy are employed to evaluate the model’s performance.
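The IoU and Dice coefficient named here are standard overlap metrics for binary masks; a minimal NumPy sketch (function names are ours, not from the paper's code):

```python
import numpy as np

def iou(pred, truth, eps=1e-7):
    """Intersection over Union between two binary masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    inter = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    return (inter + eps) / (union + eps)

def dice(pred, truth, eps=1e-7):
    """Dice coefficient: 2|A ∩ B| / (|A| + |B|)."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    inter = np.logical_and(pred, truth).sum()
    return (2 * inter + eps) / (pred.sum() + truth.sum() + eps)
```

For example, two masks that overlap in one of two foreground pixels give an IoU of about 0.5 and a Dice coefficient of about 0.67.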
Our proposed model is then trained on the dataset of patient samples. A variety
of callbacks, including model checkpointing, learning rate reduction, CSV logging,
TensorBoard logging, and early stopping, are utilized to monitor the training
process effectively. These callbacks ensure the efficient
management of the training process and optimize model performance. Furthermore,
the model evaluation incorporates custom metrics, and post-processing of predicted
masks is conducted. Lastly, the images are segmented by the trained model for med-
ical image segmentation, allowing for a comprehensive performance assessment by
comparing predicted masks with ground truth masks alongside the original images.
This systematic approach ensures that with each stage, there is progress in the devel-
opment of the model, with a particular emphasis on medical image analysis, as shown
in Fig. 4.
Deep learning algorithms, specifically utilizing the UNet model architecture and
CNNs, are successfully employed for gland segmentation in colon histology images.
Through a comprehensive evaluation utilizing diverse metrics, this method demonstrates
remarkable performance (a training accuracy of 98% and a training loss of 0.97%),
as shown in Fig. 5. The segmentation results (Fig. 5b shows the Dice coefficient
reaching 0.024 and the IoU reaching 0.011 at epoch 4) indicate the model's capability
to precisely identify glandular structures in histological images, as also shown in
Fig. 6. The UNet-based approach adeptly captured
intricate details of gland boundaries, even in challenging images with overlapping
structures.
Fig. 4 Diagrams illustrating a–b the processes involving predicting a segmentation mask for a test
image and subsequently post-processing the mask to visualize the segmented region
Fig. 5 Visualizing model performance: a training accuracy, loss, and b Dice coefficient assessment
Fig. 6 Diagrams a–b–c illustrating the original test image, the ground truth mask, and the predicted
mask
22 Deep Learning-Based Gland Segmentation for Enhanced … 291
4 Conclusion
The study highlights the efficacy of deep learning methodologies, specifically the
UNet model, within colon histology image analysis. The achieved precision in
gland segmentation offers substantial potential for enhancing diagnostic accuracy
and facilitating pathological research pertaining to colon diseases. Obtained results
underscore the transformative impact of deep learning techniques in medical image
analysis, heralding a new era of more precise and efficient diagnostic tools. As
we progress in this domain, the integration of deep learning approaches into med-
ical imaging workflows stands to deepen our comprehension of intricate diseases,
ultimately driving improved patient outcomes and propelling medical research and
diagnostics forward.
Declarations
References
1. Debas HT (2004) Small and large intestine. Gastrointest Surg Pathophysiol Manag 239–310
2. Nichols TW, Faass N (2005) Optimal digestive health: a complete guide
3. Gustafsson J (2012) Colonic barrier function in ulcerative colitis-interactions between ion and
mucus secretion
4. Petrova TV, Nykänen A, Norrmén C, Ivanov KI, Andersson LC, Haglund C, Puolakkainen
P, Wempe F, Melchner H, Gradwohl G et al (2008) Transcription factor prox1 induces colon
cancer progression by promoting the transition from benign to highly dysplastic phenotype.
Cancer Cell 13(5):407–419
5. Pandurangan A, Divya T, Kumar K, Dineshbabu V, Velavan B, Sudhandiran G (2018) Colorectal
carcinogenesis: insights into the cell death and signal transduction pathways: a review. World
J Gastrointest Oncol 10(9):244
6. Raulet DH, Guerra N (2009) Oncogenic stress sensed by the immune system: role of natural
killer cell receptors. Nat Rev Immunol 9(8):568–580
1 Introduction
Monkeypox emerged as a dangerous disease, with outbreaks in 75 countries while
the world was still recovering from COVID-19 [1]. The first instance of this
mysterious disease was reported on May 24, 2003. A small Gram-negative bacillus
outlined in the laboratory was initially identified as an Acinetobacter species and
later attributed to contamination, given the behavioral symptoms observed.
The early detection of monkeypox is helpful in treating the symptoms
to ensure faster recovery. This can be achieved by analyzing the skin texture using
computer vision and machine learning approaches [2]. It requires the dataset of the
skin infected with monkeypox. The skin changes due to monkeypox can be identified
using traditional computer vision or deep learning methods. The skin features can
be extracted using texture descriptors or convolutional neural networks (CNN) can
be used for feature extraction. The various machine learning based classifiers can be
used to categorize skin disorders.
The regions hit hardest by monkeypox are central and western Africa [3]. To tackle
the disease and aid diagnosis, a LightCycler-based quantitative PCR method, known
as LC-qPCR, was proposed.
The biological characterization of monkeypox disease was carried out with the help of
various laboratory findings and dermatologists [4]. Apart from monkeypox, a related
finding was the identification of endemic community spread.
Testing of samples collected for gonorrhea and chlamydia screening recognized
MPXV-DNA-positive samples from four men, confirming these as monkeypox cases [5].
294 V. Gaikwad and T. Kinare
Not all of these cases had been diagnosed, which suggests that testing and quarantining
alone will not contain the spread. Although the Varicella-Zoster (chickenpox) virus,
which is prevalent in humans, is self-limiting, monkeypox had not spread in this way
until recent years [6]. A study by Kurt D. Reed, Mary Beth Graham, Russell L. Regnery,
and Yu Li described the isolation and verification of monkeypox from humans in the
Western Hemisphere [7]. This was mainly due to direct human contact with prairie dogs
kept or sold as pets. The main motivation for this work was to clarify the difference
in accuracy between the ML models and the OpenCV-based algorithm.
There are many systems that detect monkeypox disease. Some use machine learning
models, while others use image processing methods to classify skin diseases by
training on datasets of different skin rashes and other related symptoms. One
existing system used CNN, VGG16, and the PSOBER algorithm, which helps to optimize
the neural network by tweaking its parameters to obtain the best accuracy.
The proposed system is based on the OpenCV platform and achieves an accuracy of
about 85%. It is user friendly compared to other existing systems. Here we have used
feature extraction algorithms such as GLCM and LCBM and compared them with machine
learning algorithms such as XGBoost, which makes the system more unique. We also
addressed the dataset problem: some existing systems rely on limited data, whereas
our dataset is not only large but also effective and efficient. In short, the
proposed system solves most of the problems faced by existing systems.
2 Literature Survey
This section briefly describes research carried out from 2003 to 2022, discussing
the major findings on the detection, analysis, and processing capabilities of
various computer vision and deep learning models.
Recent developments in computer vision emphasize the integration of image
processing, pattern recognition, machine learning, and computer graphics to
extract and interpret information from images [8].
23 Monkeypox Detection and Other Skin Regularities Using OpenCV 295
One study built and analyzed a skin database of skin rash images [9] covering five
different anomalies, viz. monkeypox, chickenpox, cowpox, smallpox, and measles.
This was done using the Python Imaging Library and the scikit-image library
(version 0.19.3), with models whose trainable parameters ranged from one
to 26 million.
The overall understanding according to [10] is that the first step is image
processing and the second is machine learning. The image processing phase involves
the extraction of features from the input images. In the machine learning phase, many
feature-rich images are fed into the training set; the authors used the Keras
library for implementation. Another way to classify skin anomalies/diseases
is by using popular CNN architectures, viz. VGG16, ResNet50, and InceptionV3, as
mentioned in [11]. These architectures can perform large-scale image processing
tasks: VGG16 stacks deep convolutional layers, ResNet50 uses residual blocks that
ease back-propagation, and InceptionV3 performs parallel convolution operations at
multiple scales. Authors in [12] explained some of the main causes of the spread of
the monkeypox virus, such as frequent mobility of humans, cross-border transport of
animals, biological warfare, and the potential threat of bioterrorism. In [13], a
convolutional neural network was pre-trained and followed by an SVM module; SVM is
useful for classification and regression challenges.
Another proposed algorithm is PSOBER [14], which is based on binary feature
extraction techniques; it combines bit error rate (BER) and particle swarm
optimization (PSO) and is intended to find the best and most discriminative set of
features. A related variant, denoted SCBER, helps to optimize the neural network by
tweaking its parameters to obtain the best accuracy.
A comparison of 13 different deep learning models was performed in [15], and the
research identified the highest-performing DL models in order to unite them and
improve overall performance. Two limitations are that the dataset is limited and
that the AI method for executing the code rests on NN models. In [16], an accuracy
of 98.25%, sensitivity of 96.55%, and specificity and F1 score of 100.00% and
98.25%, respectively, were obtained with the best performing model, MobileNetV2.
The potential of combining a fuzzy logic system with the powerful capabilities of
artificial neural networks [17] was explored to construct a system able to
distinguish and classify monkeypox disease efficiently.
The VGG16 model was also implemented to study small, collected datasets, as in [18].
That work uses Local Interpretable Model-Agnostic Explanations (LIME) to explain
the reasoning behind predictions, a technique in great demand across various ML
models in industry nowadays.
A contemporary review [19] covers the state of monkeypox and similar diseases in
the healthcare industry in relation to the ongoing outbreaks around the world; the
authors also discuss future predictions for this disease. A detailed review [20]
gives insights into current MPX understanding and draws a picture of the world's
leading approaches to management, treatment, prevention, and diagnosis.
3 Methodology
This section explains the working of the project, where OpenCV is used for image
processing. According to the project flow given in Fig. 1, the work proceeds as
follows. First, the dataset is chosen and imported for monkeypox detection using
OpenCV. The data then undergoes augmentation in order to capture more features,
which are used later for feature extraction. The overall dataset, including the
original and augmented images, is then sent to the preprocessing stage, where we
change the size (crop the image to certain pixel dimensions) and perform operations
such as grayscaling, magnifying, and blurring.
After preprocessing, a standard sequence of operations follows: feature extraction,
then training and testing on the dataset using the above-mentioned machine
learning models.
The dataset includes 7136 images in total, which are used for training and testing
purposes; 5357 images are used for training and the remaining 1779 images form the
testing split, as shown in Table 1.
Pre-processing was performed on the training dataset, with the monkeypox images and
the non-monkeypox (other) images trained separately, as shown in Fig. 2.
Furthermore, to obtain more viable features for the actual feature extraction
later, we performed data augmentation on the dataset.
The data augmentation of the images is likewise done in two halves, viz. for the
monkeypox images and the other images separately. It is done in batches of 10
images each, using a for loop and the datagen_flow function.
The obtained images are then stored in a separate folder in which the size of
images is resized to 64 × 64, the images are then converted into grayscale and are
mapped to a matrix.
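Assuming nearest-neighbour interpolation for illustration (the paper does not state the resize method), this stage can be sketched in NumPy without OpenCV; the function names and the random stand-in image are ours:

```python
import numpy as np

def to_grayscale(rgb):
    """Luminance-weighted grayscale conversion (ITU-R BT.601 weights)."""
    return rgb @ np.array([0.299, 0.587, 0.114])

def resize_nearest(img, size=(64, 64)):
    """Nearest-neighbour resize of a 2-D matrix to the target size."""
    h, w = img.shape[:2]
    rows = np.arange(size[0]) * h // size[0]
    cols = np.arange(size[1]) * w // size[1]
    return img[rows][:, cols]

rgb = np.random.rand(128, 96, 3)        # stand-in for a loaded colour image
gray = to_grayscale(rgb)                # 128 x 96 intensity matrix
small = resize_nearest(gray, (64, 64))  # 64 x 64 matrix, ready for GLCM
```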
Once the processing of the images is done and the images are converted to matrices,
the gray level co-occurrence matrix (GLCM) algorithm is used to extract features
from the stored matrix.
The GLCM of an image is expressed as a matrix with the same number of rows and
columns as there are gray levels in the image. The mathematical operations used in
the system are based on the following features. Equation 1 represents the contrast
of the image, whereas the dissimilarity is given in Eq. 2. Haralick statistical
features such as homogeneity, the angular second moment (ASM), and energy are
represented in Eqs. 3, 4 and 5 respectively. These mathematical operations are
performed on the converted matrices, which helps us extract features from
the images.
23 Monkeypox Detection and Other Skin Regularities Using OpenCV 297
$$\text{Contrast} = \sum_{i,j=0}^{\text{levels}-1} P_{i,j}\,(i-j)^2 \quad (1)$$

$$\text{Dissimilarity} = \sum_{i,j=0}^{\text{levels}-1} P_{i,j}\,|i-j| \quad (2)$$

$$\text{Homogeneity} = \sum_{i,j=0}^{\text{levels}-1} \frac{P_{i,j}}{1+(i-j)^2} \quad (3)$$

$$\text{ASM} = \sum_{i,j=0}^{\text{levels}-1} P_{i,j}^{\,2} \quad (4)$$

$$\text{Energy} = \sqrt{\text{ASM}} \quad (5)$$
The main logic behind the extraction of features based on all the above-mentioned
parameters is given below in Algorithm 1.
Algorithm 1: Feature Extraction using GLCM
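The body of Algorithm 1 is not reproduced here; a pure-NumPy sketch consistent with Eqs. (1)–(5), assuming a single horizontal pixel offset (function names are ours):

```python
import numpy as np

def glcm(img, levels):
    """Normalized gray-level co-occurrence matrix for a horizontal offset of 1."""
    P = np.zeros((levels, levels))
    for a, b in zip(img[:, :-1].ravel(), img[:, 1:].ravel()):
        P[a, b] += 1
    return P / P.sum()

def glcm_features(P):
    """Haralick features of Eqs. (1)-(5)."""
    i, j = np.indices(P.shape)
    contrast      = (P * (i - j) ** 2).sum()        # Eq. (1)
    dissimilarity = (P * np.abs(i - j)).sum()       # Eq. (2)
    homogeneity   = (P / (1 + (i - j) ** 2)).sum()  # Eq. (3)
    asm           = (P ** 2).sum()                  # Eq. (4) angular second moment
    energy        = np.sqrt(asm)                    # Eq. (5)
    return contrast, dissimilarity, homogeneity, asm, energy
```

For a tiny two-level image such as `[[0, 1], [0, 1]]`, all co-occurring pairs are (0, 1), giving contrast 1, dissimilarity 1, homogeneity 0.5, ASM 1 and energy 1.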
Having extracted all these feature types, the next step is to train and test the
dataset using various machine learning models.
Light Gradient Boosting Machine (LGBM) is a gradient boosting framework that uses
decision tree-based learning algorithms for classification. In this project we use
an LGBM classifier with the Dropouts meet Multiple Additive Regression Trees (DART)
boosting type, training and testing with a 75%/25% split.
The parameters of the LGBM classifier are among the major factors deciding the
overall accuracy of the model. The most important is num_leaves, which strongly
affects the overall accuracy and determines whether the model is overfitted.
Max_depth has a major impact on the running time of the model as well as on its
accuracy. In our project we have set both of the above parameters to the same
value, 100, as this helps in getting the highest possible accuracy.
An overall view of the different processes of the project is summarized in the
block schematic shown in Fig. 3, which describes the general working idea of
the project.
4 Results
After storing the extracted features in a Pandas DataFrame and exporting them to an
Excel file according to Table 2, the interpreted data can be used to analyze the
significance of important features. This analysis typically involves plotting
graphs to visualize the relationships or distributions of these features. As shown
in Table 2, various GLCM features such as energy, correlation, and contrast are
extracted; around 2141 features are extracted from the dataset.
For instance, Fig. 4 presents a graph illustrating how the extracted features vary
and correlate with one another. This can involve plotting the features against each
other, or against a target variable if one is available, to understand their impact
in the context of the project. Such visualizations provide valuable insight into
the underlying data patterns and aid in drawing informed conclusions about the
project's outcomes.
After the successful feature extraction using GLCM, we predicted and classified the
data; the performance parameters are accuracy, precision and recall, given by
Eqs. 6, 7 and 8 respectively.
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \quad (6)$$

$$\text{Precision} = \frac{TP}{TP + FP} \quad (7)$$

$$\text{Recall} = \frac{TP}{TP + FN} \quad (8)$$
where TP, TN, FP, and FN denote true positives, true negatives, false positives,
and false negatives, respectively.
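Equations (6)–(8) follow directly from the confusion counts; a plain-Python sketch (function names are ours):

```python
def confusion_counts(y_true, y_pred):
    """Counts of true/false positives/negatives for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def metrics(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    accuracy  = (tp + tn) / (tp + tn + fp + fn)  # Eq. (6)
    precision = tp / (tp + fp)                   # Eq. (7)
    recall    = tp / (tp + fn)                   # Eq. (8)
    return accuracy, precision, recall
```

For example, `metrics([1, 1, 0, 0], [1, 0, 1, 0])` yields one count in each cell, so accuracy, precision and recall are all 0.5.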
5 Conclusion
The system proposed in this project is a novel approach that can successfully
detect and classify whether an image shows monkeypox or not. This is achieved by
combining the algorithms discussed in the proposed methodology: feature extraction
algorithms such as GLCM, followed by classification algorithms such as LGBM, SVM,
and XGBoost. The accuracy of the system in distinguishing monkeypox from
non-monkeypox or other diseases is 80%. The algorithm used for feature extraction
is GLCM, and for training and testing it is LGBM, which makes the system more
efficient, less complex, and more effective than other existing systems. Our
proposed method is, however, not perfect. Its limitations are that the dataset is
limited; the system can predict only whether the disease is monkeypox or not and
cannot classify other skin diseases; and accuracy is below 100% because some skin
diseases closely resemble the monkeypox rash. The future scope of the study is to
overcome these limitations and make the system more efficient and effective.
References
1. Goswami T, Dabhi V, Prajapati HB (2020) Skin disease classification from image. [Link]
org/10.1109/ICACCS48705.2020.9074232
2. Hosny KM, Kassem MA, Foaud MM (2019) Classification of skin lesions using transfer learning
and augmentation with Alex-net
3. Manjurul Ahsan Md, Uddin MR, Farjana M et al (2022) Image data collection and imple-
mentation of deep learning-based model in detecting monkeypox disease using modified
VGG16
4. Masayuki et al. (2008) Diagnosis and assessment of monkeypox virus (MPXV) infection
by quantitative PCR assay: differentiation of Congo Basin and West African MPXV strains
61:140–142
5. Thornhill JP et al. (2022) Monkeypox virus infection in humans across 16 countries, P 207323
6. De Baetselier I, Van Dijck C, Kenyon C (2022) Retrospective detection of asymptomatic
monkeypox virus infections among male sexual health clinic attendees in Belgium
7. Ramana KV, Mahendra P et al (2017) Epidemiology, diagnosis, and control of Monkeypox
disease: a comprehensive review. American J Infectious Diseases Microbiol 5(2):94–99
8. Wiley V, Lucas T (2018) Computer vision and image processing: a paper review. Int J Artif
Intell Res
9. Hussain MA et al. (2022) Can artificial intelligence detect MONKEYPOX from digital skin
images. 08.08.50319
10. Rajasekaran G, Aiswarya N, Keerthana R (2020) Skin disease identification using image
processing and machine learning techniques. Int Res J Eng Technol (IRJET) 2(03),
e-ISSN: 2395-0056
11. Ali SN et al. (2022) Monkeypox skin lesion detection using deep learning models: a feasibility
study
12. Ali SN et al. (2018) Emergence of Monkeypox as the most important Orthopoxvirus published:
04 September 2018 [Link]
13. Sklenovská N et al. (2021) Intelligent system for skin disease prediction using machine learning
14. Abdelaziz A et al. (2022) Classification of Monkeypox images based on transfer learning and
the Al-biruni earth radius optimization algorithm
15. Chiranjibi, Bahadur Shahi ST (2022) Monkeypox virus detection using pre-trained deep
learning-based approaches. J Med Syst
16. Akin KD et al (2022) Classification of Monkeypox skin lesions using explainable artificial
intelligence. In: International Conference on Innovative Academic Studies (ICIAS)
17. Akin D, Gurkan C, Budak A, Karatas H (2023) Artificial intelligence assisted convolutional
neural networks
18. Tom et al (2018) A neuro-fuzzy based model for diagnosis of Monkeypox disease. Int J
Comput Sci Trends Technol (IJCST) 6(2)
19. Reed KD, Graham MB, Regnery RL, Li Y (2004) The detection of Monkeypox in Humans in
the Western Hemisphere. [Link]
20. Titanji BK et al. (2022) Monkeypox a contemporary review for healthcare professionals,
310.6615388
Different types of adders contribute to the performance of a hybrid adder by addressing specific needs within the VLSI circuit design. The Ripple Carry Adder (RCA) is suitable for small bit-width additions due to its simplicity. Han-Carlson, Weinberger, and Ling adders offer efficient parallel computation to enhance speed. Each type of adder is optimized for specific bit ranges and has mechanisms for conditional carry selection to improve functionality. The Weinberger adder minimizes logic stages and delays. This strategic combination enhances overall performance in terms of area, power, and delay, leading to improved VLSI circuit efficiency.
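For illustration, the bit-by-bit carry propagation of an RCA can be modelled behaviourally in software (a sketch, not a gate-level or RTL design; names are ours):

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum bit, carry-out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

def ripple_carry_add(x, y, width=8):
    """Add two width-bit numbers bit by bit, LSB first, as an RCA does.

    Returns (sum modulo 2**width, final carry-out)."""
    carry, result = 0, 0
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry
```

For example, adding 200 and 100 in 8 bits overflows: the sum wraps to 44 with a carry-out of 1, which is exactly why wider additions chain or parallelize carry logic.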
The Binary to Excess-1 Converter (BEC) is used in VLSI adder design to minimize logic redundancy, which helps in reducing power and area usage. It is employed to convert binary numbers to excess-1 notation, facilitating specific roles in functional operations such as efficient carry computations and optimizing the addition process. By minimizing redundant logic and improving area efficiency, BEC circuits contribute to the overall performance enhancements of hybrid adders, particularly in contexts where area and power utilization are critical metrics.
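Behaviourally, a BEC outputs its input plus one without a full adder: each output bit is the input bit XORed with the AND of all lower input bits. A sketch under that assumption (names are ours):

```python
def bec(x, width=4):
    """Binary to Excess-1 Converter: returns x + 1 (mod 2**width).

    out[i] = x[i] XOR (x[i-1] AND ... AND x[0]); the LSB is simply inverted."""
    out, prop = 0, 1              # prop = AND of all lower input bits (starts at 1)
    for i in range(width):
        bit = (x >> i) & 1
        out |= (bit ^ prop) << i  # flip this bit only while all lower bits were 1
        prop &= bit
    return out
```

For example, a 4-bit BEC maps 0111 to 1000 and wraps 1111 to 0000, which is why it pairs naturally with carry-select structures: the "+1" path needs no second adder.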
Hybrid technology in adder design enhances area, power, and delay efficiency by incorporating modules like Ling, Weinberger, and Han-Carlson adders, each addressing specific bottlenecks. Ling adders optimize long-distance carry propagation, Weinberger adders reduce logic stages and carry generation duration, while Han-Carlson adders ensure balance in complexity. These are complemented by Binary to Excess-1 Converter circuits to minimize logic redundancy. The strategic integration of these modules leads to improved overall efficiency, with hybrid adders demonstrating better performance metrics in VLSI circuit applications.
The hybrid adder improves performance metrics such as area, power, and delay by integrating various types of adders, each optimized for specific tasks. For example, it combines the Ripple Carry Adder, Han-Carlson Adder, Binary to Excess-1 Converter, Weinberger Adder, and Ling Adder, each with unique advantages such as efficient parallel computation and reduced logic redundancy. These attributes help reduce area and power consumption by 4-35% and 10-50% respectively in 90 nm technology, and improve performance efficiency compared to traditional adders.
The Multi-objective Jaya Optimization (MJO) algorithm plays a key role in the Jaya Convolutional Neural Network (CNN) for handwritten Optical Character Recognition (OCR) by optimizing the initial weights of the network. Its primary goals are to reduce intra-class variance and increase inter-class distance, which is crucial for improving the accuracy and reliability of the character recognition process. The MJO assists in refining the convolutional responses into a condensed feature space, thereby enhancing the ability of conventional classifiers to identify characters from various datasets more effectively.
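The multi-objective details are not reproduced here, but the core single-objective Jaya update (move each candidate toward the current best solution and away from the worst, keeping only improving moves) can be sketched in NumPy; all names and the toy objective are ours:

```python
import numpy as np

def jaya(obj, bounds, pop=10, iters=150, seed=0):
    """Single-objective Jaya: x' = x + r1*(best - |x|) - r2*(worst - |x|),
    with greedy acceptance of improving moves only."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    X = rng.uniform(lo, hi, size=(pop, 2))          # 2-D search space
    f = np.array([obj(x) for x in X])
    for _ in range(iters):
        best, worst = X[f.argmin()], X[f.argmax()]
        r1, r2 = rng.random((2, pop, 2))
        Xn = np.clip(X + r1 * (best - np.abs(X)) - r2 * (worst - np.abs(X)), lo, hi)
        fn = np.array([obj(x) for x in Xn])
        better = fn < f                             # greedy acceptance
        X[better], f[better] = Xn[better], fn[better]
    return X[f.argmin()], f.min()

# Toy use: minimize the 2-D sphere function over [-5, 5]^2.
x_best, f_best = jaya(lambda x: float((x ** 2).sum()), (-5.0, 5.0))
```

Jaya is parameter-free apart from population size and iteration count, which is one reason it is attractive for tuning network weights.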
Bibliometric analysis in AI-guided breast cancer diagnosis and prognosis research offers insights by aggregating publications, discerning trends, and evaluating high-impact studies to inform clinical and research directions. Employing disruptive quality measures further emphasizes forward-thinking methodologies, highlighting pivotal studies that could shape future diagnostic advancements. This approach helps correlate citation impacts with research outcomes, guiding resource allocation and developmental prioritization in AI and cancer research domains.
Support Vector Machines (SVMs) offer opportunities in classification tasks due to their effectiveness in high-dimensional spaces and robustness against overfitting, particularly when the number of examples is small relative to the number of dimensions. However, challenges include computational inefficiency on very large datasets and the difficulty of selecting optimal kernel functions. SVMs also have limitations in handling non-linear data unless kernel tricks are applied. Despite these challenges, SVMs remain a powerful tool for various classification problems given their solid theoretical foundations.
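As a self-contained illustration of the margin idea (not the dual/kernel solvers used by production SVM libraries), a Pegasos-style subgradient sketch of a linear SVM; the data and names are ours:

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style stochastic subgradient descent for a linear SVM.

    Labels must be in {-1, +1}; lam is the L2 regularization strength."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            t += 1
            eta = 1.0 / (lam * t)                  # decaying step size
            if y[i] * (w @ X[i]) < 1:              # margin violated: hinge step
                w = (1 - eta * lam) * w + eta * y[i] * X[i]
            else:                                  # only the regularizer shrinks w
                w = (1 - eta * lam) * w
    return w

# Toy linearly separable data; the bias is folded in as a constant third feature.
X = np.array([[1., 2., 1.], [2., 3., 1.], [-1., -2., 1.], [-2., -1., 1.]])
y = np.array([1, 1, -1, -1])
w = train_linear_svm(X, y)
preds = np.sign(X @ w)
```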
The convolutional neural network (CNN) is effective for processing image-like data due to its ability to gather spatial data hierarchies representative of both basic and complex patterns. This is achieved through its architecture, inspired by the visual cortex of animals. CNNs consist of three main types of learning layers: convolutional layers for feature detection, pooling layers for down-sampling, and fully connected layers for producing the final output such as classification. These components work in conjunction to autonomously extract and prioritize relevant features within the data.
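The first two layer types can be illustrated with a minimal single-channel NumPy sketch of convolution and max pooling (the fully connected stage is omitted; names are ours):

```python
import numpy as np

def conv2d(img, kernel):
    """Valid 2-D cross-correlation: the core of a convolutional layer."""
    kh, kw = kernel.shape
    h, w = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.zeros((h, w))
    for r in range(h):
        for c in range(w):
            out[r, c] = (img[r:r + kh, c:c + kw] * kernel).sum()
    return out

def max_pool(x, size=2):
    """Non-overlapping max pooling: the down-sampling layer."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

img = np.arange(16, dtype=float).reshape(4, 4)
feat = conv2d(img, np.array([[1., 0.], [0., -1.]]))  # simple diagonal-difference filter
pooled = max_pool(np.maximum(feat, 0))               # ReLU, then 2x2 pool
```

On this ramp image every diagonal difference is -5, so the ReLU zeroes the feature map and pooling reduces it to a single value; real networks learn many such kernels per layer.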
Deep learning for recognizing Gujarati characters offers the advantage of self-learned representation features, which reduces manual feature extraction effort. The approach achieved an accuracy of about 78.6%, a higher performance than the hybrid method using k-Nearest Neighbor (k-NN) and tree classifiers, which achieved only 63% accuracy. Deep learning models perform both feature extraction and classification autonomously, which makes them particularly suited to handling large volumes of input and complex recognition tasks, compared with traditional methods that require more human oversight.
Deep learning models provide key advantages for feature extraction and classification tasks due to their self-learning capabilities, which mimic the brain's information processing. They offer autonomously determined feature prioritization with minimal programmer intervention, making them well suited to handling high volumes of complex input and output data. These models can identify and utilize relevant features autonomously, enhancing the accuracy and speed of classification tasks compared to traditional methods that require manual feature extraction.