Intelligent Systems Algorithms Overview
Himanshu Mittal
Satyasai Jagannath Nanda
Meng-Hiot Lim Editors
Proceedings of International Conference on Paradigms of Communication, Computing and Data Analytics
PCCDA 2024, Volume 2
Algorithms for Intelligent Systems
Series Editors
Jagdish Chand Bansal, Department of Mathematics, South Asian University,
New Delhi, Delhi, India
Kusum Deep, Department of Mathematics, Indian Institute of Technology Roorkee,
Roorkee, Uttarakhand, India
Atulya K. Nagar, School of Mathematics, Computer Science and Engineering,
Liverpool Hope University, Liverpool, UK
This book series publishes research on the analysis and development of algorithms
for intelligent systems with their applications to various real world problems. It
covers research related to autonomous agents, multi-agent systems, behavioral
modeling, reinforcement learning, game theory, mechanism design, machine
learning, meta-heuristic search, optimization, planning and scheduling, artificial
neural networks, evolutionary computation, swarm intelligence and other algo-
rithms for intelligent systems.
The book series includes recent advancements, modifications, and applications of artificial neural networks, evolutionary computation, swarm intelligence, artificial immune systems, fuzzy systems, autonomous and multi-agent systems, machine learning, and other intelligent-systems-related areas. The material will be beneficial for graduate students, post-graduate students, and researchers who want a broader view of advances in algorithms for intelligent systems. The contents will also be useful to researchers from other fields who are unfamiliar with the power of intelligent systems, e.g., researchers in bioinformatics, biochemists, mechanical and chemical engineers, economists, musicians, and medical practitioners.
The series publishes monographs, edited volumes, advanced textbooks and
selected proceedings.
Indexed by zbMATH.
All books published in the series are submitted for consideration in Web of
Science.
Editors

Himanshu Mittal
Department of Artificial Intelligence and Data Sciences
Indira Gandhi Delhi Technical University for Women
New Delhi, Delhi, India

Satyasai Jagannath Nanda
Department of Electronics and Communication Engineering
Malaviya National Institute of Technology
Jaipur, Rajasthan, India

Meng-Hiot Lim
School of Electrical and Electronic Engineering
Nanyang Technological University
Singapore, Singapore
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Singapore Pte Ltd. 2025
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
General Chairs
Dr. Himanshu Mittal, Department of Artificial Intelligence and Data Sciences, Indira
Gandhi Delhi Technical University for Women (IGDTUW), Delhi
Dr. Satyasai Jagannath Nanda, Department of Electronics and Communication
Engineering, MNIT Jaipur, India
Prof. Anita Tomar, Head, Department of Mathematics, Sridev Suman Uttarakhand
University, Pt. L. M. S. Campus Rishikesh, Uttarakhand
Prof. G. K. Dhingra, Dean, Faculty of Science, Sridev Suman Uttarakhand University,
Pt. L. M. S. Campus Rishikesh, Uttarakhand
Prof. M. H. Lim, School of Electrical and Electronics Engineering, Nanyang
Technological University, Singapore
This book contains outstanding research papers from the 4th International Conference on Paradigms of Communication, Computing and Data Analytics (PCCDA 2024), organized by Pt. Lalit Mohan Sharma Campus, Rishikesh, Sri Dev Suman Uttarakhand University, Uttarakhand, India, and technically sponsored by the Soft Computing Research Society, India. The conference was conceived as a platform for disseminating and exchanging the ideas, concepts, and results of researchers from academia and industry, with the aim of developing a comprehensive understanding of the challenges and advancements in communication, computing, and data analytics, and of innovative solutions to current problems from engineering and technology viewpoints. This book will help strengthen networking between academia and industry. The conference focused on machine learning and deep learning algorithms, models, and their applications.
We have tried our best to ensure the quality of PCCDA 2024 through a stringent and careful peer-review process. PCCDA 2024 received 443 research submissions from distinguished participants at home and abroad. After this peer-review process, only 45 high-quality papers were accepted for presentation and inclusion in the final proceedings.
This book presents the second volume of 23 research papers on communication,
computing and data analytics and serves as reference material for advanced research.
About the Editors

Dr. Meng-Hiot Lim is a faculty member in the School of Electrical and Electronic Engineering, Nanyang Technological University. He previously held an appointment as deputy director for the [Link]. in Financial Engineering and the Centre for Financial Engineering, anchored at the Nanyang Business School. He is a versatile researcher with diverse interests, with research focus
1 Introduction
Over the years, with the constant evolution of tools and technologies, there has also been a dynamic shift in the way web applications function. Web applications today are a set of interactions between various APIs, services, endpoints, and so on. These interactions cross a range of boundaries, both inside and outside the organization, which has made securing these applications more difficult because of their open and distributed nature. The attack surface is large and ever-evolving, making it difficult to defend, as explained in detail in [1].
With the advent of cloud computing and its growing prevalence, ensuring the security of data has only become more challenging. In 2019, a configuration error in a web application's firewall led to one of the worst data breaches of the previous decade (in terms of the number of exposed records), in which the credit card data of around 106 million individuals was exfiltrated. Instead of employing a zero-day exploit, the attacker exploited various well-known vulnerabilities, including Server-Side Request Forgery (SSRF) [2].
Server-Side Request Forgery is a web security vulnerability that enables an attacker to have the server make connections to services and devices within the organization's internal network and gain access to sensitive data [3].
SSRF is usually performed by using malicious URLs to manipulate the server into making requests that give the attacker unauthorized access to confidential data.
K. Kulkarni et al.
2 Background
We conducted an extensive review of SSRF reports from numerous sources, including but not limited to HackerOne [4–10], Orange Tsai [11], Gwendal Le Coguic [12], Enguerran Giller [13], and Shorebreak Security [14], which allowed us to gain insights into the multifaceted nature of SSRF attacks and their diverse manifestations in real-world scenarios.
In-Band: An In-Band attack usually occurs when the attack payload is sent over the channel where client-server HTTP communication takes place, i.e., the malicious payload is inserted into the requests sent to the server.
Out-of-Band (OOB): An Out-of-Band attack occurs when a pointer to the malicious payload is sent to the victim's server. Once the server references this pointer, the malicious message is delivered.
Bypassing Security Controls: SSRF can be exploited to bypass firewalls and other
security controls. By making requests to internal services not directly accessible from
the Internet, attackers can navigate around established security measures.
Exploiting Trusted Relationships: The trust relationships between different com-
ponents of a system can be exploited through SSRF. Attackers may impersonate
trusted internal servers, making unauthorized requests on their behalf and circum-
venting security measures.
Infrastructure and Cloud Resource Abuse: In cloud environments, SSRF can be
used to abuse cloud resources and services, resulting in financial losses for affected
organizations. The misuse of cloud infrastructure introduces a new dimension to the
potential consequences of SSRF attacks.
Reputation Damage and Regulatory Concerns: Successful SSRF attacks can lead
to reputational damage for organizations, especially if they result in data breaches
or service disruptions. Additionally, organizations may face regulatory and legal
consequences for failing to protect sensitive data.
Cloud Environment: Web applications hosted in cloud environments operate on
instances or virtual servers. Each instance stores metadata which is in turn organized
in a metadata server, accessible through its corresponding service. Attackers leverage
SSRF vulnerabilities to manipulate requests originating from the application to the
metadata service, stealing access credentials. These stolen credentials grant unau-
thorized access to services and resources, as demonstrated in high-profile incidents
such as the Capital One data breach. Prominent cloud providers have released updated versions of their metadata services with stronger request security, but vulnerabilities persist in instances still using previous versions.
On-Premise Environment: In on-premise environments, where web applications
are hosted within the internal network, SSRF vulnerabilities can have distinct conse-
quences. Internal services, usually not accessible from outside the firewall, become
vulnerable to unauthorized access. Attackers can make requests to the internal services from the application, devising payloads compatible with each service's communication protocol. While some services communicate natively over HTTP, other services might not support this protocol but can still accept input delivered via HTTP. Attackers
leverage these vulnerabilities to exploit services, as observed in incidents involving
Memcached and Redis services.
3 Related Work
The paper by Späth et al. [19] delves into the realm of Server-Side Request Forgery
(SSRF) attacks within XML parsing, shedding light on the diverse attack vectors
and countermeasures in this domain. It explores both classic and innovative SSRF
techniques, showcasing how XML parsing vulnerabilities can be exploited to send
requests on behalf of the parser to internal network endpoints. The classic SSRF
attack, exemplified through a DOCTYPE-based vector, demonstrates the ability to
invoke sensitive operations remotely. Moreover, the paper introduces an innovative
SSRF attack vector based on XInclude, emphasizing the versatility and evolving
nature of SSRF threats. Challenges such as parser feature vulnerabilities and firewall
restrictions are discussed, highlighting the complexities in mitigating SSRF risks. The
proposed countermeasures advocate for the prevention of insecure parser features
and the implementation of input validation strategies using whitelists or blacklists.
The comprehensive analysis done in the paper helped us gain valuable insights into
the intricacies of SSRF attacks and the various possible attack vectors which can be
used to achieve it.
Al-talak et al. [15] discuss the use of deep learning techniques such as LSTM to detect SSRF attacks in web applications. The dataset is sourced from the Canadian Institute for Cybersecurity at the University of New Brunswick. The machine learning model is trained and LSTM is applied to create an intelligent model capable of accurately identifying SSRF attacks. According to the paper, LSTM learns long-range dependencies by having a recurrent edge inside each cell with a weight w = 1. The proposed model achieves an accuracy rate of 96.9%. Machine learning and deep learning models tend to be highly adaptable, making them well suited to detecting evolving cyberattacks like SSRF. We incorporated a similar approach in training our machine learning model.
In article [16], an innovative approach to defending against SSRF is proposed. Incoming HTTP requests are examined, and pattern matching and detection are performed with the help of Lua to identify the presence of URLs/URIs. A server isolated from the organization's internal network is created; the address of this server is appended to the URLs, and they are redirected to it. The method provided an extra layer of security, but the redirection occasionally caused slight delays.
In article [17], the research explores a novel approach to classifying different types of URLs, including benign, defacement, spam, phishing, and malware, using supervised learning with a focus on lexical features. The study employs a selected set of features derived from URL components and applies machine learning classifiers, such as K-Nearest Neighbors, C4.5, and Random Forest, to achieve a classification accuracy of 97%. Additionally, the paper contributes a dataset that has been used in article [15] and, consequently, in our work as well.
As per the insights from [20] by Zeravan Arif Ali et al., XGBoost stands out for its ability to model complex systems effectively, offering superior prediction accuracy and versatility in classification tasks. The paper emphasizes XGBoost's design focus on computational efficiency and its grounding in machine learning principles. Its adaptive sampling and feature selection strategies reduce overfitting risks, improving model generalization and predictive performance. Its emphasis on interpretability allows researchers to identify the key factors influencing the modeled systems. XGBoost's use of multi-threading further speeds up model training, making it a preferred choice, especially for handling large and diverse datasets. Given the complexity and diversity of our dataset, with numerous parameters, we opted for XGBoost due to its proven effectiveness in addressing such challenges.
Mitigating SSRF Threats: Integrating ML …
4 Proposed Approach
Upon receiving a URL request, the IDS becomes the first line of defense. It meticu-
lously assesses the URL and ascertains whether it poses a threat to the system. This
examination involves the analysis of various attributes, including, but not limited to,
URL structure, source reputation, and payload characteristics. If the IDS determines
the URL to be benign, the reverse proxy seamlessly allows access to the requested
resource.
Using Machine Learning to classify the URLs makes the proposed solution more
flexible as a machine learning model would adapt to a new attack vector significantly
faster than a static Rule-based system would.
In cases where the IDS flags a URL as potentially malicious, the system adopts a cautious approach to avoid false positives. Instead of outright blocking the URL, the system forwards it to a dedicated helper server, which operates in isolation from the main network infrastructure. This server acts as a secondary layer of defense, executing the requested action in an environment devoid of access to internal network components.
The isolated helper server provides a controlled environment for the execution of
potentially malicious requests. By segregating this server from the in-house network
(blocked from accessing private IP addresses and local host), any harmful effects
stemming from a confirmed malicious URL are contained within the isolated envi-
ronment. This strategic segregation ensures that in case the URL executes a malicious
payload, it cannot compromise other internal network elements.
To maintain optimal system performance, only URLs deemed suspicious by the IDS
are routed to the ‘Helper’ server for further analysis and execution. This selective
approach is motivated by the imperative to minimize latency and response time. By
focusing resources on potentially malicious requests, the system can swiftly and
efficiently process genuine threats without incurring unnecessary delays.
Initially, a user inputs a URL on a website interface, which triggers the process.
The system then extracts various features from the URL, such as domain token
count, argument length, and the presence of suspicious tokens, among others. These
features are sent to a script, which utilizes an XGBoost classifier to assess if the URL
is malicious or benign.
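The feature-extraction step described above can be sketched as follows. The paper names domain token count, argument length, and the presence of suspicious tokens, but does not enumerate the full feature set or token list, so the names and tokens below are illustrative assumptions:

```python
from urllib.parse import urlparse

# The token list is an illustrative assumption; the paper does not enumerate it.
SUSPICIOUS_TOKENS = ("localhost", "127.0.0.1", "169.254.169.254", "file://", "@")

def extract_features(url: str) -> dict:
    """Extract a few of the lexical URL features mentioned in the text."""
    parsed = urlparse(url)
    return {
        # number of dot-separated tokens in the host, e.g. 2 for "example.com"
        "domain_token_count": len([t for t in parsed.netloc.split(".") if t]),
        # total length of the query string (the URL's arguments)
        "argument_length": len(parsed.query),
        # 1 if any known-suspicious token appears anywhere in the URL
        "has_suspicious_token": int(any(t in url.lower() for t in SUSPICIOUS_TOKENS)),
    }

features = extract_features("http://example.com/fetch?url=http://169.254.169.254/latest/")
```

The resulting dictionary is what would be fed, as a numeric vector, to the classifier described in the text.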
As seen in Fig. 2, if the algorithm’s prediction value for the URL is below a certain
threshold, the URL is classified as malicious. In response to a malicious request, the
workflow reroutes the request to a helper server that is isolated from the internal
network of the organization. This isolation is achieved by blocking specific ranges of
IP addresses, thereby preventing the helper server from accessing internal network
resources or local host addresses. The helper server attempts to fetch the data from the
URL, but due to the isolation, any access to sensitive internal resources is thwarted,
effectively mitigating the risk of SSRF attacks.
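The isolation described above can be approximated in code: before fetching, the helper server resolves the target host and refuses private, loopback, and link-local addresses (the last covering cloud metadata endpoints). This is a minimal sketch under those assumptions; the function name and exact blocked ranges are not from the paper:

```python
import ipaddress
import socket
from urllib.parse import urlparse

def is_internal_target(url: str) -> bool:
    """Return True if the URL resolves to an address the helper server must not reach."""
    host = urlparse(url).hostname
    if host is None:
        return True  # malformed URL: refuse by default
    try:
        addr = ipaddress.ip_address(socket.gethostbyname(host))
    except (socket.gaierror, ValueError):
        return True  # unresolvable host: refuse by default
    # block RFC 1918 ranges, localhost, and link-local (cloud metadata) addresses
    return addr.is_private or addr.is_loopback or addr.is_link_local

blocked = is_internal_target("http://127.0.0.1/admin")  # loopback: blocked
```

In practice this check belongs at the network layer (firewall rules on the helper host) rather than only in application code, since application-level checks can be bypassed via DNS rebinding.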
Figure 3 describes a scenario where the predicted value is greater than the thresh-
old, and the URL is classified as benign. The result is then displayed to the user, and
the URL is hence deemed safe. The request proceeds to the web server as usual.
5 Evaluation
The models were trained on a dataset sourced from the ‘Canadian Institute for Cyber-
security at the University of New Brunswick’. This dataset served as a resource,
providing instances of both benign and malicious URLs.
$$H(x) = -\sum_{i=0}^{n} p(x_i)\,\log_2 p(x_i) \qquad (1)$$

Here, H(x) represents the entropy of the domain, p(x_i) denotes the probability of the character x_i occurring in the domain, and log_2 is the base-2 logarithm. This calculation enables the identification of patterns indicative of attempts to conceal malicious content or obfuscate the true nature of the URL.
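Equation (1) translates directly into code; a minimal sketch:

```python
import math
from collections import Counter

def domain_entropy(domain: str) -> float:
    """Shannon entropy of a domain string, per Eq. (1): H = -sum p(x_i) * log2 p(x_i)."""
    n = len(domain)
    counts = Counter(domain)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# a string of distinct characters has maximal entropy; a repeated character has zero
h_random = domain_entropy("x7q9z2k4")  # eight distinct characters -> 3.0 bits
h_flat = domain_entropy("aaaaaaaa")    # one repeated character -> 0.0 bits
```

High entropy alone does not prove maliciousness, but, as the text notes, it is a useful signal for randomized or obfuscated domains.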
Taking only the above-mentioned columns, the dataset is partitioned into training
and testing sets using a stratified splitting strategy. This ensures a representative distri-
bution of class labels in both sets, with 80% allocated for training and 20% for testing.
To mitigate the impact of varying scales among features, a standardization step is applied. The features in the training set (X_train) are scaled using scikit-learn's StandardScaler, and the same transformation is applied to the testing set (X_test) for consistency. The fitted scaler is serialized and stored using the joblib library. This serialized scaler, denoted 'modelName_scaler.joblib', is used to standardize future data in a consistent manner during model evaluation.
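The split, scaling, and serialization steps read as standard scikit-learn usage and might look roughly like this; synthetic data stands in for the extracted URL features:

```python
import joblib
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the URL feature matrix and labels
X = np.random.rand(100, 5)
y = np.array([0, 1] * 50)

# Stratified 80/20 split keeps the class ratio identical in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)  # fit on training data only
X_test = scaler.transform(X_test)        # reuse the same transformation

# Persist the fitted scaler so future data is standardized identically
joblib.dump(scaler, "modelName_scaler.joblib")
```

Fitting the scaler on the training split only, then reusing it for the test split and for serving, avoids leaking test-set statistics into the model.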
After doing the above pre-processing on the dataset, a series of experiments were
conducted employing various ML algorithms. The evaluation aimed to assess the sys-
tem’s ability to accurately identify and classify URLs as benign or malicious. Several
algorithms were implemented and evaluated, and their corresponding accuracies are
summarized in Table 1.
In addressing the detection of Server-Side Request Forgery (SSRF) with URLs,
a combination of LSTM and Bi-LSTM models was selected due to their sequen-
tial processing capabilities, making them effective at capturing intricate patterns in
URL sequences. These models are well-suited for learning dependencies over time,
which is crucial for detecting anomalies and suspicious patterns in URL requests.
Additionally, ensemble methods such as Random Forest, Decision Tree, XGBoost,
and AdaBoost were included to enhance the overall robustness and accuracy of the
detection system.
The rationale behind incorporating ensemble methods lies in their ability to lever-
age the strengths of multiple models, thus improving the system’s resilience against
various types of attacks and data variations. Specifically, XGBoost’s exceptional
accuracy of 98.55% made it a compelling choice for integration as an intrusion
detection system (IDS) within the network architecture.
The optimization of the XGBoost algorithm involved using Bayesian Search to
fine-tune hyperparameters like Learning Rate and Max Depth. The search space
for hyperparameters was defined based on prior knowledge and empirical testing,
ensuring a thorough exploration of potential settings. For instance, a log-uniform
distribution between 0.1 and 1.0 was used for the learning rate, covering a broad range
of values and validated through empirical testing to optimize the model effectively.
6 Results
The algorithm achieved Precision, Recall, and F1-score values of 0.977, 0.994, and 0.985, respectively, as indicated in Table 2. These metrics indicate high accuracy, which is a fundamental requirement for an IDS. The high recall indicates that there are few false negatives (malicious URLs predicted to be benign). Minimizing false negatives is essential to ensure that the web server, which is connected to the private network of the organization, is protected. Figure 4 illustrates the confusion matrix, providing deeper insight into the model's performance. Table 3 details the counts of true positives, false negatives, true negatives, and false positives extracted from the confusion matrix.
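As a quick sanity check, the reported F1-score is consistent with the reported precision and recall, since F1 is their harmonic mean:

```python
# Reported metrics from Table 2
precision, recall = 0.977, 0.994

# F1 is the harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)
# rounds to 0.985, matching the reported value
```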
Alongside the minimal occurrence of false positives (benign URLs predicted to be malicious), the suggested network architecture effectively manages these cases by redirecting such requests to an isolated server, ensuring that the required data is still fetched unless the request attempts to access internal private network addresses or localhost.
7 Conclusion
In this paper, an efficient network architecture is proposed that uses machine learning for detecting and preventing SSRF attacks while also managing web traffic through load balancing, depending on the probability of a URL being malicious. Numerous machine learning models were implemented, and their respective accuracies were systematically compared and analyzed. The model with the highest accuracy (98.55%) was selected and used to classify URLs as malicious or benign. To offer a more comprehensive solution to the problem, an IPS was proposed in which malicious URLs are redirected to a dedicated 'Helper' server, which is isolated from the database and the internal network to limit
its access to sensitive information. There lies a risk of the additional server being
overwhelmed by requests classified as malicious. To mitigate this risk, the imple-
mentation of rate limiting is a viable strategy. Rate limiting controls the frequency
of client requests over a specified interval. A notable trade-off is the possibility of
erroneously classifying benign URLs as malicious, which may result in legitimate
requests being unfairly restricted. However, the low number of false positives observed when testing the classification model suggests that the likelihood of benign requests being misclassified is low enough for the trade-off to be acceptable. Since an overwhelmed additional server would imply a sudden influx of malicious requests, establishing monitoring and alerting systems would also help ensure that proactive security measures can be taken.
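The rate limiting suggested above could take the form of a simple sliding-window limiter in front of the helper server; the limits and naming in this sketch are illustrative, not from the paper:

```python
import time
from collections import defaultdict, deque

class RateLimiter:
    """Allow at most max_requests per client within a sliding time window."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window = window_seconds
        self.hits = defaultdict(deque)  # client id -> recent request timestamps

    def allow(self, client: str, now=None) -> bool:
        now = time.monotonic() if now is None else now
        q = self.hits[client]
        # discard timestamps that have fallen outside the window
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) < self.max_requests:
            q.append(now)
            return True
        return False

limiter = RateLimiter(max_requests=2, window_seconds=1.0)
```

A third request from the same client inside the window is denied, while requests arriving after the window has slid past earlier timestamps are allowed again.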
While our study has demonstrated impressive accuracies in detecting malicious
URLs, it’s essential to acknowledge that the dataset, although valuable, may have
certain limitations. One such aspect is the dataset’s potential lack of diversity, which
could contribute to the high accuracies observed. This lack of diversity, while not
undermining the dataset’s significance, is a common challenge in the field of machine
learning-based threat detection. It’s worth noting that real-world scenarios often
present a wider range of malicious URL variations and obfuscation techniques that
may not be fully represented in the dataset. Our study highlights the effectiveness
of our approach within the context of the dataset provided, and future work could
explore strategies to enhance model robustness in more diverse and dynamic threat
landscapes.
References
1. Hoffman A (2020) Web application security: exploitation and countermeasures for modern
web applications. O’Reilly, USA
2. Khan S, Kabanov I, Hua Y, Madnick S (2022) A systematic analysis of the capital one data
Breach: critical lessons learned. ACM Trans Privacy Secur 26(1):29. [Link]
3546068
3. PortSwigger (2023) Server side request forgery. [Link]
Accessed 25 Nov 2023
4. HackerOne (2023) SSRF in [Link] via ?url= parameter. [Link]
514224. Accessed 12 May 2023
5. HackerOne (2023) SSRF in exchange leads to ROOT access in all instances. [Link]
com/reports/341876. Accessed 18 May 2023
6. HackerOne (2023) SSRF on (blank) allowing internal server data access. [Link]
com/reports/326040. Accessed 23 May 2023
7. HackerOne (2023) Bypass of the SSRF protection in Event Subscriptions parameter. https://
[Link]/reports/386292. Accessed 23 May 2023
8. HackerOne (2023) SSRF in upload IMG through URL. [Link]
Accessed 1 June 2023
9. HackerOne (2023) Server-side request forgery using Javascript allows exfill data from Google
metadata. [Link] Accessed 2 Jun 2023
10. HackerOne (2023) SSRF in webhooks leads to AWS private keys disclosure. [Link]
com/reports/508459. Accessed 2 June 2023
11. Tsai O (2023) How I chained 4 vulnerabilities on GitHub enterprise, from SSRF execution chain
to RCE! [Link] Accessed 3
June 2023
12. 10degres (2023) AWS takeover through SSRF in JavaScript. [Link]
ssrf-javascript/. Accessed 4 June 2023
13. Opnsec (2023) Into the Borg – SSRF inside Google production network. [Link]
2018/07/into-the-borg-ssrf-inside-google-production-network/. Accessed 4 June 2023
1 Mitigating SSRF Threats: Integrating ML … 15
14. Shorebreak Security (2023) SURF’s up! Real world server-side request forgery
(SSRF). [Link]
forgery-ssrf/. Accessed 5 June 2023
15. Detecting server side request forgery (SSRF) attack by using deep learning. IJACSA: Int J Adv
Comput Sci Appl 12(12). [Link]
Server_Side_Request_Forgery.pdf
16. Jabiyev B, Mirzaei O, Kharraz A, Kirda E (2021) Preventing server-side request forgery attacks.
In: Proceedings of the 36th annual ACM symposium on applied computing (SAC ’21), associ-
ation for computing machinery, New York, NY, USA, pp 1626–1635. [Link]
3412841.3442036
17. Mamun M, Rathore M, Habibi Lashkari A, Stakhanova N, Ghorbani A (2016) Detecting mali-
cious URLs using lexical analysis, pp 467–482. [Link]
1_30
18. Vanin P, Newe T, Dhirani LL, O’Connell E, O’Shea D, Lee B, Rao M (2022) A study of
network intrusion detection systems using artificial intelligence/machine learning. Appl Sci
12(22):11752. [Link]
19. Späth C, Mainka C, Mladenov V, Schwenk J (2016) SoK: XML parser vulnerabilities.
In: Proceedings of the 10th USENIX conference on offensive technologies (WOOT’16).
USENIX Association, USA, pp 141–154. [Link]
woot16/[Link]
20. Ali Z, Abduljabbar Z, Tahir H, Sallow A, Almufti S (2023) Exploring the power of eXtreme
gradient boosting algorithm in machine learning: a review. Acad J Nawroz Univ 12:320–
334. [Link] [Link]
ajnu/article/view/1612
21. GitHub repository: ssrf-prevention-main. [Link]
Chapter 2
Multi-directional Molecular Communication Among Nanomachines: Modeling and Verification
1 Introduction
Nanomachines are the core operational units of nanosystems. They perform specific
functions like communication, calculation, data preservation, detection, and action at
the atomic level [1, 2]. Nanomachines working together form nanonetworks, which
enhance their individual capabilities. These networks allow nanomachines to collab-
orate on complex tasks, expanding their potential applications [3, 4]. Molecular com-
munication is a promising method for enabling nanomachines to interact. Inspired by
biological systems that use molecules to communicate, this approach involves send-
ing information-carrying molecules between nanomachines. These molecules trigger
biochemical reactions at their destination, allowing for effective communication at
the nanoscale [2, 5, 6]. Understanding how molecules move through a medium is a
key challenge in molecular communication [1, 7]. In diffusion-based molecular com-
munication, the molecular channel is the pathway between nanomachines [8–10].
Calcium signaling is a molecular communication method that employs calcium ions
as information carriers. These ions are released by a sender nanomachine and act as
messengers to convey information to one or multiple recipient nanomachines [11].
Calcium signaling is a biological process where cells communicate directly with
each other. This occurs through gap junctions in cell membranes, which permit the
transfer of molecules, including calcium ions, between adjacent cells [1].
In [12], the central objective is to establish a model for a time-slotted communica-
tion system among nanoscale machines within a one-dimensional environment. This
system incorporates bio-inspired rules assessed during each interval. To validate the
A. J. Jani (B)
Al-Nahrain University, Kadhmiya, Baghdad, Iraq
e-mail: athraajuhi@[Link]
J. J. Jani
Ministry of Labour and Social Affairs, Baghdad, Iraq
e-mail: jafaraljani@[Link]
proposed model, diverse network sizes were examined using the probabilistic model
checking tool PRISM.
The authors in [13] propose a channel model for molecular single-input multiple-output (SIMO) systems to estimate their channel response, where the model, characterized by its recursive nature, provides an analytically derived closed-form solution for the channel response of molecular 2-Rx SIMO systems. Additionally, a simplified model with lower complexity has been presented, offering a trade-off between computational efficiency and slightly reduced accuracy in channel estimation. These
models have been extended to encompass molecular SIMO systems with more than
two receivers. The performance of these methods has been evaluated across various
topologies with different parameters, and the model’s accuracy has been verified
by comparing it to computer-simulated channel estimations, employing quantitative
error metrics such as root-mean-squared error. The effectiveness of the simplified
model is affirmed by assessing the level of deviation, indicating satisfactory channel
modeling performance with reduced computational requirements.
The authors in [14] explore enhancing interaction with biological systems using
nano-scale devices that communicate through molecular signals. Micro-scale inter-
mediaries facilitate communication, with specific molecules representing informa-
tion bits. Diffusion-related interference is mitigated through statistical methods.
The authors present a transmission model and numerical simulations, evaluating
the impact of transmitter scheduling and signal strength on the bit error rate.
In [15], the authors explore molecular communication by envisioning nanonetworks
facilitating collective communication on attributes like scent, taste, light, or
chemical states. The focus has been on modeling different communication channels
within nanonetworks—molecular multiple-access, broadcast, and relay channels—
with calculated capacity expressions. Numerical results suggested that, with optimal
molecular communication parameters, multiple nanomachines can efficiently access
a single nanomachine. Molecular broadcast enabled one nanomachine to commu-
nicate with several others. The integration of molecular multiple-access and broad-
cast channels has highlighted the potential of molecular relay channels to enhance
communication capacity between two nanomachines through a relay nanomachine.
In [16], the primary focus lies in developing a capacity expression for a single
molecular channel, involving communication between a Transmitter Nanomachine
(TN) and a single Receiver Nanomachine (RN). Additionally, the authors explore
the capacity of a molecular multiple-access channel, where multiple TNs commu-
nicate with a single RN. Numerical findings have indicated that both single and
multiple-access molecular channels have the potential to achieve high molecular
communication capacities.
This paper introduces a novel multi-directional molecular communication model
for nanomachines, inspired by natural systems. The model allows nanomachines in
a network to exchange information by releasing and detecting molecular concentra-
tions. Each nanomachine operates on a synchronized schedule, taking turns to send
and receive messages. The proposed model improves reliability by considering vari-
ous transmission methods, including relay nodes, and accounting for potential com-
munication failures. Inspired by interconnected biological systems and incorporating
2 Multi-directional Molecular Communication Among Nanomachines … 19
aspects of calcium signaling and the Abelian Sandpile Model, the communication
channel is designed to simulate the complex interactions between nanomachines [17].
The paper employs the PRISM model checker to simulate and analyze the proposed
multi-directional molecular communication model across different network sizes.
By evaluating transmission and reception success rates under various conditions, the
study provides valuable insights into the strengths and weaknesses of this communi-
cation method in nanoscale environments. The findings contribute to understanding
the reliability and efficiency of molecular communication in complex networks.
2 Proposed Model
In our setup, there are n = 4 nanomachines denoted S_i, where i belongs to the
set {1, 2, . . . , n}. Each nanomachine has two functions: it can either send information
(transmission mode) or receive it (reception mode). These nanomachines communicate
with each other through a network of connected spots, or nodes, as shown in Fig. 1.
Figure 1 shows a representation of the proposed model of a multi-directional
nanonetwork consisting of four nanomachines (S_1, S_2, S_3, and S_4) connected
through channel nodes. The channel nodes are represented by squares and labeled
Ch_{xy}Z, where xy indicates the connected nanomachines and Z indicates the
node's position within the channel. For example, the node labeled Ch_{24}2 is
connected to nanomachines S_2 and S_4, and it is the second node in the channel
between them. Thus, to give another example, S_1 and S_2 are connected by a channel
containing the nodes Ch_{12}1, Ch_{12}2, and Ch_{12}3.
Then

    Ch_{xy}(Z + 1)^(τ) = Ch_{xy}(Z + 1)^(τ−1) + e_Z    (1)
Algorithm 1
repeat
  for each nanomachine S_i do
    if state flag is R then
      Receive α molecular concentration
      Update τ_local += τ
    else if state flag is T then
      if τ_local < log(q/L) then
        if random number < p then
          if C_{xi,1} ≥ q then
            Send q molecules to the connected channel nodes Ch_{xi}Z:
              C_{xi,Z} += q (with a given failure probability)
            Update C_{xi,1} −= q
          else
            Not enough concentration; remain in transmission mode
            Update τ_local += τ
          end if
        else
          Remain in transmission mode
          Update τ_local += τ
        end if
      else
        Switch state flag to R
        Reset τ_local = 0
      end if
    end if
  end for
  for each channel node Ch_{xy}Z do
    if C_{xy,Z} > L then
      Calculate excess molecules: e_Z = C_{xy,Z} − L
      Send e_Z to the neighboring node (if possible): C_{xy,Z+1} += e_Z
        (only if the buffer allows and E is not exceeded)
      Update current concentration: C_{xy,Z} = L
    end if
  end for
until the termination condition is met
The algorithm progresses through each time slot, where a time slot represents
a discrete time interval during which communication operations take place. Con-
currently, it updates the local clock of each nanomachine to ensure synchronization
across the network. Utilizing this synchronized clock, each nanomachine evaluates
whether it should engage in transmission or reception mode during the current time
slot. When a nanomachine operates in transmission mode, it computes the transmis-
sion probability ( p), determining its transmission action based on this probability.
If transmission occurs, the nanomachine computes the quantity of molecular con-
centration (q) to be transmitted. Should this transmitted concentration surpass the
predefined threshold (L), surplus molecules are distributed to neighboring nodes
within the channel. When a nanomachine operates in reception mode, it accepts
molecular concentrations from neighboring nodes within the channel and adjusts
its local molecular concentration accordingly. Special cases, such as transmitting to
both sides of a nanomachine, are considered within the algorithm, alongside checks
for transmission and reception failures. The algorithm iterates through the aforemen-
tioned steps for each time slot until either the communication process concludes or a
termination condition is satisfied. Hence, the total time complexity of the algorithm
is contingent upon both the quantity of time slots and the count of nanomachines.
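As an illustration, the per-slot sending and toppling rules described above can be sketched in Python. This is our own simplification of Algorithm 1, not the authors' implementation: the parameter values, a single channel, and the omission of reception mode and τ_local are all assumptions for clarity.

```python
import random

# Illustrative parameters; the concrete values are assumptions, not the paper's.
p, q, L, E = 0.25, 3, 5, 15   # send probability, dose q, node threshold L, buffer cap E

def topple(nodes):
    """Sandpile-style relaxation of one channel: a node holding more than L
    passes its excess e_Z to the next node if that node stays within E."""
    for z in range(len(nodes) - 1):
        if nodes[z] > L:
            e = nodes[z] - L              # excess molecules e_Z
            if nodes[z + 1] + e <= E:     # only if the buffer allows
                nodes[z + 1] += e         # next node receives e_Z, cf. Eq. (1)
                nodes[z] = L              # clamp the current node to L
    return nodes

def send(machine_conc, nodes):
    """One transmission attempt: with probability p, push q molecules into
    the first channel node, provided the sender holds at least q."""
    if random.random() < p and machine_conc >= q:
        nodes[0] += q
        machine_conc -= q
    return machine_conc, topple(nodes)

random.seed(1)
conc, chan = 8, [L, L, L]     # channel nodes start at L, as in the experiments
for _ in range(4):            # four time slots
    conc, chan = send(conc, chan)
print(conc, chan)
```

Note how the overflow rule makes excess molecules cascade down the channel in a single pass, which is the sandpile-like behavior the model borrows.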
We used the PRISM verification tool to thoroughly assess and compare the outcomes
of our model. This advanced tool not only enhances our model’s capabilities but also
helps us effectively evaluate results, especially in complex scenarios.
PRISM is a flexible tool tailored for handling probabilistic models in real-world
situations. It allows for defining probabilities within the model itself and in the prop-
erties under examination. Additionally, the software can determine the probability
of failure for specific properties once the verification process is complete [22]. This
tool operates by taking a system description as input, typically written in PRISM’s
system description language. This language is an extension of Reactive Modules,
tailored for probabilistic scenarios. Subsequently, the tool constructs a model based
on this description and generates a set of reachable states [23]. PRISM offers a
noteworthy feature: step-by-step simulation. This functionality allows users to select
system variables for manipulation and define their initial values. During simulation,
users have the option to manually guide the process by organizing or randomizing
steps. When opting for randomization, users can specify the number of random steps
the program should simulate [22]. The model was conceptualized as a discrete-time
Markov Chain (DTMC) and subjected to examination using the probabilistic model
checker PRISM. A DTMC module functions as a transition system in which each
transition is associated with a probability, guaranteeing that the cumulative proba-
bility of all outgoing transitions from a particular state equals one. This arrangement
creates a probability space encompassing infinite paths within the model, enabling
the quantitative assessment of the likelihood of specific events transpiring [23].
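The DTMC semantics can be made concrete with a toy sketch; the two-state chain and its transition probabilities below are illustrative assumptions, not the paper's actual PRISM model.

```python
import random

# A toy two-state DTMC over nanomachine modes (probabilities are illustrative).
dtmc = {
    "T": {"T": 0.75, "R": 0.25},   # keep transmitting, or switch to reception
    "R": {"T": 0.5, "R": 0.5},
}

# Well-formedness: the outgoing probabilities of every state must sum to one.
for state, out in dtmc.items():
    assert abs(sum(out.values()) - 1.0) < 1e-9

def simulate(start, steps, rng):
    """Sample one path through the chain, in the style of PRISM's simulator."""
    path, state = [start], start
    for _ in range(steps):
        r, acc = rng.random(), 0.0
        for nxt, prob in dtmc[state].items():
            acc += prob
            if r < acc:
                state = nxt
                break
        path.append(state)
    return path

print(simulate("T", 5, random.Random(0)))
```

PRISM itself goes further than such sampling: it builds the full reachable state space and computes event probabilities exactly over all infinite paths.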
Results Verification: The PRISM model was created in accordance with the guide-
lines outlined in Algorithm 1. As per the algorithm, during each time interval τlocal ,
a nanomachine has the opportunity to transmit a molecular concentration q, with a
probability of p = 1/4. In transmission mode, a nanomachine can send this concentration
to a neighbor on one side or to neighbors (i.e., channel nodes) on both sides by
transmitting q twice within the same τlocal . However, in reception mode, a nanoma-
chine cannot receive molecular concentrations sent from neighbors on both sides,
as this could lead to communication jamming, resulting in transmission or reception
failure.
Two experiments have been implemented: one on a model with a channel of two
nodes Ch_{xy}Z between each pair of nanomachines S_i, and one on a model with a
channel of five nodes Ch_{xy}Z between each pair. The initial value of the molecule
concentration in all channel nodes is L. In both experiments the thresholds were
E = 15 and L = 5; the initial molecular concentrations in the four nanomachines
were 8, 6, 7, and 9; and q = 3.
Four properties were subjected to verification, focusing on the success and failure
of both the transmission and reception processes. Flags are employed to indicate
the mode of the nanomachines, with Si_send = true signifying transmission mode and
Si_receive = true indicating reception mode.
The first property, called Send-Success, is satisfied under the following condi-
tions:
• One nanomachine is in transmission mode while all other nanomachines are in
reception mode.
• The transmitting nanomachine sends q molecular concentration to its neighbor.
Failure to meet any of these conditions satisfies the Send-Fail property, which is
the second property verified.
Similarly, the property Receive-Success holds true under the following circum-
stances:
• One nanomachine is in reception mode while all other nanomachines are in
transmission mode.
• The receiving nanomachine successfully receives q molecular concentration from
its neighbor.
Failure to meet any of these conditions satisfies the Receive-Fail property.
Hence, the properties subject to verification are Send-Success (3), Send-Fail (4),
Receive-Success (5), and Receive-Fail (6).
The figures presented below display the results of the property verification:
Figure 2 displays the outcomes of verifying two properties: (3) and (4). In this
figure, the dotted line signifies the probability of failure in sending (i.e., the result of
verifying Property (4)) in two experiments conducted on networks of different sizes.
Specifically, the square symbol represents the experiment with a network featuring a
two-node channel, while the circle symbol represents the experiment with a network
featuring a five-node channel. Conversely, the solid line in Fig. 2 illustrates the
probability of success in sending (i.e., the result of verifying Property (3)) in the two
experiments conducted on networks of different sizes.
Figure 3 illustrates the outcomes of verifying the success and failure properties
in receiving. In this figure, the dotted line indicates the probability of failure in
receiving (i.e., the result of verifying Property (6)) in two experiments conducted on
networks of different sizes. Specifically, the square symbol represents the experiment
with a two-node channel network, while the circle symbol represents the experiment
with a five-node channel network. Conversely, the solid line in Fig. 3 depicts the
probability of success in receiving (i.e., the result of verifying Property (5)) in the
two experiments conducted on networks of different sizes.
4 Conclusion
References
1 Introduction
30 K. S. Shivesh et al.
in these health records. Existing access control mechanisms helped strengthen the
shared access of EHRs through the use of blockchain technology. This approach
helped improve data confidentiality and security [1]. The introduction of Role Based
Access Control aims to enhance the existing mechanism through the use of various
roles such as doctors, nurses, lab technicians, and patients. By storing the EHR in an
InterPlanetary File System (IPFS), we ensure data integrity and efficient data acces-
sibility. This paper aims to introduce a new paradigm in EHR access management,
enabled by the immutability of blockchain networks and by the secure storage of the
IPFS.
Role Based Access Control (RBAC) is a proven access control technique that provides
a simple and efficient way to manage who has access to what within an organization’s
digital ecosystem [2]. It specifies the roles carried out by users, such as doctors,
nurses, lab technicians, and patients, and then specifies the actions they may perform
or the data they may access according to their roles. This ensures that the right
people have the right level of access to the information. It also safeguards against
data breaches, unauthorized access, and misuse of sensitive information.
This paper provides a comprehensive analysis of the performance and resource
utilization aspects of an RBAC system. Specifically, a meticulously designed script is
used to benchmark the time to perform operations for granting access and verifying
access, as well as CPU and memory usage while performing those pivotal tasks in an
RBAC system. The script compares two types of RBAC models, Flat and Hierarchy,
using a Python library. Additionally, the graphical representations demonstrate the
comparison of how the two RBAC models perform in the system. The major portion
of the graphical presentation shows that the Hierarchy model is superior to the Flat
model in terms of access time and resource consumption. The following sections
elaborate on the results of the two RBAC models.
2 Related Works
In their discussion of the security concerns with moving electronic health records
to cloud storage, Zhou et al. [2] emphasized the significance of limiting unwanted
access. To enforce policies, they suggested cryptographic access control systems and
role-based encryption mechanisms.
Sookhak et al. [1] proposed the use of blockchain-related concepts to provide
strong access control, exploring the impact of technologies like eHealth on
secure EHR management. Their work highlighted the important security requirements
in designing fault-tolerant access control.
3 Secure Sharing of Electronic Health Records Using Smart Contracts … 31
Saini et al. [3] addressed issues with centralization of health care with scattered
electronic medical records (EMRs), proposing a blockchain-based access control
framework. Smart contracts are employed to secure EMR sharing among different
entities in the smart healthcare system. Various types of smart contracts were used
for user access authorization, monitoring activities, and access revocation. Hashes
are kept in the blockchain, and patient digital health records are saved in the cloud in
an encrypted form. A private Ethereum system was used to evaluate the effectiveness
of the system.
Guo et al. [4] and Yang et al. [5] proposed the combination of blockchain and edge
nodes for controlling access to Electronic Health Record (EHR) data. Off-chain edge
nodes hold medical data while blockchain manages access to records. Identification
and access control policies are managed by a blockchain-based controller. Their anal-
ysis, which is centered on preventing unwanted data extraction, uses the Hyperledger
Composer Fabric blockchain to measure transaction processing and response time
while analyzing the effectiveness of smart contracts and policy implementation.
Ekblaw et al. [6] introduced MedRec, a blockchain-based EHR management
system facilitating secure access, authentication, confidentiality, and data sharing
for patients; Linn et al. [7] proposed a blockchain-based access control manager
for medical records, which supports precision medicine and helps with industry
interoperability. MedRec encourages stakeholders to act as "miners", promotes data
economics, and interacts with current solutions. These developments could improve
patient health accountability and research.
The attitude of patients toward blockchain-enabled health information exchange
(HIE) was examined by Esmaeilzadeh et al. [8], addressing the lack of knowledge about
how patients feel about this technology. A total of 2013 respondents participated in
sixteen web-based tests that examined various HIE scenarios. Remarkably, patients
show a propensity for blockchain-based privacy-preserving and information-sharing
systems. The report emphasizes how blockchain technology in health information
exchange (HIE) can be used to protect and commercialize health data interchange,
while also outlining its drawbacks.
Dagher et al. [9] discuss how efforts to improve the security of electronic health
records have not stopped recurrent data breaches involving patient information.
Their work showcases the Ancile blockchain-based platform, which aims to preserve patient
data privacy while offering safe and conveniently accessible medical information
for patients, providers, and other stakeholders. Ancile uses advanced cryptography
and Ethereum-based smart contracts to achieve improved access control and data
obfuscation. The purpose of the article is to examine how Ancile responds to the
demands of different stakeholders.
Mettler [10] emphasized how blockchain technology is becoming more and more
flexible in a variety of industries, including health care. Although its use has primarily
been in the financial sector, it is expanding to other industries, like health care.
The paper examines a number of blockchain-related applications in health care,
including managing public health, user-centered medical research, and combating
pharmaceutical counterfeiting.
As illustrated in Fig. 1, all the actors interact with the blockchain in two ways. The
metadata about the medical records, data about access control, and actor details are
stored in the core blockchain. The actual medical records are stored in the
InterPlanetary File System (IPFS), a peer-to-peer distributed file system designed
especially for content-addressed file storage. Actors interact with the core
blockchain and IPFS. Patients can upload
their medical record to the IPFS. The access rights of the files can be modified by
either granting or revoking access to external entities. The access rights are stored
in the core blockchain. Doctors can request access to a particular patient’s medical
record. When granted access, they can view records or update them according to
the access permission granted. Administrators can assign patients to doctors. Other
entities include lab technicians, nurses, etc., who can only view the records
and cannot write.
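The split between on-chain metadata and off-chain IPFS storage can be mimicked in a few lines; MockIPFS and the SHA-256 stand-in for a content identifier (CID) are simplifying assumptions, not the real IPFS API.

```python
import hashlib
import json

class MockIPFS:
    """Toy content-addressed store standing in for IPFS."""
    def __init__(self):
        self.blocks = {}

    def add(self, data: bytes) -> str:
        cid = hashlib.sha256(data).hexdigest()   # stand-in for a real IPFS CID
        self.blocks[cid] = data
        return cid

    def cat(self, cid: str) -> bytes:
        return self.blocks[cid]

chain_metadata = {}   # stand-in for the core blockchain: patient -> record CID
ipfs = MockIPFS()

# The actual record goes to IPFS; only its content hash goes on-chain.
record = json.dumps({"patient": "Jane Doe", "note": "routine checkup"}).encode()
chain_metadata["Jane Doe"] = ipfs.add(record)

assert ipfs.cat(chain_metadata["Jane Doe"]) == record
```

Because the on-chain entry is a content hash, any tampering with the stored record changes its address and is immediately detectable, which is the integrity property the architecture relies on.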
The user interaction and functionality of the system contain actions that ensure
secure sharing of EHRs. The login process is the initial step, requiring authentication
to guarantee the legitimacy of user access. This login action can be further refined to
provide feedback in case of invalid usernames or passwords, improving the interface.
View Medical Records, Share Medical Records, and Record Tracking and Revocation
are some of the actions found in the dashboard page, the page where the user is
navigated to after logging in.
Users have to undergo a verification process based on Role-Based Access Control
(RBAC) to access medical data. This mechanism ensures that users can only view
records for which they are authorized. The system displays the respective medical
records after authorizing the users.
As shown in Fig. 2, users can share their own medical records, or the ones that they
are authorized to share, to other users. Once the user shares to recipients, the system
generates access control rules. Then the selected recipients are notified. The access
control rules are checked when a user shares the medical records. Record Tracking
and Revocation functionality allows the users to monitor and track who has access
to their medical records. This transparency improves user awareness and helps them
control their sensitive information. Users can also revoke access granted to specific
individuals whenever necessary. This revocation feature helps in maintaining the
privacy and security of medical records. This allows users to adapt access permissions
in response to changing circumstances or preferences.
4 System Implementation
The RBAC module offers strong features that help in developing a safer system
for controlling access requests, permissions, and user roles in Python. A concrete
example is the creation of a specific domain, represented by the "MedicalRecord"
class, which is designed precisely to deal with medical records within an EHR system.
Roles define user rights, with roles such as "can_create", "can_read", and
"can_update" created to define rights with some specificity; actions such as create,
read, and update are tied to corresponding permissions and duties. The library uses
the "add_permission" method to define the correspondence between roles and
permissions, so that rather than a single broad set of rights, users gain a detailed
set of fine-grained permissions. For instance, the role "can_create" is assigned the
"create" permission, and therefore any user assigned this role will be able to start
the process of creating a medical record.
Healthcare consumers make up the largest population of the system's users, and
hence they are represented by the "User" class. Moreover, users are assigned
particular authorities and privileges, creating broad opportunities to control
their actions step by step. The "Patient" class, on the other hand, represents the
actual end users of this application and is given absolute decision-making power
over whether or not to permit doctors to access personal data. The
"request_permission" function is used to help doctors apply for permission, and a
patient who receives such an application has every right to accept or reject it
after evaluation. This shifts the decision-making structure toward the end user,
the patient in this case, facilitating a patient-centric model of care. Enforcement
is done by the "[Link]" method, which controls user permissions and keeps users
from accessing information they are not allowed to see. When a user has no
authorization for particular data, the system raises an exception informing the
user that he/she is unauthorized to access that data.
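Since the text abbreviates the library's exact API, the following is a minimal hand-rolled sketch of the same patient-centric RBAC idea; the class and method names here are illustrative assumptions, not the library's real interface.

```python
class Role:
    """A named role holding a set of fine-grained permissions."""
    def __init__(self, name):
        self.name, self.permissions = name, set()

    def add_permission(self, permission):
        self.permissions.add(permission)

class User:
    def __init__(self, name):
        self.name, self.roles = name, []

    def check_access(self, permission):
        """Raise if no assigned role grants the requested permission."""
        if not any(permission in r.permissions for r in self.roles):
            raise PermissionError(f"{self.name} is unauthorized for '{permission}'")

class Patient(User):
    def grant(self, doctor, role):
        """Patient-centric control: the patient decides who receives a role."""
        doctor.roles.append(role)

# A role maps to a fine-grained permission, e.g. "can_create" -> "create".
can_create = Role("can_create")
can_create.add_permission("create")

alice, dr_bob = Patient("alice"), User("dr_bob")
alice.grant(dr_bob, can_create)
dr_bob.check_access("create")        # authorized: no exception is raised
try:
    dr_bob.check_access("update")    # never granted: raises PermissionError
except PermissionError as err:
    print(err)
```

The exception on an unauthorized check mirrors the behavior described above: access is denied by default unless a patient has explicitly granted a role that carries the permission.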
A large-scale RBAC system is established to manage the interactions between
patients and physicians and to support these large-scale engagements. It exemplifies the
system’s extensibility and flexibility, and it presents doctors with the capability to get
permission for tasks that include viewing or altering medical information. Because
of this RBAC approach of tracking and managing user access, the RBAC library
Smart contracts are tested for their effectiveness and security. This can be done in a
local blockchain environment, with software such as Ganache used to mimic blockchain
behavior, or on a testnet, a test network that replicates real mainnet conditions but
carries hypothetical data. Proper testing has to be conducted in order to discover
potential bugs and to ascertain that the smart contract follows the right processes.
The smart contract in the proposed system's architecture contains a data structure
called "Record", which incorporates the patient's details, namely the patient's name
and related information. The contract uses two mappings, "records" and
"authorizedDoctors": the "records" mapping maps a patient's Ethereum address to their
records, and the "authorizedDoctors" mapping preserves the Ethereum addresses
authorized for these records.
Access control within the smart contract is implemented through two distinct
modifiers, labeled "onlyOwner" and "onlyDoctor". The "onlyOwner" modifier restricts
some functions so that they can only be called by the contract owner who developed
and deployed the contract. The "onlyDoctor" modifier guarantees that some functions
can only be performed by Ethereum addresses that have been approved as doctors,
preventing unauthorized modification of patients' records. Contract owners are able to add
or remove authorized doctors. The authorized doctors can create new patient records,
modify the records, and retrieve the records that are related to the Ethereum address
of the patient. This has been made possible through the blockchain-based computation
system, which guarantees security and exclusive access to the patient's data while
bringing transparency to the management of healthcare records.
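The contract itself would be written in Solidity; as a sketch of the logic only, its two mappings and the onlyOwner/onlyDoctor guards can be mimicked in Python. The structure below is an assumption based on the description above, not the actual contract code.

```python
class RecordContract:
    """Python mock of the contract: 'records' maps a patient address to a
    record; 'authorized_doctors' holds approved doctor addresses."""

    def __init__(self, owner):
        self.owner = owner
        self.records = {}              # patient address -> record data
        self.authorized_doctors = set()

    def _only_owner(self, sender):     # mimics the onlyOwner modifier
        if sender != self.owner:
            raise PermissionError("onlyOwner: caller is not the contract owner")

    def _only_doctor(self, sender):    # mimics the onlyDoctor modifier
        if sender not in self.authorized_doctors:
            raise PermissionError("onlyDoctor: caller is not an approved doctor")

    def add_doctor(self, sender, doctor):
        self._only_owner(sender)
        self.authorized_doctors.add(doctor)

    def create_record(self, sender, patient, name):
        self._only_doctor(sender)
        self.records[patient] = {"name": name}

    def get_record(self, sender, patient):
        self._only_doctor(sender)
        return self.records[patient]

contract = RecordContract(owner="0xOwner")
contract.add_doctor("0xOwner", "0xDoc")            # only the owner may do this
contract.create_record("0xDoc", "0xPatient", "Jane Doe")
print(contract.get_record("0xDoc", "0xPatient"))
```

In the real contract the guard checks run before the function body, exactly as Solidity modifiers do; the mock reproduces that by calling the guard as the first statement of each function.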
5 Evaluation Metrics
The proposed system utilizes a script to measure how long access grant and access
verification operations take and to measure resource utilization (specifically, CPU
and memory) during these operations in a Role-Based Access Control (RBAC)
system. The test program uses an rbac object to represent a hypothetical RBAC
system which has methods for creating users and authorizing different actions. The
objective of the first test is to find the time taken to grant access to a user. The
function, “measure_access_grant_time” initializes a timer and simulates a system of
creating 10,000 users and authorizing various actions among them like create, read,
and update. The total time we get after this simulation is then converted to millisec-
onds. This is labeled as “Access Grant Time”. For the next test, our objective is to
calculate the time taken to verify access for a user. The function “measure_access_
verification_time” catches all the authorization errors using the [Link] method. The
time taken to perform all the access verification operations is labeled as “Access
Verification Time”.
The final part of the script is to measure the total amount of CPU and memory
used during 1,000,000 access grant operations. The “measure_resource_utilization”
method retrieves the amount of CPU and memory used, via the psutil library. The
change in CPU usage is calculated and displayed as “CPU Usage Change” while the
change in memory usage is displayed as “Memory Usage Change”. The following
test is to compare the performance and usage of resources during access control
between the flat and the hierarchical models of the RBAC system.
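The timing part of such a benchmark can be sketched as follows. FlatRBAC is a toy stand-in for the rbac object, the measured values will differ by machine, and resource sampling via psutil is omitted to keep the sketch self-contained.

```python
import time

class FlatRBAC:
    """Toy stand-in for the script's rbac object."""
    def __init__(self):
        self.table = {}

    def grant(self, user, actions):
        self.table[user] = set(actions)

    def check(self, user, action):
        if action not in self.table.get(user, ()):
            raise PermissionError(f"{user} may not {action}")

def measure_access_grant_time(rbac, n_users=10_000):
    """Time n_users grant operations and return the elapsed milliseconds."""
    start = time.perf_counter()
    for i in range(n_users):
        rbac.grant(f"user{i}", ["create", "read", "update"])
    return (time.perf_counter() - start) * 1000.0

ms = measure_access_grant_time(FlatRBAC())
print(f"Access Grant Time: {ms:.2f} ms")
```

Access verification would be timed the same way, wrapping check calls and catching the authorization errors, and averaging over repeated runs as the evaluation below does.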
By utilizing a Python script, we simulated and compared the two RBAC models:
Flat and Hierarchy. The following evaluation metrics were obtained by averaging the
values from 10 simulations of access grant and verification along with resource
utilization. Significant differences were noted in the performance metrics of the
two models during simulation. The Access Grant Time and the Access Verification
Time for the Flat model are 25.56 ms and 11.36 ms, respectively. The change in CPU
usage is 53.90%, while the change in memory usage is 1.74%. For the Hierarchy model,
the Access Grant Time averaged 16.896 ms, whereas the Access Verification Time took
nearly 8.43 ms. The change in CPU usage averages 46.83% and the change in memory
usage averages 1.73%. These results clearly show the efficiency of the Hierarchical
model over the Flat model: the performance results give it a considerable edge in
terms of access times and resource utilization. Overall, the Hierarchy model gives
faster access times and consumes fewer resources.
The graphical comparison of the flat and hierarchical RBAC models is illustrated
below, where the blue line represents the flat model and the orange line the
Hierarchical RBAC model. The number of iterations is denoted by the x-axis, while
the y-axis represents the various performance evaluation metrics.
From the graph in Fig. 3, we can see how efficiently and quickly the Hierarchical
model grants access to the requested roles. The hierarchical model is faster than
the flat model at both low and high numbers of iterations, granting access in less
time; it is, on average, faster by 4 ms. The same graph in Fig. 3 also shows that
the Hierarchical model is slightly faster than the Flat model in access verification
time. From 0 to 3000 iterations and beyond 9000 iterations, the hierarchical model
is faster than the flat model by approximately 1 ms; between 3000 and 9000
iterations, the hierarchical model only slightly outperforms the flat model.
From the graph in Fig. 4, it can be observed that the Hierarchical model consumes
less CPU when executing the access grant and verification processes. For iteration
counts in the range of 0 to 2000, the flat model consumes less CPU than the
hierarchical model. However, once the number of iterations crosses that limit, the
CPU utilized by the hierarchical model is about 30% less than that of
the flat model. This indicates that the hierarchical model is suitable for use cases that
have a large number of users.
From the graph in Fig. 4, it can also be seen that the hierarchical model performs
slightly better than the flat model in terms of the memory utilized during the
access verification and grant processes. As the number of iterations increases,
there are some peaks and troughs in the graphs, but the average memory consumption
is lower for the hierarchical model than for the flat model.
Based on all the above aspects, the hierarchical model performs better than the flat
model, and therefore, it can be integrated with the proposed blockchain-based EHR
management system.
6 Conclusion
Blockchain technology can assure that EHRs are tamper-free, and it can provide
a trustworthy audit trail of any action performed within the application. This fault-resistant
auditability is crucial for compliance, accountability, and traceability in the health-
care ecosystem and adds trust to a blockchain-based medical records management
system. Hierarchical Role-Based Access Control plays an important role in main-
taining security and privacy. This enables fine-grained control over access permis-
sions, allowing patients to provide custom access rights according to the roles and
responsibilities of people within the system. Using access control rules, the system
ensures that only authorized users gain access to medical records. This reduces the
risk of unauthorized access and maintains confidentiality of healthcare data. Hier-
archical Role Based Access Control proves to have better performance than the
Flat Role Based Access Control in terms of time, processing power, and memory
utilization. The powerful combination of RBAC, blockchain, and IPFS in this system
establishes a new standard for healthcare data management. This provides a robust
foundation for advancing patient care, medical research, and the overall efficiency of
healthcare delivery. In this digital age, the confidentiality, integrity, and availability
of sensitive medical data can be upheld with this innovative approach.
Chapter 4
RFID Application Using Dual-Band
Monopole Antenna by Enhancing
the Bandwidth
1 Introduction
RFID technology has become indispensable for automatic identification and tracking
in various sectors like supply chain management, transportation, and health care.
The efficiency of RFID systems heavily relies on the communication between RFID
readers and antennas. However, a major challenge faced by RFID systems is limited
bandwidth, hindering their adaptability to different frequency environments. This
paper proposes a solution to this issue by introducing a dual-band monopole antenna.
Traditional RFID antennas often struggle to operate at multiple frequencies, limiting
their flexibility and performance. The dual-band monopole antenna design aims to
overcome these limitations by providing enhanced features and greater flexibility for
RFID systems, enabling compatibility with various RFID standards and frequency
distributions. By broadening the antenna’s resonance across multiple frequencies,
the system can accommodate a variety of standards, thus improving overall perfor-
mance. The flexibility offered by dual-band operation allows for increased data
throughput and reliability, making it suitable for deployment in diverse applica-
tions ranging from inventory management to access control. Additionally, the rela-
tively low cost and simplicity of dual-band monopole antennas make them attractive
options for widespread adoption, offering a cost-effective and reliable solution for
businesses and industries. In summary, the proposed dual-band monopole antenna
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025 41
H. Mittal et al. (eds.), Proceedings of International Conference on Paradigms of
Communication, Computing and Data Analytics, Algorithms for Intelligent Systems,
[Link]
42 M. Perla et al.
holds promise in addressing the evolving needs of RFID applications and contributing
to the continuous development and optimization of RFID technology.
2 Literature Survey
The paper used an ultra-wideband (UWB) antenna in which a coplanar waveguide
with ground (CPWG) was used to suppress unwanted reflection phenomena. The
shape was changed from hexagonal to a tape type to increase the frequency range
[1].
The paper used a monopole UWB antenna. The antenna is highly omnidirectional
in low frequency ranges and is used for multi-frequency purposes; it has a
symmetrical L-shaped patch on both sides [2].
The paper used a high-gain dual-band monopole antenna along with the fabrication
of an artificial magnetic conductor (AMC). The AMC is used to increase the gain of
the antenna. The model has peak gains of 5 dBi and 7.5 dBi [3].
A CPW antenna model was used which had a quarter disk for radiating purposes
and an L-shaped microstrip line, having the advantage of widening the impedance
bandwidth [4].
The paper used a compact super-wideband (SWB) antenna. This antenna was
simulated and tested in ANSYS HFSS for spectrum-sensing (intelligent radio)
applications and achieved an increased gain of 6 dB [5].
The paper used a compact fed folded dipole antenna intended for RFID applications
and operating at two frequencies. It had good impedance matching and a simple,
uncomplicated design [6].
The paper presented a novel compact dual-band hybrid monopole-annular slot
antenna (ASA). It exhibits radiation patterns similar to an ASA, and the resonance
of the second antenna can be tuned independently of the first [7].
This paper presents a flexible CPW-fed monopole antenna with a U slot in the CPW
to achieve the dual band. The substrate of this antenna is made flexible and
transparent for conformal operation [8].
The antenna design incorporates a rectangular array with multiple wrapped stubs
to enhance gain and impedance bandwidth. A linear structure positioned at the
monopole’s base, along with perpendicular metal plates, amplifies radiation from
the edges, ensuring a uniform omnidirectional radiation pattern. Additionally, the
antenna integrates a mobile frequency selective surface (FSS) layer and a top-hat
structure, enhancing directivity without compromising omnidirectional radiation.
This design offers a promising solution for wireless networks and sensor applications,
combining high gain with robust performance in various operating environments [9].
Radio frequency identification (RFID) is widely used in many applications such
as logistics, security, and access control. The performance of an RFID system is
highly dependent on the antenna design. The antenna has a return loss below
−10 dB in two frequency bands suitable for RFID applications. The measurement
results show that the proposed antenna agrees with the simulation results. The results
of this study show that the proposed dual-band monopole antenna is a reasonable
choice for RFID applications [10].
3 Design Methodology
In the design methodology, several designs were considered for the better working
of the antenna. Figure 1 is the 3D model of the dual-band monopole antenna,
designed in ANSYS Electronics Desktop software under HFSS. This 3D model is an
"F"-shaped structure on an FR4 epoxy substrate. Figure 2 shows a meshed structure
diagram, in other words the radiation boundary for the proposed antenna. Results
such as the terminal "S" parameters, radiation pattern, gain, and various other
parameters are observed within this radiation boundary. Figure 3 shows the 3D polar
plot, i.e., the gain of the antenna in every possible direction in 3D space. It is one of
the important characterizations of the antenna because the 3D polar plot visualizes
the azimuth and elevation planes of the antenna pattern and helps to catch flaws
or disturbances in it. This type of 3D pattern is used to test new antenna designs
or unknown antenna types. Lastly, Fig. 4 shows the radiation pattern at 0 degrees;
if the radiation boundary or meshed structure were absent, the signal would be
affected and the desired results of the antenna would change, which is not
acceptable.
Some analytical proposed equations:

For length:

L1 = L_(2.4 GHz) = λg / 4    (1)

λg = λo / √εreff    (2)

where
εreff = effective permittivity,
λo = wavelength corresponding to the frequency.

εreff = (εr + 1)/2 + ((εr − 1)/2) · (1 + 12h/W)^(−1/2)    (3)

where
εr = relative permittivity,
W = monopole antenna width,
h = substrate thickness.

Z = (60 / √εreff) · ln(8h/W + W/(4h))    (4)

where
Z = 50 Ω (port impedance),
h = substrate thickness,
W = width of the monopole.

L2 = L_(3.5 GHz) = λg / 4    (5)

where
εreff = effective permittivity of the dielectric substrate.

Ws = 6h + W    (6)

where
Ls = substrate length,
Ws = substrate width.
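Equations (1)–(6) can be evaluated numerically. The sketch below computes the effective permittivity, the quarter-wave monopole lengths at 2.4 GHz and 3.5 GHz, and the characteristic impedance; the FR4 parameters (εr = 4.4, h = 1.6 mm) and the strip width are assumed values for illustration, since the paper's exact dimensions are not restated here.

```python
import math

C = 299_792_458.0  # speed of light in free space (m/s)

def eff_permittivity(eps_r, h, w):
    # Eq. (3): effective permittivity of a microstrip line
    return (eps_r + 1) / 2 + (eps_r - 1) / 2 * (1 + 12 * h / w) ** -0.5

def quarter_wave_length(freq_hz, eps_r, h, w):
    # Eqs. (1), (2), (5): L = lambda_g / 4 with lambda_g = lambda_o / sqrt(eps_reff)
    lam_o = C / freq_hz
    lam_g = lam_o / math.sqrt(eff_permittivity(eps_r, h, w))
    return lam_g / 4

def impedance(eps_r, h, w):
    # Eq. (4): characteristic impedance (form valid for W/h <= 1)
    return 60 / math.sqrt(eff_permittivity(eps_r, h, w)) * math.log(8 * h / w + w / (4 * h))

# Assumed FR4 substrate and strip width (illustrative values only)
eps_r, h, w = 4.4, 1.6e-3, 1.0e-3
L1 = quarter_wave_length(2.4e9, eps_r, h, w)
L2 = quarter_wave_length(3.5e9, eps_r, h, w)
print(f"L1 (2.4 GHz) = {L1 * 1000:.2f} mm, L2 (3.5 GHz) = {L2 * 1000:.2f} mm")
print(f"Z = {impedance(eps_r, h, w):.1f} ohm")
```

In practice the width W would be tuned until Z approaches the 50 Ω port impedance stated for Eq. (4); the values printed here merely show how the chain of equations is applied.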
4 Result Obtained
4.1 Software
Figure 5 shows the designed antenna, an F-shaped structure. This design shows the
RFID antenna, which will work at different microwave frequencies. It also shows
the radiating boundary, made up of vacuum, in which the antenna radiates the
signal, having a size of λ/4. Figure 6 depicts the return loss of the designed antenna,
where two frequency ranges are achieved: one maximum value at 1.39 GHz and
another at 5.84 GHz. Figure 6 is the S11 parameter plot, which depicts the operating
range for RFID applications, specified with respect to the reference attenuation
value of below −10 dB. Further, Fig. 7 shows the 3D polar plot of the designed
antenna, which plots the negative values along the same axis. It is measured in dBi
and presents the antenna's entire directional gain characteristics in a single plot.
Finally, Fig. 8 depicts the radiation pattern of the antenna, i.e., the directional
pattern of the radio waves it emits. This radiation pattern is a graphical
representation of the antenna's radiation properties as a function of space,
describing how it will radiate and receive energy.
4.2 Hardware
Figures 9 and 10 show the front view and back view of the designed antenna after
the fabrication process, in which SMA connectors are soldered to the ground plane
and to the feeder. Figure 11 shows the S11 parameter graph on the Vector Network
Analyzer (VNA), in which the markers M1 and M3 show the RFID microwave
frequencies achieved: first at 2.88 GHz and second at 6.11 GHz.
Table 1 shows the results obtained over two different ranges. The first is from
1.39 GHz to 2.04 GHz, where the gain obtained is 8.23 dB and the return loss is
−10 dB.
The second range is from 4.63 GHz to 7.3 GHz, where the gain is 4.03 dB with a
return loss of −21.75 dB.
Table 2 compares our work with the different papers we have reviewed; the
proposed design has a moderate dimension compared to the others, along with an
increased frequency range.
5 Conclusion
6 Future Scope
The future scope for RFID applications with dual-band monopole antennas entails
expanding frequency coverage, enhancing reliability, and integrating with emerging
technologies. This can be achieved through ongoing research to optimize antenna
design, improve signal processing algorithms, and ensure compatibility with evolving
standards. Additionally, advancements in fabrication techniques may lead to smaller
and more efficient antennas, while collaborations across industries can drive
innovation and adoption in various sectors.
References
1. Park S, Jung K-Y (2022) Novel compact UWB planar monopole antenna using a ribbon-shaped
slot. IEEE Access 10:61951–61959
2. Wang Z, Wang M, Nie W (2022) A monopole UWB antenna for WIFI 7/Bluetooth and satellite
communication. Symmetry 14:1929. [Link]
3. Abdelghany MA, Fathy Abo Sree M, Desai A, Ibrahim AA (2022) Gain improvement of a
dual- band CPW monopole antenna for Sub-6 GHz 5G applications using AMC structures.
Electronics 11:2211. [Link]
4. Ma Z, Chen J, Li C, Jiang Y (2023) A monopole broadband circularly polarized antenna with
coupled disc and folded microstrip stub lines. J Wirel Commun Netw 2023:30. [Link]
5. Kumar P, Urooj S, Sahu BJR (2021) Design of compact super-wideband monopole antenna for
spectrum sensing applications. Research Gate. [Link]
6. Aggarwal I, Ranjan Tripathy M, Pandey S, Mittal A (2021) A dual-band monopole antenna for
RFID application. IEEE
7. Haskou A, Pesin A, Le Naour J-Y, Louzir A (2019) Compact, dual-band, hybrid monopole-ASA
antenna. IEEE, pp 1123–1124
8. Saraswat K, Harish AR (2019) Flexible dual band dual polarized CPW fed monopole antenna
with discrete frequency reconfigurability, IET
9. Danuor P, Moon J-I, Jung Y-B (2023) High-gain printed monopole antenna with dual-band
characteristics using FSS-loading and top-hat structure. Sci Rep. [Link]
10. Kaur S, Aquib Jameel Khan M, Wali Khan F, Alam M, Singh M (2023) Designing and analysis
of dual-band monopole antenna for RFID applications. IJAEM 5:1029–1035
Chapter 5
The Determinants of Ethical Governance
Policies for Artificial Intelligence
in the Financial Sector of Moroccan
Companies
1 Introduction
Artificial intelligence (AI) has emerged as a major innovation in a relatively short
period of time, driving an ongoing revolution in many sectors, including finance. In
this context, financial services companies in Morocco are quick to adopt AI to speed
up operations and make timely, informed decisions. In the Moroccan financial
sphere, the applications of AI comprise business process automation, advanced data
analysis, and the provision of novel solutions to the hardest problems. The role that
AI plays in the Moroccan financial system has enhanced the smooth running of
processes, saving time and delivering accurate outcomes. It also enables systems
that adapt to different circumstances. The role of AI in financial data processing is
increasingly evident: AI systems are becoming essential tools for sifting through
streams of financial data, allowing firms to identify patterns, assess risk, and
optimize portfolios. AI will be heavily involved in the creation of highly
personalized financial services to meet
A. Hama
Economic and Social Sciences of Souissi, Université Mohammed V, Rabat, Morocco
e-mail: [Link]@[Link]
M. Mkik
Higher Institute of Nursing Professions and Health Techniques of Rabat, Rabat, Morocco
K. Khaddouj
National School of Arts and Trades of Rabat, Université Mohammed V, Rabat, Morocco
e-mail: [Link]@[Link]
A. Hebaz (B)
National Higher School of Electricity and Mechanics, Hassan II University, Casablanca, Morocco
e-mail: Hebaz.a@[Link]
such potential customer needs and to provide the user with a better experience.
Furthermore, the quick implementation of central bank digital currencies (CBDCs)
in the banking and financial sectors will raise the same ethical and governance
issues. Companies ought to implement recovery procedures so that mistakes are
identified and corrected, hold themselves accountable for the outcomes of their AI
services, and establish redress paths for complainants in the event of likely damage.
On the other hand, privacy becomes a central issue in the implementation of AI. An
ethical framework ensures the adoption of good data protection rules within
companies, so that their customers' information is handled well and according to
security and privacy standards. In this light, is the healthy and conscientious
practice of ethical governance within the organizational system the catalyst for a
culture of integrity, accountability, and transparency?
Beyond this, equity and diversity in AI development are emphasized as moral
obligations. Organizations are advised not to use artificial intelligence algorithms
that embed discrimination or produce discriminatory outcomes, and to promote
diversity of ideas in the creation of Artificial Intelligence (AI) systems so as to
avoid harming parts of the population. The ethical-management framework for AI
in the Moroccan financial sector is designed to provide a set of guiding rules that
covers emerging ethical dilemmas within the main ethical principles. The
integration of AI into the financial sector of Morocco could be achieved by adopting
these ethical rules, as they not only lead to responsible AI development but also
embody society's commitment to corporate social responsibility, stakeholder trust,
and the successful incorporation of AI in the financial sector.
2 Literature Review
The core of ethical governance lies in the combination of training and awareness
programs. The purpose of those programs is to enlighten the company employees
about the ethical principles, ethical standards, and actual consequences of business
ethics. Training is an effective tool that serves to teach employees the organiza-
tion’s code of ethics and highlight that ethical decisions should be made and every
employee is responsible for their actions [13]. It could include the topics of business
ethics, anti-corruption, human rights protection, and sustainability. Awareness,
however, is different in that it goes beyond merely gaining knowledge to create a
deeper understanding of ethical issues [2, 14, 15].
It fosters critical analysis of the potential ethical dilemmas and establishes
a climate where employees are encouraged to communicate their ethical issues,
actively use the grievance mechanisms, and take part in the continuous improvement
of the company’s ethical procedures. These training and awareness initiatives include
not only those who are in the company, but also go outside of employees, creating
new relationships between customers, suppliers and other partners. Ultimately, an
ethical corporate culture should be achieved, where ethics are the foundation of daily
decision-making and professional communications [16–18]. Training and aware-
ness are done regularly to develop and improve an ethical corporate culture that is
constantly evolving, and which is affected by social and technological changes as
well as the environment. They reinforce the organization’s reputation by showing its
dedication to ethical practices and building trust among stakeholders which are the
parties that have an impact on the accomplishment of the organization’s goals. In the
end, these initiatives will play a part in building an ethical governance framework
that is strong and provides direction to the company in realizing an economic and
ethical future [19, 20].
Table 1 presents the explanatory variables and the variable to be explained, listing
the items of each variable with its symbol and supporting authors.
This research is based on several hypotheses aimed at assessing the impact of
various factors on ethical governance within organizations. First, Hypothesis 1 posits
that Responsible Actors have a positive effect on ethical governance, suggesting
that engaged and responsible leaders contribute to high ethical standards. Similarly,
Hypothesis 2 argues that transparency in decision-making processes positively
influences ethical governance, emphasizing the importance of clearly and openly
disclosing decision-making mechanisms within the organization. Hypothesis 3
indicates that Ethics and Corrective Mechanisms positively influence ethical governance.
Table 1 (continued)

Variables | Variable contents | Items of the variable | Symbol | Authors
| | 3. Understanding the ethical implications | FS3 | [2, 15]
Ethical governance | Ethical governance means the creation of ethics principles and practices that will be the basis for decision-making and actions of the organization. They will ensure accountability, transparency, and respect for rights and values. All of this contributes to a sustainable and ethical corporate culture | Principles of Ethical Governance | GE1 | [10, 14]
3 Methodology of Research
In order to answer the central problem of our research, we chose a basic sample of
financial managers working in financial companies, namely banks and insurance
companies (7 conventional banks and 5 insurance companies), with a total of 50
responses. The applied research methodology is quantitative (confirmatory) in
nature, using a multiple regression method that provides relevant results in order to
accept or reject the research hypotheses presented at the end of the literature review.
The rigor of this methodology provides the opportunity to accurately assess the
research hypotheses, which contributes to a deeper understanding of the factors
related to ethical governance in the financial context, thus providing valuable
insights for decision-making and the implementation of ethical practices within
these organizations.
Table 3 presents the descriptive statistics using the kurtosis, the Shapiro–Wilk
p-value, the skewness coefficient, and the standard deviations.
As far as normality is concerned, the Shapiro–Wilk statistic (W) is 0.81 with a
p-value of 0.004. The value of W is not close to 1, and the low p-value suggests a
rejection of the normality hypothesis. The table provides detailed statistical
information on the different variables, including the standard deviation, skewness
coefficient, kurtosis, Shapiro–Wilk statistic (W), and the corresponding p-value. To
illustrate, take the ED1 variable: its standard deviation is 2.31, indicating a
moderate dispersion of the data around the mean; the skewness coefficient of
−0.92 suggests a slight leftward tilt in the distribution of the data; and the kurtosis
of 1.48 signals some concentration around the mean.
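The descriptive checks described above can be reproduced on any response column. A minimal sketch using SciPy, with made-up data standing in for the survey responses (the actual dataset is not reproduced in the paper):

```python
import numpy as np
from scipy import stats

# Made-up responses standing in for a survey item such as ED1 (n = 50)
rng = np.random.default_rng(1)
x = rng.normal(loc=3.0, scale=2.3, size=50)

sd = x.std(ddof=1)                  # standard deviation
skew = stats.skew(x)                # asymmetry (skewness) coefficient
kurt = stats.kurtosis(x)            # excess kurtosis
w_stat, p_value = stats.shapiro(x)  # Shapiro-Wilk normality test

# A small p-value (e.g. < 0.05) would lead to rejecting normality,
# as the paper does for W = 0.81, p = 0.004.
print(f"sd={sd:.2f} skew={skew:.2f} kurt={kurt:.2f} W={w_stat:.2f} p={p_value:.3f}")
```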
The table above uses the multiple regression method for estimating the fit of the
statistical model. The correlation coefficient (R) is 0.762, which means that there is
a positive association between the variables in the model. The coefficient of
determination (R2) of 0.580 implies that 58% of the variance of the dependent
variable is explained by the model, which is considered significant but leaves room
for further improvement. The Akaike Information Criterion (AIC) is 553, an
indicator of model quality adjusted for complexity; a lower AIC indicates a better
model fit. The Bayesian Information Criterion (BIC) of 574 is also taken into
account for the assessment of model quality and is generally used to compare
different models. The root mean squared error (RMSE) is 0.666, representing the
mean deviation between the values predicted by the model and the observed
values; the lower the RMSE, the better the predictive accuracy of the model. The
F-statistic of 90.6 with 4 degrees of freedom for the numerator and a p-value less
than 0.001 indicates that the model explains a significant share of the variance in
the dependent variable. In summary, this model has a positive correlation, a
moderate explanatory ability with an R2 of 0.580, a good fit according to the AIC
and BIC, and a high statistical significance according to the F test. However,
improvements could be considered to enhance predictive accuracy.
Table 4 reports the goodness-of-fit indicators, namely R, R-squared, AIC, BIC, and
RMSE, together with the model's probability of significance (it is highly
significant).
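The goodness-of-fit indicators discussed above can be computed for any ordinary least squares fit. The sketch below uses synthetic data in place of the survey responses (which are not reproduced here); the AIC is computed only up to an additive constant, as noted in the comment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data standing in for the survey (n = 50 responses, 4 predictors)
n, k = 50, 4
X = rng.normal(size=(n, k))
y = X @ np.array([0.5, 0.3, -0.2, 0.4]) + rng.normal(scale=0.7, size=n)

# Ordinary least squares with an intercept column
A = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
resid = y - A @ beta

# Goodness-of-fit indicators of the kind reported in Table 4
ss_res = np.sum(resid ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1 - ss_res / ss_tot                       # coefficient of determination
rmse = np.sqrt(ss_res / n)                     # root mean squared error
f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))   # overall F-statistic
aic = n * np.log(ss_res / n) + 2 * (k + 1)     # AIC up to an additive constant
print(f"R2={r2:.3f}  RMSE={rmse:.3f}  F={f_stat:.1f}  AIC~{aic:.1f}")
```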
The table shows the results of an ANOVA omnibus test for the different variables.
The columns include the sum of squares, the mean squares, the F-statistic, and the
associated p-value. The test assesses whether the group means of each variable are
statistically different. The results are interpreted as follows: the sum of squares
represents the total variability in the data, and the mean squares are calculated by
dividing the sum of squares by the number of degrees of freedom. The higher the
sum of squares relative to the mean squares, the more variability there is between
groups. The F-statistic compares between-group variability with within-group
variability; high F values indicate a significant difference between groups. In this
case, all variables (ED1, ED2, ED3, etc.) have high F values, suggesting significant
differences between groups. The associated p-values indicate the probability of
obtaining results as extreme as those observed if the null hypothesis were true. The
very low p-values (< 0.001) in most cases suggest a clear rejection of the null
hypothesis, indicating that the group means are indeed different. We can thus state
that the results of this ANOVA omnibus test suggest that the group means for each
variable are statistically different, which reinforces the validity of the analysis of
variance and highlights significant variations between groups.
Thus, Table 5 presents the second step of the omnibus ANOVA test, which is
significant for all items except ED1, ED3, and ECIE2.
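The quantities involved in the omnibus test (sums of squares, mean squares, F) can be sketched directly. The groups below are made up for illustration; they stand in for responses split by category, not the study's data.

```python
import numpy as np

def anova_oneway(*groups):
    # One-way ANOVA: compare between-group and within-group variability.
    all_x = np.concatenate(groups)
    grand = all_x.mean()
    ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between = len(groups) - 1
    df_within = len(all_x) - len(groups)
    ms_between = ss_between / df_between  # mean squares = SS / df
    ms_within = ss_within / df_within
    return ms_between / ms_within         # F-statistic

# Made-up groups with clearly different means
rng = np.random.default_rng(2)
g1 = rng.normal(3.0, 1.0, 20)
g2 = rng.normal(4.5, 1.0, 20)
g3 = rng.normal(2.0, 1.0, 20)
print(f"F = {anova_oneway(g1, g2, g3):.1f}")
```

A large F, as here, corresponds to the "high F values" the paper reports; the associated p-value would then be obtained from the F distribution with the stated degrees of freedom.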
The table presents the results of the GE1 model with estimates, standard errors,
t-values, p-values, and standardized estimates for each predictor. The intercept has an
estimate of 0.495 with a standard error of 0.2079, a t of 2.38, and a p-value of 0.018.
The Q1O2 predictor has an estimate of 0.241 with a standard error of 0.0570, a
t of 4.22, and a p-value of less than 0.001. Similarly, the other predictors (Q1O3,
Q1Q4, Q1O5, ED1, ED2, ED3, TPD1, TPD2, TPD3, MRC1, MRC2, MRC3, ECIE1,
ECIE2, ECIE3, FS1, FS2, and FS3) have associated estimates, standard errors,
t-values, and p-values. In general, the predictors appear significant with p-values
less than 0.05, indicating a statistically significant relationship with the dependent
variable of the GE1 model. The standardized estimates provide an indication of the
relative strength of each predictor in the model. Consequently, this table is of great
importance as it supplies data that can be used to analyze the effect of each
predictor in the GE1 regression model.
Table 6 presents the regression coefficients with respect to the variable to be
explained; most are significant, except for a few items.
The autocorrelation hypothesis testing table enables one to assess the stability of
the model. For Hypothesis H1, where autocorrelation is assessed, the value of 2.50
and the p-value of 0.400 indicate low autocorrelation; therefore, this hypothesis
should be accepted. This result suggests that the successive measurements of the
model are weakly related to one another. The second hypothesis, H2, according to
the Durbin-Watson (DW) test with a value of 1.80 and a p-value of 0.740, does not
show the presence of significant positive autocorrelation, meaning its acceptance is
validated. Hypothesis 3, with a DW value of −0.020 and a p-value of 0.850, further
indicates that the time-series data does not exhibit negative autocorrelation, and so
it should be accepted. For Hypothesis 4, the DW value of 2.20 and the p-value of
0.500 confirm the rejection of the existence of a significant positive
autocorrelation. Lastly, Hypothesis 5, with a DW value of −0.045 and a p-value of
0.680, confirms the absence of negative autocorrelation, which validates its
acceptance.
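The Durbin-Watson statistic used in these tests has a simple closed form over the regression residuals. A minimal sketch, with made-up residuals standing in for those of the fitted model:

```python
import numpy as np

def durbin_watson(residuals):
    # DW statistic: values near 2 indicate no first-order autocorrelation,
    # values toward 0 positive autocorrelation, values toward 4 negative.
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# Made-up residuals standing in for the regression residuals (n = 50)
rng = np.random.default_rng(3)
resid = rng.normal(size=50)
print(f"DW = {durbin_watson(resid):.2f}")
```

Note that by construction the statistic lies in [0, 4], so DW values such as 1.80 or 2.20 in the table are directly interpretable against the benchmark value of 2.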
5 Conclusion
Ethical governance is far more than just following the law; it aims to build an
organizational framework that incorporates a high ethical code of practice into its
culture. The aim is to build a culture where accountability, transparency, and
human rights are the company's main mission. Highlighting the importance of
adherence to stringent ethical principles, the organization aims at building an entity
that is very conscious of its societal, environmental, and moral effects [23]. The
implementation of ethical governance is based on direct tools to make its theories
come true. Introducing transparency in decision-making is viewed as a critical
foundation, where open and complete communication regarding relevant decisions,
their criteria, data, and debates drives stakeholder trust. Redress and correction
mechanisms, formal and informal, create avenues through which ethical issues can
be raised, cases reported, and redress sought, and this further enhances
accountability and transparency in the organization. Another important feature is
the continuous evaluation of ethical impact, which drives the organization to check
its practices, policies, and decisions on a constant basis. This appraisal goes deeper
than simple compliance with the rules, involving a thorough consideration of the
broader ethical implications of all the steps undertaken. These mechanisms were
corroborated by a quantitative study, thus adding to the credibility of ethical
governance in the financial context. Training and awareness are the most important
factors in this process; the members of the company are taught ethical principles,
conduct standards, and ethical governance practices. Such approaches, in turn,
cultivate an ethical corporate culture that is flexible and responsive to social,
technological, and environmental changes. The results of the study indicate that the
ethical governance model remains stable in the absence of significant autocorrelation.
64 A. Hama et al.
References
1. Manimuthu A, Venkatesh VG, Shi Y, Sreedharan VR, Koh SL (2022) Design and development
of automobile assembly models using federated artificial intelligence with smart contracts. Int
J Prod Res 60(1):111–135
2. Naz F, Karim S, Houcine A, Naeem MA (2022) Fintech growth during COVID-19 in the
MENA region: current challenges and future prospects. Electron Commer Res 1–22
3. Allioui H, Mourdi Y (2023) Unleashing the potential of AI: Investigating cutting-edge
technologies that are transforming businesses. Int J Comput Eng Data Sci (IJCEDS) 3(2):1–12
4. Ben Youssef A, Dahmani M (2023) Examining the drivers of E-commerce adoption by
Moroccan firms: A multi-model analysis. Information 14(7):378
5. Owens E, Sheehan B, Mullins M, Cunneen M, Ressel J, Castignani G (2022) Explainable
artificial intelligence (xai) in insurance. Risks 10(12):230
6. el Kadiri K, Bentoumia S, McHich H, Marouane MKIK (2023) Les concepts durables et
technologiques menant à l’attractivité des investissements étrangers au Maroc: cas du secteur
touristiques. Int J Account, Financ, Audit, Manag Econ 4(3–2):805–824
7. Khan HU, Malik MZ, Alomari MKB, Khan S, Al-Maadid AAS, Hassan MK, Khan K (2022)
Transforming the capabilities of artificial intelligence in GCC financial sector: A systematic
literature review. Wirel Commun Mob Comput
8. Mkik M, Mkik S (2023) Acceptability aspects of artificial intelligence in Morocco: Managerial
and theoretical contributions. In: The International Conference on Digital Technologies and
Applications (pp 65–74). Cham: Springer Nature Switzerland
9. Marouane M, Salwa M, Hebaz A (2023) The acceptance of artificial intelligence in the commer-
cial use of crypto-currency and Blockchain systems. In: The International Conference on Infor-
mation, Communication and Computing Technology (pp 163–177). Singapore: Springer Nature
Singapore
10. Nuseir MT, Al Kurdi BH, Alshurideh MT, Alzoubi HM (2021) Gender discrimination at work-
place: Do artificial intelligence (AI) and machine learning (ML) have opinions about it. In: The
international conference on artificial intelligence and computer vision (pp. 301–316). Cham:
Springer International Publishing
11. Secinaro S, Calandra D, Secinaro A, Muthurangu V, Biancone P (2021) The role of artificial
intelligence in healthcare: a structured literature review. BMC Med Inform Decis Mak 21:1–23
12. Karim M, Ktit J, Soussi NO, Sobhi K (2021) Potential areas for investment in Morocco. An
analysis using inputs-outputs model. Adv Manag Appl Econ 11(1):47–72
13. Enholm IM, Papagiannidis E, Mikalef P, Krogstie J (2022) Artificial intelligence and business
value: A literature review. Inf Syst Front 24(5):1709–1734
14. Mohsen SE, Hamdan A, Shoaib HM (2024) Digital transformation and integration of artificial
intelligence in financial institutions. J Financ Report Account
15. Hicham N, Nassera H, Karim S (2023) Strategic framework for leveraging artificial intelligence
in future marketing decision-making. J Intell Manag Decis 2(3):139–150
16. Kiemde SMA, Kora AD (2022) Towards an ethics of AI in Africa: rule of education. AI and
Ethics, 1–6
5 The Determinants of Ethical Governance Policies for Artificial … 65
17. Albastaki YA, Razzaque A, Sarea AM (Eds.) (2020) Innovative strategies for implementing
FinTech in banking. IGI Global
18. Singh C, Dash MK, Sahu R, Kumar A (2023) Artificial intelligence in customer retention: a
bibliometric analysis and future research framework. Kybernetes
19. Gwagwa A, Kraemer-Mbula E, Rizk N, Rutenberg I, De Beer J (2020) Artificial Intelligence
(AI) deployments in Africa: benefits, challenges and policy dimensions. Afr J Inf Commun
26:1–28
20. Yuen MK, Ngo T, Le TD, Ho TH (2022) The environment, social and governance (ESG)
activities and profitability under COVID-19: evidence from the global banking sector. J Econ
Dev 24(4):345–364
21. Nejjari Z, Aamoum H (2020) The role of ethics, trust, and shared values in the creation
of loyalty: Empirical evidence from the Moroccan University. Bus, Manag Econ Eng
18(1):106–126
22. Gwagwa A, Kachidza P, Siminyu K, Smith M (2021) Responsible artificial intelligence in
Sub-Saharan Africa: landscape and general state of play
23. Al-Sartawi AMM (Ed.) (2022) Artificial intelligence for sustainable finance and sustainable
technology. In: Proceedings of ICGER 2021 (Vol. 423). Springer Nature
24. Bhardwaj B, Sharma D, Dhiman MC (Eds.) (2023) AI and emotional intelligence for modern
business management. IGI Global
Chapter 6
An Ensemble Machine Learning-Based
Approach Toward Accuracy in Bitcoin
Price Prediction
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025 67
H. Mittal et al. (eds.), Proceedings of International Conference on Paradigms of
Communication, Computing and Data Analytics, Algorithms for Intelligent Systems,
[Link]
68 S. Puja et al.
2 Literature Survey
Qing et al. (2022) [21] integrate deep learning with autoencoder algorithms for
effective dimensionality reduction [22]. However, the challenge lies in finding
algorithms that preserve fluctuating features while improving the signal-to-noise ratio.
The study by Lyu (2022) [23] focuses on short-term trading and compares 10
algorithms for time series forecasting, with Gradient Boosting showing promising
results. Its limitations include the need for larger datasets and more attributes to
enhance model performance.
The effort by Encean and Zinca (2022) [24] toward cryptocurrency price prediction
uses Gated Recurrent Unit (GRU) and long short-term memory (LSTM) networks,
which prove effective in predicting cryptocurrency evolution [25, 26]. Yet the study's
reliance solely on historical price and social media sentiment data limits its predictive
scope. LSTM networks achieve higher prediction accuracy by considering external
factors influencing cryptocurrency fluctuations; however, the model's efficiency is
limited because it uses data from a specific time frame.
3 Methodology
After careful and rigorous research, we determined that Random Forest and Gradient
Boosting are ideal choices for forecasting Bitcoin prices in the volatile cryptocurrency
market, owing to their ability to handle large datasets and their predictive accuracy.
We therefore began by developing separate prediction models. The dataset we utilized
consisted of 7 columns and 1449 rows, covering January 11, 2019, to January 11, 2023.
Each row records the date and the day's open price, highest value, lowest value, close
price, adjusted close price, and trading volume. The input features for training included
date, open, high, low, adjusted close, and volume, along with newly engineered features
that are discussed later in the paper.
Random Forest and Gradient Boosting are powerful ensemble learning strategies
widely used in machine learning to improve predictive performance. Random Forest
is a collection of decision trees grown during training, with predictions made by
averaging or voting over the individual tree forecasts. Each tree is trained on a random
subset of the training data and considers a random subset of features at each split.
This randomness reduces overfitting and strengthens the model's robustness. Random
Forest is known for its versatility, handling both classification and regression tasks
effectively.
In contrast, Gradient Boosting is a boosting method that builds trees sequentially,
with each tree correcting the errors of the previous one. In this technique, the model is
trained stage by stage, and each new tree focuses on minimizing the errors of the
combined ensemble. Gradient Boosting is especially effective at handling complex
relationships within the data and excels in predictive tasks where high accuracy is
essential. Common boosting implementations include Adaptive Boosting (AdaBoost),
as well as the more refined Extreme Gradient Boosting (XGBoost) and Light Gradient
Boosting Machine (LightGBM).
While both Random Forest and Gradient Boosting aim to improve predictive accuracy
through ensembling, they differ in their methodologies. Random Forest builds
independent trees in parallel, while Gradient Boosting builds trees sequentially, each
correcting the mistakes of its predecessors. The choice between them frequently
depends on the characteristics of the dataset and the desired balance between
interpretability and predictive power.
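The sequential error correction that distinguishes Gradient Boosting from Random Forest's parallel averaging can be illustrated with a toy one-dimensional regressor built from depth-1 trees (stumps); this is a didactic sketch, not the implementation used in this work:

```python
# Toy data: a noise-free quadratic, to show boosting's residual fitting.
xs = [i / 10 for i in range(-20, 21)]
ys = [x * x for x in xs]

def fit_stump(xs, ys):
    """Fit a depth-1 regression tree: one split point, two leaf means."""
    best = None
    for split in xs:
        left = [y for x, y in zip(xs, ys) if x <= split]
        right = [y for x, y in zip(xs, ys) if x > split]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((y - (lm if x <= split else rm)) ** 2
                  for x, y in zip(xs, ys))
        if best is None or err < best[0]:
            best = (err, split, lm, rm)
    _, split, lm, rm = best
    return lambda x: lm if x <= split else rm

def boost(xs, ys, rounds=20, lr=0.5):
    """Boosting for squared loss: each new stump fits the current residuals."""
    preds = [0.0] * len(xs)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + lr * stump(x) for p, x in zip(preds, xs)]
    return lambda x: sum(lr * s(x) for s in stumps)

def mse(model, xs, ys):
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)
```

Here the boosted ensemble drives the training error far below what any single stump achieves; Random Forest would instead average many independently grown trees.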
To understand the relationship between Bitcoin price and the engineered features,
various visualizations were used. These visualizations included line plots showing
the daily price range, change, and range change ratio over time. They are useful for
understanding historical price movements and identifying patterns such as trends,
seasonality, and volatility. Figure 1 shows the volatile surges that make the market
very unstable.
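The engineered features mentioned above (daily price range, change, and range change ratio) can be derived from the raw OHLC columns roughly as follows; the exact feature definitions here are assumptions for illustration, and the sample values are invented:

```python
# Hypothetical two days of OHLC data (values invented for illustration).
days = [
    {"open": 100.0, "high": 110.0, "low": 95.0, "close": 105.0},
    {"open": 105.0, "high": 112.0, "low": 101.0, "close": 108.0},
]

for d in days:
    d["range"] = d["high"] - d["low"]        # daily price range
    d["change"] = d["close"] - d["open"]     # daily change
    # ratio of the day's range to its change (assumed definition)
    d["range_change_ratio"] = d["range"] / d["change"] if d["change"] else None
```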
6 An Ensemble Machine Learning-Based Approach Toward Accuracy … 71
Candlestick charts are frequently employed in financial markets for technical
analysis. They offer a graphic depiction of price changes over a certain time frame.
As Fig. 2 illustrates, these charts help spot pricing patterns and trends as well as
forecast future price changes. Here they were primarily used to assess the varied
surges and depreciations that occurred during the year.
The correlation heatmap as shown in Fig. 3 was created to analyze the relation-
ships between Bitcoin price and the engineered features. The heatmap visualizes the
correlation between different attributes in the dataset. This can be very significant
for feature selection in modeling and evaluating the impact of different attributes on
the target variable. Open, high, low, and close were the attributes with the highest
correlation among themselves.
The relationship between the price and trading volume of Bitcoin is visualized in
Fig. 4 by plotting the price and volume trend of Bitcoin. It helps figure out possible
trends and patterns as well as comprehend how price changes and trade activity are
related.
Figure 5’s box plot illustrates how the price of Bitcoin varies by the day in a
week. It helps comprehend how prices fluctuate on different days and spot any trends
or unusual price movements on a particular day of the week. The 50-day moving
average price of Bitcoin superimposed upon the actual price is visualized in Fig. 6.
It highlights and helps to smooth out short-term pricing data swings and identify
long-term trends.
Using Random Forest for Bitcoin price prediction. Random Forest combines
several decision trees to produce predictions of the Bitcoin price. In the current work,
the Random Forest algorithm was used for regression analysis to predict the next
day's "Close" price of Bitcoin, as seen in Fig. 7. Upon deploying the model, the mean
squared error obtained was approximately 6.38. This metric measures the average of
the squared errors, providing a measure of the quality of the model's predictions.
Overall, the Random Forest Regressor showed solid predictive performance, as
indicated by the low mean squared error and an R-squared score of 0.87. These
outcomes suggest that the model performed satisfactorily in predicting the following
day's "Close" price of Bitcoin given the chosen features.
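The two reported metrics, mean squared error and the R-squared score, are computed as follows (a minimal sketch):

```python
def mse(actual, predicted):
    """Mean squared error: average of squared prediction errors."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def r_squared(actual, predicted):
    """Coefficient of determination: 1 minus residual over total variation."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot
```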
Using Gradient Boosting for Bitcoin price prediction. Another approach to
predicting the price of Bitcoin is Gradient Boosting, which is also an ensemble machine
learning algorithm. Gradient Boosting is known for its ability to deliver highly accurate
predictions, achieved by exploiting the strengths of weak models and iteratively
refining them, and it can frequently outperform other machine learning algorithms in
predictive precision.
It is powerful at detecting non-linear relationships and complex patterns in the data.
This makes it well suited to modeling the unpredictable dynamics of financial markets,
such as the price movements of Bitcoin, which frequently follow non-linear and
complex paths.
After incorporating the feature-engineered variables, the mean squared error (MSE)
was observed to be 2.43. This demonstrates that the Gradient Boosting model shown
in Fig. 8 produced predictions extremely close to the actual "Close" price of Bitcoin
for the following day, further indicating the viability of the model.
4 Ensemble of Models
While analyzing the results of both models, a disparity was seen between them.
Approximately 45% of the close price values predicted by Random Forest were
lower than the actual close price, while 56% of the values predicted by Gradient
Boosting were higher than the actual closing price. When Random Forest predicted a
lower value, Gradient Boosting often predicted a higher one, and vice versa.
We realized that the best approach was to combine the predictions from both the
Random Forest and Gradient Boosting models to obtain a more precise forecast of
the price of Bitcoin, which represents the novelty of this work. Moving forward, we
employed four different ensemble techniques: a stacking model, a bagging model,
a blending model, and a voting model.
The stacking model is an ensemble learning strategy that combines multiple base
models to improve predictive performance. It involves training several distinct base
models on the dataset, such as a Gradient Boosting Regressor and a Random Forest
Regressor, for Bitcoin price prediction. Rather than using their forecasts directly,
the stacking model combines these predictions through a meta-learner (e.g., an
Extreme Gradient Boosting (XGBoost) Regressor) to make the final prediction.
We used stacking because it aims to capture various aspects of the underlying data
and exploits the diversity of predictions made by the different base models. By
combining the strengths of multiple models, stacking can overcome the limitations of
individual models and achieve better prediction accuracy. Stacking also provides
greater flexibility in capturing complex relationships within the Bitcoin price data,
leading to more accurate predictions. The parameters used are explained below.
Base Models: The variety of predictions and their performance are determined by the
selection of base models. We have used Gradient Boosting Regressor and Random
Forest Regressor as base models.
Meta-Learner: The meta-learner is responsible for learning how to combine the base
model predictions effectively. An Extreme Gradient Boosting (XGBoost) Regressor
was used as the meta-learner for this project.
Stacking Architecture: The stacking architecture determines how predictions from
base models are integrated to train the meta-learner. This involves deciding whether
to use the already defined probabilities or other modified representations as input
features for the meta-learner. We independently trained the two base models on the
training data. After training the base models, predictions were obtained on a holdout
validation set using both base models. Subsequently, the predictions from both base
models were stacked horizontally to create a new feature matrix. Each row of the
matrix contains the predictions made by both base models for a specific sample
in the validation set. The meta-learner, an Extreme Gradient Boosting (XGBoost)
Regressor, was then trained on the stacked predictions along with the corresponding
actual values. The meta-learner learns to effectively combine the predictions from
the base models to determine the final prediction. Finally, the meta-learner predicts
the target variable (Bitcoin price) using the stacked predictions obtained from the
test set. The predictions made by the meta-learner represent the final prediction of
the stacking model, as depicted in Fig. 9.
Strategy for Cross-Validation: Stacking generally includes a cross-validation
methodology to prevent overfitting.
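The steps described above (base predictions on a holdout set, a meta-learner fit on the stacked predictions, and a final combined prediction) can be sketched in miniature. The base models below are simple stand-ins for the trained Random Forest and Gradient Boosting regressors, and a coarse grid search over blend weights stands in for the XGBoost meta-learner:

```python
# Stand-ins for the two trained base models (assumed simple rules here).
base_rf = lambda x: 2.0 * x          # plays the role of Random Forest
base_gb = lambda x: 2.0 * x + 0.5    # plays the role of Gradient Boosting

# Step 1: collect both base models' predictions on a holdout set.
holdout_x = [2.0, 4.0, 6.0]
holdout_y = [4.2, 8.1, 12.2]
stacked = [(base_rf(x), base_gb(x)) for x in holdout_x]  # feature matrix

# Step 2: the meta-learner learns how to combine the stacked predictions;
# here a coarse grid search over blend weights stands in for XGBoost.
w_rf, w_gb = min(
    ((w / 10, 1 - w / 10) for w in range(11)),
    key=lambda w: sum((w[0] * a + w[1] * b - y) ** 2
                      for (a, b), y in zip(stacked, holdout_y)),
)

# Step 3: final prediction = meta-learner applied to the base predictions.
def stacked_predict(x):
    return w_rf * base_rf(x) + w_gb * base_gb(x)
```

By construction, the blended predictor does at least as well on the holdout set as either base model alone, which is the motivation for stacking given in the text.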
Figure 10 provides a more detailed view of the results, and Fig. 11 depicts the
differences between the actual and predicted prices.
Fig. 11 Stacking the difference between the actual and predicted prices
Averaging over bootstrap samples dampens the effect of individual trees' overfitting,
and bagging reduces the variance of individual Gradient Boosting Regressor models,
thereby increasing their robustness. Figure 12 depicts the basic workings of a bagging
model.
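The variance-reduction mechanism behind bagging can be sketched with a toy estimator; the data and base model below are illustrative assumptions, not the paper's setup:

```python
import random

random.seed(0)
# Toy data: y = 2x plus noise (illustrative, not the paper's dataset).
data = [(float(x), 2.0 * x + random.gauss(0, 1)) for x in range(1, 21)]

def fit_slope(sample):
    """Trivial base model: least-squares slope through the origin."""
    return sum(x * y for x, y in sample) / sum(x * x for x, y in sample)

def bagged_slope(data, n_models=30):
    """Bagging: fit the base model on bootstrap resamples, then average."""
    slopes = []
    for _ in range(n_models):
        boot = [random.choice(data) for _ in data]  # resample with replacement
        slopes.append(fit_slope(boot))
    return sum(slopes) / len(slopes)
```

Each bootstrap fit wobbles with the resampled data; averaging the fits cancels much of that wobble, which is the variance reduction the text describes.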
Figure 13 provides a more detailed view of the results, and Fig. 14 depicts the
differences between the actual and predicted prices.
Fig. 12 Bagging
Fig. 14 Bagging with difference between the actual and predicted prices
Voting, in the context of ensemble learning, means calculating the arithmetic mean of
the predictions given by several individual models. To get the final prediction, this
straightforward ensemble technique averages the predictions of each base model.
Furthermore, we tried altering the weights given to each model's predictions and
found that weights of 0.6 for Gradient Boosting and 0.4 for Random Forest were the
best combination. Figure 17 provides a more detailed view of the results, and Fig. 18
depicts the differences between the actual and predicted prices.
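The weighted voting step itself reduces to a one-line weighted average, using the weights reported above:

```python
def weighted_vote(pred_gb, pred_rf, w_gb=0.6, w_rf=0.4):
    """Weighted average of the two base models' next-day price predictions."""
    return w_gb * pred_gb + w_rf * pred_rf

# With equal weights (0.5, 0.5) this reduces to plain arithmetic-mean voting.
```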
5 Result Analysis
Fig. 16 Blending the difference between the actual and predicted prices
Fig. 18 Voting with difference between the actual and predicted prices
compared to the Random Forest model and by 16% compared to Gradient Boosting
on average. These computations further show that Voting (0.6, 0.4) performs better
than the other ensemble methods, with the lowest maximum error and mean squared
error and the highest R-squared value. Hence, a voting ensemble is the best approach
for predicting the next day's Bitcoin price.
6 Conclusion
References
1. Nakamoto S (2008) Bitcoin: A peer-to-peer electronic cash system. Decentralized Bus Rev
21260
2. Tomov YK (2019) Bitcoin: Evolution of Blockchain technology. In: 2019 IEEE XXVIII Inter-
national Scientific Conference Electronics (ET), Sozopol, Bulgaria, pp 1–4. [Link]
1109/ET.2019.8878322
3. Ma J, Zhu Y, Xu J, Li Y, Zhang Y, Wang J (2022) Research on Bitcoin price prediction based
on Support Vector Regression and its variant combination model. In: 2022 18th International
21. Qing Y, Sun J, Kong Y, Lin J (2022) Fundamental multi-factor deep-learning strategy for
cryptocurrency trading. In: 2022 IEEE 20th International Conference on Industrial Informatics
(INDIN), Perth, Australia, pp 674–680. [Link]
22. Nan L, Tao D (2018) Bitcoin mixing detection using deep autoencoder. In: 2018 IEEE Third
International Conference on Data Science in Cyberspace (DSC), Guangzhou, China, pp 280–
287. [Link]
23. Lyu H (2022) Cryptocurrency price forecasting: A comparative study of machine learning
model in short-term trading. In: 2022 Asia Conference on Algorithms, Computing and Machine
Learning (CACML), Hangzhou, China, pp 280–288. [Link]
2022.00054
24. Encean A-A, Zinca D (2022) Cryptocurrency price prediction using LSTM and GRU
networks. In: 2022 International Symposium on Electronics and Telecommunications (ISETC),
Timisoara, Romania, pp 1–4. [Link]
25. Khan MS, Bazai SU, Ghafoor MI, Marjan S, Ameen M, Shah SAA (2023) Forecasting cryp-
tocurrency prices using a gated recurrent unit neural network. In: 2023 International Conference
on Energy, Power, Environment, Control, and Computing (ICEPECC), Gujrat, Pakistan, pp 1–6.
[Link]
26. Li T (2022) Prediction of bitcoin price based on LSTM. In: 2022 International Conference
on Machine Learning and Intelligent Systems Engineering (MLISE), Guangzhou, China, pp
19–23. [Link]
Chapter 7
Examining Transfer Learning Models
to Classify Brain Tumors from MRI
Images: A Comparative Analysis
1 Introduction
88 U. Sachdev et al.
2. The work demonstrates the efficacy of these models in achieving high accuracy
in tumor classification, even with a relatively limited dataset of 5712 images.
3. It offers a comparative study of the models, highlighting the strengths and
weaknesses of each in terms of accuracy, precision, recall, and F1-score.
4. It suggests the application of these models in early tumor detection, which could
significantly improve patient outcomes.
The remainder of the paper is organized as follows: Sect. 2 provides a review of
the literature, while Sect. 3 outlines the research methodology, including details on
the dataset and its preprocessing, the evaluation metrics used for assessing model
performance, the experimental setup, and a concise overview of the transfer learning
methods employed. Section 4 discusses the obtained results, and Sect. 5 concludes.
2 Literature Review
Convolutional neural networks are widely used for image classification tasks because
their design is inspired by the biological neurons of the human visual cortex. As a
result, they learn the features responsible for accurate classification of images [4].
Abdalla and Esmail [5] proposed a method for brain tumor detection using arti-
ficial neural networks (ANNs). This method uses ANNs to classify normal and
tumorous MRI images of the brain. The method uses statistical feature analysis
to extract features from the images and detect tumors. Another automatic brain
tumor detection method was proposed by Pereira et al. [6] using CNNs with 3 ×
3 kernels. This architecture was used to classify brain MRI images from the BRaT
dataset. Because the dataset was very large (almost 7 GB), training the architecture
on multi-core GPU systems took 30 h.
Abd-Ellah et al. [7] proposed a two-step multi-model automatic brain tumor diag-
nosis where CNNs were used to classify MRI images into normal and abnormal
images. This method uses CNN models Alex-Net, VGG16, and VGG19, and the
method is tested on images from the RIDER neuro MRI database. This database
contains 349 images, including 240 healthy and 109 unhealthy images. In this study,
the CNN was trained with transfer learning because the dataset used was insufficient
to train a model from scratch.
Noreen et al. [8] outline a methodology for extracting and merging multi-level
features to facilitate early detection of brain tumors. The efficacy of this approach
is evaluated using two pre-trained deep learning models, namely Inception-v3 and
DenseNet201. Two distinct scenarios for brain tumor detection and classification
are explored utilizing these models. Khan et al. [9] propose an automated brain
tumor detection model employing pre-trained VGG16 and Inception-V3, utilizing
a dataset comprising 253 images, encompassing 155 tumor images and 98 healthy
images. However, the dataset size proves insufficient for fine-tuning the CNNs, and
the test dataset lacks adequacy for accurately assessing the model’s performance.
Amin et al. [10] introduce a brain tumor detection model utilizing VGG16 with the
BRaTs dataset, achieving 84% accuracy through transfer learning, and fine-tuning
over 50 epochs.
Srivastava et al. [11] introduce a dropout technique aimed at mitigating overfitting
in neural networks by randomly deactivating units and their connections. Dvorák et al.
[12] opt for convolutional neural networks as the learning algorithm due to their
aptitude in handling feature correlation. Their technique is validated on the public
BRATS2014 dataset, yielding state-of-the-art results for brain tumor segmentation
tasks with 254 multimodal volumes, each processed in only 13 s.
Irsheidat et al. [13] construct a model based on Artificial Convolutional Neural
Networks, which utilizes mathematical formulas and matrix operations to analyze
magnetic resonance images, predicting the likelihood of brain tumor presence.
Trained on MRI images of 155 normal and 98 tumorous brains, the model demon-
strates its predictive capability based on a collection of 253 magnetic resonance
images.
Sravya et al. [14] investigated brain tumor detection and presented some important
challenges and techniques. An automated brain tumor detection system was proposed
and studied by Dipu et al. [15] using the YOLO model and the deep learning library
FastAi with the BRATS 2018 dataset, which contained 1,992 MRI scans of the
brain. The authors achieved 85.95% accuracy for YOLO and 95.78% for the FastAi
classification model. A brain tumor detection application was proposed by Gaikwad
et al. [16] to classify MRI images as tumorous and non-tumorous using the VGG16
model. The authors used the Kaggle dataset for training and showed an improvement
in accuracy.
Monirul et al. [17] utilized four transfer learning architectures—InceptionV3,
VGG19, DenseNet121, and MobileNet—with a dataset sourced from three bench-
mark databases, figshare, SARTAJ, and Br35H, to validate the models. These
databases encompass four classes: pituitary, no tumor, meningioma, and glioma.
Image augmentation techniques were employed to ensure class balance. Exper-
imental findings indicate that MobileNet surpasses other methods, achieving an
accuracy of 99.60%.
Kasi et al. [18] examined various deep transfer learning methods, including
InceptionResNet-V2, ResNet50, MobileNet-V2, and VGG16, to determine the
optimal model for detecting brain tumors from a publicly available MRI dataset.
Additionally, CLAHE was applied as an image enhancement technique to enhance
the quality of the image dataset before serving as input for the models. Consequently,
the proposed approach achieved a prediction accuracy of up to 100%.
Anirudh et al. [19] explore the application of transfer learning in detecting brain
tumors using publicly available MRI images of the brain. Initially, they train a deep
learning model on a large dataset of natural images (ImageNet), a task facilitated by
Keras. They then fine-tune this model using a smaller dataset containing brain MRI
images. A comparison between this approach and traditional deep learning methods
indicates that transfer learning succeeds at comparable performance with a signifi-
cantly reduced dataset size. This research underscores the effectiveness of transfer
learning in training deep learning models for brain tumor detection, particularly in
7 Examining Transfer Learning Models to Classify Brain Tumors … 91
scenarios where data availability is limited. Such an approach shows promise for
improving brain tumor detection in clinical settings.
Vinod et al. [20] assessed three fundamental models in computer vision: AlexNet,
VGG16, and ResNet-50. The outstanding performance of both the VGG16 and
ResNet-50 models prompted their integration into a novel hybrid VGG16-ResNet-
50 model. This model was then employed on the dataset, resulting in remarkable
accuracy, sensitivity, specificity, and F1 score. Comparative analysis with alterna-
tive models indicates that the proposed framework demonstrates a high degree of
reliability in effectively identifying various cerebral neoplasms.
Reviewing literature reveals the promising prospects of Convolutional Neural
Networks (CNNs) in accurate image classification tasks. CNNs’ adeptness in feature
extraction using kernels and feature maps contributes to dimension reduction in
inputs, thereby enhancing model efficiency within time and memory constraints.
Insufficient training data often leads to diminished accuracy. In such scenarios,
transfer learning emerges as a potent technique, requiring fewer computations and
parameters compared to training from scratch, thereby enabling higher accuracy even
with limited data. Additionally, fine-tuning stands as a valuable strategy to enhance
model accuracy, particularly when dealing with classification problems differing
from the source of transfer learning.
3 Research Methodology
This section outlines the comprehensive process flow employed for assessing the
transfer learning models covered in this research. In addition, it also presents a brief
description of the dataset used for model training and the evaluation metrics adopted
to assess the model performance.
The investigations conducted in this study were performed using a publicly available
brain tumor dataset acquired from Kaggle. This research studies the performance of
four transfer learning models to classify three different types of brain tumors namely
Glioma, Meningioma and Pituitary, and normal cases, i.e., no tumor.
The distribution of data within each class of training and testing datasets is
illustrated in Fig. 1.
To standardize the dataset, images of varying dimensions were uniformly resized to
200 × 200 pixels, a fundamental step for implementing the proposed method. To
address the common issue of noise in MRI images, cropping techniques were used to
refine the images, focusing on the regions of interest for training. Furthermore, to
augment the dataset and increase its diversity, random transformations such as
rotation and flipping were applied.
Fig. 1 Class distribution of the training and testing (unseen) datasets: Glioma 1321/300,
Meningioma 1339/306, Pituitary 1457/300, No Tumor 1595/405
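The resizing and augmentation operations act on pixel grids; a pure-Python sketch on a toy image follows (an actual pipeline would typically use PIL or NumPy, which is an assumption about tooling here):

```python
def flip_horizontal(img):
    """Mirror each row of a 2-D pixel grid."""
    return [row[::-1] for row in img]

def rotate_90(img):
    """Rotate a 2-D pixel grid 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def resize_nearest(img, h, w):
    """Nearest-neighbour resize, the simplest analogue of resizing to 200x200."""
    src_h, src_w = len(img), len(img[0])
    return [[img[r * src_h // h][c * src_w // w] for c in range(w)]
            for r in range(h)]
```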
The assessment of the proposed brain tumor detection methods hinges on four pivotal
performance metrics: True Positives (TP), True Negatives (TN), False Positives
(FP), and False Negatives (FN). These metrics form the foundation for gauging
the effectiveness of the experimental method.
1. Accuracy (ACC) serves as a metric indicating the model's effectiveness in accurately
identifying brain tumors. It is calculated as the ratio of correctly identified instances
to the total number of instances, as depicted in Eq. (1):

   Accuracy = (TP + TN) / (TP + TN + FP + FN)    (1)

2. Precision indicates the correctness of positive predictions, and signifies the
percentage of correctly identified positive instances among those projected as
positive. This measure is determined by the formula represented in Eq. (2):

   Precision = TP / (TP + FP)    (2)

3. Recall or sensitivity measures the proportion of accurately classified instances
within each classification category. This metric is calculated according to the formula
depicted in Eq. (3):

   Recall = TP / (TP + FN)    (3)

4. F1-score is calculated as the harmonic mean of precision and recall. Its calculation
is based on Eq. (4) as provided below:

   F1-score = 2 × TP / (2 × TP + FP + FN)    (4)
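Equations (1) to (4) translate directly into code, as the following minimal sketch shows:

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, and F1-score from the four counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return accuracy, precision, recall, f1
```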
This section presents a brief description of the four transfer learning models employed
in this study.
3.4.1 InceptionResNet-V2
followed by a dropout layer. The final dense layer has 4 units, corresponding to the
classes in the classification.
3.4.2 MobileNet
3.4.3 ResNet50
3.4.4 VGG16
4.1 InceptionResnet-V2
4.2 MobileNet
The evaluation metrics obtained by MobileNet are shown in Table 2. The model
demonstrates strong performance in all classes, with the No_tumor class showing
high scores, indicating effective identification. The Glioma, Meningioma, and Pitu-
itary classes also have high scores, contributing to a balanced overall performance.
4.3 ResNet50
4.4 VGG16
The metrics obtained by the model are shown in Table 4. The model shows good
performance across all classes, with high precision, recall, and F1-score values. The
No_tumor class stands out with perfect precision and a high F1-score, indicating that
the model is very accurate in identifying No_tumor cases. The Glioma class has a
lower recall, suggesting that the model might have missed some instances of Glioma.
The Pituitary class has a lower precision, indicating a higher false positive rate.
The comparison of the four models based on their testing accuracy is presented
in Table 5.
Comparative Insights
Table 5 Comparison of proposed models based on their testing accuracies

Model                 Test accuracy (%)
InceptionResNet-V2    98.78
MobileNet             98.25
ResNet50              98.78
VGG16                 96.26
• InceptionResNet-V2 and ResNet50 emerge as the top performers with equal test
accuracy, indicating their superior capability in brain tumor classification.
• While slightly less accurate, MobileNet’s performance is commendable, espe-
cially considering its potentially lower computational demands, making it suitable
for resource-constrained environments.
• Despite its lower accuracy, VGG16 could be a viable option in scenarios with
specific requirements and computational constraints.
• The comparable accuracies of ResNet50 and InceptionResNet-V2 highlight
the effectiveness of transfer learning, particularly in limited dataset scenarios,
achieving higher accuracy more rapidly than training models from scratch.
• The implementation of fine-tuning alongside transfer learning allows for model
weight adjustments to better fit specific problems, enhancing accuracy levels.
Chapter 8
Supervised Machine Learning
for Recognition of Gujarati Handwritten
Characters with Modifiers
1 Introduction
One of the primary problems in pattern recognition and image processing is hand-
written character recognition (HCR) [1]. One method that can turn text included in a
digital image into editable text is optical character recognition. It makes use of optical
systems to enable character recognition in machines. Ideally, the OCR’s output and
the formatting input should match. The procedure entails pre-processing the image
file and then learning crucial information regarding written text [2].
Character recognition can be divided into two primary categories: online and
offline. An offline character recognition system generates a document first, scans it
optically, digitises it, stores it on a computer, and then takes it for processing and
testing. In online systems, by contrast, characters are handled while they are being
created, so the pattern's points depend on various factors such as time, pressure,
speed, slant, and strokes [3, 4].
A system that transforms incoming text into a machine-encoded format is called
optical character recognition (OCR) [5]. These days, optical character recognition
(OCR) aids in both the digitisation of handwritten mediaeval manuscripts [6] and the
conversion of typewritten materials into digital format [7]. Because of this, obtaining
the necessary information is now simpler because it is no longer necessary to sift
through mountains of papers and files in order to find it. Organisations are meeting
demands for legal records [8], historical data [9], educational persistence [10], and
other digital preservation needs.
It is necessary to create optical character recognition in order to recognise
various languages. The language spoken in Gujarat is called Gujarati. There are
many handwritten Gujarati documents available. To convert handwritten documents
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025 99
H. Mittal et al. (eds.), Proceedings of International Conference on Paradigms of
Communication, Computing and Data Analytics, Algorithms for Intelligent Systems,
[Link]
100 S. Shukla and P. Tanna
2 Related Work
Kajale [11] trained a model with 36 alphanumeric characters and special symbols
such as . , ' " ? / \ | ! # across 200 samples. Models can be trained using a supervised
machine learning approach. The model first identifies whether a character is
handwritten or printed, then matches the character ID against the corresponding
handwritten or printed character dataset, achieving an accuracy of 90% in
recognising nameplates written in English.
Ehsan Shirzadi [12] implemented LVQ as a supervised machine learning
algorithm to identify lower-case machine-written English characters. Within the field
of machine learning and artificial neural networks, the Learning Vector Quantisation
(LVQ) algorithm can be used to solve supervised classification problems [13]. The
researcher used Matlab as a platform and, after applying LVQ, achieved near-100%
accuracy.
In 2014, Mahajan et al. [14] designed and demonstrated an optical correlator for
character recognition using a neural network architecture.
for an optical character recognition (OCR) system trained using the back propagation
algorithm, where each typed English character is encoded as a binary value. These
binary numbers are entered into a feature extraction system. The ANN is then given
both the system’s output and its input. After implementation of the Feed Forward
8 Supervised Machine Learning for Recognition of Gujarati Handwritten … 101
now able to create algorithms and strategies that can more accurately recognise hand-
written manuscripts, thanks to advancements in machine learning and deep learning
[21].
Researchers have also increasingly used Convolutional Neural Networks (CNNs)
to recognise handwritten and machine-printed characters, because recognition tasks
whose input is an image are a good fit for CNN-based systems. The ImageNet Large
Scale Visual Recognition Challenge (ILSVRC) is one example of an image problem
to which CNNs were first applied [21]. For
visual recognition tasks, some of the popular CNN-based designs are AlexNet [22],
GoogLeNet [23], and ResNet [24].
3 Classification Process
3.2 Segmentation
Each character image is segmented, stored in a labelled folder, and then converted
to a grayscale image. An RGB image is composed of pixels combining three colour
channels, which makes it heavy to load; converting it to grayscale yields an image
with fewer dimensions, which is easier to use in the system.
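The RGB-to-grayscale conversion can be sketched with the standard ITU-R BT.601 luminance weights (an illustrative NumPy version; the chapter does not name the library it used):

```python
import numpy as np

def to_grayscale(rgb):
    """Convert an (H, W, 3) RGB image to an (H, W) grayscale image
    by a weighted sum of the three colour channels."""
    weights = np.array([0.299, 0.587, 0.114])  # BT.601 luminance weights
    return rgb @ weights
```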
Machine learning is the process of enabling machines to learn from data and predict
results based on the classification used. Machine learning allows computers
to learn automatically from data and generate predictions. One of the two main
categories of machine learning is supervised learning, which enables a model to
forecast future results after it has been trained using historical data. In supervised
learning, the model is trained using pairs of input and output or labelled data with the
aim of generating a function that is sufficiently approximated to be able to predict
outputs for given inputs when they are introduced [26].
Supervised machine learning learns the relationship between input and output.
If features are extracted from the input and compared with the training dataset,
then a prediction can be made. There are many algorithms using which we can
implement supervised machine learning, like Linear Regression, Support Vector
Machine (SVM), Decision Tree, Random Forest, K-Nearest Neighbour (kNN), Naive
Bayes, Gaussian Naive Bayes, and Logistic Regression.
For the presented research study, the dataset is ready, and it is converted to a
numpy array. The prepared numpy array can be used directly as input for model
building. The following steps are performed to prepare a model:
• Step 1: Convert the dataset into a numpy array of data and label.
• Step 2: Load data and target. [[Link]() helps load data from a numpy array.]
• Step 3: Data Preprocessing. [All the elements of the array are checked; if it finds
unlabeled data, it will be deleted from the array.]
• Step 4: Train and test split. [x_train, x_test, and y_train, y_test are generated. In
the study, 80% of the data is used as training, 10% for validation, and 10% for
testing.]
• Step 5: Reshape the arrays x_train and x_test. [This can be used to flatten an
array.]
• Step 6: Setting Hyperparameters for Hyperparameter Tuning. [Grid search can be
used to set the hyperparameters.]
• Step 7: Building a Model and Evaluation.
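Steps 1–5 above can be sketched in plain NumPy (an illustrative sketch: the 80/10/10 split ratio comes from the text, while the function name and array shapes are assumptions):

```python
import numpy as np

def split_dataset(data, labels, seed=0):
    """Shuffle, split 80/10/10 into train/validation/test, and flatten."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(data))               # shuffle the samples
    n_train = int(0.8 * len(data))
    n_val = int(0.1 * len(data))
    train, val, test = np.split(idx, [n_train, n_train + n_val])
    # Step 5: reshape each image into a flat feature vector
    x_train = data[train].reshape(len(train), -1)
    x_val = data[val].reshape(len(val), -1)
    x_test = data[test].reshape(len(test), -1)
    return (x_train, labels[train]), (x_val, labels[val]), (x_test, labels[test])
```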
The process of determining the ideal collection of hyperparameters for a machine
learning model to achieve optimal performance is called hyperparameter tuning.
Decision Trees are a type of supervised learning algorithm: they learn from a
labelled dataset, connecting the input features to the target labels. [Link] provides
DecisionTreeClassifier(), which can be used to implement a decision tree
classification algorithm. We use the max_depth, min_samples_leaf, and
min_samples_split parameters as hyperparameters and achieve an accuracy of
64.7%. The classification report is presented in Table 1.
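A hedged sketch of this setup with scikit-learn (the parameter-grid values and the stand-in digits dataset are illustrative assumptions, not the study's actual data or values):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in data; the study uses handwritten Gujarati character images.
X, y = load_digits(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Grid search over the three hyperparameters named in the text.
param_grid = {
    "max_depth": [10, 20, None],
    "min_samples_leaf": [1, 5],
    "min_samples_split": [2, 10],
}
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=3)
search.fit(x_train, y_train)
test_accuracy = search.score(x_test, y_test)
```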
Gaussian Naive Bayes is a special version of the Naive Bayes algorithm, and it is
intended for data whose features are thought to have a Gaussian (normal) distribution.
It is particularly well-suited for classification tasks involving continuous features.
This algorithm gives an accuracy of 34%. The classification report is described
in Table 3.
3.10 KNN
The Random Forest algorithm trains a large number of decision trees and
aggregates their results: the class given by the mode of the individual trees'
predictions is returned. The stability and efficacy of the random forest method make
it one of the most widely used and adaptable ones. We use the max_depth,
min_samples_leaf, and min_samples_split parameters as hyperparameters and
achieve an accuracy of 77%. The classification report is presented in Table 7.
All the machine learning algorithms give different accuracies for our study. Table 8
presents a comparison of the accuracies of all the supervised machine learning
algorithms. The comparison shows that maximum accuracy is achieved using the
Random Forest and Support Vector Machine algorithms; Fig. 2 compares the
implemented classification algorithms.
4 Conclusion
Using eight different classification algorithms, we find that the Random Forest and
Support Vector Machine algorithms perform the best for our needs; the other
algorithms do not deliver satisfactory results. The three classes of modifiers in the
set of handwritten Gujarati letters can be identified with 75% and 77% accuracy
using SVM and Random Forest, respectively.
This study focuses on three Gujarati character classes. In Gujarati, there are a
total of 476 classes that combine consonants with modifiers. It took three to four
hours to train and assess the model in the experiment that was given. The accuracy
will decrease, and the training duration of the model will increase with the number
of classes. This dataset is likewise constrained. It is possible to increase the dataset
by acquiring more data from individuals. We can employ a deep learning algorithm
with a larger dataset and a large number of classes for classification.
References
25. Suthar SB, Thakkar AR (2023) Dataset generation for Gujarati language using handwritten
character images. PREPRINT (Version 1) available at Research Square. [Link]
21203/[Link]-3041349/v1
26. Sen PC, Hajra M, Ghosh M (2020) Supervised classification algorithms in machine learning:
a survey and review. In: Mandal J, Bhattacharya D (eds) Emerging technology in modelling
and graphics. Advances in intelligent systems and computing, vol 937. Springer, Singapore.
[Link]
Chapter 9
Gujarati Handwritten Conjunct
Consonant Recognition Using Deep
Learning
1 Introduction
The Gujarati script is derived from the Devanagari script. Over 50 million individuals,
primarily Gujarati people from Gujarat, India, and around the world, speak
Gujarati. Today, the majority of document management systems in government
buildings are primarily text-based. There is a vast quantity of publications and books
available in printed or scanned formats. A solution is needed that can accurately
identify characters in scanned paper documents. An optical character recognition
system is a computer system that can recognize character types from an image or
document and process them automatically.
Researchers are focusing on developing several OCR models for different
languages. Utilizing Artificial Neural Networks for pattern and image recognition
can enable the creation of a self-learning model that can identify raw scanned images
based on existing models.
Handwritten words encompass a range of writing styles, dimensions, and curves,
making interpretation challenging. Performing OCR in the Gujarati language is
challenging because sensitivity to small variations in writing style can lead to
inaccurate character identification.
114 R. Chaudhari and P. Tanna
2 Literature Review
3 Proposed Method
Deep learning techniques rely on Artificial Neural Networks (ANN) that mimic the
brain’s information processing to offer a self-learning representation feature. Deep
learning, similar to a trained robot with self-learning, utilizes several algorithms to
construct models. Deep learning models can autonomously identify and prioritize
relevant features with minimal programmer intervention. During the training process,
the algorithm uses unfamiliar data as input to categorize objects, perform feature
extraction, and provide meaningful information. Deep learning models provide both
feature extraction and classification as shown in Fig. 1.
When dealing with a high volume of inputs as well as outputs, Deep Learning
can be utilized. It is used to imitate human behavior. Deep Learning is executed by
Neural Networks, which are composed of Neurons. Deep learning models consist of
several algorithms, and it is essential to choose the best suitable one for a specific
task.
CNN is a deep learning algorithm specifically created for processing data with a grid-
like layout, like photographs. It is created using the visual brain structure of animals
to gather spatial data hierarchies that include basic and complex patterns. CNN is a
mathematical model consisting of three main types of layers: convolution, pooling,
and fully connected layers, as shown in Fig. 2. The
initial two layers, convolution and pooling, identify features, whereas the third layer,
a layer that is completely interconnected, transforms these characteristics into the
ultimate output, like classification. A convolution layer is a fundamental component
of CNNs that consists of a series of mathematical operations, including convolution,
which is a specific sort of linear operation.
The CNN structure uses Forward propagation as the process of transforming input
data into output across layers, while backward propagation is the reverse process.
9 Gujarati Handwritten Conjunct Consonant Recognition Using Deep … 117
A pooling layer reduces the size of feature maps, creating translation invariance to
small shifts and distortions, and reducing the number of trainable parameters in subse-
quent layers. The layers of pooling do not possess any trainable parameters. The filter
size, stride, and padding are hyperparameters used in pooling procedures, similar to
convolution processes. Max pooling is a frequently utilized pooling technique that
retains just the maximum value inside the specified kernel size.
Max pooling: Max pooling is the most commonly used pooling procedure. It
involves selecting patches from the provided feature maps, identifying the highest
value for each patch, and discarding the rest.
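Max pooling as described can be sketched in NumPy (an illustrative 2 × 2, stride-2 version; the actual kernel size used in the model is not specified):

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2: keep only the maximum of each
    non-overlapping 2x2 patch. H and W are assumed even."""
    h, w = feature_map.shape
    patches = feature_map.reshape(h // 2, 2, w // 2, 2)
    return patches.max(axis=(1, 3))
```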
The final convolutions or pooling layer’s output feature maps are usually flattened into
a one-dimensional array of numbers and then linked to a number of fully connected
layers, also called dense layers, where each input is linked to every output through a
trainable weight.
The suggested model, which is depicted in Fig. 3, explains how the data is processed
and which technique is applied for recognition using the stages that are listed below.
Steps:
(1) In this paper, we use a custom database in which we create more than 960 classes
and more than 15 k images in the lab.
(2) After loading the data, apply normalization and pre-processing to remove noise
and filter the images.
(3) Divide the data into training, testing, and validation sets for the CNN-based
classification approach.
(4) Data augmentation is a method that creates additional training information by
applying different modifications to the images already present in the dataset.
ImageDataGenerator is used for the image augmentation approach, scaling
and reshaping images as required.
(5) The CNN-based EfficientNetB3 architecture is used for classification, and
accuracy is calculated.
and its parameters were created and implemented in the proposed model, along with
other parameters that affect the performance of the system's final outcome.
Figure 5 shows the model result for each epoch: the number of epochs, the accuracy
achieved after running, and graphs of time, loss, and accuracy against epochs.
According to the result, the model stops at epoch 5 because early stopping is used,
so there is no risk of over-fitting. The training and validation accuracy and loss
curves are shown in Fig. 6.
We achieved superior performance by implementing the proposed model using
CNN with EfficientNetB3, resulting in high precision and recall in classifying all
tested input samples. Figures 4, 5, 6 and 7 display the parameters from Table 2 used
to execute the model, leading to improved outcomes. Table 2 shows classification
reports for some of the conjunct consonants from the 960 classes, with precision,
recall, F1-score, and support obtained from the proposed model.
4 Conclusion
This research showcases the application of a deep learning model for identifying
handwritten Gujarati joint characters. The main objective was to develop algorithms
for identifying joint characters in handwritten Gujarati text and then evaluating their
results. Thorough research is conducted on identifying handwritten joint characters.
The models for implementation have been chosen accordingly. To achieve the goals,
a dataset consisting of 80,000 photos was developed first. The dataset was divided
into training, testing, and validation sets. The photographs’ dimensions were altered
according to the requirements of the algorithms. The CNN-based EfficientNetB3
model was used to test and validate handwritten Gujarati joint characters, achieving
an accuracy of over 83.88%.
Chapter 10
Multimodal Deep Learning for Enhanced
Prediction of Molecular Binding
Affinities Integrating Chemical
Structures and Protein Sequences
1 Introduction
Deep learning was utilized in the Improved Molecular Binding Affinity technique to
evaluate the binding strength between protein sequences and drug structures. This
approach took into account protein sequences represented as amino acid sequences
and drug structures expressed as SMILES. The term “drug structure” describes the
molecular makeup and atomic arrangement of a pharmaceutical molecule.
This data is critical for the design, optimization, and prediction of drug action in
human tissue, including bioavailability, efficacy, and potential side effects. SMILES,
or the Simplified Molecular Input Line Entry System, is a human-readable notation
system that uses ASCII letters to describe chemical structures. A number of characters
were used to represent bonds; these characters represented ring closures, branches,
and single, double, or triple bonds. The atomic symbols were used to identify the
elements. SMILES may have applications in cheminformatics, bioinformatics, and
chemistry. In the SMILES analysis of imatinib in the DAVIS dataset, character-level
tokenization is applied to the provided SMILES sequence using the SMILES notation
CN1CCN(CC1)C2…. This can only be accomplished by breaking the sequence up
into distinct tokens, which yield tokens such as “C,” “N,” “1,” “C,” “C,” “N,” and
126 L. Prasika et al.
A. Dataset
Two datasets were used to evaluate the performance of the deep learning framework:
Davis and KIBA. These datasets provided a collection of drug-target pairs with
experimentally or computationally determined binding affinities.
10 Multimodal Deep Learning for Enhanced Prediction of Molecular … 127
The first dataset was DAVIS, which captures the interactions between 68 kinase
inhibitors and 442 kinases, covering over 80% of the human catalytic protein kinome.
It consisted of 30,056 Drug Target Interaction pairs as shown in Table 1. The binding
affinity between kinase inhibitors and protein kinases was to be predicted. The dataset
includes information about the target amino acid sequence and compound SMILES
string. This dataset was a valuable resource for developing models in drug discovery,
particularly for understanding and predicting kinase inhibitor interactions.
The second dataset is the KiBA (Kinase inhibitor Bio-Activity) dataset. Kinase
is a biocatalyst enzyme which transfers the phosphate group from ATP to a specific
molecule. The dataset consisted of 118,257 drug-target interaction pairs involving
2,111 drugs and 229 proteins. The goal was to predict the binding affinity between
drugs and target proteins based on their amino acid sequence or compound SMILES
string. The dataset contained a lot of information about how kinase inhibitors
(compounds that influence proteins) interact with proteins. The dataset helped models
to predict how well certain drugs might work in the process of discovering new
medicines.
B. Concatenation and Fusion
Various features obtained from SMILES and protein sequence processing were
successfully merged using the concatenation approach. Accurate representation of
the interactions between chemical and biological components was achieved by this
methodical integration. Molecular structures and protein compositions interact intri-
cately, and the model could comprehend these connections by creating a feature
map. This combination allowed the model to function with both kinds of data, which
Table 1 Datasets

Dataset  Drugs  Proteins (targets)  Interactions
DAVIS    68     442                 30,056
KiBA     2,111  229                 118,254
improved results and increased the accuracy of binding affinity predictions—a crit-
ical component of the deep learning-based Improved Molecular Binding Affinity
technique.
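The concatenation step amounts to joining the feature vectors produced by the SMILES branch and the protein branch into one fused vector (an illustrative NumPy sketch; the 128- and 256-dimensional sizes are assumptions):

```python
import numpy as np

def fuse_features(drug_features, protein_features):
    """Concatenate drug- and protein-derived feature vectors into a
    single fused representation for the affinity predictor."""
    return np.concatenate([drug_features, protein_features])

drug_vec = np.random.rand(128)     # e.g. output of the SMILES branch
protein_vec = np.random.rand(256)  # e.g. output of the protein branch
fused = fuse_features(drug_vec, protein_vec)
```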
In the early stages of SMILES processing, sequences and chemical structures in the
Simplified Molecular Input Line Entry System (SMILES) underwent the necessary
processing steps. The SMILES strings were divided into separate tokens using a
character-level tokenizer. A structured numerical representation of the chemical data
was then produced by numerically converting these tokens. The SMILES model’s
capacity to identify minute patterns within the chemical structure was improved
by the addition of embedding layers, which encode associations between various
characters.
Protein sequences were processed in parallel in the same way. They were split into individual tokens with a character-level tokenizer, and the tokens were then converted to integers, producing an ordered numerical representation of the protein data. Embedding layers encoded the connections between the different characters in the protein sequences, enhancing the model's capacity to identify subtle patterns within them.
C. Neural Network Architecture
Using a deep learning framework, the design started with two input layers, each representing a distinct kind of data, in order to improve the molecular binding affinity method. In the SMILES sequence processing branch, the initial stage was tokenization and numerical conversion using a character-level tokenizer. The
SMILES strings were divided into individual characters by this procedure, and
those characters were subsequently numerically encoded to provide a structured
representation. The correlations between various characteristics were then recorded
using an embedding layer, which produced a numerical depiction of the chemical
data. Character-level tokenization created tokens like ‘C’, ‘1’, ‘=’, ‘C’, and so on
in the SMILES sequence “C1=CC2=C(C=C1C3=NC(=NC=C3)N)NN=C2N,” for
example. A predefined mapping was then applied to each character to transform it
to a numerical representation.
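The character-level tokenization and integer encoding described above can be sketched as follows; the vocabulary built here is a hypothetical stand-in for the paper's predefined mapping.

```python
# Character-level tokenization of the SMILES string quoted in the text,
# then integer encoding; the vocabulary below is a hypothetical stand-in
# for the paper's predefined mapping (0 is reserved for padding).
smiles = "C1=CC2=C(C=C1C3=NC(=NC=C3)N)NN=C2N"

tokens = list(smiles)  # character-level tokens: 'C', '1', '=', ...
vocab = {ch: i + 1 for i, ch in enumerate(sorted(set(tokens)))}
encoded = [vocab[ch] for ch in tokens]

print(tokens[:5])   # ['C', '1', '=', 'C', 'C']
print(encoded[:5])
```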
Tokenization was also performed at the character level on the protein sequence MARTTSQLYDAVP….. This sequence was split into discrete tokens such as “M,” “A,” “R,” and so on, and the tokens were then numerically encoded using a predefined mapping. Conv1D layers processed the embedded protein and SMILES sequences in
numerous ways. They contributed to the prediction outcome by preserving translation
invariance for pattern recognition, identifying local patterns in sequential data, and
acting as filters to extract advanced features. Conv1D layers also improved training
efficiency by reducing dimensionality through pooling operations.
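A minimal NumPy sketch of what a Conv1D layer does to an embedded sequence: sliding filters that detect local patterns, followed by max pooling to reduce dimensionality. The shapes and filter counts here are illustrative, not the paper's.

```python
import numpy as np

def conv1d(x, kernels):
    """Valid 1D convolution: x is (seq_len, emb_dim),
    kernels is (n_filters, width, emb_dim)."""
    n_filters, width, _ = kernels.shape
    out_len = x.shape[0] - width + 1
    out = np.zeros((out_len, n_filters))
    for i in range(out_len):
        window = x[i:i + width]  # local pattern seen by every filter
        out[i] = np.tensordot(kernels, window, axes=([1, 2], [0, 1]))
    return out

rng = np.random.default_rng(0)
embedded = rng.normal(size=(40, 8))    # toy embedded sequence
filters = rng.normal(size=(16, 4, 8))  # 16 filters of width 4
features = conv1d(embedded, filters)
pooled = features.max(axis=0)          # max pooling reduces dimensionality
print(features.shape, pooled.shape)    # (37, 16) (16,)
```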
In conjunction with BiLSTM layers, they yielded a thorough comprehension of the input sequences. Processing the SMILES and protein sequences used bidirectional long short-term memory (BiLSTM) networks. BiLSTM captured long-range relationships and affiliations in sequential data and successfully represented patterns and contextual dependencies within the sequences, enabling
10 Multimodal Deep Learning for Enhanced Prediction of Molecular … 129
A. Evaluation Criteria
Reliability and accuracy of a model that predicts affinity relations for molecular
interaction are critical to computational molecular biology and drug development.
This section describes the assessment criteria used to assess the model’s execution. To
increase computing efficiency and quicken the assessment process, GPU acceleration
is used. In order to quantify accuracy, mean absolute error, or MAE, calculates the
average absolute difference between actual and projected affinity connections.
A lower MAE indicates that forecasts and the ground truth are more closely aligned. Root Mean Squared Error (RMSE), derived from the MSE, is presented as a metric that can be interpreted in the same units as the predicted variable. A lower RMSE implies increased accuracy and dependability.
The R-squared (R²) score measures the percentage of binding affinity variance that the model explains. A greater R² denotes a better fit, indicating that the model can adequately describe and capture the variation in molecular affinity. Mean Squared Error (MSE) quantifies the squared average difference and highlights the impact of larger errors, offering a thorough view of overall efficiency.
Better accuracy and dependability are indicated by a lower MSE. The Concordance Index (CI), a popular statistic used specifically for drug-target interaction prediction, evaluates how well the model can rank possible drug-target pairings. This section assesses the dependability of CI values in predicting drug-target affinity using comparisons between two widely used datasets, DAVIS and KIBA.
Through these assessment criteria, we obtain a solid grasp of the model's dependability for drug discovery and development, covering the accuracy, precision, and overall efficacy of its binding affinity predictions. In our study, we use these assessment measures to see how well the prediction models perform in regression. For regression we employ the commonly used Mean Squared Error (MSE), a metric that expresses how far actual values differ from predicted values.
MSE = (1/a) Σ_{c=1}^{a} (E_c − Ê_c)²    (1)

The MSE, as in (1), is calculated as the average of the squared differences across all a samples, where E_c denotes the actual value and Ê_c the predicted value of sample c. A smaller MSE signifies heightened accuracy in our regression predictions.
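The regression metrics above can be computed directly; the affinity values here are toy numbers for illustration only.

```python
import numpy as np

# Toy actual (E_c) and predicted affinity values, illustrative only.
actual = np.array([5.0, 6.2, 7.1, 4.8])
pred = np.array([5.3, 6.0, 6.8, 5.0])

mse = np.mean((actual - pred) ** 2)   # Eq. (1)
mae = np.mean(np.abs(actual - pred))  # mean absolute error
rmse = np.sqrt(mse)                   # same units as the target
r2 = 1 - np.sum((actual - pred) ** 2) / np.sum((actual - actual.mean()) ** 2)
print(round(mse, 4), round(mae, 4), round(rmse, 4))  # 0.065 0.25 0.255
```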
CI = (1/Z) Σ_{∂s > ∂t} W(b_s − b_t)    (2)

The Concordance Index (CI) is calculated as in (2) by determining whether the predicted affinity scores match the actual values in the same order. The formula incorporates ∂s and ∂t as affinity scores with ∂s > ∂t, b_s as the predicted value associated with the larger affinity score ∂s, b_t as the predicted value associated with the smaller affinity score ∂t, Z as a normalization term over the comparable pairs, and W(u) as a piecewise function defining thresholds for alignment as in (3).
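A minimal sketch of the Concordance Index, assuming the commonly used step weights W(u) = 1 for u > 0, 0.5 for u = 0, and 0 for u < 0, with normalization by the number of comparable pairs (the exact W in (3) is not shown in this excerpt).

```python
def concordance_index(actual, pred):
    """CI over all comparable pairs (actual[s] > actual[t], i.e. ∂s > ∂t):
    W(u) = 1 for a correctly ordered pair, 0.5 for a tie, 0 otherwise."""
    num, den = 0.0, 0
    n = len(actual)
    for s in range(n):
        for t in range(n):
            if actual[s] > actual[t]:  # comparable pair
                den += 1
                if pred[s] > pred[t]:
                    num += 1.0         # correctly ordered
                elif pred[s] == pred[t]:
                    num += 0.5         # tie in the predictions
    return num / den

actual = [5.0, 6.2, 7.1, 4.8]
pred = [5.3, 6.0, 6.8, 5.0]
print(concordance_index(actual, pred))  # 1.0 (predictions preserve the order)
```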
These assessment criteria function as strong predictors of our models’ predic-
tive success, offering insightful information on the precision, accuracy, and ranking
stability of our methodology in the complex field of drug-target interaction predictions. With a CI of 0.832 and an MSE of 0.155, this model achieves good concordance and reasonable accuracy in binding affinity prediction.
B. Settings of Hyperparameters
Optimizing the performance of the model is an essential part of enhancing its operation, requiring accurate adjustments to its configuration. The values chosen for these settings, listed in Table 2, have a substantial impact on the nature of the learning procedure and, as a result, on the neural network's predictive power.
A batch size of 256 was selected for all training and validation stages. This configuration guaranteed representative samples for model updates while optimizing memory use and processing speed. The model was trained for a total of 100 epochs, a choice based on a careful balance between preventing overfitting and achieving an acceptable fit. Dropout, a regularization technique, was applied at a rate of 0.1; by randomly omitting a portion of nodes during training, it helps to mitigate overfitting.
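The hyperparameters reported in this section can be collected in a single configuration; the dict keys below are illustrative names, not the paper's code.

```python
# Hyperparameters reported in this section, collected in one place;
# the dict keys are illustrative names, not the paper's code.
config = {
    "batch_size": 256,                 # training and validation batches
    "epochs": 100,
    "dropout": 0.1,                    # regularization rate
    "embedding_dim": 128,              # SMILES and protein embeddings
    "conv_filter_widths": [4, 6, 8],   # three Conv1D layers per branch
    "dense_units": [1024, 1024, 512],  # fully connected head
}
print(config["dense_units"])  # [1024, 1024, 512]
```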
The embedding layers for the SMILES and biological sequences were set to 128 dimensions, a decision that strikes a balance between computing efficiency and depth of representation. Convolutional layers were utilized to extract local patterns from the input sequences: for the SMILES and the alternative biological branches, three convolutional layers with filter widths of 4, 6, and 8 were used, respectively. The fully connected dense layers have 1024, 1024, and 512 dimensions.
Table 4 The average CI and MSE scores on the test set for the DAVIS and KIBA datasets for our model

Model        Target rep     Drug rep       CI      MSE
DTABinding   SeqM & StruM   SeqM & StruM   0.832   0.155
Furthermore, the low Mean Squared Error (MSE) of 0.155 suggests that the
model is better at making precise predictions about the strength of these interac-
tions. Our model will be a valuable tool for predicting and understanding drug-target
interactions.
4 Conclusion
The research presented a bioinformatics strategy for DTI estimation that combined multiple network architectures and encoding approaches. Two popular datasets, DAVIS and KIBA, were tokenized, encoded, and used to evaluate the architecture. Concatenating the outputs of the network configurations produced the best results. These investigations mark substantial progress in computational molecular biology toward predicting molecular binding affinities.
Combining information from biological and chemical sources improved the model's ability to identify complex patterns and supports drug development strategies. With the development of more productive computational methods, the deep learning-based Improved Molecular Binding Affinity technique represents a significant step toward broad and accurate predictions in biochemical affinity research.
The integration of chemical and biological data enhances the model's capability to recognize complex patterns, supporting drug discovery processes. As computational methods continue to evolve, this multimodal deep learning approach represents a significant step toward accurate and extensive predictions in molecular interaction studies.
References
1. Pu Y, Li J, Tang J, Guo F (2022) Deep fusion DTA: drug-target binding affinity prediction with
information fusion and hybrid deep-learning ensemble model. IEEE/ACM Trans Comput Biol
Bioinform 19(1)
2. Zhao Y, Yang Z et al (2023) Improving protein function prediction by adaptively fusing information from protein sequences and biomedical literature. IEEE J Biomed Health Inform 27(2)
3. Jiang Y, Quan L, Li K, Li Y et al (2023) DGCddG: deep graph convolution for predicting
protein-protein binding affinity changes upon mutations. IEEE/ACM Trans Comput Biol Bioinf
20(3)
4. Dhanuka R, Singh JP, Tripathi A (2023) A comprehensive survey of deep learning techniques
in protein function prediction. IEEE/ACM Trans Comput Biol Bioinf 20(4)
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025 137
H. Mittal et al. (eds.), Proceedings of International Conference on Paradigms of
Communication, Computing and Data Analytics, Algorithms for Intelligent Systems,
[Link]
138 N. Udaya Kumar et al.
in turn, operating speed, are continuing. In digital computers, basic arithmetic oper-
ations are implemented using gates like AND, OR, NOR, NAND, etc. Multiplica-
tion is achieved through repeated addition, subtraction by negating, and division by
repeated subtraction. Hence, the adder plays a major role in performing arithmetic
operations. Additions can be done with various adders, and one of those adders is
the ripple carry adder, where each full adder waits for the carry from the previous
one. Lamani et al. [2] compared various adder topologies such as the Ripple Carry Adder (RCA), Carry Save Adder (CSA), Carry Skip Adder (CSkA), and Carry Select Adder (CSLA), concluding that while the RCA has a basic architecture, carry propagation causes significant latency. For faster computation, more advanced adder designs such as the Carry Save Adder (CSA), Carry Increment Adder (CIA), Carry-Lookahead Adder (CLA), or Carry Select Adder (CSLA) are used.
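The ripple carry behaviour described above, in which each full adder waits for the previous stage's carry, can be sketched bit by bit (a functional model, not the gate-level design):

```python
def full_adder(a, b, cin):
    """One-bit full adder built from the gate-level equations."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

def ripple_carry_add(a_bits, b_bits, cin=0):
    """LSB-first ripple carry adder: each stage waits for the previous carry."""
    out, carry = [], cin
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        out.append(s)
    return out, carry

# 0110 (6) + 0111 (7) = 1101 (13), bits given LSB first
s, cout = ripple_carry_add([0, 1, 1, 0], [1, 1, 1, 0])
print(s, cout)  # [1, 0, 1, 1] 0
```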
Raju and Kumari [3] focused on modifying the Carry Save Adder (CSA) for lower power consumption. While discussing the benefits of the CSA in faster computation, they mention the requirement for an extra adder stage to obtain the final sum, which can be disadvantageous in some applications. The CSA developed by Bennet et al. [4] is power-efficient but occupies more area. Varshney and Arya [5] proposed an improved Carry Increment Adder (CIA) by replacing the Ripple Carry Adder (RCA) with a Carry Look-Ahead Adder (CLA), Kogge-Stone adder, and Han-Carlson block. Although there is a reduction in complex circuitry, the modified CIA is less efficient in terms of area and power utilization. They also discussed the performance metrics of the Han-Carlson adder, which provides lower delay and improved area utilization.
Akbari et al. [6] mentioned that increasing the width of the Carry Look-Ahead Adder (CLA) increases the delay, area usage, and power consumption of the carry generator units, so the CLA suffers from design complexity as the number of variables increases. Tapadar et al. [7] acknowledged that, to avoid this latency, the Carry Select Adder (CSLA) is suggested over all the other adders, although it consumes more area. The CSLA excels at reducing carry propagation delay through parallel carry generation but is considered less area-efficient because of the cascading of Ripple Carry Adders (RCAs).
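The carry-select principle, precomputing both candidate results in parallel and letting the real carry pick one, can be sketched functionally (block width and count here are illustrative):

```python
def carry_select_block(a, b, width, cin):
    """One CSLA block: both candidate results are precomputed in parallel
    and the real carry-in acts as a multiplexer select line."""
    mask = (1 << width) - 1
    sum0, sum1 = a + b, a + b + 1  # candidates for cin = 0 and cin = 1
    chosen = sum1 if cin else sum0
    return chosen & mask, chosen >> width

def csla_add(a, b, width=4, blocks=4):
    """16-bit carry-select adder built from 4-bit blocks."""
    mask = (1 << width) - 1
    result, carry = 0, 0
    for i in range(blocks):
        s, carry = carry_select_block((a >> (i * width)) & mask,
                                      (b >> (i * width)) & mask, width, carry)
        result |= s << (i * width)
    return result, carry

s, cout = csla_add(0xABCD, 0x1234)
print(hex(s), cout)  # 0xbe01 0
```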
Swetha and Reddy [8] designed a Carry Select Adder using D-latches and multiplexers. Compared with high-speed adders such as the Brent-Kung adder, it consumes more power and incurs more delay. The major constraints of the above adder are its area and power consumption. Hence, focusing on the area and power utilization of the adder, a new adder is proposed using a hybrid approach, in which the adder is designed by employing multiple logic circuits. Different adders were therefore studied from various reference papers to determine their performance metrics.
Suganya et al. [9] mentioned that Ling adders achieve lower latency and use less area in contrast to the Carry-Lookahead Adder. Also, based on the study by Gaur et al. [10], the Weinberger adder reduces logic stages and final carry generation duration in comparison with other adders. According to Refs. [11–13], topologies based on Binary to Excess-1 Converters (BECs) minimize logic redundancy, which lowers power and area usage.
11 A Comprehensive Performance Analysis of Area and Power-Efficient … 139
The hybrid adder combines these modules synergistically. The Ling adder handles long-distance carry propagation, while the Weinberger adder segments and optimizes local delays. The Han-Carlson adder's area-efficient logic balances the complexity of the other modules, and the Ripple Carry Adder provides a cost-effective option where appropriate. This strategic
combination addresses each bottleneck with the most suitable module, leading to
overall performance gains in terms of area, power, and delay. While further investi-
gation is required to explore limitations and wider applicability, the hybrid approach
demonstrates substantial potential for enhancing VLSI circuit efficiency. Therefore,
the previously mentioned adders are integrated as separate modules to enhance the
adder’s speed and area usage, and finally an area and power-efficient hybrid adder is
developed.
This paper presents a hybrid adder incorporating various adders to improve crucial
performance metrics. The subsequent sections are structured as follows: Sect. 2
outlines the VLSI architecture of the proposed hybrid adder, while Sects. 3 and 4
delve into discussions regarding the obtained results.
The hybrid full-adder design aims to balance speed, area, and power consumption. It
incorporates multiple addition schemes, low-power logic gates, and level restoration
carry logic for efficient performance. The goal is to reduce power consumption,
minimize area requirements, and enhance overall efficiency in digital circuit design.
The 16-bit hybrid adder, designed based on the SQRT CSLA architecture and depicted in Fig. 1, comprises five distinct groups of adders: the Ripple Carry Adder (RCA), Han-Carlson Adder, Binary to Excess-1 Converter (BEC), Weinberger Adder, and Ling Adder. The RCA is simple and easy to implement, suitable for small
bit-width additions. Han-Carlson, Weinberger, and Ling adders are designed for
efficient parallel computation, enhancing the speed of addition operations. They are
optimized for specific bit ranges and have a conditional carry selection mechanism
that optimizes functionality. In summary, each type of adder in the 16-bit hybrid
design contributes to overall efficiency by addressing specific bit ranges and utilizing
specialized designs.
Binary to Excess-1 Converter (BEC) is used for converting binary numbers to
excess-1 notation, playing a specific role in addressing and optimizing the addition
process. It participates in the conditional selection mechanism, dynamically influ-
encing the choice of output based on the carryout status. In the architecture of the
proposed hybrid adder shown in Fig. 1, the 4-bit Binary to Excess-1 Converter is
referred to as BEC4. Similarly, the 5-bit Binary to Excess-1 Converter is denoted as
BEC5, and the 6-bit Binary to Excess-1 Converter is labeled as BEC6.
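A BEC simply outputs its input plus one; a functional sketch of the LSB-first logic (b0' = NOT b0, b_i' = b_i XOR (b0 AND ... AND b_{i-1})):

```python
def bec(bits_in):
    """Binary to Excess-1 Converter, LSB first: b0' = NOT b0 and
    bi' = bi XOR (b0 AND b1 AND ... AND b(i-1)), i.e. input + 1."""
    out, chain = [], 1
    for b in bits_in:
        out.append(b ^ chain)  # flip the bit while the AND chain is still 1
        chain = b & chain      # chain stays 1 only over a run of 1 bits
    return out

# BEC4 example: 0101 (5, LSB first) -> 0110 (6)
print(bec([1, 0, 1, 0]))  # [0, 1, 1, 0]
```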
The initial two groups in Fig. 1, each comprising an RCA, process two 2-bit inputs along with a 1-bit carry. The first RCA group takes A[1:0] and B[1:0] along with cin as inputs and generates the sum s[1:0] and the carry-out cout1. The second RCA group takes A[3:2] and B[3:2], along with cin (i.e., cout1 from the previous RCA), as inputs and generates the sum s[3:2] and the carry-out cout2. In the third group of the hybrid adder architecture, a 3-bit
Han-Carlson adder is incorporated, along with Binary to Excess-1 Converter (BEC)
and Multiplexer (MUX) units. This module takes A[6:4] and B[6:4] as inputs and
operates dynamically by selecting the carryout (cout3) and sum bits (s[6:4]) based
on the carryout bit, i.e., cout2, from the preceding module. Specifically, if cout2 is
0, the output of the Han-Carlson adder is chosen. Conversely, if the cout2 is 1, the
output of the BEC4 is selected. This conditional selection mechanism based on the
carryout status optimizes the functionality of the third group in handling inputs from
augend and addend bits 4–6. Similar to the third group, the fourth group utilizes a
4-bit Weinberger adder for the addition of A[10:7] and B[10:7], and based upon the
previous carryout, i.e., cout3, the output sum (s[10:7]) and carryout (cout4) are chosen
between the Weinberger adder and BEC5. In the fifth group, a 5-bit Ling adder is used
for the addition of A[15:11] and B[15:11], and based upon the previous carryout, i.e.,
cout4, the output sum (s[15:11]) and carryout (cout) are chosen between the Ling
adder and BEC6. Each adder type is strategically chosen to match the characteristics
and requirements of the bit range it processes, contributing to the overall efficiency
of the 16-bit hybrid adder, with the primary objective of minimizing the required
area and power consumption.
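The group-wise selection scheme described above can be modelled functionally: each group's parallel adder computes the cin = 0 result, the BEC supplies the +1 result, and a multiplexer selects on the incoming carry. This is a behavioural sketch, not the gate-level design:

```python
def group_add(a, b, width, cin):
    """One conditional group: the group's parallel adder (Han-Carlson,
    Weinberger, or Ling in hardware) produces a+b for cin = 0, the BEC
    produces (a+b)+1, and a MUX driven by the incoming carry selects."""
    mask = (1 << width) - 1
    base = a + b                        # parallel adder output (cin = 0)
    chosen = base + 1 if cin else base  # BEC output selected when cin = 1
    return chosen & mask, chosen >> width

def hybrid_add16(a, b, cin=0):
    """16-bit hybrid adder modelled at group level: RCA groups for bits
    [1:0] and [3:2], then conditional groups for [6:4], [10:7], [15:11]."""
    widths = [2, 2, 3, 4, 5]  # group sizes from the architecture
    result, carry, pos = 0, cin, 0
    for w in widths:
        mask = (1 << w) - 1
        s, carry = group_add((a >> pos) & mask, (b >> pos) & mask, w, carry)
        result |= s << pos
        pos += w
    return result, carry

s, cout = hybrid_add16(0xF0F0, 0x0F11)
print(hex(s), cout)  # 0x1 1
```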
The Xilinx Vivado 2019.1 tool is utilized for the simulation and synthesis of the
proposed 16-bit hybrid adder. The simulation results, demonstrating the performance
of the hybrid adder with various input combinations, are presented in Fig. 2.
The proposed 16-bit hybrid adder is tested with various input combinations,
and for each combination of inputs, the output sum and carryout are generated,
respectively, as shown in Fig. 2.
The schematic diagram of the proposed hybrid adder obtained using the Xilinx
Vivado 2019.1 tool is represented in Fig. 3, which serves as a simplified illustration
of a system or circuit employing components and their interconnections. The performance comparison of the proposed hybrid adder is made with other adders such as the Linear Han-Carlson CSLA, Linear Ling CSLA, CLA_CSLA_D-Latch [8], and CLA_CSLA_MUX [8]. The analysis is based on area, power, and delay reports generated using Cadence software. The area, power consumption, and delay after synthesis are tabulated in Tables 1 and 2, which represent values for 90 and 180 nm technology, respectively.
For 90 nm technology, with reports generated using the Cadence Genus tool (Table 1), the Linear Han-Carlson CSLA utilizes 4.23% more area than the proposed hybrid adder, while the Linear Ling CSLA has 15.65% more area than the proposed adder.
[Fig. 3: RTL schematic of the proposed hybrid adder, showing the RCA groups (rca_g1), the Han-Carlson (hancarlson_g3) and Weinberger (weinberger_g4) groups, the BEC units (bec3, bec4), and the selection multiplexers (RTL_MUX).]
Table 1 Practical comparison of area, power, and delay for various adders and the proposed hybrid adder based on 90 nm technology

Adders (90 nm)            Area (µm²)   Power (mW)   Delay (ns)
Linear Han-Carlson CSLA   527.99       0.0237098    1799
Linear Ling CSLA          599.465      0.0212       2235
CLA_CSLA_D-Latch [8]      965.804      0.0381       2220
CLA_CSLA_MUX [8]          772.038      0.04         2228
Proposed hybrid adder     505.609      0.0211       1975
Table 2 Practical comparison of area, power, and delay for various adders and the proposed hybrid adder based on 180 nm technology

Adders (180 nm)           Area (µm²)   Power (mW)   Delay (ns)
Linear Han-Carlson CSLA   1683.158     0.0994716    2809
Linear Ling CSLA          2092.306     0.0987       3156
CLA_CSLA_D-Latch [8]      3113.51      0.01495      2996
CLA_CSLA_MUX [8]          2464.862     0.13896      2420
Proposed hybrid adder     1679.82      0.0964       3172
The CLA_CSLA_D-Latch [8] utilizes 47.64% more area than the proposed hybrid
adder, while the CLA_CSLA_MUX [8] has 34.50% more area than the proposed
adder. Similarly, the Linear Han-Carlson CSLA utilizes 3.01% more power than the
proposed hybrid adder, while the Linear Ling CSLA utilizes 2.33% more power than
the proposed adder. The CLA_CSLA_D-Latch [8] utilizes 44.61% more power than
the proposed hybrid adder, while the CLA_CSLA_MUX [8] utilizes 47.25% more
power than the proposed adder.
For 180 nm technology (Table 2), the Linear Han-Carlson CSLA utilizes 0.19% more area than the proposed hybrid adder, while the Linear Ling CSLA has 19.71% more area than the proposed adder. The CLA_CSLA_D-Latch [8] utilizes 46.04% more area than the proposed hybrid adder, while the CLA_CSLA_MUX [8] has 31.84% more area than the proposed adder. Similarly, the Linear Han-Carlson CSLA utilizes 11% more power than the proposed hybrid adder, while the Linear Ling CSLA utilizes 0.47% more power than the proposed adder. The CLA_CSLA_D-Latch [8] utilizes 35.78% more power than the proposed hybrid adder, while the CLA_CSLA_MUX [8] utilizes 32.35% more power than the proposed adder.
Figure 4 presents the comparative analysis of area (µm2 ) on 90 nm technology.
From Fig. 4, it is evident that the proposed hybrid adder occupies less area when
compared to other adders. Figure 5 puts forward the comparative analysis of power
(mW) on 90 nm technology, which displays that the hybrid adder consumes less
power in comparison with the other adders. Figure 6 represents the comparative
analysis of time delay (ns) on 90 nm technology, which conveys that the speed of operation of the other adders is higher than that of the hybrid adder.
4 Conclusion
In this paper, a 16-bit hybrid adder is designed using Xilinx Vivado 2019.1, which is
efficient in terms of area and power and outperforms existing alternatives. The adder
is made to work well by incorporating different sizes of various adders. The proposed
hybrid adder is compared with other adders like Linear Hancarlson CSLA, Linear
Ling CSLA, CLA_CSLA_D-Latch [8], and CLA_CSLA_MUX [8]. Performance
metrics such as area, power, and delay reports are generated using the Cadence
Genus tool in both 90 and 180 nm technologies. From the reports, the proposed hybrid adder is 4–35% and 10–50% more efficient in terms of area and power, respectively.
5 Future Scope
Future work can focus on reducing the delay of the adder by using reduction techniques such as MGDI (Modified Gate Diffusion Input, a new low-power and area-efficient technique in which only two transistors are required to realize the gates). Power can be reduced even further by employing power gating techniques at the back end, and the resulting efficient adder can then be employed in various DSP and AI applications.
References
1. Sarkar S, Sarkar S, Mehedi J (2018) Comparison of various adders and their VLSI. In:
International conference on computer communication and informatics
2. Lamani DS, Kiran (2022) A comparative analysis on parameters of different adder topologies.
Int Res J Eng Technol (IRJET)
3. Tilak Raju D, Sravani Kumari S (2020) Design of carry save adder with low power using
modified gate diffusion input technique. J Crit Rev 7:11
4. Bennet B, Maflin S (2015) Modified energy efficient carry save adder. In: International
conference on circuit, power, and computing technologies [ICCPCT]
5. Varshney N, Arya G (2019) Design and execution of an enhanced carry increment adder
using Han-Carlson and Kogge-stone adder technique. In: Proceedings of the third international
conference on electronics communication and aerospace technology [ICECA 2019], p 8
6. Akbari O, Kamal M, Afzali-Kusha A, Pedram M (2018) RAP-CLA: a reconfigurable
approximate carry look-ahead adder. IEEE Trans Circuits Systems II Express Briefs 65:5
7. Tapadar A, Sarkar S, Dutta A, Mehedi J (2018) Power and area aware improved SQRT carry
select adder (CSlA). In: Proceedings of the 2nd international conference on trends in electronics
and informatics (ICOEI 2018), p 7
8. Swetha S, Siva Sankara Reddy N (2023) Design of FIR filter using low-power and high-speed
carry select adder for low-power DSP applications. IETE J Res 15
9. Suganya R, Meganathan D (2015) High performance VLSI adders. In: 3rd International
conference on signal processing, communication, and networking (ICSCN), p 7
10. Gaur N, Mehra A, Kumar P (2019) 16-Bit power efficient carry select adder. In: 6th International
conference on signal processing and integrated networks (SPIN), p 4
11. Munawar M, Khan T, Rehman M, Shabbir Z, Daniel K, Sheraz A, Omer M (2020) Low power
and high speed Dadda multiplier using carry select adder with binary to excess-1 converter. In:
International conference on emerging trends in smart technologies (ICETST)
12. Gudala NA, Ytterdal T, Lee JL, Rizkalla M (2021) Implementation of high speed and low
power carry select adder with BEC. In: International midwest symposium on circuits and
systems (MWSCAS)
13. Challa Ram G, Venkata Subbarao M, Varma R, Prema Kumar M (2023) Delay enhancement
of Wallace tree multiplier with binary to excess-1 converter. In: 5th International conference
on smart systems and inventive technology (ICSSIT)
Chapter 12
Performance Driven VLSI Adder
Choices in Image Processing:
A Comparative Analysis
1 Introduction
K. Bala Sindhuri (B) · N. Udaya Kumar · Ch. Gowthami · Ch. Sree Varun ·
E. S. V. S. Surya Vaishnavi · A. K. Prathardhan · G. G. Karthik
Sagi Ramakrishnam Raju Engineering College, Bhimavaram, India
e-mail: kbsinduri@[Link]
N. Udaya Kumar
e-mail: nuk@[Link]
148 K. Bala Sindhuri et al.
the most suitable and resource-efficient image processing architecture for specific
applications, a comparative analysis is conducted. This analysis serves to align high-
level image enhancement algorithms with low-level VLSI implementations, facili-
tating the emergence of expedited, resource-efficient solutions across various image
processing domains [12, 13].
The initial stage encompasses several sequential processes to obtain the image data, commencing with image acquisition from the source and resizing it to 768 × 512 pixels. The resized image is then stored in BMP format. In the subsequent phase, the image data is interfaced with MATLAB, where pixel extraction and conversion into the [Link] file format occur, as shown in Fig. 1. This
file is then processed by Verilog, delineating the spatial parallelism functional unit.
During this phase, diverse image enhancement algorithms such as Brightness and
Invert may be executed. Upon completion of algorithmic manipulation, the resulting
image is presented, showcasing the enhanced pixels. This system implementation
process is illustratively elucidated. The concluding step entails the practical real-
ization of these techniques, employing various adders based on their performance
metrics. Subsequently, the outputs of these adders are meticulously analyzed. These
evaluations are conducted to ascertain each adder’s impact on processing area,
power consumption, and delay with the ultimate goal of identifying and optimizing
the most suitable and resource-efficient image processing architecture for specific
applications.
3 Adders
The Ling Adder stands as a notable example of parallel prefix adders within the realm of digital arithmetic circuits. Positioned as an evolution of the traditional Carry Look-Ahead adder, the Ling Adder is engineered to increase the speed and efficiency of 8-bit binary addition operations. Its area, power, and delay are very small compared with other adders. An 8-bit Ling adder is therefore considered in this work to perform the image brightness and invert operations.
The Weinberger adder is renowned for its minimal area. It utilizes the Weinberger recurrence for carry computation and incorporates parallel carry computation to enhance circuit speed. Its area, power, and delay are small compared with other adders; because of these properties and its simple architecture, it is well suited to different image enhancement techniques.
Inverting an image involves reversing the intensity values, turning dark areas into bright ones and vice versa. This technique can be useful in highlighting specific features or structures in an image. Image inversion can also be employed for creative purposes in art and design: it allows for the creation of visually striking and unconventional images, adding an artistic dimension to the processing of visual content.
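The inversion itself is a single arithmetic step per pixel, 255 − v for 8-bit channels; a minimal sketch:

```python
import numpy as np

# Toy 8-bit grayscale image; for RGB the same rule applies per channel.
img = np.array([[0, 64], [200, 255]], dtype=np.uint8)
inverted = 255 - img  # dark pixels become bright and vice versa
print(inverted.tolist())  # [[255, 191], [55, 0]]
```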
The adder architectures are modeled using Verilog HDL. For simulation and synthesis, Vivado 2019.1 ISIM is used. The tool ran on an Intel Core i5 processor with a 64-bit operating system and 16 GB of RAM.
The RTL designs for both the 4-bit and 8-bit Ling Adders were developed using
Vivado 2019.1. These designs were meticulously crafted to adhere closely to the
theoretical architectures derived from computational analysis. Figures 2 and 3 illus-
trate the RTL schematics for the respective adders, ensuring fidelity to the theoretical
specifications throughout the design phase.
The RTL designs for both the 4-bit and 8-bit Weinberger Adders were implemented
using Vivado 2019.1. These designs align precisely with the theoretical architectures
derived from computational analysis. Figures 4 and 5 depict the RTL schematics for
the respective adders, ensuring fidelity to the theoretical specifications during the
design process.
Both adders were analyzed and compared based on their performance. The total area, power consumption, and delay after synthesis for the two adders at 45, 90, and 180 nm are obtained and tabulated in Tables 1, 2, and 3, respectively. These adders are then used for the image enhancement operations, namely brightness addition and the invert operation. Figures 6, 7, and 8 represent the comparative analysis of area, power, and delay for the different adders in 45 nm technology, respectively.
Table 1 Performance analysis of different adders in 45 nm

Adders (45 nm)         Area (µm²)   Power (mW)    Delay (ns)
Ling Adder [8]         76.123       0.0425802     1972
Weinberger Adder [9]   115.463      0.00641718    2034

Table 2 Performance analysis of different adders in 90 nm

Adders (90 nm)         Area (µm²)   Power (mW)    Delay (ns)
Ling Adder [8]         157.435      0.00543504    1918
Weinberger Adder [9]   213.446      0.00771417    1472

Table 3 Performance analysis of different adders in 180 nm

Adders (180 nm)        Area (µm²)   Power (mW)    Delay (ns)
Ling Adder [8]         558.835      0.0255611     2705
Weinberger Adder [9]   671.933      0.0322704     2230
154 K. Bala Sindhuri et al.
Figures 9, 10, and 11 represent the comparative analysis of Area, power, and delay
for different adders in 90 nm technology.
Figures 12, 13, and 14 represent the comparative analysis of area, power, and delay for different adders in 180 nm technology, respectively. Based on the findings presented in Figs. 9, 10, and 11, it is determined that, in comparison to the Weinberger adder, the Ling adder consumes less area and exhibits lower power consumption. Following the implementation of the image enhancement operations, there is no concern regarding the quality of the images, as both adders yield identical outputs. Consequently, the Ling adder is deemed the more efficient of the two.
Figure 15 represents the original image without applying any of the image processing techniques, such as the brightness and invert operations.
The resulting pixel values are checked to ensure they remain within the valid range
of 0 to 255. If a value exceeds 255, it is clipped to 255 to prevent overflow and vice
versa. The brightness operation modifies the intensity of pixels in the image to adjust
its overall brightness. Addition increases brightness, while subtraction decreases it.
By adjusting the pixel values of the R, G, and B components accordingly, the overall
brightness of the image is altered. The addition or subtraction is performed using the adders described above. Figures 16 and 17 represent the output images obtained for the brightness operation using the Ling Adder and the Weinberger Adder, respectively.
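As a software illustration (a Python model, not the authors' Verilog RTL; the helper names are hypothetical), the two point operations can be sketched as follows: a brightness offset added per channel with clipping to the valid 8-bit range, and the invert operation as 255 minus the pixel value.

```python
# Illustrative software model of the two point operations described above:
# brightness adjustment with clipping to [0, 255], and image inversion.

def adjust_brightness(pixel, offset):
    """Add (or, for negative offset, subtract) a brightness offset to one
    8-bit channel value, clipping the result into [0, 255]."""
    return max(0, min(255, pixel + offset))

def invert(pixel):
    """Invert one 8-bit channel value (255 - pixel)."""
    return 255 - pixel

def apply_pointwise(image, op):
    """Apply a per-channel point operation to an image stored as nested
    lists of (R, G, B) tuples."""
    return [[tuple(op(c) for c in px) for px in row] for row in image]

image = [[(10, 200, 250), (0, 128, 255)]]
brighter = apply_pointwise(image, lambda c: adjust_brightness(c, 40))
negative = apply_pointwise(image, invert)
```

Because both adders compute the same arithmetic, a model like this produces the identical outputs reported for the Ling and Weinberger implementations; only area, power, and delay differ in hardware.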
Fig. 16 Brightness operation using Ling Adder
Fig. 17 Brightness operation using Weinberger Adder
5 Conclusion
and simulation processes, with the indispensable Xilinx Vivado 2019.1 ISIM tools
being relied upon. The practical implementation of prevalent point operations for
image enhancement was elucidated, with the Verilog language being employed.
Notably, Verilog’s versatility extended to file manipulation within storage environments, amplifying its utility in our research framework. The results of our investigations unveiled notable enhancements in the area, power, and delay metrics of the
adders, underscoring their augmented energy efficiency. These insights are consid-
ered of notable significance, poised to enhance the performance and efficiency across
diverse applications within the realm of image processing.
References
1. Narula MS (2018) FPGA implementation of image enhancement using Verilog HDL. Int Res
J Eng Technol (IRJET) 5(5), e-ISSN: 2395-0056
2. Iqbal K, Salam RA, Osman A, Talib AZ (2007) Underwater image enhancement using an
integrated color model
3. Patel S (2019) Image enhancement on FPGA using Verilog. Int J Tech Innov Mod Eng Sci
(IJTIMES) Impact Factor 5(3):22 (SJIF-2017), e-ISSN: 2455-2585
4. Puneet P, Garg N (2013) Binarization techniques used for grey scale images. Int J Comput
Appl 71:8–11. [Link]
5. Menotti D, Najman L, Facon J, Araújo A (2007) Multi-histogram equalization methods for
contrast enhancement and brightness preserving. IEEE Trans Consum Electron 53:1186–1194.
[Link]
6. Sarkar S, Mehedi J, Sarkar S (2018) Comparison of various adders and their VLSI. In:
International conference on computer communication and informatics
7. Ramya AS, Mounica ACN, Ramesh Babu BSSV (2015) Performance analysis of different 8-bit
full adders. IOSR J VLSI Signal Process (IOSR-JVSP) 5
8. Tilak Raju D, Sravani Kumari S (2020) Design of carry save adder with low power using
modified gate diffusion input technique. J Crit Rev 7
9. Meganathan D, Suganya R (2015) High performance VLSI adders. In: 3rd International
conference on signal processing, communication and networking (ICSCN), p 7
10. Shetty, Saud, Serrao, Pinto R (2020) Design and implementation of 64-bit parallel prefix adder
11. Gaur N, Mehra A, Kumar P, Kallakuri S (2019) 16 Bit power efficient carry select adder. In:
6th International conference on signal processing and integrated networks (SPIN)
12. Cui X (2011) Optimized design of parallel prefix Ling adder. In: 2011 International conference
on electronics, communications and control (ICECC)
13. Thakur G (2020) FPGA-based parallel prefix speculative adder for fast computation application.
In: 2020 Sixth international conference on parallel, distributed and grid computing (PDGC)
Chapter 13
Text-to-Speech Conversion for Gujarati
Language Using Deep Learning
Technique
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025 161
H. Mittal et al. (eds.), Proceedings of International Conference on Paradigms of
Communication, Computing and Data Analytics, Algorithms for Intelligent Systems,
[Link]
162 V. Narvani and H. Arolkar
The Gujarati script evolved from the Devanagari script. The Gujarati language can be
traced back to its ancestral roots in Sanskrit. Gujarati is utilized by a global population
of over 50 million individuals for both written and spoken communication. The
earliest known document in the Gujarati script, dating back to 1592, represents its
historical origin, while the script itself made its first appearance in print through an
advertisement in 1797 [3].
The Gujarati character set comprises a total of 75 officially recognized shapes,
encompassing 59 characters and 16 diacritics, each of which is distinct and consid-
ered legal. The 59 characters can be categorized into three groups: 36 consonants,
including 2 compound consonants and 34 singular consonants that represent embel-
lished sounds; 13 vowels, representing pure sounds; and 10 numerical digits. There
are a total of 16 diacritics, which can further be categorized into 13 vowel characters
and 3 other characters. The systematic grouping of vowels and consonants according
to their respective phonetic pronunciations determines the alphabet’s arrangement
[4].
The Gujarati script employs a syllabic writing system, wherein each character
represents a distinct syllable and is conventionally written from left to right. In
linguistic terms, the phonetic units known as consonants are referred to as “Vyanjan,”
while the phonetic units known as vowels are referred to as “Swar.” The Gujarati
language features a specific set of modifier symbols associated with each vowel.
These symbols are used to alter the pronunciation of consonants and are commonly
referred to as “maatras.” Modifiers take various forms and are affixed either at the
upper, lower-right, or lower portion of a consonant, contingent upon the specific
consonant in question. A letter is considered conjunct when it is formed by the
combination of two half-consonants. In the Gujarati script, characters are created
through the combination of consonants, vowels, and diacritics. The characters in the
Gujarati language are depicted in Table 1 [4].
3 Text-to-Speech (TTS)
(Table 1, referenced above, lists the Gujarati characters with transliterations such as ka, kha, ga, gha, and cha.)
4 Literature Review
Kothari and Kumbharana in [2] outline the process of designing, developing, and
implementing a concatenation-based algorithm for text-to-speech synthesis in the
Gujarati language. When researchers endeavor to create a recognition system, they
need access to certain pre-existing data, such as a database, that is relevant to the
intended recognition system. The study specifically explores the utilization of a
database containing pre-recorded Gujarati phonemes in concatenative synthesis,
where these phonemes are combined to generate sound.
Kaveri and Ramesh in [5] delve into the concept of a Text-to-Speech (TTS) synthesizer, a computer-based system that can read text aloud. Specifically
focusing on Indian languages, the paper provides a comprehensive explanation of a
single text-to-speech (TTS) system tailored for Hindi, with the purpose of generating
speech. Typically, this technique consists of two stages: text processing and speech
creation. A Java Swings graphical user interface has been developed to translate
Hindi text into voice. India is a linguistically diverse country with several spoken
languages, each serving as the primary language for tens of millions of individuals.
The paper also acknowledges the significant dissimilarities in languages and scripts
while highlighting the substantial similarities in grammar and alphabet words across
Indian languages.
Diwakar et al. in [6] discuss the ongoing research of the authors on a Sanskrit
text-to-speech (TTS) system named ‘Samvachak’ at the Special Centre for Sanskrit
Studies, JNU. Currently, there is no existing text-to-speech (TTS) system specifically
designed for the Sanskrit language. Upon examining the relevant study, the report
centers on the advancement of several components of the text-to-speech (TTS) system
and highlights potential challenges in its development. The research for the TTS may
be categorized into two groups: TTS independent language study and TTS-related
research and development (R&D).
Sajini and Neetha in [7] investigate the utilization of speaker adaptation tech-
nology in Hidden Markov Model-based Text-to-Speech (HTS) for introducing
speaker variability in Malayalam TTS. Speaker adaptation (SA) has been success-
fully implemented within the HTS framework for foreign languages like English
and Japanese. However, it has not yet been experimented with for Indian languages.
The aim of this study is to employ the HTS framework to apply Speaker Adaptation
(SA) as a means of achieving a wider range of voices while simultaneously reducing
the costs, time, and effort typically associated with generating a new or modified
Text-to-Speech (TTS) voice. The study employs constrained maximum likelihood
linear regression (CMLLR) and maximum a posteriori (MAP) methods to produce different vocal variations. The database used in the experiments includes five speakers, each contributing one hour of speech. Within this database, a subset of
four speakers is employed to train a speaker-independent average model (SI). The
SI model underwent training with varying numbers of speakers. The average model,
equipped with three speakers, produced a comprehensible but noisy output. However,
when using four speakers, the model produced a comprehensible output of high
quality and similarity, with only occasional distortions.
Femina and Jayakumari in [8] discuss that the text-to-speech (TTS) technology
may be used in several languages, including those that have limited usage. Text-to-
speech (TTS) systems produce spoken representations based on textual input. While
the process of generating speech is somewhat intricate, the significant difficulty in
text-to-speech (TTS) is achieving a natural and authentic expression from the speaker.
This study presents a highly accurate and efficient text-to-speech (TTS) conversion
system for the Tamil language. This research study focuses on the development of a
deep learning approach called Deep Quality Speech Recognition (DQSR) specifically
tailored for Tamil language text-to-speech (TTS) applications.
5 Methodology
The pre-processing step of the text-to-speech system is when two essential tech-
niques, text analysis and text normalization, come into action. When dealing with
Gujarati phrases, it is important to note that many terms can be found in a basic dictio-
nary. As a result, the terminology included within the dataset is used to construct this
dictionary, ensuring the correctness and relevance of the linguistic elements used in
the process [9].
In the pre-processing module, a significant role is played by the translation of
abbreviations, numerical values, and acronyms into their corresponding comprehen-
sive textual representations. The purpose of this transformation is to guarantee that
the synthesized speech adheres to the desired language rules while preserving clarity
and comprehensibility. Additionally, pre-processing contributes to enhancing the
overall language analysis and synthesis process by performing the task of efficiently
segmenting incoming texts into word clusters [10].
The text analysis component is responsible for preparing the incoming text by
analyzing and organizing it into a manageable list of terms. The system incorpo-
rates numerical values, shortened forms, initialisms, and idiomatic expressions and
converts them into complete written language as necessary. One notable challenge at
the character level is the ambiguity of punctuation marks, particularly in determining
the conclusion of sentences. This issue can be resolved to some extent using basic
regular grammar [11].
Text normalization refers to the process of converting text into a form that may
be easily spoken. Prior to any text processing, it is common practice to undertake
text normalization, which involves standardizing the text. This is typically done in
preparation for tasks such as generating synthesized voices or automated language
translation. The primary goal of this technique is to recognize punctuation marks and
intervals between words. The text normalization procedure often involves removing
punctuations, accent marks, stop words (commonly used words), and other diacritics
from letters [11].
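A minimal sketch of this normalization step in Python; the abbreviation table and stop-word list below are illustrative placeholders, not the dictionary the authors build from their Gujarati dataset.

```python
import re
import unicodedata

# Illustrative placeholders for the expansion dictionary and stop words;
# a real system would load these from the language-specific dataset.
ABBREVIATIONS = {"dr.": "doctor", "no.": "number"}
STOP_WORDS = {"the", "a", "an"}

def normalize(text):
    text = text.lower()
    # Expand abbreviations before punctuation is stripped.
    for short, full in ABBREVIATIONS.items():
        text = text.replace(short, full)
    # Strip combining diacritics/accent marks from letters.
    text = "".join(ch for ch in unicodedata.normalize("NFD", text)
                   if not unicodedata.combining(ch))
    # Remove punctuation, keeping word characters and spaces.
    text = re.sub(r"[^\w\s]", " ", text)
    # Drop stop words and collapse whitespace.
    words = [w for w in text.split() if w not in STOP_WORDS]
    return " ".join(words)

print(normalize("Dr. Café visited the No. 5 clinic!"))
# → doctor cafe visited number 5 clinic
```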
During this phase, an examination of the document’s structure is conducted.
This procedure operates at two levels: sentence-level tokenization, which involves
dividing text into individual sentences, and word-level tokenization, which involves
breaking down text into individual words. Identifying the boundaries of a sentence in
sentence tokenization can be a challenging task. For instance, it is not required that
the demarcation of sentence boundaries be only determined by the use of periods;
alternative punctuation marks such as colons or double quotation marks may also
serve this purpose. Word tokenization, on the other hand, involves the process of
converting non-standard words into their standardized form [12–14].
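The two tokenization levels can be sketched roughly as below; treating colons alongside sentence-final punctuation is an assumption drawn from the observation above that periods are not the only boundary markers.

```python
import re

def sentence_tokenize(text):
    # Split after boundary punctuation (., ?, !, :) followed by whitespace.
    parts = re.split(r"(?<=[.?!:])\s+", text.strip())
    return [p for p in parts if p]

def word_tokenize(sentence):
    # Keep runs of word characters together; drop punctuation tokens.
    return re.findall(r"\w+", sentence)

text = "Speech is generated from text. Is it natural? Mostly: yes."
sentences = sentence_tokenize(text)
words = [word_tokenize(s) for s in sentences]
```

A production system would also handle abbreviations and quotation marks around boundaries, which this regex-only sketch deliberately ignores.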
Speech synthesis refers to the process of converting written text into audible
speech through the generation of corresponding waveforms, which encompasses
three distinct methodologies, namely articulatory synthesis, formant synthesis, and
concatenative synthesis [15, 16].
Formant speech synthesis relies on defining the resonance frequencies of the vocal
tract through a set of criteria. The formant method uses the source-filter voice
synthesis paradigm. The goal is to create source signals that are both periodic and
non-periodic, and then send them through a filter or resonator circuit that emulates the
characteristics of the vocal tract. Although rule-based formant synthesis is capable
of producing high-quality speech, the difficulties in precisely estimating the vocal
tract models and source variables may cause the voice to sound unnatural. The funda-
mental frequency, the relative intensities of the voiced and unvoiced source indicators,
and the degree of voicing are typically the lowest number of modifiable parameters.
During each phoneme, the settings controlling the source signal and the vocal tract
filter’s frequency response are modified. The resonators can be connected in either a
parallel or cascade pattern to create the vocal tract model [17].
In this section, we propose a speech synthesis system that operates through the following stages. A detailed description of the stages, as per the flow diagram, is shown in Fig. 1.
1. Load Library and Declare Variable: A loaded library contains files prepared to be executed, and a variable is created when a value is assigned to it. The value assigned to a variable determines that variable’s data type.
2. Load Speech and Text Dataset: Datasets have the capability to be loaded from
local files that are saved on the user’s PC.
3. Divide Data into Training and Testing: The dataset is partitioned into a training subset, used to fit the model, and a testing subset, held out so that the trained model can be evaluated on data it has not seen.
4. Apply MFCC and Log Filter for feature extraction: The Mel frequency
cepstral coefficients (MFCCs) are a representation of a sound’s short-term power
spectrum. It is a feature extraction approach for speech and audio analysis. This
converts raw audio signals into a compact representation that captures significant
frequency and temporal information. Additionally, the log filter (the logarithm applied to the Mel filterbank energies) compresses the dynamic range of the spectrum, mirroring the ear’s roughly logarithmic perception of loudness.
5. Apply Feature Extraction: The process of extracting various characteristics
from a speech signal, such as pitch, vocal tract anatomy, and power, is known
as feature extraction. Parameter transformation is the subsequent process of
converting these features into signal parameters by means of differentiation and
concatenation.
6. Apply Sequential Model (CNN): A Convolutional Neural Network (CNN) was
used to extract and train the speech characteristics. CNN was used as the
most sophisticated deep learning approach, and its performance for voice signal
identification as a multiclass classification process was explored. CNN-based
models are trained to detect patterns in text, such as important words, automati-
cally. For feature extraction, the majority of CNN-based models employ a one-
dimensional (1-D) convolution process, succeeded by an average or maximum
one-dimensional pooling procedure.
7. SoftMax Classifier on Test Data: Employ a SoftMax classifier by assigning a
label to a middle word and concatenating all word vectors surrounding it.
8. Classify Audio based on Confusion Matrix: Utilize a confusion matrix to clas-
sify audio, a matrix designed to evaluate the performance of classification models
with a given set of test data.
9. Analysis: Perform a detailed analysis of the results and conclude the process.
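Steps 6 and 7 can be illustrated with a toy NumPy forward pass: a one-dimensional convolution over a feature track, ReLU and max pooling, then a SoftMax output layer. The shapes, random weights, and input are illustrative assumptions, not the trained Gujarati model.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d(x, kernel):
    """Valid 1-D convolution (cross-correlation, as used in CNNs)."""
    k = len(kernel)
    return np.array([x[i:i + k] @ kernel for i in range(len(x) - k + 1)])

def max_pool(x, size=2):
    """Non-overlapping 1-D max pooling."""
    trimmed = x[: len(x) // size * size]
    return trimmed.reshape(-1, size).max(axis=1)

def softmax(z):
    e = np.exp(z - z.max())        # subtract max for numerical stability
    return e / e.sum()

x = rng.standard_normal(16)        # e.g. one MFCC coefficient track
kernel = rng.standard_normal(3)    # a learned 1-D filter (random here)
features = max_pool(np.maximum(conv1d(x, kernel), 0.0))  # ReLU, then pool
W = rng.standard_normal((9, len(features)))  # output layer: 9 digit classes
probs = softmax(W @ features)      # class probabilities, summing to 1
pred = int(np.argmax(probs))       # predicted digit class
```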
The diagram in Fig. 1 illustrates the fundamental stages involved in the compu-
tational procedure for deriving cepstral coefficients from a speech signal. These
procedures are outlined below.
• Pre-emphasize the input signal.
• Conduct a short-time Fourier analysis to obtain a magnitude spectrum.
• Construct a Mel-spectrum from the magnitude spectrum.
• Take the logarithm of the power spectrum (the square of the Mel-spectrum).
Utilizing the log-Mel power spectrum, apply the Discrete Cosine Transform (DCT) to extract the cepstral features.
Step 1: Pre-Emphasis: This step involves passing the isolated word sample
through a filter that amplifies higher frequencies. It will amplify the signal’s energy
at higher frequencies.
Step 2: Framing: The voice stream is segmented into frames of brief duration, typically 20–30 ms. The stream is divided into frames of N samples, with adjacent frames separated by M samples (where M is less than N). Commonly, M is set to 100 and N to 256. Speech framing becomes necessary due to the temporal variability of the signal. Nevertheless, when the signal is analyzed over a sufficiently short period, its characteristics remain relatively constant. Consequently, a short-time spectrum analysis is conducted.
Step 3: Windowing: Each frame is multiplied by a Hamming window to maintain signal continuity. To mitigate the discontinuities introduced by framing, the window function gradually decreases the amplitude of the voice sample toward zero at the frame’s beginning and end, as shown in Eq. (1). This helps reduce spectral distortion.
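Steps 1 through 3 can be sketched numerically as follows; the pre-emphasis coefficient 0.97 and the synthetic test tone are illustrative assumptions, with N = 256 and M = 100 taken from the text.

```python
import numpy as np

def pre_emphasis(signal, alpha=0.97):
    # Step 1: y[n] = x[n] - alpha * x[n-1] boosts high-frequency energy.
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])

def frame_and_window(signal, N=256, M=100):
    # Step 2: overlapping frames of N samples, shifted by M samples each.
    n_frames = 1 + max(0, (len(signal) - N) // M)
    frames = np.stack([signal[i * M : i * M + N] for i in range(n_frames)])
    # Step 3: taper each frame's edges with a Hamming window.
    return frames * np.hamming(N)

sr = 8000                                   # assumed sampling rate
t = np.arange(sr) / sr
speech = np.sin(2 * np.pi * 440 * t)        # 1 s synthetic tone stand-in
frames = frame_and_window(pre_emphasis(speech))
```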
frequency domain into a time-like domain known as the quefrency domain, gener-
ating Mel-scale cepstral coefficients. While MFCC alone is suitable for speech recog-
nition, incorporating log energy and performing delta operations can enhance overall
performance.
Step 7: Log Energy: We can also calculate the energy within a frame, which can
be another feature for MFCC. We can enhance the feature set by calculating the time
derivatives of (energy + MFCC), which provide velocity and acceleration in Eq. (4).
ΔCm(t) = [ Σ (τ = −M to M) τ · Cm(t + τ) ] / [ Σ (τ = −M to M) τ² ]    (4)
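Eq. (4) can be implemented directly in NumPy; handling the edge frames by replicating the first and last frames is an assumption here, since the text does not specify boundary treatment.

```python
import numpy as np

def delta(c, M=2):
    """Delta (velocity) coefficients per Eq. (4).
    c: array of shape (n_frames, n_coeffs); returns the same shape."""
    # Replicate edge frames so every t has +/- M neighbours (assumption).
    padded = np.pad(c, ((M, M), (0, 0)), mode="edge")
    taus = np.arange(-M, M + 1)
    denom = np.sum(taus ** 2)
    out = np.zeros_like(c, dtype=float)
    for i, tau in enumerate(taus):
        # padded[i : i + len(c)] is the track shifted by tau frames.
        out += tau * padded[i : i + len(c)]
    return out / denom

c = np.arange(10.0).reshape(-1, 1)  # a linearly increasing cepstral track
d = delta(c, M=2)
```

For a linearly increasing track the interior delta is exactly 1, which is a quick sanity check on the regression.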
We have created a dataset for audio and text with sample data as shown in Fig. 3. The
system in the suggested experiment is trained using recorded wav files that contain
the numbers 1 through 9 from nine different speakers.
The next step is to render the resulting sounds in wav format using the Python wave reader, as shown in Figs. 4 and 5.
Figure 6 shows a confusion matrix in which the nine different classes are classified according to their detection rates. A confusion matrix is
a tabular representation that provides a succinct summary of a machine learning
model’s performance on a specific dataset used for testing purposes. It is a method
of visually representing the number of correct and incorrect occurrences determined
by the model’s predictions.
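Building such a matrix and reading the per-class detection rate off its diagonal can be sketched as below; the three-class labels are toy data, not the nine-digit results reported here.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Count (actual, predicted) pairs: rows are actual classes,
    columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

y_true = [0, 0, 1, 2, 2, 2]
y_pred = [0, 1, 1, 2, 2, 0]
cm = confusion_matrix(y_true, y_pred, n_classes=3)

accuracy = np.trace(cm) / cm.sum()       # correct predictions / total
recall = cm.diagonal() / cm.sum(axis=1)  # per-class detection rate
```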
Figure 7 displays a classification output after selecting and classifying data,
choosing the most appropriate class.
Table 3 depicts the accuracy and loss ratio of 9 different classes after applying
testing input to the training dataset. Figure 8 shows the evaluation results.
After each epoch completes successfully, the model reports accuracy and other resulting parameters; while the code runs, it updates and provides a cumulative response. After applying the suggested model, Table 4 displays the accuracy obtained for digits 1 through 9.
8 Conclusion
References
10. Onaolapo JO, Idachaba FE, Badejo J, Odu T, Adu OI (2014) A simplified overview of text-to-
speech synthesis. In: Proceedings of the world congress on engineering, vol 1. ISSN: 2078-0958
11. Sasirekha D, Chandra E (2012) Text to speech: a simple tutorial. IJSCE
12. Slobodan B, Sanda M Text normalization for Croatian speech synthesis
13. Pooja MR, Manoj C (2019) Text normalization and its role in speech synthesis. In: IJEAT
14. Pooja MR, Manoj C (2019) An experimental technique on text normalization and its role in
speech synthesis. In: IJITEE
15. Yin Z (2018) An overview of speech synthesis technology. IEEE
16. Xu T, Tao Q, Frank S, Tie-Yan L (2021) A survey on neural speech synthesis. ISCSLP
17. Helal UM (2015) A comparative study of different text-to-speech synthesis techniques. IJSER
18. Kothari JJ, Kumbharana CK (2015) A phonetic study for constructing a database of Gujarati
characters for speech synthesis of Gujarati text. Int J Comput Appl 117(19)
19. Fahima K, Farha AM, Nadia AR, Aloke KS, Muhammad FM (2022) Text to speech synthesis:
a systematic review, deep learning based architecture and future research direction. J Adv Inf
Technol
20. Yishuang N, Sheng H, Zhiyong W, Chunxiao X, Liang-Jie Z (2019) A review of deep learning
based speech synthesis. MDPI
21. Pooja P, Miral P (2017) Feature extraction of isolated Gujarati digits with Mel frequency
cepstral coefficients (MFCCs). Int J Comput Appl
22. Dani PP, Deole MS (2017) Improvement of accuracy using MFCC speech recognition. IJARIIE
Chapter 14
A Comprehensive Bibliometric Study
on AI-Guided Breast Cancer Diagnosis
and Prognosis Investigating Web
of Science and Scopus from 2016 to 2023
1 Introduction
For more than a decade, the prevalence of breast cancer has been steadily increasing. It
is one of the leading cancer-related causes of death in women. Although there has been
significant progress in breast cancer diagnosis methods, the quest for early detection
remains a work in progress [1]. Numerous critical studies and clinical trials have
significantly improved breast cancer prognosis, but several others have yet to reach clinical maturity, implying the need for this bibliometric analysis [2]. In the past,
a few studies linked the most cited papers of the time in several domains under the
umbrella term of breast cancer, which motivated researchers to conduct such studies
with ease. This identification is critical because clinicians base their opinions on the
substantiation and significance of these studies. The most significant component of
research methodology is linked to higher citation counts and a significant impact on
the journal of publication. These parameters were the main focus of this study, along
with several other crucial factors for a more thorough analysis [3].
The main contribution of this study, compared to existing papers, lies in its compre-
hensive bibliometric analysis of AI-based breast cancer detection research from
180 E. Bhatti et al.
2 Literature Review
The above figure (see Fig. 1) gives the process of selecting the papers for bibliometric
analysis. The topic used, accessibility, query prompt, record extraction, and process
of bibliometric analysis are presented in this figure.
(A) Datasets: The study obtained its dataset through a comprehensive query-
based search on Scopus and Web of Science, including Science Citation Index
Expanded journals.
Table 1 Comprehensive literature analysis (for each article referenced: description; no. of papers analyzed/period under consideration; source database; method used; bibliometric parameters analyzed; research gaps identified)

[7] Description: Focus on the study of TFEB, which is pivotal in the research of neurodegenerative diseases and tumors including breast cancer. Papers/period: 1059. Source: Web of Science. Method: Excel, VOSviewer, and Citespace. Parameters analyzed: co-authorship of countries, institutes, and authors; co-citation of cited authors and references; keyword clusters. Research gaps: TFEB's role in breast cancer's molecular pathways needs additional study.

[8] Description: Focused on aging research. Papers/period: 100. Source: Web of Science. Method: VOSviewer and Histcite. Parameters analyzed: publication year, authorship, publication type, keywords, journal name, institution, country. Research gaps: need for research on novel interventions targeting age-related factors in breast cancer prevention and treatment.

[9] Description: Deep learning on breast cancer image classification. Papers/period: 2014–2021. Source: Scopus. Method: VOSviewer and Bibliometrix. Parameters analyzed: the yearly trends in publications, the networks of co-authors, countries, and scientific journals, as well as the networks of authors' keywords that appear together. Research gaps: not much clinical translation and real-world use of deep learning models for breast cancer diagnosis and prognosis.

[10] Description: Male breast cancer. Papers/period: 100. Source: Web of Science. Method: –. Parameters analyzed: parameters such as the subject, … Research gaps: potential gaps in awareness, …

[11] Description: Automated diagnosis of various diseases with the help of machine learning-based techniques. Papers/period: 1216. Source: Web of Science and Scopus. Method: –. Parameters analyzed: dissects papers to unravel different insights such as the most prolific authors, countries, and organizations as well as articles cited with the highest frequencies. Research gaps: possible limitations in knowing how automated diagnosis affects healthcare delivery and patient outcomes.

[12] Description: Literature concentrates on the computer-aided detection of cancers based on medical imaging. Papers/period: 20 years. Source: Scopus. Method: Biblioshiny, VOSviewer, and Word Cloud. Parameters analyzed: examines the increase in publications and sources, the author's wealth, keyword research, the number of times an article is cited, and other factors. Research gaps: potential gaps in the translation of research findings into clinical practice and patient care.

[13] Description: Breast cancer studies specific to Pakistan. Papers/period: –. Source: –. Method: –. Parameters analyzed: the h-index, impact factor, journal quartile, and the number of publications and citations are some of the factors that have been used. Research gaps: potential weaknesses in the Pakistani population's customized breast cancer prevention, screening, and treatment programs.

[14] Description: Breast cancer care, helping to narrow down the treatment range for patients and increasing patient survivability. Papers/period: –. Source: Scopus. Method: Biblioshiny, VOSviewer. Parameters analyzed: details about the top journals, articles that get the highest citations, most important authors, and research centers that do this kind of work; on top of that, authorship co-occurrence analysis and keyword analysis. Research gaps: possible omissions from our knowledge of differences in outcomes and access to care between various socioeconomic and demographic groups.

[15] Description: Breast cancer studies in the domain of nursing. Papers/period: 2734, from 2009 to 2018. Source: Web of Science. Method: CitespaceII. Parameters analyzed: reveals some basic insights, e.g., the year when the publications under scrutiny peaked. Research gaps: significant exclusions from an understanding of how nursing care affects treatment compliance and long-term results in breast cancer survivors.

[16] Description: Advocates the use of machine learning for the detection of breast cancer. Papers/period: 2928, from the year 2015 to 2019. Source: PubMed. Method: –. Parameters analyzed: the country, the first author, the journal, the institutional collaborations, and the number of times author keywords appear together. Research gaps: possibilities for shortcomings in addressing issues with algorithmic bias, data privacy, and the use of machine learning technologies by clinicians.

[17] Description: Breast cancer stem cells. Papers/period: –. Source: Scopus and the Web of Science. Method: Citespace. Parameters analyzed: clustering results indicate that breast cancer was identified to be the most researched and heavily cited topic. Research gaps: known gaps in the therapeutic applications of stem cells for breast cancer treatment derived from preclinical research.

[18] Description: Concentration on integrative and complementary oncology research. Papers/period: from 1976 to 2017. Source: Web of Science. Method: VOSviewer and SciMAT. Parameters analyzed: production trends, country collaboration, and leading topics. Research gaps: limitations in knowing patient preferences, safety, and efficacy of integrative therapy in breast cancer treatment.

[19] Description: Triple-negative breast cancer and nanotechnology. Papers/period: 1932, from 2012 to 2017. Source: Scopus. Method: a mix of VOSviewer, STATA, Excel, and R-tools. Parameters analyzed: citations and productivity. Research gaps: the challenges in identifying the mechanisms of action and potential toxicity of nanotechnology-based therapeutics in breast cancer patients.

[20] Description: Breast cancer. Papers/period: from the years 2007 to 2017. Source: –. Method: R tool with a bibliometric analysis package. Parameters analyzed: –. Research gaps: need for clarification on the specific aspects of breast cancer research addressed in the study.

[21] Description: Perspective of Indian breast cancer research. Papers/period: 40 years. Source: Scopus. Method: –. Parameters analyzed: performance analysis employing institutions, journals, authors, and their citation impact with the Hirsch Index. Research gaps: challenges in leveraging indigenous knowledge and resources for addressing breast cancer disparities in India.

[22] Description: Breast cancer from the Indian perspective. Papers/period: 3529, from 2005 to 2014. Source: Scopus. Method: Lotka's law and Bradford's law. Parameters analyzed: collaborative authorship. Research gaps: potential limitations in addressing regional differences in breast cancer incidence, biology, and healthcare access in India.

[23] Description: Breast cancer radiation therapy: a bibliometric analysis of the scientific literature. Papers/period: –. Source: Scopus. Method: R-Studio with the bibliometrix R package. Parameters analyzed: growth trends, author productivity, relevant journals, themes analysis (niche, emerging, hot, essential). Research gaps: there may be areas where our understanding of patient preferences and decision-making regarding radiation therapy options for breast cancer treatment is incomplete.

[24] Description: Evolution of research trends in artificial intelligence for breast cancer diagnosis and prognosis over the past two decades: a bibliometric analysis. Papers/period: –. Source: –. Method: Bibliometrix (R package). Parameters analyzed: trends in literature studies published, productivity of countries, relevant authors, affiliated institutions, leading journals, and emerging topics. Research gaps: possible deficiencies in dealing with regulatory and ethical issues in the implementation of AI technologies for breast cancer diagnosis.

[25] Description: A comprehensive bibliometric analysis of deep learning techniques for breast cancer segmentation: … Papers/period: 985, 2019–2023. Source: Web of Science Core Collection. Method: Bibliometrix (R package). Parameters analyzed: comprehensive analysis of literature on breast cancer segmentation using deep learning techniques. Research gaps: significant limitations in comprehending the influence of deep learning-based segmentation on treatment planning and patient results.
(B) Search and Extraction: This study employed basic and advanced queries in
WOS and Scopus to find breast cancer papers from 2016 to 2023 [26] (see
Fig. 2).
output. Each bar represents a single year, with 2023 exhibiting the highest number
of publications among the depicted years.
Country-wise Productivity Analysis: The bar graph shows the research production of
ten nations, based on Web of Science and Scopus papers, together with their collab-
oration types (see Fig. 4). China is the biggest contributor, with 40 documents in
Scopus and 40 in Web of Science. The US and India trail behind, producing signifi-
cant research in both databases. The figure distinguishes MCP (multiple-country
publications) from SCP (single-country publications), which sheds light on research
partnerships. Malaysia, dominated by SCP, and the Netherlands, dominated by MCP,
serve as prime examples of the varied partnership patterns that exist across nations:
collaboration in Malaysia is primarily domestic, whereas the Netherlands collabo-
rates heavily internationally, as shown by its diversified collaboration patterns.
The narrative summarizes international collaboration and comparative study
results. The image highlights nations with similar research pathways and unique
collaboration preferences. Policymakers, researchers, and institutions seeking to
understand global research, foster international cooperation, and effectively manage
resources will find the material invaluable. The map provides a global overview of
research, emphasizing the quantity and quality of current academic research.
Author Productivity Analysis: Based on Scopus and Web of Science submissions
and journal publications, the bar graph displays the most productive authors (see
Fig. 5). Notable authors include CHANG, whose 15 Scopus publications have influ-
enced academia. CHUN, CHDI MS, CHEN Y, CHEN J, CHANG JS, ARYA I, LIU ZY,
CHEN C, and WANG Y each have two Web of Science papers. The graphic depicts
each author's output and publishing venues. Every Web of Science author is linked to
this journal, while author CHANG stands out with many Scopus publications without
a named journal.

It shows journal authorship and publication concentration. This information
can help readers, institutions, and research evaluators understand authors' scholarly
impact and their different academic contributions. These visual representations also
reveal the complexity of scholarly activity by identifying productive scholars and
their preferred academic periodicals.
Annual Scientific Production: The line chart depicts the annual scientific index from
2016 to 2023 for Web of Science (WOS) and Science Citation Index (SCI). WOS
index fluctuates initially, peaking in 2017, while SCI index remains relatively stable.
Both indices converge and show similar trends from 2020 onwards, with WOS
surpassing SCI in 2023 (see Fig. 6).
The Co-citation Network: This network visually represents the connectivity between
Web of Science (WOS) and Scopus papers, indicating shared citations, and showcases
the intellectual interdependence and collaborative nature of scholarly research. For
example, papers by "Beniogo 2009," "Xie 2017," and "Wang 2014" were co-cited
with the highest frequency in WOS, while "Bray 2018," "Sudharshan 2019," "Arrival
2018," and others rank highest in the Scopus co-citation network.
Historical Direct Citation Network: The analysis of past direct mutual citations
reveals a low interlink density and transitivity, indicating limited correlation. The
most productive year was 2017, with deviations observed in 2023 (see Fig. 7).
Keyword Co-occurrences: Bibliometric analysis of keyword occurrence was
performed for the 330 WOS database papers, with the most frequent keywords
displayed in the largest size; in R this visualization is called a word cloud. Classifi-
cation has been the most common keyword, with breast cancer segmentation among
the other frequent occurrences. Keyword evaluation of the Scopus database likewise
shows that classification is utilized most often, and the survival keyword is also used.
The word cloud retrieved from this evaluation shows that researchers prefer breast
cancer classification.
Conceptual Structure Map: The conceptual structure map links keywords to the
papers' major themes (see Fig. 8). In most WOS articles, classification precedes
diagnosis; most Scopus publications establish an identical structure from classifica-
tion to diagnosis.
Topic Dendrogram: The topic dendrograms reveal that the keywords segmentation,
classification, and diagnosis are the most commonly used across all 801 papers
obtained from both databases (see Fig. 9). This means that papers on these topics are
the most commonly accepted in both the Web of Science and Scopus databases.
Factorial Maps of Most Cited Documents: Factorial mapping highlights the distinc-
tiveness of breast cancer research. Most work focuses on disease detection, with
image-based datasets being popular. Citation patterns show that 2020 papers
frequently cite 2017 publications, reflecting advances in AI methods [27] (see
Fig. 10). Breast cancer research may thus continue with better methods.
5 Research Gaps
The study finds many gaps in AI-guided breast cancer detection and prognosis liter-
ature. First, there are few studies on the ethical and social effects of AI breast
cancer diagnosis. The ethics of patient privacy, consent, and AI algorithm biases
must be examined. Second, the survey highlights China’s scientific output but also
laments a lack of international engagement. Addressing this gap could boost inter-
national cooperation and breast cancer AI applications. The study also emphasizes
6 Conclusion
References
1. Sun YS, Zhao Z, Yang ZN, Xu F, Lu HJ, Zhu ZY, Shi W, Jiang J, Yao PP, Zhu HP (2017) Risk
factors and preventions of breast cancer. Int J Biol Sci 13:1387. [Link]
21635
2. Waks AG, Winer EP (2019) Breast cancer treatment: a review. JAMA 321:288–300. https://
[Link]/10.1001/JAMA.2018.19323
3. Key TJ, Verkasalo PK, Banks E (2001) Epidemiology of breast cancer. Lancet Oncol 2:133–
140. [Link]
4. Lo PK, Sukumar S (2008) Epigenomics and breast cancer. Pharmacogenomics 9:1879–1902.
[Link]
5. Polyak K (2007) Breast cancer: origins and evolution. J Clin Invest 117:3155–3163. https://
[Link]/10.1172/JCI33295
6. Fan L, Strasser-Weippl K, Li JJ, St Louis J, Finkelstein DM, Yu KD, Chen WQ, Shao ZM,
Goss PE (2014) Breast cancer in China. Lancet Oncol 15. [Link]
5(13)70567-9
7. Zhou R, Lin X, Liu D, Li Z, Zeng J, Lin X, Liang X (2022) Research hotspots and trends
analysis of TFEB: a bibliometric and scientometric analysis. Front Mol Neurosci 15:854954.
[Link]
8. Haroon, Li Y-X, Ye C-X, Ahmad T, Khan M, Shah I, Su X-H, Xing L-X (2022) The 100 most
cited publications in aging research: a bibliometric analysis. Electron J Gen Med 19. https://
[Link]/10.29333/ejgm/11413
9. Khairi SSM, Bakar MAA, Alias MA, Bakar SA, Liong CY, Rosli N, Farid M (2022) Deep
learning on histopathology images for breast cancer classification: a bibliometric analysis.
Healthcare (Switzerland) 10. [Link]
10. Kwok HT, Van M, Fan KS, Chan J (2022) Top 100 cited articles in male breast cancer: a
bibliometric analysis. [Link]
11. Ahsan MM, Luna SA, Siddique Z (2022) Machine-learning-based disease diagnosis: a
comprehensive review. [Link]
12. Kore M, Naik DN, Chaudhari D (2021) A bibliometric approach to track research trends in
computer-aided early detection of cancer using biomedical imaging techniques. J Scientometric
Res 10:318–327. [Link]
13. Ahmad S, Ur Rehman S, Iqbal A, Farooq RK, Shahid A, Ullah MI (2021) Breast cancer
research in Pakistan: a bibliometric analysis. Sage Open 11. [Link]
40211046934
14. Joshi SA, Bongale AM, Bongale A (2021) Breast cancer detection from histopathology images
using machine learning techniques: a bibliometric analysis. Libr Philos Pract 2021
15. Özen Çınar İ (2020) Bibliometric analysis of breast cancer research in the period 2009–2018.
Int J Nurs Pract 26. [Link]
16. Salod Z, Singh Y (2020) A five-year (2015 to 2019) analysis of studies focused on breast
cancer prediction using machine learning: a systematic review and bibliometric analysis. J
Public Health Res 9. [Link]
17. Liu J, Qiu XC (2020) Research hotspots and trends of breast cancer stem cells. J Shanghai
Jiaotong Univ (Med Sci) 40:881. [Link]
18. Moral-Munoz JA, Carballo-Costa L, Herrera-Viedma E, Cobo MJ (2019) Production trends,
collaboration, and main topics of the integrative and complementary oncology research area:
a bibliometric analysis. Integr Cancer Ther 18. [Link]
ASSET/IMAGES/LARGE/10.1177_1534735419846401-[Link]
19. Handerson R, Teles G, Fernando H, Márcia M, Cominetti R, Moralles HF, Cominetti MR
(2018) Global trends in nanomedicine research on triple-negative breast cancer: a bibliometric
analysis. Int J Nanomed 13:2321–2336. [Link]
20. Vakharia PP, Kakish D, Tadros R, Riutta J (2017) Bibliometric analysis of breast cancer-related
lymphoedema research published from 2007–2016. J Lymphoedema 12
21. Ram S (2017) Indian contribution to breast cancer research: a bibliometric analysis. Ann Libr
Inf Stud 64
22. Singh N, Handa TS, Kumar D, Singh G (2013) Mapping of breast cancer research in India: a
bibliometric analysis. Curr Sci 110
23. Lin L, Liang L, Wang M, Huang R, Gong M, Song G, Hao T (2023) A bibliometric analysis of
worldwide cancer research using machine learning methods. [Link]
24. Franco P, De Felice F, Jagsi R, Nader Marta G, Kaidar-Person O, Gabrys D, Kim K, Ramiah D,
Meattini I, Poortmans P (2023) Breast cancer radiation therapy: a bibliometric analysis of the
scientific literature. Clin Transl Radiat Oncol 39. [Link]
25. Zaman S (2023) A comprehensive review: bibliometric analysis of decision tree-based
approaches for breast cancer prediction
26. Cardoso F, Kyriakides S, Ohno S, Penault-Llorca F, Poortmans P, Rubio IT, Zackrisson S,
Senkus E (2019) Early breast cancer: ESMO clinical practice guidelines for diagnosis, treatment
and follow-up. Ann Oncol 30:1194–1220. [Link]
27. Prerita, Sindhwani N, Rana A, Chaudhary A (2021) Breast cancer detection using machine
learning algorithms. In: 2021 9th International conference on reliability, Infocom technologies
and optimization (trends and future directions), ICRITO 2021. [Link]
51393.2021.9596295
28. Akram M, Iqbal M, Daniyal M, Khan AU (2017) Awareness and current knowledge of breast
cancer. Biol Res 50. [Link]
Chapter 15
Predictive Modeling of Cardiovascular
Disease Using Feedforward Neural
Networks
1 Introduction
CVD is a leading cause of morbidity and mortality and poses a major public health
burden with devastating costs in terms of human suffering, disability, and deaths.
The accurate and timely diagnosis of CVD is a critical need, which allows for the
delivery of interventions that not only reduce the burden of treatment on patients
but also maximize clinical outcomes. The increasing availability of healthcare data,
along with breakthroughs in deep learning models, has motivated many researchers
to utilize these resources for the purpose of CVD prediction and diagnosis.
This paper significantly extends existing efforts in the field of predictive health-
care by providing a thorough assessment of the performance of ML and DL models
in the context of cardiovascular disease prediction.
FNNs, as part of the deep learning family, hold immense potential for identifying
and even predicting CVDs more effectively and efficiently, which in turn promises
a shift toward a more precise and targeted healthcare management model. Constant
improvement of this area through new studies and innovation is therefore essential
for the healthcare community to better approach the complex issues caused by CVDs,
and thus improve patients' quality of life and minimize the social costs of the disease.
In summary, this research addresses a current gap in the literature by comparing
machine learning and deep learning models for predicting CVD. In addition, this
study has helped to progress the field by demonstrating that FNNs with dropout
regularization yield higher accuracy in predicting the outcome. This paper provides
the basis for future research projects to enhance the methodologies used in predicting
CVD risk, in an effort to provide health solutions that will prevent further occurrences
of the disease.

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025
H. Mittal et al. (eds.), Proceedings of International Conference on Paradigms of
Communication, Computing and Data Analytics, Algorithms for Intelligent Systems,
[Link]
D. Joshi et al.
The relevance of innovation to the contemporary paradigm of managing emerging,
interrelated public health challenges is manifested in the need to incorporate innova-
tive technologies and methodologies so that health systems can evolve and operate
more efficiently.
2 Literature Review
Over the past few years, the number of works that use deep learning models for
CVD prediction has increased; they span different architectures and methodologies
aimed at improving prediction and explaining the complex patterns behind CVD
occurrence and development.
In a related study, Garcia-Ordas et al. (2023) applied a novel model, specifically a
CNN with a sparse autoencoder network, to a dataset of 11 crucial cardiovascular
health parameters. Their results, reporting approximately 90% accuracy for the
neural-network approach, offer further proof of the effectiveness of deep learning in
CVD prognosis and point to areas where predictive models of cardiovascular health
can be enhanced [1].
Subramani et al. (2023) made a worthy contribution by employing multiple
machine learning models, including deep learning models, to identify a set of features
capturing essential aspects of cardiovascular health. Their test results reach 96%
accuracy in CVD prediction using machine learning and deep learning, and they
identify directions for further advancing CVD-prediction techniques [2].
Bhatt et al. (2023) put forward an approach that applies machine learning
algorithms, including decision trees, XGBoost, random forest, and a multilayer
perceptron, to classify a real-world dataset of 70,000 instances imported from
Kaggle, using elements of cardiovascular health. Their findings showed the ability
to diagnose up to 87% of cases accurately [3].
In the study by Ahmad et al., the authors combined a CNN and a Bi-LSTM to perform
CVD prediction, achieving a 94.5% accuracy rate. Their work strongly supports
hybrid models in cardiovascular health research, facilitating the discovery of better
predictive models [4].
Along the same lines, Mehmood et al. (2021) demonstrate the feasibility of deep
learning solutions by using a CNN to predict CVD, achieving an accuracy of 97%.
Their paper highlights the importance of applying sophisticated machine learning
techniques to cardiovascular disease, especially in building better and more accurate
predictive models, opening up further opportunities for research into deep learning
systems [5].
Sharma and Parmar (2020) proposed a distinct deep learning framework for CVD
prognostication, based on a deep neural network optimized with the Talos tool. Using
a dataset of 14 crucial features associated with cardiovascular disease, they reported
promising results toward an efficient predictive model of overall cardiovascular
health [6].
Bharti et al. (2020) proposed a novel framework combining machine learning and
deep learning approaches. Their assessment on a dataset from the University of Cali-
fornia, Irvine (UCI), which contains 14 important features relating to cardiovascular
disease, demonstrated an impressive accuracy of 94.2%. This study stresses the
importance of hybrid machine learning methods in furthering knowledge of cardio-
vascular health [7].
Pasha et al. (2020) developed a new deep-learning-based framework in which
ANNs obtained the best accuracy rate, 85%, for cardiovascular disease prediction.
Their work adds to the existing literature supporting the use of deep learning
approaches to accurately estimate cardiovascular health prognoses [8].
Altogether, these works show that interest in applying deep learning approaches to
cardiovascular disease prediction is rising.
3 Methodology
Random Forest. Random Forest, a decision-tree-based ensemble learning technique,
has gained tremendous ground in both classification and regression tasks. During
training it generates many decision trees and then aggregates their forecasts: for
classification, the final class label is the mode of the trees' predictions, while for
regression, the final output is the mean of the predicted values.
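As an illustrative sketch of this aggregation scheme (scikit-learn on a synthetic dataset, not the study's CVD data; the hyperparameters here are assumptions):

```python
# Illustrative sketch only: a scikit-learn random forest on synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=13, random_state=42)

# 100 trees; each votes, and the forest predicts the majority class (the mode).
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X, y)

pred = clf.predict(X[:5])   # mode of the 100 per-tree predictions
print(pred.shape)
```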
XGBoost with random forest as the base estimator. XGBoost (Extreme Gradient
Boosting) is another successful gradient boosting technique used primarily for clas-
sification and regression. It builds a large number of weak learners, mainly decision
trees, in a successive manner with the intention of reducing the loss function.
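A minimal sketch of this successive-weak-learner idea; it uses scikit-learn's GradientBoostingClassifier as a stand-in, since the paper's exact XGBoost configuration (including the random-forest base estimator) is not specified here:

```python
# Stand-in sketch: scikit-learn gradient boosting illustrates the same idea as
# XGBoost -- shallow trees fitted one after another to reduce the loss -- but
# it is not the XGBoost library itself.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=500, n_features=13, random_state=42)

clf = GradientBoostingClassifier(n_estimators=200, max_depth=3,
                                 learning_rate=0.1, random_state=42)
clf.fit(X, y)   # each new tree corrects the residual errors of the ensemble
print(clf.score(X, y))
```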
The outlined model is a form of Feedforward Neural Network (FNN): a multilayer-
perceptron architecture with Dropout regularization.
Data Collection and Preparation. A cross-sectional study was conducted and data
were extracted from a file containing important information on cardiovascular disease
(CVD). The dataset includes age, sex, and cholesterol along with other medical
parameters. Such detailed data support a better analytical study and help enhance
existing and new policies for managing CVD.
Data Preprocessing. To begin with, the dataset was divided into the features, repre-
sented by the symbol "X", and the target variable, symbolized by "y"; the target in
this study is the binary CVD classification label. Next, we split the data in an 80:20
ratio, giving a test size of 20%. The random state was kept at 42 for reproducibility.
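The split described above might be sketched as follows (the features and labels here are synthetic placeholders, not the study's dataset):

```python
# Sketch of the 80:20 train/test split with random_state=42, on placeholder data.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.RandomState(0).rand(1000, 13)        # features "X" (placeholder)
y = np.random.RandomState(1).randint(0, 2, 1000)   # binary CVD labels "y"

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42)
print(len(X_train), len(X_test))  # 800 200
```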
Feature Scaling. To improve model convergence and bring the features onto a
comparable scale, feature scaling was conducted. This is a vital step, since research
has shown that features with zero mean and unit variance aid model convergence. In
this way, the features are treated uniformly, improving model training.
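A sketch of this standardization step, assuming scikit-learn's StandardScaler (the paper does not name the scaling tool):

```python
# Standardization to zero mean / unit variance, fitted on the training set only
# so that no test-set information leaks into the scaler.
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(42)
X_train = rng.rand(800, 13) * 100   # placeholder features on a large scale
X_test = rng.rand(200, 13) * 100

scaler = StandardScaler().fit(X_train)   # learns per-feature mean and std
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)
print(X_train_s.mean().round(6), X_train_s.std().round(3))
```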
Model Architecture. The structure of our model is delineated as follows:
Description. The network consists of four primary layers: an input layer, a hidden
layer, a dropout layer, and an output layer. Together, these layers enable the neural
network to effectively process and learn from input data, ultimately producing
accurate and reliable outputs (shown in Fig. 1).
Output Layer (1 unit). The output layer has a single neuron for binary classification,
where "1" represents the presence of cardiovascular disease and "0" its absence.
Activation function: Sigmoid
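The text does not give the hidden-layer width, its activation, or the dropout rate, so the NumPy sketch below assumes 64 hidden units, ReLU, and a rate of 0.5; it shows one forward pass through the input → hidden → dropout → sigmoid-output stack described above:

```python
# Sketch of one forward pass through the described FNN; the hidden width (64),
# ReLU activation, and dropout rate (0.5) are assumptions, not the paper's values.
import numpy as np

rng = np.random.default_rng(0)

def forward(x, W1, b1, W2, b2, drop_rate=0.5, training=True):
    """Hidden layer -> (inverted) dropout -> single sigmoid output unit."""
    h = np.maximum(0.0, x @ W1 + b1)              # hidden layer, ReLU
    if training:                                  # dropout acts only in training
        mask = rng.random(h.shape) >= drop_rate
        h = h * mask / (1.0 - drop_rate)          # inverted-dropout rescaling
    z = h @ W2 + b2
    return 1.0 / (1.0 + np.exp(-z))               # sigmoid: P(CVD present)

n_in, n_hidden = 13, 64
W1 = rng.normal(0, 0.1, (n_in, n_hidden)); b1 = np.zeros(n_hidden)
W2 = rng.normal(0, 0.1, (n_hidden, 1));    b2 = np.zeros(1)

p = forward(rng.normal(size=(4, n_in)), W1, b1, W2, b2)
print(p.shape)  # one probability per sample
```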
Model Compilation. This model is trained with the Adam optimizer and the binary
cross-entropy loss, which is especially applicable to binary classification problems.
The Adam optimizer is known for adapting quickly to large-scale datasets, and it also
performs well with sparse gradients. The binary cross-entropy loss function is best
suited to binary-class problems, as it quantifies the dissimilarity between the predicted
class probability distribution and the actual class distribution in the data.
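The binary cross-entropy loss described here can be written out directly; the probabilities below are illustrative values, not model outputs:

```python
# Binary cross-entropy: dissimilarity between predicted probabilities and the
# true 0/1 labels, as used to train the model above.
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    y_pred = np.clip(y_pred, eps, 1.0 - eps)   # guard against log(0)
    return float(-np.mean(y_true * np.log(y_pred)
                          + (1 - y_true) * np.log(1 - y_pred)))

y_true = np.array([1, 0, 1, 1])
good = np.array([0.9, 0.1, 0.8, 0.95])   # confident, mostly correct
bad = np.array([0.5, 0.5, 0.5, 0.5])     # uninformative predictions

# Loss is lower when predicted probabilities match the labels.
print(binary_cross_entropy(y_true, good) < binary_cross_entropy(y_true, bad))  # True
```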
Model Training. The model was trained for 50 epochs with a batch size of 128
samples. Throughout training, its performance was checked on a separate validation
set to ensure it was not simply memorizing the training data.
Model Evaluation. Once training finished, the model was evaluated on a held-out test
set it had not seen before. Its predictions were compared with the actual outcomes to
measure its predictive ability, a rigorous test of whether the model can handle
real-world scenarios beyond its training data.
Result Analysis. We calculated accuracy on the test set, which gave a solid measure
of predictive performance, and examined further metrics such as precision, recall,
and F1-score for a more complete picture. A confusion matrix was used to break the
performance down further.
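A sketch of this evaluation using scikit-learn's metric functions on illustrative labels (not the study's test set):

```python
# Accuracy, precision, recall, F1, and confusion matrix on illustrative labels.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # placeholder ground truth
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # placeholder model predictions

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("f1       :", f1_score(y_true, y_pred))
print(confusion_matrix(y_true, y_pred))  # rows: true class, cols: predicted
```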
Description. The methodology outlines the steps from dataset collection to final
prediction using deep learning techniques. The approach ensures the model is well-
prepared to make accurate predictions (shown in Fig. 3).
4 Experimental Result
Source and Description. The dataset used in this study consists of 1025 entries,
carefully selected and organized to cover the cardiovascular health area in detail.
It comes from the well-known University of California, Irvine (UCI) Machine
Learning Repository. The dataset contains 14 distinct features, selected to capture
essential aspects for predicting cardiovascular disease (CVD) outcomes.
Description. The table comprises 14 carefully selected features, each essential for
predicting cardiovascular health. These features are chosen based on their relevance
and impact on cardiovascular conditions, ensuring the model has the necessary
information to make accurate predictions (shown in Table 1).
Table 2 Accuracy comparison of various models

| S. No | Accuracy (%) | Model |
|---|---|---|
| 1 | 78.68 | Logistic regression |
| 2 | 80.32 | SVM |
| 3 | 73.77 | KNN |
| 4 | 85.24 | Random forest |
| 5 | 90 | XGBoost |
| 6 | 98.5 | FNN (MLP) |
Description. The performance metrics comparison shows that the FNN (MLP)
outperforms all other models, indicating superior accuracy, balance between preci-
sion and recall, and effectiveness in identifying relevant instances (shown in
Table 3).
It reveals that the FNN model outperforms the various machine learning models.
Although each of these models has been studied in pursuit of a better predictive
model for CVD, the present study demonstrates that advanced neural networks with
added regularization techniques can improve prediction accuracy. This implies the
value of sufficient hidden layers and of flexibly adjusting the model to the complex-
ities of cardiovascular health data. The relevance of the trends and fluctuations
highlighted in our analysis lies in their applicability as an answer to the question
posed at the beginning of our study. In deriving and comparing the most effective
machine learning and deep learning models for a CVD risk prediction strategy, we
sought to determine how different training, feature selection, and hyperparameter
optimization strategies can be used to improve diagnostic precision in cardiovascular
healthcare.
Description. The bar graph visually illustrates the performance disparities among the
models, prominently showing that the FNN (MLP) achieves the highest accuracy
(shown in Fig. 4).
Description. The confusion matrix evaluates how well the Feedforward Neural
Network (FNN) classifies data into two categories. The correlation matrix represents
the relationships between 13 variables: darker red indicates stronger positive corre-
lation, while darker blue indicates stronger negative correlation (shown in Fig. 5).
Fig. 5 Confusion matrix for FNN classification performance and correlation matrix for variable
relationships
Description. The grouped bar plot compares Precision, F1-Score, and Recall across
multiple models. Each metric (Precision, F1-Score, and Recall) is represented by six
bars, each corresponding to a different model. This visualization provides a clear
comparison of how each model performs across these important evaluation metrics
(shown in Fig. 6).
Fig. 6 Grouped bar plot of precision, F1-score, and recall across models
5 Conclusion
The current study has outlined the significance of adopting innovative methodolo-
gies of advanced deep learning for predictive cardiovascular healthcare. Our efforts
have paid off through rigorous training and evaluation, achieving an accuracy of
almost 98.5% in predicting cardiovascular diseases. The present paper firmly estab-
lishes that deep learning algorithms can serve as the bedrock of model development
for cardiovascular disease. The limitations of this study include its reliance on a
single dataset and a single experimental setting, which limits the generality of the
conclusions drawn. Furthermore, actual deployment can pose considerations that
are not adequately reflected in this study. Despite these limitations, our study builds
on prior work to advance the field of predictive cardiovascular healthcare. Further
work should explore combinations of other machine learning methods, innovative
feature engineering, and datasets larger than those used to date. Future research
should also examine the generalizability of the results to various populations and
settings, as the developed framework for predictive cardiovascular healthcare has
the potential to extend its application to various settings.
References
1. Garcia-Ordas MT et al (2023) Heart disease risk prediction using deep learning techniques with
feature augmentation. Multimedia Tools Appl 82:31759–31773
2. Subramani S et al (2023) Cardiovascular diseases prediction by machine learning incorporation
with deep learning. Front Med 10:1150933. [Link]
3. Bhatt CM, Patel P, Ghetia T, Mazzeo PL (2023) Effective heart disease prediction using machine
learning techniques. Algorithms 16(2):88. [Link]
4. Ahmad S, Asghar MZ, Alotaibi FM et al (2022) Diagnosis of cardiovascular disease using deep
learning techniques. Soft Comput 27:8971–8990
5. Mehmood A, Iqbal M, Mehmood Z et al (2021) Prediction of heart disease using deep
convolutional neural networks. Arab J Sci Eng 46:3409–3422
6. Sharma S, Parmar M (2020) Heart diseases prediction using deep learning neural network model.
Int J Innov Technol Exploring Eng
7. Bharti R et al. (2020) Prediction of heart disease using a combination of machine learning and
deep learning. Hindawi Comput Intell Neurosci 2021
8. Pasha SN et al. (2020) Cardiovascular disease prediction using deep learning techniques. IOP
Conf Ser: Mater Sci Eng 981:022006
Chapter 16
Design and Implementation of Fetal
Heart Rate Measuring System
on MATLAB Simulink
Twarita Singh, Tanishq Dixit, Pragya Paliwal, Samriddhi Tiwari,
and Shivani Saxena
1 Introduction
2 Literature Review
extract time-varying ECG features and can be independent of the pathological and
physiological states of mother and fetus. The work in this project is framed according
to the above research gap and uses a DWT-based signal filtering and processing block
to analyze each beat of the fetal ECG.
4 Methodology
Figure 3 shows a workflow diagram of the proposed model, which consists of five
basic steps. Steps 1–2 perform signal acquisition; Step 3 removes various noise
components from the signal (the pre-processing stage); and Steps 4–5 detect and
measure fetal heart rate using R-peaks. Finally, from the software algorithm, a
MATLAB-based Simulink model is created as an ensemble of all the entities used
in fetal heartbeat measurement.
The detailed description of each step is as follows:
Step1: Import FECG records from Physionet to MATLAB R2023b workspace
[17].
Step 2: Normalization of ECG gain from dB to mV, using Eq. (1):

    Normalized ECG signal(n) = Original signal / Gain factor    (1)

This ensures consistent interpretation of ECG data across different recordings.
Step 3: Conduct a 4-level undecimated DWT using the sym4 wavelet, decomposing
the signal into its constituent frequency components so that various noise compo-
nents can be removed.
Step 4: Detection of R-peaks using amplitude thresholding and windowing
method.
Step 5: Calculation of heart rate using successive R-peaks, according to Eq. (2):

    HR = (Number of R-peaks / Time duration (s)) × 60    (2)

Here, HR is the heart rate in beats per minute (bpm) and the number of R-peaks is
the count of R-peaks detected within the specified time duration [18].
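Steps 4–5 can be sketched in NumPy as follows; the sampling rate, record length, spike amplitude, threshold, and window length are all illustrative assumptions, not the paper's exact parameters:

```python
# Sketch of R-peak detection (amplitude thresholding + windowing) and heart-rate
# calculation; the synthetic signal and all parameters are assumptions.
import numpy as np

FS = 250          # sampling rate in Hz (assumed)
DURATION = 10     # record length in seconds (assumed)
n = FS * DURATION

# Synthetic "FECG": unit spikes at 90 bpm (one every 2/3 s) on mild noise.
rng = np.random.default_rng(0)
ecg = 0.05 * rng.normal(size=n)
spike_period = int(FS * 60 / 90)                     # ~166 samples between beats
ecg[np.arange(spike_period // 2, n, spike_period)] += 1.0

def detect_r_peaks(sig, fs, thresh=0.5, refractory_s=0.2):
    """Keep samples above the threshold, at most one per refractory window."""
    window = int(refractory_s * fs)
    peaks, last = [], -window
    for i in np.flatnonzero(sig > thresh):
        if i - last >= window:
            peaks.append(i)
            last = i
    return np.array(peaks)

peaks = detect_r_peaks(ecg, FS)
hr_bpm = len(peaks) / DURATION * 60                  # beats per minute
print(len(peaks), hr_bpm)  # 15 90.0
```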
Statistical parameters are used to measure the performance of the proposed method.
The first is the Mean Square Error (MSE), computed to quantify the deviation between
the noisy and denoised ECG signals. The governing equation is Eq. (3):

    MSE = (1/N) Σ_{i=1}^{N} [ECG(i) − ECG̃(i)]²    (3)

Here, ECG(i) and ECG̃(i) are the noisy and denoised ECG signals, respectively.
The signal-to-noise ratio is measured using signal averaging, given by Eq. 4.
Average ECG(t) = (1/N) · Σ_{i=1..N} ECG_i(t)    (4)

Here, N is the number of ECG samples to be averaged and ECG_i(t) is the i-th denoised
ECG signal [15].
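Eqs. 3 and 4 translate directly into code; a minimal Python sketch:

```python
def mse(noisy, denoised):
    """Eq. 3: mean squared error between noisy and denoised ECG."""
    n = len(noisy)
    return sum((noisy[i] - denoised[i]) ** 2 for i in range(n)) / n


def average_ecg(signals):
    """Eq. 4: pointwise average of N denoised ECG realisations."""
    n = len(signals)
    length = len(signals[0])
    return [sum(s[t] for s in signals) / n for t in range(length)]
```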
As discussed in Sect. 4, the first step of the proposed model is signal acquisition,
which is implemented using Algorithm 1 in the MATLAB workspace to import the
FECG signal.
The second and third steps apply the DWT to the noisy and normalized FECG signals,
as shown in Algorithm 2 (Fig. 4).
The resulting FECG signals with all detected R-peaks are shown in Fig. 6.
As shown in Fig. 6, the number of detected R-peaks in record ARR_03 is 15.
Using these R-peaks, the measured value of the heart rate is given by Eq. 5.
HR = (15/10) × 60 = 90 bpm    (5)
This measured value is used as a reference for comparison of simulated values
obtained from derived Algorithms 1–3 in the proposed model.
In the same manner, simulated values of heart rate are compared with theoretical
values in all 16 FECG records and measured values of performance parameters using
Eqs. 3 and 4 are tabulated in Table 1.
The table shows that the theoretical heart-rate values agree closely with the
heart-rate values measured using the experimental methods discussed in Fig. 2.
It also quantifies the performance of the proposed DWT-based method in the form
of statistical parameters.
In the final stage of hardware implementation of the proposed model, the FECG
signal is imported into MATLAB Simulink. First, the floating-point data type is
converted into a fixed-point data type so that these FECG signals can be imported
into the MATLAB Simulink environment. Figure 7 shows the basic processing blocks
in Simulink that convert the signal into the appropriate data types.
The acquired signal is segmented into two distinct sections corresponding to the
chest and abdomen regions and pre-processed with the supported data-format
conversion. The collective fetal heart-rate signal derived from these categorized
sections is represented in integer format, i.e., as a fixed-point data type. This
fixed-point signal representation is essential for hardware implementation, as it
uses fewer hardware resources when embedded into a small device [19].
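As a hedged illustration of this conversion, the sketch below quantises floating-point samples into a signed fixed-point word. The word length (16 bits) and fraction length (8 bits) are illustrative choices; the paper does not state the exact format used in its Simulink model.

```python
def to_fixed_point(samples, frac_bits=8, word_bits=16):
    """Quantise floating-point samples to signed fixed-point integers.

    Each value is scaled by 2**frac_bits, rounded, and saturated to the
    representable range of a `word_bits`-bit two's-complement word.
    """
    scale = 1 << frac_bits
    lo = -(1 << (word_bits - 1))
    hi = (1 << (word_bits - 1)) - 1
    return [max(lo, min(hi, round(v * scale))) for v in samples]


def to_float(fixed, frac_bits=8):
    """Inverse mapping back to (quantised) floating point."""
    return [v / (1 << frac_bits) for v in fixed]
```

The saturation step mirrors what fixed-point hardware does on overflow; the quantisation error introduced here is the price paid for the smaller hardware footprint.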
214 T. Singh et al.
Table 1 Measurement of heartbeat of fetus using FECG and numerical simulation of statistical
parameters in denoised ECG signals

S. no | Records | No. of R-peaks detected | Heartbeat (theoretical) | Heartbeat count (experimental) | MSE | SNR
1 | ARR_01m | 14 | 84 | 88 | 0.0392 | 42
2 | ARR_02m | 15 | 90 | 96 | 0.0409 | 49
3 | ARR_03m | 15 | 90 | 114 | 0.0344 | 70
4 | ARR_04m | 15 | 90 | 90 | 0.0486 | 56
5 | ARR_05m | 18 | 108 | 108 | 0.0042 | 54
6 | ARR_06m | 19 | 114 | 114 | 0.0344 | 42
7 | ARR_07m | 14 | 84 | 84 | 0.0749 | 51
8 | ARR_08m | 13 | 78 | 78 | 0.0405 | 43
9 | ARR_09m | 23 | 138 | 108 | 0.031 | 39
10 | ARR_10m | 12 | 72 | 72 | 0.1217 | 42
11 | ARR_11m | 13 | 78 | 78 | 0.0875 | 48
12 | ARR_12m | 17 | 102 | 102 | 0.0704 | 64
13 | NR_01m | 15 | 90 | 138 | 0.0256 | 51
14 | NR_02m | 16 | 96 | 96 | 0.1039 | 45
15 | NR_03m | 15 | 90 | 96 | 0.237 | 47
16 | NR_04m | 17 | 102 | 108 | 0.1051 | 50
Fig. 7 Import of FECG data into Simulink using integer data type
As shown in Fig. 8, the simulated fetal heart-rate signal passes through a pipeline
of various DSP blocks, including digital filters that remove the high-frequency noise
components and the DC (zero-frequency) offset, before being sent to the display
component. In addition, the Spectrum Analyzer block in Simulink is used to identify
the frequencies of the noise components. The resulting FECG signal generated by the
above Simulink model is shown in Fig. 9 and consists of the various signal components
discussed in Fig. 1.
It is clear from Fig. 9 that the proposed Simulink model efficiently generates FECG
signals that can be used for clinical heart-rate monitoring.
6 Conclusion
References
1. Ronsmans C, Graham WJ (2006) Maternal mortality: who, when, where, and why. The Lancet
368(9542):1189–1200
2. Aggarwal G, Wei Y (2021) Non-invasive fetal electrocardiogram monitoring techniques:
potential and future research opportunities in smart textiles. Signals 2(3):392–412
3. Lin H, Liu R, Liu Z (2023) ECG signal denoising method based on disentangled autoencoder.
Electronics 12(7):1606
4. Behar J, Johnson A, Clifford GD, Oster J (2014) A comparison of single channel fetal ECG
extraction methods. Ann Biomed Eng 42:1340–1353
5. Alnuaimi SA, Jimaa S, Khandoker AH (2017) Fetal cardiac Doppler signal processing
techniques: challenges and future research directions. Front Bioeng Biotechnol 5:82
6. Ahmad AA, Nyitamen DS, Lawan S, Wamdeo CL (2019) Fetal heart rate estimation: adaptive
filtering approach vs time-frequency analysis. In: 2019 2nd international conference of the
IEEE Nigeria computer chapter (NigeriaComputConf). IEEE, pp 1–5
7. Grivell RM, Alfirevic Z, Gyte GM, Devane D (2015) Antenatal cardiotocography for fetal
assessment. Cochrane Database Syst Rev 9
8. Jabbari S (2021) Source separation from single-channel abdominal phonocardiographic signals
based on independent component analysis. Biomed Eng Lett 11(1):55–67
9. Keenan E, Udhayakumar RK, Karmakar CK, Brownfoot FC, Palaniswami M (2020) Entropy
profiling for detection of fetal arrhythmias in short length fetal heart rate recordings. In: 2020
42nd annual international conference of the IEEE engineering in medicine & biology society
(EMBC). IEEE, pp 621–624
10. Kasap B, Vali K, Qian W, Chak WH, Vafi A, Saito N, Ghiasi S (2021) Multi-detector heart rate
extraction method for transabdominal fetal pulse oximetry. In: 2021 43rd annual international
conference of the IEEE engineering in medicine & biology society (EMBC). IEEE, pp 1072–
1075
11. Farahi M, Casals A, Sarrafzadeh O, Zamani Y, Ahmadi H, Behbood N, Habibian H (2022)
Beat-to-beat fetal heart rate analysis using portable medical devices and wavelet transformation
techniques. Heliyon 8(12)
12. Escalona-Vargas D, Bolin EH, Lowery CL, Siegel ER, Eswaran H (2020) Recording and
quantifying fetal magnetocardiography signals using a flexible array of optically pumped
magnetometers. Physiol Meas 41(12):125003
13. Vullings R, Van Laar JO (2020) Non-invasive fetal electrocardiography for intrapartum
cardiotocography. Front Pediatr 8:599049
14. Chen D, Wan S, Xiang J, Bao FS (2017) A high-performance seizure detection algorithm based
on Discrete Wavelet Transform (DWT) and EEG. PLoS ONE 12(3):e0173138
15. Saxena S, Vijay R (2020) Optimal selection of wavelet transform for de-noising of ECG signal
on the basis of statistical parameters. In: Soft computing and signal processing: proceedings of
2nd ICSCSP 2019, vol 2. Springer Singapore, pp 731–739
16. Non-Invasive Fetal ECG Arrhythmia Database (nifeadb) downloaded from PhysioNet Website
17. Simulink toolbox MATLAB R2023b
18. Saxena S, Vijay R, Pahadiya P, Gupta KK (2023) Classification of ECG arrhythmia using
significant wavelet-based input features. Int J Med Eng Inf 15(1):23–32
19. Travieso-González CM, Pérez-Suárez ST, Alonso JB (2013) Using fixed point arithmetic for
cardiac pathologies detection based on electrocardiogram. In: Computer aided systems theory-
EUROCAST 2013: 14th international conference, Las Palmas de Gran Canaria, Spain, February
10-15, 2013. Revised Selected Papers, Part II 14. Springer Berlin Heidelberg, pp 242–249
Chapter 17
Exploring the Effectiveness of Artificial
Neural Networks and Regression Models
in Weather Prediction
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025 219
H. Mittal et al. (eds.), Proceedings of International Conference on Paradigms of
Communication, Computing and Data Analytics, Algorithms for Intelligent Systems,
[Link]
220 V. Singh et al.
everyday activities, particularly from a tourism perspective. WFS and early warn-
ings assist travelers in planning trips more effectively, enabling informed decisions
about specific days, locations, and times. Accurate WF is crucial in natural disas-
ters to ensure the safety of people and property. Therefore, this paper focuses on
predicting temperature data using historical data from Delhi city spanning 20 years
and 365 days. The temperature dataset from Himani et al. [16] is utilized to test NN-
based weather forecasting (WF) accuracy, revealing a significant need for improved
forecasting accuracy. The ultimate goal of the paper is to demonstrate the accurate
use of the ANN network model for temperature WF.
2 Literature Review
Pathak et al. [1] have introduced FourCastNet, a computational model with signif-
icant implications for predicting catastrophic weather phenomena, including extra-
tropical and tropical cyclones, air pathways, and wind power planning. FourCastNet
demonstrates superior forecasting accuracy at short lead periods compared to the
ECMWF Integrated Forecast System (IFS), a cutting-edge Numerical Weather
Prediction (NWP) approach. Particularly, FourCastNet outperforms IFS in small-
scale factors like rainfall, and its remarkable speed enables the generation of weekly
predictions in just a second. This velocity facilitates the cost-effective creation of
large-ensemble projections, enhancing deterministic modeling. The study under-
scores the value of data-driven computational models like FourCastNet as valuable
additions to the weather forecasting arsenal, complementing and improving NWP
modeling.
In a related development, Bi et al. [2] have presented Pangu-Weather, a deep
learning-based system for precise and rapid global weather forecasting. Utilizing
hourly global meteorological data spanning 43 years from the ECMWF reanalysis
(ERA5), the model is trained with around 256 million parameters. Pangu-Weather
achieves a spatial resolution of 0.25° × 0.25°, comparable to the ECMWF Integrated
Forecast System (IFS). Notably, the artificial intelligence-driven approach
surpasses traditional NWP techniques in precision across various variables and time
intervals.
Xia et al. [3] have introduced the ED-ConvLSTM architecture, employing a
convolutional long short-term memory network, to predict global total electron
content (TEC) with a 1-h time cycle. Using International GNSS Service (IGS) TEC data
from 2005 to 2018, the model forecasts TEC maps one to seven days in advance. The
model's effectiveness is validated through simulations and comparisons with empir-
ical frameworks such as the Bent model, International Reference Ionosphere (IRI)
models, and the NeQuick model.
Chen et al. [4] address state-dependent errors in weather forecasts resulting
from imprecise algorithms. They propose using data assimilation (DA) to incor-
porate observations and correct model mistakes. The study focuses on extracting
data embedded in the analysis increments generated by the DA procedure to
17 Exploring the Effectiveness of Artificial Neural Networks … 221
enhance numerical weather forecasts. Neural networks (NNs) are trained to predict
adjustments to systemic errors in the FV3-GFS simulation, leading to significantly
improved error correction compared to a linear baseline.
In their study, Grabar et al. [5] present a comprehensive approach to drought
forecasting, utilizing a publicly available quarterly weather dataset as input for a
spatio-temporal artificial intelligence model. The investigation, conducted across
five distinct climatic locations, assesses the accuracy of Palmer Drought Severity Index
(PDSI) forecasts using various algorithms. Notably, the study concludes that the
Transformer-based approach, EarthFormer, exhibits remarkable precision in generating
immediate (up to six months) forecasts. The use of partial differential equations
to mimic Earth's systems highlights the potency of general circulation models
(GCMs) in predicting climatic events.
Kurth et al. [6] address the challenges associated with the time-to-solution
constraints and processing costs of physics-based numerical weather forecasting
(NWP). They reveal FourCastNet, a data-driven deep neural network planetary
emulator, as a promising alternative that outperforms traditional NWP in medium-
range forecasting with significantly reduced computational time. The study employs
spectrum approaches to achieve these advancements.
Sharma et al. [7] introduce a data-driven model named U-Net, based on convolu-
tional neural networks, for meteorological weather forecasting. This model considers
environmental characteristics such as 2 m temperatures, mean ocean stress, surface
stress, wind speed, modeling topography height, sunlight intensity, and humidity
ratio to forecast weather conditions for the next six hours.
Liu et al. [8] contribute to the exploration of weather forecasting (WF) prob-
lems by presenting CAST-YOLO, an enhanced YOLO approach incorporating a
cross-attention transformer. Utilizing convolutional block attention modules
(CBAM) and a transformer encoder layer (TE-Layer), the detector employs knowl-
edge distillation techniques for cross-domain object identification, showcasing
notable advancements in convolutional neural network-inspired object recognition
techniques.
Nayak et al. [9] tackle the immediate forecasting of severe downpours, a crit-
ical aspect of flooding alert communication. Focusing on urbanized locations like
Mumbai, the study employs micro and continental-level weather observations to
develop a machine learning-based method, using the support vector machine (SVM)
and the atypical regularity methodology (AFM). This methodology leverages the
occurrence of high anomalous values of meteorological parameters to identify
predictors for SVM, addressing the challenges of predicting excessive precipitation.
Novitasari et al. [10] employ climate variables to predict rainfall quantity in their
study, utilizing the Adaptive Neuro-Fuzzy Inference System (ANFIS) and support
vector regression (SVR) techniques. This approach aims to provide rapid and reli-
able insights into upcoming weather conditions, enabling individuals to prepare
adequately. The prediction is based on synopsis data, including wind speed, temper-
ature, and humidity levels, with the ANFIS technique forecasting each variable’s
results for precise rainfall forecasts, assessed using RMSE and MSE metrics.
Hayaty et al. [11] conducted a study with the aim of assessing the capability of
support vector machines (SVM) in forecasting rain at Tanjungpinang, Kepulauan
Riau, Indonesia. The research utilizes factors such as humidity, temperature, wind
speed, and precipitation to make predictions. The gathered information undergoes
detailed analysis to determine whether the constructed SVM model meets the conver-
gence requirements of an algorithm or technique. This evaluation is crucial for vali-
dating the effectiveness and reliability of the SVM-based rain forecasting model in
the specific geographical context of Tanjungpinang.
In a study by Chandrasekara [12], an automatic SVM-inspired method for annual
weather forecasting with thirty-minute increments is proposed. This method, tested
with a dataset from the Kandy region comprising 136 samples, 20 characteristics, and
5 labels, exhibits significantly higher accuracy than models based on yearly datasets.
The framework achieves an optimal precision of 86% through thorough training, data
preparation processes, and hyperparameter optimization using the grid search tech-
nique. SVM algorithms prove effective, particularly in time series-based forecasting
approaches.
Velasco et al. [13] address the impact of climate change-related heavy rainfall
on the Southern Philippines. Using a 4-year precipitation dataset, a Support Vector
Regression Machine (SVRM) forecasts future precipitation in a tropical city. The
process involves optimizing the cost and gamma variables to enhance prediction
precision, assessing forecasting algorithms, and validating the cost and gamma
parameters to establish correlations across current and past value pairs.
Khan et al. [14] present a unique deep learning-driven approach for estimating
energy load. It involves a three-step process: feature extraction through recursive
feature removal, classification/forecasting using enhanced support vector machines
(SVM) and Extreme Learning Machines (ELM), and feature selection. Hyperparameter
adjustments are made using the Genetic Algorithm for ELM and a grid-search engine
for SVM, with the relevance of features determined through the Recursive Feature
Elimination (RFE) approach.
Álvarez-Alvarado et al. [15] review various hybrid SVM algorithms utilizing
State-of-the-Art (SOA) approaches to optimize settings for sunlight prediction.
Examining the last five years of studies on hybrid SVM-optimized algorithms, they
find that SVM using Genetic Algorithm (GA) outperforms traditional SVM models,
especially when forecasting variables are set using the Zonal-based kernel algorithm.
SVM emerges as a potent machine learning classification technique for forecasting
ultraviolet (UV) rays.
Tyag et al. [16] have focused on weather forecasting utilizing historical datasets.
Due to the intricate and nonlinear nature of atmospheric patterns, conventional
methods often fall short in terms of effectiveness and efficiency. Recognizing the
complexity of the issue, the researchers turned to Artificial Neural Network (ANN)
as a potent approach for addressing these challenges. The proposed ANN method
assesses model performance through the exploration of various parameters such as
neurons, hidden layers, and transfer functions.
Table 1 shows different types of weather forecasting models, from old methods
to new advanced ones using deep learning and hybrid techniques. Each model has
unique strengths, improving accuracy, speed, and usefulness for various weather
and climate issues. The new method we suggest uses an artificial neural network
(ANN) and is very accurate, highlighting how modern AI can improve environmental
monitoring.
This paper introduces a WFS designed using an ANN. This ANN-based
system assesses different models by predicting temperatures throughout the year,
considering various factors like transfer functions, hidden layers, and neurons. The
selection of the best model is based on the MSE. The main aim of the study is to
create a more accurate and efficient method for modeling weather data, reducing
computational costs, and achieving a lower MSE. The process involves removing
invalid values from the raw data, creating a more manageable dataset, and using it
for predictions. Regression analysis is then employed to measure the accuracy of
the data predictions. The flow chart illustrating the steps of the ANN-based weather
forecasting is provided in Fig. 1.
3.1 Pre-Processing
At the front end, raw weather data is downloaded from the Kaggle site and stored as
an Excel spreadsheet. The following steps are involved sequentially:
A. Data Cleaning
Read the columns of the weather data. The data contains month, date, year, and
temperature in °F for the Mumbai and Delhi cities. Missing values are replaced with
"Not a Number" (NaN) for i = 1 to the length of the data, and the temperature is
converted from °F to °C as follows:
Temp°C = (Tempdata − 32) × 5/9
Then the NaN is eliminated using the spline interpolation method.
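A Python sketch of this cleaning step, assuming the temperatures arrive in °F with gaps marked as NaN; plain linear interpolation stands in for the spline interpolation used in the paper:

```python
import math


def fahrenheit_to_celsius(temps_f):
    """Convert degF readings to degC via C = (F - 32) * 5/9 (NaNs pass through)."""
    return [(t - 32) * 5 / 9 if not math.isnan(t) else t for t in temps_f]


def fill_gaps(values):
    """Replace interior NaN runs by linear interpolation between the
    nearest valid neighbours (a simple stand-in for spline interpolation).
    Leading/trailing NaNs, which have only one neighbour, are left as-is."""
    out = list(values)
    n = len(out)
    i = 0
    while i < n:
        if math.isnan(out[i]):
            j = i
            while j < n and math.isnan(out[j]):
                j += 1
            if i > 0 and j < n:  # interior gap: interpolate across it
                left, right = out[i - 1], out[j]
                for k in range(i, j):
                    frac = (k - i + 1) / (j - i + 1)
                    out[k] = left + frac * (right - left)
            i = j
        else:
            i += 1
    return out
```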
B. Network Architecture
The features of the input data are divided into 19 input layers, with one output layer;
the dataset covers 20 years, comprising 7320 data samples (20 years × 366 days).
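Assuming the 19:1 input–output ratio means that 19 consecutive samples predict the next one (the paper does not print its exact data arrangement), the training pairs can be built with a simple sliding window:

```python
def make_windows(series, n_inputs=19):
    """Build (inputs, target) training pairs: each window of `n_inputs`
    consecutive samples predicts the sample that immediately follows it."""
    pairs = []
    for i in range(len(series) - n_inputs):
        pairs.append((series[i:i + n_inputs], series[i + n_inputs]))
    return pairs
```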
Table 1 (continued)

Study | Approach/Model | Main focus/result
Nayak and Ghosh [9] | SVM-based method | Focuses on immediate forecasting of severe downpours for flooding alerts. Uses SVM and the atypical regularity methodology (AFM). Addresses challenges in predicting excessive precipitation in urbanized locations like Mumbai
Novitasari et al. [10] | ANFIS and SVR | Uses climate variables to predict rainfall quantity. Employs Adaptive Neuro-Fuzzy Inference System (ANFIS) and support vector regression (SVR) techniques. Enables rapid and reliable insights into upcoming weather conditions
Hayaty et al. [11] | SVM-based rain forecasting | Assesses SVM capability in forecasting rain. Uses factors like humidity, temperature, wind speed, and precipitation. Evaluates model convergence for reliability in Tanjungpinang, Kepulauan Riau, Indonesia
Chandrasekara [12] | SVM-inspired method | Proposes an automatic SVM-inspired method for annual weather forecasting. Demonstrates higher accuracy than models based on yearly datasets. Achieves optimal precision through training and hyperparameter optimization
Velasco et al. [13] | Support Vector Regression Machine | Addresses the impact of climate change-related heavy rainfall in the Southern Philippines. Uses a Support Vector Regression Machine (SVRM). Optimizes variables for enhanced prediction precision. Contributes to understanding ionospheric thermospheric emission curves
Khan et al. [14] | Enhanced SVM and ELM | Presents a deep learning-driven approach for estimating energy load. Uses feature extraction, SVM, and Extreme Learning Machines (ELM). Employs a Genetic Algorithm and a grid-search engine for hyperparameter adjustments
Álvarez-Alvarado et al. [15] | Hybrid SVM with SOA | Reviews various hybrid SVM algorithms for sunlight prediction. Highlights SVM with Genetic Algorithm (GA) outperforming traditional SVM models. Utilizes zonal-based kernel algorithms for forecasting variables. Potent for UV-ray prediction
Tyag et al. [16] | ANN | Evaluates performance for temperature prediction across all 365 days of the year, with a 4:1 ANN accuracy of 98.6% for Delhi data. States that increasing the layers may minimize the MSE further
Kumar Abhishek et al. [17] | ANN | ANN for weather forecasting with a 10:1 ratio, utilizing either a five-layer or ten-layer architecture, achieved an accuracy of around 90%
Proposed Methodology | ANN | Designed a 19:1 model with 14 hidden layers, incorporating spline interpolation for NaN reduction, and achieved an accuracy of around 99.042% for Delhi data
The best performance is achieved with 14 hidden layers, each corresponding to the
12 months.
The training process of the suggested ANN-based WFS is depicted in Fig. 2. In
Fig. 2a, it’s evident that the system employs 19 input and 4 hidden layers. Despite
having a more extensive structure, this results in a substantial enhancement in data
correlation. Moving on to Fig. 2b, it illustrates the training process of the system.
Although the proposed system takes a bit more time to converge, the trade-off is
improved accuracy in forecasting.
It is suggested to remove NaNs from the datasets to create a smaller dataset. The paper
aims to maximize the R-square value to demonstrate improved correlation between
subjective and proximate data. The estimation MSE is used to evaluate performance,
seeking to minimize errors. The histogram of the error performance for the proposed
WF system is shown in Fig. 3.
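The two figures of merit used throughout the chapter, the estimation MSE and the R-square value, can be computed as in the following minimal sketch:

```python
def mse(actual, predicted):
    """Mean squared estimation error between observed and predicted values."""
    n = len(actual)
    return sum((actual[i] - predicted[i]) ** 2 for i in range(n)) / n


def r_square(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_tot = sum((a - mean) ** 2 for a in actual)
    ss_res = sum((actual[i] - predicted[i]) ** 2 for i in range(len(actual)))
    return 1 - ss_res / ss_tot
```

Maximizing R-square and minimizing MSE pull in the same direction: a perfect fit gives R² = 1 and MSE = 0.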
Fig. 3 Histogram comparison of the error performance for the proposed system
data. It can be clearly observed from Fig. 4 that the estimation error performance is
better for the proposed method, with the best performance achieved at lower epochs.
The comparison of MSE and epochs is provided in Table 2. Table 2 presents a
comparative analysis between reference methods [16] and the proposed method for
Delhi data based on MSE and respective epochs. The reference method achieved
a MSE of 4.3915 with a corresponding epoch of 12, while the proposed method
outperformed it with a significantly lower MSE of 0.3962 at a slightly higher epoch
of 18. This indicates the improved performance of the proposed method in terms of
minimizing error and achieving convergence over the training epochs compared to
the reference method.
Table 3 provides a parametric comparison of an existing method [16] and the
proposed ANN approach. The proposed method is evaluated on both Delhi and
Mumbai datasets, with variations in hidden layers, R-square values, and MSE. The
proposed method demonstrates superior performance in terms of R-square values
and MSE compared to the reference method in the Delhi dataset. However, when
validated on the Mumbai dataset, the R-square value decreases, and MSE increases,
suggesting potential variability in performance across different locations.
It can be observed from Table 4, which presents the analysis under different locations
and weather conditions for four cities in India, that the performance and computational
complexity vary notably for highly humid weather data. The best performance is
achieved for the Calicut data. However, the proposed method performs well in all
cases.

Table 2 MSE and Epochs comparison of proposed and existing methods for Delhi data
Parameter | Ref. [16] | Proposed method
Best MSE | 4.3915 | 0.3962
Respective epoch | 12 | 18
Figures 5 and 6 compare the regression plots of the 4:1 (input–output data ratio)
neural network and the 19:1 (input–output data ratio) neural network for Delhi and
Mumbai, respectively, over the last 20 years. It can be observed from the figures that
the fit of the 19:1 ANN is better than that of the 4:1 ANN.
Challenges
The major challenge lies in computation, particularly for very large datasets where
the approach could become slow or even impractical. When considering geographical
areas, different regions may exhibit distinct data features. A technique that works
effectively in one area may necessitate modifications in pre-processing to perform
well in another.
This paper proposes the use of an ANN-based data prioritization method after initially
validating the fundamental WF coefficient. The recommendation is to remove NaNs
from the datasets to create a smaller dataset. The paper's primary goal is to maximize
the R-square value, demonstrating improved correlation between subjective and
proximate data. Additionally, the estimation MSE is employed to assess performance,
aiming for a minimum error value. The suggested system incorporates 19 inputs and
14 hidden layers. During the training phase, the suggested system achieves peak
performance with the best possible R-square value of 0.904 and a minimum MSE of
0.3962 for Delhi and 0.2624 for Mumbai data. Looking ahead, the study suggests the
potential use of Deep Neural Networks (DNN) for improved forecasting performance
with more extensive datasets in the future. Several crucial elements of weather
datasets that may improve their reproducibility in the future include: (1) using
NetCDF to ensure compatibility; (2) incorporating indicators or flags concerning
outliers, suspicious data points, or missing values into the dataset; and (3) utilizing
hourly and mean daily temperature data.

Fig. 5 Results of the performance comparison for proposed ANN system with 20-year Delhi data
Fig. 6 Results of performance comparison for Proposed ANN for 20-year Mumbai data
References
1 Introduction
In regions where the electrical grid is inaccessible or unreliable, an energy storage
system provides constant electricity, grid stability, and frequency control [1, 2].
Nowadays, the
most prevalent kinds of storage systems implemented are those for disasters [3],
emergencies [4], and intermittent or separated operation scenarios [5, 6]. Petrol or
diesel-electric generators are often used in these instances as energy sources. In recent
days, however, some states have begun to consider large-scale battery deployments
for grid or renewable energy storage. While battery storage alone may be adequate for
a large-scale energy system, this research will concentrate on smaller-scale residential
systems, where hybrid energy storage tends to be more appropriate [7–10]. Compared
with batteries, supercapacitors (SCs) offer higher power density and a quick response
to variations, but are not suitable for long-term storage due to their limited energy
density [11–13]. Owing to these complementary characteristics, hybrid energy storage
systems combined with PV systems have become increasingly popular and suitable
for distributed systems [14].
Many governments promote the utilization of renewable energies and encourage a
more decentralized approach to power delivery systems [15]. Despite their relatively
high cost, there has been remarkable growth in installed RES capacity [11]. Solar
energy is the world's major renewable energy source: PV systems have no moving
parts, operate smoothly, and generate no emissions [16]. Another advantage is that
they are highly modular and can be easily scaled to provide the required power
for different loads [17–19]. Harvesting distributed energy from RESs plays a major
role in clean energy production and helps to overcome environmental depletion [8, 9].
236 K. R. Patel and J. J. Gadit
Considering the nature of the voltage and current, microgrids are sub-categorized
into three types, i.e., AC, DC, and hybrid [20–23]. Each alternative provides different
advantages and disadvantages. Microgrids provide three benefits: reliability,
sustainability, and economy [24–26].
2 System Configuration
3 Mathematical Models
To deliver the total amount of output power into the network and maintain the
PV output power (mostly to track the maximum power point), a boost converter
is employed as shown in Fig. 2.
The voltage–current characteristic equation of a solar cell [5] gives the module
photo-current as:

Iph = [Isc + Ki (T − 298)] × Ir / 1000    (1)

Here, Iph: photo-current (A); Isc: short-circuit current (A); Ki: temperature coefficient
of the short-circuit current at 25 °C and 1000 W/m²; T: operating temperature (K);
Ir: solar irradiation (W/m²).
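Eq. 1 evaluates directly in code; the module parameters in the example below are illustrative, not taken from the paper:

```python
def photo_current(isc, ki, temp_k, irradiance):
    """Eq. 1: Iph = [Isc + Ki * (T - 298)] * Ir / 1000."""
    return (isc + ki * (temp_k - 298)) * irradiance / 1000.0
```

At standard test conditions (T = 298 K, Ir = 1000 W/m²) the photo-current reduces to the short-circuit current, which makes Eq. 1 easy to sanity-check.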
The current output of the PV module [5] as:
When dealing with loads that fluctuate and require more immediate power, including
an SC helps to alleviate most of these challenges. Supercapacitors are high-power-
density storage devices that can be used to provide the increased current. However,
they cannot be used independently due to their poor energy density; they are hence
suited to complement BESSs. The SC is connected in parallel to increase the potential
energy storage:
ηeff = e^(−2 · RESR · CTOTAL / tdch)    (3)
Here, ηeff : energy efficiency; RESR : total equivalent series resistance; CTOTAL : total
capacitance; tdch : discharging time.
Vc(t) = Vo · e^(−t / (RT · Ctotal))    (4)

V(t) = Vo · (1 + e^(−t / (RT · Ctotal)))    (6)
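Eqs. 3 and 4 can be checked numerically; in the sketch below the component values passed in are illustrative, and Eq. 4 is read as a capacitor-discharge decay through the total resistance RT:

```python
import math


def sc_energy_efficiency(r_esr, c_total, t_dch):
    """Eq. 3: eta_eff = exp(-2 * R_ESR * C_TOTAL / t_dch)."""
    return math.exp(-2.0 * r_esr * c_total / t_dch)


def sc_voltage(t, v0, r_t, c_total):
    """Eq. 4: Vc(t) = Vo * exp(-t / (R_T * C_total)) -- exponential decay."""
    return v0 * math.exp(-t / (r_t * c_total))
```

As expected from Eq. 3, the efficiency approaches 1 for long discharge times (slow discharge) and drops as the discharge time shrinks relative to the RESR·CTOTAL time constant.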
Due to the sporadic nature of renewable energy sources, periods when solar energy
production is low or non-existent require a battery storage device. The energy stored
in the battery system may be used to provide the necessary power during peak and
non-peak hours. The battery's efficiency and performance are contingent upon several
factors such as the surrounding temperature, charge level, voltage fluctuations, and
18 Modeling and Simulation of a Hybrid Energy Storage System for DC … 239
R0 is the internal resistance ();it = idt is the removed charge (Ah);A and B are
empirical constants (V), (1/Ah).
V = E0 − K·(Q/(Q − it))·i* − K·(Q/(Q − it))·it − R0·i + A·e^(−B·it)   (8)

Vfull = E0 − R0·I + A   (9)

Vexp = E0 − K·(Q/(Q − Qexp))·(Qexp + I) − R0·I + A·e^(−B·Qexp)   (10)
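The battery relations above follow a Shepherd-type discharge model; a minimal sketch is shown below. All parameter values are illustrative placeholders, not fitted to any real cell.

```python
import math

# Hedged sketch of the Shepherd-type battery model in Eqs. (8)-(9).

def battery_voltage(e0, k, q, i_star, it, i, r0, a, b):
    """Terminal voltage per Eq. (8).

    e0: constant voltage (V); k: polarization constant; q: capacity (Ah);
    i_star: filtered current (A); it: extracted charge (Ah); i: current (A);
    r0: internal resistance (ohm); a, b: empirical constants (V, 1/Ah).
    """
    pol = k * q / (q - it)  # polarization factor K*Q/(Q - it)
    return e0 - pol * i_star - pol * it - r0 * i + a * math.exp(-b * it)

def v_full(e0, r0, i, a):
    """Fully charged voltage per Eq. (9)."""
    return e0 - r0 * i + a

print(battery_voltage(12.0, 0.05, 100.0, 10.0, 20.0, 10.0, 0.01, 0.5, 3.0))
print(v_full(12.0, 0.01, 10.0, 0.5))
```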
4 Scenario Design
In this scenario, the simulation operation of DCMG is illustrated without grid power.
Hence, it has either surplus power or sufficient charge in the storage devices to feed
the load power requirement. The MATLAB Simulink model of the DC microgrid is
shown in Fig. 3.
5 Simulation Results
This simulation analysis covers different modes, including random behavior of generation
and loads and deficits beyond the battery's discharging capability. All the various
scenarios of the DCMG without a utility power supply are discussed below:
Case 1: Ppv = minimum, PBat− + PSC− < Pload
During the evening, generation is less than the demanded power due to the low
irradiation. Therefore, the SC and battery are used to meet the required load demand,
as shown in Fig. 4.
Case 2: Ppv = Pload
As shown in Fig. 5, in the morning the PV power equals the load demand.
However, when the PV power is minimal or zero relative to the load demand in the
morning, the fuel cell is used as backup protection and satisfies
the load demand, as shown in Fig. 6.
Case 3: Ppv = 0 or minimum, PFC− < Pload
Figure 7 illustrates that when the load demand is higher than the PV power,
the required demand is fulfilled using the battery, supercapacitor, and fuel
cell. When the generated power is larger than the load demand, the battery and SC
are charged, as shown in Fig. 8.
Case 4: Ppv = 0 or minimum, PBat− + PSC− + PFC− < Pload
Case 5: Ppv + PBat+ + PSC+ > Pload
6 Conclusion
This paper has presented different scenarios in the DC microgrid, wherein stochastic
output variations of the generation power due to the variation in irradiation can lead to
uncertainty in the microgrid. However, the combined Hybrid Energy Storage System
(HESS) such as a battery and supercapacitor can solve this problem and improve the
system’s stability and reliability. Therefore, to ensure the reliability, stability, and
robustness of the energy management strategy for residential applications consider
the time of use before applying it to the real simulation system.
References
1. Ahmed M, Datta M, Vahidnia R (2020) Stability and control aspects of microgrid architectures-a
comprehensive review. IEEE Access
2. Hatziargyriou N (2014) Microgrids: architectures and control. Wiley
3. Farrokhabadi M, Cañizares CA, Simpson-Porco JW, Nasr E, Fan L, Mendoza-Araya PA,
Tonkoski R, Tamrakar U, Hatziargyriou N, Lagos D et al. (2019) Microgrid stability definitions,
analysis, and examples. IEEE Trans Power Syst
4. Hama N, Weerawoot K, Siriroj S (2017) An evaluation of voltage variation and flicker severity
in micro grid. Int J Electr Eng Congress (IEECON)
5. Patel KR, Gadit J (2024) Power management and control of hybrid energy storage system in
a standalone DC microgrid. International multidisciplinary conference on emerging trends in
sustainable development (IMCETSC)
6. Sadhu RM, Patel KR, Gadit J (2024) Energy management using hybrid energy storage system
in DC microgrid: a review. International multidisciplinary conference on emerging trends in
sustainable development (IMCETSC)
7. Xu L, Chen D (2011) Control and operation of a dc microgrid with variable generation and
energy storage. IEEE Trans Power Del
8. Eghtedarpour N, Farjah E (2014) Distributed charge/discharge control of energy storages in a
renewable-energy-based dc micro-grid. IET Renew Power Gen
9. Hong J, Yin J, Liu Y, Peng J, Jiang H (2019) Energy management and control strategy of
photovoltaic/battery hybrid distributed power generation systems with an integrated three-port
power converter. IEEE Access
10. Cabrane Z, Kim J, Yoo K, Ouassaid M (2021) HESS-based photovoltaic/batteries/
supercapacitors: energy management strategy and DC bus voltage stabilization. J Solar Energy
11. Cabrane Z, Ouassaid M, Maaroufi M (2016) Analysis and evaluation of battery-supercapacitor
hybrid energy storage system for photovoltaic installation. Int J Hydrogen Energy
12. Zheng D, Wei D, Zhang W, Meng Z (2015) The study of supercapacitor’ transient power quality
improvement on Microgrid. IEEE Eindhoven PowerTech Eindhoven Netherlands
13. El-Shahat A, Sumaiya S (2019) DC-microgrid system design, control, and analysis. J
Electronics
14. Jing W, Lai CH, Wallace Wong SH, Dennis Wong ML (2017) Battery-supercapacitor hybrid
energy storage system in standalone DC microgrids: a review. Inst Eng Technol
15. Gaeed A (2022) Study of power management of standalone DC microgrids with battery
supercapacitor hybrid energy storage system. Int J Electr Comput Eng (IJECE)
16. Campagna N, Castiglia, Miceli R, Mastromauro RA Trapanese M, Viola F (2020) Battery
models for battery powered applications: a comparative study. J Energies
17. Nguyen XH, Nguyen MP (2015) Mathematical modelling of photovoltaic cell/module/arrays
with tags in MATLAB/Simulink. Environ Syst Resources
18. Chitransh A, Kumar S (2021) The different type of MPPT techniques for photovoltaic system.
Indian J Environ Eng (IJEE)
19. Kumar H, Kumar S (2018) Smart grid integration of solar energy system using MPPT with
incremental conductance and control analysis. International conference on power energy
environment and intelligent control (PEEIC)
20. Baghaee R, Mirsalim M, Gharehpetian GB, Talebi HA (2016) Reliability/cost based multi-
objective Pareto optimal design of stand-alone wind/PV/FC generation microgrid system. J
Energy
21. Jing W, Lai CH, Wallace Wong SH, Dennis Wong ML (2017) Battery-supercapacitor hybrid
energy storage system in standalone DC microgrids: a review. The Institution of Engineering
and Technology
23. Masaud TM, El-Saadany EF (2020) Correlating optimal size, cycle life estimation, and tech-
nology selection of batteries: a two-stage approach for microgrid applications. IEEE Trans
Sustain Energy
24. Cabrane Z, Kim J, Yoo K, Ouassaid M (2021) HESS-based photovoltaic/batteries/
supercapacitors: energy management strategy and DC bus voltage stabilization, J Solar Energy
25. Zhou T, Sun W (2014) Optimization of battery–supercapacitor hybrid energy storage station
in wind/solar generation system. IEEE Trans Sustain Energy
26. Gomez-Gonzalez M, Hernandez JC, Vera D, Jurado F (2020) Optimal sizing and power
schedule in PV household-prosumers for improving PV self-consumption and providing
frequency containment reserve. J Energy
Chapter 19
Prediction of CKD: A Performance
Analysis of Six Machine Learning
Algorithms
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025 245
H. Mittal et al. (eds.), Proceedings of International Conference on Paradigms of
Communication, Computing and Data Analytics, Algorithms for Intelligent Systems,
[Link]
246 P. V. Baviskar et al.
disease, underscore the urgency of prediction and early-stage diagnosis for timely
intervention [11–13]. CKD, recognized as a progressive and irreversible pathologic
syndrome, emphasizes the essentiality of proactive measures in prediction and diag-
nosis to ameliorate its progression [14]. Early detection and the management of risk
factors emerge as crucial strategies in mitigating the impact of CKD on global public
health.
Machine learning pertains to a computer program capable of calculating
and deducing information relevant to a given task, extracting characteristics
of corresponding patterns [11]. This technique has the potential to provide accurate
and cost-effective illness diagnoses, making it a promising method for identifying
chronic kidney disease (CKD). With the advancement of information technology,
machine learning has arisen as a new medical tool [12], finding extensive application
potential, particularly with the quick growth of electronic health records [13].
In health care, machine learning has already proven useful in identifying human
body status [14], assessing significant disease components [15, 16], and diagnosing a
range of disorders. The integration of machine learning into medical practices show-
cases its transformative impact on health care, facilitating enhanced diagnostic capa-
bilities and contributing to the broader landscape of medical advancements [17–19].
In this paper, we present a deep literature survey on machine learning models for
early prediction of CKD; we also analyze the performance of six algorithms
and provide a comparative analysis.
The rest of the paper is organized as follows: Sect. 2 provides a literature survey, and
Sect. 3 discusses our proposed approach for implementing the algorithm. Section 4
shows the results and performance of our proposed model. Section 5 concludes the
paper and provides future research directions.
2 Literature Survey
Predicting Chronic Kidney Disease (CKD) using machine learning is a broad and
evolving field, as demonstrated by various in-depth studies. In the research con-
ducted by Xiao et al. [20], a thorough examination of clinical and blood biochemical
measurements from 551 patients with proteinuria was undertaken. Multiple machine
learning models, including RF, XGB, LR, elastic net (ElasNet), lasso, ridge regres-
sion, KNN, Support Vector Machine (SVM), and Artificial Neural Network (ANN),
were compared for CKD risk prediction. The study revealed that ElasNet, lasso,
ridge, and LR demonstrated superior predictive performance, with LR ranking first,
achieving an AUC of 87.3%. In [21], researchers utilized various algorithms, includ-
ing SVM, AdaBoost, Linear Discriminant Analysis (LDA), and gradient boosting
(GBoost). Notably, the gradient boosting classifier emerged as the top performer,
achieving an impressive accuracy of 99.80%.
Similarly, a study [22] utilized a CKD dataset to create three different CKD
prediction models using logistic regression (LR), decision tree (DT), and K-nearest
neighbors (KNN) algorithms. LR achieved the highest accuracy at 97%, surpassing
DT at 96.25% and KNN at 71.25%. Another study [23] assessed Naïve Bayes (NB),
random forest (RF), and LR models for CKD risk prediction, with accuracies of 93.9,
98.88, and 94.76%, respectively. Studies [24, 25] used data from 455 patients and
real-time datasets to develop CKD risk prediction systems using RF and artificial
neural networks (ANN), achieving accuracies of 97.12 and 94.5%.
In addition, research [26] developed a machine learning model to predict CKD,
testing various classifiers including ANN, C5.0, LR, linear support vector machine
(LSVM), K-nearest neighbors (KNN), and RF. The LSVM algorithm showed the
highest accuracy at 98.86% when used with the Synthetic Minority Over-sampling
Technique (SMOTE) and all features included. The combination of SMOTE and
feature selection by LASSO regression provided better results compared to using
LASSO regression alone.
A study [27] implemented and evaluated nine different machine learning algo-
rithms, such as XGBoost, logistic regression, lasso regression, support vector
machine, random forest, ridge regression, neural network, Elastic Net, and KNN, for
CKD prediction, with linear models showing the highest accuracy. Another study
[28] used a CKD dataset from UCI with 400 instances and 25 attributes, applying
algorithms like Naïve Bayes and KNN, with KNN achieving the best accuracy. Simi-
larly, [29] employed classifiers including extra-trees (ET), AdaBoost, KNN, GBoost,
XGB, DT, Gaussian Naïve Bayes (NB), and RF, with KNN and ET showing the best
performance, achieving accuracies of 99 and 98%.
Additionally, a study [30] proposed an ANN-based regression analysis for man-
aging sparse medical datasets. The researchers introduced new variables to enhance
the radial basis function (RBF) input-doubling technique for output signal calcula-
tion. Another study [31] presented an innovative input-doubling method based on
the classical iterative RBF neural network, evaluated using a small medical dataset
and performance metrics like Mean Absolute Error and Root Mean Squared Error.
In a different study [32], an inventive data augmentation approach was used to
improve disease categorization based on generative adversarial networks (GAN).
Experiments on the NIH chest X-ray image dataset resulted in a test accuracy of
60.3% for the Convolutional Neural Network (CNN) model. The GAN-augmented
CNN model showed enhanced performance with a test accuracy of 65.3%. Another
study [33] introduced a supervised learning methodology to develop efficient models
for predicting CKD risk.
Researchers in [34] presented a method for data assertion and sample diagnosis in
CKD, using KNN for data assertion. Six classification algorithms, including logistic
regression, random forest, support vector machine, K-nearest neighbor, Naïve Bayes
classifier, and feed-forward neural network, were evaluated for diagnostic accuracy,
with random forest achieving the highest accuracy at 99.75%.
Researchers in [35] developed a neural network model to predict the risk of
Chronic Kidney Disease (CKD), achieving a 95% accuracy on a dataset of 40,000
instances. Another study [36] applied three models—K-nearest neighbor (KNN),
support vector machine (SVM), and soft independent modeling of class analogy
(SIMCA)—to a UCI dataset to calculate CKD risk. Both SVM and KNN models
reached an accuracy of 99.7%, with SVM showing strong performance even in the
presence of noise. Given the invasive and costly nature of CKD, early detection is
crucial to prevent progression to advanced stages without treatment. In [37], an SVM
machine learning classifier achieved an accuracy of 93%.
In [38], researchers proposed early CKD detection for diabetic patients using
machine learning classifiers, utilizing data from a diabetes research center in Chennai.
The Naive Bayes classifier achieved the highest accuracy at 91%. Another study
[39] explored the use of Decision Tree, Random Forest, and SVM, including SVM
with various functions, on the MIMIC-II database, finding that Random Forest and
Decision Tree yielded prediction accuracies of 80 and 87%, respectively.
The authors in [40] built a model with various machine learning classifiers, con-
cluding that the Multiclass Decision Forest algorithm was best suited for the CKD
dataset, achieving an accuracy of 99.1%. Researchers [41] utilized the SVM algo-
rithm for CKD prediction, employing feature selection through two approaches:
Wrapper and filter. SVM achieved the highest accuracy, reaching 98.5%. A few
researchers from [42] worked on a CKD dataset from UCI, preprocessing data, iden-
tifying missing data, filling it with zeros, and then applying algorithms for important
attributes. The K-Nearest Neighbor (KNN) classifier achieved the highest accuracy.
Similarly, the authors in [42, 43] used machine learning models for disease prediction
as well, achieving accuracies of 98 and 93%.
Table 1 provides a comparative analysis, summarizing prediction models, dataset
collections, and accuracies. It offers insights into the diverse approaches used in
CKD prediction, emphasizing the importance of tailored solutions based on dataset
characteristics and algorithmic strengths.
3 Proposed System
Our proposed system uses SVM as its core component, leveraging its ability to outperform other classifiers in CKD
prediction tasks. Our experimental results demonstrate that the SVM-based hybrid
model surpasses individual classifiers when trained on comprehensive feature sets
under similar conditions. Furthermore, existing literature primarily addresses this
challenge through advanced feature engineering and selection techniques. Naive
Bayes and Random Forest algorithms are recognized for their capability to discern
underlying data structures compared to alternative methods.
3.1 Dataset
The Chronic Kidney Disease dataset was used in this study. It was retrieved from
the Kaggle machine learning repository. The data was collected over a 2-month
period in India and included 25 features such as RBC count, WBC count, subject ID,
diabetes, hypertension, creatinine, urea, albuminuria, age, gender, GFR, and CKD
risk evaluation. The target is the 'classification' attribute, which is either 'ckd' or
'notckd', where CKD stands for chronic kidney disease. There are 400 rows.
3.2 Architecture
The raw dataset first undergoes cleaning of missing values and inconsistencies to ensure data integrity. Relevant features are carefully selected to
enhance the predictive power of the model. Numerical variables are scaled to prevent
bias due to differing scales, while categorical variables are encoded to enable their
use in algorithms. Class imbalance is addressed through various techniques such as
oversampling or undersampling. Finally, the dataset is split into training and testing
sets, preserving class distributions to accurately evaluate model performance. These
preprocessing steps are crucial in transforming the raw data into a suitable format,
ultimately improving the accuracy and reliability of the CKD prediction model.
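The preprocessing steps above can be sketched in plain NumPy; the toy array merely stands in for the CKD dataset, and the 80/20 split ratio is an assumption.

```python
import numpy as np

# Minimal preprocessing sketch: mean imputation, standard scaling,
# and a stratified train/test split. The toy data below is NOT the
# real Kaggle CKD data.

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
X[5, 1] = np.nan                   # a missing value to impute
y = np.array([1] * 25 + [0] * 15)  # imbalanced ckd / notckd labels

# 1) Imputation: replace NaNs with the column mean.
col_mean = np.nanmean(X, axis=0)
X = np.where(np.isnan(X), col_mean, X)

# 2) Scaling: zero mean, unit variance per feature.
X = (X - X.mean(axis=0)) / X.std(axis=0)

# 3) Stratified 80/20 split: sample each class separately so the
#    class distribution is preserved in both sets.
train_idx, test_idx = [], []
for cls in np.unique(y):
    idx = np.flatnonzero(y == cls)
    n_test = max(1, int(0.2 * idx.size))
    test_idx.extend(idx[:n_test])
    train_idx.extend(idx[n_test:])

print(len(train_idx), len(test_idx))  # -> 32 8
```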
J48 Algorithm: The J48 algorithm, derived from the C4.5 algorithm, is exten-
sively applied in both categorical and continuous data analysis across diverse fields.
Notably, it finds utility in interpreting clinical data for diagnosing coronary heart
disease, classifying E-governance data, and similar tasks.
RF Algorithm: Random Forest stands out as a popular supervised learning algo-
rithm adept at handling classification and regression problems. It uses ensemble
learning, which combines numerous classifiers to solve complex issues and improve
model performance. Random Forest consists of several decision trees built on dis-
tinct subsets of the dataset, and averaging their outputs improves the model’s forecast
accuracy.
The K-Nearest Neighbor Algorithm: The K-nearest neighbor (KNN) technique,
a nonparametric method, is frequently used in classification and regression tasks. It
works by evaluating the K-nearest training instances in the feature space. In KNN
classification, an object’s class membership is determined by a majority vote among
its K-nearest neighbors. When K = 1, the object is allocated to the class of its single
nearest neighbor.
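The majority-vote rule described above can be shown with a minimal from-scratch KNN sketch; the tiny two-cluster dataset is purely illustrative, not the CKD data.

```python
from collections import Counter
import math

# Minimal KNN classifier: majority vote among the K nearest
# training points by Euclidean distance.

def knn_predict(train_X, train_y, x, k=3):
    dists = sorted(
        (math.dist(p, x), label) for p, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

train_X = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
train_y = ["notckd", "notckd", "notckd", "ckd", "ckd", "ckd"]

print(knn_predict(train_X, train_y, (0.5, 0.5)))  # -> notckd
print(knn_predict(train_X, train_y, (5.5, 5.5)))  # -> ckd
```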
Naïve Bayes Algorithm: Leveraging Bayes’ Theorem, the Naïve Bayes algorithm is
a classification technique assuming independence among predictors. It computes the
probability of an object belonging to a specific class based on the presence of various
features, selecting the class with the highest probability. The algorithm presupposes
that the presence of a particular feature in a class is unrelated to the presence of any
other feature.
SVM Algorithm: Support Vector Machines (SVMs) are supervised learning models
employed for classification and regression analysis. They construct a model by scru-
tinizing a set of training examples, assigning new examples to different categories
based on their attributes. SVMs map examples as points in a space, endeavoring to
establish a distinct gap between various categories. Subsequently, new examples are
mapped into this space and forecasted to belong to a particular category based on
their position relative to the gap.
MLP Algorithm: A Multilayer Perceptron (MLP) represents a fully connected feed-
forward artificial neural network, comprising multiple layers of perceptrons inter-
connected to subsequent layers. Often termed ‘vanilla’ neural networks, particularly
with a single hidden layer, MLPs are widely employed for tasks like classification and
regression. They possess the capability to discern intricate patterns and relationships
within the data.
After implementing these algorithms, our proposed model evaluates their performance
using the evaluation parameters precision, recall, F1-score, and accuracy.
4 Results
In this section, we present the outcomes of the developed system for predicting
chronic kidney disease. The performance of each algorithm is evaluated using metrics
such as Accuracy, Precision (P), Recall (R), and F-measure. Precision, as defined in Eq.
(1), offers a measure of the correctness of positive predictions, while Recall (Eq. 2)
indicates the proportion of true positives accurately identified. Overall performance
is also assessed through the F-measure (Eq. 3):
Precision = TP/(TP + FP)   (1)

Recall = TP/(TP + FN)   (2)

F-Measure = (2 × Precision × Recall)/(Precision + Recall)   (3)
• True Positive (TP): the test is positive and the patient has CKD.
• False Positive (FP): the test is positive although the patient does not have CKD.
• True Negative (TN): the test is negative and the patient does not have CKD.
• False Negative (FN): the patient does have CKD, but the test came back negative.
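Equations (1)–(3) translate directly into code; the confusion-matrix counts below are illustrative, not the paper's results.

```python
# Hedged sketch of Eqs. (1)-(3): precision, recall, and F-measure
# from confusion-matrix counts.

def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def f_measure(p, r):
    return 2 * p * r / (p + r)

tp, fp, fn = 48, 1, 2  # illustrative counts
p, r = precision(tp, fp), recall(tp, fn)
print(round(p, 2), round(r, 2), round(f_measure(p, r), 2))  # -> 0.98 0.96 0.97
```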
The experimentation involves the utilization of the preprocessed dataset for con-
ducting tests, exploring and applying the aforementioned techniques. Table 2 shows
the comparative analysis and results obtained on the CKD dataset.
Table 2 shows the performance of the six machine learning algorithms. SVM
performed best, with the highest Precision (98%), Recall (96%), F-Measure (97%), and
Accuracy (97%). Random Forest also performed well, with Precision (97%), Recall
(93%), F-Measure (95%), and Accuracy (91%). While J48, Naive Bayes, MLP, and
KNN showed good results, they were slightly outperformed by SVM and Random
Forest, making these two the most reliable for CKD prediction.
The result obtained for all six algorithms on the evaluation parameters is shown
in Figs. 2 and 3 and ROC Curve is depicted in Fig. 4.
From the graphs, it is observed that the SVM classifier attained the highest accuracy
of 97% compared with the other algorithms, while Random Forest and MLP showed
an accuracy of 89%.
5 Conclusion
References
1. Chen Z, Zhang X, Zhang Z (2016) Clinical risk assessment of patients with chronic kidney
disease by using clinical data and multivariate models. Int Urol Nephrol 48:2069–2075. https://
[Link]/10.1007/s11255-016-1346-4
2. Charleonnan A et al (2016) Predictive analytics for chronic kidney disease using machine
learning techniques. In: 2016 Management and innovation technology international conference
(MITicon). IEEE
3. Subasi A, Alickovic E, Kevric J (2017) Diagnosis of chronic kidney disease by using ran-
dom forest. In: Badnjevic A (eds) CMBEBIH 2017. IFMBE Proceedings, vol 62. Springer,
Singapore. [Link]
4. Zhang L et al (2012) Prevalence of chronic kidney disease in China: a cross-sectional survey.
Lancet 379(9818):815–822
5. Chen Z, Zhang Z, Zhu R, Xiang Y, Harrington PB (2016) Diagnosis of patients with chronic
kidney disease by using two fuzzy classifiers. Chemometrics Intell Lab Syst 153:140–145
6. Subasi A, Alickovic E, Kevric J (2017) Diagnosis of chronic kidney disease by using random
forest. In: Proceedings of the international conference on medical and biological engineering,
pp 589–594
7. Zhang L (2012) Prevalence of chronic kidney disease in China: a cross-sectional survey. Lancet
379:815–822
8. Singh A, Nadkarni G, Gottesman O, Ellis SB, Bottinger EP, Guttag JV (2015) Incorporating
temporal EHR data in predictive models for risk stratification of renal function deterioration.
J Biomed Inform 53:220–228
9. Cueto-Manzano AM, Cortés-Sanabria L, Martínez-Ramírez HR, Rojas-Campos E, Gómez-
Navarro B, Castillero-Manzano M (2014) Prevalence of chronic kidney disease in an adult
population. Arch Med Res 45(6):507–513
10. Polat H, Mehr HD, Cetin A (2017) Diagnosis of chronic kidney disease based on support vector
machine by feature selection methods. J Med Syst 41(4):55
11. Barbieri C, Mari F, Stopper A, Gatti E, Escandell-Montero P, Martínez-Martínez JM, Martín-
Guerrero JD (2015) A new machine learning approach for predicting the response to anemia
treatment in a large cohort of end stage renal disease patients undergoing dialysis. Comput Biol
Med 61:56–61
12. Papademetriou V, Nylen ES, Doumas M, Probsteld J, Mann JF, Gilbert RE, Gerstein HC
(2017) Chronic kidney disease, basal insulin glargine, and health outcomes in people with
dysglycemia: the ORIGIN study. Am J Med 130(12):1465.e27–1465.e39
13. Hill NR (2016) Global prevalence of chronic kidney disease: a systematic review and meta-
analysis. PLoS One 11(7), Art. no. e0158765
14. Hossain MM, Detwiler RK, Chang EH, Caughey MC, Fisher MW, Nichols TC, Merricks
EP, Raymer RA, Whitford M, Bellinger DA, Wimsey LE, Gallippi CM (2019) Mechanical
anisotropy assessment in kidney cortex using ARFI peak displacement: preclinical validation
and pilot in vivo clinical results in kidney allografts. IEEE Trans Ultrason Ferroelectr Freq
Control 66(3):551–562
15. Alloghani M, Al-Jumeily D, Baker T, Hussain A, Mustana J, Aljaaf AJ (2018) Applications of
machine learning techniques for software engineering learning and early prediction of students’
performance. In: Proceedings of the international conference on soft computing in data science,
pp 246–258
16. Gupta D, Khare S, Aggarwal A (2016) A method to predict diagnostic codes for chronic
diseases using machine learning techniques. In: Proceedings of the international conference
on computing, communication and automation (ICCCA), pp 281–287
17. Du L, Xia C, Deng Z, Lu G, Xia S, Ma J (2018) A machine learning based approach to identify
protected health information in Chinese clinical text. Int J Med Inform 116:24–32
18. Abbas R, Hussain AJ, Al-Jumeily D, Baker T, Khattak A (2018) Classification of foetal distress
and hypoxia using machine learning approaches. In: Proceedings of the international conference
on intelligent computing, pp 767–776
19. Mahyoub M, Randles M, Baker T, Yang P (2018) Comparison analysis of machine learning
algorithms to rank Alzheimer’s disease risk factors by importance. In: Proceedings of the 11th
international conference on developments in eSystems engineering (DeSE), pp 1–11
20. Xiao J, Ding R, Xu X, Guan H, Feng X, Sun T, Zhu S, Ye Z (2019) Comparison and development
of machine learning tools in the prediction of chronic kidney disease progression. J Transl Med
17:119
21. Ghosh P, Shamrat FJM, Shultana S, Afrin S, Anjum AA, Khan AA (2020) Optimization of
prediction method of chronic kidney disease using machine learning algorithm. In: Proceedings
of the 2020 15th international joint symposium on artificial intelligence and natural language
processing (iSAI-NLP), Bangkok, Thailand, pp 1–6
22. Ifraz GM, Rashid MH, Tazin T, Bourouis S, Khan MM (2021) Comparative analysis for pre-
diction of kidney disease using intelligent machine learning methods. Comput Math Methods
Med 2021:6141470
23. CKD Prediction Dataset. Available online: [Link]
chronic-kidney-disease. Accessed on 27 June 2022
24. Islam MA, Akter S, Hossen MS, Keya SA, Tisha SA, Hossain S (2020) Risk factor prediction
of chronic kidney disease based on machine learning algorithms. In: Proceedings of the 2020
3rd international conference on intelligent sustainable systems (ICISS), Palladam, India, pp
952–957
25. Yashfi SY, Islam MA, Sakib N, Islam T, Shahbaaz M, Pantho SS (2020) Risk prediction of
chronic kidney disease using machine learning algorithms. In: Proceedings of the 2020 11th
international conference on computing, communication and networking technologies (ICC-
CNT), Kharagpur, India, pp 1–5
26. Chittora P, Chaurasia S, Chakrabarti P, Kumawat G, Chakrabarti T, Leonowicz Z, Jasiński M,
Jasiński Ł, Gono R, Jasińska E et al (2021) Prediction of chronic kidney disease-a machine
learning perspective. IEEE Access 9:17312–17334
27. Xiao J, Ding R, Xu X, Guan H, Feng X, Sun T, Zhu S, Ye Z (2019) Comparison and development
of machine learning tools in the prediction of chronic kidney disease progression. J Transl Med
17:119
28. Drall S, Drall GS, Singh S, Naib BB (2018) Chronic kidney disease prediction using machine
learning: a new approach. Int J Manage Technol Eng 8:278–287
29. Baidya D, Umaima U, Islam MN, Shamrat FJM, Pramanik A, Rahman MS (2022) A deep
prediction of chronic kidney disease by employing machine learning method. In: Proceedings
of the 2022 6th international conference on trends in electronics and informatics (ICOEI),
Tirunelveli, India, pp 1305–1310
30. Izonin I, Tkachenko R, Dronyuk I, Tkachenko P, Gregus M, Rashkevych M (2021) Predictive
modeling based on small data in clinical medicine: RBF-based additive input-doubling method.
Math Biosci Eng 18:2599–2613
31. Izonin I, Tkachenko R, Fedushko S, Koziy D, Zub K, Vovk O (2021) RBF-based input doubling
method for small medical data processing. In: Proceedings of the international conference on
artificial intelligence and logistics engineering, Kyiv, Ukraine. Springer, Berlin/Heidelberg,
Germany, pp 23–31
32. Bhattacharya D, Banerjee S, Bhattacharya S, Uma Shankar B, Mitra S (2020) GAN-based
novel approach for data augmentation with improved disease classification. In: Advancement
of machine intelligence in interactive medical image analysis. Springer, Berlin/Heidelberg,
Germany, pp 229–239
33. Dritsas E, Trigka M (2022) Machine learning techniques for chronic kidney disease risk pre-
diction. Big Data Cogn Comput 6:98. [Link]
34. Qin J, Chen L, Liu Y, Liu C, Feng C, Chen B (2020) A machine learning methodology for
diagnosing chronic kidney disease. IEEE Access 8:20991–21002
35. Vasquez-Morales GR, Martinez-Monterrubio SM, Moreno-Ger P, Recio-Garcia JA (2019)
Explainable prediction of chronic renal disease in the Colombian population using neural
networks and case-based reasoning. IEEE Access 7:152900–152910
36. Chen Z, Zhang X, Zhang Z (2016) Clinical risk assessment of patients with chronic kidney
disease by using clinical data and multivariate models. Int Urol Nephrol 48(12):2069–2075
37. Amirgaliyev Y, Shamiluulu S, Serek A (2018) Analysis of chronic kidney disease dataset by
applying machine learning methods. In: Proceedings IEEE 12th international conference on
application of information and communication technologies (AICT), pp 1–4
38. Padmanaban KRA, Parthiban G (2016) Applying machine learning techniques for predicting
the risk of chronic kidney disease. Indian J Sci Technol 9(29)
39. Kilvia De Almeida L, Lessa L, Peixoto A, Gomes R, Celestino J (2020) Kidney failure detec-
tion using machine learning techniques. In: Proceedings of the 8th international workshop on
advances in ICT infrastructures and services, pp 1–8
Tamanna, Subarna Rana, Vaibhav Kant Agrawal, and Manoj Kumar Dasi
1 Introduction
258 Tamanna et al.
2 Literature Survey
Generative models have been around since the 1950s, starting with models like
HMMs [5] and GMMs [6] that produced sequential data. However, their capabilities
reached new heights with the rise of deep learning, leading to substantial improve-
ments in both performance and versatility. GenAI has gained tremendous interest
in various fields other than Computer Science for generating specific content such
as text, audio, and images [7]. The Transformer neural architecture, based on the
encoder-decoder method is the core strength of GenAI state-of-the-art models such
as GPT-2 [8], DALL-E-2 [9], and Gopher [10]. It has resolved the limitations of tradi-
tional neural networks such as Recurrent Neural Networks (RNN) using attention
mechanisms [11]. Transformer-based pre-trained language models (PTLMs) exhibit
superior performance in terms of relevant content generation. Highly efficient GPUs
such as NVIDIA A100, distributed training, and cloud computing have unleashed
immense potential for GenAI applications. The current PTLMs can be categorized
as shown in Table 1.
The model used in this study is GPT-3.5, an autoregressive language model well
suited to text generation based on user instructions. It works like a word predictor:
it looks at the words that have already been written and uses that context to guess
what the next word should be. This makes it effective for creating text, as it can
generate output one word at a time.
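As a toy illustration of this word-by-word prediction, the sketch below uses a simple bigram counter standing in for the billions of learned parameters of GPT-3.5; the corpus and function names are purely illustrative.

```python
from collections import Counter, defaultdict

# Toy corpus standing in for training data (illustrative only).
corpus = ("the patient takes the medicine daily and "
          "the patient feels better and the medicine works").split()

# Count bigrams: for each word, how often each successor follows it.
successors = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    successors[prev][nxt] += 1

def predict_next(word):
    """Greedily pick the most frequent successor of `word`."""
    if word not in successors:
        return None
    return successors[word].most_common(1)[0][0]

def generate(start, n_words):
    """Generate text one word at a time, like an autoregressive LM."""
    out = [start]
    for _ in range(n_words):
        nxt = predict_next(out[-1])
        if nxt is None:
            break
        out.append(nxt)
    return " ".join(out)
```

A real LLM conditions on the entire preceding context and samples from a probability distribution rather than picking greedily, but the loop structure is the same.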
3 Case Study
The Craft-Persona-Rx app is built using a RAG-based LLM. In this app we use the
OpenAI model GPT-3.5 [14], also known as a “writer’s best friend” because of its
capability to generate high-quality content. RAG brings dynamicity to the static
parametric knowledge of LLMs and also helps provide the correct context for
generating application-specific content.
the brand of medicine, target audience, persona of target audience, and theme of
content (Table 2).
Now we are ready with a cleaned and mapped database containing all the attributes.
For this case study we created dummy data to avoid any conflict of interest; as this
field is new, we could not find any publicly available data for it. For fine-tuning
the LLM we used live examples of medicine brands that are publicly available on
their websites and created this Database (DB). Our DB consists of 120 rows (thirty
rows for every brand) and five columns (Table 2).
This DB consists of the following headers: Target Audience, Brand Content, Brand,
Persona and Theme. We have taken four fictitious brands:
• Migraineaid (for migraine)
• Glucoreg (for diabetes)
• Mobilium (for arthritis), and
• Memora (for Alzheimer’s disease).
We have two types of target audiences: patients and doctors. As the two have different
mindsets, the marketing content also differs for each of them. Next comes persona,
meaning personality, a further categorization for understanding the behavior of
target audiences. In this DB we have four personas, two for doctors and two for
patients: Conservative (doctors who stick to proven methods only) and Pathfinder
(doctors who search for new methods of curing patients), while Naysayer (difficult
to please) and Optimistic (open to experimenting with new things) apply to patients.
Themes are the identified topics on which we want to create our content, such as
safety-oriented or cost-effective. These themes help create variations in the
generated content with a touch of personalization. Based on the headers discussed
above, we have brand content that satisfies each categorization.
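The DB’s category structure can be sketched as follows. The brand, audience, and persona names come from the description above; the two theme names are examples only, and the real DB holds thirty curated content rows per brand, whereas this sketch enumerates one placeholder row per category combination.

```python
from itertools import product

brands = ["Migraineaid", "Glucoreg", "Mobilium", "Memora"]
personas = {"Doctor": ["Conservative", "Pathfinder"],
            "Patient": ["Naysayer", "Optimistic"]}
themes = ["Safety-oriented", "Cost-effective"]  # example themes (assumed)

# Build rows carrying the five columns described above.
rows = []
for brand, (audience, plist), theme in product(
        brands, personas.items(), themes):
    for persona in plist:
        rows.append({
            "Target Audience": audience,
            "Brand": brand,
            "Persona": persona,
            "Theme": theme,
            # Placeholder; the real DB stores curated marketing copy here.
            "Brand Content": f"<content for {brand}/{persona}/{theme}>",
        })
```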
Step 3
Now the user (a healthcare marketer) who wants to create marketing content (banners
or emails) comes up with specific needs such as content type, brand, persona and
target audience.
Step 4 and 5
After taking the desired information from the user, the orchestrator fetches the
respective data (Brand Content) from the DB.
Step 6
We have two prompt [15] channels for content generation: one for emails and one for
banners. The respective prompt is selected according to user preference. Creating
and optimizing a prompt to match application specifications is not a one-time
process, as it requires multiple iterations and experimentation (e.g., zero-shot,
one-shot and multi-shot prompting). For this case study we used one-shot prompting
for prompt optimization, as it suits our dataset well.
Step 7
After prompt selection and gathering the desired set of information we are ready
with the final prompt.
Step 8
This final prompt is fed to LLM for generating the content as per specifications. In
this case study we restrict our prompt to create text content only and not images or
any other media element.
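Steps 4–8 can be sketched as a small orchestrator. The channel templates, function names, and the `call_llm` stand-in below are illustrative assumptions, not the app’s actual implementation.

```python
# Illustrative prompt channels (Step 6): one per content type.
PROMPT_CHANNELS = {
    "email": "Write a marketing email for {brand} using: {brand_content}",
    "banner": "Write banner copy for {brand} using: {brand_content}",
}

def fetch_brand_content(db, request):
    """Steps 4-5: fetch the matching Brand Content row from the DB."""
    for row in db:
        if (row["Brand"] == request["brand"]
                and row["Persona"] == request["persona"]
                and row["Target Audience"] == request["audience"]):
            return row["Brand Content"]
    raise LookupError("no matching row")

def build_final_prompt(db, request):
    """Steps 6-7: select the channel prompt and fill it in."""
    template = PROMPT_CHANNELS[request["content_type"]]
    return template.format(brand=request["brand"],
                           brand_content=fetch_brand_content(db, request))

# Step 8 would feed the result to the LLM, e.g.:
# response = call_llm(build_final_prompt(db, request))
```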
Step 9
After getting the response from the LLM, the next step is validation. Due to the
sensitive nature of marketing for healthcare brands, there are several key areas
where LLMs must be excluded from responding, such as direct diagnosis or treatment
recommendations; emotional appeals, fear tactics, or discriminatory language; and
false or misleading information. To avoid all these types of content, validating
the LLM response becomes an essential step. We have a list of exclusions to catch
this kind of generated data and also use guardrails in the prompt itself.
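The exclusion check can be sketched as a simple phrase filter; the phrases listed below are illustrative examples, not the actual curated exclusion list.

```python
# Illustrative exclusion phrases; the real list would be curated by experts.
EXCLUSIONS = [
    "we diagnose", "guaranteed cure", "you will die",
    "miracle treatment",
]

def violates_exclusions(text, exclusions=EXCLUSIONS):
    """Return the first excluded phrase found in `text`, else None."""
    lowered = text.lower()
    for phrase in exclusions:
        if phrase in lowered:
            return phrase
    return None
```

A production system would combine such filters with prompt-level guardrails, as described above.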
To assess the quality of the generated text, we used a combination of evaluation
metrics called the READ SCORE. A single performance metric cannot evaluate output
quality on its own, which is why we use a combined score covering different dimensions
of evaluation (e.g., the BERT Score assesses the goodness of the generated text,
while ARI assesses its readability complexity). The READ SCORE of the generated
text is a combination of the BERT Score [14], the Automated Readability Index [16],
and Linsear Write [17]:
READ SCORE = (Automated Readability Index / 2.8 + BERT Score × 10 + Linsear Write) / 3
The scores lie on different scales (e.g., BERT Score ranges from −1 to 1, while ARI
ranges from 1 to 28). To bring them onto the same scale, we normalize each into the
range 0 to 10. To maintain the quality of the generated text, we set a threshold of 5:
if the READ SCORE of the LLM output is below 5, the output is rejected and generated
again. If the LLM output passes all the validation checks, it is shown to the user
as the final response; otherwise it is fed back to the LLM until it passes all the
checks.
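The scoring and threshold logic can be sketched as follows; the normalization constants are inferred from the scales stated above and are an assumption, not the paper’s exact implementation.

```python
def read_score(ari, bert_score, linsear_write):
    """Combine the three metrics into one score.

    Normalization follows the scales stated in the text (ARI roughly
    1-28, BERT Score -1 to 1); the exact constants are an assumption.
    """
    ari_norm = ari / 2.8         # maps ~1-28 onto roughly 0-10
    bert_norm = bert_score * 10  # scales the BERT Score up
    return (ari_norm + bert_norm + linsear_write) / 3

def accept(score, threshold=5.0):
    """Outputs scoring below the threshold are rejected and regenerated."""
    return score >= threshold
```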
In this section we explained how GenAI-based solutions help shorten the marketing
life cycle, with less human dependency and greater resource efficiency in terms of
time and cost, while adding more layers of personalization.
Layer Based Personalization
As explained earlier, our main goal is to give users the flexibility to create
personalized content with minimal information. Craft-Persona-Rx provides personal-
ization layers that enable users to generate content with any variation they want.
We have designed a user interface (UI) for healthcare marketers through which they
can select various personalization options to generate content (Fig. 2).
If we want to generate banners for patients with an ‘optimistic’ profile and the
Migraineaid brand, these options can be selected in the UI and the output generated
by clicking the Generate Content button (Fig. 3).
The options selected by the user are embedded into a prompt via the DB, which is
then processed by the LLM. The prompt plays a vital role in content generation: it
is the set of instructions given to the LLM to generate the content. A sample prompt
is shown below in italics:
You are an expert marketing content generator in the field of Healthcare marketing.
Your task is to help the USER generate {WHAT TO GENERATE} for the brand
{BRAND}. The marketing content is to be generated for {TARGET AUDIENCE} with
a personality of {PERSONA} and content should highlight the {THEME} aspects of
the brand.
Follow the below mentioned rules strictly while generating the content:
Do not use derogatory language.
Use polite and respectful language.
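Filling the template’s placeholders from the UI selections can be sketched as follows; the theme value is an assumed example.

```python
# The sample prompt above, with format-string placeholders.
PROMPT_TEMPLATE = (
    "You are an expert marketing content generator in the field of "
    "Healthcare marketing. Your task is to help the USER generate "
    "{what_to_generate} for the brand {brand}. The marketing content is "
    "to be generated for {target_audience} with a personality of "
    "{persona} and content should highlight the {theme} aspects of the "
    "brand.\n"
    "Follow the below mentioned rules strictly while generating the "
    "content:\n"
    "Do not use derogatory language.\n"
    "Use polite and respectful language."
)

# UI selections from the Fig. 3 example: optimistic patients, Migraineaid.
prompt = PROMPT_TEMPLATE.format(
    what_to_generate="banners",
    brand="Migraineaid",
    target_audience="Patients",
    persona="Optimistic",
    theme="Safety-oriented",  # example theme (assumed)
)
```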
Fig. 2 Craft-Persona-Rx UI
References
1. Haeba Ramli A, Sjahruddin H (2015) Building patient loyalty in healthcare services. Int Rev
Manag Bus Res 4:391–401
2. Lee D, Yoon SN (2021) Application of artificial intelligence-based technologies in the health-
care industry: opportunities and challenges. Int J Environ Res Public Health 18:271. https://
[Link]/10.3390/ijerph18010271
3. Ali O, Abdelbaki W, Shrestha A, Elbasi E, Alryalat MAA, Dwivedi YK (2023) A system-
atic literature review of artificial intelligence in the healthcare sector: Benefits, challenges,
20 Unlocking the Power of Personalized Content with Generative AI … 267
1 Introduction
270 N. Bansal et al.
Today Large Language Models (LLMs) have emerged as a leading technology in the
field of artificial intelligence. Their adeptness in understanding and generating human
language, along with their strong capabilities in generalization and reasoning, has
greatly enhanced their performance. Additionally, their ability to adapt to new tasks
and domains highlights their versatility. Consequently, there is a growing interest in
utilizing LLMs to transform recommender systems, with the goal of providing users
with personalized and high-quality recommendations. Given the rapid progress in
this area, this paper examines the current landscape of LLM-powered recommender
systems in various domains.
Knowledge graphs, with nodes as entities and edges as relations, enhance recom-
mender performance as side information and act as the common format of knowledge
bases. LLMs have demonstrated remarkable proficiency in retrieving factual knowl-
edge, akin to explicit knowledge bases [29–37]. This ability opens up avenues for
constructing more comprehensive knowledge graphs within recommender systems.
Existing research [29] underscores how LLMs excel in storing not only factual infor-
mation but also common-sense knowledge, which can then be effectively applied to
downstream tasks. However, despite the promise offered by LLMs, existing methods
[38, 39] in knowledge graph construction face challenges in handling incomplete
knowledge graphs and integrating textual corpus data. Researchers [40, 41] have
commenced exploring how LLMs can address these challenges, particularly through
knowledge completion and construction tasks. In the realm of knowledge graph
completion, efforts are directed towards leveraging LLM models such as MTL-KGC
[42], MEMKGC [43], StAR [44], GenKGC [45], TagReal [46], and AutoKG [47]
to encode text or generate missing facts within knowledge graphs. This involves
enhancing the completeness of knowledge graphs by inferring and adding missing
information. On the other hand, knowledge graph construction involves the struc-
tured representation of knowledge, including entity discovery [48, 49], coreference
resolution [50, 51], and relation extraction [47, 52]. LLMs offer potential solutions
for each of these subtasks, allowing for more accurate and comprehensive knowledge
graph construction. Moreover, LLMs show promise in enabling end-to-end construc-
tion [53, 54] wherein they directly build knowledge graphs from raw text data. This
holistic approach streamlines the process of knowledge graph construction, elimi-
nating the need for intermediate steps and enhancing efficiency. However, owing to
their inherent nature, LLMs may introduce ambiguity or inaccurate information, which
can manifest as extraneous information or noise in the recommendation process [42],
leading to responses that lack informative context or relevance despite being
syntactically correct.
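As a simplified sketch of the knowledge graph completion idea, a missing fact can be phrased as a cloze-style query for an LLM. The triples and template below are illustrative; models such as MTL-KGC or GenKGC use learned text encodings rather than this plain string format.

```python
def completion_prompt(head, relation, known_triples):
    """Build a cloze-style prompt asking an LLM for a missing tail entity."""
    context = "\n".join(f"({h}, {r}, {t})" for h, r, t in known_triples)
    return (f"Known facts:\n{context}\n"
            f"Complete the triple: ({head}, {relation}, ?)")

# Illustrative knowledge-graph fragment (assumed entities and relations).
triples = [("aspirin", "treats", "headache"),
           ("aspirin", "is_a", "drug")]
prompt = completion_prompt("ibuprofen", "treats", triples)
# The LLM's answer would then be parsed back into a new triple.
```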
The advancement of large language models (LLMs) has revealed their remarkable
reasoning abilities when sufficiently scaled, akin to human intelligence in decision-
making and problem-solving [55]. Through techniques like “chain of thoughts”
prompting, LLMs can exhibit emergent reasoning skills, enabling them to draw
conclusions based on evidence or logic [56]. In the realm of recommender systems,
these reasoning capabilities empower LLMs to enhance user interest mining, thereby
improving overall performance. Additionally, LLMs demonstrate “step by step”
reasoning, leveraging prompts that include intermediate steps to tackle complex
tasks effectively [56]. For instance, Wang and Lim [57] introduce NIR, a three-
step prompt designed to capture user preferences, filter items, and re-rank recom-
mendations. Moreover, Automated Machine Learning (AutoML) is increasingly
utilized in recommender systems to streamline manual setup processes, particularly
in optimizing embedding sizes [43–46] and other facets such as feature selection and
model architecture. However, the following challenges need to be addressed effectively
by LLMs before automated learning approaches can deliver significantly better
recommendation algorithms and systems:
• Complex Search Space: The search space within recommender systems is notably
complex, encompassing diverse types and facing volume issues, making effective
exploration and optimization challenging.
• Lack of Foundation: Unlike other domains with well-established network struc-
tures, recommender systems lack a strong foundation of knowledge about the
informative components within the search space, particularly regarding effective
high-order feature interactions. Further, this knowledge gap is compounded by
the diverse and domain-specific nature of recommender systems, which operate
across various scenarios.
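The three-step NIR prompting of Wang and Lim [57] mentioned above (capture user preferences, filter candidate items, re-rank) can be sketched as a pipeline of chained prompts; `ask_llm` is a stand-in for an actual model API.

```python
def nir_pipeline(history, candidates, ask_llm):
    """Three-step prompting: preferences -> filter -> re-rank.

    `ask_llm` is any callable taking a prompt string and returning text.
    """
    # Step 1: summarize the user's preferences from interaction history.
    prefs = ask_llm(
        f"Given the items {history}, summarize the user's preferences.")
    # Step 2: filter the candidate set down to the most relevant items.
    shortlist = ask_llm(
        f"User preferences: {prefs}. From {candidates}, "
        "select the most relevant items.")
    # Step 3: re-rank the shortlisted items.
    ranking = ask_llm(
        f"Re-rank {shortlist} from most to least relevant "
        f"for a user with preferences: {prefs}.")
    return ranking
```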
presenting users with more personalized and tailored choices. For instance, users may
actively engage in the decision-making process by inputting both textual descriptions
and visual preferences into the system, enhancing the granularity and specificity of
recommendations.
3 Future Outlook
4 Conclusion
Personalization in large language models (LLMs) for user services poses several
critical challenges that need to be addressed. Firstly, achieving effective person-
alization requires a deep understanding of user preferences, which often involves
domain-specific knowledge beyond the general knowledge acquired by LLMs
through training. This suggests that adapting LLMs to effectively cater to personal-
ized services remains a significant unresolved issue. Moreover, there are concerns
regarding privacy when using LLMs for personalization. Since LLMs have the
capacity to memorize users’ confidential information to provide personalized
services, there’s a legitimate worry about safeguarding user privacy. Ensuring that
LLMs maintain privacy while delivering personalized experiences is crucial for
building trust with users. Additionally, LLMs trained on internet data are suscep-
tible to exposure bias, leading to potentially unfair predictions for minority groups.
This highlights the importance of mitigating biases in LLMs to ensure fair and
equitable outcomes for all users. To tackle these challenges, the research commu-
nity needs comprehensive benchmarks and evaluation datasets. However, the current
availability of such resources is limited, indicating a gap that needs to be addressed
through collaborative efforts within the research community. Furthermore, to fully
harness the potential of LLMs for personalization, it’s essential to establish systematic
methodological and experimental frameworks. These frameworks should encom-
pass various perspectives, including understanding user preferences, addressing
privacy concerns, mitigating biases, and evaluating model performance accurately.
In summary, addressing the challenges associated with personalization using LLMs
requires a multidimensional approach involving domain-specific knowledge, privacy
protection measures, bias mitigation strategies, and the development of robust evalu-
ation frameworks. Collaboration within the research community is key to advancing
research in this area and realizing the full potential of personalized services powered
by LLMs.
References
1. Gao Y, Sheng T, Xiang Y, Xiong Y, Wang H, Zhang J (2023) Chat-rec: towards interactive and
explainable llms-augmented recommender system. arXiv preprint arXiv:2303.14524
2. Chen J, Ma L, Li X, Thakurdesai N, Xu J, Cho JH, ... Achan K (2023) Knowledge graph
completion models are few-shot learners: An empirical study of relation labeling in e-commerce
with llms. arXiv preprint arXiv:2305.09858
3. Chen X, Fan W, Chen J, Liu H, Liu Z, Zhang Z, Li Q (2023) Fairly adaptive negative sampling
for recommendations. In: Proceedings of the ACM web conference 2023, pp 3723–3733
4. Fan W, Zhao X, Chen X, Su J, Gao J, Wang L, ... Li Q (2022) A comprehensive survey on
trustworthy recommender systems. arXiv preprint arXiv:2209.10117
5. Zhang S, Yao L, Sun A, Tay Y (2019) Deep learning based recommender system: a survey and
new perspectives. ACM Comput Surv (CSUR) 52(1):1–38
6. Fan W, Liu C, Liu Y, Li J, Li H, Liu H, ... Li Q (2023) Generative diffusion models on graphs:
methods and applications. arXiv preprint arXiv:2302.02591
21 Web Personalization with Large Language Models: Challenges … 279
30. Roberts A, Raffel C, Shazeer N (2020) How much knowledge can you pack into the parameters
of a language model? arXiv preprint arXiv:2002.08910
31. Petroni F, Lewis P, Piktus A, Rocktäschel T, Wu Y, Miller AH, Riedel S (2020) How context
affects language models’ factual predictions. In: Automated knowledge base construction
32. Jiang Z, Xu FF, Araki J, Neubig G (2020) How can we know what language models know?
Trans Assoc Comput Linguist 8:423–438
33. Wang C, Liu X, Song D (2020) Language models are open knowledge graphs. arXiv preprint
arXiv:2010.11967
34. Poerner N, Waltinger U, Schütze H (2020) E-bert: efficient-yet-effective entity embeddings for
bert. In: Findings of the association for computational linguistics: EMNLP 2020, pp 803–818
35. Heinzerling B, Inui K (2021) Language models as knowledge bases: On entity representa-
tions, storage capacity, and paraphrased queries. In: Proceedings of the 16th conference of the
european chapter of the association for computational linguistics: Main Volume (pp 1772–1791)
36. Wang C, Liu P, Zhang Y (2021) Can generative pre-trained language models serve as knowl-
edge bases for closed-book qa? In: Proceedings of the 59th annual meeting of the association
for computational linguistics and the 11th international joint conference on natural language
processing (Volume 1: Long Papers) (pp 3241–3251)
37. Guu K, Lee K, Tung Z, Pasupat P, Chang M (2020) Retrieval augmented language model
pre-training. In: the International conference on machine learning, pp 3929–3938
38. Bordes A, Usunier N, Garcia-Duran A, Weston J, Yakhnenko O (2013) Translating embeddings
for modeling multi relational data. Advances in neural information processing systems, 26
39. Zhu Y, Wang X, Chen J, Qiao S, Ou Y, Yao Y, ... Zhang N (2023) Llms for knowledge graph
construction and reasoning: Recent capabilities and future opportunities. arXiv preprint arXiv:
2305.13168
40. Zhang Z, Liu X, Zhang Y, Su Q, Sun X, He B (2020) Pretrainkge: Learning knowledge repre-
sentation from pretrained language models. In: Findings of the association for computational
linguistics: EMNLP 2020 (pp 259–266)
41. Kumar A, Pandey A, Gadia R, Mishra M (2020) Building a knowledge graph using a pre-
trained language model for learning entity-aware relationships. In: 2020 IEEE international
conference on computing, power and communication technologies (GUCON) (pp 310–315).
IEEE
42. Razniewski S, Yates A, Kassner N, Weikum G (2021) Language models as or for knowledge
bases. arXiv preprint arXiv:2110.04888
43. Liu S, Gao C, Chen Y, Jin D, Li Y (2021) Learnable embedding sizes for recommender systems.
arXiv preprint arXiv:2101.07577
44. Liu H, Zhao X, Wang C, Liu X, Tang J (2020) Automated embedding size search in deep
recommender systems. In: Proceedings of the 43rd International ACM SIGIR conference on
research and development in information retrieval (pp 2307–2316)
45. Deng W, Pan J, Zhou T, Kong D, Flores A, Lin G (2021). Deeplight: deep lightweight feature
interactions for accelerating ctr predictions in ad serving. In: Proceedings of the 14th ACM
international conference on Web search and data mining (pp 922–930)
46. Ginart AA, Naumov M, Mudigere D, Yang J, Zou J (2021) Mixed dimension embeddings
with application to memory efficient recommendation systems. In: 2021 IEEE International
symposium on information theory (ISIT) (pp 2786–2791). IEEE
47. Wang H, Focke C, Sylvester R, Mishra N, Wang W (2019) Fine-tune bert for docred with
two-step process. arXiv preprint arXiv:1909.11898
48. Yan H, Gui T, Dai J, Guo Q, Zhang Z, Qiu X (2021) A unified generative framework for various
ner subtasks. In: Proceedings of the 59th annual meeting of the association for computational
linguistics and the 11th international joint conference on natural language processing (Volume
1: Long Papers) (pp 5808–5822)
49. Li B, Yin W, Chen M (2022) Ultra-fine entity typing with indirect supervision from natural
language inference. Trans Assoc Comput Linguist 10:607–622
50. Kirstain Y, Ram O, Levy O (2021) Coreference resolution without span representations. In:
Proceedings of the 59th annual meeting of the association for computational linguistics and the
11th international joint conference on natural language processing (Volume 2: Short Papers)
(pp 14–19)
51. Cattan A, Eirew A, Stanovsky G, Joshi M, Dagan I (2021) Cross-document coreference reso-
lution over predicted mentions. In: Findings of the association for computational linguistics:
ACL-IJCNLP 2021 (pp 5100–5107)
52. Lyu S, Chen H (2021) Relation classification with entity type restriction. In: Findings of the
association for computational linguistics: ACL-IJCNLP 2021 (pp 390–395)
53. Han J, Collier N, Buntine W, Shareghi E (2023) Pive: prompting with iterative verification
improving graph-based generative capability of llms. arXiv preprint arXiv:2305.12392
54. Trajanoska M, Stojanov R, Trajanov D (2023) Enhancing knowledge graph construction using
large language models. arXiv preprint arXiv:2305.04676
55. Wei J, Tay Y, Bommasani R, Raffel C, Zoph B, Borgeaud S, ... Zhou D (2022) Emergent
abilities of large language models. arXiv preprint arXiv:2206.07682
56. Wei J, Wang X, Schuurmans D, Bosma M, Chi E, Le Q, Zhou D (2022) Chain of thought
prompting elicits reasoning in large language models. arXiv preprint arXiv:2201.11903
57. Wang L, Lim E-P (2023) Zero-shot next-item recommendation using large pretrained language
models. arXiv preprint arXiv:2304.03153
58. Wen T-H, Vandyke D, Mrksic N, Gasic M, Rojas-Barahona LM, Su P-H, Ultes S, Young S
(2016) A network-based end-to-end trainable task-oriented dialogue system. arXiv preprint
arXiv:1604.04562
59. Zhang Y, Sun S, Galley M, Chen Y-C, Brockett C, Gao X, ... Dolan B (2019) Dialogpt: Large-
scale generative pre training for conversational response generation. arXiv preprint arXiv:1911.
00536
60. Yao K, Zweig G, Hwang M-Y, Shi Y, Yu D (2013) Recurrent neural networks for language
understanding. In Interspeech (pp 2524–2528)
61. Mesnil G, He X, Deng L, Bengio Y (2013) Investigation of recurrent-neural-network
architectures and learning methods for spoken language understanding. In: Interspeech (pp
3771–3775)
62. Mrkšić N, Séaghdha DO, Wen T-H, Thomson B, Young S (2016) Neural belief tracker: Data-
driven dialogue state tracking. arXiv preprint arXiv:1606.03777
63. Cuayáhuitl H, Keizer S, Lemon O (2015) Strategic dialogue management via deep reinforce-
ment learning. arXiv preprint arXiv:1511.08099
64. Zhou H, Huang M, Zhu X (2016) Context-aware natural language generation for spoken
dialogue systems. In: Proceedings of COLING 2016, the 26th international conference on
computational linguistics: technical papers (pp 2032–2041)
65. Dušek O, Jurčíček F (2016) Sequence-to-sequence generation for spoken dialogue via deep
syntax trees and strings. arXiv preprint arXiv:1606.05491
66. Zhang X, Zou Y, Zhang H, Zhou J, Diao S, Chen J, ... Xiao Y et al. (2022) Automatic product
copywriting for e-commerce. Proc AAAI Conf Artif Intell 36(11):12 423–12 431
67. Lei Z, Zhang C, Xu X, Wu W, Niu Z-Y, Wu H, ... Li S (2022) Plato-ad: a unified advertise-
ment text generation framework with multi-task prompt learning. In: Proceedings of the 2022
conference on empirical methods in natural language processing: industry track (pp 512–520).
68. Thomaidou S, Lourentzou I, Katsivelis-Perakis P, Vazirgiannis M (2013) Automated snippet
generation for online advertising. In: Proceedings of the 22nd ACM international conference
on information & knowledge management (pp 1841–1844)
69. Bartz K, Barr C, Aijaz A (2008) Natural language generation for sponsored-search advertise-
ments. In: Proceedings of the 9th ACM conference on electronic commerce (pp 1–9)
70. Fujita A, Ikushima K, Sato S, Kamite R, Ishiyama K, Tamachi O (2010) Automatic generation
of listing ads by reusing promotional texts. In: Proceedings of the 12th international conference
on electronic commerce: roadmap for the future of electronic business (pp 179–188)
71. Hughes JW, Chang K-H, Zhang R (2019) Generating better search engine text advertisements
with deep reinforcement learning. In: Proceedings of the 25th ACM SIGKDD international
conference on knowledge discovery & data mining (pp 2269–2277)
72. Wang X, Gu X, Cao J, Zhao Z, Yan Y, Middha B, Xie X (2021) Reinforcing pretrained
models for generating attractive text advertisements. In: Proceedings of the 27th ACM SIGKDD
conference on knowledge discovery & data mining (pp 3697–3707)
73. Chen C, Wang X, Yi X, Wu F, Xie X, Yan R (2019) Personalized chit-chat generation for
recommendation using external chat corpora. In: Proceedings of the 28th ACM SIGKDD
conference on knowledge discovery and data mining (pp 2721–2731)
74. Kanungo YS, Negi S, Rajan A (2021) Ad headline generation using a self-critical masked
language model. In: Proceedings of the 2021 conference of the north American chapter of the
association for computational linguistics: human language technologies: industry papers (pp
263–271)
75. Wei P, Yang X, Liu S, Wang L, Zheng B (2022) Creater: Ctrdriven advertising text generation
with controlled pre-training and contrastive fine-tuning. arXiv preprint arXiv:2205.08943
76. Kanungo YS, Das G, Negi S (2022) Cobart: Controlled, optimized, bidirectional and auto-
regressive transformer for ad headline generation. In: Proceedings of the 28th ACM SIGKDD
conference on knowledge discovery and data mining (pp 3127–3136)
77. Chen Q, Lin J, Zhang Y, Yang H, Zhou J, Tang J (2019) Towards knowledge-based personalized
product description generation in e-commerce. In: Proceedings of the 25th ACM SIGKDD
international conference on knowledge discovery & data mining (pp 3040–3050)
78. Yu C, Liu X, Tang C, Feng W, Lv J (2023) Gpt-nas: Neural architecture search with the
generative pre-trained model. arXiv preprint arXiv:2305.05351
79. Ying C, Klein A, Christiansen E, Real E, Murphy K, Hutter F (2019) Nas-bench-101: Towards
reproducible neural architecture search. In: The international conference on machine learning
(pp 7105–7114)
80. Zheng M, Su X, You S, Wang F, Qian C, Xu C, Albanie S (2023) Can gpt-4 perform neural
architecture search? arXiv preprint arXiv:2304.10970
81. Nasir MU, Earle S, Togelius J, James S, Cleghorn C (2023) Llmatic: Neural architecture search
via large language models and quality-diversity optimization. arXiv preprint arXiv:2306.01102
82. Chen A, Dohan DM, So DR (2023) Evoprompting: Language models for code-level neural
architecture search. arXiv preprint arXiv:2302.14838
83. Koren Y, Bell RM, Volinsky C (2009) Matrix factorization techniques for recommender
systems. IEEE Comput 42(8):30–37
84. Ak KE, Kassim AA, Lim JH, Tham JY (2018) Learning attribute representations with local-
ization for flexible fashion search. In: Proceedings of the IEEE conference on computer vision
and pattern recognition, pp 7708–7717
85. Hsiao W-L, Grauman K (2018) Creating capsule wardrobes from fashion images. In:
Proceedings of the IEEE conference on computer vision and pattern recognition, 7161–7170
86. Simo-Serra E, Fidler S, Moreno-Noguer F, Urtasun R (2015) Neuroaesthetics in fashion:
Modeling the perception of fashionability. In: Proceedings of the IEEE conference on computer
vision and pattern recognition, 869–877
87. Vittayakorn S, Yamaguchi K, Berg AC, Berg TL (2015) Runway to realway: visual analysis of
fashion. In: 2015 IEEE winter conference on applications of computer vision. IEEE, 951–958
88. Zielnicki K (2019) Simulacra and selection: clothing set recommendation at stitch fix. In:
Proceedings of the 42nd International ACM SIGIR conference on research and development
in information retrieval, 1379–1380
89. Kumar S, Gupta MD (2019) c+GAN: complementary fashion item recommendation. KDD
’19, Workshop on AI for fashion, Anchorage, Alaska-USA
90. Huynh CP, Ciptadi A, Tyagi A, Agrawal A (2018) CRAFT: complementary Recommendation
by Adversarial Feature Transform. In: ECCV Workshops (3) (Lecture Notes in Computer
Science), 11131, 54–66. Springer
91. Kang W-C, Fang C, Wang Z, McAuley JJ (2017) Visually-aware fashion recommendation and
design with generative image models. In: 2017 IEEE international conference on data mining,
ICDM 2017, New Orleans, LA, USA, November 18–21, 2017, 207–216
92. Shih Y-S, Chang K-Y, Lin H-T, Sun M (2018) Compatibility family learning for item
recommendation and generation. In: AAAI. 2403–2410. AAAI Press
93. Yang Z, Su Z, Yang Y, Lin G (2018) From recommendation to generation: a novel fashion
clothing advising framework. 2018 7th International conference on digital home (ICDH), 1, 1,
180–186
94. Fan W, Zhao Z, Li J, Liu Y, Mei X, Wang Y, ... Li Q (2023) Recommender systems in the era
of large language models (llms). arXiv preprint arXiv:2307.02046
95. Hou Y, Zhang J, Lin Z, Lu H, Xie R, McAuley J, Zhao WX (2023) Large language models are
zero-shot rankers for recommender systems. arXiv preprint arXiv:2305.08845
96. Wang W, Lin X, Feng F, He X, Chua T-S (2023) Generative recommendation: towards next-
generation recommender paradigm. arXiv preprint arXiv:2304.03516
97. Hou X, Zhao Y, Liu Y, Yang Z, Wang K, Li L, ... Wang H (2024) Large language models for
software engineering: a systematic literature review. arXiv preprint arXiv:2308.10620
Chapter 22
Deep Learning-Based Gland
Segmentation for Enhanced Analysis of
Colon Histology Images
Ajay Kumar, Vivek Kumar, Jay Prakash Singh, and Ashok Patel
1 Introduction
The colon, being the longest segment of the large intestine, plays a pivotal role in
the digestive cascade. It functions as the recipient of partially digested food, aiding
in its subsequent processing and the efficient absorption of essential nutrients [1, 2].
After the absorption phase, the colon orchestrates the transport of waste materials
toward the rectum for eventual expulsion [3]. However, the onset of malignancies
within the colon can be ascribed to uncontrolled and aberrant cellular proliferation,
resulting in anomalous cell growth [4]. Oncogenic metamorphosis is predominantly
instigated by genomic alterations, commonly known as “gene mutations”, which,
also, act as pivotal drivers in the disease’s progression [5, 6]. These genetic aberra-
tions disrupt the innate cellular life cycle, enabling affected cells to evade apoptosis,
a process not observed in their healthy counterparts [6, 7]. Colorectal cancer may
manifest across various age groups, albeit it predominantly afflicts adults. With time,
cell clusters amass, forming polyps, minute growths within the colon. These polyps
evoke concern due to their typically asymptomatic nature, underscoring the impera-
tive for routine screening as a preventative measure and early detection to facilitate
more efficacious treatment modalities [8, 9]. Consequently, there is an urgent need
for comprehensive research and meticulous analysis in this field. Precise delineation
and identify malignant regions with promising results [19–22]. Another study eval-
uated artificial intelligence (AI) techniques for gland and nuclei segmentation in
histology images, analyzing 126 AI-based methodologies [23]. Traditional manual
feature extraction methods were compared with deep learning-based neural network
strategies. While acknowledging the effectiveness of R-CNN and its variants, their
limitations in histopathological contexts were highlighted, addressing challenges
such as data scarcity and color inconsistency. Promising avenues for future research
include techniques such as FCN-based Atrous spatial pyramid pooling and Encoder-
Decoder U-Nets, alongside addressing staining variations and limited training data
in deep learning models [19–23]. A novel deep learning framework, the Attention-
Guided Deep Atrous-Residual U-Net, was introduced for gland segmentation in
colon histopathology images. Leveraging Atrous-Residual units, attention units, and
transitional atrous units, the model addressed concerns like data unpredictability,
overfitting, and resolution degradation. Model evaluation using the GlaS challenge
dataset, CRAG, and a private hospital dataset (HosC) demonstrated promising results,
enhancing the accuracy of colon cancer diagnosis [24]. Transfer learning techniques
were employed to address limitations stemming from the scarcity of large datasets,
achieving remarkably high accuracy rates in colon cancer diagnosis [25].
This research presents a deep learning-based approach for precise glandular struc-
ture segmentation in colon histology images, utilizing the UNet model architec-
ture. The approach has been rigorously trained and tested using the Warwick-QU
dataset from the Gland Segmentation in Colon Histology Images challenge (GlaS).
The dataset comprises diverse samples, including Hematoxylin and Eosin (H&E)
stained tissue slide images, with corresponding ground truth annotations meticu-
lously provided by expert pathologists [26]. The study’s central focus is evaluating the
UNet architecture’s efficacy in image segmentation, particularly its feature-capturing
capabilities.
2.1 Dataset
The dataset employed in this research is derived from the Warwick-QU dataset,
which is part of the Gland Segmentation in Colon Histology Images (GlaS) chal-
lenge ([Link]). This dataset comprises
165 images, each in BMP format, extracted from 16 histological sections stained
with Hematoxylin and Eosin (H&E). These sections specifically relate to cases of
colorectal adenocarcinoma classified as stage T3 or T4. Importantly, the images rep-
resent individual patient samples. Moreover, each image in this dataset is paired with
a single ground truth object representing a label, as shown in Fig. 2.
2.2 Method
The method begins with data loading, which standardizes image and mask dimensions to 256 × 256 pixels and
normalizes pixel values within the range [0, 1]. After that, various performance met-
rics such as Intersection over Union (IoU), Dice coefficient, model loss, Dice loss,
Recall, Precision, and Accuracy are employed to evaluate the model’s performance.
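The IoU and Dice coefficient named here are standard overlap metrics for binary masks; a minimal NumPy sketch (function names are ours, not from the paper's code):

```python
import numpy as np

def iou(pred, truth, eps=1e-7):
    """Intersection over Union between two binary masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    inter = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    return (inter + eps) / (union + eps)

def dice(pred, truth, eps=1e-7):
    """Dice coefficient: 2|A ∩ B| / (|A| + |B|)."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    inter = np.logical_and(pred, truth).sum()
    return (2 * inter + eps) / (pred.sum() + truth.sum() + eps)
```

For example, two masks that overlap in one of two foreground pixels give an IoU of about 0.5 and a Dice coefficient of about 0.67.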
Our proposed model is then trained on the dataset of patient samples. A variety
of callbacks, including model checkpointing, learning rate reduction, CSV logging,
TensorBoard logging, and early stopping, are utilized to monitor the training
process effectively. These callbacks ensure the efficient
management of the training process and optimize model performance. Furthermore,
the model evaluation incorporates custom metrics, and post-processing of predicted
masks is conducted. Lastly, the images are segmented by the trained model for med-
ical image segmentation, allowing for a comprehensive performance assessment by
comparing predicted masks with ground truth masks alongside the original images.
This systematic approach ensures that with each stage, there is progress in the devel-
opment of the model, with a particular emphasis on medical image analysis, as shown
in Fig. 4.
Deep learning algorithms, specifically utilizing the UNet model architecture and
CNNs, are successfully employed for gland segmentation in colon histology images.
Through a comprehensive evaluation utilizing diverse metrics, this method demonstrates
remarkable performance (a training accuracy of 98% and a training loss of 0.97%),
as shown in Fig. 5. The segmentation results (Fig. 5b shows the Dice coefficient
reaching 0.024 and the IoU reaching 0.011 at epoch 4) indicate the model's capability
to precisely identify glandular structures in histological images, as also shown in
Fig. 6. The UNet-based approach adeptly captured
intricate details of gland boundaries, even in challenging images with overlapping
structures.
Fig. 4 Diagrams illustrating a–b the processes involving predicting a segmentation mask for a test
image and subsequently post-processing the mask to visualize the segmented region
Fig. 5 Visualizing model performance: a training accuracy, loss, and b Dice coefficient assessment
Fig. 6 Diagrams a–b–c illustrating the original test image, the ground truth mask, and the predicted
mask
22 Deep Learning-Based Gland Segmentation for Enhanced … 291
4 Conclusion
The study highlights the efficacy of deep learning methodologies, specifically the
UNet model, within colon histology image analysis. The achieved precision in
gland segmentation offers substantial potential for enhancing diagnostic accuracy
and facilitating pathological research pertaining to colon diseases. Obtained results
underscore the transformative impact of deep learning techniques in medical image
analysis, heralding a new era of more precise and efficient diagnostic tools. As
we progress in this domain, the integration of deep learning approaches into med-
ical imaging workflows stands to deepen our comprehension of intricate diseases,
ultimately driving improved patient outcomes and propelling medical research and
diagnostics forward.
Declarations
References
1. Debas HT (2004) Small and large intestine. Gastrointest Surg Pathophysiol Manag 239–310
2. Nichols TW, Faass N (2005) Optimal digestive health: a complete guide
3. Gustafsson J (2012) Colonic barrier function in ulcerative colitis-interactions between ion and
mucus secretion
4. Petrova TV, Nykänen A, Norrmén C, Ivanov KI, Andersson LC, Haglund C, Puolakkainen
P, Wempe F, Melchner H, Gradwohl G et al (2008) Transcription factor prox1 induces colon
cancer progression by promoting the transition from benign to highly dysplastic phenotype.
Cancer Cell 13(5):407–419
5. Pandurangan A, Divya T, Kumar K, Dineshbabu V, Velavan B, Sudhandiran G (2018) Colorectal
carcinogenesis: insights into the cell death and signal transduction pathways: a review. World
J Gastrointest Oncol 10(9):244
6. Raulet DH, Guerra N (2009) Oncogenic stress sensed by the immune system: role of natural
killer cell receptors. Nat Rev Immunol 9(8):568–580
1 Introduction
Monkeypox emerged as a dangerous disease, with outbreaks in 75 countries while
the world was still recovering from COVID-19 [1]. The first instance of this
mysterious disease was reported on May 24, 2003. A small Gram-negative bacillus
outlined in the laboratory was initially identified as an Acinetobacter species and
later attributed to contamination, given the behavioral symptoms observed.
The early detection of monkeypox is helpful in treating the symptoms
to ensure faster recovery. This can be achieved by analyzing the skin texture using
computer vision and machine learning approaches [2]. It requires the dataset of the
skin infected with monkeypox. The skin changes due to monkeypox can be identified
using traditional computer vision or deep learning methods. The skin features can
be extracted using texture descriptors or convolutional neural networks (CNN) can
be used for feature extraction. The various machine learning based classifiers can be
used to categorize skin disorders.
The regions hit hardest by monkeypox are central and western Africa [3]. To tackle
the disease and aid diagnosis, a LightCycler-based quantitative PCR method, known
as LC-qPCR, was proposed.
The biological characterization of monkeypox disease was carried out with the help of
various laboratory findings and dermatologists [4]. Apart from monkeypox, a related
finding was the identification of endemic community spread.
Testing of samples collected for gonorrhea and chlamydia screening recognized
MPXV-DNA-positive samples from four men, confirming these as monkeypox cases [5].
294 V. Gaikwad and T. Kinare
Not all of these cases had been diagnosed, which suggests that testing and quarantining
alone will not contain the spread. Although the Varicella-Zoster (chickenpox) virus,
which is prevalent in humans, is self-limiting, monkeypox had not spread in this way
until recent years [6]. A study by Kurt D. Reed, Mary Beth Graham, Russell L. Regnery,
and Yu Li described the isolation and verification of monkeypox from humans in the
Western Hemisphere [7]. This was mainly due to direct human contact with prairie dogs
kept or sold as pets. The main motivation for this work was to clarify the difference
in accuracy between the ML models and the OpenCV-based algorithm.
There are many systems that detect monkeypox disease. Some use machine learning
models, while others use image processing methods to classify skin diseases by
training on datasets of different skin rashes and other related symptoms. One
existing system used CNN, VGG16, and the PSOBER algorithm, which helps to optimize
the neural network by tweaking its parameters to obtain the best accuracy.
The proposed system is based on the OpenCV platform and achieves an accuracy of
about 85%. It is user friendly compared to other existing systems. Here we have used
feature extraction algorithms such as GLCM and LCBM and compared them with machine
learning algorithms such as XGBoost, which makes the system more unique. We also
addressed the dataset problem: some existing systems rely on limited data, whereas
our dataset is not only large but also effective and efficient. In short, the
proposed system solves most of the problems faced by existing systems.
2 Literature Survey
This section briefly describes research carried out from 2003 to 2022, discussing
the major findings on the detection, analysis, and processing capabilities of
various computer vision and deep learning models.
Recent developments in computer vision emphasize the integration of image
processing, pattern recognition, machine learning, and computer graphics to
extract and interpret information from images [8].
23 Monkeypox Detection and Other Skin Regularities Using OpenCV 295
One study built and analyzed a skin database of skin rash images [9] covering five
different anomalies, viz. monkeypox, chickenpox, cowpox, smallpox, and measles.
This was done using the Python Imaging Library and the scikit-image library
(version 0.19.3), with models whose trainable parameters ranged from one
to 26 million.
The overall understanding according to [10] is that the first step is image
processing and the second is machine learning. The image processing phase involves
the extraction of features from the input images. In the machine learning phase, many
feature-rich images are fed into the training set; the authors used the Keras
library for implementation. Another way to classify skin anomalies/diseases
is by using popular CNN architectures, viz. VGG16, ResNet50, and InceptionV3, as
mentioned in [11]. These architectures can perform large-scale image processing
tasks: VGG16 stacks deep convolutional layers, ResNet50 uses residual blocks that
ease back-propagation, and InceptionV3 performs parallel convolution operations at
multiple scales. Authors in [12] explained some of the main causes of the spread of
the monkeypox virus, such as frequent mobility of humans, cross-border transport of
animals, biological warfare, and the potential threat of bioterrorism. In [13], a
convolutional neural network was pre-trained and followed by an SVM module; SVM is
useful for classification and regression challenges.
Another proposed algorithm is PSOBER [14], which is based on binary feature
extraction techniques; it combines bit error rate (BER) and particle swarm
optimization (PSO) and is intended to find the best and most discriminative set of
features. A related variant, denoted SCBER, helps to optimize the neural network by
tweaking its parameters to obtain the best accuracy.
A comparison of 13 different deep learning models was performed in [15], and the
research identified the highest-performing DL models in order to unite them and
improve overall performance. Two limitations are that the dataset is limited and
that the AI method for executing the code rests on NN models. In [16], an accuracy
of 98.25%, sensitivity of 96.55%, and specificity and F1 score of 100.00% and
98.25%, respectively, were obtained with the best performing model, MobileNetV2.
The potential of combining a fuzzy logic system with the powerful capabilities of
artificial neural networks [17] was explored to construct a system able to
distinguish and classify monkeypox disease efficiently.
The VGG16 model was also implemented to study small, collected datasets, as in [18].
That work uses Local Interpretable Model-Agnostic Explanations (LIME) to explain
the reasoning behind predictions, a technique in great demand across various ML
models in industry nowadays.
A contemporary review [19] covers the state of monkeypox and similar diseases in
the healthcare industry in relation to the ongoing outbreaks around the world; the
authors also discuss future predictions for this disease. A detailed review [20]
gives insights into current MPX understanding and draws a picture of the world's
leading approaches to management, treatment, prevention, and diagnosis.
3 Methodology
This section explains the working of the project, where OpenCV is used for image
processing. According to the project flow given in Fig. 1, the work proceeds as
follows. First, the dataset is chosen and imported for monkeypox detection using
OpenCV. The data then undergoes augmentation in order to capture more features,
which are used later for feature extraction. The overall dataset, including the
original and augmented images, is then sent to the preprocessing stage, where we
change the size (crop the image to certain pixel dimensions) and perform operations
such as grayscaling, magnifying, and blurring.
After preprocessing, a standard sequence of operations follows: feature extraction,
then training and testing on the dataset using the above-mentioned machine
learning models.
The dataset includes 7136 images in total, which are used for training and testing
purposes; 5357 images are used for training and the remaining 1779 images form the
testing split, as shown in Table 1.
Pre-processing was performed on the training dataset, with the monkeypox images and
the non-monkeypox (other) images trained separately, as shown in Fig. 2.
Furthermore, to obtain more viable features for the actual feature extraction
later, we performed data augmentation on the dataset.
The data augmentation of the images is likewise done in two halves, viz. for the
monkeypox images and the other images separately. It is done in batches of 10
images each, using a for loop and the datagen_flow function.
The obtained images are then stored in a separate folder in which the size of
images is resized to 64 × 64, the images are then converted into grayscale and are
mapped to a matrix.
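Assuming nearest-neighbour interpolation for illustration (the paper does not state the resize method), this stage can be sketched in NumPy without OpenCV; the function names and the random stand-in image are ours:

```python
import numpy as np

def to_grayscale(rgb):
    """Luminance-weighted grayscale conversion (ITU-R BT.601 weights)."""
    return rgb @ np.array([0.299, 0.587, 0.114])

def resize_nearest(img, size=(64, 64)):
    """Nearest-neighbour resize of a 2-D matrix to the target size."""
    h, w = img.shape[:2]
    rows = np.arange(size[0]) * h // size[0]
    cols = np.arange(size[1]) * w // size[1]
    return img[rows][:, cols]

rgb = np.random.rand(128, 96, 3)        # stand-in for a loaded colour image
gray = to_grayscale(rgb)                # 128 x 96 intensity matrix
small = resize_nearest(gray, (64, 64))  # 64 x 64 matrix, ready for GLCM
```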
Once the processing of the images is done and the images are converted to matrices,
the gray level co-occurrence matrix (GLCM) algorithm is used to extract features
from the stored matrix.
The GLCM of an image is expressed as a matrix with the same number of rows and
columns as there are gray levels in the image. The mathematical operations used in
the system are based on the following features. Equation 1 represents the contrast
of the image, whereas the dissimilarity is given in Eq. 2. Haralick statistical
features such as homogeneity, the angular second moment (ASM), and energy are
represented in Eqs. 3, 4 and 5 respectively. These mathematical operations are
performed on the converted matrices, which helps us extract features from
the images.
23 Monkeypox Detection and Other Skin Regularities Using OpenCV 297
$$\text{Contrast} = \sum_{i,j=0}^{\text{levels}-1} P_{i,j}\,(i-j)^2 \quad (1)$$

$$\text{Dissimilarity} = \sum_{i,j=0}^{\text{levels}-1} P_{i,j}\,|i-j| \quad (2)$$

$$\text{Homogeneity} = \sum_{i,j=0}^{\text{levels}-1} \frac{P_{i,j}}{1+(i-j)^2} \quad (3)$$

$$\text{ASM} = \sum_{i,j=0}^{\text{levels}-1} P_{i,j}^{\,2} \quad (4)$$

$$\text{Energy} = \sqrt{\text{ASM}} \quad (5)$$
The main logic behind the extraction of features based on all the above-mentioned
parameters is given below in Algorithm 1.
Algorithm 1: Feature Extraction using GLCM
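The body of Algorithm 1 is not reproduced here; a pure-NumPy sketch consistent with Eqs. (1)–(5), assuming a single horizontal pixel offset (function names are ours):

```python
import numpy as np

def glcm(img, levels):
    """Normalized gray-level co-occurrence matrix for a horizontal offset of 1."""
    P = np.zeros((levels, levels))
    for a, b in zip(img[:, :-1].ravel(), img[:, 1:].ravel()):
        P[a, b] += 1
    return P / P.sum()

def glcm_features(P):
    """Haralick features of Eqs. (1)-(5)."""
    i, j = np.indices(P.shape)
    contrast      = (P * (i - j) ** 2).sum()        # Eq. (1)
    dissimilarity = (P * np.abs(i - j)).sum()       # Eq. (2)
    homogeneity   = (P / (1 + (i - j) ** 2)).sum()  # Eq. (3)
    asm           = (P ** 2).sum()                  # Eq. (4) angular second moment
    energy        = np.sqrt(asm)                    # Eq. (5)
    return contrast, dissimilarity, homogeneity, asm, energy
```

For a tiny two-level image such as `[[0, 1], [0, 1]]`, all co-occurring pairs are (0, 1), giving contrast 1, dissimilarity 1, homogeneity 0.5, ASM 1 and energy 1.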
Having extracted all these feature types, the next step is to train and test the
dataset using various machine learning models.
Light Gradient Boosting Machine (LGBM) is a gradient boosting framework that uses
decision tree-based learning algorithms for classification. In this project we use
an LGBM classifier with the Dropouts meet Multiple Additive Regression Trees (DART)
boosting type, training and testing with a 75%/25% split.
The parameters of the LGBM classifier are among the major factors deciding the
overall accuracy of the model. The most important is num_leaves, which strongly
affects the overall accuracy and determines whether the model is overfitted.
Max_depth has a major impact on the running time of the model as well as on its
accuracy. In our project we have set both of the above parameters to the same
value, 100, as this helps in getting the highest possible accuracy.
An overall view of the different processes of the project is summarized in the
block schematic shown in Fig. 3, which describes the general working idea of
the project.
4 Results
After storing the extracted features in a Pandas DataFrame and exporting them to an
Excel file according to Table 2, the interpreted data can be used to analyze the
significance of important features. This analysis typically involves plotting
graphs to visualize the relationships or distributions of these features. As shown
in Table 2, various GLCM features such as energy, correlation, and contrast are
extracted; around 2141 features are extracted from the dataset.
For instance, Fig. 4 presents a graph illustrating how the extracted features vary
and correlate with one another. This can involve plotting the features against each
other, or against a target variable if one is available, to understand their impact
in the context of the project. Such visualizations provide valuable insight into
the underlying data patterns and aid in drawing informed conclusions about the
project's outcomes.
After the successful feature extraction using GLCM, we predicted and classified the
data; the performance parameters are accuracy, precision and recall, given by
Eqs. 6, 7 and 8 respectively.
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \quad (6)$$

$$\text{Precision} = \frac{TP}{TP + FP} \quad (7)$$

$$\text{Recall} = \frac{TP}{TP + FN} \quad (8)$$
where TP, TN, FP, and FN denote true positives, true negatives, false positives,
and false negatives, respectively.
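Equations (6)–(8) follow directly from the confusion counts; a plain-Python sketch (function names are ours):

```python
def confusion_counts(y_true, y_pred):
    """Counts of true/false positives/negatives for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def metrics(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    accuracy  = (tp + tn) / (tp + tn + fp + fn)  # Eq. (6)
    precision = tp / (tp + fp)                   # Eq. (7)
    recall    = tp / (tp + fn)                   # Eq. (8)
    return accuracy, precision, recall
```

For example, `metrics([1, 1, 0, 0], [1, 0, 1, 0])` yields one count in each cell, so accuracy, precision and recall are all 0.5.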
5 Conclusion
The system proposed in this project is a novel approach that can successfully
detect and classify whether an image shows monkeypox or not. This is achieved by
combining the algorithms discussed in the proposed methodology: feature extraction
algorithms such as GLCM, followed by classification algorithms such as LGBM, SVM,
and XGBoost. The accuracy of the system in distinguishing monkeypox from
non-monkeypox or other diseases is 80%. The algorithm used for feature extraction
is GLCM, and for training and testing it is LGBM, which makes the system more
efficient, less complex, and more effective than other existing systems. Our
proposed method is, however, not perfect. Its limitations are that the dataset is
limited; the system can predict only whether the disease is monkeypox or not and
cannot classify other skin diseases; and accuracy is below 100% because some skin
diseases closely resemble the monkeypox rash. The future scope of the study is to
overcome these limitations and make the system more efficient and effective.
References
1. Goswami T, Dabhi V, Prajapati HB (2020) Skin disease classification from image. [Link]
org/10.1109/ICACCS48705.2020.9074232
2. Hosny KM, Kassem MA, Foaud MM (2019) Classification of skin lesions using transfer learning
and augmentation with Alex-net
3. Manjurul Ahsan Md, Uddin MR, Farjana M et al (2022) Image data collection and imple-
mentation of deep learning-based model in detecting monkeypox disease using modified
VGG16
4. Masayuki et al. (2008) Diagnosis and assessment of monkeypox virus (MPXV) infection
by quantitative PCR assay: differentiation of Congo Basin and West African MPXV strains
61:140–142
5. Thornhill JP et al. (2022) Monkeypox virus infection in humans across 16 countries, P 207323
6. De Baetselier I, Van Dijck C, Kenyon C (2022) Retrospective detection of asymptomatic
monkeypox virus infections among male sexual health clinic attendees in Belgium
7. Ramana KV, Mahendra P et al (2017) Epidemiology, diagnosis, and control of Monkeypox
disease: a comprehensive review. American J Infectious Diseases Microbiol 5(2):94–99
8. Wiley V, Lucas T (2018) Computer vision and image processing: a paper review. Int J Artif
Intell Res
9. Hussain MA et al. (2022) Can artificial intelligence detect MONKEYPOX from digital skin
images. 08.08.50319
10. Rajasekaran G, Aiswarya N, Keerthana R (2020) Skin disease identification using image
processing and machine learning techniques. Int Res J Eng Technol (IRJET) 2(03),
e-ISSN: 2395-0056
11. Ali SN et al. (2022) Monkeypox skin lesion detection using deep learning models: a feasibility
study
12. Ali SN et al. (2018) Emergence of Monkeypox as the most important Orthopoxvirus published:
04 September 2018 [Link]
13. Sklenovská N et al. (2021) Intelligent system for skin disease prediction using machine learning
14. Abdelaziz A et al. (2022) Classification of Monkeypox images based on transfer learning and
the Al-biruni earth radius optimization algorithm
15. Chiranjibi, Bahadur Shahi ST (2022) Monkeypox virus detection using pre-trained deep
learning-based approaches. J Med Syst
16. Akin KD et al (2022) Classification of Monkeypox skin lesions using explainable artificial
intelligence. In: International Conference on Innovative Academic Studies (ICIAS)
17. Akin D, Gurkan C, Budak A, Karatas H (2023) Artificial intelligence assisted convolutional
neural networks
18. Tom et al (2018) A neuro-fuzzy based model for diagnosis of Monkeypox disease. Int J
Comput Sci Trends Technol (IJCST) 6(2)
19. Reed KD, Graham MB, Regnery RL, Li Y (2004) The detection of Monkeypox in Humans in
the Western Hemisphere. [Link]
20. Titanji BK et al. (2022) Monkeypox a contemporary review for healthcare professionals,
310.6615388
Different types of adders contribute to the performance of a hybrid adder by addressing specific needs within the VLSI circuit design. The Ripple Carry Adder (RCA) is suitable for small bit-width additions due to its simplicity. Han-Carlson, Weinberger, and Ling adders offer efficient parallel computation to enhance speed. Each type of adder is optimized for specific bit ranges and has mechanisms for conditional carry selection to improve functionality. The Weinberger adder minimizes logic stages and delays. This strategic combination enhances overall performance in terms of area, power, and delay, leading to improved VLSI circuit efficiency.
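For illustration, the bit-by-bit carry propagation of an RCA can be modelled behaviourally in software (a sketch, not a gate-level or RTL design; names are ours):

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum bit, carry-out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

def ripple_carry_add(x, y, width=8):
    """Add two width-bit numbers bit by bit, LSB first, as an RCA does.

    Returns (sum modulo 2**width, final carry-out)."""
    carry, result = 0, 0
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry
```

For example, adding 200 and 100 in 8 bits overflows: the sum wraps to 44 with a carry-out of 1, which is exactly why wider additions chain or parallelize carry logic.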
The Binary to Excess-1 Converter (BEC) is used in VLSI adder design to minimize logic redundancy, which helps in reducing power and area usage. It is employed to convert binary numbers to excess-1 notation, facilitating specific roles in functional operations such as efficient carry computations and optimizing the addition process. By minimizing redundant logic and improving area efficiency, BEC circuits contribute to the overall performance enhancements of hybrid adders, particularly in contexts where area and power utilization are critical metrics.
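Behaviourally, a BEC outputs its input plus one without a full adder: each output bit is the input bit XORed with the AND of all lower input bits. A sketch under that assumption (names are ours):

```python
def bec(x, width=4):
    """Binary to Excess-1 Converter: returns x + 1 (mod 2**width).

    out[i] = x[i] XOR (x[i-1] AND ... AND x[0]); the LSB is simply inverted."""
    out, prop = 0, 1              # prop = AND of all lower input bits (starts at 1)
    for i in range(width):
        bit = (x >> i) & 1
        out |= (bit ^ prop) << i  # flip this bit only while all lower bits were 1
        prop &= bit
    return out
```

For example, a 4-bit BEC maps 0111 to 1000 and wraps 1111 to 0000, which is why it pairs naturally with carry-select structures: the "+1" path needs no second adder.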
Hybrid technology in adder design enhances area, power, and delay efficiency by incorporating modules like Ling, Weinberger, and Han-Carlson adders, each addressing specific bottlenecks. Ling adders optimize long-distance carry propagation, Weinberger adders reduce logic stages and carry generation duration, while Han-Carlson adders ensure balance in complexity. These are complemented by Binary to Excess-1 Converter circuits to minimize logic redundancy. The strategic integration of these modules leads to improved overall efficiency, with hybrid adders demonstrating better performance metrics in VLSI circuit applications.
The hybrid adder improves performance metrics such as area, power, and delay by integrating various types of adders, each optimized for specific tasks. For example, it combines the Ripple Carry Adder, Han-Carlson Adder, Binary to Excess-1 Converter, Weinberger Adder, and Ling Adder, each with unique advantages such as efficient parallel computation and reduced logic redundancy. These attributes help reduce area and power consumption by 4-35% and 10-50% respectively in 90 nm technology, and improve performance efficiency compared to traditional adders.
The Multi-objective Jaya Optimization (MJO) algorithm plays a key role in the Jaya Convolutional Neural Network (CNN) for handwritten Optical Character Recognition (OCR) by optimizing the initial weights of the network. Its primary goals are to reduce intra-class variance and increase inter-class distance, which is crucial for improving the accuracy and reliability of the character recognition process. The MJO assists in refining the convolutional responses into a condensed feature space, thereby enhancing the ability of conventional classifiers to identify characters from various datasets more effectively.
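The multi-objective details are not reproduced here, but the core single-objective Jaya update (move each candidate toward the current best solution and away from the worst, keeping only improving moves) can be sketched in NumPy; all names and the toy objective are ours:

```python
import numpy as np

def jaya(obj, bounds, pop=10, iters=150, seed=0):
    """Single-objective Jaya: x' = x + r1*(best - |x|) - r2*(worst - |x|),
    with greedy acceptance of improving moves only."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    X = rng.uniform(lo, hi, size=(pop, 2))          # 2-D search space
    f = np.array([obj(x) for x in X])
    for _ in range(iters):
        best, worst = X[f.argmin()], X[f.argmax()]
        r1, r2 = rng.random((2, pop, 2))
        Xn = np.clip(X + r1 * (best - np.abs(X)) - r2 * (worst - np.abs(X)), lo, hi)
        fn = np.array([obj(x) for x in Xn])
        better = fn < f                             # greedy acceptance
        X[better], f[better] = Xn[better], fn[better]
    return X[f.argmin()], f.min()

# Toy use: minimize the 2-D sphere function over [-5, 5]^2.
x_best, f_best = jaya(lambda x: float((x ** 2).sum()), (-5.0, 5.0))
```

Jaya is parameter-free apart from population size and iteration count, which is one reason it is attractive for tuning network weights.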
Bibliometric analysis in AI-guided breast cancer diagnosis and prognosis research offers insights by aggregating publications, discerning trends, and evaluating high-impact studies to inform clinical and research directions. Employing disruptive quality measures further emphasizes forward-thinking methodologies, highlighting pivotal studies that could shape future diagnostic advancements. This approach helps correlate citation impacts with research outcomes, guiding resource allocation and developmental prioritization in AI and cancer research domains.
Support Vector Machines (SVMs) offer opportunities in classification tasks due to their effectiveness in high-dimensional spaces and robustness against overfitting, particularly when the number of examples is small relative to the number of dimensions. However, challenges include computational inefficiency on very large datasets and the difficulty of selecting optimal kernel functions. SVMs also have limitations in handling non-linear data unless kernel tricks are applied. Despite these challenges, SVMs remain a powerful tool for various classification problems given their solid theoretical foundations.
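As a self-contained illustration of the margin idea (not the dual/kernel solvers used by production SVM libraries), a Pegasos-style subgradient sketch of a linear SVM; the data and names are ours:

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style stochastic subgradient descent for a linear SVM.

    Labels must be in {-1, +1}; lam is the L2 regularization strength."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            t += 1
            eta = 1.0 / (lam * t)                  # decaying step size
            if y[i] * (w @ X[i]) < 1:              # margin violated: hinge step
                w = (1 - eta * lam) * w + eta * y[i] * X[i]
            else:                                  # only the regularizer shrinks w
                w = (1 - eta * lam) * w
    return w

# Toy linearly separable data; the bias is folded in as a constant third feature.
X = np.array([[1., 2., 1.], [2., 3., 1.], [-1., -2., 1.], [-2., -1., 1.]])
y = np.array([1, 1, -1, -1])
w = train_linear_svm(X, y)
preds = np.sign(X @ w)
```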
The convolutional neural network (CNN) is effective for processing image-like data due to its ability to gather spatial data hierarchies representative of both basic and complex patterns. This is achieved through its architecture, inspired by the visual cortex of animals. CNNs consist of three main types of learning layers: convolutional layers for feature detection, pooling layers for down-sampling, and fully connected layers for producing the final output such as classification. These components work in conjunction to autonomously extract and prioritize relevant features within the data.
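The first two layer types can be illustrated with a minimal single-channel NumPy sketch of convolution and max pooling (the fully connected stage is omitted; names are ours):

```python
import numpy as np

def conv2d(img, kernel):
    """Valid 2-D cross-correlation: the core of a convolutional layer."""
    kh, kw = kernel.shape
    h, w = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.zeros((h, w))
    for r in range(h):
        for c in range(w):
            out[r, c] = (img[r:r + kh, c:c + kw] * kernel).sum()
    return out

def max_pool(x, size=2):
    """Non-overlapping max pooling: the down-sampling layer."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

img = np.arange(16, dtype=float).reshape(4, 4)
feat = conv2d(img, np.array([[1., 0.], [0., -1.]]))  # simple diagonal-difference filter
pooled = max_pool(np.maximum(feat, 0))               # ReLU, then 2x2 pool
```

On this ramp image every diagonal difference is -5, so the ReLU zeroes the feature map and pooling reduces it to a single value; real networks learn many such kernels per layer.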
Deep learning for recognizing Gujarati characters offers the advantage of self-learned representation features, which reduces manual feature extraction effort. The approach achieved an accuracy of about 78.6%, a higher performance than the hybrid method using k-Nearest Neighbor (k-NN) and tree classifiers, which achieved only 63% accuracy. Deep learning models perform both feature extraction and classification autonomously, which makes them particularly suited to handling large volumes of input and complex recognition tasks, compared with traditional methods that require more human oversight.
Deep learning models provide key advantages for feature extraction and classification tasks due to their self-learning capabilities, which mimic the brain's information processing. They offer autonomously determined feature prioritization with minimal programmer intervention, making them well suited to handling high volumes of complex input and output data. These models can identify and utilize relevant features autonomously, enhancing the accuracy and speed of classification tasks compared to traditional methods that require manual feature extraction.