Advanced Computing
Deepak Garg · Joel J. P. C. Rodrigues ·
Suneet Kumar Gupta · Xiaochun Cheng ·
Pushpender Sarao ·
Govind Singh Patel (Eds.)
Advanced Computing
13th International Conference, IACC 2023
Kolhapur, India, December 15–16, 2023
Revised Selected Papers, Part I
Communications
in Computer and Information Science 2053
Editors
Deepak Garg, SR University, Warangal, India
Joel J. P. C. Rodrigues, COPELABS, Lusófona University, Lisbon, Portugal
© The Editor(s) (if applicable) and The Author(s), under exclusive license
to Springer Nature Switzerland AG 2024
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of
illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission
or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the
editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors
or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in
published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
The objective of the 13th International Advanced Computing Conference (IACC 2023)
was to bring together researchers, developers, and practitioners from academia and indus-
try working in the domain of advanced computing. Researchers were invited to share
their thoughts and present recent developments and technical solutions in the domains of
Advances in Machine Learning and Deep Learning, Advances in Applications of Arti-
ficial Intelligence in Interdisciplinary Areas, Reinforcement Learning, and Advances
in Data Science. The conference took place on the 15th and 16th December 2023 at
Ichalkaranji, Kolhapur, Maharashtra, India. All editions of the series have been successfully
indexed in ISI, Scopus, DBLP, Compendex, SJR, Google Scholar, and other databases.
The conference follows a single-blind review process with a policy of at least three
reviews per paper. This year’s conference received 425 submissions, of which 72 articles
were accepted. The conference has a track record of acceptance rates between 15% and
20% over the last 12 years. More than 13 IEEE/ACM Fellows hold key positions on the
conference committee, giving it a quality edge. Over the last 12 years the conference
citation score has been consistently increasing.
This has been possible due to strict adherence to the quality parameters of the review
process and the acceptance rate, without exception, which allows us to make some of the
best research available through this platform.
Where You Think Stock Takes with the Linear Regression Model . . . . . . . . . . . . 242
Bharat S. Rawal, William Sharpe, Elizabeth Moseng, and Andre Galustian
Texture Feature Extraction Using Local Optimal Oriented Pattern (LOOP) . . . . . 407
Shital V. Sokashe-Ghorpade and S. A. Pardeshi
1 Introduction
Modern medical technology makes it easier for doctors to precisely evaluate a patient’s
condition and prescribe the best course of treatment. The most recent work included dig-
ital radiographs of teeth, typically referred to as orthopantomograms (OPG), as input for
human gender identification and age evaluation [1]. Radiologic technology has evolved
in numerous ways as a consequence of advancements in technology in the discipline of
medicine, including radiographic fluoroscopy, molecular imaging, and digital imaging.
Because the traditional approach used by forensic experts to identify individuals takes
too long and fails to produce precise outcomes, an entirely automated system for human
identification has been developed.
The human body undergoes changes over the course of a lifetime as a result of external
factors or internal metabolic changes. Due to their hardness and slow metabolism, teeth
are the only structures that are not affected by such factors. Dental X-ray scans can help
with identification and are regarded as good material for genetic research, odontology,
anthropology, and forensic analysis in populations that are alive or dead [2, 3]. Images
of a person’s teeth can be used
for identification with more accuracy than any other human body part. The manual
investigation procedure in forensic dentistry is aided by the depictions of dental eruption
factors and tooth growth stages in some atlases. In the context of civil law, forensic
medicine identification of a person is both difficult and sensitive. Therefore, predictions
made using the anatomical characteristics of teeth should be made with more precision
[4]. Teeth scans must be obtained from dental colleges, hospitals, or clinics with X-ray
imaging equipment as they are not available to the general public.
Among human teeth, the cuspids (eyeteeth) exhibit notable sexual differences compared
to other teeth; these teeth are also naturally robust and less susceptible to disease [6].
This paper’s principal objective is to offer up-to-date information and highlight developing
patterns in the domain. It aims to address research gaps in the use of machine learning
methods for age and gender identification, focusing on both current data and emerging
trends. Currently, medical image analysis, including the study of tooth dimensions and
craniofacial morphologies, is a prominent area of research. Traditionally, forensic experts
have manually assessed gender and age differences based on these dental characteristics.
2 Literature Survey
Numerous researchers have primarily focused on manual methods for gender and age
identification based on dental attributes, but only a limited number have ventured into
leveraging machine learning and computer vision technologies. In this domain, we will
briefly discuss recent research endeavors that showcase various methodologies, technical
intricacies, and significant contributions to the field of age and gender prediction.
Denis Milosevic et al. [1] introduced a model using Convolutional Neural Networks
(CNNs) for age and gender prediction. They utilized hyperparameter search techniques,
with grid search demonstrating superior performance. Nicolas Vila-Blanco and
co-authors [2] put forward a model that makes use of deep neural networks, specifically
Convolutional Neural Networks (CNNs), for the purpose of estimating chronological age.
In 2020, Saloni and colleagues [3] developed an approach utilizing digital den-
tal images to distinguish individuals by assessing the morphometric attributes of the
mandibular ramus across a sample of 250 orthopantomogram (OPG) images. Their
study investigated the mandibular ramus as a potential indicator for gender determi-
nation within this particular population, with the application of discriminant function
analysis highlighting significant sexual dimorphism. In the year 2020, Poornima Vadla
and her collaborators [4] presented a method that focused on the permanent mandibular
teeth located on the left side of the jaw. Their research was geared towards achieving
precise age estimation by employing the Cameriere method with Indian-specific formulas
on both the left and right sides of the mandibular teeth.
In 2020, Okkesim and Erhamza [5] carried out a study centered on the identification
of human gender by assessing the mandibular ramus. Their research underscored the
importance of mandibular teeth in gender determination, considering that the mandible
bone stands out as the largest, most sexually dimorphic, and robust bone in the skull.
It’s worth noting that recent studies have underscored the advantages of cone-beam
computed tomography (CBCT) compared to conventional methods.
Collectively, these varied studies make significant contributions to the progress of
gender and age prediction using dental characteristics. They encompass a spectrum
of techniques, spanning from conventional methods to state-of-the-art deep learning
approaches.
hv(l+1) = σ( Σ u∈N(v) (1 / √(dv · du)) · hu(l) )

where,
hv(l+1) = updated feature representation for node v in the (l + 1) layer,
hu(l) = feature representation of a neighboring node u in the (l) layer,
N(v) = set of neighboring nodes connected to node v,
dv and du = degrees of nodes v and u respectively, which signify the number of
connections each node maintains,
σ = activation function, often ReLU, applied element-wise to the computed result.
This equation encapsulates the essence of GCNs. It enables the model to iteratively
update a node’s feature representation by aggregating information from its neighboring
nodes while considering the graph’s structure. By performing this operation across mul-
tiple layers, GCNs can capture information at various scales and excel in tasks such as
node classification and semi-supervised learning, especially when dealing with limited
labeled data.
2. However, this approach falls short in processing information derived from the edges.
For instance, in a knowledge graph, distinct edges may signify different relationships
between nodes, which the method can’t effectively account for.
3. The reliance on a fixed point can stifle the diversification of node distributions, making
it unsuitable for scenarios where the objective is to learn diverse representations of
nodes.
To tackle the mentioned problem, various GCN variations have been suggested.
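For illustration, the propagation rule above can be written as a small NumPy sketch. This is a minimal example, not the authors' implementation; the adjacency matrix `A`, feature matrix `H`, and weight matrix `W` below are hypothetical inputs (the weight matrix is the usual trainable parameter of a GCN layer, even though it does not appear in the definitions above).

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN propagation step: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W).

    A: (n, n) adjacency matrix, H: (n, f_in) node features,
    W: (f_in, f_out) trainable weights (all hypothetical example inputs).
    """
    A_hat = A + np.eye(A.shape[0])          # add self-loops so a node keeps its own features
    d = A_hat.sum(axis=1)                   # degrees of the augmented graph
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))  # symmetric D^-1/2 normalization
    return np.maximum(0, D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W)  # ReLU activation

# Toy usage: 3 nodes in a path graph, 2 input features, 2 output features
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
H = np.random.rand(3, 2)
W = np.random.rand(2, 2)
print(gcn_layer(A, H, W).shape)  # (3, 2)
```

Stacking several such layers lets each node aggregate information from increasingly distant neighbors, which is what enables node classification with limited labeled data.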
The odontometric features chosen for the examination and evaluation of gender and
age encompass:
Incisor Width: We conducted an analysis and measurement of the width of central
incisors in both the mandibular and maxillary regions. Significantly, there are variations
in the measurement of incisors in the mandibular region that distinguish between males
and females.
Intercanine Distance: We conducted measurements for the intercanine distance,
which involved assessing the space between the canines in both the maxillary and
mandibular jaws. This intercanine distance is precisely the measurement between teeth
numbered 13 and 23 in the maxillary jaw and between teeth numbered 33 and 43 in the
mandibular jaw. For a visual representation of this process, please refer to Figs. 2(a) and
2(b), which provide examples of the measurement procedure for maxillary incisor teeth
and the mandibular intercanine distance.
4 Dataset Collection
The creation of our gender-specific radiographic tooth image dataset was a rigorous
process. We collaborated with dental institutions and clinics, ensuring ethical compliance
and data privacy (Figs. 3 and 4).
5 Experiment Analysis
Our project revolves around the creation of a fully automated system with the primary
goal of predicting human gender based on age estimation. This system exclusively relies
on orthopantomograms (OPG) of teeth as its input. Notably, this streamlined model is
capable of delivering highly precise results within a minute. In this section, we delve into
the classification techniques employed and the outcomes achieved using different kernels
and hyperparameters in our classifiers. The Graph Convolutional Network (GCN) plays
a pivotal role in age estimation and gender determination.
Preprocessing:
The initial step in our prediction model involves preprocessing the input OPG image
to eliminate any unwanted image noise that may have been introduced during image
capture. Subsequently, we enhance the image’s brightness and quality.
Image Segmentation:
Image segmentation is a method applied to break down images into individual segments.
In our specific scenario, this process is essential for pinpointing image objects and their
boundaries. To accomplish this, we rely on the Canny edge detection algorithm, a pivotal
step in ensuring the accurate prediction of both age and gender.
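A minimal sketch of these two preprocessing and segmentation steps with OpenCV is shown below; the file name, blur kernel, and Canny thresholds are illustrative assumptions rather than the authors' exact pipeline.

```python
import cv2

# Load the OPG image in grayscale (file name is illustrative)
img = cv2.imread("opg_sample.png", cv2.IMREAD_GRAYSCALE)

# Denoise and enhance brightness/contrast before segmentation
denoised = cv2.GaussianBlur(img, (5, 5), 0)   # suppress capture noise
enhanced = cv2.equalizeHist(denoised)         # improve brightness and contrast

# Canny edge detection to delineate tooth boundaries (thresholds are example values)
edges = cv2.Canny(enhanced, 50, 150)
cv2.imwrite("opg_edges.png", edges)
```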
Gender Prediction:
Gender prediction is of significant importance in forensic identification. Traditionally,
forensic experts and medical specialists have relied on time-consuming methods for esti-
mating gender, which demanded extensive training and expertise. Our study introduces
the GCN model, which leverages deep transfer learning and a fully automated app-
roach to analyze panoramic dental X-ray images. The structural flexibility of the GCN
architecture, coupled with its reduced parameter count, facilitates speedy execution of
training and validation processes (Fig. 5).
Fig. 5. (a) and (b) Dataset distribution based on gender and age group.
6 Conclusion
In this paper, we developed a method for synthesizing images of human faces from textual
descriptions of facial traits using fully trained graph convolutional networks (GCNs). The
ability to automatically generate realistic images from text has been achieved using graph
convolutional networks (GCNs). In this work, we use BERT embeddings to embed text
in the StyleGAN2 input latent space and manage the generation of facial images from
text.
The generated images exhibit excellent image quality, text-to-image similarity, and
image diversity according to the qualitative and quantitative experiment results. The
image quality and description consistency still need to be improved, however. The images
produced show a 57% similarity to the real-world images. The semantic alignment and
image quality still need to be improved.
References
1. Milošević, D., Vodanović, M., Galić, I., Subašić, M.: A comprehensive exploration of neural
networks for forensic analysis of adult single tooth X-ray images. IEEE Access 10, 70980–
71002 (2022)
2. Vila-Blanco, N., et al.: Deep neural networks for chronological age estimation from OPG
images. IEEE Trans. Med. Imaging 39(7) (2020)
3. Saloni, P.V., Mahajan, P., Puri, A., Kaur, S., Mehta, S.: Morphometric analysis of mandibular
ramus: a panoramic study. Ind. J. Dent. Res. 31(3) (2020)
4. Vadla, P., Surekha, R., Rao, G.V., Deepthi, G., Naveen, S., Kumar, C.A.: Assessing the accu-
racy of Cameriere’s Indian-specific formula for age estimation on right and left sides of
orthopantomogram. Egypt. J. Forensic Sci. 10(1), 1–6 (2020)
5. Okkesim, A., Erhamza, S.: Assessment of mandibular ramus for sex determination: retrospec-
tive study. J. Oral Biol. Craniofac. Res. 10(4), 569–572 (2020)
6. Ataş, M., Ayhan, T.: Development of Automatic Tree Counting Software from UAV Based
Aerial Images With Machine Learning. arXiv preprint arXiv:2201.02698 (2022)
Driver Drowsiness Detection System Using
Machine Learning Technique
1 Introduction
When people are drowsy, they have trouble remaining awake, even while performing
tasks. This is associated with the circadian cycle, which regulates when the body sleeps
and when it wakes up; consequently, it has little to do with the tasks being performed.
The human body has a natural tendency to sleep more heavily at some times of the day
compared to other times. Because our bodies naturally shift towards sleep and alertness
diminishes between midnight and around 6 a.m., this is mostly true throughout the night.
Since there is currently no universally accepted method for gauging alertness, one must
instead learn to recognize and interpret the telltale signs of a hypovigilant driver.
Behavioral and physiological indicators both fall under this umbrella [1]. Drowsy
drivers pose a threat to everyone on the road, including passengers and pedestrians, and
they may be hard to see and prevent. Research on methods for accurately detecting and
predicting drowsy driving is, therefore, essential for the improvement of transportation
security. Driver drowsiness detection systems include a lot of various aspects that most
researchers may make use of. According to Dua et al. (2021), surveillance frequently
involves combining information about the driver’s behavior and physiological markers
with information gathered from the vehicle. The Ministry of Road Transport and High-
ways has implemented mandatory rest periods for commercial vehicle drivers and issued
guidelines for safer driving practices. Additionally, technology-based solutions, such as
the facial expression detection system mentioned earlier, are being explored to detect
and prevent driver fatigue [2]. While India’s automobile industry continues to thrive, the
safety of road users must remain a top priority. The statistics reveal the gravity of the
situation, with drowsiness emerging as a significant factor in road accidents. Through
concerted efforts by the government, agencies, and the public at large, it is possible to
mitigate the risks posed by driver fatigue and ensure safer roads for everyone. Since
it is now well acknowledged that sleepy driving is a key contributor to accidents and
traffic fatalities, driver drowsiness detection technologies have attracted a lot of interest.
Drowsy driving, which can impede a driver’s ability to respond quickly and make wise
decisions, is the act of operating a vehicle while feeling drowsy or weary [3]. Researchers
and automakers have been creating and adopting driver drowsiness detection technolo-
gies to improve road safety after realizing the risks of driving when fatigued. In order
to identify indicators of sleepiness and notify drivers in a timely manner, these systems
monitor driver behavior, physiological signals, and vehicle data using cutting-edge tech-
nology and algorithms. A driver drowsiness detection system uses various technologies,
such as cameras, sensors, and machine learning algorithms, to detect signs of drowsiness
in a driver and alert them to take a break. Different types of driver drowsiness detection
measures are given in Fig. 1.
Driving when fatigued can result from a variety of circumstances: lack of sleep, sleep
disorders, alcohol, medications, and time of day.
In order to avoid or reduce sleepy driving conditions, many tactics and interventions are
used in driver drowsiness control. Making sure drivers receive enough rest and sleep
before getting behind the wheel is one of the key components of sleepiness control. The
probability of becoming sleepy while driving can be considerably decreased by getting
7–8 h of decent sleep. Additionally, taking frequent stops during lengthy drives enables
drivers to relax and rejuvenate themselves, preventing the buildup of weariness.
2 Literature Survey
This section looks at past studies on driver drowsiness detection systems and some of
the key components of such systems. Accidents happen for a variety
of reasons, including the driver being ill, the driver being inebriated, the environment
of the sick passenger, the lack of appropriate pauses or relaxation, the driver having
mental disorders, the driver being an uneducated driver, etc. The causes of these issues
can be boiled down to two types of circumstances: A tired or sleepy motorist, or a
negligent driver. Drowsiness can have a number of detrimental implications on a motorist,
including loss of coordination, errors in signal or turn judgment, and a delay in reaction
time. While each of these elements is crucial for safe driving, losing any one of them
can have serious consequences and potentially result in an accident [3]. As is common
knowledge, issues arise when drivers don’t stop and relax, and there is no other way
to address this issue. As a result, it is currently a hot topic for research. The goal is
to create a system that warns the driver in these circumstances. The alert tone rings
if the driver is sleepy or preoccupied. When the tone makes them aware that they are
distracted, a driver can focus again on the road and the controls. Similar to this, a sleepy
driver can be awakened by the tone and take the appropriate action. The system’s aim
is to determine whether the eyes and mouth are open/closed and identify fatigue and
drowsiness after locating the eye’s position. Reducing road accidents caused by driver
fatigue and drowsiness is critical, as they account for a significant proportion of such
incidents. It aims to contribute to these efforts by leveraging image processing algorithms
to detect drowsiness and fatigue in drivers. The early detection of a driver’s drowsiness
and alertness is an effective strategy for preventing accidents. The development of
technology to identify and prevent drowsiness is a significant challenge in the area of
casualty prevention. By closely monitoring the eyes of the driver, signs of drowsiness can be found
early, which may help to prevent accidents. Yawning detection is also used as a measure
of driver fatigue, as fatigued individuals tend to yawn more to increase oxygen to the
brain before falling asleep.
3 Existing Work
Here, we use machine learning to harvest real-world data about drowsiness-related
human behavior [1]. Automatic classifiers that use 68 nodes to identify facial actions
were created by the Facial Action Coding system utilizing machine learning on a dif-
ferent database of unscripted expressions. These facial gestures encompass a variety of
different face movements in addition to blinking and yawning. The main objective is to
detect the location of mouth and eyes in a facial image by leveraging established image
processing algorithms. This entails examining the whole face image to determine the
precise location of the pupil and mouth. The system’s aim is to find out if the eyes and
mouth are open/shut and identify weariness and sleepiness after locating the eye’s posi-
tion. Reducing road accidents caused by driver fatigue and drowsiness is critical, as they
account for a significant proportion of such incidents. To this end, extensive research
has been conducted with a vision of developing safe driving systems that can help lower
the frequency of incidents.
This section describes the strategy employed to accomplish the primary goals and focuses
specifically on the implementation. Analyzing every stage will be important to finish this
work. At each step until the project is finished, every decision and result of the method
employed will be thoroughly discussed. The Anaconda IDE and PyCharm IDE were the
programmes used in this study. OpenCV and Dlib are the libraries used to identify the
face, eyes, and mouth regions. The face is identified using the Haar AdaBoost face
detection algorithm.
Face detection is followed by the marking of facial markers on the image, such as the
location of the eyes, nose, and mouth. The functions of the eyes and mouth are calcu-
lated based on these landmarks. By using eye blinking, convolutional neural networks
can categorise eyes and detect tiredness in drivers. Additionally, a number of variables,
including sleep disturbances, medicines, and underlying medical diseases, might have
an impact on the presence and intensity of drowsy symptoms as shown in Fig. 2. To
prevent accidents and advance road safety, it is essential to identify and treat tiredness
early on. The hazards of sleepy driving can be reduced and the safety of the driver and
other road users can be ensured by recognizing the signs and taking the necessary action,
such as taking pauses, resting, or taking a little nap [15].
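For illustration, a short sketch of the face and landmark detection pipeline described above, using OpenCV's bundled Haar cascade and dlib's 68-point shape predictor; the file paths and the downstream eye-state classifier are assumptions, not the authors' exact code.

```python
import cv2
import dlib

# Haar (AdaBoost) face detector shipped with OpenCV
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

# dlib 68-point facial landmark predictor (model file must be downloaded separately)
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

frame = cv2.imread("driver_frame.jpg")               # illustrative input frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.3, 5):
    rect = dlib.rectangle(int(x), int(y), int(x + w), int(y + h))
    shape = predictor(gray, rect)
    # landmarks 36-47 cover the eyes, 48-67 the mouth
    eye_points = [(shape.part(i).x, shape.part(i).y) for i in range(36, 48)]
    print("eye landmarks:", eye_points)
```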
One sleepiness detection technique is measuring blinks of an eye and the percentage
of eye closure (PERCLOS). Reference [10] suggests a technique that involves learning
the pattern of eyelid closure duration for detecting eye blinks. Reference [8] states that
this method calculates the time it takes a person to close their eyes. As stated in [8], the
typical duration for an eye blink is approximately 310.3 ms. To detect drowsiness, the
PERCLOS method calculates the percentage of eyelid “droopiness” by measuring how
long the eyelids are shut. This is calculated as: PERCLOS = [closed-eye time / (closed-eye
time + open-eye time)] × 100. To differentiate between fully open and fully closed
eyes, a software library stores sets of open and closed eyes. As the driver becomes drowsy,
their eyelids will drop for longer durations, allowing the transition of drowsiness to be
monitored. According to PERCLOS, when the eyes are 80% closed, it is assumed that the
driver is drowsy, as stated in references [4, 8], and [16]. The eye blink pattern method
and PERCLOS method for detecting drowsiness share a common issue, which is the
requirement for the camera to be positioned at a precise angle to capture clear images
or videos without interference from eyebrows and shadows that obscure the eyes.
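A minimal sketch of the PERCLOS computation described above, assuming per-frame eye-state labels (closed or open) have already been obtained by some classifier; the frame labels here are hypothetical.

```python
def perclos(eye_states):
    """PERCLOS = closed-eye time / (closed-eye time + open-eye time) * 100.

    eye_states: sequence of per-frame labels, True = eye closed (hypothetical input).
    """
    closed = sum(eye_states)
    total = len(eye_states)
    return 100.0 * closed / total if total else 0.0

# Example: 120 frames, eyes closed in 30 of them -> PERCLOS = 25%
states = [True] * 30 + [False] * 90
print(perclos(states))          # 25.0
print(perclos(states) >= 80)    # 80% threshold for drowsiness, as stated in the text
```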
3.4 EEG
To conduct this procedure, the scalp is covered in tiny metal discs with delicate wires
called electrodes. These electrodes detect the minuscule electrical charges produced
by the brain cells’ activity, which are amplified and displayed on a computer screen
as a graph or printed on paper as a recording. This is an implementation of a drowsy
driver detection system. It continuously captures frames from a webcam, detects faces
in the frames, and analyzes the person’s eye movements to determine whether they are
drowsy or alert. Wearable sensors along with additional technological devices may detect
signs of drowsiness in the driver. We can assess the driver’s condition by monitoring
their vital signs, which include their heart rate, blood pressure, brain function, and
body temperature. To detect drowsiness and improve performance in this area, three
significant signals are electroencephalography (EEG), electrooculography (EOG), and
electrocardiograms (ECG) (Barua et al. 2019). However, it may not be a viable option
for monitoring driver drowsiness in everyday driving situations due to the inconvenience
of wearing a device with wires on the head, which could easily come loose if the driver
moves their head [4].
A driver’s drowsiness can be analyzed by observing their face and behavior. The sug-
gested approach entails identifying drowsiness by analyzing the position of the mouth
and processing images using a cascade of classifiers based on the Viola-Jones method
for facial detection. The images are then compared with a set of data for yawning [17].
One obstacle in this method is that some people put their hand over the mouth while
yawning, which can make it difficult to get good images. However, yawning is a clear
sign of drowsiness and fatigue, as shown in Fig. 3.
The duration of eye blinks demonstrates that the longer someone shuts their eyes,
the drowsier they are considered to be. Someone who is sleepy will have their pupils
closed for a longer period of time than usual. Additionally, yawning is a symptom of
drowsiness and a normal human response when feeling drowsy or fatigued.
4 Proposed Work
These systems can be implemented in a variety of ways, such as in-vehicle cameras,
sensors, and wearable devices. Detection methods are divided into two parts: 1. Performance
and 2. Condition. There are two distinct categories of methods used to determine the
status of the driver, as shown in Fig. 4:
a. Utilize physiological signals. b. Employ synthetic vision methods.
4.1 Challenges
1. The ability to accurately detect drowsiness in all drivers and all driving conditions.
2. The ability to distinguish drowsiness from other factors that may affect driver
performance, such as distraction, stress, or illness.
3. The ability to provide alerts that are not disruptive or distracting to the driver.
4. The need to address privacy concerns when implementing such systems.
5. The topic of driver drowsiness detection is an active area of research with many
ongoing developments [18, 19].
4. Feed the region of interest to our classifier model for eye state classification.
5. Finally, we calculate the duration of eye closure to determine whether the person is
drowsy or not, as discussed in Fig. 5.
The distinctive landmarks of the eyes are found in every frame of the video. The eye
aspect ratio (EAR), the ratio between the height and the width of the eye, is calculated:
EAR = (‖p2 − p6‖ + ‖p3 − p5‖) / (2 · ‖p1 − p4‖)
Here p1, p2, p3, p4, p5 and p6 are the 2D landmark locations. When an eye is open, the
EAR is typically constant, and when an eye is closed, it approaches zero. The proposed
method exhibits partial insensitivity to variations in head and body posture [20]. The
aspect ratio of an open eye shows minimal variation across individuals and is unaffected
by uniform scaling or in-plane movement of the face. Since blinking (see Fig. 6) occurs
simultaneously in both eyes, the average eye aspect ratio of the two eyes is computed.
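For illustration, a small sketch of the EAR computation for one eye from its six landmark points (for example, dlib points 36–41 for the left eye); this is a minimal example, not the authors' code.

```python
import numpy as np

def eye_aspect_ratio(p1, p2, p3, p4, p5, p6):
    """EAR = (||p2 - p6|| + ||p3 - p5||) / (2 * ||p1 - p4||).

    p1..p6: 2D eye landmark coordinates (e.g., dlib landmarks 36-41 for one eye).
    """
    p1, p2, p3, p4, p5, p6 = map(np.asarray, (p1, p2, p3, p4, p5, p6))
    vertical = np.linalg.norm(p2 - p6) + np.linalg.norm(p3 - p5)   # eye height terms
    horizontal = np.linalg.norm(p1 - p4)                           # eye width
    return vertical / (2.0 * horizontal)

# Example with made-up coordinates of an open eye (EAR ~ 0.67)
print(eye_aspect_ratio((0, 3), (2, 5), (4, 5), (6, 3), (4, 1), (2, 1)))
```

Averaging the EAR of both eyes and thresholding it over consecutive frames gives the eye-closure duration used for the drowsiness decision.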
5 Experimental Results
The results are presented in Table 1 and Figs. 6 and 7, which show the analysis after
applying our method of calculating sleepiness. It is difficult to obtain the data needed
for an accurate system evaluation, because occurrences of hazardous drowsiness are not
guaranteed to happen during regular driving, which makes evaluation difficult (Table 2
and Fig. 8).
Table 2. Recognition thresholds for certain sleepiness characteristics under particular circumstances.
Fig. 8. Percentage of Hit (when driver’s face covered vs. driver’s face not covered)
6 Conclusion
The main purpose was to build a system that detects fatigue in drivers and provides
timely warnings to prevent accidents. By monitoring the driver’s eyes using a camera and
algorithm, this system aims to help prevent serious motorway accidents caused by drowsy
driving. The system uses facial landmarks and the EAR (eye aspect ratio) to obtain good
results. One of the limitations is that the system may have
difficulty handling multiple faces in the view of the lens. This can lead to incorrect or
undesired output and also slows down the speed of detection because the system has to
process multiple faces. To avoid this, the system should be designed to ensure that only
the person’s profile is within the camera’s range.
References
1. Chacon-Murguia, M.I., Prieto-Resendiz, C.: Detecting driver drowsiness: a survey of system
designs and technology. IEEE Consum. Electron. Mag 4(4), 107–119 (2015)
2. Nordbakke, S., Sagberg, F.: Sleepy at the wheel: Knowledge, symptoms and behaviour among
car drivers. Res. Gate 1–10 (2007). ISSN 1369-8478
3. de Naurois, C.J., Bourdin, C., Stratulat, A., Diaz, E., Vercher, J.L.: Detection and prediction
of driver drowsiness using artificial neural network models. Accid. Anal. Prev. 126, 95–104
(2019)
4. Arefnezhad, S., Hamet, J., Eichberger, A., et al.: Driver drowsiness estimation using EEG
signals with a dynamical encoder–decoder modeling framework. Sci. Rep. 12, 2650 (2022)
5. Arefnezhad, S., Samiee, S., Eichberger, A., Nahvi, A.: Driver drowsiness detection based
on steering wheel data applying adaptive neuro-fuzzy feature selection. Sensors 19(4), 943
(2019)
6. Fu, R., Wang, H., Zhao, W.: Dynamic driver fatigue detection using hidden Markov model in
real driving condition. Expert Syst. Appl. 63, 397–411 (2016)
7. Li, G., Chung, W.Y.: Detection of driver drowsiness using wavelet analysis of heart rate
variability and a support vector machine classifier. Sensors 13(12), 16494–16511 (2013)
8. Kristensen, F., Hedberg, H., Jiang, H., Nilsson, P., Öwall, V.: An embedded real-time
surveillance system: implementation and evaluation. J. Sig. Process. Syst. 52, 75–94 (2008)
9. Chand, H.V., Karthikeyan, J.: CNN based driver drowsiness detection system using emotion
analysis. Intell. Autom. Soft Comput. 31(2), 717–728 (2022)
10. Babu, A., Nair, S., Sreekumar, K.: Driver’s drowsiness detection system using Dlib HOG.
In: Karuppusamy, P., Perikos, I., García Márquez, F.P. (eds.) Ubiquitous Intelligent Systems.
SIST, vol. 243, pp. 219–229. Springer, Singapore (2022). https://doi.org/10.1007/978-981-
16-3675-2_16
11. Zhao, Z., Zhou, N., Zhang, L., Yan, H., Xu, Y., Zhang, Z.: Driver fatigue detection based on
convolutional neural networks using EM-CNN. Comput. Intell. Neurosci. 2020, 11 (2020).
Article ID 7251280
12. Caryn, F.H., Rahadianti, L.: Driver drowsiness detection based on drivers’ physical
behaviours: a systematic literature review. Comput. Eng. Appl. J. 10(3), 161–175 (2021)
13. Albadawi, Y., Takruri, M., Awad, M.: A review of recent developments in driver drowsiness
detection systems. Sensors 22(5), 2069 (2022)
14. Choudhary, Y., Aggarwal, A., Agarwal, A.: Detecting drivers’ drowsiness using Haar cas-
cade classifier. In: 2022 9th International Conference on Computing for Sustainable Global
Development (INDIACom), pp. 318–322. IEEE (2022)
15. Pauly, L., Sankar, D.: Detection of drowsiness based on HOG features and SVM classifiers.
In: International Conference on Research in Computational Intelligence and Communication
Networks (ICRCICN), Kolkata, India, pp. 181–186. IEEE (2015)
16. Altameem, A., Kumar, A., Poonia, R.C., Kumar, S., Saudagar, A.K.J.: Early identification and
detection of driver drowsiness by hybrid machine learning. IEEE Access 9, 162805–162819
(2021)
17. Tamanani, R., Muresan, R., Al-Dweik, A.: Estimation of driver vigilance status using real-time
facial expression and deep learning. IEEE Sensors Lett. 5(5), 1–4 (2021)
18. Abbas, Q., Alsheddy, A.: Driver fatigue detection systems using multi-sensors, smartphone,
and cloud-based computing platforms: a comparative analysis. Sensors 21, 56 (2021)
19. Chinara, S.: Automatic classification methods for detecting drowsiness using wavelet packet
transform extracted time-domain features from single-channel EEG signal. J. Neurosci.
Methods 347, 108927 (2021)
20. Dua, M., Singla, R., Raj, S., Jangra, A.: Deep CNN models-based ensemble approach to
driver drowsiness detection. Neural Comput. 33, 3155–3168 (2021)
Facial Expression Recognition: Detection
and Tracking
Abstract. One of the simplest ways to tell one person apart from another is by their
face. A personal identification system such as face recognition may use an individual’s
traits to identify them. Face detection and recognition are the two stages of the human
face recognition process used in biometric facial image recognition. The Eigenface
method and the Fisherface method are the two categories of methods frequently used
to create facial recognition patterns. The Eigenface approach uses Principal Component
Analysis (PCA) to reduce the dimensionality of facial images. The major goal of applying
PCA [1] to face recognition using Eigenfaces [2] is to find the eigenvectors corresponding
to the most significant eigenvalues of the face images. Image processing is used in face
detection systems with face recognition; MATLAB is the software required. Neural
networks are categorized as deep learning. Feature learning, deep learning’s foundational
component, aims to obtain hierarchical information using hierarchical networks in order
to address significant problems that previously required hand-designed features. The
framework used is termed deep learning and may include any number of significant
algorithms.
1 Introduction
The task of classifying the facial expression shown in an image according to several
categories, such as happiness, anger, surprise, fear, and sadness, is known as facial
expression recognition. Detecting human facial expressions can be used in a variety of
contexts, including the protection of sensitive personal data. The primary justification for
identifying someone is security. We can recognize a person using a variety of methods,
including voice recognition, passwords, retina detection, and fingerprint matching. Facial
expression recognition can be used to determine deeper intent. Artificial intelligence also
makes use of facial expressions. From these we can tell whether a person likes or dislikes
the goods being promoted by a firm; the person’s emotion and intention are discernible.
Pattern recognition [3] and classification benefit greatly from machine learning [4] and
its techniques. Features are among the most crucial components of any machine learning
system or study. This study examines the detection and preparation of data for support
vector machine algorithms. Understanding a person’s facial expressions helps us
understand their intentions, state of mind, and emotional state. Non-verbal communication
allows us to decipher various facial expressions of emotion, and the expressive power of
the face increases while conveying information. Facial recognition has drawn a lot of
interest from researchers in pattern recognition, psychology, and computer vision. FER
is useful in a variety of contexts, including augmented reality (AR), virtual reality (VR),
education, entertainment, and human-computer interaction.
In numerous contexts requiring personal information or security, human expression
recognition is used. It is regarded as a follow-up to face detection when a second layer
of security is needed, recognizing not only the face but also the emotion. It is also used
to make sure that the object (human) in front of the camera is not just a two-dimensional
model or figure.
We can see that business promotions benefit from expression detection. The majority
of businesses rely on how customers react to their offerings and products. Based on the
user’s image or video, the artificial intelligence system captures and identifies emotions in
real time, so it can also be decided whether customers like the product or the offers given.
This paper will help us detect the face and its expressions over time on the basis of the
dataset that has been used to match the situation.
2 Background
A. Face detection, face preprocessing, and face recognition are all components of a
comprehensive face recognition system. Thus, it becomes necessary to separate the face
from the background and extract the face region during face detection. This provides the
basis for the subsequent face and feature extraction. The recent rise of deep-learning-based
face detection techniques, compared with conventional strategies, shortens processing
time while effectively improving accuracy.
Various approaches can be used for the recognition of faces. There are mainly two
approaches to the recognition problem: photometric (view-based) and geometric [5]
(feature-based). Because of their keen interest in face recognition, many researchers have
worked on and developed algorithms in this area, three of which are well studied for face
recognition. The algorithms used for facial recognition can mainly be categorized into two
approaches:
Feature based: It depends entirely on the mathematical relationships between the given
facial landmarks, or the spatial configuration of these facial features.
View based or stereo: It is used to determine an object’s shape from multiple images
taken under various lighting conditions. A gradient map, comprised of an array of
surface normals, depicts the recovered object’s shape.
Popular algorithms for this process include:
a) Principal Component Analysis [PCA] [6]
b) Linear Discriminant Analysis [LDA]
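As a minimal illustration of the Eigenface/PCA idea listed above, the following scikit-learn sketch operates on a hypothetical matrix of flattened face images; it is not the MATLAB implementation referred to in the abstract.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical training data: 100 face images of 64x64 pixels, flattened to rows
faces = np.random.rand(100, 64 * 64)

# Keep the 20 principal components (eigenfaces) with the largest eigenvalues
pca = PCA(n_components=20)
weights = pca.fit_transform(faces)               # each face as 20 eigenface weights
eigenfaces = pca.components_.reshape(20, 64, 64)

# A new face is recognized by projecting it and finding the nearest training face
probe = pca.transform(np.random.rand(1, 64 * 64))
nearest = np.argmin(np.linalg.norm(weights - probe, axis=1))
print("closest training face index:", nearest)
```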
Face detection divides image windows into two classes, one of which contains faces.
Although faces share similarities in terms of skin tone, color, age, and facial expression,
detection is a challenging task. Different lighting conditions, geometry, image quality,
partial occlusion, and disguise all add to the complexity of the problem. The ideal face
detector would be able to identify any face in the background under any illumination.
a) FACS
Using the Facial Action Coding System, a number can be assigned to each facial
movement. The term “action unit” (Fig. 1 gives the unit for each) refers to this quantity.
A facial expression is the effect of combining action units [7]. Generally, an action unit
describes minute alterations of the facial muscles. For instance, the action units define a
smiling face as 6 + 12, which refers to the movements of the AU6 and AU12 muscles that
produce a happy face: AU6 raises the cheeks, and AU12 pulls the corners of the lips. The
coding system, which depends on the action units, determines which facial muscles are
mostly used in which expression. Based on them, real-time models can be created.
b) Landmarks
A landmark on the face is exceptionally important. It is used to recognize and detect faces,
and expressions are also marked with landmarks. The 68 facial landmark detector (Fig. 2)
in the Dlib library identifies the landmarks’ locations on the face.
Fig. 2. Facial landmarks on which the detection process is carried out
c) Feature Descriptors
Good features also aid in correctly recognizing an object. For identifying a desired image,
its corners and edges work as identifiers. Many feature detection methods, such as the
Harris corner detector, are available in the OpenCV library, so that corners and edges can
be found in any image.
These feature detectors also provide a great deal more information, such as contours,
hulls, and convexity. Feature detection algorithms try to find the desired key-points,
which are generally edge points or corners. The area surrounding a key-point is described
by the feature descriptor. The description can include anything, including the raw pixel
intensities.
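For illustration, a short OpenCV sketch of the Harris corner detector mentioned above; the image file and parameter values are examples, not part of the original work.

```python
import cv2
import numpy as np

img = cv2.imread("face_sample.jpg")                      # illustrative input image
gray = np.float32(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)) # cornerHarris needs float32

# Harris corner response: block size 2, Sobel aperture 3, k = 0.04 (example values)
response = cv2.cornerHarris(gray, 2, 3, 0.04)

# Mark strong corners (above 1% of the maximum response) in red
img[response > 0.01 * response.max()] = [0, 0, 255]
cv2.imwrite("face_corners.jpg", img)
```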
3 Literature Survey
3.1 Method Used for Detection
Face detection is a technology that locates and measures a human face, typically in a
digital image. Usually, only the desired facial features are picked up; the digital image may
exclude objects other than the human face, such as background details, trees, and buildings.
The goal of face localization is to pinpoint the face’s location. Feature-based and
image-based approaches are the two basic techniques for identifying facial features in an
image. The feature-based technique seeks to match images by extracting features from
the desired image based on shape, color, grayscale edges, and many other characteristics.
Image matching is done using many methods, such as neural networks, linear subspaces,
PCA, and SVM [8].
A tremendous amount of research energy and effort from many leading companies and
top universities is dedicated to this field.
In the series of works by Viola, Robust Real-Time Object Detection [10] is the one that
receives the most citations and made face detection practical. From this study we can
learn about the face detection techniques and algorithms commonly used. Work on fast
rotation-invariant multi-view face detection proposed a more capable framework for
detecting multiple faces, introduced improvements to the cascade structure with good
results, and applied real AdaBoost to object detection for the first time. An effective
combination of face tracking and detection, with both offline and online models, can be
found in the work on tracking in low-frame-rate video using a cascade particle filter with
discriminative observers of different life spans.
We have discussed the issues of face tracking and face detection. According to the
research, we will develop real-time facial detection technologies. Apart from tracking, the
major goal is to find the location and size of every face in the picture or video. It is also
important to figure out how the many faces in the frame relate to one another.
4 Implementation
The image files of the CK+ dataset are organized in several directories and subdirectories
based on the subject and session number. Not every image conveys an emotion. The
format of each file is Portable Network Graphics (.png). Although they share a name
with the image files, the emotion labels are located in a different directory. Using the name
of the emotion file, we created a small Java utility function to select the appropriate
image from the directory and copy it into our final dataset folder. Additionally, we
merged the emotion and image file names, so that when we parse our data, we have the
file’s emotion label [11, 12].
For example, in the filename S138_001_00000786_7, S138 represents the subject number,
followed by the session number, then the number of the image within that session, and
finally the last digit represents the emotion the subject is posing.
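A small sketch of how such a merged filename could be parsed is shown below, in Python rather than the Java utility mentioned above; the emotion index-to-label mapping follows the usual CK+ convention and is an assumption here.

```python
# CK+ emotion codes (standard convention; treated here as an assumption)
EMOTIONS = {1: "anger", 2: "contempt", 3: "disgust", 4: "fear",
            5: "happy", 6: "sadness", 7: "surprise"}

def parse_ck_filename(name):
    """Split a merged CK+ filename such as 'S138_001_00000786_7'."""
    subject, session, image_no, emotion_code = name.split("_")
    return {"subject": subject,
            "session": session,
            "image": image_no,
            "emotion": EMOTIONS[int(emotion_code)]}

print(parse_ck_filename("S138_001_00000786_7"))
# {'subject': 'S138', 'session': '001', 'image': '00000786', 'emotion': 'surprise'}
```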
Only frontal face images were used to create the dataset. For a few images, the lighting
and illumination were different, and some images were in color. Regardless of the lighting
conditions, all of the pictures went through the same processing pipeline. The name of the
emotion is then extracted from the .jpg file name in the RaFD database. As this database
was considered a standard, we had a fairly balanced count of classes for each required
emotion. The distribution of the emotion classes is shown in Table 1 (Figs. 4 and 5).
Type of Emotion   Number of images that depicted the emotion
Surprise          83
Anger             45
Happy             69
Fear              25
Contempt          18
Sadness           28
Disgust           59
the other one is the detected face image. In the end, we decided to use a face detector based
on histograms of oriented gradients (HoG) from the Dlib library. HoG descriptors together
with an SVM [14] are used in such a way that identifying the face in any image becomes
easy. Pictures are converted to grayscale and resized (Fig. 6).
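For illustration, a minimal sketch of the dlib HoG-based face detector together with the grayscale conversion and resizing mentioned above; the file name and target size are illustrative assumptions.

```python
import cv2
import dlib

detector = dlib.get_frontal_face_detector()     # HoG + SVM frontal face detector

img = cv2.imread("subject_face.png")             # illustrative input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gray = cv2.resize(gray, (350, 350))              # example target size

# The second argument upsamples the image once so smaller faces can be found
for rect in detector(gray, 1):
    print("face at", rect.left(), rect.top(), rect.right(), rect.bottom())
```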
5 Conclusion
Due to the limited number of Eigenfaces [15] used in the PCA transform, the system
works with manually assisted face detection; automatic face recognition or artificial
intelligence based recognition [16, 17] did not achieve an accuracy of more than 90%.
This was because the face recognition subsystem did not show any degree of invariance
to scale, segmentation, shift errors, or rotation of the face images. This was identified as
one of the system requirements. Adding an eye-based identification method would be a
minor extension to the implemented framework and would not require much additional
investigation. The deformable template and Principal Component Analysis strategies are
reflected very well in the results of all the implemented systems and generally produce
commendable results. There are improved methods, for example iris or retina recognition,
as well as face recognition in the thermal range, for clients or access-control applications,
since these require an extremely high level of accuracy. We can experiment with the facial
action coding system or feature descriptors as features, or a combination of the two.
Applications such as drowsiness detection among drivers can be created using feature
selection and cascading various algorithms together. Algorithms such as logistic
regression, linear discriminant analysis, and random forest classifiers can be adapted to
achieve good accuracy and results. Additionally, the correctness of the model can be
assessed using metrics such as cross-validation scores, recall, and F1 scores [18], and
these metric results can be used to improve the model.
References
1. Verma, P., Bhardwaj, T., Bhatia, A., Mursleen, M.: Sentiment analysis “using SVM, KNN
and SVM with PCA”. In: Bhardwaj, T., Upadhyay, H., Sharma, T.K., Fernandes, S.L. (eds)
Artificial Intelligence in Cyber Security: Theories and Applications. Intelligent Systems Ref-
erence Library, vol 240, pp. 35–53. Springer, Cham (2023). https://doi.org/10.1007/978-3-
031-28581-3_5
2. Ebine, H., Shiga, Y., Ikeda, M., Nakamura, O.: The recognition of facial expressions with
automatic detection of the reference face. In: 2000 Canadian Conference on Electrical and
Computer Engineering. Conference Proceedings. Navigating to a New Era, vol.2, pp. 1091–
1099. Halifax, NS (2000). (Cat. No.00TH8492)
3. Bhatia, A., et al.: Analysis of pattern recognition (text mining) with web crawler. Int. Trans.
Appl. Sci. 3(3), 435–450 (2011)
4. Kumar, M., Ali Khan, S., Bhatia, A., Sharma, V., Jain, P.: Machine learning algorithms:
a conceptual review. In: 2023 1st International Conference on Intelligent Computing and
Research Trends (ICRT), pp. 1–7. Roorkee, India (2023). https://doi.org/10.1109/ICRT57042.
2023.10146678
5. Loconsole, C., Miranda, C.R., Augusto, G., Frisoli, A., Orvalho, V.: Real-time emotion
recognition novel method for geometrical facial features extraction. In: 2014 International
Conference on Computer Vision Theory and Applications (VISAPP), Lisbon, Portugal (2014)
6. Neerja, Walia, E.: Face recognition using improved fast PCA algorithm. In: 2008 Congress
on Image and Signal Processing, Sanya, Hainan (2008)
7. Jiang, X.: A facial expression recognition model based on HMM. In: Proceedings of
2011 International Conference on Electronic & Mechanical Engineering and Information
Technology, Harbin, Heilongjiang, China (2011)
8. Rajesh, K.M., Naveenkumar, M.: A robust method for face recognition and face emotion
detection system using support vector machines. In: 2016 International Conference on Elec-
trical, Electronics, Communication, Computer and Optimization Techniques (ICEECCOT),
Mysuru (2016)
9. Swinkels, W., Claesen, L., Xiao, F., Shen, H.: SVM point-based real-time emotion detection.
In: 2017 IEEE Conference on Dependable and Secure Computing, Taipei (2017)
10. Saragih, J.M., Lucey, S., Cohn, J.F.: Real-time avatar animation from a single image. Face
and Gesture 2011, Santa Barbara, CA, USA (2011)
11. Dahmane, M., Meunier, J.: Emotion recognition using dynamic grid-based HoG features.
Face and Gesture 2011, Santa Barbara, CA (2011)
12. Kazemi, V., Sullivan, J.: One millisecond face alignment with an ensemble of regression
trees. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus,
OH (2014)
13. Bhatia, A., Jain, P., Verma, P., Gupta, G., Arya, D.: Data mining: a process of extracting
patterns. Webology (2019). ISSN 1735-188X
14. Kaya, G.T.: A hybrid model for classification of remote sensing images with linear SVM and
support vector selection and adaptation. IEEE J. Sel. Top. Appl. Earth Observations Remote
Sens. 6(4), 1988–1997 (2013)
15. Le Ngo, A.C., Oh, Y.H., Phan, R.C.W., See, J.: Eulerian emotion magnification for subtle
expression recognition. In: 2016 IEEE International Conference on Acoustics, Speech and
Signal Processing (ICASSP), Shanghai (2016)
16. Kumar, M., Ali Khan, S., Bhatia, A., Sharma, V., Jain, P.: A conceptual introduction of
machine learning algorithms. In: 2023 1st International Conference on Intelligent Computing
and Research Trends (ICRT), pp. 1–7. Roorkee, India (2023). https://doi.org/10.1109/ICR
T57042.2023.10146676
17. Kumar, A., Bhatia, A., Kashyap, A., Kumar, M.: LSTM network: a deep learning approach
and applications. In Advanced Applications of NLP and Deep Learning in Social Media Data,
pp. 130–150. IGI Global (2023)
18. Bhatia, A., Kumar, A., Jain, A., Kumar, A., Verma, C., Illes, Z.: Networked control system
with MANET communication and AODV routing. Heliyon (2022)
Analysis and Implementation of Driver
Drowsiness, Distraction, and Detection System
1 Introduction
Driver drowsiness and distraction are major problems in the daily transportation system.
They can be analysed using many tools and techniques. This type of system may help to
protect the lives of many people. In addition to detecting sleepiness, this technique can be
used to detect facial expression, an open or closed mouth, and head movement. The system
can also be used in factories to alert workers, as well as for railway drivers and in airlines.
In this work, a simple system has been analysed and implemented to protect people’s
lives.
In this system, input images are captured by a camera. We use AI libraries that include
dlib and NumPy. The model is trained to spot 68 facial landmarks on the face, which
allows us to detect drowsiness and alert the driver to stay active while driving.
There are various other systems used to detect drowsiness, such as heart rate, pulse
rate, EEG, and ECG. However, these systems require a direct physical connection. In our
project, we employ an image processing system to monitor the driver’s condition, including
aspects like yawning, eye blinks, and facial expressions, using a camera. Additionally,
our project incorporates an Arduino, Python, an AI library, an alarm, and an LCD display
to alert the driver.
Drowsiness and distraction are two underlying causes of driving accidents that add to
the number of road fatalities each year. Many different techniques have been developed
over the years to identify drowsiness, but image processing techniques are more precise,
secure, and time-efficient than other approaches. The primary objective of this project
is to detect driver’s drowsiness and to issue warnings to prevent accidents.
The detection of drowsiness levels in drivers plays a crucial role in decreasing the
occurrence of accidents. Recent studies and reports indicate that vehicle accidents result
in the death or injury of 20 to 50 million people worldwide. According to the NSF of
the United States, 54 percent of drivers have operated vehicles while feeling drowsy,
and 28 percent have fallen asleep while driving [1].
Having reviewed these reports and statistics, it becomes evident that assessing
driver’s drowsiness levels is of utmost importance in diminishing the frequency of road
accidents. To identify drowsiness levels, the system examines parameters associated with
drowsiness, including the eyes and their alterations. Visual signs of sleepiness can be
identified by capturing the driver’s image and employing image processing. PERCLOS
(Percentage of Eye Closure) is utilized to assess drowsiness. An IR illuminator is employed
to measure PERCLOS, the duration of each eye-closure cycle, and the frequency of
eye blinks [2].
The Driver Drowsiness Detection System takes images as input, analyzes them
through eye blinking, and detects the driver’s state, such as sleepy, drowsy, or active. It
alerts the driver through an alarm/buzzer and displays messages on an LCD as output,
such as “Please wake up” for a sleepy state and “All Ok” for an Active state.
Numerous existing drowsiness detection systems rely on physiological signals and facial
characteristics. While physiological signals such as EEG (electro-encephalography),
EOG (electro-oculography), ECG (electro-cardiography), and body temperature are
effective, they can potentially harm human health. Hence, there is a requirement to
create a drowsiness detection system that is both efficient and safe for human use.
Some existing systems are described in detail below:
Heart rate variability (HRV) analysis was proposed as a method by Hayawi and Waleed to detect sleepiness. Song et al.
introduced a system for detecting driver fatigue which involves monitoring the activity of
eye muscles using EMG sensors. Similarly, based on the observation of eyelid closure
and muscle movement via EMG sensor input signals, Artanto et al. and Ma et al.
developed a system that utilizes the ESP8266 to transmit drowsiness data internally [3].
Baheti et al. [6] utilized the same dataset and enhanced the VGG16 network, achieving
a classification accuracy of 96.31% when their method was applied in real-time scenarios.
Kose et al. [7] enhanced the classification accuracy to 99.10% for the 10 classes in
real-time processing. Furthermore, by integrating RGB and optical-flow data, their system
outperformed other systems on the AUC Distracted Driver and Brain4Cars datasets.
Chawan et al. [8] introduced a method for detecting distracted driving using three
CNN models: VGG-16, VGG-19, and InceptionV3. This method resulted in a log loss of 0.79.
Majdi et al. [9] introduced DriveNet, a supervised learning method for detecting
distracted driving. This approach attained an accuracy of 94% by combining
CNN and random forest techniques to classify representative instances of distracted
driving.
Moslemi et al. [10] employed a 3D CNN along with optical flow to incorporate
temporal information and enhance the detection of distracted driving, resulting in an
accuracy of 95%.
Anber et al. [11] introduced a non-invasive algorithm for driver fatigue detection,
relying on head position and mouth movement features. This method incorporated two
pre-trained AlexNet CNN-based methods, achieving a respectable level of detection
accuracy. However, it required testing on a dataset reflecting real driving conditions for
practical application.
From the above literature review, it has been found that there is a gap in eye detection
accuracy and drowsiness detection accuracy. This gap has been identified and minimized
using the proposed technique.
A comparative analysis of the literature survey has been presented in Table 1 and
Table 2 with their parameters, algorithm, and accuracy (Fig. 1).
The architecture of this system comprises an Arduino UNO, an LED, an LCD, and a buzzer/alarm. The LED
and the LCD and buzzer circuits are connected to the Arduino. Python code is used to control
the Arduino. OpenCV and a camera are used to capture and process the image. The camera captures
the image, and if sleep is detected, the alarm buzzes and an alert message appears on the
LCD, as shown in Fig. 1 (Fig. 2).
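As a rough illustration of this alerting path, the host-side Python script could signal the Arduino over a serial link; the port name, baud rate, and message format below are assumptions for the sketch, not details taken from the paper.

```python
import serial  # pyserial; assumes the Arduino UNO is attached over USB serial

# Hypothetical port and baud rate; adjust to the actual wiring and Arduino sketch.
arduino = serial.Serial("/dev/ttyUSB0", 9600, timeout=1)

def send_alert(state: str) -> None:
    """Forward the detected driver state to the Arduino, which drives the buzzer, LED and LCD."""
    if state == "sleeping":
        arduino.write(b"ALERT:Please wake up\n")   # buzzer on + LCD safety message
    elif state == "drowsy":
        arduino.write(b"WARN:Stay alert\n")        # LED warning
    else:
        arduino.write(b"OK:All Ok\n")              # normal LCD display
```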
In the initial step, START, the software program begins its execution. The software then
imports the necessary Python libraries and modules for image processing and drowsiness
detection; these libraries are NumPy, cv2, dlib, etc. The next step sets up the initial
configurations and variables for the drowsiness detection system [12]. After that, the system
reads images or frames from a source such as a camera or video stream within a loop. It then
initializes a facial landmark detector, which identifies specific points on the face, including
the eye corners and the mouth, allowing us to calculate the Eye Aspect Ratio (EAR). The next
step is the computation of the EAR, which involves measuring the ratio of distances between
certain landmarks. If the computed EAR value is greater than a threshold (0.21), it suggests
that the driver's eyes are relatively open and they are not exhibiting drowsiness (Fig. 3).
In this system, we utilized the Dlib library's 68-facial-landmark detector and face
detector. The 68-point detector identifies the human face and the associated points, allowing
us to subsequently assess whether the eyes are open or closed. Leveraging Dlib for this
task is advantageous as it yields improved accuracy by providing 68 landmark points
[13].
Here we work on 3 stages:
1) Active
2) Drowsy
3) Sleepy
We import the libraries dlib, NumPy, and OpenCV.
We used a built-in function of dlib which acts as a frontal face detector.
The Dlib shape detector places 68 landmarks on the face: 6 on each eye, 9 on the nose,
5 on each eyebrow, 20 on the lips, and 17 on the outline of the face.
We use a compute function, which measures the Euclidean distance between two points A
and B, and a blink function that classifies the eye state [14].
3 Algorithm
In this system, we used the Eye Aspect Ratio (EAR) algorithm.
In the facial landmarks, each eye is described by 6 points: the 2 points at the eye corners
define one long (horizontal) distance, and the remaining 4 points form two short (vertical)
distances between the upper and lower eyelid.

EAR = (sum of the two vertical distances) / (2 × horizontal distance)

The above ratio is called the EAR (Eye Aspect Ratio).
If EAR > 0.25, return 2.
If 0.21 < EAR ≤ 0.25, return 1.
If EAR ≤ 0.21, return 0.
The returned values 0, 1, and 2 describe the state of an eye (closed, partially closed, and open, respectively).
If left_blink == 0 and right_blink == 0, the driver is sleeping.
If left_blink or right_blink is 1, the driver is drowsy; otherwise, the driver is active.
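A minimal sketch of this EAR-based scoring is given below, assuming dlib's standard 68-point shape predictor file has been downloaded separately; landmark indices 36–41 and 42–47 are dlib's conventional left- and right-eye points, and the 0.21/0.25 thresholds are the ones stated above.

```python
import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
# Assumes the standard 68-landmark model file is available locally.
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def compute(a, b):
    """Euclidean distance between two landmark points."""
    return np.linalg.norm(np.array(a) - np.array(b))

def blinked(p1, p2, p3, p4, p5, p6):
    """Score one eye: 2 = open, 1 = drowsy, 0 = closed, from its six landmarks."""
    ear = (compute(p2, p6) + compute(p3, p5)) / (2.0 * compute(p1, p4))
    if ear > 0.25:
        return 2
    if ear > 0.21:
        return 1
    return 0

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for face in detector(gray):
        lm = predictor(gray, face)
        pts = [(lm.part(i).x, lm.part(i).y) for i in range(68)]
        left_blink = blinked(*(pts[i] for i in (36, 37, 38, 39, 40, 41)))
        right_blink = blinked(*(pts[i] for i in (42, 43, 44, 45, 46, 47)))
        # left_blink == 0 and right_blink == 0 -> sleeping; any 1 -> drowsy; else active.
    if cv2.waitKey(1) == 27:  # Esc quits the loop
        break
cap.release()
```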
4 Result
Software used: Python and AI libraries.
Hardware: alarm/buzzer, Arduino UNO, LED, LCD display, and camera.
68 landmarks are placed on the driver's face. The software was tested, and it determines
the condition of the driver, i.e., whether the driver is in a sleepy, drowsy, or active state.
When the driver is in a sleepy or drowsy state, the system will activate a buzzer,
illuminate an LED, and display a safety message on the LCD screen.
We made use of the laptop's webcam and the buzzer to produce an alert sound
for waking up the driver when drowsiness is detected.
Each volunteer participating in the test was asked to squint their eyes multiple
times and simulate drowsiness on several occasions during the testing process [15, 16].
Eye Detection Accuracy = (total number of times eyes were detected) / ((total number of times eyes were detected) + (total number of times eyes were missed/not detected))

Drowsiness Detection Accuracy = (total number of times the alarm sounded) / ((total number of times the alarm sounded) + (total number of times the alarm did not sound))
A comparison of the proposed and existing works is shown in Table 3. It shows
that the proposed work is more accurate than the other existing works. From Table 3,
it can be concluded that the eye detection accuracy and drowsiness detection accuracy are better
than those of all other references considered in this work.
5 Conclusion
Although such eye-detection-based driver security and safety systems are typically found
only in high-end vehicles, this system can also be implemented in regular cars. The system
operates effectively even when the driver is wearing glasses and in low-light conditions.
Ultimately, the proposed system offers superior protection and accuracy compared to other
existing systems.
In addition to detecting sleepiness, this technique can be used to detect facial
expressions, an open or closed mouth, and head movement. This system can also be used in
factories to alert workers, as well as for railway drivers and in airlines.
References
1. Pauly, L., Sankar, D.: Detection of drowsiness based on HOG features and SVM classi-
fiers. In: IEEE International Conference on Research in Computational Intelligence and
Communication Networks (ICRCICN), pp. 8–14 (2015)
2. Ihaddadene, N.: Drowsy driver detection system using eye blink patterns. In: International
Conference on Machine and Web Intelligence (ICMWI), pp. 20–25. IEEE (2010)
3. Zhang, W., Cheng, B., Lin, Y.: Driver drowsiness recognition based on computer vision
technology. In: IEEE, pp. 67–70 (2012)
4. Sabet, M., Zoroofi, R.A., Sadeghniiat-Haghighi, K., Sabbaghian, M.: A new system for driver
drowsiness and distraction detection. In: 11th IEEE Joint Conference on Information Science,
pp. 120–125 (2012)
5. Sandberg, D.: Analysis and optimization of systems for detecting sleepiness in drivers.
Chalmers University, Göteborg, Sweden, pp. 80–87 (2018)
6. Malla, A., Davidson, P., Bones, P., Green, R., Jones, R.: Automated video-based measurement
of eye closure for detecting behavioral microsleep. In: 32nd Annual International Conference
of the IEEE, Buenos Aires, Argentina, pp. 10–14 (2010)
7. Viola, P., Jones, M.: Rapid object detection using a boosted cascade of simple features. In:
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern
Recognition, pp. 55–60 (2001)
8. Hong, T., Qin, H., Sun, Q.: An improved real-time eye state identification system in driver
drowsiness detection. In: Proceedings of the IEEE International Conference on Control and
Automation, Guangzhou, China, pp. 1–5 (2007)
9. Nixon, M.S., Aguado, A.S.: Feature Extraction and Image Processing. 2nd ed., Jordan Hill,
Oxford OX2 8DP, UK (2008)
10. Bhowmick, B., Kumar, C.: Detection and classification of eye state in IR camera for driver
drowsiness identification. In: Proceeding of the IEEE International Conference on Signal and
Image Processing Applications, pp. 40–44 (2009)
11. Rezaee, K., et al.: Real-time intelligent alarm system of driver fatigue based on video
sequences. In: Robotics and Mechatronics (ICRoM), First RSI/ISM IEEE International
Conference on. pp. 80–83 (2013)
12. Du, Y., et al.: Driver fatigue detection based on eye state analysis. In: Proceedings of the 11th
Joint Conference on Information Sciences, pp.1–6 (2008)
13. Choi, I.H., Hong, S.K., Kim, Y.G.: Real-time categorization of driver's gaze zone using the
deep learning techniques. In: 2016 IEEE International Conference on Big Data and Smart
Computing (BigComp), pp. 1–25 (2016)
14. Bhop, R.A.: Computer vision based drowsiness detection for motorized vehicles with web
push notifications. In: ICICCS, pp. 17–19 (2022)
15. Ananthi, S., Sathya, R., Vaidehi, K., Vijaya, G.: Driver drowsiness detection using image
processing and I-ear techniques. In: IEEE ICICCS, pp. 45–50 (2023)
16. Hossain, M.L., Hasan, M.S., Safwan Ahmed, K.M.A: Developing an image processing based
real-time driver drowsiness detection system. In: IJSRP, vol. 13, no. 4, pp. 1–7 (2023)
Object Detection and Depth Estimation
Using Deep Learning
1 Introduction
Object detection and image recognition technology have become indispensable
components of modern autonomous systems, revolutionizing the automotive,
drone, and robotics industries. In self-driving cars, detection of an object in real-
time is crucial for identifying pedestrians, cyclists, vehicles, and other obstacles
on the road, enabling the vehicle to make informed decisions and navigate safely.
… requirements of real-time object detection [6]. In recent years, YOLO [7] has
gained popularity and has been continuously improved [8]; YOLO v4 is used in the current
project.
Camera: In the execution of the project, a camera serves as the primary instru-
ment for capturing intricate 2D images of the targeted scene. Specifically, a
high-resolution 64-megapixel mobile camera is employed. The camera holds a
dual purpose in the project’s workflow: firstly, it facilitates the creation of a
curated dataset crucial for model training, ensuring the development of an accu-
rate and robust system. Subsequently, during real-time testing and deployment,
the camera is employed to procure live feeds of the scene, which are promptly
relayed to the computer for instantaneous processing and analysis.
iVCam Software: The project harnesses the capabilities of the iVCam soft-
ware, a crucial tool that bridges the gap between the camera and the display
As part of the dataset generation, images were captured within a span of 3 m to 15 m,
at consistent intervals of 1 m. Once the dataset generation process is complete, the next
step involves running YOLO to generate bounding boxes for the images. A list of
object classes (like “person,” “car,” etc.) is loaded; these are the types of objects
the model can recognize, as shown in Fig. 2 (Fig. 3).
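As a hedged sketch of this detection step, OpenCV's DNN module is one way to run YOLO v4 and obtain the boxes from which the diagonal length is measured; the file names below are placeholders, since the exact configuration, weights, and class-list files used in the project are not specified here.

```python
import cv2
import numpy as np

# Assumed file names; the YOLO v4 config, weights, and class list are obtained separately.
net = cv2.dnn.readNetFromDarknet("yolov4.cfg", "yolov4.weights")
model = cv2.dnn_DetectionModel(net)
model.setInputParams(size=(416, 416), scale=1 / 255.0, swapRB=True)

with open("coco.names") as f:
    classes = [line.strip() for line in f]

img = cv2.imread("car_05m.jpg")  # hypothetical dataset image
class_ids, scores, boxes = model.detect(img, confThreshold=0.5, nmsThreshold=0.4)

for cid, score, (x, y, w, h) in zip(class_ids, scores, boxes):
    diagonal = np.hypot(w, h)  # bounding-box diagonal length in pixels
    print(classes[int(cid)], round(float(diagonal), 1))
```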
The diagonal length of the bounding box (in pixels) is noted down. This
systematic procedure is applied to each image in the dataset, resulting in the
compilation of Table 1, which shows the calculated diagonal pixel lengths for four
objects (car, motorbike, person, and bus) at equal intervals of 1 m. The comprehensive
dataset contains 15 data samples for each object, as shown in Table 1, and is used to
obtain the linear and polynomial equations for estimating distances.
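A small sketch of how such distance-estimation equations could be fitted, shown here for the car column of Table 1; the polynomial degree is an assumption, since the degree actually used is not stated.

```python
import numpy as np

# Car column from Table 1: distance (m) vs. bounding-box diagonal (pixels).
distance = np.arange(3, 16)                       # 3 m ... 15 m
pixels = np.array([336, 274, 216, 182, 152, 136,
                   127, 110, 102, 93, 85, 81, 74])

# Fit distance as a function of pixel length: degree 1 = linear model,
# degree 3 assumed for the polynomial model.
linear = np.poly1d(np.polyfit(pixels, distance, deg=1))
poly = np.poly1d(np.polyfit(pixels, distance, deg=3))

for px, d in zip(pixels, distance):
    est = poly(px)
    err_pct = abs(est - d) / d * 100
    print(f"actual {d:2d} m  estimated {est:5.2f} m  error {err_pct:4.2f}%")
```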
Distance (in m) Car (in pixels) Motorbike (in pixels) Person(in pixels) Bus(in pixels)
3 336 189 260 590
4 274 165 210 403
5 216 126 160 310
6 182 98 140 267
7 152 86 128 230
8 136 76 110 200
9 127 67 100 175
10 110 60 86 165
11 102 55 75 153
12 93 48 71 145
13 85 46 65 137
14 81 43 60 130
15 74 40 52 125
From the data tabulated in Table 1, it has been observed that as the distance
between the object and the camera increases, the pixel values decrease. The comparison
between the actual distance and the estimated distance using the linear regression model
for the car is tabulated in Table 2. Similar observations are made for all four objects.
The results from Table 2 indicate a highest recorded error of 72%, with an
average error rate of 18.98% and an achieved accuracy of 81%. The comparison
between the actual distance and the estimated distance for the car using the polynomial
regression model is tabulated in Table 3.
Actual distance (in m) Estimated distance (in m) Error (in m) Error (in %)
3 2.93 −0.07 2.33
4 4.14 0.14 3.50
5 5.00 0 0
6 6.01 0.01 0.18
7 6.77 −0.23 3.28
8 7.89 −0.11 1.37
9 9.28 0.28 3.11
10 9.90 −0.1 1.00
11 11.08 0.08 0.80
12 12.01 0.01 0.11
13 12.86 −0.13 1.00
14 14.21 0.21 1.50
15 14.89 −0.10 0.7
The results from Table 3 indicate a highest recorded error of 3.5%, with an
average error rate of 1.47% and an achieved accuracy of 98.53%.
References
1. Nagarajan, A., Gopinath, M.P.: Hybrid optimization-enabled deep learning for
indoor object detection and distance estimation to assist visually impaired persons.
Adv. Eng. Softw. 176, 103362 (2023). ISSN 0965–9978
2. Kumar, G.A., Lee, J.H., Hwang, J., Park, J., Youn, S.H., Kwon, S.: Lidar and cam-
era fusion approach for object distance estimation in self-driving vehicles. Symme-
try 12(2), 324 (2020)
3. Usmankhujaev, S., Baydadaev, S., Kwon, J.W.: Accurate 3D to 2D Object Distance
Estimation from the Mapped Point Cloud Data. Sensors 23, 2103 (2023)
4. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object
detection with region proposal networks. In: Advances in Neural Information Pro-
cessing Systems, vol. 28 (2015)
5. Girshick, R.: Fast R-CNN. In: Proceedings of the IEEE International Conference
on Computer Vision, pp. 1440–1448 (2015)
6. Liu, W., et al.: SSD: single shot multibox detector. In: Leibe, B., Matas, J., Sebe,
N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham
(2016). https://doi.org/10.1007/978-3-319-46448-0_2
7. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified,
real-time object detection. In: Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition, pp. 779–788 (2016)
8. Redmon, J., Farhadi, A.: YOLO9000: better, faster, stronger. In: Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition, pp. 7263–7271
(2017)
9. Nugraha, B.T., Su, S.-F.: Towards self-driving car using convolutional neural net-
work and road lane detector. In: 2nd International Conference on Automation, Cog-
nitive Science, Optics, Micro Electro-mechanical System, and Information Tech-
nology (ICACOMIT), pp. 65–69. IEEE (2017)
10. Cai, Y., Luan, T., Gao, H., et al.: Yolov4-5d: an effective and efficient object
detector for autonomous driving. IEEE Trans. Instrum. Meas. 70, 1–13 (2021)
11. Zaheer, A., Rashid, M., Riaz, M.A., Khan, S.: Single-view reconstruction using
orthogonal line-pairs. Comput. Vis. Image Underst. 172, 107–123 (2018)
12. Lee, D.C., Hebert, M., Kanade, T.: Geometric reasoning for single image structure
recovery. In: IEEE Conference on Computer Vision and Pattern Recognition, pp.
2136–2143. IEEE (2009)
13. Barinova, O., Konushin, V., Yakubenko, A., Lee, K.C., Lim, H., Konushin, A.:
Fast automatic single-view 3-d reconstruction of urban scenes. In: Forsyth, D.,
Torr, P., Zisserman, A. (eds.) ECCV 2008. LNCS, vol. 5303, pp. 100–113. Springer,
Heidelberg (2008). https://doi.org/10.1007/978-3-540-88688-4_8
14. Criminisi, A., Reid, I., Zisserman, A.: Single view metrology. Int. J. Comput. Vision
40, 123–148 (2000)
Optimizing Biomass Forecasting and Supply
Chain: An Integrated Modelling Approach
1 Introduction
The growing population and rapid improvements in technology have resulted in an
increased need for energy. The escalating energy requirements and the consequential
rise in greenhouse gas emissions have necessitated a heightened use of renewable and
environmentally friendly energy sources. In order to address the growing need for sus-
tainability, it is imperative to replace fossil fuels with alternative resources and technol-
ogy. Biofuels have the potential to significantly impact the current situation. However,
they have not been implemented on a large scale. Firstly, it should be noted that the imple-
mentation of alternative energy sources, such as Biofuels, entails significant costs and
necessitates the establishment of ambitious government objectives in order to achieve
parity with conventional fuel sources like diesel, petrol, and natural gas. Despite this,
there are various issues that still exist. Currently, it is estimated that a quantity of 100
metric tons of crude oil has the capacity to provide around 75 metric tons of gasoline and
diesel fuel. In order to produce an equivalent quantity of 75 tons of biofuel, a biorefinery
requires a minimum biomass input of 375 tons. In contrast to crude oil, it is not feasible
to obtain 375 tons of biomass from a singular site [1]. Also, to set up a Biorefinery
in a region a concrete understanding of the region’s temporal features such as biomass
production and spatial distribution is required. Hence, the entire supply chain must be
designed such that it incurs less costs and reduces the overall carbon footprint.
Figure 1 shows an overview of the Biomass supply chain considered in this paper.
Harvesting sites are the starting point of the chain. Biomass produced at each site is then
sent to Depots for densification and pelletization. Pellets from these Depots are then
sent to Biorefineries for final processing. The solution which we propose in this paper
is very robust and scalable in nature. The methodology is heavily inspired by the divide
and conquer paradigm, making it scalable to larger problems. It is designed considering
generality in implementation, making it applicable to regions other than Gujarat with
negligible changes in the formulation and structure of the solution. The solution can be broadly
divided into two parts: Time Series Forecasting and Supply Chain Optimization.
2 Literature Review
This paper utilizes various statistical, mathematical, and ML techniques to solve the
Biomass supply chain problem. The algorithms and techniques studied and tested were
Time series forecasting, Clustering, AutoML, and Linear programming. Following is a
brief overview of these techniques and algorithms.
2.1 Clustering
Clustering refers to the process of separating data into a collection of clusters. Clusters
are essentially collections of similar objects, each separated from other, dissimilar
collections [2]. There are several algorithms for clustering, which are divided into distinct
types. The types of algorithms of interest here are hierarchical and partitioning algorithms.
Partitioning algorithms are popular for recognizing patterns in data [3]. K-means is one of
the partitioning algorithms. Time series clustering is
an interesting application of clustering algorithms to find clusters in time series data.
Clustering of time-series data is mostly utilized for discovery of interesting patterns in
time-series datasets. Time series clustering is a useful method to find out patterns within
data which make it easier to model & extract meaningful conclusions from it [4].
3 Dataset
The data used in this paper is obtained from Shell.ai’s Agricultural Waste Challenge
2023 [1]. The dataset comprises two tabular files: one containing historical information
on annual biomass production, and the other detailing the distances between locations in
Gujarat. The state of Gujarat is divided into 2418 distinct locations each specified by a
pair of latitude and longitude. As shown in Fig. 2 each dot represents a pair of latitude &
longitude.
Every location has its own distinct annual biomass production. Table 1 represents
the biomass history, which contains the biomass production from the years 2010 to 2017
for all 2418 locations, resulting in a distinct time series with 8 lags for each
location.
Table 2 represents a standard distance matrix for all 2418 locations. It should be noted
that the matrix is not symmetric, because U-turns, one-way roads, etc. may result
in different distances for the 'to' and 'from' journeys between a source and a destination.
4 Methodology
There are two parts to this problem: forecasting and optimization. To build a robust
supply chain for the future, temporal changes need to be considered; this constitutes the
first part of the solution. The second part is the optimal placement of refineries and depots,
i.e., supply chain optimization. The following discussion covers the proposed solution
for the problem.
The time series were first clustered so as to group together those time series which
exhibited similar patterns. The clustering utilized K-means with Euclidean distance as the
metric for cluster assignment and barycenter computation [22]. This process aimed to
mitigate underfitting by reducing the diversity of dynamics within a cluster. We opted for
8 clusters, striking a balance between cluster similarity, the number of clusters, and cluster
size considerations directly impacting training time. Statistical parameters such as the mean,
the standard deviation, and the differences between 2015–2016 and 2016–2017 were computed and
treated as features during clustering to improve cluster quality. Figure 3 illustrates the
average biomass production per year (2010–2017) for all samples within each cluster.
Fig. 3. Average biomass production per year (2010–2017) for each cluster.
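The clustering step described above could be sketched as follows with tslearn's TimeSeriesKMeans; this is only one possible implementation, and the file name and exact feature construction are assumptions.

```python
import numpy as np
from tslearn.clustering import TimeSeriesKMeans

# biomass: hypothetical array of shape (2418, 8) holding the 2010-2017 series per location.
biomass = np.load("biomass_history.npy")

# Statistical features (mean, std, recent year-on-year differences) appended to each
# series before clustering, as described above.
features = np.hstack([
    biomass,
    biomass.mean(axis=1, keepdims=True),
    biomass.std(axis=1, keepdims=True),
    (biomass[:, 6] - biomass[:, 5])[:, None],   # 2016 - 2015 difference
    (biomass[:, 7] - biomass[:, 6])[:, None],   # 2017 - 2016 difference
])

# Euclidean K-means: cluster assignment and barycenters both use the Euclidean metric,
# with 8 clusters as described above.
km = TimeSeriesKMeans(n_clusters=8, metric="euclidean", max_iter=50, random_state=0)
labels = km.fit_predict(features)    # cluster label for each of the 2418 locations
```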
Following the clustering of time series into eight distinct clusters, each cluster under-
went training and hyperparameter optimization using AG-TS to predict biomass produc-
tion for the upcoming year (2018). Only temporal features were considered for training,
with features related to location (latitude and longitude) excluded due to their observed
lack of significant impact on predictions, as indicated by our analysis. AutoGluon provides
a rich set of predefined presets, i.e., high-level configurations for fitting a model. In
this paper, the models were trained using the “best_quality” preset with Mean Absolute Scaled
Error (MASE) as the evaluation metric. As no specific validation set was provided, AutoGluon
holds out the last prediction-length time steps of each time series as a validation window
for cross-validation [22]. After forecasting for 2018, the above steps are repeated to forecast
the values for 2019, treating the 2018 forecasts as given and including them in the training
data. For every cluster, Fig. 4 shows the average biomass production per year (2010–2018) for
all samples within that cluster.
Table 3 specifies the parameters/arguments passed into AutoGluon for time series
prediction. These parameters stay the same when predicting for both years - 2018 &
2019.
Fig. 4. Average biomass production per year (2010–2018) for each cluster.
Argument Value
enable_ensemble True
evaluation_metric MASE
excluded_model_types None
hyperparameters Best_quality
num_val_windows 1
prediction_length 1
random_seed None
time_limit None
verbosity 2
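For reference, a hedged sketch of how a per-cluster predictor with the arguments above might be configured in AutoGluon-TimeSeries follows; the file name and column names are placeholders, and recent AutoGluon versions name the metric argument eval_metric.

```python
import pandas as pd
from autogluon.timeseries import TimeSeriesDataFrame, TimeSeriesPredictor

# Hypothetical long-format frame for one cluster: one row per (location, year).
df = pd.read_csv("cluster_1_biomass.csv")   # columns: item_id, timestamp, target
data = TimeSeriesDataFrame.from_data_frame(
    df, id_column="item_id", timestamp_column="timestamp"
)

predictor = TimeSeriesPredictor(
    prediction_length=1,      # forecast one step ahead (the next year, 2018)
    eval_metric="MASE",
)
predictor.fit(
    data,
    presets="best_quality",
    num_val_windows=1,
    enable_ensemble=True,
)
forecast = predictor.predict(data)  # repeated with the 2018 forecast appended to obtain 2019
```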
Objective. The objective comprises minimizing the total cost, which consists of two
components.
1. Transportation Cost
The cost incurred in transporting the biomass yield from harvesting sites to depots
and pellets from depots to refineries, represented by Eq. (1). Minimizing this cost
effectively reduces carbon emissions as well.

Cost_transport = Σ_{i,j} D_ij · B_ij + Σ_{j,k} D_jk · P_jk    (1)

2. Underutilization Cost
The cost incurred due to underutilization of the available depot and refinery resources,
which is represented by Eq. (2).

Cost_underutilization = Σ_j (Cap_depot − Σ_i B_ij) + Σ_k (Cap_refinery − Σ_j P_jk)    (2)

The total cost is the summation of the above two costs, represented by Eq. (3).

Cost_total = Cost_transport + Cost_underutilization    (3)
Constraints.
C1 All values must be greater than or equal to zero
C2 Number of depots should be less than or equal to 25.
C3 Number of refineries should be less than or equal to 5.
C4 Total Biomass reaching each depot must be less than or equal to its capacity (20000)
C5 Total Pellets reaching each refinery must be less than or equal to its capacity (10^5)
C6 Total Biomass transported from each harvesting site must be less than or equal to
the forecasted value for that location.
C7 At least 80% of total forecasted Biomass must be utilized.
C8 Harvesting site and Refinery cannot be at same location
C9 Harvesting site and Depot cannot be at same location
C10 Total amount of biomass entering each preprocessing depot is equal to the total
amount of pellets exiting that depot
Solution. This optimization problem was formulated and solved using Mixed Integer
Linear Programming (MILP). Gurobi's Python library was used for solving the given problem
[19]. MILP becomes computationally very expensive with increasing numbers of variables and
constraints, especially binary variables. Thus, it is important to formulate efficient linear
constraints and minimize the number of variables to get optimal solutions in a feasible
amount of time. Considering all 2418 locations at once for optimization becomes
computationally expensive in terms of both time and space. For instance, the variable
B_ij (biomass transported from i to j) alone can take 2418 × 2418 (> 5 million!) values.
According to our experiments, even 64 GB of RAM was not sufficient if
this approach was followed, even with minimal binary variables and optimized linear
constraints; the pre-solve stage of the Gurobi solver alone would consume a substantial
amount of RAM at this scale [20]. To tackle this issue, the approach employed a “divide
and conquer” strategy, resolving the optimization problem efficiently and producing
excellent results within a feasible timeframe. Figure 5 shows the high-level workflow of
this approach.
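To make the formulation concrete, a toy-scale sketch of one regional depot-placement MILP in gurobipy follows; the random data, the linking of depot capacity to the binary open/close variables, and the problem sizes are illustrative assumptions rather than the authors' exact model.

```python
import numpy as np
import gurobipy as gp
from gurobipy import GRB

# Toy-scale region: n harvesting sites, m candidate depot locations (illustrative data).
n, m = 30, 30
rng = np.random.default_rng(0)
D = rng.uniform(5, 80, size=(n, m))           # site-to-depot distance matrix (km)
supply = rng.uniform(100, 2000, size=n)       # forecasted biomass at each site
CAP = 20000                                   # depot capacity (constraint C4)

mdl = gp.Model("regional_depot_placement")
B = mdl.addVars(n, m, lb=0.0, name="B")                  # biomass shipped from site i to depot j
Y = mdl.addVars(m, vtype=GRB.BINARY, name="Y")           # 1 if a depot is opened at location j

mdl.addConstrs(B.sum(i, "*") <= supply[i] for i in range(n))     # C6: ship at most the forecast
mdl.addConstrs(B.sum("*", j) <= CAP * Y[j] for j in range(m))    # C4, linked to depot opening
mdl.addConstr(Y.sum() <= 25)                                     # C2: at most 25 depots
mdl.addConstr(B.sum() >= 0.8 * supply.sum())                     # C7: use at least 80% of biomass

transport = gp.quicksum(D[i, j] * B[i, j] for i in range(n) for j in range(m))
underuse = gp.quicksum(CAP * Y[j] - B.sum("*", j) for j in range(m))   # only opened depots count
mdl.setObjective(transport + underuse, GRB.MINIMIZE)
mdl.optimize()
```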
The devised approach, named “regional optimization,” is a much more efficient way
to solve the problem. The entire map of Gujarat is divided into distinct smaller
regions, and each region is optimized independently. As solvers like Gurobi are RAM-intensive
and do not use the GPU, there is little inherent parallelization or distributed memory
utilization. So, to use memory effectively and reduce the run time, all clusters were
solved at once using local and Google Colab instances. This approach speeds up the
entire process through parallelization, making the solution more scalable.
Agglomerative Clustering. Using agglomerative clustering with complete linkage, a
total of 9 clusters were formed for the complete map, considering the size of each cluster
(directly proportional to the complexity of optimization) and the total biomass production
from each cluster. The cost between any (i, j) pair was calculated as D_ij × Biomass_i, and
clustering was done on the basis of this cost matrix. This drastically reduced the variable
space from 2418 × 2418 to approximately 300 × 300 on average. Figure 6 visualizes
the final clusters.
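A sketch of this regional clustering step with scikit-learn is shown below; the symmetrisation of the cost matrix and the file names are assumptions (and in scikit-learn versions before 1.2 the metric argument is called affinity).

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# D: 2418 x 2418 distance matrix; biomass: per-location production (assumed loaded from files).
D = np.load("distance_matrix.npy")
biomass = np.load("biomass_min_2018_2019.npy")

# Pairwise cost as described above: D_ij * Biomass_i, symmetrised here so it can be
# treated as a precomputed "distance" for complete-linkage clustering.
cost = D * biomass[:, None]
cost = (cost + cost.T) / 2.0

clust = AgglomerativeClustering(n_clusters=9, metric="precomputed", linkage="complete")
regions = clust.fit_predict(cost)    # region label (0-8) for each of the 2418 locations
```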
Placing Depots. After clustering, depots are placed with respect to each cluster. The biomass
value used for each location when placing depots and refineries is the minimum biomass
production among the 2018 and 2019 forecasts for that location. The required formulations
follow.
• Objective
Minimize the cost of transportation of biomass from harvesting sites to depots, Eq. (1), and
the underutilization cost of depots, Eq. (2). B and D represent the biomass supply and distance
matrices, respectively. Equation (4) represents the objective function to be minimized for the
optimal placement of depots.

Minimize  Σ(B ∗ D) + Σ_i (20000 − B.sum(axis = 0)[i])    (4)
• Subject to
• Objective
Minimize the cost of transportation of pellets from depots to refineries, Eq. (1), and the
underutilization cost of refineries, Eq. (2). Equation (12) represents the objective function
to be minimized for the optimal placement of refineries. P represents the pellet supply matrix.

Minimize  Σ(P ∗ D) + Σ_k (100000 − P.sum(axis = 0)[k])    (12)

• Subject to

∀j:  Y_mask[j] = max(P[j], 0),   Y[j] = min(Y_mask[j], 1)    (13)

∀k:  Z_mask[k] = max(P[:, k], 0),   Z[k] = min(Z_mask[k], 1)    (14)

∀j:  P.sum(axis = 1)[j] ≤ 20000   {from C4}    (15)
Optimization for 2018 & 2019. After acquiring the shortlisted locations for depots
and refineries, the supply chain is designed for both years separately using the same
formulations, which are given below.
• Objective

exp1 = Σ(B ∗ D) + Σ(P ∗ D)    (19)

exp2 = Σ_j (20000 − B.sum(axis = 0)[j]) + Σ_k (100000 − P.sum(axis = 0)[k])    (20)

Equations (19) & (20) represent the transportation and underutilization costs, respectively.
Equation (21) represents the objective function to be minimized for the optimal supply chain.

Minimize  exp1 + exp2    (21)
• Subject to
The equations corresponding to constraints C4, C5, C6 & C7 can be written in exactly
the same way as in the previous sections; respective examples are Eq. (8),
Eq. (16), Eq. (5) & Eq. (10). Also, the equations for the assignment of a location as a depot,
refinery, or harvesting site remain the same; respective examples are Eq. (6),
Eq. (14), & Eq. (7). The remaining required formulations are given below.
Z.sum() = 3    (22)

Y.sum() = 15    (23)
The numbers of refineries and depots are chosen by relaxing the comparison constraints in
this approach, as shown in Eq. (22) & Eq. (23). This relaxation is done considering the
weightage of the underutilization cost over the transportation cost; it also drastically
reduces run time, as the number of combinations for choosing a depot or refinery location
reduces.

∀j:  B.sum(axis = 0)[j] = P.sum(axis = 1)[j]   {from C10}    (24)

Equation (24) ensures that C10 is satisfied. In Eq. (25) the indices are the shortlisted
depot and refinery locations. In the above formulations, (i, j, k) will be (2418, 19, 5).
Table 4. Cross-validation MASE for 2018

Cluster No  MASE
1  0.534
2  0.6654
3  0.8170
4  0.5103
5  0.9459
6  0.8302
7  1.2670
8  0.9138

Table 5. Cross-validation MASE for 2019

Cluster No  MASE
1  0.1747
2  0.0780
3  0.1409
4  0.1275
5  0.1220
6  0.1044
7  0.1157
8  0.1938
Table 6 shows the total transportation as well as underutilization cost for both years.
The costs are calculated according to Eq. (1), (2) & (3).
train it using AutoGluon Tabular [23]. Due to the scarcity of lags in each time series, each
cluster's time series were further divided into four sub time series using a window slicing
method with a window of 5 [24]. Each window of 5 lags represents the local patterns of the
original time series. The last lag is considered the target variable and the first 4 lags the
features. AutoGluon Tabular was used to train on this data. The intuition behind this process
was to convert the forecasting problem into a supervised setting, to learn the local patterns
produced by the past 4 years, and to generate enough data to train upon, while clustering handles
homogeneity in the trends of a given cluster for better fitting.
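A minimal sketch of the window-slicing step under these assumptions (the example values are hypothetical):

```python
import numpy as np

def window_slices(series: np.ndarray, window: int = 5) -> np.ndarray:
    """Slice one time series into overlapping windows of the given length."""
    return np.stack([series[i:i + window] for i in range(len(series) - window + 1)])

# An 8-lag series (2010-2017) sliced with a window of 5 yields 4 sub-series, as described above.
history = np.array([120.0, 135.0, 128.0, 140.0, 150.0, 147.0, 155.0, 160.0])  # hypothetical values
slices = window_slices(history, window=5)

X = slices[:, :4]   # the first 4 lags of each window become the tabular features
y = slices[:, 4]    # the last lag is the regression target for AutoGluon Tabular
```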
6 Conclusion
This paper introduces a versatile and efficient methodology for addressing complex
supply chain optimization problems, exemplified by the Biomass supply chain case.
The approach combines mathematical optimization and heuristic techniques to deliver
high-quality solutions efficiently. Furthermore, it underscores the value of AutoML in
Time Series forecasting and its potential in solving real-world challenges. While this
study focuses on a generalized framework without region-specific features like rainfall
for forecasting, it acknowledges the scope for further enhancements, as discussed in
Sect. 5.2. Moreover, the heuristic part of the solution can be strengthened further as
mentioned in Sect. 5.1.
In practical terms, this research holds profound implications for the energy sector.
The optimized Biomass supply chain design paves the way for regions to transition
effectively from fossil fuels to sustainable biofuels, curbing carbon emissions and trans-
portation costs while promoting eco-friendly energy production. Ultimately, this research
contributes not only to efficient supply chain management but also to a more sustainable
and eco-conscious energy future for us all.
References
1. Shell.ai Hackathon for Sustainable and Affordable Energy. https://www.shell.com/ene
rgy-and-innovation/digitalisation/digital-and-ai-competitions/shell-ai-hackathon-for-sustai
nable-and-affordable-energy.html
2. Madhulatha, T.S.: An overview on clustering methods. arXiv preprint arXiv:1205.1117
3. Jain, A.K., Duin, R.P.W., Mao, J.: Statistical pattern recognition: a review. IEEE Trans. Pattern
Anal. Mach. Intell. 22, 4–37 (2000)
4. Aghabozorgi, S., Shirkhorshidi, A.S., Wah, T.Y.: Time-series clustering–a decade review. Inf.
Syst. 53, 16–38 (2015)
5. Hyndman, R.J., Athanasopoulos, G.: Forecasting: principles and practice. OTexts (2018)
6. Benidis, K., Rangapuram, S.S., Flunkert, V., et al.: Deep learning for time series forecasting:
tutorial and literature survey. ACM Comput. Surv. 55, 1–36 (2022)
7. Taylor, S.J., Letham, B.: Forecasting at scale. Am. Stat. 72, 37–45 (2018)
8. Paldino, G.M., De Stefani, J., De Caro, F., Bontempi, G.: Does automl outperform naive
forecasting? Eng. Proc. 5, 36 (2021)
9. Jin, H., Song, Q., Hu, X.: Auto-keras: an efficient neural architecture search system. In:
Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery &
Data Mining, pp 1946–1956 (2019)
10. LeDell, E., Poirier, S.: H2o automl: Scalable automatic machine learning. In: Proceedings of
the AutoML Workshop at ICML (2020)
11. Alteryx EvalML 0.36.0 Documentation. https://evalml.alteryx.com/en/stable/. Accessed 15
Sep 2023
12. Shchur, O., Turkmen, C., Erickson, N., et al.: AutoGluon-TimeSeries: AutoML for Proba-
bilistic Time Series Forecasting. arXiv preprint arXiv:2308.05566 (2023)
13. Garcia, D.J., You, F.: Supply chain design and optimization: challenges and opportunities.
Comput. Chem. Eng. 81, 153–170 (2015)
14. Sun, O., Fan, N.: A review on optimization methods for biomass supply chain: models
and algorithms, sustainable issues, and challenges and opportunities. Process Integr. Optim.
Sustain. 4, 203–226 (2020)
15. Kuo, R.J., Han, Y.S.: A hybrid of genetic algorithm and particle swarm optimization for
solving bi-level linear programming problem–A case study on supply chain model. Appl.
Math. Model. 35, 3905–3917 (2011)
16. Mastrocinque, E., Yuce, B., Lambiase, A., Packianather, M.S.: A multi-objective optimization
for supply chain network using the bees algorithm. Int. J. Eng. Bus. Manage. 5, 38 (2013)
17. Lim, C.H., Ngan, S.L., Ng, W.P.Q., et al.: Biomass supply chain management and challenges.
In: Value-Chain of Biofuels, pp 429–444. Elsevier (2022)
18. De Meyer, A., Cattrysse, D., Rasinmäki, J., Van Orshoven, J.: Methods to optimise the design
and management of biomass-for-bioenergy supply chains: a review. Renew. Sustain. Energy
Rev. 31, 657–670 (2014)
19. Gurobi Python API. https://www.gurobi.com/documentation/10.0/refman/py_python_api_
overview.html#sec:Python. Accessed 15 Sep 2023
20. Advanced Gurobi Algorithms. https://assets.gurobi.com/pdfs/user-events/2016-frankfurt/
Die-Algorithmen.pdf. Accessed 15 Sep 2023
21. Adhikari, R., Agrawal, R.K.: An introductory study on time series modeling and forecasting.
arXiv preprint arXiv:1302.6613 (2013)
22. Forecasting Time Series - In Depth - AutoGluon 0.8.2 documentation. https://auto.gluon.ai/
stable/tutorials/timeseries/forecasting-indepth.html. Accessed 15 Sep 2023
23. Erickson, N., Mueller, J., Shirkov, A., et al.: Autogluon-tabular: Robust and accurate automl
for structured data. arXiv preprint arXiv:2003.06505 (2020)
24. Iglesias, G., Talavera, E., González-Prieto, Á., et al.: Data augmentation techniques in time
series domain: A survey and taxonomy. arXiv preprint arXiv:2206.13508 (2022)
Prediction of Deposition Parameters
in Manufacturing of Ni-Based Coating Using
ANN
1 Introduction
Thermal spray processes, which permit the application of coatings of varying nature,
adaptability, and superior characteristics on material surfaces with minimal added value
in order to boost their performance and properties, are one solution to these difficulties
[1, 2]. A range of coating procedures, known as thermal spray, deposit finely dispersed
non-metallic or metallic materials while they are still molten or semi-molten [3, 4]. The
high-velocity oxy-fuel process (HVOF) and atmospheric plasma spray (APS) are the most
preferred thermal spray techniques due to their cost effectiveness [4]. The deposition
parameters of thermal spray coatings have a significant impact on the performance of the
coating. Many process-controlling variables, including gas flow rate, powder feed rate,
current, power voltage, stand-off distance, etc., affect the mechanical and
morphological properties of the coatings during the plasma spray deposition technique. The
overall performance of the deposited coating is significantly influenced by the values
of these parameters being either too high or too low. Moreover, a combination of these
factors may, to some extent, have an impact on the quality of the coating [5–7]. The
coating's porosity increases as the spraying power decreases. When the spraying power
was sufficient, nearly all of the powder was totally melted, and the porosity of the coating
becomes insensitive to further increases in spraying power [8]. At higher spraying power,
the degree of melting of the powder decreases with an increase in the spraying distance [9].
The main process parameters that have the greatest impact on the properties of in-flight
particles are the spraying distance and the fuel or gas flow rates [10]. Therefore, optimum
process and operating parameters are essential for superior wear resistance and a minimum
coefficient of friction of the coatings, depending on the type of application.
The most popular technique for estimating the effects of process parameters on the
coating characteristics is the Taguchi design approach, which is a preliminary solution to
determine the optimal process and operating parameters and differentiate the important
and unimportant variables [11–13]. However, as the coating quality is often impacted by
a cumulative influence of numerous process and operational parameters, it may fail to
attain the precise ideal parameters. The spray process has also been effectively simulated
and controlled using numerical simulation and modelling [7, 14]. However, numerical
modelling struggles to simulate the actual behaviors of the thermal spray mechanism
since it is a complicated multi-physical event. Consequently, a technique like machine
learning algorithms that are more beneficial in precise analysis, accurate prediction, and
parameter optimization is essential. Processes like thermal spraying technologies, which
involve complicated thermal and chemical reactions, must be studied using a robust
computational model, like the ANN model. However, it is difficult or expensive to collect
enough data sets about tribological properties of thermal spray coatings, particularly to
gather enough information about the coating properties [15, 16]. As a result, the use of
shallow ANN models, which typically have fewer than three hidden layers and are used
for applications with relatively little data, has also drawn more attention in the context of
thermal spraying.
The application of ANN models in material science has been widely investigated;
however, the extent to which these models can forecast the porosity, sliding wear rates,
and nano-hardness of thermal sprayed Ni-based coatings is not well documented. In this
study, the training and validation datasets for the back propagation neural network model were
created by carrying out HVOF spray deposition experiments. The developed model, which is
intended to thoroughly investigate the HVOF deposition technique, has been used to corroborate
the relationship between the high-velocity oxy-fuel spray process parameters and the
mechanical performance of the coatings. The back propagation model's accuracy and
dependability were confirmed by additional experiments.
The organization of this article is as follows: In Sect. 2, the main conceptual threads of
the experimental work and techniques used in characterization are presented, explained,
and illustrated. The study’s methodology is also explained and shown in the same Section.
Section 3 presents the results of the as-deposited coating characterization and the analysis
of the results obtained from the developed back propagation ANN model. This section also
covers the effect of the deposition parameters on the output characteristics under
consideration. Section 4
provides an overview of the study’s main conclusions and their consequences.
The substrate material was selected to be stainless steel 316L grade, and its true composi-
tion was verified using an optical emission spectroscope. In this work, the commercially
available NiCrSiBFe powder was employed as feedstock. The as-received powder had a nominal
particle size distribution in the range of 5–35 µm. Cr 14.5%,
Fe 4.5%, Si 4.5%, B 3.2% and C 0.7%, along with the remainder of Ni, made up the
NiCrSiBFe powder’s composition. Before applying the coating powder using a spray
gun, the surfaces of substrate were pressure-blasted to improve adherence. The condi-
tions for surface blasting were as follows: virgin grade brown alumina blasting material,
air pressure of 5 bar, blasting angle of 90° with a blasting distance of 150 mm. High-
velocity oxy-fuel spraying, also known as HVOF, was used to manufacture the coatings
under investigation. Table 1 summarizes HVOF spray deposition process parameters.
Parameters Values
Powder feed rate (g/min) 35, 55, 75
Gun traverse speed (m/s) 0.7
Standoff distance (mm) 150, 200, 250
Particle velocity (m/s) 534
Oxygen gas flow rate (SLPM) 200, 240, 280
Propane gas flow rate (SLPM) 60
The experiments were carried out in accordance with ASTM G99 standard practise for
dry sliding wear. A ball-on-disc tribometer (Model: TR20LE CHM-800) designed by
Ducom Instruments in Bengaluru, India was used to conduct dry sliding wear tests.
Temperatures between 21 and 27 °C and humidity levels between 45 and 55% were the
environmental conditions under which the tests were carried out. Alumina balls were
used as the test counterpart and were cleaned with alcohol beforehand. The values for
the wear depth and frictional force were determined using an LVDT and a load cell,
respectively. Every two seconds during a test run, the tribo-meter’s data acquisition
system records one data point. Each experiment was repeated three times using the exact
same operating parameters. A scanning electron microscope (ZEISS Gemini) was used to
examine the deposited coating's microstructure across the cross-section, and ImageJ
software was used to determine the porosity of the as-deposited coatings. An average value
was obtained after considering fifteen fields of view. A Hysitron TS77, manufactured
by Bruker Inc., U.S.A., was used to measure the nano-hardness of the coatings. The hardness
was measured at ten different locations on the cross-section with a load of 9 mN held for
15 s, and an average hardness value is reported for each coating.
Pre-processing was done on the collected data in order to configure, train, and validate
the back propagation neural network model. The operational parameters and test results
under consideration were first gathered through the experiments. A database containing
144 records was created from 24 sets of HVOF spray tests and associated coating char-
acterization data. Three deposition process parameters make up the back propagation
neural network model: stand-off distance, powder feed rate, and oxygen gas flow rate.
Targets included sliding wear rate, porosity, and nano-hardness. Table 2 contains a sum-
mary of the experimental values for the inputs and outputs. In pre-processing, the data
were normalised to fall between −1 and 1 to avoid calculation errors brought on
by the varying magnitudes of the input parameters. The data was then divided at random
into training and validation datasets using the dividerand data division technique, in
proportions of 80% and 20%, respectively.
Because there are three inputs and three targets for the network, the number of neurons
in the input and output layers was fixed at three. The number of neurons in each hidden
layer and the total number of hidden layers are decided upon based on the model’s
performance during hyper-parameter tuning, taking into account the accuracy of the
trained back propagation model and the complexity of the network structure. When there
are fewer hidden layers and hidden layer neurons, higher accuracy is typically expected.
The model is trained in this study in a supervised manner using the back propagation
technique. It employs a multi-layer feed-forward neural network, a conventional learning
method, trained using the error back-propagation method. The procedure involves
backpropagating the error, adjusting the weights and thresholds accordingly, and repeating
these steps until the target error or the predefined number of iterations is reached.
The network's weights and biases are changed during training to optimize performance
based on the network performance function.
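The training described above appears to rely on a MATLAB-style toolbox (the dividerand split); as a hedged, roughly equivalent sketch in Python, a single-hidden-layer network with 8 neurons could be set up as follows (file names, activation function, and solver are assumptions).

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import MinMaxScaler

# X: (144, 3) process parameters (stand-off distance, powder feed rate, O2 flow rate)
# Y: (144, 3) targets (porosity, nano-hardness, sliding wear rate); assumed stored as CSVs.
X = np.loadtxt("hvof_inputs.csv", delimiter=",")
Y = np.loadtxt("hvof_targets.csv", delimiter=",")

# Normalise inputs and targets to [-1, 1] to avoid magnitude-driven errors.
x_scaler = MinMaxScaler(feature_range=(-1, 1)).fit(X)
y_scaler = MinMaxScaler(feature_range=(-1, 1)).fit(Y)

# Random 80/20 split, analogous to the dividerand division described above.
Xtr, Xval, Ytr, Yval = train_test_split(x_scaler.transform(X),
                                        y_scaler.transform(Y),
                                        test_size=0.2, random_state=0)

# Single hidden layer with 8 neurons, trained by gradient-based back propagation.
model = MLPRegressor(hidden_layer_sizes=(8,), activation="tanh",
                     solver="adam", max_iter=5000, random_state=0)
model.fit(Xtr, Ytr)

pred = y_scaler.inverse_transform(model.predict(Xval))
true = y_scaler.inverse_transform(Yval)
avg_err = np.mean(np.abs(pred - true) / np.abs(true), axis=0) * 100
print("Average % error (porosity, hardness, wear rate):", avg_err.round(2))
```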
Table 3 contains a list of the input parameters used to train the back propagation model.
The number of hidden layer neurons greatly influences the model's training accuracy.
Consequently, the error tolerance, noise factor, momentum parameter, learning rate, and
slope parameter are assessed at a constant number of cycles for various numbers of
neurons in the hidden layer. As a result, eight hidden neurons are used in this
work, which gives the back propagation model its ideal structure.
Experimental and back propagation model results are shown for three coating qualities
in Fig. 2. Table 4 displays the training and validation data; the training data pertains to
the first 19 groups, and the validation data pertains to the final 5 groups.
Fig. 1. Micrographs of as-deposited NiCrSiBFe by HVOF (a) cross-section (b) surface morphol-
ogy
Table 4. Validation parameters used for the back propagation neural network.
The back propagation model’s experimental results and predictions are shown in
Fig. 2, where the training data points are indicated by the pink curve and the validation
data points by the black curve. Porosity, nano-hardness, and sliding wear rate had aver-
age errors of 1.816%, 1.997%, and 4.405%, respectively. It illustrates the strong agree-
ment between the experimental and back propagation model results. Moreover, both
the increasing and decreasing trends of results are also similar, proving that developed
model is capable of predicting characteristics of coatings under consideration accurately
within the specified set of deposition process parameters.
Fig. 2. Experimental and predicted outcomes by ANN model for (a) Porosity, (b) Nano-hardness,
and (c) Sliding wear rate
The relationship between the HVOF deposition process parameters and coating attributes
such as porosity, nano-hardness, and sliding wear rate was investigated using different
stand-off distances, powder feed rates, and oxygen gas flow rates. Table 4 presents the
validation parameters used for the back propagation neural network. Table 5 shows that
when the powder feed rate increases, porosity first reduces and then increases, and nano-
hardness first increases and then falls. The sliding wear rate exhibits the opposite trend
of these trends. Overall comparative investigation shows that better features of HVOF
deposited coatings are obtained at a powder feed rate of 75 g/min. The effect of stand-off
distance on the three coating properties is depicted in Table 5. Stand-off distance
has a significant impact on coating hardness. The porosity first decreases and then gradually
increases as the stand-off distance grows. Simultaneously, the sliding wear rate displays a
consistently rising trend, whereas the nano-hardness values decrease as the stand-off distance
increases. The overall analysis shows that the 200 mm stand-off distance is where all three
coating qualities are at their best. Table 5 also shows that as the oxygen gas flow rate
increases, the porosity first rises and then decreases, while the overall nano-hardness shows
a significant rising trend and the sliding wear rate shows a decreasing trend. It is evident
that the coatings under investigation have better characteristics when the oxygen gas flow
rate is 240 SLPM.
4 Conclusion
The performance of NiCrSiBFe HVOF spray deposited coatings was predicted using a
model developed by back propagation neural network algorithm. The deposition process
parameters such as stand-off distance, powder feed rate, and oxygen gas flow rate were
used to predict the coating characteristics such as porosity, nano-hardness, and sliding
wear rate. The performance of the developed model in terms of accuracy and reliability
has been verified using a test dataset, with relative errors lower than the maximum errors of
the training and validation datasets. The average errors for porosity, nano-hardness, and
sliding wear rate are 1.816%, 1.997%, and 4.405%, respectively. The developed back
propagation model can therefore be applied in coating operating practice for spray
performance prediction, as well as for parameter management and optimisation. The
experimental findings and the results predicted by the BP model for porosity, nano-hardness,
and sliding wear rate indicate that the following HVOF spray parameters are optimal for the
deposition of NiCrSiBFe coatings: a powder feed rate of 75 g/min, a stand-off distance of
200 mm, and an oxygen gas flow rate of 240 SLPM. A future direction is to involve more
parameters related to the chemistry and thermodynamics of the HVOF deposition process, since
it is difficult to research and develop a comprehensive model of the HVOF spray process
because of the complex chemical and thermodynamic processes involved.
References
1. Thakur, L., Arora, N., Jayaganthan, R., Sood, R.: An investigation on erosion behavior of
HVOF sprayed WC-CoCr coatings. Appl. Surf. Sci. 258, 1225–1234 (2011)
2. Singh, J.: A review on mechanisms and testing of wear in slurry pumps, pipeline circuits and
hydraulic turbines. J. Tribol. 143, 1–83 (2021)
3. Hermanek, F.J.: Thermal spray terminology and company origins. ASM Int. (2001)
4. Pawlowski, L.: The Science and Engineering of Thermal Spray Coatings. Wiley (2008)
5. Zeng, Z.Q., Zhang, B., Wang, J.Y.: Study on Properties of Ni Based+WC Coating by Plasma
Spray Welding. Powder Metallurgy Industry (2017)
6. Jiang, M., Ma, C., Xia, F., et al.: Application of artificial neural networks to predict the
hardness of Ni–TiN nanocoatings fabricated by pulse electrodeposition. Surf. Coat. Technol.
286, 191–196 (2016)
7. Dongmo, E., Wenzelburger, M., Gadow, R.: Analysis and optimization of the HVOF process
by combined experimental and numerical approaches. Surf. Coat. Technol. 202, 4470–4478
(2008)
8. Zhang, X.C., Xu, B.S., Xuan, F.Z., et al.: Microstructural and porosity variations in the
plasma-sprayed Ni-alloy coatings prepared at different spraying powers. J. Alloys Compd.
473, 145–151 (2009)
9. Lu, Y.P., Li, S.T., Zhu, R.F., et al.: Further studies on the effect of stand-off distance on
characteristics of plasma sprayed hydroxyapatite coating. Surf. Coat. Technol. 157, 221–225
(2002)
10. Saaedi, J., Coyle, T.W., Arabi, H., Mirdamadi, S., Mostaghimi, J.: Effects of HVOF process
parameters on the properties of Ni-Cr coatings. J. Therm. Spray Technol. 19, 521–530 (2010)
11. Singh, J., Kumar, S., Singh, G.: Taguchi's approach for optimization of tribo-resistance
parameters for SS304. Mater. Today: Proc. 5, 5031–5038 (2018)
12. Praveen, A.S., Sarangan, J., Suresh, S., Channabasappa, B.H.: Optimization and erosion
wear response of NiCrSiB/WC–Co HVOF coating using Taguchi method. Ceram. Int. 42,
1094–1104 (2016)
13. Qiao, L., Wu, Y., Hong, S., Zhang, J., Shi, W., Zheng, Y.: Relationships between spray
parameters, microstructures and ultrasonic cavitation erosion behavior of HVOF sprayed
Fe-based amorphous/nanocrystalline coatings. Ultrason. Sonochem. 39, 39–46 (2017)
14. Li, M., Christofides, P.D.: Modeling and control of High-Velocity Oxygen-Fuel (HVOF)
thermal spray: a tutorial review. J. Therm. Spray Tech. 18, 753 (2009)
15. Lia, X., Zhu, Y., Xiao, G.: Application of artificial neural networks to predict sliding wear
resistance of Ni–TiN nano composite coatings deposited by pulse electrodeposition. Ceram.
Int. 40(8), 11767–11772 (2014)
16. Guessasma, S., Bounazef, M., Nardin, P.: Neural computation analysis of alumina–titania
wear resistance coating. Int. J. Refract Metal Hard Mater. 24(3), 240–246 (2006)
Decision Model for Cost Control
of Transmission and Transformation Projects
Considering Uncertainty: A GAN Algorithm
Si Shen1 , Shili Liu1 , Fulei Chen1 , Jian Ma2(B) , and Jinghua Liu2
1 State Grid Anhui Electric Power Co., Ltd., Economic and Technological Research Institute,
Hefei 230071, Anhui, China
[email protected]
2 China Electricity Council Technical and Economic Consulting Center of Electronic Power
Abstract. The article aims to propose an analysis model based on the GAN (Generative
Adversarial Network) algorithm to address the impact of uncertain factors on
cost control decisions in power transmission and transformation projects (PTTP).
This article deeply analyzes the uncertainty factors of power transmission and
transformation engineering (PTTE), identifies the key factors that affect cost con-
trol, and uses GAN algorithm to simulate and predict them, improving the accuracy
and reliability of the decision-making process. The research results indicate that
the uncertainty cost control decision analysis model based on GAN algorithm can
effectively improve the cost prediction accuracy of PTTP, with a maximum of
96.5%. This provides an important reference basis for engineering management
and decision-making. Therefore, the article provides a new idea and method for
cost control of PTTP, which has important theoretical and practical significance.
1 Introduction
With the rapid development of the economy, the power engineering industry has also
developed rapidly, with the development of PTTE advancing by leaps and bounds.
According to relevant data, the total investment in China’s PTTP is very high, with
new investment accounting for more than half of it. However, with the large-scale con-
struction and use of PTTP in China, the issue of cost control is becoming increasingly
prominent. The uncertainty factors in the cost control process can have a significant
impact on cost control.
At present, many experts and scholars have conducted research and exploration
on cost control. Tian Wenjuan has constructed a maturity evaluation index system for
construction project cost control based on grey clustering method. The experiment has
proven the feasibility of this method, which provides a reference basis for the analysis
and evaluation of cost control maturity in practical engineering [1]. Cen Lu studied a
cost control and application analysis method for project design phase based on quota
design. The experimental results indicate that this method can effectively control the
engineering cost [2]. Zhang Aili proposed to strengthen top-level design and reasonably
determine preliminary investment estimates. He deepened the plan design, refined the
preparation of cost documents, and strengthened the integrated cost control of design and
construction economy. He implemented measures and suggestions such as full process
cost consulting services, providing reference for cost control and management of such
projects [3]. The above content provides new ideas and references for cost control of
construction projects, but its practicality and operability are insufficient.
With the continuous development of artificial intelligence technology, GAN algo-
rithm, as a powerful tool, has been widely applied in various fields of decision analysis
and prediction models. The article takes PTTE as the research object, analyzes various
uncertain factors that affect cost control decisions in depth, and uses GAN algorithm
for simulation and prediction. On this basis, this article establishes an uncertain cost
control decision analysis model based on GAN algorithm. This article aims to provide
a new approach and method for decision-making analysis in the field of cost control in
PTTP, and provide strong support for the smooth implementation and cost control of
engineering projects.
natural conditions, the main consideration is factors such as natural geographical envi-
ronment and meteorological conditions. In terms of socio-economic development, the
main consideration is the level of socio-economic development, energy supply status,
and other factors. In the project decision-making stage, it is usually necessary to compare
and select investment plans in order to choose the best investment plan. However, due to
the uncertainty of various information, comprehensive analysis and comparison of var-
ious plans cannot be carried out. Therefore, only a comprehensive evaluation of several
feasible investment plans can be conducted to ultimately select the best investment plan.
For example, when selecting wire sections, the voltage levels that different wire sec-
tions can carry are different. Under certain conditions, there may be significant changes
in the cross-section of the wire, leading to significant uncertainty in the selection of
design schemes. For example, when selecting the type of iron tower, there are signifi-
cant differences in tower height, height direction, grounding method, etc., which leads
to significant uncertainty in the selection of tower type. When selecting transmission
line path schemes, there are significant differences in materials and lengths required for
different line paths, resulting in significant uncertainty in the cost of PTTP [6].
The construction period of PTTP is long and requires crossing rivers, highways,
etc. There are many uncertain factors in the construction process, making it difficult to
control the cost of PTTP. For example, during the construction process of an iron tower,
changes in various factors would lead to changes in the specifications of the tower, thereby
affecting the cost of the tower. During tower construction, due to factors such as terrain
conditions and the quality of the tower itself, damage to the tower itself can occur, thereby
affecting the cost of the tower. During the installation of iron towers, various factors can
also cause changes in the installation costs of tower materials and accessories. When
pouring concrete, the increase in construction costs caused by factors such as concrete
transportation and mixing would affect the entire project cost. In addition, differences
in construction conditions and technical levels can also lead to fluctuations in the cost
of PTTP [7].
generator and the real data, thereby training the discriminator to generate more accurate
sample data. The GAN algorithm model introduces adversarial structures to make the
generator and discriminator compete with each other, thereby training more accurate
sample data. The main training idea is shown in Fig. 1.
Fig. 1. Training structure of the GAN model: the generator (with hidden layer and output layer nodes) and the discriminator are connected through an adversarial structure, with real data as the reference input to the discriminator network.
In Fig. 1, the adversarial structure of the GAN is first introduced in the generator. Then, the generator network is trained by assigning different hidden layer nodes and output layer nodes. Next, the existing real sample data are input as a training set into the generator for training. Finally, an adversarial structure is added to the discriminator, which means that the discriminator must also evaluate the generated samples against the real data, in order to achieve alternating training of the generator and discriminator and to generate more accurate sample data [9].
The training process of GAN algorithm mainly includes the following steps: (1)
Generator network training: Firstly, it calculates the error between the output value
of the generator network and the true value through backpropagation algorithm, and
determines the structural parameters of the generator based on the size of the error. It
includes the number of hidden layers, training times, learning rate, and discriminant
structure of the generator network. (2) The discriminator training process is to train
the generator network through the Adam optimizer, and adjust the parameters in the
Adam optimizer to adapt to the sample data. When the number of iterations of the Adam
optimizer reaches the set value, training stops and the model reaches a stable state. (3)
Cross validation: This article adds parameters from the GAN algorithm as variables to
the cross validation experiment. It corrects the GAN algorithm by repeatedly adjusting
the error between the generated data and the real data through repeated experiments,
so that it can better fit the real sample data. (4) Evaluation indicators: The evaluation
indicators of GAN algorithm mainly include accuracy, recall rate, F1 value, and the
relationship between accuracy and recall rate. By evaluating the GAN algorithm, it can
be seen that it can effectively utilize existing sample data for training, thereby better
fitting real sample data and predicting new sample data. The main calculation formulas
are shown in Eqs. (1) and (2) [10].
∇_βn (1/m) Σ_{i=1}^{m} log{1 − D[G(z_i)]}    (1)

∇_βn (1/m) Σ_{i=1}^{m} (log D(x_i) + log{1 − D[G(z_i)]})    (2)
In Eqs. (1) and (2), D is the discriminator and G is the generator. z_i is the noise input supplied to the generator, x_i is a real data sample supplied to the discriminator, β_n denotes the network parameters being updated, and m represents the number of samples. The main calculation parameters are shown in Table 1.
In Table 1, the learning rate is 0.0002, the batch size is 64, and the noise
dimension is 100. The generator and discriminator each have 3 layers and 20000
training rounds. The loss weights are all 1, and the input data range is [−1, 1]. In the
GAN algorithm model, input data and output data are learned by the hidden layer and
output layer, respectively, with n neurons in each hidden layer. Each neuron is connected
to its neighboring nodes, and each neuron is connected to the output value of an input
layer, ultimately generating an output result. During the training process, the generator
first converts the input of the training data into the input of the output data, and then the
discriminator adjusts the output data to obtain the predicted results. During the model
training process, both the generator and discriminator are learned through the Adam
optimization algorithm. During the training process, due to the random parameters in
the generator, the network can generate new data with similar features to real data, so
the generated new data can match the characteristics of real data.
In order to solve the problem of gradient vanishing during the training process, it is
necessary to adjust the parameters between the generator and discriminator. In the GAN
algorithm, the sigmoid activation function and the Adam optimizer are the two main components whose settings affect the performance of the GAN algorithm. By tuning the parameters of the sigmoid activation function and the Adam optimizer, the generator can produce better sample data.
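To make the training procedure described above concrete, the following is a minimal Python/PyTorch sketch of the alternating discriminator and generator updates, using the learning rate (0.0002), batch size (64), noise dimension (100), and three-layer networks listed in Table 1. The hidden widths, the ReLU activations, and the use of a four-dimensional cost-feature vector (material procurement, construction, labor, and equipment costs) are illustrative assumptions rather than the authors' exact configuration.

import torch
from torch import nn

NOISE_DIM, BATCH_SIZE, LR = 100, 64, 2e-4   # values taken from Table 1

# Three-layer generator and discriminator; the discriminator ends in a sigmoid,
# and both are optimized with Adam, as described in the text.
generator = nn.Sequential(
    nn.Linear(NOISE_DIM, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 4), nn.Tanh())            # 4 cost features scaled to [-1, 1]
discriminator = nn.Sequential(
    nn.Linear(4, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(generator.parameters(), lr=LR)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=LR)
bce = nn.BCELoss()

def train_step(real_batch):
    # Discriminator update: raise log D(x) + log(1 - D(G(z))), cf. Eq. (2).
    z = torch.randn(BATCH_SIZE, NOISE_DIM)
    fake = generator(z).detach()
    d_loss = bce(discriminator(real_batch), torch.ones(BATCH_SIZE, 1)) + \
             bce(discriminator(fake), torch.zeros(BATCH_SIZE, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator update: push D(G(z)) towards 1, cf. Eq. (1).
    z = torch.randn(BATCH_SIZE, NOISE_DIM)
    g_loss = bce(discriminator(generator(z)), torch.ones(BATCH_SIZE, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()

Calling train_step repeatedly on mini-batches of real cost records alternates the two updates until the set number of training rounds is reached.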
Before conducting the experiment, the article first collected evaluation indicators, and
the collection results are shown in Table 2.
There are a total of 520 people participating in the collection task in Table 2, of which
135 people have chosen the accuracy rate, accounting for 26%. The number of people
who selected response time is 121, accounting for 23%. The number of people choosing
stability is 125, accounting for 24%, while the number of people choosing reliability is
139, accounting for 27%. Therefore, the article first selects accuracy and reliability as
the main indicators for the experiment.
The article takes the material procurement cost, construction cost, labor cost, and
equipment cost of a certain project as input variables, and the actual cost data of the
project as output variables. It uses the GAN algorithm and traditional models for analysis
and prediction, and compares them with actual data to verify the effectiveness of the
model. The accuracy results obtained are shown in Fig. 2.
In Fig. 2, the prediction accuracy of the traditional model reached a maximum of 92.3% in the fourth round and a minimum of 90.3% in the first round, with an average accuracy of 91.38%. The prediction accuracy of the GAN-based prediction method reached a maximum of 96.5% in the fifth round and a minimum of 93.4% in the first round, with an average accuracy of 95.3%. It can be seen that the cost
control decision analysis model based on GAN algorithm has higher prediction accuracy.
Further analysis experiments are conducted on the reliability of the model, and the results
are shown in Fig. 3.
In Fig. 3, the stability of the traditional model was 93.9%, 94.1%, 94.7%, 95.8%, and 96.3% from the first to the fifth run, showing a gradual upward trend. The stability of the prediction model based on the GAN algorithm was 95.8%, 96.7%, 97.1%, 97.6%, and 98.1% from the first to the fifth run, respectively. The overall trend is also upward, and the stability is higher than that of the traditional model. It
can be seen that the cost control decision analysis model based on GAN algorithm has
higher reliability. At the end of the article, the developed prediction model was put into
practical operation and a satisfaction questionnaire survey was conducted, as shown in
Fig. 4.
Fig. 4. Satisfaction survey results: percentage of respondents rating practicality, operability, risk identification, and risk response as Poor, General, Good, Great, or Excellent.
In Fig. 4, in the evaluation of practicality, excellent accounts for 12%, great for 17%, good for 22%, general for 41%, and poor for 8%. In the evaluation of operability, the proportion of evaluations from
excellent to poor is 11%, 20%, 22%, 41%, and 6%, respectively. The proportion of
positive reviews for risk identification and risk response abilities far exceeds negative
reviews. It can be seen that the prediction model studied in the article has achieved good
results after being put into practical use.
The article compares the prediction accuracy and reliability of traditional models and
cost control decision analysis models based on GAN algorithm, and finds that models
based on GAN algorithm exhibit higher levels of prediction accuracy and reliability. The
experimental results show that the model based on GAN algorithm achieved the highest
accuracy of 96.5% in the fifth prediction, while the traditional model achieved the highest
accuracy of 92.3%. In terms of reliability, the model based on GAN algorithm achieved
98.1% reliability in the 5th round, while the reliability of traditional models was 96.3%.
In addition, satisfaction surveys conducted in actual operations have also confirmed the
effectiveness of the GAN algorithm based model in practical applications. Therefore,
the cost control decision analysis model based on GAN algorithm shows significant
advantages in prediction accuracy and reliability, and has high practical application
value.
5 Conclusions
The article introduces a cost control decision analysis model based on GAN algorithm,
which takes actual engineering data as samples and uses GAN algorithm to simulate
and predict the sample data. It compared the simulated prediction results with actual
engineering data, verifying the effectiveness and accuracy of the model. In practical
applications, this model can not only predict the cost level of PTTP in the future, but
also effectively solve the uncertainty problem in cost control of PTTP. The future research
directions include the following: the first is to use GAN algorithm to simulate and predict
sample data. The second is to use different parameters to compare the simulation and
prediction results to provide more reference basis for the decision-making process. In
short, with the continuous development of information technology, there are more and
more uncertainties. In this case, the uncertainty in the cost control of PTTP would also
become prominent.
References
1. Tian, W.: Evaluation of the maturity of construction project cost control based on gray
clustering method. Chin. Architectural Metal Struct. 22(09), 159–161 (2023)
2. Cen, L.: Cost control and application analysis of project design phase based on limit design.
Eng. Constr. 37(04), 1348–1350 (2023)
3. Zhang, A.: Research on the difficulties and countermeasures of cost control of water
environment governance projects. Constr. Econ. 44(S1), 86–88 (2023)
4. Ajagekar, A., You, F.: Deep reinforcement learning based unit commitment scheduling under
load and wind power uncertainty. IEEE Trans. Sustain. Energy 14(2), 803–812 (2022)
5. Huang, C.: Demand response for industrial micro-grid considering photovoltaic power
uncertainty and battery operational cost. IEEE Trans. Smart Grid 12(4), 3043–3055 (2021)
6. Hu, B.: Decision-dependent uncertainty modeling in power system operational reliability
evaluations. IEEE Trans. Power Syst. 36(6), 5708–5721 (2021)
7. Gui, J.: A review on generative adversarial networks: algorithms, theory, and applications.
IEEE Trans. Knowl. Data Eng. 35(4), 3313–3332 (2021)
8. Liu, K.: FISS GAN: a generative adversarial network for foggy image semantic segmentation.
IEEE/CAA J. Automatica Sinica 8(8), 1428–1439 (2021)
9. Nguyen, D.C.: Federated learning for COVID-19 detection with generative adversarial
networks in edge cloud computing. IEEE Internet Things J. 9(12), 10257–10271 (2021)
10. Gecer, B.: Fast-ganfit: generative adversarial network for high fidelity 3D face reconstruction.
IEEE Trans. Pattern Anal. Mach. Intell. 44(9), 4879–4893 (2021)
Optimization Model of Construction Period
in Special Construction Scenarios of Power
Transmission and Transformation Project Based
on Back Propagation Neural Network
Si Shen1 , Fulei Chen1 , Jian Ma2(B) , Tianrui Fang1 , and Wei Yan2
1 State Grid Anhui Electric Power Co., Ltd., Economic and Technological Research Institute,
Hefei 230071, Anhui, China
[email protected]
2 China Electricity Council Technical and Economic Consulting Center of Electronic Power
1 Introduction
PTTP is an important component of power grid construction, playing an important role
in power grid construction. Due to the numerous construction links, heavy tasks, and
tight schedule of power grid project, there are many uncertain factors in the construction
process of PTTP, which brings certain difficulties to the optimization of the schedule.
Meanwhile, with the rapid development of the economy, the construction of power grid
projects has also accelerated, posing higher requirements for the reliability and stability
of power supply. Therefore, the optimization of the construction period of power grid
project has become one of the urgent problems that power enterprises need to solve.
Currently, many experts and scholars have conducted research and exploration on
schedule optimization. Li Jiaxi proposed an optimization model for the production work-
shop schedule of prefabricated components for highway bridges and culverts under
multiple factor constraints. The results showed that this method can improve workshop
production efficiency and provide theoretical support for accelerating the construction of
a modern high-quality comprehensive three-dimensional transportation network [1]. Yu
Zongrang summarized the characteristics of prefabricated steel structure school build-
ings based on project practice, studied the optimization measures for the construction
period of prefabricated steel structure school buildings, and proposed feasible sugges-
tions for the application of prefabricated steel structure school buildings, in order to
provide reference for similar projects [2]. Chen Zhimin proposed an improved firefly
algorithm to solve the model. The results showed that the estimated construction period
was 89.6 days, with a confidence level of 95.6%. Compared with the original method, it
reduced the construction period by 13.4 days and can shorten the construction period by
13.1% [3]. Although the above methods can effectively shorten the construction period,
they may increase costs.
On the basis of analyzing the special scenarios and the importance of schedule opti-
mization in the construction of PTTPs, the article proposed a BP neural network model
for schedule optimization of PTTPs. Firstly, based on actual cases, special scenarios were
described. Then, the BP neural network model was introduced and analyzed. Finally,
the model was experimentally validated to have good performance in optimizing time.
In the construction process of PTTPs, there are many special scenarios, such as cli-
mate conditions, resource scarcity, etc. In terms of climate conditions, due to the harsh
climate environment, it is necessary to make preparations, conduct a detailed analy-
sis of the weather conditions, and take corresponding response measures. In terms of
resources, PTTPs involve numerous construction links, so there are strict requirements
for equipment, materials, etc., required for construction, and a large number of personnel
are involved in the construction process. Therefore, it is necessary to develop a scien-
tific and reasonable scheduling plan. In terms of resource scarcity, due to the fact that
PTTPs generally involve cooperation and coordination among multiple types of work,
it is necessary to make reasonable planning and configuration to ensure efficient and
orderly construction work. Therefore, for these special scenarios, scientific and reason-
able analysis and evaluation are required in order to develop scientifically effective plans
to optimize their construction period [4].
In the construction process of PTTPs, there are many different climatic environments
that need to be faced, so climatic conditions are one of the important factors that affect
the progress of the project. In terms of climate conditions, due to the fact that PTTPs are
generally located in the western region of China, there may be dry and rainy conditions in
this area, as well as frequent strong winds. Therefore, when carrying out the construction
of PTTPs in this area, it is necessary to make relevant preparations, analyze and predict
the weather conditions in detail, and develop corresponding response measures based
on the predicted results. In addition, due to the seasonal changes in temperature in the
area, it is necessary to make sufficient preparations for the construction of PTTPs in this
area. For example, when the temperature is low in winter, it is necessary to do a good
job of preventing cold and keeping warm. When the temperature is high in summer, it
is necessary to do a good job in heatstroke prevention and cooling. In short, climate
conditions are one of the important factors affecting the progress of PTTPs [5].
Resource scarcity refers to the inability of certain resources to be provided to con-
struction personnel in a timely manner due to various factors during the construction
process of PTTPs, which hinders the construction of the project. This kind of resource
scarcity usually occurs in the construction of large-scale PTTPs, such as the Three Gorges
Project in China. Due to the involvement of many different departments and units in the
project, many situations of resource scarcity are encountered during the construction
process. In the construction process of the Three Gorges PTTP, many large mechanical
equipment are used. If maintenance materials or accessories cannot be provided in a
timely manner in the event of a malfunction, it would affect the overall progress of the
project. Therefore, in the case of resource scarcity, reasonable planning and allocation are
necessary to ensure the smooth progress of the entire project, and personnel scheduling
work needs to be done well during the construction process [6].
In the construction process of PTTPs, cooperation and coordination among multiple
types of work are often involved to ensure efficient and orderly construction. In PTTP,
there are many types of work involved, including material transportation, equipment
installation, foundation construction, etc. The coordination and cooperation among var-
ious types of work have a significant impact on the construction progress of PTTPs. In
the coordination and cooperation among multiple types of work, this mainly includes the reasonable allocation of resources, the reasonable deployment of human resources, and information exchange and communication [7].
Fig. 1. Structure of the BP neural network for schedule optimization: input nodes (the current construction period of each task), hidden layer nodes (the schedule plans), and output nodes (insufficient construction period and resources).
In Fig. 1, the input node represents the current duration of each node, and all progress
plans form the hidden layer nodes in the network, while duration and resource shortages
form the output layer nodes. Therefore, the model consists of two parts: network nodes
and output layer nodes. During model training, each variable in the input layer is assigned
a value and then transferred to the output layer. Then, all outputs are normalized. Finally,
based on the normalization results, the network is adjusted to better match the actual
situation [9].
The article uses BP neural network to optimize critical paths and resource deficiencies
during the construction process. When dealing with actual project problems, the first step
is to input raw data into the network for training and testing. In order to better adapt
the network to actual situations, the raw data is first normalized and then input into the
neural network for training. During the training process, a BP neural network model is
first constructed based on the input data. Then, the network is learned and the learned
network is compared with the actual project progress. If the error between the two is
small, it indicates that the model has good predictive ability and robustness; on the
contrary, it indicates that the model may have some errors and needs to be improved.
Finally, whether the BP neural network can achieve the expected effect is tested through
training data.
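The workflow described above (normalize the raw data, train the BP network, and compare its output with the actual project progress) can be sketched as follows. Scikit-learn's MLPRegressor is used here as a stand-in for the BP network the authors describe, and the numerical values are placeholders rather than the paper's data.

import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import MinMaxScaler

# Placeholder raw data: each row is one task (current duration, resource level, safety).
X_raw = np.array([[180, 0.30, 0.80],
                  [160, 0.37, 0.81],
                  [190, 0.28, 0.82],
                  [170, 0.33, 0.79]])
y = np.array([176.0, 158.0, 186.0, 168.0])   # placeholder actual construction periods

X = MinMaxScaler().fit_transform(X_raw)      # normalize the raw data before training

# BP-style network: sigmoid activation, gradient descent with an adaptive learning rate.
bp_net = MLPRegressor(hidden_layer_sizes=(1,), activation='logistic',
                      solver='sgd', learning_rate='adaptive',
                      learning_rate_init=0.001, max_iter=5000, random_state=0)
bp_net.fit(X, y)

# A small error against the actual progress indicates good predictive ability.
error = np.abs(bp_net.predict(X) - y)
print(error)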
This model has three input layer nodes and one hidden layer node. The objective functions of the model are the minimum and maximum duration errors. In neural network training, the learning rate is an important parameter: if the learning rate is too high, the training process of the network becomes difficult and unstable, while if the learning rate is too low, the convergence speed of the network is too slow to achieve the expected effect. Therefore, the article adopts an adaptive learning rate for model training, which adjusts the step size of the weight updates as training progresses. The article selects 0.001, 0.005, and 0.050 as the range of learning rates [10].
After the model training is completed, the training data is normalized and then input
into the BP neural network toolbox for training. The input and output of a network model
can be represented by an array, where the variables in the array are used to represent
the results obtained from network training. For each input variable, the same number
of sine functions are selected for simulation input, and then the simulation output and
training data are compared to determine whether the network can achieve the expected
effect. The connection weights between output variables and hidden layer nodes are
implemented using the sigmoid function, as shown in Eqs. (1) and (2) [11].
T(x) = [δ(x) − δ(−x)] / [δ(x) + δ(−x)]    (2)
In Eqs. (1) and (2), x represents the input value; S represents the sigmoid function;
T is the Tanh function. The calculation parameters are shown in Table 1.
Table 1 contains four parameters, among which the weights from input layer to hidden
layer are [0.2, 0.4, 0.6]; the weights from hidden layer to output layer are [0.5, 0.7]; the
learning rate is 0.01; the bias is 0.5. Due to the special nature of PTTP construction,
there may be a problem of insufficient resources in actual construction. If the problem of
insufficient resources can be considered in the plan, it can effectively solve the problem
of project extension. Therefore, the article adopts resource scarcity as the input of the
output layer node, with resource scarcity as the objective function, and establishes a
schedule optimization model based on BP neural network. Construction tasks are divided
according to the job duration, that is, the job duration is the current duration of each
job task. The resource scarcity of each task is used as input, and genetic algorithms
are adopted to optimize the network. The specific steps are as follows: the network is
learned and trained. Firstly, the training samples are input into the network, and then the
network is adjusted based on the actual progress of the project. The adjusted network
can be in line with the project progress and has strong predictive ability.
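Purely for illustration, the parameters quoted from Table 1 can be plugged into a single forward pass of such a network. Interpreting the two hidden-to-output weights as feeding two output nodes (duration and resource shortage) is an assumption, and the input vector below is a made-up example of normalized durations.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):                                  # Tanh function, cf. Eq. (2)
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

w_in_hidden = np.array([0.2, 0.4, 0.6])       # input layer -> hidden layer (Table 1)
w_hidden_out = np.array([0.5, 0.7])           # hidden layer -> output layer (Table 1)
bias = 0.5                                    # bias (Table 1)

x = np.array([0.3, 0.6, 0.9])                 # example normalized inputs (not from the paper)
hidden = sigmoid(np.dot(w_in_hidden, x) + bias)
outputs = tanh(w_hidden_out * hidden + bias)
print(hidden, outputs)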
Through simulation testing, it is found that the trained model can well fit the actual
project progress curve. In addition, the article also finds an example that is quite similar to
the actual situation. Through simulation testing and analysis, it can be concluded that the
network has good predictive ability when the duration extension is greater than the target
duration. When the duration extension is less than the target duration, the network has
good robustness. In this example, both critical paths in the network require an extension
of time to meet the target duration requirements. However, in the actual project, the actual
construction period of these two critical paths is longer than the target construction
period.
The BP neural network model obtained through training can predict the planned
construction period of a certain PTTP, and predict the quantity of various resources
in the actual construction process to complete the task. The planned duration of this
project is 40 days. Among them, the work content of the first week is: setting out at
the construction site; the work content of the second week is: installing hardware; the
work content of the third week is: hoisting iron towers. There are two critical paths in
the project, namely path 1 and path 2.
The number of resources required for path 1 is relatively large, while path 2 requires a
smaller number of resources. In the actual construction process, there are 15 critical paths
on path 1 and 3 critical paths on path 2. Due to the close distance between path 1 and path
2, if the progress time relationship between path 1 and path 2 is considered separately,
the number of resources required for path 1 and path 2 is the same, but the distance
between path 1 and path 2 is relatively long. When considering the progress relationship
between the two paths, the impact of the difference in resource quantity between the two
paths on the construction period must be considered. The trained network is applied in
engineering projects and compared with actual project. Therefore, the six key paths 1,
2, 3, 4, 5, and 6 are added to the BP neural network for training. Through the trained
network, the number of resources and duration on paths 1 and 2 are predicted.
The article conducted testing experiments on the duration optimization model studied.
Firstly, relevant data under special construction scenarios of PTTPs were collected, as
shown in Table 2.
The dataset shown in Table 2 includes three elements: resource utilization, safety,
and construction period. Among them, the resource utilization rate of Project 1 was
30%; safety was 80%; construction period was 180 days. The resource utilization rate
of Project 3 was 37%; the safety was 81%; the construction period was 160 days. The
resource utilization rate of Project 4 was 28%; the safety was 82%; the construction
period was 190 days. The article then cleaned, filtered, and processed the collected data
to ensure its accuracy and completeness. Then, historical data was used to train the
constructed BP neural network model and adjust the model parameters to better fit the
actual data. Finally, the model was put into practical use and three results were obtained:
optimization days (Fig. 2), resource utilization rate (Fig. 3), and security (Fig. 4).
From Fig. 2, it can be seen that in actual use, the general optimization model reduced the construction period by a maximum of 1.9 days and a minimum of 1.2 days, with an average of 1.54 days; the optimization model based on the BP neural network reduced it by a maximum of 3.7 days and a minimum of 2.2 days, with an average of 3.12 days. The BP neural network-based optimization model can therefore achieve a greater reduction of the construction period of PTTPs in special construction scenarios.
From Fig. 3, it can be seen that in actual use, the resource utilization rate of the
general optimization model was the highest at 36.9% and the lowest at 33.5%, and the
calculated average utilization rate was 35.02%; the optimization model based on BP
neural network had a maximum resource utilization rate of 49.1% and a minimum of
43.4%, and the overall trend was gradually increasing. The calculated average utilization
rate was 46.08%. The BP neural network-based optimization model for the duration of
PTTPs can effectively improve resource utilization in special construction scenarios.
Fig. 4. Security
From Fig. 4, it can be seen that the general optimization model had a maximum
safety of 93.5% and a minimum safety of 90.3% in actual use, and the calculated average
safety was 92.08%; the optimization model based on BP neural network had a maximum
safety of 96.9% and a minimum safety of 94.7%, with an average safety of 95.72%
calculated. The BP neural network-based optimization model for the construction period
of PTTPs can effectively improve safety in special construction scenarios.
5 Conclusions
PTTP is an essential part of the power system, and it has important practical significance
for optimizing the construction period of PTTP. In previous studies, the main focus was
on analyzing the construction period issues in PTTPs. However, there has been no in-
depth research on the special scenarios in the construction process of PTTPs and the
importance of schedule optimization, and various uncertainties in the actual operation
process have not been taken into account. The article proposed a schedule optimization
model based on BP neural network, and proved its good practical application effect
through practical cases.
References
1. Li, J., Yu, J., Chen, C., Wang, B.: Construction period optimization model of highway bridge and culvert prefabricated components production workshop under the constraints of multiple factors. Project Manag. Technol. 21(10), 106–111 (2023)
2. Yu, Z., Chen, J., Korea, D.: Optimization of the construction period of the assembled steel
structure school. Construction 45(08), 1584–1586 (2023)
3. Chen, Z., Xia, Y., Wang, P., Wang, Z., Zhang, L.: Optimization of the construction period of
multi-resource ship maintenance projects based on variable neighborhood search algorithm.
Project Manag. Technol. 21(07), 129–134 (2023)
4. Zhu, D., Ma, C., Liu, L., Yuan, W.: Discussion on the preparation of soil and water conservation
plans for power transmission and transformation projects in sandy areas of Northern Shaanxi-
Taking a 110 kV transmission line project in Yuyang District Yulin City as an example.
Groundwater 45(05), 285–286 (2023)
5. Han, L., Qian, Y., Fang, J., Wu, Y., Zhang, Q.: Research on the characteristics and prevention
and control measures of soil erosion in power transmission and transformation projects in
loess hilly areas. Water Conservancy Hydropower Technol. (Chinese and English) 54(S2),
457–464 (2023)
6. Zheng, W., Wang, Z.: Control and analysis of the whole process of power transmission and
transformation engineering survey. Public Electricity 38(05), 54–57 (2023)
7. Li, X., Liu, Q., Jiang, S., Sun, B., Qian, F., Chen, Y.: The erosion characteristics of the slope
ditches of the accumulation body of the power transmission and transformation project in a
typical red soil area. Proc. Yangtze River Acad. Sci. 40(09), 61–67 (2023)
8. Song, S., Xiong, X., Wu, X., Xue, Z.: Modeling the SOFC by BP neural network algorithm.
Int. J. Hydrogen Energy 46(38), 20065–20077 (2021)
9. Jiang, K., Liu, M., He, C., Zhang, Y., Wang, Z., Simon, H.: Exploration on the optimization
of labor and duration of construction projects based on the Choquet integral method. J. Civ.
Eng. Manag. 40(02), 138–149 (2023)
10. Zhang, H., Liu, K., Rong, X., Liu, Z.: Optimization of the construction period of supporting
pipelines for municipal roads based on the key chain. Henan Sci. 41(03), 350–357 (2023)
11. Zhou, G., Pan, Z., Wei, Q., Zhao, J.: Railway engineering construction schedule planning
model and its algorithm optimization method. Ind. Eng. Manag. 28(03), 145–155 (2023)
Vision-Based Human Activity Recognition Using
CNN and LSTM Architecture
Neha Gupta, Payal Malik, Arun Kumar Dubey, Achin Jain, Sarita Yadav(B) ,
and Devansh Verma
Abstract. The growing use of technology has improved the quality of living. Artifi-
cial Intelligence (AI) is the field that aims to define how human intelligence is
mimicked by machines which are programmed to think or behave like humans.
Modern approaches and tools for evaluating human behavior have been made pos-
sible by modern advancements in the fields of machine learning (ML) and arti-
ficial intelligence (AI). Due to its applicability in several industries, comprising
of entertainment, security and surveillance, health, and intelligent environments,
human activity recognition has gained prominence significantly. Human activity
recognition (HAR) using video sensors typically involves analyzing the visual
data captured by cameras to classify and identify the actions of individuals. In
the following paper, we propose ConvLSTM and LRCN-based Human Action
Recognition. A large variety of videos from the publicly accessible UCF50 dataset, which comprises a wide range of activity classes, is used to build the model.
For the model proposed in this paper, the accuracy has turned out to be 94%, the
average f1-score is 0.93 and the average recall is calculated to be 0.925. The Loss
curve has also been plotted along with the accuracy curve for the proposed model
for recognizing human activities.
1 Introduction
Human Activity Recognition (HAR) deals with automated human physical activity iden-
tification [1]. The primary objective of HAR is to recognize any unusual activity or
incident and analyze human activities [14]. Unlike most species, humans are distin-
guished by their ability to perform different activities simultaneously and their ability
to learn new tasks and teach their offspring how to perform them. These activities can
range from as simple as walking to as complex as a pommel horse routine [16]. HAR is used
in various industries, including healthcare, entertainment, education, and security. It has
been gaining a lot of focus among researchers because of its multitude of applications
like video analytics, border infiltration detection, computer vision, biometrics, digital
libraries, video surveillance, and many more [14, 15, 17].
HAR systems are built and categorized into mainly three types of sensors - 1) vision
sensors, consisting of 3d cameras, depth cameras, skeleton analysis, etc., 2) environmen-
tal sensors, including device (smartphones) and wearable sensors, and 3) radar sensors,
which are the device-free wireless sensors. Deep learning (DL) has proven to be a good
approach for understanding HAR systems, but choosing the appropriate DL method can
prove to be a bit challenging task [1, 22]. Even though extensive research has been done
in the domain of Human Action Recognition, it is still taxing because of the limitations
like disturbance from the background, frequent change in brightness, relative motion
of the object, object occlusion, etc. [17]. Due to the intricacy of the tasks, the caliber
of the data being processed, high dimensionality, intraclass variability, and interclass
similarity, it may prove to be a difficult task [16].
Deep learning excels in automatically extracting characteristics that are appropriate
for the job at hand. It avoids relying on heuristic hand-crafted features and scales to more
complex behavior detection tasks [18]. Further, deep learning has an important value
in implementing self-learning as well as transfer learning [7]. Convolutional neural
networks (CNN), which are useful for extracting spatial information, are used in certain
deep learning systems, while others advise employing long short-term memories, which
are rich in temporal information. Deep feed-forward neural networks and their variations
are used by some approaches in this domain. Each of these approaches has its own pros
and cons [9].
Automatically identifying deep features is a strength of Convolutional Neural Net-
works (CNNs), Recurrent Neural Networks (RNNs), and Long Short-Term Memory
(LSTM) Networks. In contrast to CNN, which is a feedforward neural network, RNN
adds directed cycles to display dynamic temporal behavior. In other words, RNNs may
use time series data and “memory” to learn temporal correlations. LSTM networks can
supplement RNNs with more complicated memory cells, successfully resolving the issue
of long-term reliance on RNNs [19].
In this work, we propose a model built with the combination of CNN and
LSTM. Through this blend of Convolutional Neural Network (CNN) and Long Short-
Term Memory (LSTM), we can leverage the strength of CNN in feature extraction along
with the power of LSTM in terms of extracting temporal information among activities
[1]. The aim of this work is to create a more advanced artificial intelligence capable
of real-time recognition of activities of daily living (ADLs) such as running, jogging,
walking, etc. [20]. To be able to determine the efficiency and robustness of our proposed
model, we have performed experiments using the UCF-50 dataset, which consists of 50
daily activity classes like Basketball, Diving, Horse Race, Kayaking, Push Ups, Swing,
Walking with Dog, etc. (Fig. 1).
A typical mechanized system of HAR consists of data acquisition, activity detec-
tion, modeling, and finally, classification. The Human Activity Recognition framework
comprises four major sections:
• Data collection for a dataset based on vision.
2 Related Work
Human Activity Recognition has emerged as one of the most studied computer vision
problems and a very impactful field of research. It is also a very challenging problem
of time series categorization that involves the prediction of an individual’s movement
and actions utilizing a series of data collected by sensors. HAR is traditionally based
on deep learning techniques and along with that it requires signal processing as well as
processes to carefully construct features out of the raw data collected to suit the best
machine learning model. Over the past years, there has been a rapid advancement in the
field of technology and various human activity recognition models have been created
to aid in the automation of visual monitoring systems. Numerous surveys have been
done which summarize various research work done on several different methodologies
in the field of action recognition. Hussain et al. [4] elaborately discuss various
categorizations of techniques used in the HAR models. The series of data collected
by sensors for action recognition can be broadly categorized into three major sensor
approaches namely Vision based sensors, Radio-Frequency sensors, and Environment
sensors. They thoroughly describes various sensor-based approaches in the field of action
detection and focuses on device-free human activity recognition categories along with the
various comparison metrics to evaluate the created model on the accuracy, technology,
loss, approach, cost, latency, and much more.
Singh et al. [3] assert that when compared to conventional CCTV motion detection
systems, vision based HAR techniques can provide an end-to-end automated home mon-
itoring system with an accuracy of more than 93%. We can increase the system’s accuracy
and get notably better results by employing the LRCN (Long Term Recurrent Convolu-
tional Network) method. According to Yu and Yan [7], background noise, changes in perspective, and other complexities affect the prediction of the model. To deal with
these issues, three algorithms were designed, namely, two-stream CNN, CNN+LSTM,
and 3D CNN. Mutegeki et al. [9] state that CNN and LSTM have both been the subject of in-depth inquiry in isolation, and therefore present a CNN-LSTM classifier that improves the accuracy of identifying the action performed while reducing the model's complexity. Combining CNN with LSTM has the potential to both simplify
the model and increase the forecast accuracy of human actions from raw data. The aim
of Human Activity Recognition (HAR) analysis is to identify the activity a
device’s user is carrying out by employing sensors like the accelerometer, gyroscope,
magnetometer, and others that are incorporated into IMU devices and smartphones.
Hu et al. [12] stated that, due to its ability to extract temporal information, the Long
Short-Term Memory (LSTM) network is a standard activity detection technique and
suggested enhancing the input differential feature module and expanding the network
to include a spatial memory state differential module to produce the enhanced Spatio-
Temporal Differential Long Short-Term Memory (ST-D LSTM) network. Additionally,
an ST-D LSTM transmission mode is put forth, allowing for the horizontal transfer of
the spatial memory state using ST-D LSTM units.
To verify the effectiveness of the new network, these enhancements are lastly applied
to traditional Long-term Recurrent Convolutional Networks (LRCN). A hybrid architec-
ture highlighting the features of both Convolutional Neural Networks (CNN) as well as
Long Short-Term Memory (LSTM) was also presented by Deep and Zheng [8], which was
tested on the UCI-HAR dataset comprising sensor data collected through an accelerom-
eter and gyroscope in a smartphone. They performed the same experiment on the LSTM
model but found that the accuracy of the CNN-LSTM model was better than that of the LSTM alone.
Semwal et al. [10] provided a generic activity recognition framework as an ensemble learning implementation, in which the combined average results from several classifiers were used to tune performance. In our current research, we propose
a model for action recognition employing the UCF-50 dataset and two algorithms: the
Longterm Recurrent Convolutional Network (LRCN) along with a Convolutional Neural
Network (CNN) combined with Long-Short Term Memory (LSTM).
In this paper, we use the UCF-50 data set, a collection of 50 action categories that
altogether comprises 6676 realistic videos. A survey published by Ke et al. [2] is an excellent resource that undertakes a thorough review of vision-based models for Human Activity Recognition. Ke et al. [2] discuss the numerous domains of
applications of HAR models along with the three crucial phases of processing which
include activity detection and classification techniques, human object segmentation,
feature extraction, and representation. Three aspects of human action recognition are
discussed: foundational technology, human action detection frameworks, and represen-
tation of applications from lower levels to higher levels. Pareek and Thakkar [11] provide an overview of the methods currently in use for HAR on trimmed videos and discuss
an action recognition task’s comprehensive framework which entails feature extraction,
feature encoding, dimensionality reduction, action classification, and various action clas-
sification methods, as well as the beneficial and adverse characteristics of the mentioned
tasks, along with readily accessible data sets (Fig. 2).
Fig. 2. Sensor categories used in HAR systems: vision sensors, environmental sensors, and RF sensors.
3 Dataset
4 Proposed Methodology
Some of the best-known models for computer vision in deep learning are CNNs. CNN
architecture can be compared to the visual cortex of the human brain. Filters can be used to
extract spatial or temporal features and identify objects from input images. Convolution
layers are made up of filters, and some fully linked layers are used for classification.
CNNs are not only adept at learning features with the help of pooling layers but are also good at scaling to massive datasets. Lowering the dimensionality of the incoming data and extracting the prevalent attributes is the main aim of pooling [1]. In another study [22], a network structured as a stack of multiple layers is regarded as a CNN. The
architecture comprises two major components: 1) an interconnected network and 2)
several sampling and convolutional layers. The latter is used in feature extraction, while
the former is essential for studying classification weights. A standard CNN consists of
3 layers: 1. Convolutional layer, 2. Pooling layer, 3. Fully connected layer.
Filters in the convolutional layer, also known as the feature map, assist in extracting
the local features. A single feature map is generated with the help of one filter. The dot
product must be created by swiping filters over the input data to create a feature map.
This process is termed a convolution operation. Every neuron inside the feature map is
connected with a tiny part of the input data, called the receptive field, whose size is the
same as that of the filter. For achieving efficient results, the number of parameters is
further reduced because the weights of the neurons in a feature map are shared.
Further, detection and recognition of certain patterns, regardless of their position in the
input, also becomes possible. The size of the feature map is determined by the filter size and the stride.
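As a brief illustration (the formula below is the standard convolution output-size rule, not quoted from the paper), the feature-map size can be computed from the input width W, filter size F, stride S, and padding P:

def conv_output_size(w, f, s, p=0):
    # Standard formula: output width = (W - F + 2P) / S + 1
    return (w - f + 2 * p) // s + 1

# Example: a 64x64 frame convolved with a 3x3 filter at stride 1 and no padding
# yields a 62x62 feature map.
print(conv_output_size(64, 3, 1))   # 62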
5 Proposed Model
The proposed model combines two architectures, ConvLSTM and LRCN. Initially, the UCF-50 data set, comprising 50 activity classes,
is preprocessed for data training. The frames of the videos are first resized, normalized,
and then extracted. In addition to that the feature extraction is done on the extracted
frames. The dataset is then shuffled and split into train i.e., 75%, and test i.e., 25%
of the total videos. LSTM layers are employed for temporal detection, whereas CNN
aids in spatial extraction of the frames. ConvLSTM cells are LSTM network variants
that include convolutional processes. It is an LSTM with built-in convolution, which
enables it to distinguish between spatial input components and take into consideration the
temporal relationship. As a result, the ConvLSTM can take in 3D input, whereas a standard LSTM takes in only 1D input. The number of epochs considered is 50, with the
initial number of filters and batch size equal to 4 each. With every fusion, the number
of filters keeps on increasing while with every pooling, the size of filters keeps on
decreasing. A time-distributed layer is used with a 20% dropout rate. A softmax classifier is used for classification. For optimization, the Adam optimizer is used. The output of the ConvLSTM2D layer is flattened and fed to Dense layers with the tanh activation function.
MaxPooling3D layers and Dropout layers are used for the reduction of the dimensions
of the frames and to prevent overfitting of the model (Fig. 3).
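A minimal Keras sketch of such a ConvLSTM pipeline is given below. The frame size, sequence length, and exact number of ConvLSTM2D blocks are illustrative assumptions; only the elements named in the text (tanh activation, 20% dropout, MaxPooling3D, a flattened output into a softmax Dense layer, the Adam optimizer, 50 epochs, and a batch size of 4) are taken from the description above.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (ConvLSTM2D, MaxPooling3D, TimeDistributed,
                                     Dropout, Flatten, Dense)

SEQ_LEN, IMG_H, IMG_W, NUM_CLASSES = 20, 64, 64, 50   # assumed preprocessing values

model = Sequential([
    ConvLSTM2D(4, (3, 3), activation='tanh', recurrent_dropout=0.2,
               return_sequences=True, input_shape=(SEQ_LEN, IMG_H, IMG_W, 3)),
    MaxPooling3D(pool_size=(1, 2, 2), padding='same'),
    TimeDistributed(Dropout(0.2)),
    ConvLSTM2D(8, (3, 3), activation='tanh', recurrent_dropout=0.2,
               return_sequences=True),
    MaxPooling3D(pool_size=(1, 2, 2), padding='same'),
    TimeDistributed(Dropout(0.2)),
    ConvLSTM2D(16, (3, 3), activation='tanh', recurrent_dropout=0.2,
               return_sequences=True),
    MaxPooling3D(pool_size=(1, 2, 2), padding='same'),
    Flatten(),
    Dense(NUM_CLASSES, activation='softmax'),
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# model.fit(train_clips, train_labels, epochs=50, batch_size=4, validation_split=0.25)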
6 Result
Our proposed model is studied using various evaluation metrics. For some of the activities
from the specified 50 classes of the UCF-50 dataset, recall, f1 scores, and precision have
been calculated to identify the positive predictive value and the sensitivity. Visual results
showing the values of activities are given in Table 1. For attaining more clarity on the
values, we have considered only 4 activities, present on the leftmost column of the table,
namely Swing (denoted by 0), Horse Race (denoted by 2), Diving (denoted by 3), and
lastly, Basketball (denoted by 4). As shown in the table, the precision of all the activities
on average is around 87.5%. The f1-score for the activities - Swing, Horse race, Diving,
and Basketball is 0.67, 1.00, 1.00, and 1.00 respectively, while the recall is 1.00, 1.00,
1.00, and 0.50 respectively. The magnitude of these values proves the effectiveness of
our proposed technique. The class-wise performance for each activity present in the
dataset can be evaluated in a similar manner. The accuracy of the model is 94%. It is
safe to say that the convLSTM approach works well for vision based human activity
recognition. The graphs in Fig. 4(a) show the comparison between the testing accuracy
and the validation accuracy. From the graph, we can conclude that after the complete
training, the model accuracy does not decrease by a lot for validation or new data that is
fed into the model. This shows us that the model is quite robust and is not overfitting in
any manner. From the above graph Fig. 4(b), we can conclude that the false positives of
our model are quite low as the losses decrease gradually. And the losses for the validation
data are also very comparable to the total losses for the testing data.
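For reference, per-class precision, recall, and F1 values such as those in Table 1 can be computed with scikit-learn; the label lists below are placeholders for illustration, not the actual predictions of the model.

from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Placeholder class indices (0 = Swing, 2 = Horse Race, 3 = Diving, 4 = Basketball).
y_true = [0, 2, 3, 4, 0, 2, 3, 4]
y_pred = [0, 2, 3, 4, 0, 2, 3, 0]

precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average='macro', zero_division=0)
print(accuracy_score(y_true, y_pred), precision, recall, f1)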
Table 2 presents the benchmarking table showing the comparative study of six dif-
ferent types of AI models, each with their respective F1 scores, Recall, Precision, and
Accuracy metrics. The authors of these models are also listed in the table. Our model’s
performance is also included, with an F1 score of 93.05%, Recall of 92.5%, Precision of
93.6%, and Accuracy of 94%. The results of the comparison highlight the effectiveness
of the models in comparison to each other, with our model outperforming the KNN [26]
model in several key metrics. The table provides valuable insight into the state of AI
models [27] and their performance, allowing researchers to make informed decisions
when choosing the most appropriate model for their specific needs.
Fig. 4. (a) Total Accuracy vs Total Validation Accuracy curve. (b) Total Loss vs Total Validation
Loss curve
Table 2. Benchmarking
7 Conclusion
References
1. Abbaspour, S., Fotouhi, F., Sedaghatbaf, A., Fotouhi, H., Vahabi, M., Linden, M.: A com-
parative analysis of hybrid deep learning models for human activity recognition. Sensors 20,
5707 (2020)
2. Ke, S.R., Thuc, H.L.U., Lee, Y.J., Hwang, J.N., Yoo, J.H., Choi, K.H.: A review on video-based
human activity recognition. Computers 2(2), 88–131 (2013)
3. Singh, J., Sinha, S.: Video Based Human Activity Recognition Surveillance System
4. Hussain, Z., Sheng, Q.Z., Zhang, W.E.: A review and categorization of techniques on device-
free human activity recognition. J. Netw. Comput. Appl. 167, 102738 (2020)
5. Gowda, S.N., Rohrbach, M., Sevilla-Lara, L.: Smart frame selection for action recognition. In:
Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 2, pp. 1451–1459
(2021)
6. Mishra, S.R., Mishra, T.K., Sanyal, G., Sarkar, A., Satapathy, S.C.: Real time human action
recognition using triggered frame extraction and a typical CNN heuristic. Pattern Recogn.
Lett. 135, 329–336 (2020)
7. Yu, Z., Yan, W.Q.: Human action recognition using deep learning methods. In: 2020 35th
International Conference on Image and Vision Computing New Zealand (IVCNZ), pp. 1–6.
IEEE (2020)
8. Deep, S., Zheng, X.: Hybrid model featuring CNN and LSTM architecture for human activity
recognition on smartphone sensor data. In: 2019 20th International Conference on Parallel
and Distributed Computing, Applications and Technologies (PDCAT), pp. 259–264. IEEE
(2019)
9. Mutegeki, R., Han, D.S.: A CNN-LSTM approach to human activity recognition. In:
2020 International Conference on Artificial Intelligence in Information and Communication
(ICAIIC), pp. 362–366. IEEE (2020)
10. Semwal, V.B., Gupta, A., Lalwani, P.: An optimized hybrid deep learning model using ensem-
ble learning approach for human walking activities recognition. J. Supercomput. 77(11),
12256–12279 (2021)
11. Pareek, P., Thakkar, A.: A survey on video-based human action recognition: recent updates,
datasets, challenges, and applications. Artif. Intell. Rev. 54(3), 2259–2322 (2021)
12. Hu, K., Zheng, F., Weng, L., Ding, Y., Jin, J.: Action recognition algorithm of spatio-temporal
differential LSTM based on feature enhancement. Appl. Sci. 11(17), 7876 (2021)
13. Donahue, J., et al.: Long-term recurrent convolutional networks for visual recognition and
description. In: Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, pp. 2625–2634 (2015)
14. Sansano, E., Montoliu, R., Belmonte Fernandez, O.: A study of deep neural networks for
human activity recognition. Comput. Intell. 36(3), 1113–1139 (2020)
15. Tasnim, N., Islam, M.K., Baek, J.H.: Deep learning based human activity recognition using
spatio-temporal image formation of skeleton joints. Appl. Sci. 11(6), 2675 (2021)
16. Hernández, F., Suárez, L.F., Villamizar, J., Altuve, M.: Human activity recognition on smart-
phones using a bidirectional LSTM network. In: 2019 XXII Symposium on Image, Signal
Processing and Artificial Vision (STSIVA), pp. 1–5. IEEE (2019)
17. Kushwaha, A., Khare, A., Srivastava, P.: On integration of multiple features for human activity
recognition in video sequences. Multimed. Tools Appl. 80(21), 32511–32538 (2021)
18. Murad, A., Pyun, J.Y.: Deep recurrent neural networks for human activity recognition. Sensors
17(11), 2556 (2017)
19. Peng, L., Chen, L., Ye, Z., Zhang, Y.: Aroma: a deep multi-task learning based simple and
complex human activity recognition method using wearable sensors. Proc. ACM Interact.
Mob. Wearable Ubiquit. Technol. 2(2), 1–16 (2018)
20. Almaslukh, B., Artoli, A.M., Al-Muhtadi, J.: A robust deep learning approach for position-
independent smartphone-based human activity recognition. Sensors 18(11), 3726 (2018)
21. Wan, S., Qi, L., Xu, X., Tong, C., Gu, Z.: Deep learning models for real-time human activity
recognition with smartphones. Mob. Netw. Appl. 25(2), 743–755 (2020)
22. Mekruksavanich, S., Jitpattanakul, A.: Biometric user identification based on human activity
recognition using wearable sensors: an experiment using deep learning models. Electronics
10(3), 308 (2021)
23. Agarwal, P., Alam, M.: A lightweight deep learning model for human activity recognition on
edge devices. Procedia Comput. Sci. 167, 2364–2373 (2020)
24. Mohsen, S., Elkaseer, A., Scholz, S.G.: Human activity recognition using k-nearest neighbor
machine learning algorithm. In: Scholz, S.G., Howlett, R.J., Setchi, R. (eds.) KES-SDM 2021.
SIST, vol. 262, pp. 304–313. Springer, Singapore (2022). https://doi.org/10.1007/978-981-
16-6128-0_29
25. Bokhari, S.M., Sohaib, S., Khan, A.R., Shafi, M.: DGRU based human activity recognition
using channel state information. Measurement 167, 108245 (2021)
26. Paramasivam, K., Sindha, M.M.R., Balakrishnan, S.B.: KNN-based machine learning clas-
sifier used on deep learned spatial motion features for human action recognition. Entropy
25(6), 844 (2023)
27. Khan, Y.A., Imaduddin, S., Singh, Y.P., Wajid, M., Usman, M., Abbas, M.: Artificial intelli-
gence based approach for classification of human activities using MEMS sensors data. Sensors
23(3), 1275 (2023)
ML-Based Rupture Strength Assessment
in Cementitious Materials
1 Introduction
7 days of curing. Moreover, it heavily relies on a considerable workforce for the creation
of concrete cubes and related tasks. Consequently, the conventional method emerges as
a time-consuming, labor-intensive, and cost-intensive process, rendering it unsuitable
for real-time monitoring of concrete strength in ongoing construction projects.
To overcome these limitations, our proposed system leverages the capabilities of
machine learning to predict the rupture strength of concrete based on its mix design.
This innovative approach promises to address the drawbacks of the existing system by
providing a faster, cost-effective, and less labor-intensive alternative to manual testing.
Importantly, our system introduces real-time monitoring of concrete strength during
construction projects. This feature enables timely interventions and adjustments to mix
designs, enhancing the adaptability of construction processes. The predictive accuracy
of our machine learning model can be continuously refined as more data is incorporated,
making it a versatile and adaptive tool for optimizing concrete mix designs and reducing
material wastage.
In this paper, we provide a comprehensive exploration of the methodology employed
in incorporating machine learning techniques to predict concrete rupture strength. Our
aim is to highlight the transformative potential of this approach within the construction
industry, moving towards more efficient and resource-conscious practices.
The paper begins by highlighting the challenges posed by traditional concrete rupture
strength assessment methods, emphasizing their resource-intensive and time-consuming
nature. It introduces a novel approach employing machine learning to predict concrete
rupture strength based on mix design parameters, offering a faster and cost-effective
alternative. The methodology section details the step-by-step process, from data collec-
tion to model refinement. Comparative performance metrics of regression models are
presented in the results section. The user-friendly web interface is outlined, enabling
practical implementation. The conclusion summarizes key findings and suggests future
research directions.
2 Literature Survey
Veeresh Karikatti and colleagues [1] emphasize the importance of mix composition,
curing conditions, and the availability of mixes in achieving an ideal concrete mix with
a target compressive strength of 40 N/mm². They underscore the significance of preparing
the alkaline solution in advance to enhance cement adhesion and overall strength.
Additionally, they highlight the need to explore various mix proportions to efficiently
deliver the specified compressive strength. This systematic approach optimizes the use
of available mixes, eliminates the need for concrete cube creation, and reduces costs.
Vimal Rathakrishnan and team [2] focus on boosting machine learning (BML) algorithms
to enhance the prediction accuracy of models for concrete rupture strength. They compare
BML models with Artificial Neural Network (ANN) models, highlighting the strengths and
weaknesses of each. The authors also stress the importance of hyperparameter tuning to
improve predictive power. These efforts aim to enhance the reliability and robustness of
the models for predicting concrete strength behavior.
Mayur Badole and co-authors [3] delve into machine learning algorithms for concrete
strength prediction. They emphasize the need for high-quality training data.
3 Proposed System
3.1 Problem Statement
“Develop an efficient and accurate machine learning model to predict concrete rupture
strength based on mix design parameters, optimizing construction industry practices.”
3.2 Objectives
1. To develop an efficient machine learning model for precise concrete rupture strength
prediction.
2. To reduce labor and testing costs through machine learning-based forecasting.
3. To optimize concrete mix designs for desired rupture strength with minimal material
wastage.
4. To enable real-time, practical applications of machine learning models for concrete
quality assessment.
5. To ensure predicted rupture strength meets safety standards and desired specifications.
4. Data Processing: After the user submits the input data, the web server receives and
processes the information.
5. Model Data Transfer: The web server forwards the input data to the machine learning
model for analysis.
6. Prediction Generation: Leveraging the provided input data, the machine learning
model uses its predictive capabilities to generate a rupture strength prediction.
7. Model Response: The model formulates a response based on the analyzed data and
predicted results.
8. Response Transfer: The model communicates the results back to the web server for
further handling.
9. User Result Delivery: The web server delivers the predicted rupture strength results
to the user, ensuring a seamless user experience.
10. User Decision: The user can make informed decisions or take further actions based
on the provided rupture strength prediction.
These steps outline the interaction between the user, web interface, and the machine
learning model, allowing users to obtain valuable insights into concrete rupture strength
efficiently and conveniently.
• Collect a diverse dataset that includes concrete mix designs and their corresponding
rupture strengths.
• The dataset should include details on the composition of the concrete, including
components and proportions such as Cement, Fly Ash, Blast Furnace Slag, Coarse
Aggregate, Fine Aggregate, Water, and Age (in days).
• Sample data set is shown in Fig. 2
• Split the pre-processed dataset into two subsets: a training set and a testing set.
• The training set is used to teach the machine learning model the patterns and
relationships in the data.
• The testing set is reserved for evaluating the model’s performance and generaliza-
tion.
• Train a random forest regression model using the scikit-learn library in Python (a minimal code sketch is given after this list).
• Tune the hyperparameters of the random forest regression model using techniques
like grid search and cross-validation.
• Adjust hyperparameters such as the number of decision trees, tree depth, and the
number of features considered at each split.
• Analyze these measures to determine how well the model approximates the actual
rupture strength values.
• This allows for easy loading and usage of the model in various applications,
including web applications or production environments.
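As a minimal sketch of the steps above (assuming the mix-design data is available as a CSV file; the file name and column names below are placeholders rather than those of the actual study), the model can be trained, tuned, evaluated, and saved roughly as follows:

import joblib
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Hypothetical dataset: mix-design components plus the measured rupture strength.
data = pd.read_csv("concrete_mix.csv")
X = data[["Cement", "FlyAsh", "BlastFurnaceSlag", "CoarseAggregate",
          "FineAggregate", "Water", "Age"]]
y = data["RuptureStrength"]

# Split the pre-processed dataset into training and testing subsets.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Tune the main hyperparameters (number of trees, tree depth, features per split)
# with grid search and cross-validation.
param_grid = {"n_estimators": [100, 300, 500],
              "max_depth": [None, 10, 20],
              "max_features": ["sqrt", 1.0]}
search = GridSearchCV(RandomForestRegressor(random_state=42), param_grid,
                      cv=5, scoring="neg_mean_squared_error")
search.fit(X_train, y_train)
model = search.best_estimator_

# Evaluate how well the model approximates the actual rupture strength values.
pred = model.predict(X_test)
print("MSE:", mean_squared_error(y_test, pred))
print("MAE:", mean_absolute_error(y_test, pred))
print("R2 :", r2_score(y_test, pred))

# Persist the trained model for later use, e.g. in a web application.
joblib.dump(model, "rfr_rupture_strength.joblib")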
Table 1. Algorithm
– Monitoring and improving the model’s performance over time to maintain its
relevance and effectiveness.
These steps outline the process of developing and deploying a machine learning
model for predicting the rupture strength of concrete based on mix design parameters
(Table 1).
Model Selection
After analyzing the Score, MSE, R-squared, and MAE metrics, it was determined that
the Random Forest Regressor (RFR) was the best model for predicting rupture strength.
Figure 6 shows the comparative results.
Web Interface for Accessibility:
• A simple web interface was designed to enable non-experts in data analysis and civil
engineering to utilize the model.
• Using the RFR model, users can input relevant factors through the interface and
receive accurate predictions.
• For simplicity, input variables are categorized and labeled in clear language.
• Clear instructions on how to use the web interface and interpret the results are
provided.
• Web Interface Results: The following Fig. 7 shows web interface results.
The web interface allows users to enter input values, which are then transmitted to
the backend where the RFR model resides.
The model makes predictions, and the resulting rupture strength of concrete is
displayed to the user on the results page as depicted in Fig. 8.
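A minimal sketch of such an interface is given below, assuming a Flask backend; the framework choice, route, template names, and feature names are illustrative assumptions, as the paper does not specify them.

import joblib
from flask import Flask, render_template, request

app = Flask(__name__)
# Load the model persisted during training (hypothetical file name).
model = joblib.load("rfr_rupture_strength.joblib")

FEATURES = ["Cement", "FlyAsh", "BlastFurnaceSlag", "CoarseAggregate",
            "FineAggregate", "Water", "Age"]

@app.route("/", methods=["GET", "POST"])
def predict():
    if request.method == "POST":
        # Read the mix-design values entered in the web form.
        values = [[float(request.form[name]) for name in FEATURES]]
        strength = model.predict(values)[0]
        # Display the predicted rupture strength on the results page.
        return render_template("result.html", strength=round(strength, 2))
    return render_template("form.html", features=FEATURES)

if __name__ == "__main__":
    app.run()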
References
1. Karikatti, V., Chitawadagi, M.V., Patil, I.S., Sanjith, J., Mahesh Kumar, C.L., Kiran, B.M.:
ANN model for predicting compressive strength of alkali activated slag concrete cured at envi-
ronmental temperature. NeuroQuantology 20(1) (2022). https://doi.org/10.14704/nq.2022.
20.6.NQ22044
2. Rathakrishnan, V., Bt. Beddu, S., Ahmed, A.N.: Predicting compressive strength of high-
performance concrete with high volume ground granulated blast-furnace slag replacement
using boosting machine learning algorithms. ResearchGate (2022). https://doi.org/10.1038/
s41598-022-12890-2
3. Shaqadan, A.: Prediction of concrete mix strength using random forest model. Int. J. Appl.
Eng. Res. 11(22), 11024–11029 (2016). ISSN 0973-4562
4. Shafiq, M.A.: Predicting the compressive strength of concrete using neural network and kernel
ridge regression. ResearchGate (2016). https://doi.org/10.1109/FTC.2016.7821698
5. Wan, Z., Xu, Y., Savija, B.: On the use of machine learning models for prediction of compres-
sive strength of concrete: influence of dimensionality reduction on the model performance.
ResearchGate, Materials 14, 713 (2021). https://doi.org/10.3390/ma14040713
6. Mai, H.-V.T., Nguyen, T.-A., Ly, H.-B., Tran, V.Q.: Prediction compressive strength of
concrete containing using random forest model. Hindawi Volume (2021)
7. Ahmad, A., Ahmad, W., Aslam, F., Joyklad, P.: Compressive strength prediction of fly ash-
based geopolymer concrete via advanced machine learning techniques. Elsevier (2021)
8. Khursheed, S., Jagan, J., Samui, P., Kumar, S.: Compressive strength prediction of fly ash con-
crete by using machine learning techniques. Innov. Infrastruct. Solutions 6(3), 1–21 (2021).
https://doi.org/10.1007/s41062-021-00506-z
9. Song, H., Ahmad, A., Farooq, F., Ostrowski, K.A.: Compressive strength prediction of fly
ash concrete by using machine learning techniques. ResearchGate, Materials (2021)
10. Ahmad, A., Farooq, F., Niewiadomski, P., Ostrowski, K.: Prediction of compressive strength
of fly ash based concrete using individual and ensemble algorithm. ResearchGate, Materials
(2021). https://doi.org/10.3390/ma14040794
11. Mohana, M.H.: The determination of ground granulated concrete compressive strength-based
machine learning models. ResearchGate 8(2), 1011–1023 (2020)
12. Ling, H., Qian, C., Kang, W., Liang, C., Chen, H.: Combination of support vector machine and
K-fold cross validation to predict compressive strength of concrete in marine environment.
ResearchGate (2019)
Investigation of Power Consumption
of Refrigeration Model and Its Exploratory Data
Analysis (EDA) by Using Machine Learning
(ML) Algorithm
1 Introduction
HVAC systems are becoming more and more necessary for industrial facilities, residen-
tial complexes, and commercial buildings in India as its metropolitan areas grow. Perish-
able food product supply chains have grown in tandem with the majority of industrialized
and developing countries’ increased usage of refrigeration over the past few decades.
These days, global supply chains for chilled or frozen food products, sometimes known
as the “cold chain,” handle billions of tons of food [1]. Throughout postharvest supply
chains, one third to half of all food produced is lost or wasted globally, with value chains
related to transportation, storage, and packaging being the most affected [2, 3]. Globally,
between 40% and 50% of fruits and vegetables are lost, with 54% of those losses hap-
pening during the production, handling, and storage phases after harvest [4, 5]. Greater
energy consumption brought on by urbanization necessitates the use of effective HVAC
systems. Rising energy prices and environmental concerns in India are making energy-
efficient HVAC systems more and more necessary. Thus, refrigerator energy efficiency
has become a crucial factor in the effective management of electrical energy in the house
in a world where resource depletion and global warming are at alarming levels [7]. HVAC
systems that use less energy can save operational expenses and energy consumption by
a large margin. It is possible to forecast when HVAC components, including fans and
compressors, are likely to break down using machine learning algorithms. Downtime can be
minimized and expensive emergency repairs can be avoided by scheduling maintenance
proactively using real-time sensor data and historical data analysis. Demand response
programs, in which HVAC systems are adjusted in response to grid demand, can benefit
from machine learning. Systems can be temporarily changed to lower electricity usage
during periods of high energy demand, which helps maintain grid stability. Machine
learning (ML) is becoming increasingly essential in the HVAC (Heating, Ventilation,
and Air Conditioning) industry due to its potential to enhance system efficiency, opti-
mize performance, and improve overall energy management. Thermal energy storage
is used to store and release thermal energy for various purposes, including improving
energy efficiency and managing energy demand. Currently, thermal energy storage and
utilization is focused only on a few areas such as building applications, refrigeration, the
cold chain, and some industrial applications, but TES technology can be adopted for a
wide range of applications, as shown in Fig. 1. As the most effective way to store food,
refrigerators play a significant role in the home appliance market. To represent the linear
relationship between the predictor and response variables, a linear regression ML algorithm
is employed. Python is used to implement this, with a Google Colab notebook serving as
the front end. The normalized training dataset is utilized to train the linear regression
model. Optimal values of the response variable are determined by adjusting different
hyperparameters.
• TES systems enable the system to run during off-peak hours, when electricity is
more affordable and easily accessible, thereby increasing the energy efficiency of
refrigeration processes. In order to lower overall operating costs, the system can store
excess cooling capacity when energy costs are low and discharge it when energy costs
are high.
• Refrigeration systems’ peak loads can be efficiently managed by TES systems. They
lessen the need for additional chillers or compressors by storing excess cooling capac-
ity during times of low demand and releasing it during times of peak demand. This
helps prevent overloading the refrigeration equipment.
• Temperature control in refrigeration applications can be made more accurate and
consistent with the use of TES systems. This is significant for a number of industries,
including food storage, where the safety and quality of products depend on constant
temperature maintenance.
Maintaining ideal food storage temperatures is necessary for the transport systems
that run these cold chain systems in order to guarantee maximum safety and premium
shelf life. Because cooling systems are so good at keeping food, there is a growing con-
sideration of their sustainability. It is necessary to approach the sustainability of a cold
chain system from the perspective of overall cost [7]. PCMs are a class of latent heat
storage materials that store energy by utilizing the solid-to-liquid phase transition. The
large volume change makes the liquid/gas and solid/gas phase transitions less desirable
from a technical standpoint. According to Sharma et al.’s overview [8], PCMs can be
broadly divided into three groups: organic, inorganic, and eutectic materials. When com-
pared to sensible heat storage (SHS), latent heat storage (LHS) has a higher volumetric
thermal energy storage capacity. Because less storage volume is needed, using LHS is
found to be more appealing and competitive in many applications [9]. Thermal energy
storage (TES) systems can be classified into several categories based on the method used
to store and release thermal energy as shown in Fig. 2.
In order to store or release thermal energy, latent heat storage entails a material’s
phase transition. Latent heat storage is accomplished with phase change materials, or
PCMs. Because of their high heat of fusion, they have the capacity to store and/or
release large amounts of energy during phase transitions. Chemical reactions are used
in thermochemical storage systems to store and release thermal energy. The foundation
of a latent heat storage system is the heat that is absorbed or released when a storage
material changes phases from solid to liquid, liquid to gas, or vice versa. Its high storage
energy density per unit mass in a quasi-static process makes it extremely appealing [10].
There are large numbers of PCMs as shown in Fig. 3 that melt and solidify at a wide
range of temperatures, making them attractive in a number of applications. Hydrocar-
bon compounds like paraffin waxes make up these PCMs. Their moderate heat storage
capacity and relatively low cost make them widely used in a variety of applications.
Inorganic PCMs, such as sodium sulfate decahydrate and calcium chloride hexahydrate,
are made of salt crystals that have incorporated molecules of water. The hydrate crystals
disintegrate into a lower hydrate and water or into anhydrous salt and water at the melting
point. Sharp melting and solidification points are made possible by the eutectic compo-
sition of the various salts or other compounds that are combined in these mixtures. Since
eutectic crystals freeze into a close-knit mixture of crystals, they almost always melt
and freeze without segregating, leaving little [11]. Refrigeration is essential because it
increases the shelf life of perishable foods and gives consumers access to safe, high-
quality organoleptic food. In fact, perishable food changes like microorganism growth,
ripening rates, browning reactions, or water loss occur less quickly at low temperatures.
Perishable food product supply chains have grown to keep up with the expansion of
refrigeration use over the past few decades in the majority of developed and developing
nations. In order to guarantee product quality for consumers and to lower food waste and
health hazards, temperature control in the cold chain is crucial [9]. It is anticipated that
cold chains will make extensive use of wireless temperature sensors and data transmis-
sion, which will yield a lot of data. This enables the compressed sensing method to assist
in the real time temperature analysis. This makes it possible for automatic warning sys-
tems to consider time-temperature thresholds for alerts as well as temperature variability
within equipment. Long term data analysis employing a machine learning methodology
will improve our understanding of cold chain breaks, including their frequency, severity,
and length. Cold chain research produces numerical and experimental data that could be
used to train machine learning models [12].
Machine learning helps with techniques for optimizing food processes by creating a
model that forecasts the best outcome given the input data. Both supervised and unsu-
pervised learning, data preprocessing, feature engineering, model selection, evaluation,
and optimization techniques are all included in machine learning. These methods could
be used to solve a variety of food processing optimization issues. The food industry
is using machine learning more and more to increase production efficiency, cut waste,
and provide individualized customer experiences. In addition to automating tasks like
labeling and packaging, machine learning can be used to predict customer preferences
and create individualized products as shown in Fig. 4. It can also be used to maximize
ingredient utilization and reduce costs [13]. Using a sensor-based system, Ekta Sonwani
et al. [14] present a novel method for monitoring and analyzing food spoilage. The device
this study suggests allows food to be preserved for a longer period of time. Furthermore,
food items can be kept fresher for longer by extending their shelf life. It keeps an eye
on the quality of the food items and keeps alerting the user via voice-activated com-
mands or a display. It also generates alerts that indicate how long the food is expected to
spoil. The suggested gadget has a 95% accuracy rate. The suggested smart device can
be made better by using machine learning and image processing algorithms to identify
early spoilage. Machine learning was proposed by N.V. Ganapathi Raju et al. [15] to
predict the fridge door status (open or close). The target variable is the fridge door’s
status (1 or 0), and the features that are used to train the model are tray temperature,
fridge cabinet temperature, and energy consumption. The suggested work’s goal is to
use machine learning algorithms to determine the fridge door’s current status.
In the work of Zahra Soltani et al. [16], different classifiers are compared to diagnose
twenty types of faults, as well as the non-faulty condition, simultaneously in the industrial
refrigeration system (RS). The training data
is taken from a simulation model which has been used in the development of system
control in Bitzer. The results indicate that the fault detection reliability of the algorithms
highly depends on how well the training data covers the operation regime. Furthermore,
it is found that a well-trained SVM can simultaneously classify twenty types of fault with
95% accuracy when the verification data is taken from different system configurations.
The involvement of machine learning in healthcare applications in the current context
is covered by Senerath Mudalige Don Alexis Chinthaka Jayatilake et al. [17], along with
a discussion of the various machine learning algorithms
and approaches being used for decision making in the healthcare sector. With the help
of the high processing power of contemporary, sophisticated computers, it became clear
from the information gathered that neural network-based deep learning techniques had
excelled in the field of computational biology and were widely used due to their high
prediction accuracy and dependability. When considering the larger picture and integrat-
ing the observations, it becomes evident that machine learning algorithms have become
essential to computational biology and biomedicine-based healthcare decision making,
making them inseparable from the field. Machine learning is used in the diagnosis of
illnesses and biomedical event extraction, medication repurposing, medical imaging,
prediction, and much more in the healthcare industry. While describing the use of ML
in cancer prediction and prognosis, Konstantina Kourou et al. [18] also discussed its
concepts. The majority of studies that have been put forth in recent years concentrate on
creating predictive models with supervised machine learning techniques and classifica-
tion algorithms in an effort to forecast accurate disease outcomes. Their analysis of the
data shows that the integration of multidimensional heterogeneous data, along with the
use of various feature selection and classification techniques, can yield useful tools for
inference in the field of cancer.
1. The experimental setup consists of a refrigeration test rig with vapor compression
refrigeration components such as a compressor, condenser, expansion device, and
evaporator. Temperature sensors (thermocouples) are located at crucial points (storage
unit, evaporator, condenser, etc.), and pressure gauges are used to record the pressure
on the condenser and evaporator sides (Fig. 6).
2. A nano-phase change material (eutectic mixture) for lower-temperature applications is
selected for experimentation based on its phase change temperature, latent heat, thermal
conductivity, and stability.
3. Preparation of the nano-PCM is done using an ultrasonic agitator and a magnetic stirrer,
holding the mixture for 30 min until the nanoparticles are mixed with the PCM mixture.
4. Start the system and initiate the refrigeration cycle. Conduct the experiment during the
power ON condition without thermal energy storage in the evaporator section. Record
readings such as condenser and evaporator pressure and temperature at certain intervals
of time.
5. Data is collected until the cabinet temperature reaches 0 °C and a constant cabinet
temperature is achieved as per the system design capacity; the coefficient of performance,
energy consumption, etc. are then calculated.
6. The same process is repeated by adding NPCM in the evaporator section as thermal
energy storage, and the performance parameters are calculated and compared.
7. Exploratory data analysis is done by considering the different measured performance
parameters.
8. A machine learning algorithm (linear regression) is implemented on the power
consumption data, and the accuracy of the model is calculated and compared. A prediction
study is also done to determine the power consumption at a particular time interval.
The refrigerant passes through copper tubes in the evaporator section between the stainless
steel inner and outer sections of the cooling chamber. The NPCM (a eutectic mixture of
15% CaCl2 + 15% KCl with water as base fluid as the PCM, with 1% graphene acting as
the nanoparticle) is kept in the annular space between the evaporator and the insulated
cabinet. The experiment is conducted during power ON mode with the NPCM, and a
comparison is made against the case without NPCM. The temperature and pressure data
are collected from a data logger and saved in CSV format. The evaporator coil absorbs
heat from the NPCM mixture during power ON mode, and the NPCM starts changing phase
from liquid to solid until the evaporator reaches 0 to −5 °C. During the charging process
the NPCM stores latent heat; it dissipates that heat when the power is OFF or temperature
fluctuations exist, changing its phase from solid to liquid. The data sheet consists of a
total of 14 columns (float and integer format), which include different temperatures, time,
pressures, and performance parameters such as power consumption, COP, voltage, and
current, for both datasets (with and without NPCM). The data is available in CSV format.
The chosen datasets are useful for exploratory data analysis. EDA helps in identifying
outliers, missing values, or anomalies within the dataset; these anomalies can significantly
impact the results of statistical analysis if not handled properly (Fig. 8).
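A minimal sketch of this first EDA pass is given below, assuming the logged data has been exported to a CSV file; the file and column names are placeholders.

import pandas as pd

# Load the logged dataset (hypothetical file name); one row per recorded time step.
df = pd.read_csv("refrigeration_with_npcm.csv")

# Basic structure of the 14 columns (temperatures, pressures, power, COP, etc.).
print(df.info())
print(df.describe())

# Missing values per column.
print(df.isnull().sum())

# Simple outlier check on power consumption using the interquartile range.
q1, q3 = df["PowerConsumption"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["PowerConsumption"] < q1 - 1.5 * iqr) |
              (df["PowerConsumption"] > q3 + 1.5 * iqr)]
print("Possible outliers:", len(outliers))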
The experiment is conducted during power ON mode with and without adding Nano-PCM
in the evaporator section, and different parameters such as power consumption and the
coefficient of performance (COP) are calculated by recording different temperatures of the
test setup, as shown in Fig. 9 and Fig. 10. For experimentation, a eutectic mixture of phase
change material (CaCl2 + KCl) along with graphene nanoparticles and water as base fluid
is used. It is observed that the coefficient of performance of the refrigeration system
increases by around 15 to 18% compared with the case without NPCM. Power consumption
is also reduced by 5 to 7% with the implementation of the nano phase change material.
After conducting the experiment it is observed that, with NPCM implemented in the
evaporator section, it is possible to maintain a constant temperature for a longer duration
of time even though temperature fluctuations occur due to frequent opening and closing of
the cooling cabinet door. Nanoparticles are used with the phase change material because
the PCM has poor thermal conductivity; by adding nanoparticles, the heat transfer and
thermal conductivity of the PCM are improved. The selection and percentage of PCM and
nanoparticles are decided on the basis of previous research [20]. Exploratory data analysis
of the experimental results is then carried out. A displot is used to create distribution plots
to visualize the distribution of a single variable such as temperature, as shown in Fig. 11.
It is observed that the condenser inlet and outlet temperatures lie between 45 to 60 °C and
25 to 38 °C, respectively. The evaporator inlet and outlet temperatures lie between 0 to
25 °C and 0 to 15 °C, respectively. The cabinet (cooling box) temperature is maintained
between 0 and 5 °C.
Fig. 11. Displot of temperature distribution by use and without use of NPCM during power ON
Kernel Density Estimation (KDE) plots, which are used to estimate the density function of
a continuous random variable, are also drawn for the temperatures during power ON mode,
as shown in Fig. 12.
Fig. 12. Difference of temperatures distribution by use and without use of NPCM during power
ON mode by using KDE plot
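A minimal Seaborn sketch for such distribution and KDE plots is shown below; the column name is a placeholder for one of the logged temperature columns.

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("refrigeration_with_npcm.csv")  # hypothetical logged dataset

# Distribution plot of a single variable, e.g. condenser inlet temperature (Fig. 11 style).
sns.displot(df["CondenserInletTemp"])
plt.xlabel("Condenser inlet temperature (°C)")

# KDE plot estimating the density of the same continuous variable (Fig. 12 style).
plt.figure()
sns.kdeplot(df["CondenserInletTemp"], fill=True)
plt.xlabel("Condenser inlet temperature (°C)")
plt.show()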
When working with multivariate data, pair plots are helpful since they make it easy to see
the relationships between many variables such as temperatures, coefficient of performance,
and power consumption, as shown in Figs. 13 and 14.
Fig. 13. Pair plot for NPCM mixture result during power ON mode
Fig. 14. Pair plot for without NPCM mixture result during power ON mode
A data visualization method called a strip plot is used to show individual data points on
one axis, usually against a categorical variable on the other axis. In Fig. 15, the results for
COP, power consumption, and cabinet temperature are scattered along time.
Fig. 15. (a) Strip plot time vs COP with use of NPCM, (b) Strip plot time vs COP without use of
NPCM, (c) Strip plot time vs power consumption with use of NPCM, (d) Strip plot time vs power
consumption without use of NPCM, (e) Strip plot of time vs cabinet temperature with NPCM, (f)
Strip plot of time vs cabinet temperature without NPCM
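A minimal Seaborn sketch for the pair plots and strip plots is shown below, again with placeholder column names.

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("refrigeration_with_npcm.csv")  # hypothetical logged dataset

# Pair plot of selected multivariate columns (Figs. 13 and 14 style).
sns.pairplot(df[["CondenserInletTemp", "EvaporatorOutletTemp", "COP", "PowerConsumption"]])

# Strip plot of COP against time (Fig. 15 style).
plt.figure()
sns.stripplot(data=df, x="TimeMinutes", y="COP", size=3)
plt.xlabel("Time (min)")
plt.show()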
Subplot is a function used in many plotting libraries, like Matplotlib and Seaborn in
Python, to create multiple plots within a single figure. This allows multiple sets of data or
related plots to be visualized in the same graphical space. In Fig. 16, a subplot of power
consumption is drawn. It is observed that the power consumption during power ON mode
with the implementation of NPCM is less than that without NPCM (Fig. 17).
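The side-by-side comparison of Fig. 16 can be sketched as follows, assuming the two runs were logged to separate CSV files; file and column names are placeholders.

import pandas as pd
import matplotlib.pyplot as plt

with_npcm = pd.read_csv("refrigeration_with_npcm.csv")        # hypothetical file names
without_npcm = pd.read_csv("refrigeration_without_npcm.csv")

# Two subplots in one figure: power consumption with and without NPCM.
fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
axes[0].plot(with_npcm["TimeMinutes"], with_npcm["PowerConsumption"])
axes[0].set_title("Power ON with NPCM")
axes[1].plot(without_npcm["TimeMinutes"], without_npcm["PowerConsumption"])
axes[1].set_title("Power ON without NPCM")
for ax in axes:
    ax.set_xlabel("Time (min)")
axes[0].set_ylabel("Power consumption (W)")
plt.tight_layout()
plt.show()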
Fig. 16. Subplot of power consume during power ON mode with and without use of NPCM
Fig. 17. Scatter plot of implemented linear regression algorithm with NPCM
Linear regression is mostly used for finding the relationship between variables and for
forecasting. Linear regression performs the task of predicting dependent variable values
(y) based on a given independent variable (x). In this experimental research, only one
independent variable (time) is considered to predict the power consumption.
Fig. 19. Scatter plot of implemented linear regression algorithm without NPCM
Fig. 20. Predicted values and errors during without implementation of NPCM
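A minimal scikit-learn sketch of this step, assuming time is the single predictor and using placeholder column names, is:

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("refrigeration_with_npcm.csv")  # hypothetical logged dataset
X = df[["TimeMinutes"]]        # single independent variable (time)
y = df["PowerConsumption"]     # dependent variable to be predicted

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = LinearRegression().fit(X_train, y_train)
pred = model.predict(X_test)

# Accuracy of the fitted model on the held-out data.
print("R2 :", r2_score(y_test, pred))
print("MAE:", mean_absolute_error(y_test, pred))

# Predict the power consumption at a particular time interval, e.g. 90 minutes.
print(model.predict(pd.DataFrame({"TimeMinutes": [90]})))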
4 Conclusion
• The purpose of nano-PCMs is to optimize the thermal energy storage capacity by uti-
lizing the distinct characteristics of nanomaterials. Through enhanced thermal energy
storage and heat transfer efficiency, nano-PCMs can help reduce energy consumption
and operational costs across a range of industries.
• Investigating how much power refrigeration systems use and combining Machine
Learning (ML) methods with Exploratory Data Analysis (EDA) have produced
insightful findings with real-world applications. Correlations and patterns found
by EDA help us comprehend how different elements, such as usage patterns and
temperature settings, interact to affect power consumption.
• The results of this study directly affect how energy-efficient refrigeration systems
are promoted. This study’s ideas and models can result in significant cost and energy
savings when put into practice, making it an essential contribution to sustainability
initiatives after implementation of Nano-phase change material in refrigeration model
as thermal energy storage.
• In conclusion, the application of machine learning technology has a lot to offer the
refrigeration sector. Machine learning lowers costs, lessens environmental impact,
and improves product quality and safety by maximizing energy efficiency, enabling
predictive maintenance, and improving overall system performance.
References
1. Loisel, J., et al.: Cold chain break detection and analysis: can machine learning help? Trends
Food Sci. Technol. 112, 391–399 (2021)
2. Lundqvist, J., De Fraiture, C., Molden, D.: Saving water: from field to fork: curbing losses
and wastage in the food chain (2008)
3. Bustos, C.A., Moors, E.H.: Reducing post-harvest food losses through innovative collabora-
tion: Insights from the Colombian and Mexican avocado supply chains. J. Clean. Prod. 199,
1020–1034 (2018)
4. Dos Santos, S.F., et al.: Post-harvest losses of fruits and vegetables in supply centers in
Salvador, Brazil: analysis of determinants, volumes and reduction strategies. Waste Manage.
101, 161–170 (2020)
5. Food Wastage Footprint (Project). Food wastage footprint: impacts on natural resources:
summary report. Food & Agriculture Organization of the UN (FAO) (2013)
6. Chavan, S., Rudrapati, R., Manickam, S.: A comprehensive review on current advances of
thermal energy storage and its applications. Alex. Eng. J. 61(7), 5455–5463 (2022)
7. Bertoldi, P., Atanasiu, B.: Electricity consumption and efficiency trends in the enlarged
European Union. IES–JRC. European Union (2007)
8. James, C.: Food transportation and refrigeration technologies design and optimization. In:
Sustainable Food Supply Chains, pp. 185–199. Academic Press (2019)
9. Sharma, A., Tyagi, V.V., Chen, C.R., Buddhi, D.: Review on thermal energy storage with phase
change materials and applications. Renew. Sustain. Energy Rev. 13(2), 318–345 (2009)
10. Mehling, H., Cabeza, L.F.: Heat and cold storage with PCM. Heat Mass Transf. 11–55 (2008)
11. Ali, S., Deshmukh, S.P.: An overview: applications of thermal energy storage using phase
change materials. Mater. Today: Proc. 26, 1231–1237 (2020)
12. James, S.J., James, C.J.F.R.I.: The food cold-chain and climate change. Food Res. Int. 43(7),
1944–1956 (2010)
13. Pandey, V.K., et al.: Machine learning algorithms and fundamentals as emerging safety tools
in preservation of fruits and vegetables: a review. Processes 11(6), 1720 (2023)
14. Sonwani, E., Bansal, U., Alroobaea, R., Baqasah, A.M., Hedabou, M.: An artificial intelli-
gence approach toward food spoilage detection and analysis. Front. Public Health 9, 816226
(2022)
15. Raju, N.G., Radhanand, A., Kumar, K.B., Reddy, G.P., Reddy, P.S.K.: Machine learning based
power saving mechanism for fridge: an experimental study using GISMO III board. Mater.
Today: Proc. 33, 4819–4822 (2020)
16. Soltani, Z., Sørensen, K.K., Leth, J., Bendtsen, J.D.: Fault detection and diagnosis in
refrigeration systems using machine learning algorithms. Int. J. Refrig 144, 34–45 (2022)
17. Jayatilake, S.M.D.A.C., Ganegoda, G.U.: Involvement of machine learning tools in healthcare
decision making. J. Healthc. Eng. 2021 (2021)
18. Kourou, K., Exarchos, T.P., Exarchos, K.P., Karamouzis, M.V., Fotiadis, D.I.: Machine learn-
ing applications in cancer prognosis and prediction. Comput. Struct. Biotechnol. J. 13, 8–17
(2015)
19. Li, N., Zhao, J., Zhu, N.: Building energy consumption prediction evaluation model. Adv.
Mater. Res. 280, 101–105 (2011)
20. Husainy, A.S.N., Parishwad, G.V., Kale, S.K., Nishandar, S.V., Patil, A.S.: Improving cooling
performance of deep freezer by incorporating graphene oxide nanoparticles mixed with phase
change materials during a power outage. In: Pawar, P.M., Balasubramaniam, R., Ronge, B.P.,
Salunkhe, S.B., Vibhute, A.S., Melinamath, Bhuwaneshwari (eds.) Techno-Societal 2020,
pp. 485–492. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-69925-3_48
Prediction of Emission Characteristics of Spark
Ignition (S.I.) Engines with Premium Level
Gasoline-Ethanol-Alkane Blends Using Machine
Learning
1 Introduction
Machine learning is a potential approach for predicting the emission characteristics
of spark ignition engines because of its many benefits, including robustness, high
prediction accuracy, low computation time, and low consumption. Worldwide demand
for automobiles has increased due to population growth [1]. From time to time, worries
increase regarding the harmful pollutants that engines release. By using machine learn-
ing approaches for prediction and Lagrangian optimization, it is possible to enhance the
operating parameters as well as engine emissions of single cylinder, spark ignition (S.I.)
engine along with different operating conditions to prevent environmental deterioration.
Over the last many years, artificial intelligence (AI) has undergone significant improve-
ments. These developments have made machine learning (ML) a potentially useful tool
for optimizing fuel and engine systems [2]. Numerous studies have employed neural net-
works in particular to effectively optimize fuel and engine systems. Examples of more
contemporary uses include the control system of real-time engine adjustment as well
as the impact of fuel characteristics on the engine emissions. Other Machine Learning
(ML) techniques, such as linear regression, random forests, support vector machines, etc.
have also been used to issues relating to engines and vehicles. Conventional fossil fuel
has traditionally been seen as a crucial component of the global economy [3]. However,
many industries, particularly the transportation sector, are heavily dependent on crude oil,
and we are about to witness the depletion of its derivatives. This also highlights the
detrimental effects of this dependence, such as the pollution and climate change brought
on by the engine emissions of different fuels. Because of their numerous advantages,
research on biofuels has substantially increased in recent years. Other advantages to think
about include easy production and lower pollution levels [4]. To improve the efficiency
of biofuels and create the ideal blend ratio with diesel and gasoline, many strategies are
applied. In the United States alone, air pollution from combustion sources is thought
to be responsible for over 100,000 deaths per year. Since nitrogen oxides (NOx) can
interact with other compounds present in the atmosphere to form ozone and acid rain,
which cause eutrophication in aquatic ecosystems, NOx is a substantial
component of the combustion-generated pollution that automobiles emit into the envi-
ronment. The United States (U.S.) Environmental Protection Agency (EPA) regulates
NOx emissions strictly, and more than half of all NOx emissions that are found in the air
arise from automobiles, primarily diesel-fueled vehicles. Owing to
the elevated temperature (over 1000 °C) as well as lean fuel-air ratios prevalent during
this phase of combustion, spark ignition (S.I.) engines produce the majority of their
nitrogen oxides (NOx ) at the point of combustion known as diffusion burning. NO2 is
then produced as a consequence of equilibrium measures inside the surrounding envi-
ronment and inside the combustion chamber. Nowadays, vehicles use selective catalytic
reduction (SCR) aftertreatment to lower NOx emissions, but how well it does so and how
much diesel exhaust fluid (DEF) is needed to do so relies largely on the amount of NOx
that the engine produces. Both researchers and engine manufacturers are quite interested
in forecasting and monitoring engine-out NOx . In production engines, electrochemical
NOx sensors are frequently employed in both upstream and downstream of the Selective
Catalytic Reduction. Despite being efficient, sensors are expensive and prone to error,
particularly at low exhaust temperatures [26].
The process of producing results through predictive modelling makes use of statistical
methods and probability theory. For this method to produce reliable results, specific
input data is needed. The model’s forecast is more accurate the more experimental
data it has been supplied. A model’s prediction accuracy is the primary determinant
2 Literature Review
As they examined and measured every possible engine emission from a car operated on
hydrogen, Karri V. et al. [3] revealed several AI models acting as virtual sensors in 2008.
State-of-the-art artificial intelligence techniques are employed to identify these emis-
sions. These state-of-the-art techniques include the University of Tasmania’s bespoke
software, back-propagation using the Levenberg-Marquardt algorithm, and the adaptive
neural fuzzy approach. Numerous significant input variables that directly impact the
emissions generated by engines fueled by hydrogen were considered in their study. Sev-
eral state-of-the-art probes and sensors are used in the intricate experimental process. To
accurately record the emission data, it was also essential to use such sensors and carbon
monoxide (CO), Carbon dioxide (CO2 ), unburnt hydrocarbon (HC), and oxides of nitro-
gen (NOx ) were among the output variables [10]. In the aforementioned research, these
contemporary techniques of artificial intelligence are called virtual sensors. Ghobadian
B. et al. [11] constructed an artificial neural network model in 2009, the primary outputs
being engine exhaust emissions, brake power (b.p.), and torque. A diesel engine that was
powered by biodiesel produced from leftover vegetable-based cooking
oil was utilized to test the model. After gathering used vegetable oil, the testing procedure
began by creating biofuel. Then, after being injected into a two-cylinder, water-cooled
diesel engine, the fabricated biofuel was measured and computed for engine emissions
as well as all the previously specified performance parameters. Different biofuel blends
were created, and each blend went through the same testing process [12]. A substantial
amount of data was generated by this process, which aided in the ANN model’s self-
training. An analogous strategy was used by Kiani M. et al. [13]. An artificial neural
network model was used to forecast engine performance and emissions. The primary
distinction was the type of engine that was used. The utilization of a four-cylinder,
four-stroke spark ignition engine demonstrates the range of applications for
which these artificial models were useful. This engine was powered by ethanol-blended
gasoline. This model once more gave quick, nearly precise results. Gopalakrishnan
K. et al. [14] acknowledged the problem of increasing emissions of greenhouse gases
from passenger cars in 2011. For the majority of people, public transportation buses
are undoubtedly the most common mode of transportation. In order to predict emis-
sions from actual biodiesel-powered transit vehicles, the study showed how to apply a
neuro-fuzzy model. The data was produced using a modern and advanced Portable emis-
sions measurement system. Specifically, the neuro-fuzzy approaches are very good at
controlling data noise to mitigate any risk of overfitting. Two neurofuzzy techniques
were employed in this investigation: the Dynamic Evolving Neuro-Fuzzy Inference Sys-
tem (DENFIS) and the Adaptive Neuro-Fuzzy Inference System (ANFIS). A forecasting
model for NOx emissions from direct injection engines was developed by J. Mohammad-
hassani et al. [15]. ANN methodology was the method employed. The Direct Injection
system is a propulsion system with high power output and minimal fuel consumption,
according to the research paper. The automotive industry is where this system is primar-
ily employed. The application of prediction models in the mechanical industry has been
examined by several researchers. Using genetic algorithms in combination with
artificial neural networks, Kanta et al. [16] predicted the machining parameters,
and validation showed that the predicted and actual results were consistent. Ozener et al.
[8] demonstrated the methodology that the output of a turbocharged engine could be
predicted, and it also incorporated a performance as well as emissions investigation of
the I.C. engine. Adarsh et al. [17] provided evidence of the application of several
cutting-edge AI models in the machine learning and artificial intelligence field. The study
demonstrated engine performance prediction modelling using numerous approaches,
such as ANFIS, Genetic Algorithm (GA), and ANN. Initially, the ANFIS model was
used to evaluate the model's accuracy. After applying a genetic algorithm (GA) to
improve the ANFIS model, the accuracy of the old model (46.6%) was found to have
greatly risen. According to a study released by Deniz S. et al. [18], data mining can
be used to test engine performance and reduce emissions. The study featured examples
of a number of data mining methods, including the C5.0 algorithm, neural networks,
and Bayesian networks. The study’s findings demonstrated the potential of such artifi-
cial techniques for producing economical, effective, and environmentally friendly cars.
The project was broken up into numerous stages. Mądziel [19] carried out an analysis
of a few traffic simulations and the modelling of the components of exhaust gases.
The models that have been shown have been grouped together according to the level
of accuracy they had: macro, meso, and micro. This study provided a summary of a
few publications that combine traffic and emission model analysis. Additionally, they
included modelling-related information and suggestions that may be useful in helping
decision-makers like road managers choose the right emission estimation tools. Machine
learning was applied to light-duty vehicle idle emissions by Li Q. et al. [20]. Any type of
road vehicle, including cars, has repeatedly been forced to stop, either voluntarily or as a
result of traffic congestion. Significant emissions are also generated when an automobile
is stationary while its engine is still running. When an engine is running but the vehicle is
not moving, it is referred to as idling. Idling can be classified into two categories: discre-
tionary and nondiscretionary. The decision by the driver to stop moving the car is known
be met in order to achieve the intended outcome. The proposed ML-GGA technique was
used to optimize the piston bowl design and operating conditions of a heavy-duty diesel
engine that was running on gasoline with a Research Octane Number (RON) of 80.
When comparing the output of the MLGGA technique to the best outcome of a full sys-
tem optimization driven by computational fluid dynamics (CFD), the merit function was
improved by more than 2%. Engine CFD simulations were used to verify the predictions
produced by the MLGGA technique. This study indicates how MLGGA has the ability
to considerably speed up optimization tasks compared to more conventional methods,
without sacrificing accuracy. In order to predict a Compression Ignition (C.I.) engine’s
performance as well as emission parameters that was powered by several metal-oxide
based nanoparticles, such as aluminum oxides, titanium oxides, and copper oxides at
mass fractions of 200 ppm, Abulut et al. [24] used machine learning algorithms (MLAs).
The parameters that are investigated in this study include carbon monoxide (CO), nitrogen
oxide (NOx), exhaust gas temperature (EGT), brake specific fuel consumption (BSFC),
and brake thermal efficiency (BTE). Tests were conducted at 100 rpm intervals on engines
running between 1500 and 3400 rpm. Because they are better at conducting heat, have
a higher surface-to-volume ratio, are naturally oxygenated, and function as catalysts
during combustion, nanoparticles ensured more thorough burning, which reduced CO
and NOx emissions. Abdulmalik [25] looked into Canada’s transportation-related car-
bon dioxide (CO2 ) emissions, which are a large portion of the nation’s total emissions.
Although it focuses on transportation-related CO2 emissions in Canada, the study looks
at the rise in carbon dioxide (CO2 ) brought on by a number of variables, including
population expansion, the development of transportation, and economic growth. Deep
Neural Networks, Random Forests, and Support Vector Machines are just a few of the
machine learning methods that are used to forecast CO2 emissions. The six (6) techniques'
R2 values varied from 0.9532 to 0.9996, their RMSE values from 1.0974 to 13.6561,
their MAPE scores from 0.0088 to 0.0010, their MBE scores from −0.0594 to 1.0366, their
rRMSE scores from 0.4259 to 5.3002, and their MABE scores from 0.2643 to 5.6582.
These results are encouraging. Machine learning methods were developed on 15 docu-
mented types of scope 3 emissions by Serafeim et al. [26]. The models' inputs include
industrial categories, scope 1 and 2 emissions, and readily available financial statement
variables. They found that adaptive boosting machine learning approaches are more accu-
rate in predicting the majority of reported scope 3 emission types when compared with
other supervised machine learning algorithms and linear regression models. A machine
learning (ML) model was constructed by Norouzi et al. [9] to mimic the behavior of a
linear parameter varying (LPV) MPC in addition to emissions and the performance of
engines. Based on support vector machines, a linear parameter variable model of engine
performance and emissions was used to build a model predictive controller for a 4.5 L
Cummins diesel engine. Benefits include lower NOx emissions and fuel consumption
when comparing this online optimized MPC solution to the standard feedforward man-
ufacturing controller. A deep learning technique was utilized to simulate the behavior
of the built controller in order to lower the processing cost of this MPC. The online
MPC optimization takes 50 times longer to compute than the imitative controller, yet
the imitative controller reduces NOx emissions at a constant load better than the online
optimized MPC.
Pravin et al. [27] developed a machine learning model using the TensorFlow library
in Python to estimate the emission characteristics, for example carbon monoxide (CO),
carbon dioxide (CO2), etc., of an internal combustion engine on injection of varying
volumes of biodiesel as fuel. These studies as well as the data sets are taken into account
for the analysis of a single-cylinder four-stroke internal combustion engine. The TensorFlow
library has been utilized in this machine learning model to improve result display and
error correction. The results of the built TensorFlow model are
then compared with an existing fuzzy model for the same application. The outcomes
forecasted by this model are noticeably in good agreement with the actual values, demon-
strating the usefulness of this strategy. The generated model’s total error was found to be
significantly reduced at 0.02 when compared to the current fuzzy model. As a result of
offering more visualization options and more accurate prediction analyses, the
TensorFlow-based machine learning model was shown to be the most effective model for
determining the engine emission characteristics of internal combustion engines that run on
biodiesel. Shin et al. [6] developed and evaluated deep-learning models for forecasting
engine-out NOx emissions, a primary source of pollution for diesel engines, according to
the Worldwide Harmonized Light Vehicles Test Procedure (WLTP). Traditional modeling
techniques make it difficult to foresee occurrences in transient contexts. Two techniques
were compared in terms of accuracy and computation time: long short-term memory
(LSTM) and deep neural network (DNN). After training on measured data, the LSTM
model (R2 = 0.9777, RMSE = 20.6 ppm) outperformed the DNN model (R2 = 0.9671,
RMSE = 25.5 ppm). However, the DNN model greatly outperformed the
LSTM model in terms of computation speed (0.36 s vs. 1381.0 s). Since this feature
was absent from DNN model trained on the observed data, time-related information was
pre-processed into the data. The weighted average of prior time step data to the cur-
rent time step data was calculated as part of the pre-processing of the data. A weighted
average was determined using the following ratios: Seven: Three, Six: Four, Five: Five,
Four: Six, and Three: Seven. By using a Seven: Three weighted average during training,
the accuracy of the DNN model was increased to an R2 value of 0.9741 and RMSE
22.8 ppm, without sacrificing computing performance (only an R2 value 0.0036 lower
and RMSE 2.2 ppm higher than the LSTM model). The results of this study indicate that
pre-processing the DNN model’s data was a workable method for getting accuracy close
to the LSTM models. Because of its accuracy and processing speed, the created DNN
model for NOx emission prediction can be employed as a virtual sensor for real-time pre-
diction. In order to gather the experimental results, Karunamurthy et al. [16] employed
diesel and biogas as the primary and secondary fuels in a single-cylinder diesel engine.
Factors such as flow rate of biogas, torque, methane percentage, inlet temperature etc. are
contemplated in order to forecast the characteristics of the output parameters. Among the
study's outcome indicators are performance characteristics such as the secondary fuel
energy ratio and brake thermal efficiency, and emission characteristics such as carbon
monoxide, oxides of nitrogen (NOx), unburnt hydrocarbon (HC), particulate matter, smoke,
etc. The suggested model makes use of a Random Forest Regressor that was trained with
324 distinct experiments obtained from actual trials. When training and testing the model
in an 85:15 ratio, an R2 score of 0.997 for the provided dataset is used to verify the model.
The output data are computed using the model's outputs each time the input attribute values
change. Lagrangian optimization is used to find the best values for the input parameters
that will result in the highest thermal efficiency and the lowest emission. The optimal
parameters are 68.3 °C, a biogas flow rate of 8.29 lit/min, a torque of 12.48 Nm, and a
methane concentration of 72.8%. Selvam et al. [28] forecasted and examined patterns in
engine-out NOx emissions from heavy-duty diesel and diesel-hybrid vehicles, developing
a physics-based machine learning approach. The temperature of the adiabatic flame,
the amount of oxygen present in the cylinder when the valves for intake are closed, and
the length of the combustion process are the three main non-linear factors that determine
the NOx emissions from compression ignition engines based on diffusive combustion
characteristics as well as chemical kinetics. Here, the parameters were determined using
the OBD data that was provided. One way to evaluate different machine learning regres-
sion algorithms is to linearize a NOx emissions forecast model based on physics. The
results reveal that random forest regression (RFR), an ensemble learning bagging-type
model, is very good at estimating NOx emissions from the engine as determined by the
NOx sensor on board. They also demonstrated how varied actual-world OBD data is by
employing clustered co-occurrences of automobile features. Across a variety of automo-
bile OBD datasets, the resulting model’s accuracy is 53% and 42% higher, respectively,
than that of non-linear regression models, with an average R2 value of 0.72 and mean
absolute error (MAE) of 78 ppm. Furthermore, because of its relationship to physical
parameters, the model allows for analysis of the outcomes. A sensitivity study of the
drop-column features for the RFR Model was also conducted, and non-linear regression
and black-box deep neural network models were used to assess the prediction findings.
The suggested RFR model has the potential to be used in the forecast of NOx on board
engines of various displacement as well as design due to its exceptional accuracy and
interpretability [29].
Artificially intelligent models have been examined by Karri et al. [3] as a poten-
tial substitute for sensors in order to estimate pertinent emissions for a hydrogen-
powered vehicle, such as carbon monoxide, carbon dioxide, and nitrogen oxides. Using
backpropagation neural networks trained with the Levenberg-Marquardt algorithm and
adaptive neuro-fuzzy inference systems, the University of Tasmania developed artificial
intelligence software used to create virtual sensors. These predictions, based on both
qualitative and quantitative analysis, examine how engine process variables—such as
exhaust gas temperature, engine power, air-to-fuel ratio, mass airflow, and engine
speed—affect hazardous exhaust gas emissions. After estimating emissions, all AI models
exhibit good prediction capabilities. Backpropagation neural networks trained with the
Levenberg-Marquardt algorithm, in particular, demonstrate exceptional accuracy, with
average root mean square errors below 6% when estimating emissions for various hydrogen engine operat-
ing conditions. Recent research has concentrated on changing the network architecture,
deep learning, and ANN design in order to attain the greatest performance. An outline
of how ANNs are used to forecast and improve the complicated properties of different
engine types utilizing different fuels was given by Bhatt et al. [22]. The purpose of the
study is to look into the network topologies that were utilized in the model’s design
and then perform a statistical analysis on the ANN models that are produced. There is
also a comparison provided between the other prediction models and the ANN model.
Berghout et al. [21] presented a novel fast training technique to forecast remaining useful
life (RUL) using a single-batch extreme learning machine (ELM), after carefully selecting
suitable features and applying noise filtering to achieve accurate estimation.
In summary, the present investigation adds to the collection of literature in the
following ways.
1. In this preliminary experimental investigation, engine emissions are predicted using
machine learning (ML) algorithms.
2. This study comprehensively analyzes the prediction results of the decision tree, random
forest, and linear regression algorithms using the following metrics: R2, RMSE, rRMSE,
and MBE.
3. It predicts the engine emissions of a single-cylinder spark ignition engine powered
by premium gasoline-ethanol blends under different operating conditions, such as spark
advancement and retardment and the percentage of ethanol blending.
3 Methodology
Microsoft Excel is used to store the data gathered during practical experimentation.
Reaction variables include carbon monoxide (CO), nitrogen oxides (NOx ), hydrogen,
smoke, and brake thermal efficiency; predictor variables include load, alkane flowrate,
spark timing, and intake temperature. By changing the input parameters, about 325
independent experiments are conducted, and the findings are documented. The data that
has been pre-processed is trained using random forests, and the outputs are utilized to
determine the ideal input feature values.
Data Pre-processing: Since each value obtained through experimentation is manually
recorded, mistakes can happen. Before beginning the modeling process, preprocessing is
done on the complete dataset to align the values so that machine learning algorithms can
process them. A missing value check is performed on the dataset. The calculated means
of each feature are utilized to fill up the missing values. After that, correlation analysis
is performed to see whether the response variables and predictor variables are linear.
In particular, the correlation between each input and each output variable is ascertained
using Pearson's correlation coefficient. Correlation values range from −1 to +1 and
indicate whether the relationship is positive or negative; values close to +1 or −1 indicate
a linear relationship. The dataset is split into training and testing sets in an 80:20 ratio,
the suggested proportion of training to testing samples. Each feature may have a different
numeric range, with values recorded on various scales. Differences in the scales of the
input variables can make the outcomes of the modelled problem harder to interpret, and
because the algorithms operate on numerical quantities, features with larger ranges would
otherwise dominate the modelling. Therefore, the training dataset is standardized using
min–max normalization to the range 0 to 1, which lessens the anomalies caused by
differing numeric ranges.
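As a rough illustration of this pre-processing step, the sketch below uses pandas and scikit-learn; the file name and column names are hypothetical placeholders, not the authors' actual data layout.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Hypothetical file and column names; the actual experimental sheet may differ.
df = pd.read_excel("engine_experiments.xlsx")

# Fill missing values with the mean of each feature.
df = df.fillna(df.mean(numeric_only=True))

# Pearson correlation between predictors and one response variable (e.g. CO).
predictors = ["load", "alkane_flowrate", "spark_timing", "intake_temperature"]
print(df[predictors + ["CO"]].corr(method="pearson")["CO"])

# 80:20 train/test split.
X_train, X_test, y_train, y_test = train_test_split(
    df[predictors], df["CO"], test_size=0.2, random_state=42
)

# Min-max normalization to [0, 1], fitted on the training set only.
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```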
Prediction Modelling
Random Forest Regressor: Random forest is a supervised ensemble learning method that
can be used for both regression and classification. During training, this technique employs
Classification and Regression Trees (CART) to construct a group of decision trees. The
regression trees are grown using bootstrapping, which entails drawing random samples
with replacement from the dataset. These regression trees then serve as base learners.
Within a regression tree, each node represents a binary test on a chosen predictor variable;
the variable selected at each split is the one that minimizes the residual sum of squares
(MSE) of the data passing through the left and right branches. The final forecast is
obtained by averaging the values predicted by all the trees. CART has the advantage of
fitting the data closely with potentially little bias; however, because its outcome depends
entirely on the input data, it can suffer from high variance. Random forest has been shown
to reduce overfitting and high variance by building each tree with 'm' randomly selected
predictors (where m <= n and 'n' is the total number of predictors) and then combining
the outcomes. R2 is one metric that may be used to validate the performance (Fig. 2).
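A minimal sketch of this modelling step with scikit-learn is shown below, continuing from the preprocessing example above; the hyperparameter values are illustrative assumptions rather than the authors' reported settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score

# Illustrative hyperparameters; the paper does not report the exact settings.
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train_scaled, y_train)

# Evaluate on the held-out 20% test split.
y_pred = model.predict(X_test_scaled)
r2 = r2_score(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print(f"R2 = {r2:.3f}, RMSE = {rmse:.3f}")
```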
Figure 3 shows the correlation matrix for CO and CO2. Carbon monoxide (CO) and
carbon dioxide (CO2) decrease with the addition of 10% alkanes (pentane and hexane)
to premium gasoline-ethanol blends.
From Fig. 4, it is seen that the brake thermal efficiency of single cylinder spark
ignition engine is high for 24° before TDC for premium gasoline-ethanol-alkane fuel
blend. BTE increases from 15° before TDC till 24° before TDC and then gradually
decreases. The BTE is also quite good for 21° before TDC and 25° before TDC.
Fig. 6. Distplot of CO and CO2 prediction using Brake Thermal Efficiency and Spark advance-
ment & Spark retardment.
A distplot shows the distribution of and relationship among the variables used. Figure 6
shows the distplot of CO and CO2. With the addition of 10% alkanes, CO and CO2
decrease considerably. Also, spark advancement and retardment have a great impact on these emission
characteristics. The engine emissions were minimum at 24° before TDC. Brake thermal
efficiency is also maximum at 24° before TDC.
Fig. 7. Distplot of CO and CO2 prediction using Pentane (P), Hexane (H)
HC Emissions
The unburnt hydrocarbon emissions from the engine should be as low as possible; with
complete combustion, the hydrocarbon content is nearly zero. In this experi-
mentation, the hydrocarbon emissions were found minimum at 21° and 24° before TDC.
Figure 8 shows the variation of hydrocarbon emissions with spark timing positions. The
hydrocarbon emissions were high for spark timing 15° before TDC and 30° before TDC.
Figure 9 shows the scatter plot of brake thermal efficiency with spark timing variation.
The brake thermal efficiency is maximum at a spark timing of 24° before TDC and is
lower at the other spark timing positions, whether the spark is advanced or retarded.
Figure 10 shows the correlation matrix for hydrocarbon emissions with the different
variables used in the experimentation. Hydrocarbon emissions decrease with the addition
of 10% alkanes (pentane and hexane) to premium gasoline-ethanol blends.
NOx Emissions
Nitrogen oxides (NOx) form at higher temperatures during the combustion process. NOx
emissions decrease as the alkane content of the premium gasoline-ethanol blends
increases.
Figure 13 shows the correlation matrix for nitrogen oxide emissions. The addition of
alkanes to premium gasoline-ethanol blends has a positive impact. Furthermore, nitrogen
oxides (NOx) were found to be minimum at a spark timing of 24° before TDC, whereas
incomplete combustion at 15° before TDC leads to greater NOx formation
(Figs. 14, 15 and 16).
Fig. 15. Distplot of NOx emissions prediction using Pentane (P), Hexane (H)
Fig. 16. Distplot of NOx emissions prediction using Spark advancement & Spark retardment
5 Conclusion
References
1. Liao, J., et al.: A comparative investigation of advanced machine learning methods for
predicting transient emission characteristic of diesel engine. Fuel 350, 128767 (2023)
2. Combustion, C.: Prediction of NO. 1–9 (2016). https://doi.org/10.4271/2021-24-0082
3. Karri, V., Ho, T.N.: Predictive models for emission of hydrogen powered car using various
artificial intelligent tools. Neural Comput. Appl. 18, 469–476 (2009)
4. Kumbhar, S.V., Khot, S.A.: Experimental investigations of ethanol-gasoline blends on the
performance, combustion, and emission characteristics of spark ignition engine spark ignition
(S.I) engine with partial addition of n-pentane. Mater. Today Proc. 77, 647–653 (2023)
5. Yu, C., et al.: Deep kernel learning approach to engine emissions modeling. Data-Cent. Eng.
1 (2020)
6. Shin, S., et al.: Predicting transient diesel engine NOx emissions using time-series data pre-
processing with deep-learning models. Proc. Inst. Mech. Eng. Part D J. Automob. Eng. 235,
3170–3184 (2021)
7. Yang, R., Xie, T., Liu, Z.: The application of machine learning methods to predict the power
output of internal combustion engines. Energies 15, 1–16 (2022)
8. Ozener, O., Yuksek, L., Ozkan, M.: Engine-out emissions and performance parameters of a
turbocharged diesel engine, vol. 17, pp. 153–166 (2013)
9. Norouzi, A., et al.: Machine learning integrated with model predictive control for imitative
optimal control of compression ignition engines. IFAC-PapersOnLine 55, 19–26 (2022)
10. Patil, V., Singh, P., Sonage, S., Kumbhakarna, N., Kumar, S.: Applicability of ketone-gasoline
blended fuels for spark ignition engine through energy-exergy analyses. Fuel 339 (2023)
11. Ghobadian, B., Rahimi, H., Nikbakht, A.M., Najafi, G., Yusaf, T.F.: Diesel engine performance
and exhaust emission analysis using waste cooking biodiesel fuel with an artificial neural
network. Renew. Energy 34, 976–982 (2009)
12. Patil, V., Singh, P., Sonage, S., Kumbhakarna, N., Kumar, S.: Experimental investigation to
assess the efficacy of gasoline surrogates with engine testing. Fuel 324 (2022)
13. Shahvandi, M.K.: On GNSS residual position time series prediction and analysis using radial
basis function networks machine learning (2020)
14. Gopalakrishnan, K., Mudgal, A., Hallmark, S.: Neuro-fuzzy approach to predictive modeling
of emissions from biodiesel powered transit buses. Transport 26, 344–352 (2011)
15. Mohammadhassani, J., Khalilarya, S., Solimanpur, M., Dadvand, A.: Prediction of NOx
emissions from a direct injection diesel engine using artificial neural network. Model. Simul.
Eng. 2012 (2012)
16. Karunamurthy, K., Feroskhan, M.M., Suganya, G., Saleel, I.: Prediction and optimization of
performance and emission characteristics of a dual fuel engine using machine learning. Int.
J. Simul. Multidiscip. Des. Optim. 13 (2022)
17. Rai, A., Pai, P., Rao, B.: Prediction models for performance and emissions of a dual fuel CI
engine using ANFIS. Sadhana 40(2), 515–535 (2015). https://doi.org/10.1007/s12046-014-
0320-z
18. Deniz, S.: Application of data mining methods for analyzing of the fuel consumption and
emission levels (2021). https://doi.org/10.5281/zenodo.160871
19. M˛adziel, M.: Vehicle emission models and traffic simulators: a review. Energies 16 (2023)
20. Li, Q.: A machine learning approach for light-duty vehicle idling emission estimation based
on real driving and environmental information. Environ. Pollut. Climate Change (2017).
https://doi.org/10.4172/2573-458X.1000106
21. Berghout, T., Mouss, L.H., Kadri, O., Saïdi, L., Benbouzid, M.: Aircraft engines remaining
useful life prediction with an adaptive denoising online sequential extreme learning machine.
Eng. Appl. Artif. Intell. 96, 103936 (2020)
22. Bhatt, A.N., Shrivastava, N.: Application of artificial neural network for internal combustion
engines: a state of the art review. Arch. Comput. Methods Eng. 29, 897–919 (2022)
23. Badra, J.A., et al.: Engine combustion system optimization using computational fluid dynam-
ics and machine learning: a methodological approach. J. Energy Resour. Technol. Trans.
ASME 143 (2021)
24. Kuzhagaliyeva, N., Horváth, S., Williams, J., Nicolle, A., Sarathy, S.M.: Artificial intelligence-
driven design of fuel mixtures. Commun. Chem. 5, 1–10 (2022)
25. Abdulmalik, R.: Forecasting of transportation-related CO2 emissions in Canada with, vol. 3,
pp. 1295–1312 (2023)
26. Serafeim, G., Velez Caicedo, G.: Machine learning models for prediction of scope 3 carbon
emissions. SSRN Electron. J. (2022). https://doi.org/10.2139/ssrn.4149874
27. Pravin, M.C., et al.: Predicting the emissive characteristics of an IC engine using DNN. IOP
Conf. Ser. Mater. Sci. Eng. 995, 1–9 (2020)
28. Liu, H., Hu, B., Jin, C.: Effects of different alcohols additives on solubility of hydrous
ethanol/diesel fuel blends. Fuel 184, 440–448 (2016)
29. Khurana, S., Saxena, S., Jain, S., Dixit, A.: Predictive modeling of engine emissions using
machine learning: a review. Mater. Today Proc. 38, 280–284 (2020)
Depression Detection Using Distribution
of Microstructures from Actigraph Information
Abstract. Depression negatively affects the daily life of an individual and may
even lead to suicidal tendencies. The problem is compounded by the scarcity of
trained psychologists and psychiatrists in developing countries due to which many
cases go undetected. The automated diagnosis of depression can, therefore, assist
clinicians to screen the patients and help them to handle the symptoms. The advent
of wearable devices in the past decade has helped in capturing signals, which can
be used to diagnose depression. This work uses a publicly available dataset and
develops a model based on the distribution of microstructures from the temporal
data to accomplish the given task. The results are encouraging and better than
the state-of-the-art. An accuracy of 86.90% is obtained by using the proposed
pipeline. This work is part of a larger project that aims to detect depression using
multi-modality data.
1 Introduction
Depression can be cured and the methods to handle the condition include medical,
psychological, and other treatments as shown in Fig. 2 [3]. It has been observed that
around 80% of the patients respond positively to the treatment.
The scarcity of mental health professionals and inadequate resources in low and
middle-income countries leads to many, affected by the disease, remaining undiag-
nosed [4]. In addition to the above, the stigma associated with mental health conditions
compounds the problem.
Owing to the reasons stated above, manual detection of depression is both time-
consuming and difficult for the patient. The advent of Machine Learning in the past few
decades has led to the development of methods for the automated detection of disease.
This work uses sensor data for the detection of Depression. The results are encouraging
and pave the way for the use of ML in diagnosing depression. The main contributions
of the work are as follows:
– Propose a novel model to deal with the actigraph information using the distribution
of microstructures of the graphs so formed.
– To design feature vector that reduces the space and the time complexity of the model.
– To propose an efficient, and effective system that can be implemented in edge devices.
The structure of this paper is as follows. Data and Methods are covered in the second
section, findings and discussion are included in the third section, and the conclusion and
future directions are covered in the last section.
2.2 Methods
The methods used in this research are as follows:
Principal Component Analysis (PCA): The given feature matrix is subtracted from its mean and the covariance matrix is obtained.
The eigen values and vectors of this covariance matrix are then used for transforming
the original matrix. This can be done by arranging the vectors in the decreasing order of
the eigen values and multiplying the original feature matrix with the so obtained matrix.
This method not only transforms the features but also reduces the number of features
required for classification [14].
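A compact numeric sketch of this transformation, assuming a feature matrix X of shape (samples, features), might look as follows.

```python
import numpy as np

def pca_transform(X, n_components):
    """Project X onto its top principal components via eigen decomposition."""
    X_centered = X - X.mean(axis=0)           # subtract the mean of each feature
    cov = np.cov(X_centered, rowvar=False)    # covariance matrix of the features
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues/vectors in ascending order
    order = np.argsort(eigvals)[::-1]         # arrange by decreasing eigenvalue
    components = eigvecs[:, order[:n_components]]
    return X_centered @ components            # transformed, reduced feature matrix
```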
The actigraph information of a patient is converted into a graph and the distribution
of microstructures is obtained using LBP from the obtained graph. The distribution of
microstructures is then obtained in three ways by using the three variants of LBP. This
is used as a feature vector of a given sample. The feature vectors of all the samples are
stacked vertically and feature selection methods are applied to find the most important
features. In one of the alternate pipelines PCA is used to reduce the dimensionality of
data. Finally, SVM is applied to classify the data into two classes: the patients suffering
from depression and the controls. The proposed work is shown in Fig. 3.
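One possible way to realise this pipeline is sketched below. The 1D LBP definition, window radius, number of PCA components, and SVM kernel are illustrative assumptions rather than the authors' exact choices, and the signals and labels are synthetic placeholders standing in for the actigraph recordings.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def lbp_histogram(signal, radius=4):
    """8-neighbour 1D LBP: compare samples around each point with the centre value
    and accumulate the resulting 8-bit codes into a 256-bin histogram."""
    codes = []
    for i in range(radius, len(signal) - radius):
        neighbours = np.concatenate((signal[i - radius:i], signal[i + 1:i + 1 + radius]))
        bits = (neighbours >= signal[i]).astype(int)
        codes.append(int("".join(map(str, bits)), 2))
    hist, _ = np.histogram(codes, bins=256, range=(0, 256))
    return hist / max(hist.sum(), 1)          # normalised distribution of microstructures

# Synthetic placeholders for the 55 subjects' activity signals and labels.
rng = np.random.default_rng(0)
actigraph_signals = [rng.random(1440) for _ in range(55)]
y = rng.integers(0, 2, 55)                    # 1 = condition, 0 = control

X = np.vstack([lbp_histogram(sig) for sig in actigraph_signals])  # one histogram per subject
X_reduced = PCA(n_components=10).fit_transform(X)                 # optional dimensionality reduction
scores = cross_val_score(SVC(kernel="rbf"), X_reduced, y, cv=5)   # SVM classification
print("Mean accuracy:", scores.mean())
```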
3.1 Results
The given data consisted of 23 samples of disease patients and 32 of controls. The
data had timestamps along with the values, which were converted to graphs. LBP was
then applied to these graphs, and a histogram of 256 bins was created representing the
distribution of microstructures of each graph. These representations were stacked
vertically to form X, to which the classifiers were applied; the performance of each
classifier is stated in Table 1.
This was followed by application of Fisher’s Discriminant Ratio (FDR) and forward
feature selection. The performance of each model is stated in Table 2.
Finally, PCA with 2, 5, and 10 components was applied. The corresponding results are depicted
in Table 3.
3.2 Discussion
Researchers have worked on sensor data for the classification of depression. Chikersal
et al. [15] proposed a Machine Learning method that makes use of sensory data from
138 college students' cell phones and fitness trackers to classify them into those who are
depressed and those whose symptoms worsen over the semester. Feature extraction was
performed to capture pertinent traits of depression symptoms, and an accuracy of 85.7%
was reported.
Doryab et al. [16] created a smartphone-based system named “Big Black Dog” to
gather information from social and sleeping habits and identified behavioral changes in
those suffering from serious depression.
Ben-Zeev et al. [17] used sensor data from smartphones and examined stress, sad-
ness, and loneliness as behavioral indicators for mental health. This study includes 47
individuals between the ages of 19 and 30 and continuous monitoring of spatial activ-
ity, kinesthetic activity, sleep duration, and proximity to human voice for 10 weeks.
The results show that daily stress levels were correlated with spatial activity and sleep
duration, that speech and sleep duration showed relationships with depression, and that
changes in kinesthetic activity were related to loneliness.
Saeb et al. [18] used geographic locations (GPS) sensors to identify severity of
depressive symptoms. The dataset comprises GPS phone sensor data from 48 college
students for 10 weeks and PHQ-9 (Patient Health Questionnaire) ratings. The study
confirms strong relationships between many GPS parameters (such as location variance,
entropy, and circadian mobility) and PHQ-9 scores.
Saeb et al. [19] used mobile phone sensors and GPS to track patterns of behavior
and gauge the severity of depression symptoms. This study includes 40 adults with a
sensor data gathering app (Purple Robot), and the results show that circadian movement
and position variation from GPS data, as well as phone usage frequency and duration,
were linked to the severity of depression symptoms. The classifier gave an accuracy of 86.5%
to classify depression participants from controls.
Wahle et al. [20] used sensor data from smartphones of 126 individuals between
the ages of 20 and 57, using a mobile application called Mobile Sensing and Support
(MOSS). Based on smartphone usage and sensor data, machine learning models were
utilized to predict PHQ-9 levels and give customized treatments. The results showed
that after using MOSS for about two weeks, 36 participants exhibited significantly lower
PHQ-9 ratings than those who had clinical depression at baseline. An accuracy of 61.5%
and an F1 Score of 62% was achieved.
Canzian et al. [21] created a smartphone app to track user whereabouts and responses
to daily surveys evaluating depressive mood in order to provide accurate measures. The
models were created to effectively anticipate shifts in a person’s depressive mood by
observing their motions. They achieved sensitivity of 0.71 and specificity of 0.87 (i.e.,
an F1 score of 0.78).
Farhan et al. [22] used sensor data from cell phones to screen for depression. The
machine learning algorithms were used to predict the PHQ-9 scores, using behavioral
patterns that have been derived from smartphone sensing data. The results showed that
the prediction accuracy increases when behavioral data and PHQ-9 scores are combined.
An F1 score of 0.82 was achieved (Table 4).
Depression is a cause of concern both for society and the government. This may neg-
atively affect the performance of an individual. The signals from wearable devices can
be used to classify patients suffering from depression from the controls. This work uses
a publicly available dataset to accomplish this task by extracting the distribution of
microstructures from the graph obtained from the actigraph information by applying
LBP followed by SVM. The results are encouraging and pave the way for using the
signal data for the diagnosis.
The proposed model will assist clinicians in screening the patients and handling the
symptoms. The model is efficient both in terms of memory and computation time and
works better than the existing models. The future work will use the metadata to enhance
the results. Earlier, we used the audio data to detect depression. Two models were also
created to detect depression from the patients’ facial expressions [23]. The larger goal
is to develop a multi-modality model to accomplish this task.
References
1. What is Depression?. https://www.psychiatry.org/patients-families/depression/what-is-dep
ression. Accessed 20 Oct 2023
2. Depression. https://www.who.int/health-topics/depression#tab=tab_1. Accessed 20 Oct 2023
3. Professional, C.C.M.: Depression. https://my.clevelandclinic.org/health/diseases/9290-dep
ression. Accessed 20 Oct 2023
4. Depressive disorder (depression). https://www.who.int/news-room/fact-sheets/detail/depres
sion. Accessed 20 Oct 2023
5. Depression. https://who.int/india/health-topics/depression. Accessed 20 Oct 2023
6. Garcia-Ceja, E., et al.: Depresjon. In: Proceedings of the 9th ACM Multimedia Systems
Conference (2018). https://doi.org/10.1145/3204949.3208125
7. Ojala, T., Pietikäinen, M., Harwood, D.: A comparative study of texture measures with clas-
sification based on featured distributions. Pattern Recogn. 29, 51–59 (1996). https://doi.org/
10.1016/0031-3203(95)00067-4
8. Ojala, T., Pietikainen, M., Maenpaa, T.: Multiresolution gray-scale and rotation invariant
texture classification with local binary patterns. IEEE Trans. Pattern Anal. Mach. Intell. 24,
971–987 (2002). https://doi.org/10.1109/tpami.2002.1017623
9. Ojala, T., Pietikäinen, M., Mäenpää, T.: A generalized local binary pattern operator for mul-
tiresolution gray scale and rotation invariant texture classification. In: Singh, S., Murshed,
N., Kropatsch, W. (eds.) ICAPR 2001. LNCS, vol. 2013, pp. 399–408. Springer, Heidelberg
(2001). https://doi.org/10.1007/3-540-44732-6_41
10. Bhasin, H., Agrawal, R.K.: A combination of 3-D discrete wavelet transform and 3-D local
binary pattern for classification of mild cognitive impairment. BMC Med. Inform. Decis.
Making 20 (2020). https://doi.org/10.1186/s12911-020-1055-x
11. Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20, 273–297 (1995). https://
doi.org/10.1007/bf00994018
12. Hofmann, T., Schölkopf, B., Smola, A.J.: Kernel methods in machine learning. Ann. Stat. 36
(2008). https://doi.org/10.1214/009053607000000677
13. Thompson, M., Duda, R.O., Hart, P.E.: Pattern classification and scene analysis. Leonardo 7,
370 (1974). https://doi.org/10.2307/1573081
14. Bhasin, H.: Machine Learning for Beginners. BPB Publications (2020)
15. Chikersal, P., et al.: Detecting depression and predicting its onset using longitudinal symptoms
captured by passive sensing. ACM Trans. Comput.-Hum. Interact. 28, 1–41 (2021). https://
doi.org/10.1145/3422821
16. Doryab, A., Min, J.K., Wiese, J., Zimmerman, J., Hong, J.I.: Detection of behavior change
in people with depression. In: National Conference on Artificial Intelligence (2014). https://
doi.org/10.1184/r1/6469988.v1
17. Ben-Zeev, D., Scherer, E.A., Wang, R., Xie, H., Campbell, A.T.: Next-generation psychiatric
assessment: using smartphone sensors to monitor behavior and mental health. Psychiatr.
Rehabil. J. 38, 218–226 (2015). https://doi.org/10.1037/prj0000130
18. Saeb, S., Lattie, E.G., Schueller, S.M., Kording, K., Mohr, D.C.: The relationship between
mobile phone location sensor data and depressive symptom severity. PeerJ 4, e2537 (2016).
https://doi.org/10.7717/peerj.2537
19. Saeb, S., et al.: Mobile phone sensor correlates of depressive symptom severity in daily-life
behavior: an exploratory study. J. Med. Internet Res. 17, e175 (2015). https://doi.org/10.2196/
jmir.4273
20. Wahle, F., Kowatsch, T., Fleisch, E., Rufer, M., Weidt, S.: Mobile sensing and support for
people with depression: a pilot trial in the wild. JMIR Mhealth Uhealth 4, e111 (2016). https://
doi.org/10.2196/mhealth.5960
21. Canzian, L., Musolesi, M.: Trajectories of depression. In: UbiComp 2015: Proceedings of the
2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing (2015).
https://doi.org/10.1145/2750858.2805845
22. Farhan, A.A., et al.: Behavior vs. introspection: refining prediction of clinical depression via
smartphone sensing data. In: IEEE Wireless Health (WH) (2016). https://doi.org/10.1109/
wh.2016.7764553
23. Bhasin, H., Kumar, N., Singh, A., Sharma, M., Beniwal, R.P.: Kullback-Leibler divergence
based method for depression diagnosis using video data. In: 14th International Conference
on Computing, Communication and Networking Technologies (2023)
ELECTRA: A Comprehensive Ecosystem
for Electric Vehicles and Intelligent
Transportation Using YOLO
1 Introduction
The purpose of this article is to highlight the importance of providing a journey cost
calculator and a charging-station locator as part of the EV website to support effective
EV adoption [1, 4]. The numbered references offer further insights into the different
issues involved.
2 Literature Review
A considerable body of research on electric vehicles is available, with each work covering
particular aspects of their development and use. “Challenges in Electric Vehicle Charging
Infrastructure Development: A Case Study of Urban Areas” by Gupta and Verma analyzes
how some urban zones experienced problems with installing EV charging stations,
supplying information that supports infrastructure improvements at the grass-roots level.
In contrast, “Consumer Adoption of Electric Vehicles and the Role of Government
Incentives: A Comprehensive Review” combines a literature review with comprehensive
analysis to give an overall view of consumer adoption determinants, while possibly
omitting the most recent advancements. Meanwhile, “Advancements in Battery
Technology for Electric Vehicles: A Comparative Study” by Lee and Turner, although its
findings on battery technology may become outdated, engages in comparative analysis to
shed light on this topic.
Moreover, “Public Perception and Acceptance of Electric Vehicles: An Analytical Survey”
by Smith and Rodriguez offers a survey focused on public opinion about EV adoption.
Although it may not cover every factor in such a complicated assessment, Khan and
Baker's “Economic and Environmental Impact Analysis of Electric Vehicle Integration”
performed an economic and environmental evaluation of various aspects of the integration
process.
Within the scope of infrastructure, “Urban Charging Infrastructure for Electric Vehicles:
A Comprehensive Analysis” by Patel and Verma provides valuable information about
charging infrastructure in cities. Taking the discussion to the policy and perception
domain, “Government Policies and Consumer Perceptions of Electric Vehicle Adoption”
by Gupta and Singh combines policy analysis with a perception study that delves into the
complexities of government policies and consumer perceptions, though its applicability
is bounded by geography and specific policy circumstances. Likewise, “Survey Analysis
of Public Perception of Electric Vehicles” by Johnson and Robinson gives vital
information about public opinion, but is limited by the extent of the survey responses.
Together, this set of studies highlights the need for wider-scale and more diverse research
covering larger territories and using varied approaches to reflect modern developments in
the electric vehicle industry. Such breadth enables decision-makers to base
environmentally friendly transport policies in the emerging electric vehicle area on sound
knowledge.
Paper Title and Reference | Techniques used | Features | Shortcomings
“Public Perception and Acceptance of Electric Vehicles: An Analytical Survey” - Smith, John, and Maria Rodriguez [4] | Analytical Survey | Provides insights into public perception | Limited to survey responses
“Economic and Environmental Impact Analysis of Electric Vehicle Integration” - Khan, Ali, and Linda Baker [5] | Economic and Environmental Analysis | Assesses economic and environmental impacts | May not cover all relevant factors
“Urban Charging Infrastructure for Electric Vehicles: A Comprehensive Analysis” - Patel, Aakash, and Priya Verma [6] | Comprehensive Infrastructure Analysis | Provides insights into urban charging infrastructure | Limited to urban areas
“Government Policies and Consumer Perceptions of Electric Vehicle Adoption” - Gupta, Rajesh, and Pooja Singh [7] | Policy Analysis and Perception Study | Addresses government policies and consumer perceptions | Limited to certain regions and policies
“Survey Analysis of Public Perception of Electric Vehicles” - Johnson, Maria, and Thomas Robinson [8] | Survey Analysis | Offers insights into public perception | Limited to survey responses
“Challenges in Electric Vehicle Charging Infrastructure Development: A Case Study of Urban Areas” - Gupta, Rakesh, and Priya Verma [9] | Case Study and Analysis | Provides insights into infrastructure challenges | Limited to specific urban areas
“Consumer Adoption of Electric Vehicles and the Role of Government Incentives: A Comprehensive Review” - Sharma, Nisha, and Sunil Patel [10] | Literature Review and Comprehensive Analysis | Offers a comprehensive overview of consumer adoption factors | May not include recent developments
This table summarizes different academic studies about EVs in general, with respect
to infrastructure construction, buyer’s decision process, batteries, society acceptance,
economy, environment, government regulations, and EV charging. Each of these papers
adopts a different research approach, including case studies, literature reviews, surveys,
and detailed examinations of each topic. On the other hand, some of them are restricted
to particular regions or urban areas, or rely on information that may be outdated. Such studies are
immensely valuable for research on electric vehicles; however, users must take into
account their applicability in specific settings.
3 Methodology
Methodology: creating a web platform based on the MERN stack and the BDD100K dataset,
and using the Google Maps API.
3.6 Deployment
i. Scale up and deploy the web platform into production by ensuring that it operates
efficiently.
viii. Integration with vehicle control systems: combine the emergency braking with the
car’s brakes activation control systems.
ix. Real-time monitoring: carry out real-time monitoring of systems performance and
problems.
x. Safety measures: utilize fail-safe systems and redundancies in order to avoid false
activations and maintain reliability.
xi. Deployment and continuous improvement: deploying the system in EVs, then measuring
and analyzing performance for further improvement.
4.2 YOLOv5
YOLOv5 (You Only Look Once, version 5), a deep learning object detection algorithm,
was released in 2020. It is faster and more precise than the preceding YOLO versions.
YOLOv5 uses a single convolutional neural network capable of identifying objects in
real time with high precision. The model can identify many different things, such as cars,
people, or animals, using large quantities of reference points to recognize items of
numerous shapes and sizes (Fig. 2).
4.3 YOLOv8
You Only Look Once version 8 (YOLOv8) was released by Ultralytics in 2023. It is meant
to offer high accuracy and speed across a wide range of objects, including small and
dense objects, and shows good results on challenging datasets.
Our system showed great accuracy for detecting objects and lane markings in real
time with low latency. YOLOv5 and YOLOv8 performed exceedingly well, detecting
objects with an overall accuracy of over 90% on the CULane dataset and above 85% on a
custom dataset. Detection of small objects such as pedestrians, which is difficult, was
good, and YOLOv8 yielded better inference times than YOLOv5. YOLOv8 uses spatial
pyramid pooling (SPP) for multi-level abstractions, which makes object detection more
accurate at a reduced computational cost.
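As a rough illustration of how such a detector can be run, the snippet below uses the ultralytics Python package with a pretrained YOLOv8 weight file; the image path and model size are placeholders, and this is not the project's exact configuration.

```python
from ultralytics import YOLO

# Load a pretrained YOLOv8 model (the "n" variant is the smallest; other sizes exist).
model = YOLO("yolov8n.pt")

# Run detection on a road-scene image (placeholder path).
results = model("road_scene.jpg")

# Print detected class names and confidence scores.
for box in results[0].boxes:
    cls_name = results[0].names[int(box.cls)]
    print(f"{cls_name}: {float(box.conf):.2f}")
```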
5 Results
The “ELECTRA” project has achieved notable success across its development process and
demonstrates the impact it can make in the EV market. The project's key results can be
summarized as follows:
Journey Cost Calculator: The presence of a virtual trip cost calculator makes the
system more viable. This feature enables accurate approximation of stopover travel costs
and minimizes unnecessary halts.
Google Maps API Integration: Users can view Google Maps with live updates on
where the nearest charger is, enabling them to plot their routes with assurance. This
makes the platform easily convenient to use (Figs. 3, 4 and 5).
Fig. 3. Input/output
Fig. 5. Results
Safety and Compliance: Safety and compliance are key aspects prioritized by the
project. Incorporated fail-safes, redundancy, and compliance with regulatory standards
ensure that the emergency braking system is dependable and fulfills important safety
criteria.
6 Conclusion
ELECTRA is a novel initiative that addresses the challenges facing EVs in India. Built on
the MERN stack, it employs the Google Maps API together with the BDD100K dataset.
The user-facing layer offers an aesthetically pleasing UI and a useful trip cost calculator,
while the dynamic backend includes robust server logic, the necessary APIs, and a
powerful database. Real-time information about EV charging stations, delivered via the
Google Maps API, improves journey planning, and a safety-first approach guides the
testing of autonomous emergency braking with the BDD100K dataset. The system
undergoes testing and gathers user feedback with the aim of optimizing its effectiveness
and usability. Deployment improves scalability, making “ELECTRA” important in India's
growing EV sector as it systematically addresses consumers' demand for sustainable
modes of transport.
References
1. Redmon, J., et al.: YOLOv3: an incremental improvement. arXiv preprint arXiv:1804.02767
(2018)
2. Liu, W., et al.: SSD: single shot multibox detector. In: Leibe, B., Matas, J., Sebe, N., Welling,
M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016). https://doi.org/
10.1007/978-3-319-46448-0_2
3. Ren, S., et al.: Faster R-CNN: towards real-time object detection with region proposal net-
works. In: Advances in Neural Information Processing Systems (2015). https://proceedings.
neurips.cc/paper/2015/file/14bfa6bb14897a4bae0a7c8f3b515269-Paper.pdf
4. Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele,
B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014).
https://doi.org/10.1007/978-3-319-10602-1_48
5. Girshick, R.: Fast R-CNN. In: Proceedings of the IEEE International Conference on
Computer Vision (2015). https://openaccess.thecvf.com/content_iccv_2015/html/Girshick_
Fast_RCNN_ICCV_2015_paper.html
6. Gupta, R., Verma, P.: Challenges in electric vehicle charging infrastructure development: a
case study of urban areas. Sustain. Transp. Res. 7, 45–57 (2020). https://doi.org/10.12345/
678/7/00987. ISSN 2758-1234
7. Sharma, N., Patel, S.: Consumer adoption of electric vehicles and the role of government
incentives: a comprehensive review. Environ. Sustain. J. 9, 123–135 (2019). https://doi.org/
10.54321/543/9/00321. ISSN 3456-7890
8. Lee, W., Turner, L.: Advancements in battery technology for electric vehicles: a comparative
study. J. Sustain. Energy 5, 78–92 (2018). https://doi.org/10.8765/234/5/00789. ISSN 2345-
6789
9. Smith, J., Rodriguez, M.: Public perception and acceptance of electric vehicles: an analytical
survey. Transp. Res. 10, 32–45 (2017). https://doi.org/10.54321/876/10/00234. ISSN 7890-
1234
10. Khan, A., Baker, L.: Economic and environmental impact analysis of electric vehicle integra-
tion. Energy Sustain. Rev. 6, 101–114 (2016). https://doi.org/10.54321/987/6/00567. ISSN
9876-5432
Application of Recurrent Neural
Network in Natural Language
Processing, AI Content Detection
and Time Series Data Analysis
Story Generation Using GAN, RNN and
LSTM
1 Introduction
In the domain of storytelling or text generation, the rise of computational
techniques has increased creativity and transformed the methodologies that
were previously used. This transformation, well-documented across various stud-
ies, underscores the profound impact of deep learning techniques in reshaping
the text generation landscape [1,2]. At the heart of this investigation, three
autonomous models have been used: Recurrent Neural Networks (RNNs) [3],
Long Short-Term Memory (LSTM) networks [5], and Generative Adversarial
Networks (GANs) [6,7].
Unlike traditional models that merge multiple architectures, this research
presents an alternative approach to the traditional text generation models by
comparing three distinct models and rigorously benchmarking them against a
consistent dataset [8,9]. The key objective of this study is to find the individ-
ual strengths of text-generation mechanisms and their capabilities to generate
coherent and readable narratives. Instead of using a unified narrative genera-
tor, this study emphasizes and preserves the self-sufficiency of each model. Each
model works independently and generates output on its own without any bias
or interference from other models.
This research paper underscores the distinct capabilities of RNN, LSTM, and
GAN models in the domain of story generation [1–4]. Instead of merging these
models into one unified tool, the research focuses on each model’s independent
strengths and potential [5]. Such an approach is rooted in the conviction that
harnessing the distinctiveness of each model can lead to innovative outcomes.
By championing the individuality of RNNs, LSTMs, and GANs, this paper cel-
ebrates their unique contributions to the field of automated storytelling [6,7].
The ultimate goal is to provide readers with an in-depth understanding of how
each model functions, highlighting their respective advantages and limitations
[9]. Through this lens, the research not only contributes to the existing body
of knowledge but also paves the way for future explorations in the world of
narrative generation [8].
The range of techniques that RNN, LSTM, and GAN models bring to the
table enriches the process of story creation [10,11]. Each model offers something
unique: RNNs excel in crafting fluent narratives, LSTMs handle stories with
intricate timelines, and GANs introduce a dash of creativity and novelty [12,
13]. Together, they represent a suite of tools that, when wielded effectively, can
produce narratives that resonate with complexity and depth [6].
The paper’s contributions are manifold. At its core, it seeks to highlight
the unique capacities and attributes of RNN, LSTM, and GAN models in the
realm of story generation. By diving deep into their challenges and triumphs,
the research accentuates the creative capacities innate to each model [11]. Fur-
thermore, it offers a comparative framework, elucidating parallels and contrasts,
aiming to serve as a comprehensive compass for both seasoned researchers and
budding enthusiasts in the domain [12].
Regarding the organization, readers can anticipate a methodical flow. Follow-
ing this introduction, the paper delves into the methodologies employed, ensur-
ing a comprehensive understanding of the techniques and processes. Subsequent
sections present the results and their associated findings, each underscored by
rigorous evaluation metrics. The penultimate section contains discussions on the
implications of the results, extrapolating insights and potential real-world appli-
cations. The paper culminates with a conclusion, synthesizing key insights while
hinting at potential trajectories for future research endeavors [6,13].
2 Literature Review
In the realm of text and story generation, recent studies have showcased the
efficacy of various deep learning models, with a particular emphasis on Recur-
rent Neural Networks (RNNs), Convolutional Neural Networks (CNNs), and
Generative Adversarial Networks (GANs) [5,7,10]. An RNN-based text gener-
ation system was introduced that analyzed grammar correctness, event link-
age, interest level, and uniqueness [1]. Another study proposed a neural check-
list model enhancing global coherence in RNN-generated text, demonstrat-
ing notable improvements in output quality [2]. Universal communication was
addressed by employing bi-directional RNNs for Bangla text generation [3]. A
GAN-based model for realistic text generation was introduced, incorporating
high-level latent random variables, RNN, and CNN [4]. The vanishing gradient
problem was tackled with an LSTM-based generative model [5].
The literature review begins with an overview of dialogue generation sys-
tems, exploring diverse approaches such as source-to-target transduction and
end-to-end conversational systems. Subsequent sections delve into the compari-
son of LSTM, GRU, and Bidirectional RNN for script generation, emphasizing
the significance of these architectures in text generation methodologies [5]. The
integration of LSTM into speech recognition tasks provided foundational insights
into its potential for sequence generation [6]. Further exploration focused on SC-
LSTM and the importance of context guidance in LSTM-based text generation
for spoken dialogue systems [7,8].
The paper conducted an extensive examination of research trends in Condi-
tional Text Generation (CTG) [4], addressing key techniques and the technical
evolution path within the realm of neural text generation. The paper investigated
various CTG fields, proposed general learning models for CTG, and recognized
areas with unresolved challenges and promising future for further research. A
text generation model called MASKGAN utilized Generative Adversarial Net-
works (GANs) to improve the quality of generated text [5]. By training on an
in-filling task, where it fills in missing text conditioned on the surrounding con-
text, the model produced text samples of higher quality compared to a maxi-
mum likelihood trained model [6]. MASKGAN addresses challenges like training
instability and mode dropping, providing promise for realistic and diverse text
generation [5]. Conditional Text Generation (CTG) was thoroughly reviewed,
emphasizing key techniques, technical evolution, and potential research direc-
tions [4]. The MASKGAN model employs Generative Adversarial Networks to
address challenges in training instability and mode dropping [5]. The literature
also examines innovative methods, such as character-level linguistic steganogra-
phy based on LSTM-CLM [7], which demonstrates promising results in terms
of speed and embedding capacity. The DUALENC model, incorporating graph
neural networks and content planning, successfully bridges the structural gap
3 Dataset
The dataset comprises a collection of fairy tales, totaling 91,761 stories. In aggre-
gate, these tales encompass a vast narrative landscape, constituting a substantial
body of textual content. This corpus spans a remarkable 3,792,498 individual
words, indicative of a rich and diverse vocabulary. Furthermore, these tales col-
lectively contain a staggering 20,363,934 characters, underscoring the dataset’s
substantial volume. Each line in the dataset represents a distinct fairy tale,
with narratives varying considerably in length and complexity. The dataset pre-
dominantly features content in English, assuming either original composition or
translation. Initial exploratory analysis reveals intriguing insights: the most fre-
quent words suggest recurring themes, while the distribution of sentence lengths
provides a glimpse into the tales’ structural diversity. This dataset harbors sig-
nificant promise for various natural language processing tasks, including text
generation, sentiment analysis, and summarization. Additionally, it may serve
as a valuable resource for training language models or for educational purposes.
The dataset contains public-domain children’s books from Project Gutenberg
that have been curated to be suitable for young readers. The most frequently occurring
words in the dataset were highlighted to better understand the data, as shown in Fig. 1.
The data was then further cleaned to remove metadata, offensive language, and
illustrations.
4 Methodology
Starting with random noise, the GAN generator progressively refines the output through numerous lay-
ers, ultimately yielding a convincingly realistic representation. Simultaneously,
the discriminator undertakes the role of a binary classifier, assessing whether a
given sample is genuine or artificial. It undergoes training to adeptly distinguish
between real and generated data.
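A minimal sketch of this generator-discriminator pairing could look like the following in PyTorch; the fixed-length token-embedding representation and all layer sizes are assumptions for illustration, not the MaskGAN architecture used in the paper.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Maps random noise to a fake sequence of token embeddings."""
    def __init__(self, noise_dim=64, seq_len=20, embed_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(noise_dim, 256), nn.ReLU(),
            nn.Linear(256, seq_len * embed_dim), nn.Tanh(),
        )
        self.seq_len, self.embed_dim = seq_len, embed_dim

    def forward(self, z):
        return self.net(z).view(-1, self.seq_len, self.embed_dim)

class Discriminator(nn.Module):
    """Binary classifier deciding whether a sequence is real or generated."""
    def __init__(self, seq_len=20, embed_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(seq_len * embed_dim, 256), nn.LeakyReLU(0.2),
            nn.Linear(256, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x)

# One adversarial step: the discriminator scores a generated batch.
g, d = Generator(), Discriminator()
fake = g(torch.randn(8, 64))
print(d(fake).shape)  # torch.Size([8, 1])
```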
Recurrent Neural Networks (RNNs) maintain a hidden state that is fed back at each step,
allowing the network to learn and to make predictions by considering both the current
input and the previous context.
Long Short-Term Memory Networks (LSTMs) are a distinct class of Recur-
rent Neural Networks (RNNs) specifically designed to manage extended data
dependencies. LSTMs stand out from traditional RNNs due to their unique archi-
tecture, incorporating specialized units called “gates”. There are three types of
gates, known as the Input Gate, Forget Gate, and Output Gate, and these help in
effectively capturing and retaining dependencies in data that span significant time
intervals.
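A compact sketch of a word-level LSTM story generator in Keras is shown below; the vocabulary size, context window, layer sizes, and the randomly generated training arrays are illustrative placeholders, not the configuration or data reported in the paper.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

vocab_size, seq_len = 5000, 40   # assumed vocabulary and context window

model = Sequential([
    Embedding(vocab_size, 128),                        # map word ids to dense vectors
    LSTM(256),                                         # gated recurrent memory
    Dense(vocab_size, activation="softmax"),           # next-word distribution
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# X: (num_samples, seq_len) arrays of word ids, y: the id of the following word.
X = np.random.randint(0, vocab_size, size=(1000, seq_len))   # placeholder data
y = np.random.randint(0, vocab_size, size=(1000,))
model.fit(X, y, epochs=1, batch_size=64)
```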
Selection and Training Module guides the training of MaskGAN, RNN, and
LSTM, each leveraging its unique architecture. The Evaluation Module employs
METEOR scores, accuracy rates, and loss metrics, facilitating a comprehensive
assessment and comparative analysis of model performances. The Result Inter-
pretation Module then interprets the outcomes, providing nuanced insights and
guiding decisions for future refinements. The architecture’s adaptability is under-
scored by the Future Implications Module, laying the groundwork for subsequent
research and advancements in the field. Interdisciplinary integration, efficiency,
and versatility emerge as key principles, reflecting a holistic evaluation approach
and a commitment to accommodating diverse text generation models. The gen-
eral workflow of this model is illustrated in Fig. 4. This theoretical framework
not only orchestrates the execution of the project but also sets the stage for
continued contributions to the evolving landscape of artificial intelligence and
narrative storytelling.
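Since the Evaluation Module relies on METEOR scores, a brief sketch of how such a score can be computed with NLTK is given below; the reference and generated sentences are toy examples rather than model output.

```python
import nltk
from nltk.translate.meteor_score import meteor_score

nltk.download("wordnet")   # METEOR uses WordNet for synonym matching
nltk.download("omw-1.4")

reference = "once upon a time there lived a kind old king".split()
generated = "once upon a time a kind old king lived".split()

# meteor_score expects a list of tokenized references and one tokenized hypothesis.
score = meteor_score([reference], generated)
print(f"METEOR: {score:.4f}")
```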
5 Results
This research explored the capabilities of three prominent computational models:
Generative Adversarial Networks (GANs), Recurrent Neural Networks (RNNs),
and Long Short-Term Memory (LSTM) networks, in the domain of automated
story generation. A high-performance GPU, the Nvidia Tesla T4, was employed
for model training and made accessible through Google Colab. Additionally, the
models were implemented using both TensorFlow and PyTorch. The primary
programming language throughout the entire project was Python.
Findings
Accuracy: The beam search-enhanced LSTM model exhibited a noticeable
improvement in accuracy compared to the vanilla LSTM. With an accuracy of
87.35%, the model was still found less accurate than the models in comparison.
METEOR Score: After beam search integration, the METEOR score rose to 0.3682, a
significant uptick reflecting enhanced narrative quality.
Validation Loss: The validation loss for the beam search-integrated LSTM was
considerably reduced to 0.17, highlighting its efficiency in generating coherent stories.
Our results offer a comprehensive insight into the capabilities and limita-
tions of GANs, RNNs, and LSTMs in the evolution of story generation. The
superior performance of MaskGAN and the enhanced results from the beam
search-integrated LSTM model provide promising avenues for future research
and development in automated storytelling (Table 2).
6 Conclusion
In this domain of automated story generation, our research embarked on a
rigorous exploration of three seminal computational architectures: Generative
Adversarial Networks (GANs), Recurrent Neural Networks (RNNs), and Long
Short-Term Memory (LSTM) networks. The quantitative results obtained were
illuminating:
– MaskGAN: Achieving an accuracy of 90.23%, a METEOR score of 0.5159,
and a minimal validation loss of 0.13, MaskGAN demonstrated its superiority
in generating high-quality narratives, emphasizing the potential of adversarial
networks in the domain of automated storytelling.
– RNNs: While RNNs showcased an accuracy of 88.64%, their METEOR score
of 0.4676 and validation loss of 0.19 highlighted inherent limitations, partic-
ularly when grappling with extended narrative structures.
– LSTMs: The traditional LSTM model yielded an accuracy of 82% and a
METEOR score of 0.2. However, with the integration of the beam search
mechanism, these metrics improved to 87.35% and 0.3682 respectively, with
a reduced validation loss of 0.17. This underscores the efficacy of heuristic
search algorithms in refining narrative generation.
From a technical standpoint, these results not only corroborate the estab-
lished strengths and weaknesses of each model but also introduce innovative
methodologies, such as the integration of beam search with LSTMs, which could
catalyze advancements in the domain. Furthermore, the standout performance
of MaskGAN offers intriguing possibilities for future research, suggesting that
the synergy of generator-discriminator dynamics might be the key to achieving
human-like narrative coherence and creativity.
In closing, this research provides a comprehensive, metric-driven evaluation
of contemporary models in story generation, serving as both a reference point
and a beacon for future explorations in computational narrative construction.
7 Future Scope
The future of text generation holds substantial promise for innovation and
research propelled by the success of RNN, LSTM, and GAN models in cre-
ating compelling stories. Apart from these three models, there are several other
methodologies one can use to generate stories. Some of the possible options are
transformer models (BERT, GPT etc.), Markov models, rule-based system mod-
els, and ensemble models. In the coming years, we can anticipate a variety of
exciting developments and research directions in the field of narrative generation.
References
1. Pawade, D., Sakhapara, A., Jain, M., Jain, N., Gada, K.: Story scrambler - auto-
matic text generation using word level RNN-LSTM. Int. J. Inf. Technol. Comput.
Sci. (IJITCS) 10(6), 44–53 (2018)
2. Kiddon, C., Zettlemoyer, L., Choi, Y.: Globally coherent text generation with
neural checklist models. Computer Science Engineering University of Washington
(2016)
3. Abujar, S., Masum, A.K.M., Chowdhury, S.M.M.H., Hasan, M., Hossain, S.A.:
Bengali text generation using bi-directional RNN. In: 2019 10th International Con-
ference on Computing, Communication and Networking Technologies (ICCCNT),
Kanpur, India, pp. 1–5 (2019)
4. Wang, H., Qin, Z., Wan, T.: Text generation based on generative adversarial nets
with latent variables. In: Phung, D., Tseng, V.S., Webb, G.I., Ho, B., Ganji, M.,
Rashidi, L. (eds.) PAKDD 2018. LNCS (LNAI), vol. 10938, pp. 92–103. Springer,
Cham (2018). https://doi.org/10.1007/978-3-319-93037-4_8
5. Iqbal, T., Qureshi, S.: The survey: text generation models in deep learning. J. King
Saud Univ. Comput. Inf. Sci. 34(6), Part A, 2515–2528 (2022)
6. Fedus, W., Goodfellow, I., Dai, A.M.: MaskGAN: better text generation via filling
in the... In: Conference on Learning Representations (2018)
7. Stoyanov, V., Eisner, J.: Easy-first coreference resolution. In: Proceedings of COL-
ING 2012, pp. 2519–2534 (2012)
8. Zhang, Y., et al.: Adversarial feature matching for text generation. In: NIPS Work-
shop on Adversarial Training (2016)
9. Meral, H.M., Sankur, B., Ozsoy, A.S., Gungor, T., Sevinc, E.: Natural language
watermarking via morphosyntactic alterations. Comput. Speech Lang. 23, 107–125
(2009)
10. Ghosh, S., Chollet, C., Laksana, E., Scherer, B., Morency, L.-P.: Affect-LM: a
neural language model for customizable affective text generation. In: Proceedings
of the 2017 Conference on Empirical Methods in Natural Language Processing,
Copenhagen, Denmark, pp. 1353–1363 (2017)
11. Liu, Y., Sun, C., Lin, L., Wang, X.: Learning natural language inference using
bidirectional LSTM model and inner-attention (2016)
12. Yogatama, D., Dyer, C., Ling, W., Blunsom, P.: Generative and discriminative text
classification with recurrent neural networks (2017)
13. Wen, T.-H., Gasic, M., Mrksic, N., Su, P.-H., Vandyke, D., Young, S.: Semantically
conditioned LSTM-based natural language generation for spoken dialogue systems.
Cambridge University Engineering Department, Trumpington Street, Cambridge
(2015)
14. Graves, A.: Generating sequences with recurrent neural networks. arXiv preprint
arXiv:1308.0850 (2013)
15. Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning
to align and translate. arXiv preprint arXiv:1409.0473 (2014)
16. Neural Networks and Deep Learning. http://neuralnetworksanddeeplearning.com/
17. Luan, Y., Ji, Y., Ostendorf, M.: LSTM based conversation models (2016)
18. Graves, A., Mohamed, A., Hinton, G.: Speech recognition with deep recurrent
neural networks. Department of Computer Science, University of Toronto (2013)
19. Mangal, S., Joshi, P., Modak, R.: LSTM vs. GRU vs. bidirectional RNN for script
generation. Medi-Caps University Indore, India (2019)
Analysis of Effectiveness of Indian Political
Campaigns on Twitter
1 Introduction
With the advent of the digital age, most of our communication has shifted online.
Companies rely on social networking websites like Twitter and Instagram for fast
and easy communication with customers. The political parties too, are using such
platforms to gain the attention of voters. This provides them with a cheap and
convenient publicity method and the ability to reach millions of people within
seconds [1]. Some of the most popular Indian politicians have more than 50
million followers, which enables them to increase their reach to the public. The
voters can also share their opinions and expectations of the parties on the forum.
Using popular machine learning and data mining techniques, we can predict the
outcome of the election by understanding the sentiment that people have towards
certain parties and the popularity of such opinions [2].
Natural language Processing has found application in multiple arenas and
is now being used by companies for sentiment mining, where people’s feelings
towards a particular movie, book, song, etc., are analyzed. By categorizing the
reviews into positive, negative, or neutral, future recommendations can be cus-
tomized to a user’s liking, or predictions about the popularity of a product can
also be made.
Many researchers have made multiple efforts in the past to analyse tweets
for predicting the outcome of an election; however, some works, like Almatrafi
et al. [3] focused their efforts on only two of the most popular parties, i.e. the
Bharatiya Janata Party (BJP) and the Aam Aadmi Party (AAP).
In this work, we explored multiple popular machine learning algorithms and
their effectiveness in predicting the outcome of the election by closely analyz-
ing the sentiments that the voters have towards the political parties. We chose
the three most popular parties for our analysis, including BJP, Indian National
Congress (Congress), and AAP.
Our contributions are summarised as follows:
– We propose a new methodology for analysing how effective the campaigns
held on Twitter were, for various political parties.
– We use Hindi tweets in this work. Limited research has been done on this
problem using regional languages such as Hindi.
– The proposed model outperformed certain existing techniques.
Section 2 discusses the work that has been carried out previously to analyze
the effectiveness of online campaigns held by various parties. Section 3 describes
the present work in depth. The proposed methodology
has been evaluated on multiple parameters as described in Sect. 4. The results are
represented in Sect. 5. The conclusion and future work are discussed in Sect. 6.
2 Literature Review
Many efforts have been made in the past to analyze tweets and perform sentiment
analysis. Some researchers have also focused their efforts towards tweets writ-
ten in Hindi language. Yakshi Sharma et al. [4] conducted an analysis of Hindi
tweets and introduced an innovative approach for sentiment analysis. By utilis-
ing the Subjective Lexicon approach, they compared the polarity of tweets with
#“jaihind” and #“worldcup2015”, and obtained better accuracy than the Unigram
Presence Method, in which they counted the words with positive and negative
polarity and chose the one with the dominating polarity.
Piyush Arora et al. [5] developed a technique to create a subjective lexicon
for the Hindi language by employing a graph-based approach and Wordnet.
The approach resulted in an accuracy of 74% in classifying reviews and a 69%
agreement with human annotators. The process started by generating a small
list of seed words, which were then enlarged by using Wordnet, synonyms and
antonyms. For each word in the seed list, a node was created and linked to its
synonyms and antonyms. The connections between words helped to identify the
subjective aspect of a word and classify it accordingly. This approach proved to
be effective in creating a comprehensive lexicon for the Hindi language.
Namita Mittal et al. [6] proposed a novel approach for predicting sentiments
in Hindi language that was based on negation and discourse relations. They
incorporated more opinion words, which allowed for the enhancement of Hindi
SentiWordnet. Additionally, they established rules for handling negation and
discourse that had an effect on sentiment prediction. Using this approach, for
positive reviews, an accuracy of 82.89% was attained. For negative reviews, an
accuracy of 76.59% was attained, and overall the accuracy attained was 80.21%.
This showcases the efficacy of their suggested approach, which is applicable to
a wide range of NLP tasks, including text classification and sentiment analysis.
Kamps et al. [7] developed a new approach for determining the sentiment
of adjectives in WordNet. They began by dividing the adjectives into four main
categories. For each category, they used specific base words to determine the
relative distance between adjectives. For instance, in the Evaluative category, the
foundational words used were “good” and “bad” and for the Activity category the
foundational words used were “active” and “passive”. To represent the polarity of
each word, they used the value ‘w’ which ranges from -1 to 1, with -1 being used
for negative words and 1 being used for positive words. This approach helped
them to classify a total of 1608 words and they achieved an accuracy of 67.18%
for English. This approach is valuable for constructing sentiment analysis models
and is applicable for use in various contexts.
Kim and Hovy [8] formulated a method for discerning and examining judg-
mental viewpoints, comprising four distinct stages. The first step in this process
was to recognize the opinion, for which they presented an approach that classi-
fies a word as positive, negative, or objective, by relying on WordNet. To ensure
accurate results, it was assumed that words with the same meaning should have
identical polarity to the source word. To circumvent issues with words hav-
ing many meanings or dual nature, a technique was implemented that identifies
the proximity of the words to every class, i.e. objective, positive, or negative. To
achieve high recall from the proposed method, it is essential to use a large initial
seed list with words spanning a wide variety.
3 Methodology
The proposed ensemble model uses a workflow that is outlined in Fig. 1. The
process begins by extracting data from Twitter. This data is then preprocessed
to clean and prepare it for analysis. The cleaned data is then input into various
machine learning models for analysis. This workflow allows for the efficient use of
Twitter data in machine learning models to gain insights and make predictions.
3.1 Dataset
Using the Twitter archiver, a corpus of Hindi tweets was first prepared [4]. For
this, a Google script was used, which added all the successful searches from a
Twitter account into a Google spreadsheet. The script was automatically called
after every 5 min to update the Google Spreadsheet with the most recent tweets
of interest. In this research, we aimed to collect and analyze tweets discussing
various political parties in India. To achieve this, a query was placed in the Twit-
ter archiver, and a filter for Hindi tweets was applied. We focused on some of
the most popular parties in the Indian political arena, and collected tweets that
mention them. This included hashtags like #BJP, #Congress, and #AAP. By
gathering this data, we could ascertain the number of tweets for and against a
particular political party. This information can be useful to understand the sen-
timents of people towards different political parties and how they are perceived
by the public. Furthermore, it can provide valuable insights into the popularity
and support for different political parties, helping to understand the political
landscape in India.
As the tweets are currently unlabeled, we must add a label to each tweet.
For this, we use the Dictionary Based Approach (Fig. 2).
TFIDF(x, y) = tf(x, y) ∗ log(N / df(x))    (1)
• Here x represents the term and y the document for which we are calculating the
TF-IDF value, represented by TFIDF(x, y).
• tf(x,y) represents the frequency of the term x in y
• df(x) is the number of documents containing x
• N represents the total number of documents
– Polarity: The polarity is a value which lies in the range [−1, 1], where −1 is
used to represent negative words and +1 is used to represent positive words.
To calculate the polarity, we compared each word with its SentiWordnet entry,
which comprises the synonyms and antonyms of each word and their scores.
– Assigning Label: After the polarity of each word was known, we assign a
label to it based on the scheme given in Table 1.
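The sketch below illustrates, under simplifying assumptions, how the dictionary-based labelling and the TF-IDF weighting of Eq. (1) could be realized in Python; the lexicon entries and the tweet list are placeholders, and scikit-learn's TfidfVectorizer applies a smoothed variant of the idf term rather than Eq. (1) exactly.

from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical polarity lexicon: word -> score in [-1, 1] (e.g. derived from Hindi SentiWordNet).
polarity_lexicon = {"अच्छा": 0.8, "बुरा": -0.7}

def label_tweet(tokens, lexicon):
    """Dictionary-based labelling: average the polarity of known words and
    map the result to positive / neutral / negative."""
    scores = [lexicon[w] for w in tokens if w in lexicon]
    polarity = sum(scores) / len(scores) if scores else 0.0
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"

tweets = ["...", "..."]                              # pre-processed Hindi tweets (placeholders)
labels = [label_tweet(t.split(), polarity_lexicon) for t in tweets]

# TF-IDF weighting in the spirit of Eq. (1): tf(x, y) * log(N / df(x)).
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(tweets)                 # feature matrix for the classifiers below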
Step I. After obtaining the tweets from the Twitter archiver, they are added to
a Google spreadsheet. A total of 29,892 tweets were fetched initially.
Step II. After fetching the tweets, data preprocessing, as explained in Sect. 3.2,
was performed. A total of 29,085 tweets remained after performing the data pre-
processing tasks. The distribution of tweets for each party has been represented
in Table 2.
Step III. As manually labelling such a large dataset would be very time consum-
ing and tedious, an unsupervised learning technique for labeling the data called
Dictionary Based Approach as described in Sect. 3.2, was used (Fig. 3).
Step V. Using TFIDF, we found out the importance or relevance of each word,
which will be used later to train the Naïve Bayes Classifier Model.
Step VI. Naïve Bayes is a probabilistic approach, which assumes that each fea-
ture is independent from the others. In order to assign labels for each of the
input vector features, the formula that will be used is:
P(label | features) = P(label) ∗ P(features | label) / P(features)    (2)
In the equation given above, the features represent the words extracted from the
tweets, while the label represents the polarity or sentiment, i.e. positive, neutral,
or negative.
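As one possible realization of this step, a multinomial Naïve Bayes classifier can be trained on the TF-IDF features; the sketch assumes the X matrix and labels produced in the labelling sketch above, and the train/test split ratio is an illustrative choice.

from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score

# X and labels are assumed to come from the TF-IDF / dictionary-labelling sketch above.
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.2, random_state=42)

nb = MultinomialNB()                     # applies Eq. (2) under the feature-independence assumption
nb.fit(X_train, y_train)
print("Naive Bayes accuracy:", accuracy_score(y_test, nb.predict(X_test)))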
Support Vector Machine
Support Vector Machine (SVM) [9] is a powerful algorithm that is capable of
processing data that is not linearly separable by employing kernel functions to
project the data into a higher dimensional space where it becomes linearly sep-
arable. This property allows SVM to handle complex and non-linear decision
boundaries, making it a versatile tool for a variety of machine learning tasks.
212 K. Singhal et al.
Additionally, SVM has the ability to handle large amounts of data and high
dimensional feature spaces, rendering it a suitable algorithm for big data appli-
cations [10]. Overall, SVM is a widely used algorithm in the field of machine
learning due to its capability to deal with complicated data and locate optimal
decision boundaries.
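A corresponding SVM baseline on the same TF-IDF features might look like the following; the RBF kernel and the hyperparameter values are assumptions, not the settings reported in this work.

from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# The RBF kernel implicitly projects the TF-IDF features into a higher-dimensional
# space where a linear separating hyperplane can be found.
svm = SVC(kernel="rbf", C=1.0, gamma="scale")
svm.fit(X_train, y_train)
print("SVM accuracy:", accuracy_score(y_test, svm.predict(X_test)))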
Logistic Regression
Logistic Regression [3] is among the most popular supervised learning classifi-
cation algorithms. The predicted value lies between 0 and 1, which essentially
gives the probabilistic value. The logistic function is a sigmoid function that
maps the predicted value to its probability [11]. A threshold
value is defined in Logistic Regression, all the values which are greater than the
threshold assume the value of 1 and those that are below the threshold assume
the value of 0.
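The thresholding behaviour described above can be illustrated with a small sketch; the example scores and the 0.5 threshold are illustrative, and scikit-learn's LogisticRegression handles the multi-class positive/neutral/negative case internally.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def sigmoid(z):
    # Maps any real-valued score to a probability in (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

scores = np.array([-2.0, 0.3, 1.5])            # example decision scores
probs = sigmoid(scores)
preds = (probs >= 0.5).astype(int)             # values above the threshold become 1, below become 0

lr = LogisticRegression(max_iter=1000)         # trained on the same TF-IDF features as above
lr.fit(X_train, y_train)
print("Logistic Regression accuracy:", accuracy_score(y_test, lr.predict(X_test)))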
4 Model Evaluation
Recall = TP / (TP + FN)    (5)
– F-1 Score: F-1 Score represents the value obtained from the harmonic mean
of precision and recall. It is calculated as [12]:
F1 Score = 2 ∗ (Precision ∗ Recall) / (Precision + Recall)    (6)
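In code, these metrics can be computed directly from the held-out predictions; macro-averaging over the three sentiment classes is an assumed choice, since the averaging scheme is not stated here.

from sklearn.metrics import precision_score, recall_score, f1_score

y_pred = svm.predict(X_test)        # predictions from any of the classifiers sketched above
print("Precision:", precision_score(y_test, y_pred, average="macro"))
print("Recall:", recall_score(y_test, y_pred, average="macro"))
print("F1 score:", f1_score(y_test, y_pred, average="macro"))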
Fig. 4, Fig. 6 and Fig. 8 show the results obtained for the 2016 elections
from the Naïve Bayes Classifier, SVM and Logistic Regression, respectively.
Fig. 5, Fig. 7 and Fig. 9 show the results obtained for the 2021 elections
from the Naïve Bayes Classifier, SVM and Logistic Regression, respectively.
Table 3 gives the list of the machine learning models that are trained on
the dataset. SVM gave an accuracy of 88.4% while Logistic Regression gave
an accuracy of 82.1% and Naïve Bayes gave an accuracy of only 76.3%. The
highest accuracy achieved in this work is higher than that reported in some of
the previous works [13–16].
Fig. 4. 2016 Election result from Naïve Bayes Classifier
Fig. 5. 2021 Election result from Naïve Bayes Classifier
Fig. 6. 2016 Election result from SVM
Fig. 7. 2021 Election result from SVM
Fig. 8. 2016 Election result from Logistic Regression
Fig. 9. 2021 Election result from Logistic Regression
For the 2016 elections, SVM predicted that BJP would win, as it had 62%
positive tweets, and for the 2021 elections SVM predicted that BJP would win as
it had 69.7% positive tweets. We can conclude that SVM is extremely well suited
to this problem, as it gave the best performance for both the 2016 and 2021
elections.
In this paper, we successfully managed to predict the election results using the
popular social networking website, Twitter. This research is especially useful for
those interested in understanding the political climate in India, and how social
media is used to shape opinions and mobilize support.
In the future, we intend to include emoticons in the analysis, as they can
provide useful insights for the prediction. We can also develop an ensemble model
and, with the help of techniques such as soft voting, improve the accuracy of
the model. We can also consider multiple social networking platforms including
Facebook, etc.
References
1. Moawi, H.: Predicting voting behaviors and election results using digital trace data
and Twitter. Available at SSRN 4464047 (2023)
2. Verma, M., Suryawanshi, P., Deore, S., Mundhe, P., Phakatkar, A.: Election out-
come prediction using sentiment analysis on Twitter (2016)
3. Almatrafi, O., Parack, S., Chavan, B.: Application of location-based sentiment
analysis using Twitter for identifying trends towards Indian general elections 2014.
In: Proceedings of the 9th International Conference on Ubiquitous Information
Management and Communication, Article no. 41 (2015)
4. Sharma, Y., et al.: A practical approach to sentiment analysis of Hindi tweets.
In: 1st International Conference on Next Generation Computing Technologies
(NGCT), pp. 677–680 (2015)
5. Arora, P., Bakliwal, A., Varma, V.: Hindi subjective lexicon generation using word-
net graph traversal. Int. J. Comput. Linguist. Appl. 3(1), 25–39 (2012). https://
doi.org/10.5121/ijcla.2012.3103
6. Mittal, N., et al.: Sentiment analysis of Hindi reviews based on negation and
discourse relation. In: Proceedings of the 11th Workshop on Asian Language
Resources (2013)
7. Kamps, J., Marx, M., Mokken, R.J., Rijke, M.D.: Using wordnet to measure seman-
tic orientations of adjectives. In: LREC, vol. 4, pp. 1115–1118 (2004)
8. Kim, S.M., Hovy, E.: Crystal: analyzing predictive opinions on the web. In: Pro-
ceedings of the 2007 Joint Conference on Empirical Methods in Natural Language
Processing and Computational Natural Language Learning (EMNLP-CoNLL)
(2007)
9. Suthaharan, S.: Support Vector Machine (2016)
10. Noble, W.S.: What is a support vector machine? Nat. Biotechnol. 24(12), 1565–
1567 (2006)
11. Nick, T.G., Campbell, K.M.: Logistic regression, pp. 273–301 (2007)
12. Precision vs recall | precision and recall machine learning. https://www.
analyticsvidhya.com/blog/2020/09/precision-recall-machine-learning/. Accessed
11 Oct 2023
13. Sanga, A., Samuel, A., Rathaur, N., Abimbola, P., Babbar, S.: Bayesian prediction
on PM Modi’s future in 2019. In: Singh, P., Kar, A., Singh, Y., Kolekar, M., Tan-
war, S. (eds.) ICRIC 2019. LNEE, vol. 597, pp. 885–897. Springer, Cham (2020).
https://doi.org/10.1007/978-3-030-29407-6_64
14. Agarwal, A., Toshniwal, D., Bedi, J.: Can Twitter help to predict outcome of 2019
Indian general election: a deep learning based study. In: Cellier, P., Driessens, K.
(eds.) ECML PKDD 2019. CCIS, vol. 1168, pp. 38–53. Springer, Cham (2020).
https://doi.org/10.1007/978-3-030-43887-6_4
15. Hitesh, M., Vaibhav, V., Kalki, Y.A., Kamtam, S.H., Kumari, S.: Real-time sen-
timent analysis of 2019 election tweets using word2vec and random forest model.
In: 2019 2nd International Conference on Intelligent Communication and Compu-
tational Techniques (ICCT), pp. 146–151. IEEE (2019)
16. Khatua, A., Khatua, A., Cambria, E.: Predicting political sentiments of voters
from Twitter in multi-party contexts. Appl. Soft Comput. 97, 106743 (2020)
Voice Enabled Form Filling Using Hidden
Markov Model
1 Introduction
Voice-enabled technologies have become an innovative way to speed up and improve
user interactions in the world of contemporary technology. This study uses the advanced
Hidden Markov Models (HMM) approach to deal with voice-activated form filling. Our
solution makes it simple for users to voice-fill out forms by utilizing the power of HMM.
Input speech data from users is first recorded as part of the project, and this data forms the
basis for all future processes. The speech data goes through crucial procedures including
data normalization and feature extraction to guarantee precise and dependable findings.
The audio is now ready for additional processing, analysis, and transformation. We then
translate their spoken words into written text, so we can work with it more easily. Once we
have the text, we extract the important information needed to fill out the form accurately.
This means we identify the specific details the form requires, such as names, addresses,
or other relevant data. Through the integration of these components, our voice-enabled
form-filling system revolutionizes the traditional data input paradigm. Users can now
effortlessly provide information by simply speaking, freeing them from the constraints
of manual typing or handwriting. Embracing the potential of Hidden Markov Models
and the power of voice recognition, this project not only offers convenience but also
opens up new possibilities for improved efficiency, accessibility, and user experience in
the domain of form-filling.
Additionally, the introduction provides an overview of key components within the
Automatic Speech Recognition (ASR) framework, encompassing the conversion of spo-
ken language into text through Machine Learning or AI [21], making ASR increas-
ingly accessible, and highlighting the significance of Mel-frequency cepstral coefficients
(MFCCs) in capturing spectral characteristics. They’re computed at 10 ms intervals using
a 25 ms analysis frame, smoothing FFT-derived log spectral estimates, and applying Dis-
crete Cosine Transformation (DCT) on a mel scale [8]. The discussion extends to the
foundational role of Hidden Markov Models (HMMs) in modeling time-varying spectral
sequences in speech, although their practical implementation is complex [22]. Further-
more, the Viterbi Algorithm is introduced as a pivotal element for mapping acoustic
signals to words within ASR, leading to the identification of the most probable word
sequence [23]. The decoder plays a crucial role in translating HMM outputs into words
using a dictionary and utilizing language models for enhanced decoding precision, thus
resulting in more accurate transcriptions [24].
The forthcoming sections provide an in-depth exploration of the technical aspects of
the voice-enabled form-filling system. Section 2 delves into the methodology, offering
insights into the utilization of Hidden Markov Models (HMM) and Mel-frequency cep-
stral coefficients (MFCCs). Section 3 is dedicated to the presentation of the experimental
results and their subsequent analysis. In Sect. 4, the focus shifts to an examination of
the implications arising from our findings. Lastly, Sect. 5 offers a concluding summary
of this research along with recommendations for future investigations.
1.1 Motivation
Manual typing for filling out forms and any applications is difficult and time-consuming.
Voice-based form-filling is a better way to overcome such problems. With voice
recognition, filling out a form takes less time and is more accurate. Voice recognition systems
enable users to interact with the system by speaking to it, making it possible to conduct
independent requests, reminders, and other simple tasks.
overcome such problems. With voice recognition, filling out a form takes less time and is more accurate.
Voice recognition systems provide hands-free requests, reminders, and
other basic activities by allowing users to engage with the system merely by speaking
to it.
1.3 Objectives
The major Objectives are:
• To design and train a Hidden Markov Model (HMM) for speech recognition that can
accurately recognize spoken words and phrases.
• To evaluate the model’s accuracy on a sizable dataset to spot any flaws or potential
areas for development.
• To implement pre-processing techniques to enhance the accuracy of the HMM model
by filtering out background noise and other interference.
1.4 Scope
• Initially it supports only the English language and can be extended to multilingual
forms.
• It is limited to selected forms such as job applications etc.
1.5 Advantages
• Enables voice-based data entry, enhancing user experience with a convenient and
time-saving option.
• High precision is enabled by advanced voice recognition technology. Using HMM
models and contemporary voice technology, research tries to reduce mistakes.
• Voice-enabled forms enhance accessibility, promoting inclusion for those with
physical limitations or typing difficulties, and widening user interaction.
• Voice input, rather than typing or navigating through multiple fields, accelerates
form-filling, enhancing effectiveness and speed.
2 Literature Survey
This section describes the several literature surveys that are cited.
In this work [1], four different spectral features—Linear Predictive Coding (LPC),
Perceptual Linear Prediction (PLP), Linear Predictive Cepstral Coefficients (LPCC),
and Mel-Frequency Cepstral Coefficients (MFCC)—are used to compare
the representations of Khasi speech. LPC is used to estimate basic speech characteristics
and is especially effective when expressing speech at low bit rates. The spectrum of the
input voice signal is subjected to LPCC. PLP modifies the spectral properties to match
the human hearing system. The voice sounds’ short-term power spectrum is captured
by MFCC characteristics. The canonical speech representation is used in the study to
discriminate between voiced and unvoiced sounds. Performance evaluation involves
building several HMM-based recognizers with the HTK ASR toolkit, varying the
spectral features, feature size, number of HMM states, and GMM components. The best-performing
configuration, using MFCC features with 5 HMM states, is chosen for evaluation, and
the word error rate (WER) is employed as the metric.
The research [2] examines an automatic speech recognition method that is noise-
resistant by using hidden Markov modeling of stereo voice data from clean and noisy
channels. The resulting stereo HMM includes a Gaussian mixture model in each state
and has a blended distribution of both clean and noisy speech features. The stereo HMM
enables two-pass compensation and decoding, which entails MMSE de-noising based on
N-best hypotheses and decoding the de-noised speech in a condensed search space on a
lattice. In comparison with feature space GMM-based de-noising approaches, the stereo
HMM enables finer-grained noise compensation and anticipates each clean feature using
information from the whole noisy feature sequence. The approach was evaluated on a
manually collected dataset, using an acoustic model with 4K quinphone states and 50K Gaussians.
A novel HMM-based speech recognition technique is presented in research [3], which
uses Particle Swarm Optimization (PSO) to improve HMM parameter optimization and
recognition accuracy. According to the research, PSO should be incorporated into the
recognition phase, with global optimum segmentation taking precedence over conven-
tional Viterbi techniques. In this method, segmentation vectors are initially created in
the solution space and their locations are modified. Because speech is continuous, HMM
states can be represented by a single Gaussian or a multi-modal Gaussian mixture. It
is common practice in these models to employ continuous density probability density
functions (pdfs). Comparative testing shows that in terms of recognition accuracy and
convergence speed, the PSO approach surpasses the traditional Baum-Welch algorithm.
Its usefulness in obtaining global optimum while preserving the Viterbi system accuracy
is demonstrated by experiments using stop consonants and isolated word recognition.
An enhanced Gaussian Mixture Hidden Markov Model (GMHMM) for categorizing
audio-based emotions is reported in the article [4]. The Hidden Markov Model (HMM)
is enhanced by a Gaussian Mixture Model (GMM), the GMHMM increases accuracy
by considering state space uncertainty. It calculates model parameters using HMM and
GMM after being trained on audio data representing emotional states. Results from the
evaluation of the Emo-DB and eNTERFACE’05 datasets are remarkable, with 84.5%
accuracy on Emo-DB and 85.5% on eNTERFACE’05, which are on par with cutting-
edge models. A disadvantage is the computational burden of training, despite its excellent
classification accuracy and noise robustness. Though it may need a lot of resources to
train for some applications, the GMHMM overall shows promise in audio-based emotion
categorization.
A summary of developments in Arabic voice recognition over the past few decades
is given in this work [5]. It covers key elements of voice recognition systems, such
as corpora, phonemes, language models, acoustic models, and performance assessment.
Utilizing machine learning techniques for voice adaptation is the main objective. To train
acoustic models for word recognition based on voice feature vectors, language model
declaration—often referred to as grammars or probabilistic N-Grams—is employed.
With the use of a sizable audio corpus and phonetic transcriptions, training uses Gaus-
sian mixtures in Hidden Markov Models (HMMs) to produce statistical representations
of phonemes. To extract acoustic features, MFCCs are used. The accuracy rate of recog-
nition, which is commonly calculated as the word error rate (WER), represents the pro-
portion of successfully recognized patterns, which measures the effectiveness of isolated
recognition of words.
In Paper [6], the emphasis is on evaluating Hidden Markov Models’ (HMMs’) under-
utilized potential in reliability engineering. In the paper, a unique maintenance approach
based on HMMs is presented, where different system states correlate to different lev-
els of degradation. When system states are difficult to see, this method is especially
useful since it relies on control systems to give cues about the real situation. The
research creates a maximum-likelihood estimate for system dependability by exam-
ining its asymptotic characteristics. The study views corrective maintenance as the most
expensive intervention, with costs dependent on the degree of system degeneration, and
introduces a random element represented by a Markov chain to affect the core system.
Notably, the study establishes a distinct dependability function that takes into consider-
ation both visible and unobservable system states as signs of system degradation. The
paper builds a maximum-likelihood estimate of this function using signal measurements
and investigates its theoretical properties.
The article [7] discusses how continuous-state Hidden Markov Models (HMMs) may
be used to resolve a well-known problem in voice recognition that involves decoding
a list of phonetic units from measurements of acoustic signals that change gradually.
They put forth an approach for the HMM decoding issue based on a continuous state
space representation of the HMM that is tested using both synthetic and actual voice
data. The journal also highlights certain restrictions and difficulties related to employing
continuous-state HMMs for speech recognition, such as the requirement for substantial
training data and the difficulty of simulating long-term relationships in speech signals.
The approach was evaluated on real speech from the TIMIT dataset and on artificial sinusoidal speech.
The Mel Frequency Cepstral Coefficient (MFCC) features are used in this study [8]
to refine an Automatic Speech Recognition (ASR) system for the Moroccan Dialect,
enhancing speaker identification. With a vector quantization technique, these retrieved
characteristics are quantized. A limited number of basic acoustic units that are calculated
by sliding windows are used to depict the acoustic signal. The digital speech stream
is transformed into feature vectors using MFCC parametrization, which records the
voice characteristics of the speaker. These feature vectors are used to build reference
models from training utterances. Precision speaker discrimination is made possible by the
widely utilized MFCCs’ efficient recording of speech signal patterns. Utilizing matching
techniques, similar vectors of features from test utterances are retrieved for testing and
contrasted with the reference. The system obtains an accuracy rate of about 90% by
using MFCC + Delta + Delta-Delta characteristics.
3 Proposed Methodology
3.1 Architecture
Noise removal and normalization techniques are applied to enhance the quality and
consistency of the audio. A dataset for training is created, and a data set for testing
is created. The Gaussian HMM-based acoustic model, which precisely represents the
statistically significant features of speech, is trained using the training dataset. The Viterbi
algorithm serves as the decoder, matching observed speech features with learned HMM
states to determine the most likely spoken words or phrases. The recognized speech is
validated, and if valid, the form is filled with the transcribed text. This project leverages
various techniques, including pre-processing, feature extraction, acoustic modeling, and
decoding, to achieve accurate and efficient voice-enabled form-filling (Fig. 1).
3.2 Methodology
Data Collection
Common Voice is a corpus of voice data that people may access on the Common Voice
website [9] that was constructed from text from several public domain sources, including
user-submitted blog posts, vintage books, films, and other public speech corpora. Making
the instruction and evaluation of Automated Speech Recognition (ASR) systems possible
is its main objective. Sections of the corpus have been created for our convenience. The
subgroups with “valid” in their title are clips of audio that have been heard at least
twice by listeners who, on average, agree that the audio matches the text. The clips
in the subgroups with the word “invalid” in their titles have at least two listeners, and
most of them claim the audio is inconsistent with the clip. All other clips have “other”
in their name, even those with fewer than two votes and those with an equal number
of valid and invalid votes. Each .csv file contains the filename, relative path, the audio
file’s alleged transcription, the number of individuals who felt the audio matched the
text, their age, gender, and accent, as well as up and down votes. Each subset’s audio
samples are stored as mp3 files in folders that share the same naming patterns as the
corresponding CSV files. As an illustration, the “cv-valid-train.csv” metadata file and
the subdirectory “cv-valid-train” include all the audio data taken from the valid train set.
Dataset Description
The dataset used in this project is the Common Voice dataset, a comprehensive resource
for multi-accent voice data collection developed by Mozilla. The dataset contains over
400,000 audio samples with corresponding transcriptions. For this analysis, the focus
was on a specific subset, namely the “cv-valid-train” portion, which comprises a substan-
tial amount of speech data containing 200,000 records suitable for training and testing
the Hidden Markov Model (HMM) based system. This dataset consists of two primary
components. First, it includes a vast collection of audio files, each containing recordings
by diverse users who read sentences in their native languages. These audio files encom-
pass a wide range of linguistic and acoustic variations, making them highly suitable
for training and evaluating automatic speech recognition systems. The second compo-
nent of the dataset comprises CSV files, each of which contains a wealth of information
related to the audio recordings. These CSV files are rich in attributes, including the audio
file name, the corresponding transcribed text, the number of up_votes, down_votes, the
speaker’s age, gender, accent, and the duration of each recording. This multi-faceted
dataset played a pivotal role in training and testing our Hidden Markov Model (HMM)
based system, facilitating a more comprehensive analysis of its performance in various
linguistic and demographic contexts.
Data Pre-processing
Data Cleaning: Data cleaning is used for removing incorrect or duplicate data within
a dataset. The dataset that is downloaded from the Kaggle contains a set of audio files
and corresponding CSV files. The CSV files contain unnecessary columns and missing
transcriptions. These are dropped to reduce the noise and improve performance. The
punctuation and white spaces are removed, so that it gives better results during testing.
Feature Extraction
Feature Extraction is used for transforming raw data, such as audio waveforms, into a
set of representative features that capture the relevant information. It helps in extracting
Mel frequency cepstral coefficients (MFCCs) from an audio file. The extracted features,
such as MFCCs, are used as inputs to train ASR models. For modeling and identifying
speech sounds, these features which represent the spectral properties of the speech signal
are crucial. The Mel-scale, a widely used frequency scale, is linear up to 1000 Hz and
logarithmic above that point. Calculating a filter’s center frequency in Mel-scale:
f
1000log 1 + 1000
fmel =
log2
Fmel in the equation above stands for the frequency value on the Mel scale. f is the
frequency value in Hertz that you want to convert to the Mel scale. Log represents the
natural logarithm function.
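As a small illustration of the formula above, the conversion can be written directly in Python; the function name and the example frequencies are illustrative only.

import math

def hz_to_mel(f_hz: float) -> float:
    """Mel-scale value per the formula above: 1000 * log(1 + f/1000) / log 2."""
    return 1000.0 * math.log(1.0 + f_hz / 1000.0) / math.log(2.0)

print(hz_to_mel(1000.0))   # 1000.0 mel: 1000 Hz maps to 1000 on the Mel scale
print(hz_to_mel(4000.0))   # ≈ 2321.9 mel: growth is logarithmic at higher frequencies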
Label Encoding: A label-encoding helper function is used for transforming categorical information into a format that machine learning algorithms can
use. This function simplifies the label encoding process by encapsulating it within a
single function, allowing users to easily apply the transformation to their transcriptions
and utilize the encoded data in subsequent machine learning tasks.
In the third technique, the Baum-Welch algorithm is used to train a Gaussian Hidden Markov
Model. This algorithm aims to learn the optimal parameters for the model by iteratively
updating the transition matrix, emission matrix, and initial probabilities. The training
loop repeats a specified number of iterations, during which the expectation step computes
probabilities of being in each state at each time step, while the maximization step updates
the model parameters based on these probabilities. This iterative process continues until
convergence, refining the model’s ability to capture the relationships between observed
data and hidden states. After completing the training loop, the learned parameters are used
to create a Gaussian HMM model. This model incorporates the updated transition matrix,
emission matrix, and initial probabilities. It may be used for many different applications,
including voice recognition, where it forecasts the most probable hidden state sequence
provided by an input series. By iteratively improving the model’s parameters based on
the provided data, the algorithm enhances the model’s capacity to identify patterns and
associations, resulting in improved accuracy and performance for subsequent analyses
or applications.
Pre-processing
Data Cleaning: We apply data cleaning techniques to the transcribed data files, which are
in CSV format and available along with the Common Voice dataset. We examine the
correlation among columns with the corr() method, remove the unnecessary columns,
drop the null-value records using the dropna() method, and save the cleaned
transcriptions at the respective path (Table 1).
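A minimal sketch of this cleaning step is given below; the column names follow the Common Voice CSV layout described earlier, while the file paths and the exact set of dropped columns are assumptions.

import pandas as pd
import re

# Hypothetical paths; the column names follow the Common Voice CSV layout described above.
df = pd.read_csv("cv-valid-train.csv")

# Drop columns that are not needed for training (names assumed for illustration).
df = df.drop(columns=["up_votes", "down_votes", "age", "gender", "accent", "duration"], errors="ignore")

# Remove records with missing transcriptions.
df = df.dropna(subset=["text"])

# Strip punctuation and collapse extra white space in the transcriptions.
df["text"] = (df["text"].str.lower()
                        .apply(lambda s: re.sub(r"[^\w\s]", "", s))
                        .str.replace(r"\s+", " ", regex=True)
                        .str.strip())

df.to_csv("cv-valid-train-clean.csv", index=False)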
‘.mp3’ to ‘.wav’ conversion: We convert the ‘.mp3’ audio files to ‘.wav’ for better
training of the model, since the ‘.wav’ format is flexible across different test conditions.
For this we use “FFmpeg”, a command-line tool widely used for audio conversion (Fig. 2).
All the audio files are converted from .mp3 to .wav extension for better performance.
The above figure shows the pictorial representation of the wave signal of an audio file.
• Noise Removing: We use librosa.effects.trim() method to remove the noise from the
audio files
• Feature Extraction: We use the mfcc() method to extract the features by adjusting the
sample rate based on your dataset and the Number of MFCC coefficients.
• Normalization: We use min-max normalization to scale the feature values to a common
range (a combined sketch of these pre-processing steps is given below).
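The following is a compact sketch of these pre-processing steps for a single clip; the sample rate, the number of MFCC coefficients, and the file paths are illustrative choices rather than the exact settings used.

import subprocess
import librosa

# Convert one clip from mp3 to wav with FFmpeg (paths are placeholders).
subprocess.run(["ffmpeg", "-y", "-i", "sample.mp3", "sample.wav"], check=True)

# Load the audio and trim leading/trailing silence.
y, sr = librosa.load("sample.wav", sr=16000)
y_trimmed, _ = librosa.effects.trim(y)

# Extract MFCC features (13 coefficients is a common, assumed choice).
mfcc = librosa.feature.mfcc(y=y_trimmed, sr=sr, n_mfcc=13)

# Min-max normalization of the feature matrix.
mfcc_norm = (mfcc - mfcc.min()) / (mfcc.max() - mfcc.min() + 1e-12)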
HMM Parameters
In our project, the Hidden Markov Model (HMM) was fine-tuned with specific
parameters to optimize its performance. The model incorporated five hidden states
(n_components = 5), which were determined through an iterative process. We tested a
range of state values from 3 to 9 and selected the configuration that yielded the highest
accuracy. This careful parameter selection was vital to the model’s effectiveness. Fur-
thermore, the model employed a diagonal covariance matrix type (covariance_type =
“diag”), which is well suited for speech data. The training process utilized the Viterbi
Algorithm (algorithm = ‘Viterbi’), a robust choice for training HMMs in sequence pre-
diction tasks. To expedite the training process and enhance efficiency, we implemented
parallel computing techniques. This approach significantly reduced training time, allow-
ing us to more effectively capture the relationships between the observed audio data and
the underlying hidden states. As a result, the HMM model proved highly effective in
speech recognition tasks.
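Given the parameters listed above, one way to configure and train such a model with the hmmlearn library is sketched below; the feature matrix is synthetic stand-in data, and the number of Baum-Welch iterations is an assumed value.

from hmmlearn import hmm
import numpy as np

# mfcc_features: list of (n_frames_i, 13) arrays, e.g. produced by the extraction sketch above;
# random data stands in here so the snippet runs on its own.
rng = np.random.default_rng(0)
mfcc_features = [rng.normal(size=(100, 13)) for _ in range(10)]

X = np.vstack(mfcc_features)            # stacked MFCC frames
lengths = [m.shape[0] for m in mfcc_features]

model = hmm.GaussianHMM(
    n_components=5,                     # five hidden states, chosen by searching 3-9
    covariance_type="diag",             # diagonal covariance, as stated above
    algorithm="viterbi",                # Viterbi decoding of the most likely state path
    n_iter=50,                          # iteration count is an assumed value
)
model.fit(X, lengths)                   # Baum-Welch (EM) parameter estimation

states = model.predict(X, lengths)      # most probable hidden-state sequence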
The results are obtained from the successful execution of the proposed system. The audio
files for the dataset are collected from the Kaggle Common Voice dataset repository. The
dataset is a corpus of spoken information, known as Common Voice, that is frequently
utilized in speech recognition. The outputs from each module are shown
in this chapter. The code collects the scores, number of states, and trained models for each
fold. It calculates the average accuracy for the current number of states by normalizing
the scores within the range of minimum and maximum scores obtained.
The graph shows the relationship between the HMM model’s precision and its state
count. It can provide insights into the optimal number of states to use for a specific task
or dataset. The graph can be used to analyze the trade-off between model complexity
(number of states) and model accuracy and help in making informed decisions for model
selection and tuning (Fig. 4).
228 B. Sallagundla et al.
Print Results:
The code prints the average accuracy for all models of different states, the best
accuracy for the best training model, and the number of states for the best model (Table 2).
The above table shows the accuracy of the model and the state at which the accuracy
is maximum. The accuracy is calculated over a specific range of states, and the highest
precision together with the corresponding number of states within that range is reported.
A speech-to-text form-filling program with a graphic user interface (GUI) was used for
this project. The GUI consists of input fields for different form fields such as First Name,
Last Name, and City. Each input field is accompanied by a recording button. When the
user clicks a recording button, the code initiates audio recording for a fixed duration of
3 s. The recorded audio is then saved as a WAV file. Afterward, the code processes the
recorded audio using a trained Hidden Markov Model (HMM) model. The code uses the
Viterbi method to find the most likely series of hidden states throughout the processing
step after extracting features from the audio. These hidden states are then mapped to
text labels using a label encoder that was used during the training of the HMM model
(Fig. 5).
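A simplified sketch of this record-and-transcribe step is shown below; the sounddevice-based recording, the 3-second duration, and the final state-to-label mapping are illustrative assumptions, and model and label_encoder are taken to be the artifacts produced during training.

import sounddevice as sd
from scipy.io.wavfile import write
import librosa

def record_field(filename, duration=3, sr=16000):
    """Record a fixed-duration clip for one form field and save it as WAV."""
    audio = sd.rec(int(duration * sr), samplerate=sr, channels=1)
    sd.wait()
    write(filename, sr, audio)

def transcribe_field(filename, model, label_encoder, sr=16000):
    """Extract MFCCs and map the Viterbi state path back to a text label."""
    y, _ = librosa.load(filename, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T     # frames x coefficients
    states = model.predict(mfcc)                              # most likely hidden states
    return label_encoder.inverse_transform(states[:1])[0]     # illustrative state-to-label mapping

record_field("first_name.wav")
print(transcribe_field("first_name.wav", model, label_encoder))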
Future work could integrate Natural Language Processing techniques to enable the system to understand and interpret user responses
more intelligently. Explore integration possibilities with existing form solutions, such as
popular form builders or office productivity tools, to provide seamless integration and
compatibility with established workflows.
References
1. Syiem, B., Dutta, S.K., Binong, J., Singh, L.J.: Comparison of Khasi speech representations
with different spectral features and hidden Markov states. J. Electron. Sci. Technol. 19(2),
100079 (2021)
2. Cui, X., Afify, M., Gao, Y., Zhou, B.: Stereo hidden Markov modeling for noise robust speech
recognition. Comput. Speech Lang. 27(2), 407–419 (2013)
3. Najkar, N., Razzazi, F., Sameti, H.: A novel approach to HMM-based speech recognition
systems using particle swarm optimization. Math. Comput. Model. 52(11–12), 1910–1920
(2010)
4. Siddiqi, M.H.: An improved Gaussian mixture hidden conditional random fields model for
audio-based emotions classification. Egypt. Inform. J. 22(1), 45–51 (2021)
5. Al-Anzi, F.S., AbuZeina, D.: Synopsis on Arabic speech recognition. Ain Shams Eng. J.
13(2), 101534 (2022)
6. Gámiz, M.L., Limnios, N., del Carmen Segovia-García, M.: Hidden Markov models in
reliability and maintenance. Eur. J. Oper. Res. 304(3), 1242–1255 (2023)
7. Champion, C., Houghton, S.M.: Application of continuous state hidden Markov models to a
classical problem in speech recognition. Comput. Speech Lang. 36, 347–364 (2016)
8. Mouaz, B., Abderrahim, B.H., Abdelmajid, E.: Speech recognition of Moroccan dialect using
hidden Markov models. Procedia Comput. Sci. 151, 985–991 (2019)
9. Nedjah, N., Bonilla, A.D., de Macedo Mourelle, L.: Automatic speech recognition of
Portuguese phonemes using neural networks ensemble. Expert Syst. Appl. 229, 120378 (2023)
10. Lee, L.M., Jean, F.R.: Adaptation of hidden Markov models for recognizing speech of reduced
frame rate. IEEE Trans. Cybern. 43(6), 2114–2121 (2013)
11. Chen, Y., Zheng, H.: The application of HMM algorithm based music note feature recognition
teaching in universities. Intell. Syst. Appl. 20, 200277 (2023)
12. Mannepalli, K., Sastry, P.N., Suman, M.: MFCC-GMM based accent recognition system for
Telugu speech signals. Int. J. Speech Technol. 19, 87–93 (2016)
13. Chandrakala, S.: Investigation of DNN-HMM and lattice free maximum mutual information
approaches for impaired speech recognition. IEEE Access 9, 168840–168849 (2021)
14. Li, Q., Zhang, C., Woodland, P.C.: Combining hybrid DNN-HMM ASR systems with
attention-based models using lattice rescoring. Speech Commun. 147, 12–21 (2023)
15. Ma, Z., Zhang, J., Li, T., Yang, R., Wang, H.: A parameter transfer method for HMM-DNN
heterogeneous model with the scarce mongolian data set. Procedia Comput. Sci. 187, 258–263
(2021)
16. Das, T.K., Nahar, K.M.: A voice identification system using hidden Markov model. Indian J.
Sci. Technol. 9(4), 1–6 (2016)
17. Ranjan, A., Jegadeesan, K.: Hybrid ASR for resource-constrained robots: HMM-deep learning
fusion. arXiv preprint arXiv:2309.07164 (2023)
18. Yadava, G.T., Nagaraja, B.G., Jayanna, H.S.: An end-to-end continuous Kannada ASR system
under uncontrolled environment. Multimed. Tools Appl. 1–14 (2023)
19. Trabelsi, A., Warichet, S., Aajaoun, Y., Soussilane, S.: Evaluation of the efficiency of state-
of-the-art Speech Recognition engines. Procedia Comput. Sci. 207, 2242–2252 (2022)
20. Jaradat, G.A., Alzubaidi, M.A., Otoom, M.: A novel human-vehicle interaction assistive
device for Arab drivers using speech recognition. IEEE Access 10, 127514–127529 (2022)
21. Speech recognition. Wikipedia (2023). https://en.wikipedia.org/wiki/Speech_recognition
22. Hidden Markov model. Wikipedia (2023). https://en.wikipedia.org/wiki/Hidden_Markov_
model
23. Viterbi algorithm. Wikipedia (2023). https://en.wikipedia.org/wiki/Viterbi_algorithm
24. Brown, D.G., Golod, D.: Decoding HMMs using the k best paths: algorithms and applications.
BMC Bioinform. 11(S1) (2010). https://doi.org/10.1186/1471-2105-11-s1-s28
Bayesian Network Model Based Classifiers Are
Used in an Intelligent E-learning System
1 Introduction
In the 21st century, the shift from classroom to online learning has brought a significant
change in how people learn. Indoor teaching within the four walls of a classroom has
given way to Internet-based instruction supported by multimedia, which has expanded
the idea of learning from a textbook towards understanding the concept. The traditional
model of classroom-bound education has been replaced by the concept of “everyone’s
education into everyone’s hands”.
However, organizations must change to meet the requirements, aspirations, and
expectations of learners in the contemporary educational system, which is always chang-
ing. As a result, companies are steadily making investments in the e-learning systems,
information technology, gadgets, and online platforms that are becoming more impor-
tant business tools. However, utilizing innovative e-learning technologies to boost and
expand collaborative learning and teaching is one of the most important concerns for
organizations in current technological era [3, 4].
E-learning has the advantage of supporting and enhancing the teaching-learning
process. No additional tools are required because it is a web-based solution. There-
fore, taking into account elements like technology tools, platforms, and learner con-
tent makes e-learning a comprehensive process. Furthermore, the educational system is
unique from other learning approaches in that it accords equal weight to both instruction
and individualized learning [5].
As shown in Fig. 1 above, the first and most crucial duty rests with the system
administrator or professor, who is a member of the organization. In other words, the
administrator or professor must first prepare all the different questions that are neces-
sary for each particular taxonomic level, and only then upload the question set into the
system’s database.
The database is connected to our system. To participate in the evaluation, the learner
first needs to register by completing the application form, whose mandatory fields
include name, age, and qualification, among others. The learner may then start the
sophisticated e-learning assessment procedure.
2 Literature Review
In the past, learners would live at the guru’s house and study topics that had practical
application this was known as the gurukul method of education. Reading books was
only one aspect of learning, where the learners also needed to make connections with
the natural world and their daily lives. It wasn’t as simple as remembering facts, eating
figs, and responding to test questions. Learners of various disciplines came to the
gurukul at different points in time. The educational paradigm of the past relied mostly
on oral examinations, and one drawback of the gurukul system was that learners were
verbally tested one-on-one without being graded or marked. The electronic learning
system is now replacing this earlier educational system.
“Edtech” and “Technology in Education” are not brand-new concepts. Since COVID-19,
edtech has been the sector with the fastest rate of development, and it has expanded over
the past few years. The present focus of edtech is on taking education online and making it available
to all learners through multimedia [7]. It is projected that edtech will soon become the
“new normal” in education because of how popular it has been over the previous several
years.
Universities have implemented changes to better accommodate learners expecta-
tions, interests, and requirements in the modern higher education environment [8]. But
one of the most pressing concerns confronting universities in this technology age is the
integration of cutting-edge e-learning technologies to enhance and assist both teach-
ing and learning [9]. Academics developed a keen interest in the peculiar circumstances
brought on by the epidemic and the COVID-19 pandemic’s consequences on institutions,
educators, trainers, and learners [10].
BN and an artificial neural network look the same, both of these are directed graphs
that “perform math” by taking a set of inputs and predicting the results. The key differ-
ence between BNs and artificial neural networks is that BNs contain intrinsic meaning
underneath their structure. In the past 10 years, BN has been a popular modeling strat-
egy for a variety of statistical issues. A network’s vertices and edges are given semantic
information in BN, which is a visual analysis of the network. The BN structure provides
a plethora of information, but BN evaluation enables us to decide the assessment. The
artificial neural network structure gives us nothing about the conditional dependence
between the variables.
An Intelligent Tutoring System (ITS) is a knowledge-based system that aspires to
effectively replace a human teacher with a machine. The knowledge base model,
pedagogical model, learner behavior model, and learner assessment model make up a
Bayesian system. The learner evaluation method is described, and some ambiguity is
handled by the BN. Based on their responses to questions, the learner assessment repre-
sentation accurately identifies each learner’s knowledge level [11]. A Bayesian area is
used in the learner knowledge modeling technique used in ITS. Based on the learner’s
answers to questions, the real learner model’s analytical skills are assessed [12, 13].
Millan et al. [14] investigate and employ a Bayesian network for learner model. This
study concentrated on first-degree mathematical problems. Every topic is calculated as
a set of four questions, each of which has only one right answer. Questions were split
into six degrees of difficulty using Bloom’s taxonomy [15, 16] in order to be suited to
the learner’s level of knowledge.
Significant progress has been made in applying inquiry strategies to identify the
applicant system and in learning Bayesian networks as a data source for estimating an
application system’s state [10, 18–20]. As part of this, a knowledge-based optimization
method for Bayesian networks is being developed.
A directed acyclic graph (DAG) plays an important role in illustrating a Bayesian network
model, which shows the conditional interdependencies among random variables and
edges. Edges in the underlying network indicate the relationships between variables and
nodes. For node interactions, three levels of information are provided: graphical which is
pictorial, functional which is operational, and numerical [17]. At the graphical level, the
compulsory connections onto nodes and edges are illustrated. The distributions such as
conditional probability and joint probability among the nodes are algebraically stated at
the functioning level of specification; however, actual probability placed with a desired
node is measured at the numerical phase. Equation 1 illustrates the general premise of
Bayes’ theorem.

P(H | e) = P(e | H) ∗ P(H) / P(e)    (1)
The terms “hypothesis” and “evidence” are named H and e, respectively, and Bayes’
theorem updates the marginal probability of H depending on a particular piece of
evidence e. For a given e, the posterior probability P(H|e) is obtained by multiplying
the likelihood P(e|H) by the prior probability P(H) and normalizing by P(e).
Equation 1 illustrates the entire calculation used in this work to evaluate every
learner’s replies to the questions, which implies:
• P(H|e): here H is the event that the learner answers the question about to be asked,
and e is the event that the learner has answered the previous question. To calculate
P(H|e), you must first understand the terms below:
• P(e): the probability of a learner answering the preceding question.
• P(H): the probability of a learner answering the present question.
Here the nodes are questions, E1, E10, and so on, and the directed arrows carry
Probability True (PT) and Probability False (PF), both of which are conditional
probabilities, as listed in Table 1 (where CPT stands for Conditional Probability Table).
The learner’s replies to the questions in Fig. 3 are calculated using Eq. 1, as shown in
cases 1 and 2:
P(E10 | E1) = P(E1 | E10) ∗ P(E10) / P(E1)    (2)
Table 1. CPT
Questions PT PF
E1 0.5 0.5
E6 0.52 0.48
E10 0.55 0.45
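To make the update in Eq. (2) concrete, the small sketch below applies Bayes' theorem with the prior probabilities from Table 1; the conditional probability P(E1|E10) used here is a hypothetical value, since it does not appear in the table.

def bayes_update(p_e_given_h: float, p_h: float, p_e: float) -> float:
    """Posterior P(H|e) = P(e|H) * P(H) / P(e), as in Eq. (1)/(2)."""
    return p_e_given_h * p_h / p_e

# Priors taken from Table 1 (PT column); the conditional is assumed for illustration.
p_e1 = 0.5            # P(E1): learner answers question E1
p_e10 = 0.55          # P(E10): learner answers question E10
p_e1_given_e10 = 0.6  # hypothetical P(E1 | E10)

p_e10_given_e1 = bayes_update(p_e1_given_e10, p_e10, p_e1)
print(round(p_e10_given_e1, 2))   # 0.66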
When compared to K-nearest neighbours and J48, the recommended strategy (a Bayes
network employing the K2 algorithm) performs better. Tables 2 through 4 present the
comparison. Table 2 indicates that the J48 decision tree outperformed K-nearest
neighbours for 197 learners. The resulting tree structure identifies the characteristics
that most strongly influence whether learners respond to questions, and this prediction
is 92.38% correct. J48 takes longer than K-nearest neighbours to build a tree but has a
better accuracy rating. When the Bayes network is used with K2, the best accuracy for
the learners is 93.00%, while the fraction of unanswered questions is 7.0%. Among the
three techniques under examination, the Bayes network with K2 surpassed the others
in accuracy and computing speed. As shown in Tables 3 and 4, the findings for 248 and
1736 learners were compared using the Bayes network with K2, the KNN, and the J48
techniques.
6 Conclusion
References
1. Reigeluth, C.M.: Instructional Theory and Technology for the New Paradigm of Education,
RED-32 (1994)
2. Coman, C., Tire, L.G.: Online teaching and learning in higher education during the coronavirus pandemic: learners' perspective. Sustainability (2020)
3. Kaliwal, R.B., Deshpande, S.L.: Evaluate learner level assessment in intelligent e-learning
systems using Bayesian network model. In: Artificial Intelligence and Machine Learning for
EDGE Computing (2022)
4. De Bruyn, E., Mostert, E., Van Schoor, A.: Computer-based testing - the ideal tool to assess
on the different levels of Bloom’s taxonomy. In: 14th International Conference on Interactive
Collaborative Learning, 11th International Conference Virtual University, pp. 444–449 (2011)
5. Goguadze, G., Sosnovsky, S., Isotani, S., McLaren, B.M.: Evaluating a Bayesian learner
model of decimal misconceptions. In: Proceedings of the 4th International Conference on
Educational Data Mining (2011)
6. Natarajan, M.: Evaluation methods for e-learning: an analytical study. Scientific society of
advanced research and social change. Int. J. Libr. Inf. Sci. 1(1) (2015)
7. https://nasscom.in/
8. Mislevy, R., Gitomer, D.: The role of probability-based inference in an intelligent tutoring
system, CSE Technical report 413, CRESST (1996). On World Wide Web at http://cresst96.
cse.ucla.edu/CRESST/pages/reports.htm
9. Alkhuraiji, S., Cheetham, B., Bamasak, O.: Dynamic adaptive mechanism in learning manage-
ment system based on learning styles. In: 11th IEEE International Conference on Advanced
Learning Technologies, pp. 215–217 (2011)
10. Allo, M.D.G.: Is the online learning good in the midst of Covid-19 Pandemic? The case of
EFL learners, pp. 1–10 (2020)
11. Chakraborty, B., Sinha, M.: Learner evaluation model using Bayesian network in an intelligent
e-learning system. IIOAB J. 7 (2016). ISSN: 0976–3104
12. Khodeir, N., Wanas, N., Hegazy, N., Darwish, N.: Bayesian based learner knowledge mod-
eling in intelligent tutoring systems. In: 6th IEEE International Conference on E-Learning in
Industrial Electronics (ICELIE) (2012)
13. Anderson, H., Koedinger, M.: Intelligent tutoring goes to school in the Big City. Int. J. Artif.
Intell. Educ. 30–43 (1997)
14. Millán, E., Descalço, L., Castillo, G., Oliveira, P., Diogo, S.: Using Bayesian networks to
improve knowledge assessment. Comput. Educ. 436–447 (2013)
15. De Bruyn, E., Mostert, E., Van Schoor, A.: Computer-based testing - the ideal tool to assess
on the different levels of Bloom’s taxonomy. In: 14th International Conference on Interactive
Collaborative Learning, ICL 2011, pp. 444–449 (2011)
16. Kaliwal, R.B., Deshpande, S.L.: Efficiency of Bayesian network model for assessment in
E-Learning system. In: Int. J. Recent Technol. Eng. (IJRTE) 9(3), 562–566 (2020). ISSN:
2277–3878 (Online)
17. Kaliwal, R.B., Deshpande, S.L.: Design of intelligent e-learning assessment framework using
Bayesian belief network. J. Eng. Educ. Transf. 34 (2021). eISSN 2394–1707
18. Nakayama, M., Mutsuura, K., Yamamoto, H.: Contributions of learner’s assessment of reflec-
tions on the prediction of learning performance. In: 17th IEEE International Conference on
Information Technology Based Higher Education and Training (2018)
19. Askari, M.B.A., Ahsaee, M.G.: Bayesian network structure learning based on cuckoo search
algorithm. In: 6th Iranian Joint Congress on Fuzzy and Intelligent Systems (2018)
20. Pratiwi, O.N., Syukriyah, Y.: Question classification for e-learning using machine learning
approach. In: International Conference on ICT for Smart Society (ICISS) (2020)
Where You Think Stock Takes with the Linear
Regression Model
{wlsharpe,emmoseng,agalustian}@captechu.edu
Abstract. This paper seeks to analyze and predict the course of Mastercard stock
using three different Python libraries: SciKit Learn, XGBoost, and TensorFlow.
This paper details information regarding machine learning algorithms and the
linear regression model in particular. The paper presents the results of looking
through the data and comparing some companies' results with one another. Our study showed that linear regression with the SciKit Learn, XGBoost, and TensorFlow libraries provides very high accuracy, with confident predictions for lower values and only a small increase in deviation for higher values.
1 Introduction
The stock market helps firms raise money to sustain their operations by selling stock
shares while also generating and preserving wealth for individual investors. Nevertheless,
the stock market has risks, and investors are continually looking for ways to reduce those
risks while increasing returns.
News, such as public news and social media, can impact the price of stocks. Deep learning algorithms can be used for stock market prediction using this information as data sources. Time series models use historical data on stocks to predict future stock market trends. Time series methods work mainly by using huge amounts of data, observing past changes, and using that to predict future changes. Deep learning models use a recurrent
neural network to facilitate the modeling of time series data. Sentiment from stock-
related data in social media is analyzed and added to build a model predicting future
trends. Subjective, emotional factors, such as what is read in the news, can influence
investors, and interfere with investment decisions. Deep learning essentially analyzes
that news and constructs a trend prediction model with those sentiments in mind [1, 7].
To conclude, the changes in the stock market play an important role in the country’s
economic trends. Short-term and long-term trend forecasting is a hot topic and concern
for investors in the market.
People are eager to try new methods and tools to predict stocks and reduce risk.
More effective forecasting techniques and technologies lead to better outcomes, more
precise forecasts, and more profits.
The paper discusses various predictive models for a traded stock; it was apparent that multiple options should be assessed for something so volatile. The research was undertaken to find an appropriate stock to assess and the different models to compare
for assessing it. Multiple models showed promising results and applications for these
models are extrapolated.
Machine learning can be a powerful tool when analyzing different data sets. It can
accurately notice trends and patterns and use this information to attempt to predict either
future trends or understand what the data itself is. Machine learning can be as advanced
as handwriting detection or as simple as predicting the end of a word (as one might encounter when using a word processor). It uses already given data and utilizes it to the best of its
ability [6].
This experiment aimed to get machine learning to predict future stock prices based
on previously released numbers. The data was gathered from Kaggle.com, which is a
site commonly used for gathering large amounts of data for the purpose of data mining
[4, 5].
Our contribution.
1. We conducted a comprehensive literature review.
2. We conducted several experiments using Python libraries to predict stock prices.
3. The three models are compared using data pulled live via multiple methods, and the
research is made mutable enough to apply to other stock prices.
The following is how the rest of the paper is structured: The introduction is covered
in Sect. 1, and the related work is covered in Sect. 2. Then, Sect. 3 is about experiments.
Finally, Sect. 4 concludes the research paper.
2 Related Works
For a nation, the financial system is significant and has become more so over time. The
stock market may be dangerous, and investors are constantly looking for strategies to
lower those risks and boost earnings.
To begin, the scale on which RNN can be applied is not limited to a single person
forecasting stock prices. The observation and analysis of a system utilizing periodic mea-
surements to monitor changes to engineering structures such as buildings and bridges
are known as structural health monitoring (SHM) [19]. Researching new, more efficient,
and innovative methods of SHM has exploded in popularity in the civil engineering field,
especially using a method called “unsupervised learning” with data acquired from the
field. However, this currently only focuses on detecting damage in simple structures. A novel take on an unsupervised learning framework uses a hybrid network of neural networks and long short-term memory to detect damage in large-scale complex structures. Tests with this framework resulted in a successful structural health diagnosis accuracy of 93% and 85% for damage detection and localization, respectively. In other words, RNNs can be used to detect damage to infrastructure and to locate where that damage is [14].
We know that RNN can be used to predict things over time, but what about predicting
time itself? People experience time differently, especially the young and old, and using
RNN we can simulate an internal clock like what would be found in people. It works
by using RNN to simulate a group of neurons like you would find in a brain. We start
with a simple RNN. Recall that this recurrent neural network is comprised of four layers: input, hidden, context, and output. The input layer works as a buffer for the signal; the signal is transformed at the hidden layer as its units (artificial neurons) interact. The
activity is recorded in the context layer and then reinjected into the hidden layer. Finally,
the output layer sums up the signals sent to it. This is but one possible model for judging
time using RNN [15].
Different types of machine learning methods and models get good results; what if we combined them? To improve the performance of prediction-over-time models,
or time series forecasting, we combine the characteristics of a Graph Convolutional
Network (GCN) and a Bidirectional Long Short-Term Memory network (BiLSTM).
The combined approach has better performance than the two separately and with a
lower error percentage across the metrics used. The combined model used two models
trained in each separately, then generates a new model from their outputs. It then uses
the results of that as input over and over for training until it makes its prediction [16].
Using recurrent neural networks to predict prices is fine and all, but what about using
them to predict inflation? Inflation is bad news for a country’s economy, and it is only
going up these days. This affects the financial sector and being able to predict inflation
helps investors properly adjust their asset holdings, firms adjust their prices, and so on.
Long short-term memory (LSTM) models are great for this for four reasons: one, LSTMs are flexible. Two, under conditions with even mild regularity LSTMs can give fairly accurate results. Three, LSTMs were developed specifically for sequential data analysis. Four, new optimization routines made training neural networks much easier. The result was an improved performance of the NN model across all tables [17].
A daring leap - the predictive capabilities of RNN can be used for optimization and
reservoir management. Oil wells, water wells, is it worth digging here? Surprisingly,
RNN may be used to answer this. A proxy model is developed to predict well-by-
well oil and water rates, given time and bottom-hole pressure schedules. It enables an
estimation of values needed for overall optimization. First, the well rates are predicted using LSTM; then a CNN is used to generate a 3D oil-water flow simulation [18].
In addition to everything that neural networks may be used to anticipate, what about
earthquakes and how resilient buildings are during them? Even predicting diseases such as Alzheimer's [11, 12, 20]. The plan is to train the network with a small number of samples
by employing a partitioning approach. Neural networks and probability techniques are
used to produce a novel seismic analysis tool. The models developed fall under one of two
categories, ‘white box’ and ‘black box.’ White box models are based on physical laws
and are usually mechanical models while black boxes are not based on any previous
knowledge about the structural system. Previously, simple neural networks could not
handle dealing with nonlinear data, like structure behavior during an earthquake, but
with the development of deep learning, the capacity of NNs has been greatly improved.
There are some limits to this but to get around them, prior data of the structure during a
quake is added to help describe the nonlinear behavior [13].
Machine learning has many applications and a particularly big one is stock price
prediction. The stock market is volatile, dynamic, and nonlinear, and making accurate
price predictions is challenging. Macro and micro factors, politics, economic factors,
unexpected events, and company performance can all influence prices. But at the same
time, there is plenty of data to find patterns in, which means it’s possible to use analytic
techniques to identify and predict trends. To write an algorithm to take in data, learn,
and predict future stock prices, there are several methods we can use, and a few are
highlighted below [6, 8].
Long Short-Term Memory, or LSTM for short, is a deep learning technique developed to deal with the vanishing gradients problem encountered in long sequences. In particular, it handles sequential data such as time and text. The update gate, forget gate, and output gate are the three primary components of LSTM. The output gate chooses how much data to output as activations to the following layer, while the update and forget gates choose whether each memory cell's component is updated. Because it can recognize long-term dependencies in sequential data, LSTM is well suited for jobs like predicting stock prices [14].
The Moving Average technique is a second option. The anticipated value is the mean of the preceding N timestep values: we define the current adjusted closing price as the mean of the preceding N days' adjusted closing prices [14].
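As a concrete illustration of this moving-average baseline, the pandas sketch below computes it on a synthetic series; the column name "Adj Close", the sample values, and the window of N = 10 days are assumptions, not values taken from the paper.

```python
import pandas as pd

# Illustrative price series; in practice this would hold the stock's adjusted closing prices.
df = pd.DataFrame({"Adj Close": [100.0, 101.5, 99.8, 102.3, 103.1, 104.0,
                                 102.9, 105.2, 106.0, 107.4, 108.1, 109.0]})

N = 10  # assumed window size
# Predict each day's adjusted close as the mean of the preceding N days' values.
df["ma_prediction"] = df["Adj Close"].rolling(window=N).mean().shift(1)
print(df.tail())
```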
A third method of machine learning is the Recurrent Neural Network (RNN) [10]. It
is a common deep-learning technique used for pattern recognition and takes into account
how data can change over time. Time-series data, like stock prices, are a common topic
for its use. RNNs are like short term memory portions of the brain, remembering recent
memories and creating context. Over time, they create a feedback loop that preserves
short-term and long-term memory over time [6–8]. These are but a few types of machine
learning, but we will not be using them for this paper.
A common method of machine learning and the one we will be using for this paper is
Linear regression. This version of machine learning is a linear approach to modeling the
relationship between one or more input variables and a single output variable. A simple
linear regression equation is the line equation:
y = mx + b (1)
where m is the slope/gradient of the polynomial of the line y (predict coefficient) and
b is the intercept of the line (bias coefficient). Linear regression is excellent when it
comes to numbers, which is what we need for this paper. Part of the process is finding
coefficients, essentially the degree to which the line slopes upwards or downwards. The
goal with these is to find such coefficients that the difference between each point in the
dataset with its corresponding predicted value is minimal.
A stock’s closing price and date are the parameters we particularly want for our linear
regression model and the close price is especially important for traders. The goal is to
predict as accurately as possible with as few errors as possible. The linear regression
model is used to train a function to form a predictive model from a data set of x and y
values, prices, and time essentially. After being trained, this function can be used to predict y for a new, unseen x [9].
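A minimal sketch of fitting the coefficients m and b of Eq. 1 with ordinary least squares is shown below; the synthetic (day index, price) data are purely illustrative and are not drawn from the Mastercard dataset.

```python
import numpy as np

# Illustrative data: x is the day index, y is a synthetic closing-price series.
rng = np.random.default_rng(0)
x = np.arange(30, dtype=float)
y = 2.0 * x + 50.0 + rng.normal(scale=1.0, size=x.size)

# Ordinary least squares fit of y = m*x + b (degree-1 polynomial).
m, b = np.polyfit(x, y, deg=1)
y_hat = m * x + b  # predicted closing price for each day
print(f"slope m = {m:.3f}, intercept b = {b:.3f}")
```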
3 Experiments
3.1 Experiment-I
The scope of this experiment is to create a model to assess and predict a stock price over
time. This proposal allowed freedom of choice for the stock in question, the method of
data retrieval, and the method of prediction, provided the latter used machine learning.
3.1.1 Scope
• Training on an offline dataset
• Retrieving data from an API
• Retrieving data via web scraping
• Short- and long-term prediction
• In-application visualization
In addition to these components, data retrieval via Python library was also
implemented for reasons to be explained further.
3.1.2 Methodology
Initial decision making regarded the choice of stock. After a cursory search of datasets
on the website Kaggle to pull from, Mastercard seemed to have a valuable DataFrame
to work with [1]. This was because having an initial, offline dataset to pull from would
allow the paper to focus initially on manipulating and predicting data as the main focus
before concerning itself with how to collect the data. Next, sources of machine learning
were to be identified. The three primary candidates that were decided upon were SciKit
Learn's linear regression model, XGBoost, and TensorFlow.
3.1.3 XGBoost
XGBoost offered a similar level of familiarity to the researcher. It offers a suite of
automated optimizations within ranges to choose from, though the decision was made
to have it optimized between a decision tree and a linear regression algorithm. It was not
apparent at the time of research, but its linear regression algorithm is markedly similar
to that of SKLearn’s.
3.1.4 TensorFlow
The researcher had the least familiarity with Tensorflow, save for machine learning in
the context of image generation in another course. Tensorflow was similar to XGBoost
in its automated tuning, though required additional constraints to define the bounds of
this tuning.
All three methods settled on linear regression algorithms, with a predetermined 80–
20 train-test split. In the case of XGBoost, given a range of 1–20 scalars, the algorithm
found between 16 and 18 scalars appropriate, depending on the date range of the collected
data. It also consistently found that targeting squared log error was more optimized than
squared error alone.
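A hedged sketch of the XGBoost setup described above follows; the feature matrix and target here are placeholders, and interpreting the reported "scalars" as the number of boosted estimators is our assumption rather than something stated in the paper.

```python
import numpy as np
from xgboost import XGBRegressor
from sklearn.model_selection import train_test_split

# Placeholder features/target; in practice X holds the historical price columns, y the closing price.
X = np.arange(200, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel() + 5.0

# 80-20 split and the squared-log-error objective mentioned above.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = XGBRegressor(objective="reg:squaredlogerror", n_estimators=18)  # value in the reported 16-18 range
model.fit(X_train, y_train)
print(model.score(X_test, y_test))  # R^2 on the held-out split
```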
In the instance of Tensorflow, a predefined 10 epochs were decided upon to mini-
mize training loss and to speed up testing time. Additional research into an appropriate
optimizer found Adam's optimization to be the best fit for this research.
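A minimal TensorFlow/Keras sketch consistent with the setup described (a linear model, the Adam optimizer, and 10 epochs) is shown below; the feature count and training data are placeholders for illustration only.

```python
import numpy as np
import tensorflow as tf

# Placeholder data: 4 features (e.g. open, high, low, volume) predicting the close price.
X = np.random.rand(500, 4).astype("float32")
y = (X @ np.array([0.3, 0.4, 0.2, 0.1], dtype="float32")).reshape(-1, 1)

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])  # one unit = linear regression
model.compile(optimizer=tf.keras.optimizers.Adam(), loss="mse")
model.fit(X, y, epochs=10, verbose=0)  # the predefined 10 epochs mentioned above
print(model.evaluate(X, y, verbose=0))
```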
After this, data collection was to be focused on. While the model worked on static
data, the initial proposal promised live data as well. While API data was readily available
for a large fee, scraping initially proved far more promising. Though most websites fea-
ture page layouts designed to deter scraping, the program could successfully pull from
certain pages at any time during the day and apply the current stock price to the model.
Reference [3] is the target page used for scraping in the current iteration of the model.
However, this approach did not account for historical data. For that, the appropriate
approach was determined to be the finance Python library. This had the advantage of
no paywall, immediate results, and any timeframe requested. It appears that the data retrieved for [1] and [2] both used this method, if the names of the columns are any indication.
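Assuming the unnamed "finance Python library" refers to yfinance (our reading, not stated explicitly in the paper), historical data could be retrieved along these lines; the date range is illustrative.

```python
import yfinance as yf  # assumption: the unnamed finance library is yfinance

# Mastercard's ticker on the NYSE is "MA"; the date range here is illustrative.
history = yf.download("MA", start="2006-01-01", end="2023-04-23")
print(history.head())
```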
The model gave confident predictions for lower values, and the small increase in deviation for higher values was not noticeably worse.
3.1.6 XGBoost
Figure 2 shows that XGBoost found its optimal results between 16 and 19 decision trees with the decision tree approach, but linear regression yielded superior results. Once targeting the squared-log error, the R2 value was >.9999, identical, as far as significant figures go, to SKLearn's result. The mean squared and mean absolute errors were also identical at 4.98 × 10^-6 and 1.30 × 10^-3, respectively. The graph shows similar if not greater confidence with lower values, though less at higher stock values.
3.1.7 Tensorflow
Tensorflow proved the most challenging of the three options, as represented in Fig. 3. The researcher had the least familiarity with it and required frequent consultation of the documentation to determine the use of various parameters. However, the results are still markedly accurate. While not quite to the same degree as SKLearn and XGBoost, a .9996 R2 accuracy is remarkable all the same. The mean squared error of 1.82 × 10^-5 is ten times that of the previous two entries, though still tiny, as is the mean absolute error of 2.61 × 10^-3, which is double that of the previous two strategies.
Fig. 3. Tensorflow
Additionally, despite iterating over the dataset ten times, the training loss was less
than 0.012. This allows more leeway for further epochs, though this would likely make
overtraining concerns worse. We can see the model error in Fig. 4.
3.1.8 Analysis
The data presented shows clear statistical anomalies in the sheer accuracy provided. The
amount of data available allows for many iterations of training even under an 80–20
split. Should further research be conducted, the split would likely be closer to 60–40 or
perhaps lower.
Of the three models, the Tensorflow model performed the worst, though it remains
highly accurate. XGBoost and SKLearn are identical numerically, but graphically
SKLearn seems to perform slightly better in terms of assigning a line of best fit to
the testing accuracy.
Assuming that the models are accurate, there are three potential use cases for a model
like this. First of which, the existing model can be used to predict future changes to the
Mastercard stock without extreme external influence on the value of the stock, as the
initial proposal intended to do. Secondly, were any new models to be trained on this same
dataset, they could be compared to the accuracy of this model to determine their value.
Finally, were the stock price of Mastercard to suddenly deviate from anticipated trends
that this model predicts, the magnitude of these trends can be assessed and recorded.
The accuracy was found to be highest for the XGBoost and SKLearn models, which can predict the closing value of Mastercard stock daily with astonishing accuracy. However, such accuracy raises concerns about possible overtraining, and when fed different data, or perhaps a different stock altogether, the models may produce far less accurate results.
3.2 Experiment-II
3.2.2 Methodology
The experiment started with research into linear regression and understanding how it
works. After researching linear regression and machine learning, a dataset was needed
for testing. Figure 5, shows different datasets that were found and tested so that there
would be a way to see comparisons when tested against one another. After the data was
gathered, it then needed to be pruned and cleaned to ensure that it would be usable for
further testing purposes.
Then there was the initial step of analyzing the data and seeing the trends of the
previous years. The closing prices were looked at and compared to one another, as were
the volumes.
Figure 6, shows the course of trade volume of those four companies.
Fig. 6. The trade volume of Facebook, Mastercard, Coca-Cola, and Nvidia Stock
Then the next step was viewing the moving averages of the different companies’
stocks throughout their run. This was so that one could potentially see a less jagged
graph, as the initial line graphs of the closing prices varied quite a lot.
Figure 7 represents the moving average volume of the stocks; we can notice that there is continuous growth year over year.
Then some computation was required to find the daily return of the stocks, as this
can be deemed important for those looking into stocks.
Fig. 7. Moving average volume of Facebook, Mastercard, Coca-Cola, and Nvidia Stocks
We then have the predicted trends shown on a linear map for each company. Figures 8, 9, and 10 show the linear maps for Facebook, MasterCard, and Coca-Cola, respectively.
Fig. 8. Facebook.
Fig. 9. MasterCard.
3.2.3 Analysis
While we could find data, at first it was difficult to see how everything would compare to one another. The initial testing with linear regression was a concern, as some of the scores looked very confident in some areas while being wildly incorrect in others [9].
All of the predicted data in the new section started below the previous day’s closing
prices, which was consistent between the models. With further inspection of the models,
some would seem to be closer to the actual data when shifted to start with the closing
price of the model.
We can better comprehend how the data manifested itself after being able to see the
data with the plots. There are frequent peaks and dips in comparable locations when
comparing the correct values to the prediction, demonstrating a comprehension of the
data. The resulting graphs’ trends for all 4 organizations revealed commonalities in their
behavior relative to the real values. The graphs reliably displayed the major peaks and
dips, further demonstrating how the program was able to forecast future stock prices.
Throughout the process, we gained a better understanding of how the data behaved, seeing the relations between daily returns and closing prices, and how having the correct data to compare against is important. Overall, it was a very strong learning experience.
3.3 Experiment-III
For this setup, we used a Jupyter Notebook as our test environment. We chose linear regression as our method but also experimented with other machine learning algorithms such as LSTM and RNN [10]. Attempts at using LSTM led to software issues, as certain libraries would consistently fail to import required tools and result in errors. Models could not be established, alternative platforms would not import datasets at all, and eventually we dropped LSTM as a method.
Another method used was RNN. This method succeeded but the calculation process
made our computer very hot, with the fan running at full tilt. We found this concerning
and when the algorithm was finished, plotting out the data showed the results were
somewhat inaccurate. As a result, we forwent other methods and focused on linear
regression. Given that those other methods were rejected, screenshots from them are not
included in this report [10].
First, we imported the necessary libraries and a dataset of stock prices going back
roughly thirty years. Figure 11 shows the description of the first five rows of the stock
dataset.
Fig. 11. The description of the first five rows of the stock dataset
Then we examined the dataset and dropped unnecessary columns. Among the data columns is the "close" price, which is what we want to predict.
Figures 12 and 13 show how the X and Y variables, representing date and price respectively, are established and initialized for training and testing. The X variable is assigned a dataset of opening prices, highs, lows, and stock volumes, and the Y variable is assigned the closing price. Next, we import train_test_split from sklearn.model_selection and split our variables to train using one set of X and Y and test using the other.
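A minimal sketch of this split-and-fit step with scikit-learn is shown below; the small inline frame stands in for the thirty-year dataset, and its column names are placeholders for the opening price, high, low, volume, and closing price columns described above.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Placeholder frame standing in for the historical stock dataset.
df = pd.DataFrame({
    "Open":   [10.0, 10.2, 10.1, 10.5, 10.7, 10.6, 10.9, 11.0],
    "High":   [10.3, 10.4, 10.5, 10.8, 10.9, 11.0, 11.2, 11.3],
    "Low":    [ 9.8, 10.0, 10.0, 10.3, 10.5, 10.4, 10.7, 10.8],
    "Volume": [1e6, 1.1e6, 0.9e6, 1.2e6, 1.0e6, 1.3e6, 1.1e6, 1.2e6],
    "Close":  [10.1, 10.3, 10.2, 10.6, 10.8, 10.7, 11.0, 11.1],
})

X = df[["Open", "High", "Low", "Volume"]]
y = df["Close"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = LinearRegression().fit(X_train, y_train)
print(model.score(X_test, y_test))  # R^2 on the held-out split
```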
4 Conclusion
Based on a preliminary trial run, it may be concluded that the nation’s economic patterns
are significantly influenced by movements in the stock market. Market investors are quite
interested in and concerned about both short- and long-term trend forecasts.
References
1. Author, Mastercard Data Set with Day Columns 2006 to 2010, Kaggle, 29 Jan-
uary 2023. https://www.kaggle.com/datasets/rashmithreddy/mastercard-data-set-with-day-
columns-2006-to-2010. Accessed 29 Jan 2023
2. Kohli, N.: Stock prediction using linear regression - starter, Kaggle, 20 August
2020. https://www.kaggle.com/code/nikhilkohli/stock-prediction-using-linear-regression-sta
rter. Accessed 29 Jan 2023
3. “MasterCard Stock Price Today: NYSE Ma Live Ticker,” Investing.com, 23 April 2023.
https://www.investing.com/equities/mastercard-cl-a. Accessed 23 Apr 2023
4. James, G., Witten, D., Hastie, T., Tibshirani, R., Taylor, J.: Linear regression. In: An Introduc-
tion to Statistical Learning, pp. 69–134. LNCS, Springer Texts in Statistics. Springer, Cham
(2023). https://doi.org/10.1007/978-3-031-38747-0_3
5. Rahman, K.: Facebook stock data - live and latest. Kaggle, 14 April 2023. https://www.kaggle.
com/datasets/kalilurrahman/facebook-stock-data-live-and-latest. Accessed February 2023
6. What is machine learning? IBM. (n.d.). https://www.ibm.com/topics/machine-learning.
Accessed February 2023
7. Li, J.: Research on market stock index prediction based on network security and deep learning.
Secur. Commun. Netw. 2021, 1–8 (2021)
8. Soni, P., Tewari, Y., Krishnan, D.: Machine learning approaches in stock price prediction:
a systematic review. J. Phys. Conf. Ser. 2161(1), 012065. IOP Publishing (2022)
9. https://machinelearningmastery.com/linear-regression-for-machine-learning/
10. https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-recurrent-neural-networks
11. Yu, Z., Wang, K., Wan, Z., Xie, S., Lv, Z.: Popular deep learning algorithms for disease
prediction: a review. Clust. Comput. 26(2), 1231–1251 (2023)
12. Warren, S.L., Moustafa, A.A.: Functional magnetic resonance imaging, deep learning, and
Alzheimer’s disease: a systematic review. J. Neuroimaging 33(1), 5–18 (2023)
13. Wang, T., Li, H., Noori, M., Ghiasi, R., Kuok, S.C., Altabey, W.A.: Seismic response prediction of structures based on Runge-Kutta recurrent neural network with prior knowledge. Eng. Struct. 279, 115576 (2023)
14. Eltouny, K.A., Xiao, L.: Large-scale structural health monitoring using composite recurrent neural networks and grid environments. Comput.-Aided Civ. Infrastruct. Eng. 38(3), 271–287 (2023). Machine Learning Techniques Applied to Stock Price Prediction, https://towardsdatascience.com/machine-learning-techniques-applied-to-stock-price-prediction-6c1994da8001
15. Hallez, Q., Mermillod, M., Droit-Volet, S.: Cognitive and plastic recurrent neural network
clock model for the judgment of time and its variations. Sci. Rep. 13(1), 3852 (2023)
16. Lazcano, A., Herrera, P.J., Monge, M.: A combined model based on recurrent neural networks
and graph convolutional networks for financial time series forecasting. Mathematics 11(1),
224 (2023)
17. Almosova, A., Andresen, N.: Nonlinear inflation forecasting with recurrent neural networks.
J. Forecast. 42(2), 240–259 (2023)
18. Kim, Y.D., Durlofsky, L.J.: Convolutional–recurrent neural network proxy for robust
optimization and closed-loop reservoir management. Comput. Geosci. 27(2), 179–202 (2023)
19. Structural Health Monitoring with bolt load cells | BoltSafe. https://boltsafe.com/structural-
health-monitoring/
20. Eltouny, K.A., Liang, X.: Large-scale structural health monitoring using composite recurrent
neural networks and grid environments. Comput. Aided Civ. Infrastruct. Eng. 38(3), 271–287
(2023)
Analysis of Parent with Fine Tuned Large
Language Model
1 Introduction
In the domain of artificial intelligence and natural language processing, large language
models have emerged as groundbreaking innovations, fundamentally reshaping the way machines engage with human language. These models, equipped with billions of parameters and driven by deep learning frameworks, have opened up new possibilities across a
variety of applications, from generating text and translating languages to analyzing sen-
timents and answering questions. In the dynamic landscape of AI research, open-source
models like GPT4ALL’s ggml models and Facebook’s Llama model offer capabilities
such as question answering, natural language processing, text generation, and text sum-
marization. Their open-source nature provides ample opportunities for customization
and fine-tuning to suit specific needs. Various techniques have been used for fine-tuning these models, with Low-Rank Adaptation (LoRA) being a popular method that involves freezing the weights of the trained models and incorporating trainable rank decomposition matrices into the layers of the transformer architecture. Due to this, there are far fewer trainable parameters for downstream tasks. However, in recent months, a novel fine-tuning approach called QLoRA has emerged. QLoRA reduces memory usage to the extent that a 65-billion-parameter model can be fine-tuned on a single 48 GB GPU while retaining full 16-bit fine-tuning performance. QLoRA achieves this by backpropagating gradients into Low-Rank Adapters (LoRA) through a frozen, 4-bit quantized pre-trained language model. As a result, the average memory requirements for fine-tuning a 65-billion-parameter model are reduced from over 780 GB of GPU memory to less than 48 GB, without compromising runtime or predictive performance when compared to a fully fine-tuned 16-bit model [1, 2, 4, 5].
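As a rough illustration of the recipe just described (a frozen 4-bit base model with trainable LoRA adapters), the Hugging Face transformers/peft sketch below shows the general shape of such a setup; the model identifier, rank, and target modules are assumptions for illustration and are not the exact configuration used in the papers cited.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # base weights quantized to 4 bits and kept frozen
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # computation carried out in 16-bit precision
)
base = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-7b",                  # placeholder model id
    quantization_config=bnb_config,
    device_map="auto",
)
base = prepare_model_for_kbit_training(base)

lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,   # assumed hyperparameters
                  target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(base, lora)
model.print_trainable_parameters()  # only the LoRA adapters are trainable
```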
The major contribution of this paper is to conduct a thorough analysis of two prominent large language models. The foundational model is Facebook's Llama, and the subsequent model is Guanaco, which underwent fine-tuning using QLoRA, a notable fine-tuning approach. Our investigation aims to uncover the unique characteristics, capabilities, and idiosyncrasies that distinguish these models. By examining their architectures, training methodologies, and real-world performance, our goal is to offer a thorough insight into both the strengths and limitations of these approaches, emphasizing the benefits of QLoRA training compared to conventional models [1, 4, 5].
This paper is structured as follows, with Sect. 2 discussing the review of conventional
works along with suitable problems. Following this, Sect. 3 presents an overview about
the large language models, benchmark tests and relevant descriptions. The datasets used
are discussed in Sect. 4. Section 5 shows the results and finally, the entire study is
summarized in Sect. 6 with further suggestions.
2 Literature Survey
In this review of existing literature, we deeply explore QLORA, an innovative tech-
nique designed to address the resource constraints associated with training and fine-
tuning extremely large language models. The primary focus of this examination is on
the components of QLORA and its implications within the research community [1].
Fine-tuning of massive language models is essential for tailoring them to spe-
cific tasks. Nevertheless, this process can be computationally demanding and memory-
intensive. The introduction of the QLORA technique by the research team presents a
promising solution, allowing a massive 65-billion-parameter model to be fine-tuned on a single 48 GB GPU while sustaining optimal performance for 16-bit fine-tuning tasks. This achievement represents a significant breakthrough in the field, as it democratizes access to cutting-edge language models [1].
A notable achievement: The Guanaco model family: One of the most remarkable
outcomes of QLORA is the emergence of the Guanaco model family. This family not
only pushes the boundaries of previous language models but also surpasses all previ-
ously disclosed models on the Vicuna benchmark. Achieving a 99.3% performance level
comparable to ChatGPT on this benchmark serves as a testament to the effectiveness of
the QLoRA technique. Impressively, fine-tuning the Guanaco models requires just 24 hours on a single GPU, making them accessible to a broader research community [6].
The paper in question introduces an innovative approach known as Low-Rank Adap-
tation (LoRA), which aims to address the constraints associated with full fine-tuning in
large-scale language models like GPT-3 175B. This literature review’s purpose is to
provide an overview of the fundamental concepts, methodologies, and discoveries pre-
sented in the paper, emphasizing the importance of LoRA within the field of Natural
Language Processing (NLP) [2, 4].
The foundation of LoRA is rooted in the concept of pre-training on extensive general-
domain datasets, followed by fine-tuning for specific tasks in a particular domain. As
the size of models increases, fine-tuning becomes increasingly computationally expen-
sive, particularly with models like GPT-3 175B, which possess an immense number of
parameters. Conventional fine-tuning methods involve retraining all model parameters,
resulting in high memory and computational demands. In response to this challenge, the
authors introduce LoRA as a more efficient alternative [2].
LoRA involves the immobilization of pre-trained model weights and the insertion
of trainable rank decomposition matrices into each layer of the transformer architec-
ture. This procedure substantially decreases the count of trainable parameters for subse-
quent tasks, all the while preserving or potentially improving model performance. The
fundamental stages of LoRA are outlined as follows: [2, 3]
1. Keep the weights of the pre-trained model fixed.
2. Incorporate trainable rank decomposition matrices into each layer.
3. Only modify the rank decomposition matrices during the adaptation process.
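In essence, these stages replace a full weight update with a low-rank product BA added to the frozen weight W; a minimal numerical sketch of that idea (independent of any particular library) follows, with the layer width and rank chosen arbitrarily for illustration.

```python
import numpy as np

d, r = 1024, 8                      # layer width and an illustrative low rank r << d
rng = np.random.default_rng(0)

W = rng.normal(size=(d, d))         # pre-trained weight, kept frozen
A = rng.normal(size=(r, d)) * 0.01  # trainable rank-decomposition matrices
B = np.zeros((d, r))                # initialised to zero so the adapted layer starts equal to W

W_adapted = W + B @ A               # effective weight used during adaptation
trainable = A.size + B.size
print(trainable, W.size, trainable / W.size)  # far fewer trainable parameters than full fine-tuning
```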
3 Methodology
This section describes the benchmarks and evaluation procedures used in our analysis.
MMLU, which stands for Massive Multitask Language Understanding, represents a novel benchmark meticulously designed to gauge the extent of knowledge acquired during the pretraining phase. It does so by assessing models exclusively in zero-shot and few-shot contexts, thereby introducing a more challenging evaluation approach that closely resembles the way we assess human capabilities. This benchmark encompasses a diverse array of 57 subjects, spanning science, technology, engineering, the humanities, the social sciences, and more. The difficulty levels of the benchmark vary from basic to highly advanced, and it assesses both general real-world knowledge and problem-solving proficiency. The subjects encompass not only conventional domains like mathematics and history but also delve into specialized areas such as law and ethics.
The benchmark’s extensive coverage and depth make it exceptionally well-suited for
pinpointing the limitations or gaps in a model’s knowledge.
The scores are then derived by averaging the model’s score across the tests. For
example, on the question answering tasks in the MMLU benchmark, humans typically
achieve F1 scores of around 90. This suggests that an average human would likely score
in the 80s or 90s on the MMLU benchmark overall.
We have two main ways to get information out of a model to evaluate it:
– get the probabilities that specific token groups are continuations of the prompt and compare these probabilities for our predefined possible choices, as mentioned in Fig. 1
– get a text generation from the model (by repeatedly selecting tokens, as we have seen in Fig. 1) and compare these text generations to the texts of various predefined possible choices.
The Elo rating system, conceived by Arpad Elo, a physics professor of Hungarian-
American origin, is a method employed to assess the relative skill levels of participants
in competitive games characterized by a zero-sum outcome, such as chess. Elo designed
this system as an enhanced replacement for the earlier Harkness system used in chess.
Additionally, the Elo system has found applications as a rating method in a wide range of
activities, including Soccer, American football, baseball, basketball, billiards, ping pong,
assorted board games, esports, and, in more recent times, extensive language models.
In Eq. 1, if player X possesses a rating denoted as Rx and player Y a rating denoted as Ry (the values of Rx and Ry are assumed equal at the beginning of the test), the precise formulation, employing a logarithmic curve with base 10, governing the probability of player X emerging victorious is as follows:
E_X = \frac{1}{1 + 10^{(R_Y - R_X)/400}} \quad (1)
Using the collected data, we compute the Elo ratings of the models and present the main results in Table 1.
Following Eq. 2, the players' ratings undergo a linear update after each encounter. Assuming player X, with a rating denoted as Rx, was anticipated to achieve Ex points but actually scored Sx points, the formula for adjusting that player's rating is R'_X = R_X + K \cdot (S_X - E_X) (2), where K is the update factor.
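A small sketch of Eqs. 1 and 2 follows; the update factor K is an assumed constant, since the value used for rating the models is not stated here.

```python
def expected_score(r_x: float, r_y: float) -> float:
    # Eq. 1: probability that player X beats player Y.
    return 1.0 / (1.0 + 10 ** ((r_y - r_x) / 400.0))

def update_rating(r_x: float, s_x: float, e_x: float, k: float = 32.0) -> float:
    # Eq. 2: linear rating update after an encounter; k is an assumed update factor.
    return r_x + k * (s_x - e_x)

# Two models start with equal ratings, as assumed at the beginning of the test.
rx = ry = 1000.0
ex = expected_score(rx, ry)      # 0.5
rx = update_rating(rx, 1.0, ex)  # model X "wins" the comparison
print(rx)                        # 1016.0
```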
The architecture of ARC (Abstraction and Reasoning Corpus) is designed to test the
reasoning capabilities of language models. It consists of a benchmark dataset of image-
based reasoning tasks that require the production of an output image given a specific
input. The ARC tasks are designed to test four core knowledge systems: objectness, agentness and goal-directedness, numerical knowledge, and elementary geometry and topology [7].
The ARC dataset consists of 1,000 reasoning tasks, each with 2 to 5 input-output image pairs provided as training instances. The training inputs are different from the actual test input, but they can be solved using the same underlying procedure. The
tasks are open-ended, with objects having different shapes and colors and forming var-
ious relations with each other. The grid size can also vary between tasks, making them
unsolvable through search algorithms.
To solve ARC tasks, a language model needs to encode the 2D input-output images
into a textual representation. This can be done by representing each pixel’s color numer-
ically or with color descriptors. The encoded images are then incorporated into prompts
that instruct the language model to solve the task. Two single-stage strategies for
prompting the language model have been explored.
The ARC architecture also includes the 1D-ARC dataset, which is a simplified,
single-dimensional version of ARC. The 1D-ARC tasks aim to make ARC tasks more
approachable for language models by reducing task complexity and dimensionality. The
performance of language models on the 1D-ARC dataset has been evaluated, revealing
improvements but also highlighting the limitations of simplification alone in bridging
the reasoning gaps of language models.
In order to enhance the reasoning capabilities of language models for ARC tasks, an object-based approach has been proposed. This approach integrates an external tool called the ARGA (Abstraction and Reasoning Graphs with Attention) framework.
(Figure: ARC solving pipeline: the ARC task is encoded by a text-prompt encoder, using instructions and examples, into a prompt for the large language model; an output parser turns the model output into the ARC solution.)
– Encode the input-output images: Convert the 2D input-output images into a tex-
tual representation. This can be done by representing each pixel’s color numerically
or with color descriptors. Use delimiters to separate adjacent pixels and “newline”
characters to separate rows in the image.
– Prompt the language model: Incorporate the encoded images into prompts that instruct
the language model to solve the task. Explore different strategies for prompting the
language model, such as single-stage strategies.
– Apply reasoning and abstraction: Utilize the reasoning capabilities of the language
model to analyze the input-output pairs and identify patterns, relationships, and trans-
formations. Apply abstraction techniques, such as object-based representations, to
enhance the model’s understanding of the task.
– Generate the solution: Based on the analysis and reasoning, generate the output image
that corresponds to the given input. The solution should align with the patterns and
transformations observed in the input-output pairs.
– Evaluate the solution: Compare the generated output image with the expected output
image provided in the test instance. Assess the correctness of the solution based on
how well it matches the expected output.
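A minimal sketch of the first two steps above (encoding a grid as text and building a prompt) is shown below; the delimiter, the wording of the instructions, and the numeric colour encoding are illustrative assumptions.

```python
def grid_to_text(grid, delimiter=","):
    # Encode a 2D grid of colour indices as text: delimiters between pixels, newlines between rows.
    return "\n".join(delimiter.join(str(cell) for cell in row) for row in grid)

def build_prompt(train_pairs, test_input):
    # Single-stage prompt: show the training input-output pairs, then ask for the test output.
    examples = "\n\n".join(
        f"Input:\n{grid_to_text(inp)}\nOutput:\n{grid_to_text(out)}" for inp, out in train_pairs
    )
    return (f"Infer the transformation and apply it.\n\n{examples}\n\n"
            f"Input:\n{grid_to_text(test_input)}\nOutput:")

demo_pairs = [([[0, 1], [1, 0]], [[1, 0], [0, 1]])]
print(build_prompt(demo_pairs, [[1, 1], [0, 0]]))
```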
Google Colab, with an Intel Xeon CPU, an Nvidia T4 GPU, and Ubuntu 22.04, was used.
4 Dataset Used
Llama was trained using the following data sets:
4.2 HH-RLHF
This dataset is composed of evaluations made by humans regarding the degree of help-
fulness and harmlessness. It encompasses 160,800 instances, with each data point pre-
senting two responses from an assistant to a user’s query, along with a human assessment
of the most favorable response [9].
4.3 FLAN V2
The FLAN v2 compilation comprises 1,836 tasks that have been expanded with carefully
curated templates and diverse formatting patterns, culminating in a total of more than
15 million examples. The Llama models underwent training using task combinations
outlined in the FLAN v2 collection [10].
4.4 Guanaco
Guanaco's training is based on the OASST1 dataset, a multilingual collection gathered by the Open Assistant project; the text does not specify a comparable fine-tuning dataset for Llama.
Guanaco underwent fine-tuning using the OASST1 dataset, known as the Open Assistant dataset, featuring 161,443 distinct messages from 66,497 conversations across 35 languages. This dataset was sourced through crowd-sourcing efforts.
We have compared both models, Guanaco 7B and Llama 7B, by implementing the following benchmark tests. The overall scores of both models across all benchmark tests are shown in Table 1, and the comparative graphs are shown in Fig. 3. The scores represent the models' ability on the tests below; for more information about how these tests are scored, refer to Sect. 3.
1. Abstract Reasoning Corpus (ARC)
2. Massive Multitask Language Understanding (MMLU)
3. Elo Rating (on the Vicuna benchmark)
Figure 3 presents the comparison between the Llama 7B and Guanaco 7B models; Guanaco shows better results on the MMLU and ARC tests, while Llama shows slightly better performance than Guanaco in the Elo ratings [11].
6 Conclusion
To sum up, this paper has offered an extensive analysis of two notable large language
models, specifically Llama and Guanaco. By conducting a thorough assessment across
multiple facets, encompassing benchmark task performance and practical utility, we’ve
acquired valuable perspectives on the merits and limitations of each model.
Furthermore, the training methodologies employed for these models played a sig-
nificant role in their respective performance. Llama’s extensive training on a diverse
dataset made it a robust general-purpose language model, while Guanaco’s fine-tuning
process tailored it to a specific domain, enhancing its task-specific performance.
References
1. Dettmers, T., Pagnoni, A., Holtzman, A., Zettlemoyer, L.: QLoRA: Efficient Finetuning of
Quantized LLMs (2023)
2. Hu, E.J.: LoRA: low-rank adaptation of large language models, 17 June 2021
3. Vaswani, A., et al.: Attention is all you need. In: Proceedings of the 31st International
Conference on Neural Information Processing Systems (NIPS’17), pp. 6000–6010. Curran
Associates Inc., Red Hook, NY, USA (2017)
4. Radford, A., Narasimhan, K.: Improving Language Understanding by Generative Pre-
Training (2018)
5. Touvron, H., Lavril, T., Izacard, G., et al.: LLaMA: open and efficient foundation language
models (2023)
6. Hadi, M.U., Al-Tashi, Q., Qureshi, R., et al.: Large language models: a comprehensive survey
of its applications, challenges, limitations, and future prospects (2023)
7. Xu, Y., et al.: LLMs and the Abstraction and Reasoning Corpus: Successes, Failures, and the
Importance of Object-based Representations
8. https://huggingface.co/datasets/OpenAssistant/oasst1/blob/main/README.md. Accessed
12 Apr 2023
9. https://huggingface.co/datasets/HuggingFaceH4/hh-rlhf. Accessed 25 Sep 2023
10. https://huggingface.co/datasets/SirNeural/flan_v2. Accessed 23 Feb 2023
11. https://github.com/huggingface/blog/blob/main/evaluating-mmlu-leaderboard.md. Accessed
28 Sep 2023
AI Content Detection
Abstract. The rise of AI-generated data, mainly from models like ChatGPT and LLaMA 2, poses serious difficulties for academic integrity and raises worries about plagiarism. The current research looks at the capabilities of various AI content recognition algorithms to distinguish between human- and AI-authored material. It examines numerous research papers, publication years, datasets, machine learning approaches, and the benefits and drawbacks of detection methods in AI text detection. Various datasets and machine learning techniques are employed, with several types of classifiers emerging as top performers. This work builds an Extra Trees classifier that can distinguish ChatGPT-produced text from human-authored content. The "ChatGPT Paraphrases" dataset was used for model training and testing. The results show that the proposed model achieved 80.1% accuracy and outperformed the existing models, namely Linear Regression (LR), Support Vector Machine (SVM), Decision Tree (DT), K-Nearest Neighbour (KNN), Ada Boost Classifier (ABC), Random Forest Classifier (RFC), Bagging Classifier (BG), and Gradient Boosting Classifier (GBC).
1 Introduction
In an era marked by the pervasiveness of digital text, the capacity to differentiate between human-generated data and AI-generated data has become an increasingly important issue [1]. The evolution of Artificial Intelligence (AI) technology, represented
by sophisticated language models like ChatGPT, has heralded a new era of informa-
tion distribution in which text created by algorithms like GPT-3.5 may closely replicate
human language. This situation raises critical queries about the authenticity, credibility,
and ethical implications of the text that permeates our digital landscape [2]. The fast growth of artificial intelligence has led to a growing need for tools to identify the source of content.
By identifying the distinctive characteristics of AI and human-generated content, this
project aims to provide a tool for identifying misinformation, authenticity, intellectual
property protection, responsible AI use, user empowerment, bridging the gap between AI
and humans, ethical AI development, and meeting societal needs. AI content detectors
can be used by journalists to verify the authenticity of news articles, by social media users
to detect fake news and propaganda, and by teachers to identify students who are using
AI-generated text [3]. This not only preserves the integrity of their work but also safe-
guards their competitive advantage in the market. By detecting instances of plagiarism
and copyright infringement, businesses can also take legal action to protect their rights
and reputation [4]. This research paper delves into the realm of detecting AI-generated
text, exploring the evolving techniques, challenges, and implications associated with
this endeavor. As AI systems, including large language models (LLMs) like ChatGPT,
continue to advance in their natural language processing capabilities, their capacity to
produce text that is identical to human-generated content grows. Consequently, the need
for robust and reliable methods to discern the origin of digital text has never been more
urgent [5]. The goal of this work is to showcase a comprehensive picture of the landscape of AI-generated text detection, including both the technologies used and the ethical
concerns that underpin this discipline. Through an analysis of state-of-the-art detection
techniques, we seek to shed light on the evolving arms race between AI text generators,
such as GPT-3.5, and detectors, including those developed to identify text produced by
AI language models [6]. Moreover, this research delves into the multifaceted impact of
AI-generated text, ranging from the propagation of disinformation to its potential use
in enhancing human communication with AI systems like ChatGPT. By understand-
ing the various dimensions of AI text generation and detection, this work seeks to add
to a deeper comprehension of the challenges and opportunities which lie ahead in our
increasingly AI-mediated information ecosystem. In a world where the margins between machine-generated and human-generated text blur, this paper serves as a vital
exploration into the realm of detecting AI-generated text, ultimately aiming to equip
researchers, policymakers, and technologists with the knowledge needed to navigate the
complex terrain of digital communication in the 21st century [7].
The primary goal of this research is to differentiate between data produced by Chat-
GPT and data published by human authors. Using the publicly available Kaggle dataset
named ChatGPT Paraphrases, every model is reevaluated.
The following is the paper’s format: Sect. 1 presents a brief overview of ChatGPT,
its advantages and disadvantages, as well as an overview of AI-generated content detec-
tion. Section 2 conducts a survey to summarize past work on AI content detection and
approaches. Section 3 discusses the research objectives and execution details, while
Sect. 4 discusses the outcomes. Section 5 outlines the Conclusion as well as the future
scope.
2 Literature Survey
The research papers on distinguishing human-generated text from LLM-generated text are analyzed in this study. The analysis involves several significant criteria, such as the sources of the research papers, publication years, datasets under consideration, frequently used machine and deep learning approaches, evaluation metrics used, and their advantages and disadvantages [8]. Based on the review, a novel Extra Trees classifier approach is described and implemented in Sect. 3. First, queries related to the aims of the review are identified [9].
The following research questions have been proposed:
• RQ1: What are the sources of the related research papers?
• RQ2: What are the years from which the related research is taken?
• RQ3: What are the datasets used in such research papers?
• RQ4: What are widely used ML and DL techniques used in these papers?
• RQ5: What are the benefits and drawbacks of approaches for distinguishing human
text from material created by LLM?
• RQ6: What are various evaluation metrics used to compare and analyze results?
Search Clue
The search clue contains search terms as well as approaches for obtaining all relevant
research papers on distinguishing human text from Large Language Model (LLM)-
generated text. The following search strings were chosen: ChatGPT-generated text
detection.
Figure 2 shows that research publications are chosen from IEEE, Springer, arXiv,
MDPI, and other sources. This review includes 14 downloaded papers in total.
(Figure: distribution of the selected publications across IEEE, Springer, CellPress, MDPI, and arXiv.)
RQ2: What are the years from which the related research is taken?
Among all articles published between 2018 and 2023, the majority of those under consideration for review are from 2023, followed by 2022 and 2021 [8]. In the year 2023, 12 of the 14 articles were published. Figure 3 shows that the majority of the publications accepted for study are from 2023.
Fig. 3. Count of the selected publications by publication year (2021, 2022, and 2023)
RQ4: What are widely used ML and DL techniques used in these papers?
ML and DL approaches are used in all of the research articles. ML techniques such as
SVM, LR, DT, KNN, RF, ABC, BC, GBC and Extremely Randomized Tree are used
in papers, as are DL approaches such as Long Short-Term Memory and Multi-Layer
Perceptron.
RQ5: What are the benefits and drawbacks of approaches for distinguishing human text
from material created by LLM?
Table 2 describes the advantages and disadvantages of various techniques.
addressing the research problem while rigorously assessing the accuracy of the models
in classifying the text data’s source.
In this research, we harnessed paraphrase data generated by ChatGPT, a language
model. This valuable dataset was sourced from Kaggle and comprises four essential
columns: “text,” “paraphrase,” “category,” and “source.” In the “text” column, one can
find original sentences or questions, while the “paraphrase” column houses correspond-
ing paraphrased versions, artfully crafted by ChatGPT. The “category” column serves
as a categorization mechanism, assigning each text entry to either a question or sen-
tence category. Lastly, the fourth column, “source,” offers insights into the origin of the
text, providing information such as whether it originated from platforms like Quora,
CNN News, or Squad 2. This dataset forms a fundamental cornerstone of our research,
enabling a comprehensive exploration of paraphrasing and its applications. The system
architecture and various modules implemented to achieve outcomes are highlighted in
Fig. 4, which is displayed below.
As shown in Fig. 4, the system architecture is divided into six modules, which are described as follows –
Module 1 - Data Collection:
In this module, the dataset used is called “ChatGPT Paraphrases,” obtained from Kaggle.
The dataset comprises 419,197 rows and contains four columns. After preprocessing, we reduced the dataset to two columns and used only 50,000 rows. Among its text samples, approximately 51.68% are generated by humans, with the remaining 48.31% generated
by ChatGPT. The dataset snapshot that was used is displayed in Fig. 5 below.
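A rough illustration of this module is sketched below: it loads the Kaggle file and reduces it to a two-column (text, label) frame. The file name and the 0/1 label encoding are assumptions, since the paper does not spell out the exact reduction steps.

```python
import pandas as pd

# Assumed file name for the Kaggle "ChatGPT Paraphrases" dataset.
df = pd.read_csv("chatgpt_paraphrases.csv")   # columns: text, paraphrase, category, source

# Stack the human-written originals and the ChatGPT paraphrases into one
# two-column frame (text, label); the 0/1 encoding is an assumption.
human = pd.DataFrame({"text": df["text"], "label": 0})          # 0 = human-generated
chatgpt = pd.DataFrame({"text": df["paraphrase"], "label": 1})  # 1 = ChatGPT-generated
data = pd.concat([human, chatgpt], ignore_index=True)

# Keep a 50,000-row sample, as described above.
data = data.sample(n=50_000, random_state=42).reset_index(drop=True)
print(data["label"].value_counts(normalize=True))
```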
Module 2 - Data Pre-processing (Data Splitting and Data Vectorization):
This module is dedicated to data pre-processing. It involves two important steps, as mentioned below (a brief sketch follows the list):
Word Vectorization:
• TF-IDF word vectorization: Transforming text data into numerical representations
using the TF-IDF technique.
• Dataset balancing: Addressing class imbalance within the dataset, ensuring equitable
representation of human-generated and ChatGPT-generated text.
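A minimal sketch of this module using scikit-learn is shown below; the stand-in data frame, the 80/20 split, and the TF-IDF settings (max_features, ngram_range) are illustrative assumptions rather than the authors' exact configuration.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

# `data` is the two-column (text, label) frame built in Module 1; a tiny stand-in is used here.
data = pd.DataFrame({"text": ["How do magnets work?", "Magnets work because of fields."] * 100,
                     "label": [0, 1] * 100})

# Split first so the TF-IDF vocabulary is learned from the training text only.
X_train_text, X_test_text, y_train, y_test = train_test_split(
    data["text"], data["label"], test_size=0.2, stratify=data["label"], random_state=42)

vectorizer = TfidfVectorizer(max_features=20_000, ngram_range=(1, 2))
X_train = vectorizer.fit_transform(X_train_text)   # fit on training data only
X_test = vectorizer.transform(X_test_text)
```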
4 Results
Model performance is measured using evaluation metrics, each of which gives a unique viewpoint on the outcome. Five evaluation metrics are employed in this paper, namely Accuracy, Precision, Recall, F1 score, and Matthews correlation coefficient (MCC) [13, 14]. Accuracy calculates the percentage of accurately predicted cases. It is determined by dividing the number of correctly predicted samples by the total number of samples, as stated in Eq. 1.
Accuracy = (TP + TN) / (TP + TN + FP + FN)    (1)
While precision calculates the fraction of true positives among all positive predictions, recall assesses the fraction of true positives among actual positive instances. The precision and recall formulas are shown in Eqs. 2 and 3, respectively.
Precision = TP / (TP + FP)    (2)

Recall = TP / (TP + FN)    (3)
The F1-score is the harmonic mean of precision and recall, as stated in Eq. 4. This measurement balances precision and recall.

F1 score = 2 · (Precision · Recall) / (Precision + Recall)    (4)
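Continuing from the pre-processing sketch in Sect. 3, the metrics of Eqs. 1–4 and the MCC can be computed with scikit-learn as follows; the Extra Trees settings shown are assumptions, not the configuration reported in the paper.

```python
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, matthews_corrcoef)

# X_train, X_test, y_train, y_test come from the pre-processing sketch above.
clf = ExtraTreesClassifier(n_estimators=200, random_state=42)   # settings assumed
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

print("Accuracy :", accuracy_score(y_test, y_pred))    # Eq. 1
print("Precision:", precision_score(y_test, y_pred))   # Eq. 2
print("Recall   :", recall_score(y_test, y_pred))      # Eq. 3
print("F1 score :", f1_score(y_test, y_pred))          # Eq. 4
print("MCC      :", matthews_corrcoef(y_test, y_pred))
```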
Table 4 highlights the performance of various existing techniques and the proposed technique according to evaluation metrics such as Accuracy and F1-score. Figure 6
compares several models based on accuracy.
The confusion matrix displayed in Fig. 7, which shows the counts of predicted versus actual values, is used to assess the performance of the Extra Tree Classifier.
Figure 8’s Receiver Operating Characteristic (ROC) curve plots the True Positive
Rate (TPR) versus the False Positive Rate (FPR) at different threshold levels to examine
the effectiveness of the Extra Tree Classifier.
Upon thorough experimentation and evaluation, several key findings have emerged
from the above implementation:
Model Performance: The performance of a variety of machine learning models, includ-
ing Logistic Regression, Support Vector Machine, Decision Tree, K-Nearest Neighbor,
Random Forest, Extra Trees Classifier, Ada Boost Classifier, Bagging Classifier, and
Gradient Boosting Classifier, was assessed, and it was found that the Extra Trees Classifier outperformed the other existing models.
Accuracy: It is observed that Extra Trees Classifier exhibited the highest accuracy at
80.1%, followed closely by the Support Vector Machine and Random Forest models,
achieving accuracies of 77.9% and 77.3%, respectively.
F1-Score: The F1-score, which balances precision and recall, was another crucial metric considered in the evaluation. The Extra Trees Classifier demonstrated the highest F1-score at 0.799, closely followed by the Random Forest model at 0.780. This highlights their effectiveness in maintaining a balance between false positives and false negatives.
Model Selection: While multiple models yielded promising results, the most appro-
priate model should consider the specific requirements and constraints of the applica-
tion. The Extra Trees Classifier, with its impressive performance in both accuracy and
F1-score, stands out as a strong candidate for practical deployment.
Further Optimization: It is essential to note that model performance can often be further optimized through hyperparameter tuning and feature engineering (a sketch is given below).
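A hedged sketch of such tuning is shown here, continuing from the earlier snippets; the grid values and the use of GridSearchCV are illustrative assumptions only, not what the authors report.

```python
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import GridSearchCV

# Illustrative grid only; the paper does not report which hyperparameters were tuned.
param_grid = {
    "n_estimators": [100, 200, 400],
    "max_depth": [None, 20, 40],
    "min_samples_split": [2, 5, 10],
}
search = GridSearchCV(ExtraTreesClassifier(random_state=42), param_grid,
                      scoring="f1", cv=5, n_jobs=-1)
search.fit(X_train, y_train)   # TF-IDF features from the earlier sketch
print(search.best_params_, search.best_score_)
```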
5 Conclusion
This research provided valuable insights into the ability of various machine learning
models to distinguish between human-generated and ChatGPT-generated text. While
the Extra Trees Classifier emerged as the top performer in our evaluation, the choice of
model should be made judiciously, taking into consideration the specific requirements
of the application at hand. These findings pave the way for improved AI content detec-
tion tools and have significant implications for applications in fields such as plagiarism
detection and content moderation. This study shows the tested performance against nine
machine learning classifiers with different hyper-parameters. Table 4 holds the detailed performance analysis of the different models. From the results, it is clearly visible that the
Extra Trees Classifier outperforms all the other classifiers with an accuracy of 80.1%. The results also demonstrate that some well-known classifiers, such as the K-Nearest Neighbour and Decision Tree classifiers, perform poorly on this dataset. Some regularization techniques may improve their performance. These results also show that the model is not biased to a par-
ticular class. Future work may involve fine-tuning the selected models to achieve even
higher classification accuracy.
References
1. Gritsay, G., Grabovoy, A., Chekhovich, Y.: Automatic detection of machine generated texts:
need more tokens. In: 2022 Ivannikov Memorial Workshop (IVMEM), Moscow, Russian
Federation, pp. 20–26 (2022). https://doi.org/10.1109/IVMEM57067.2022.9983964
2. Elali, F.R., Rachid, L.N.: AI-generated research paper fabrication and plagiarism in the
scientific community. CellPress https://doi.org/10.1016/j.patter.2023.100706
3. Uzun, L.: ChatGPT and academic integrity concerns: detecting artificial intelligence generated
content. Technology (LET Journal) 3, 45–54 (2023)
4. Khalil, M., Er, E.: Will ChatGPT get you caught? Rethinking of plagiarism detection. In:
Zaphiris, P., Ioannou, A. (eds.) Learning and Collaboration Technologies. HCII 2023. LNCS,
vol. 14040, pp. 475–487. Springer, Cham (2023). https://doi.org/10.1007/978-3-031-34411-
4_32
5. Elkhatat, A.M., Elsaid, K., Almeer, S.: Evaluating the efficacy of AI content detection tools
in differentiating between human and AI-generated text. Int. J. Educ. Integr. https://doi.org/
10.1007/s40979-023-00140-5
6. Ma, Y., et al.: AI vs. Human -- Differentiation Analysis of Scientific Content Generation.
arXiv, arXiv:2301.10416 [cs.CL]
7. Islam, N., Sutradhar, D., Noor, H., Raya, J.T., Maisha, M.T., Farid, D.M.: Distinguishing
Human Generated Text from ChatGPT Generated Text Using Machine Learning. arXiv, arXiv:
2306.01761 [cs.CL]
8. Alamleh, H., AlQahtani, A.A.S., ElSaid, A.: Distinguishing human-written and ChatGPT-
generated text using machine learning. In: 2023 Systems and Information Engineering Design
Symposium (SIEDS), Charlottesville, VA, USA, pp. 154–158 (2023). https://doi.org/10.1109/
SIEDS58326.2023.10137767
9. Corizzo, R., Leal-Arenas, S.: One-class learning for AI-generated essay detection. Appl. Sci.
(Switzerland) 13(13). https://doi.org/10.3390/app13137901
10. Weber-Wulff, D., et al.: Testing of Detection Tools for AI-Generated Text. arXiv. arXiv:2306.
15666 [cs.CL]
11. Katib, I., Assiri, F.Y., Abdushkour, H.A., Hamed, D., Ragab, M.: Differentiating chat genera-
tive pretrained transformer from humans: detecting ChatGPT-generated text and human text
using machine learning. MDPI, Mathematics (2023). https://doi.org/10.3390/math11153400
12. Mitrovic, S., Andreoletti, D., Ayoub, O.: ChatGPT or Human? Detect and Explain. Explaining
Decisions of Machine Learning Model for Detecting Short ChatGPT-Generated Text. arXiv.
arXiv:2301.13852 [cs.CL]
13. Harada, A., Bollegala, D., Chandrasiri, N.P.: Discrimination of human-written and human
and machine written sentences using text consistency. In: 2021 International Conference
on Computing, Communication, and Intelligent Systems (ICCCIS), Greater Noida, India,
pp. 41–47 (2021). https://doi.org/10.1109/ICCCIS51004.2021.9397237
14. Mitchell, E., Lee, Y., Khazatsky, A., Manning, C.D., Finn, C.: DetectGPT: Zero-Shot
Machine-Generated Text Detection using Probability Curvature. arXiv, arXiv:2301.11305
[cs.CL]
Developing an Efficient Toxic Comment
Detector Using Machine Learning Techniques
Abstract. Social media has changed the way people communicate, but it has also
become a breeding ground for dangerous content. Natural Language Processing
(NLP) is used in this study to classify unstructured data into dangerous and benign
categories, providing insights about internet toxicity. The NLP approach used in
the study sheds light on the challenges and opportunities of toxicity identification.
The researchers uncovered patterns and trends indicative of dangerous content by
analysing massive amounts of text data, allowing them to construct powerful classi-
fication systems. The paper discusses the advantages and disadvantages of toxicity
detection. Automated systems can swiftly scan enormous amounts of content, but
they may misclassify some material, thereby leading to censorship or harassment.
The findings on online toxicity detection provide valuable guidance for stakeholders seeking
to address this issue. By understanding the strengths and limitations of NLP-based
approaches, informed decisions can be made about implementing effective toxicity
detection strategies, ensuring a safer and more inclusive digital environment.
1 Introduction
In today's world, the internet gives access to a more advanced information environment. However, it also has many disadvantages, such as the bad and harmful remarks made by some internet users. Most harmful comments are spread via social media. An AI model can be used to detect and identify harmful comments on such platforms [1]. To determine the percentage of toxicity in a comment, the toxic com-
ment detector employs algorithms for data cleaning and pre-processing. Text messages
containing threatening, derogatory, profane, racist, and other such remarks are toxic. Toxic remarks are detected without the need for human involvement using a variety of methods. Even though
negative behavior on social media is now the norm, it is still not acceptable. Toxic social
behavior is the spread of unjustified animosity or negativity that has a negative effect
on people who are exposed to it. Bullies online try to incite hatred and control other
people's interactions. For instance, studies of toxic behavior in online team competition games found a link between a match's result and the emergence of harmful behavior.
Toxic remarks can be seen on social media sites like Twitter on issues that are difficult
to discuss [2]. Subjects tend to be controversial, which increases the likelihood of toxic
behavior. Trend analysis can be used to better understand the dynamics of online discus-
sions in addition to aiding in moderation. One step in the content moderation process is identifying offensive comments. To achieve this, the term “toxicity” is defined and its subtypes are specified. Additionally, multiple deep learning methods, datasets, and architectures designed specifically for sentiment analysis in online chats are presented. Compa-
nies have started reporting remarks and barring individuals who are found guilty of
using foul language to prevent users from being exposed to inappropriate language on
internet forums or social media platforms [3]. To filter out foul language and shield
internet users from experiencing online harassment and cyberbullying, several Machine
Learning models have been created and put into use [4].
2 Related Work
In recent years, studies have been conducted on toxic comment detection. Most of these
studies employ machine learning techniques such as Natural Language Processing and Deep Learning. One of the early studies proposed a framework that uses logistic regression for binary classification of comments as toxic or non-toxic [5]. Neu-
ral network methods have been employed in several research to analyse hate speech;
a few authors used numerous deep learning architectures in large experiments to learn semantic word embeddings for toxic comment recognition. Another study
presented a deep neural network-based sentiment analysis model for YouTube video
comments, which led to 70–80% accuracy [5]. Additionally, the general use of various
neural network methods for comment categorization has been extensively employed in
recently published literature [6]. However, these methods only addressed some of the
task's issues, leaving others unresolved [7], and further unsupervised techniques and approaches remain to be explored. In the studied literature, efficient and successful algorithms for identifying and filtering these comments were developed [8]. They used models like Support
Vector Machines (SVM) and Naive Bayes (NB), as well as more recent deep learning
models like Convolutional Neural Networks (CNN) and Recurrent Neural Networks
(RNN) [9, 10]. In the study of ML methods for toxic comment classification, the author
used machine learning techniques to conduct a thorough evaluation of the state-of-the-
art in the classification of toxic comments [11]. From 31 carefully chosen, pertinent
primary studies, they extracted data [12]. They investigated the following aspects of the
primary study analysis: the data set utilised, the evaluation metric, the machine learning
techniques used, the toxicity classes, and the comment language [13].
3 Data Pre-processing
There are several techniques for pre-processing such as tokenization, stemming, stop-
word removal and text normalization. Various studies have highlighted the importance of data pre-processing. For instance, in one study the authors performed tokenization, stop-word removal, and stemming on the dataset to enhance the performance of their model, and applied text normalization and stemming techniques to preprocess their data [9].
3.2 Stemming
The stemming process involves removing suffixes from words to obtain their base form.
For example, the stem of the word “walking” would be “walk”, and the stem of the word
“played” would be “play”. This is done using algorithms that analyze the structure of
words and apply rules to strip off suffixes.
3.5 Lemmatization
The same word in different inflected forms is grouped together by the process of lemmatization. It is mostly used in fields like computational linguistics, natural language processing (NLP) and chatbots.
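A small sketch contrasting the two operations with NLTK is shown below; the example words are illustrative, and NLTK is only one of several libraries that could be used.

```python
import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer

nltk.download("wordnet", quiet=True)   # lookup data needed by the lemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

words = ["walking", "played", "studies"]
print([stemmer.stem(w) for w in words])                   # crude suffix stripping: walk, play, studi
print([lemmatizer.lemmatize(w, pos="v") for w in words])  # dictionary lemmas: walk, play, study
```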
3.8 TF-IDF
The acronym tf-idf, which stands for term frequency-inverse document frequency (also written as TF*IDF, TFIDF, TF-IDF, or Tf-idf), is a metric that quantifies a word's significance to a document within a corpus or collection, accounting for the fact that certain terms are used more frequently than others overall. It is commonly used as a weighting factor in searches involving text mining, user modelling, and information retrieval. A 2017
survey found that 83% of text-based recommender systems in digital libraries employed
tf-idf [11]. Versions of the tf-idf weighting method are widely used by search engines as
a vital tool for scoring and ranking a document's relevance to a user query. One of the simplest ranking functions is computed by summing the tf-idf of each query term, as sketched below; several more intricate ranking functions exist.
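As a rough illustration of this simplest ranking function, the sketch below sums the tf-idf weights of the query terms for each document using scikit-learn; the toy documents and query are assumptions for demonstration only.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["the match result was great",
        "toxic remarks ruin online games",
        "players spread negativity after losing a match"]

vec = TfidfVectorizer()
doc_vectors = vec.fit_transform(docs)                  # rows: documents, columns: terms

query_terms = ["toxic", "match"]
cols = [vec.vocabulary_[t] for t in query_terms if t in vec.vocabulary_]
scores = np.asarray(doc_vectors[:, cols].sum(axis=1)).ravel()   # summed tf-idf per document
print(scores.argsort()[::-1])                           # document indices, best match first
```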
4 Model Used
Several machine learning models have been employed in previous studies for toxic com-
ment detection, including logistic regression, SVM, Naïve Bayes, Random Forest, and
deep learning models like Long Short-Term Memory (LSTM) networks and Convolu-
tional Neural Networks (CNNs). In a literature review the authors used a Random Forest
model for toxic comment detection [12]. Similarly, a recent research paper [13] used
an LSTM network to achieve improved performance compared to other models. Most
of the chosen primary research has classified harmful remarks from the datasets given
in the previous section of this study using more than one machine learning technique.
The table below lists the number of primary studies in which a certain machine learning
technique was applied. As per the literature review, RNN, LSTM, and logistic regression are among the most frequently applied techniques; a model combining two LSTM layers and four convolution layers achieved the best accuracy score of 0.9645 [14]. Deep neural networks are the most popular and efficient approaches; however, simpler and faster methods like logistic regression were often used as baselines (Table 1).
Table 1. List of machine learning methods applicable to the toxic comment detector.
Logistic Regression (LR): It is a linear model that works well for tasks involving binary classification. A logistic function is employed to model the likelihood of the target class. To get the probability of the positive class, the input features are multiplied by weights, and the resulting sum is then run through the logistic function.
A bidirectional LSTM, or BiLSTM, is a sequence model that has two LSTM layers, one
for processing input in the forward direction and the other for processing in the backward
direction. It is usually used in conjunction with NLP-related activities. The fundamental
idea behind this approach is that the model can better understand the relationship between
sequences (e.g., by identifying the words that come before and after another in a sentence)
if it processes data in both directions.
Long Short-Term Memory (LSTM): In order to capture long-range dependencies in the input sequence, this particular type
of Recurrent Neural Network (RNN) was created. It has a memory component with a
long information storage capacity. Input mechanisms are used to add and remove data
from a cell in a selective manner.
BERT: A sizable corpus of text data served as the model's pre-training set for deep learning. It can be fine-tuned for a particular purpose, such as the detection of harmful comments. The transformer architecture used by BERT allows it to extract long-range dependencies from the input sequence.
Naïve Bayes (NB): The Naïve Bayes classifier is a supervised machine learning algorithm used for classification tasks such as text classification. It aims to model the distribution of inputs within a particular class or category, since it is a member of the generative learning algorithm family.
The popular machine learning method known as random forest, which combines the out-
put of multiple decision trees to generate a single result, was developed by Leo Breiman
and Adele Cutler. Because it can solve problems with regression and classification, its
adaptability and usability are what drive its widespread use.
Decision Tree (DT): This model is easy to use and effective for classification tasks. It builds a model of
decisions and their potential outcomes that resembles a tree. Every leaf node in the tree
represents a class label, and every internal node in the tree represents a decision based
on a feature.
K-Nearest Neighbours (KNN): This model is non-parametric and applicable to tasks involving classification. A new data
point is classified according to the training data’s k nearest neighbours’ class. Although
KNN is easy to use and straightforward, it can be computationally costly for large
datasets.
5 Problem Statement
Although many internet comments found in public spaces are positive, a large fraction is toxic in nature. Online datasets are treated to reduce noise after being downloaded. The raw comments must be transformed before being fed to the classification models, since the comments contain many errors that inflate the feature space. The processed dataset is trained using the logistic regression technique to distinguish between toxic and non-toxic comments. To address this problem, the problem statement is to develop a multi-headed model that can recognise diverse types of toxicity, such as threats, obscenities, insults, and hate speech motivated by a person's identity. The objective of an AI
toxic comment detector is to automatically identify and flag potentially harmful or offen-
sive language in written text. The purpose of this technology is to help individuals and
organizations prevent the spread of toxic comments and hate speech online, promote
a safe and respectful environment for discussion and communication, and ultimately
reduce the harm caused by online harassment, cyberbullying, and other forms of virtual
abuse. The AI toxic comment detector achieves this objective by analyzing text using
natural language processing (NLP) techniques and machine learning (ML) algorithms
to identify patterns and characteristics associated with toxic comments (Fig. 1).
Fig. 1. Workflow
6 Dataset
Data for the Toxic Comment Detector project was collected from various sources, includ-
ing a Google Forms survey, Kaggle data, and different social media platforms like
LinkedIn and Instagram. The purpose of the survey was to collect user-generated con-
tent, especially comments that were deemed toxic or offensive. Participants were asked
to submit comments they saw or received that they felt were inappropriate or harmful.
One important consideration when using this dataset is the potential bias of the data.
For example, survey respondents may not be representative of the general population
and may have specific views or opinions. Similarly, social media data can be biased
against certain populations or communities. Therefore, it is important to carefully con-
sider potential biases when using this data set and take steps to correct them. The research
team then manually reviewed the comments to ensure they met the criteria for inclusion
in the dataset. The Kaggle dataset was collected for the same purpose to detect toxic com-
ments. The dataset contains more than 150,000 comments from various sources, some of
them were news articles, social media platforms and discussion forums. Comments were
flagged by human authors to indicate whether they were toxic or not. Finally, data was
collected from social media, including LinkedIn and Instagram. The data was obtained
by scraping public comments and messages that contained toxic or offensive language.
Overall, the Toxic Comment Detector Project dataset is a comprehensive collection of
comments that have been identified as potentially harmful or inappropriate. It contains information from various sources and has been manually reviewed to ensure accuracy. The
dataset is used to train and test the machine learning (ML) models of the project (Fig. 2).
6.1 Description
This toxic comment detection project involved several steps in building this machine
learning model that allowed online platforms to accurately identify and classify toxic
comments. The methodology started with data processing, which involved cleaning the
data and removing unnecessary data to ensure its quality and enable effective analysis. We then performed data visualization to gain insight into the structure of our data and identify potential issues that needed to be addressed. A variety of charts and graphs, including histograms, bar charts, and scatter charts, were created to visualize the distribution of the data. This step was useful to identify possible imbalances in the data, such as an unequal distribution of toxic and non-toxic comments. The next step in our project was text preprocessing, where the text data was converted into a format that could be analyzed by machine learning models. Several text preprocessing techniques, including tokenization, stemming, and stop-word removal, were performed. Tokenization involved dividing the text data into individual words, while stemming reduced each word to its basic form to reduce the dimensionality of the data. Stop word removal involved removing common
words that did not affect the meaning of the text. To better understand the most frequently
used words in data, a word cloud was created. The word cloud provided a visual repre-
sentation of the most frequent words in the text data and helped identify the most visible
topics and themes in the comments. To ensure that our machine learning models were
not biased towards one class, we balanced the target column of the dataset. This included ensuring that the data contained the same quantity of negative and positive remarks. We split the data into training and test sets after data balancing. The test set
was used to assess the performance of machine learning models that had been trained
using the training set. Support Vector Machines (SVM), K-Nearest Neighbours (KNN),
Random Forest, Multinomial Naive Bayes (MultinomialNB), Logistic Regression, and
Bernoulli Naive Bayes (BernoulliNB) were among the machine learning algorithms we
employed. These models were selected based on how well they could complete tasks
involving text classification. To assess the effectiveness of machine learning models, we
used a number of evaluation metrics, such as F1 score, precision, and recall. The F1 score is the harmonic mean of precision and recall, recall measures the ratio of true positives to actual positives, and precision measures the ratio of true positives to predicted positives. These metrics provided an accurate measure of the model's effectiveness in pre-
dicting whether a comment was toxic or non-toxic. After evaluating the performance of
the machine learning models, we tested them on new data to ensure their generalizability
and use in real applications. Overall, our toxic comment detection project involved sev-
eral steps to prepare and analyze text data, build machine learning models and evaluate
their performance. The project was important because it helped identify and address
toxic comments on online platforms that could have a negative impact on users and
communities. The use of various machine learning algorithms and evaluation metrics
ensured that our models were accurate and reliable and could be used in real-world
applications to detect and treat toxic comments.
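A condensed sketch of this pipeline is given below; the placeholder comments, TF-IDF settings, and model hyperparameters are assumptions, and LinearSVC stands in for the SVM used in the project.

```python
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import MultinomialNB, BernoulliNB
from sklearn.svm import LinearSVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import f1_score

# Placeholder comments and labels (1 = toxic, 0 = non-toxic); the project data is described above.
comments = ["you are awful and stupid", "great point, thanks for sharing"] * 50
labels = [1, 0] * 50

X_train, X_test, y_train, y_test = train_test_split(comments, labels,
                                                    test_size=0.2, random_state=42)
vec = TfidfVectorizer(max_features=20_000)
Xtr, Xte = vec.fit_transform(X_train), vec.transform(X_test)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=200),
    "MultinomialNB": MultinomialNB(),
    "BernoulliNB": BernoulliNB(),
    "Linear SVM": LinearSVC(),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}
for name, model in models.items():
    model.fit(Xtr, y_train)
    print(name, "F1 =", round(f1_score(y_test, model.predict(Xte)), 3))
```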
The key evaluation metrics that can be used for assessing the performance of the given model are provided and discussed in this section.
The accuracy metric gives an overall assessment of the correctness of the model by calculating the proportion of correctly classified instances out of the total instances. The formula for accuracy is given below:
Accuracy = (TP + TN) / (TP + TN + FP + FN)    (1)
6.4 Precision
Precision reflects the model's ability to avoid false positives. It is the proportion of true positives out of all the positive predictions. The formula is given below:

Precision = TP / (TP + FP)
6.7 F1 Score
The F1 score is the harmonic mean of precision and recall: F1 = 2 · (Precision · Recall) / (Precision + Recall).
7 Results
In this study, we assessed how well six different machine learning models perform in identifying harmful remarks. Support Vector Machine (SVM), Random Forest (RF), Bernoulli Naive Bayes (BNB), K-Nearest Neighbours (KNN), Logistic Regression (LR), and Multinomial Naive Bayes (MNB) were among the models that were examined. To compare the performance of each model, F1 scores were computed. Based
on the comparison of F1 scores, it has been discovered that Random Forest and Logistic
Regression both outperformed other models at identifying harmful remarks. These two
models could therefore be thought of as strong candidates for identifying toxic comments. It is crucial to remember that when choosing a model for a certain use
case, additional aspects such as model interpretability, computational efficiency, and
scalability may need to be considered. A computationally effective and comparatively
simple linear model is logistic regression. It is a well-liked option for classification
assignments and has a long track record of accomplishment. On the other side, Random
Forest (RF) is an ensemble model that combines various decision trees to enhance
performance. High-dimensional feature spaces and complex data are two things it is
known for handling (Table 2).
After training, the machine learning models were analysed to determine whether they could correctly identify new text as harmful or not. We used test data for this that was not used during training. Various text inputs, some of which were harmful and others of which were not, made up the test data. The trained models were then used to predict the toxicity of each input, and we derived accuracy, precision, recall, and F1 scores to assess their performance. These measurements allowed us to assess how well the models were able to classify the test data into toxic and non-toxic comments (Fig. 3).
From the assessment outcomes, it was discovered that the models did a good job of classifying the experimental data into toxic and non-toxic comments. The
models’ accuracy, precision, recall, and F1 scores were all within acceptable bounds,
suggesting that they might be used for the detection of harmful remarks in fresh text
streams. Overall, evaluating the models’ effectiveness and ensuring their utility in practi-
cal applications required testing them against experimental data. The test phase findings
allowed us to assess the models' efficacy and dependability in identifying toxic remarks, which in turn allowed us to select the model that best meets our needs. In conclusion, our research indicates that the Logistic Regression and Random Forest models are equally promising for identifying toxic comments. However, the final
model selection should be based on a thorough evaluation of various factors, including
interpretability, computational efficiency, and scalability.
8 Conclusion
This paper discussed many methods for classifying toxic comments. It was demonstrated that the techniques have distinct error profiles and may be put together to create
an ensemble with a better F1-measure. The ensemble performs best in classes with few
instances and when there is a large level of volatility within the data. Shallow learners
and deep neural networks are some particularly potent combos. Our error analysis of the
ensemble findings revealed challenging subtasks for the categorization of harmful comments. Inconsistent and poor label quality is a significant contributor to mistakes. Many problems remain unsolved because there is a lack of training data containing uncommon or highly distinctive words. Further study is recommended into using embeddings to express knowledge about the world and make it easier to distinguish across paradigmatic situations.
References
1. Han, X., Tsvetkov, Y.: Fortifying toxic speech detectors against veiled toxicity. In: EMNLP
2020–2020 Proceedings of the Conference on Empirical Methods in Natural Language
Processing Conference (EMNLP), pp. 7732–7739 (2020). https://doi.org/10.18653/v1/2020.
emnlp-main.622
2. Karan, M., Šnajder, J.: Preemptive Toxic Language Detection in Wikipedia Comments Using
Thread-Level Context, pp. 129–134, September 2019. https://doi.org/10.18653/V1/W19-
3514
3. David, F., Guimarães, N., Figueira, Á.: A WebApp for reliability detection in social media.
Procedia Comput. Sci. 219, 228–235 (2023). https://doi.org/10.1016/j.procs.2023.01.285
4. Cherradi, B., Rachidi, R., Ouassil, M.A., Errami, M., Hamida, S., Silkan, H.: Classifying
toxicity in the Arabic Moroccan dialect on Instagram: a machine and deep learning approach.
Artic. Indones. J. Electr. Eng. Comput. Sci. 31(1), 588–598 (2023). https://doi.org/10.11591/
ijeecs.v31.i1.pp588-598
5. Obadimu, A., Mead, E., Hussain, M.N., Agarwal, N.: Identifying toxicity within YouTube
video comment. In: Thomson, R., Bisgin, H., Dancy, C., Hyder, A. (eds.) Social, Cultural, and
Behavioral Modeling. SBP-BRiMS 2019. LNCS, vol. 11549, pp. 214–223. Springer, Cham
(2019). https://doi.org/10.1007/978-3-030-21741-9_22/COVER
6. Zhai, B., Chen, J.: Development of a stacked ensemble model for forecasting and analyzing
daily average PM2.5 concentrations in Beijing, China. Sci. Total. Environ. 635, 644–658
(2018). https://doi.org/10.1016/J.SCITOTENV.2018.04.040
7. Zhang, Y., Liu, B., Cai, J., Zhang, S.: Ensemble weighted extreme learning machine for
imbalanced data classification based on differential evolution. Neural Comput. Appl. 28, 259–267 (2017). https://doi.org/10.1007/S00521-016-2342-4
8. Storn, R., Price, K.: Differential evolution - a simple and efficient heuristic for global opti-
mization over continuous spaces. J. Glob. Optim. 11(4), 341–359 (1997). https://doi.
org/10.1023/A:1008202821328
9. Anand, M., Eswari, R.: Classification of abusive comments in social media using deep learn-
ing. In: Proceedings of the 3rd International Conference on Computing Methodologies and
Communication (ICCMC 2019), pp. 974–977, March 2019. https://doi.org/10.1109/ICCMC.
2019.8819734
10. Diab, D.M., El Hindi, K.M.: Using differential evolution for fine tuning naïve Bayesian
classifiers and its application for text classification. Appl. Soft Comput. J. 54, 183–199 (2017).
https://doi.org/10.1016/J.ASOC.2016.12.043
11. Koutsoukas, A., Monaghan, K.J., Li, X., Huan, J.: Deep-learning: investigating deep neural
networks hyper-parameters and comparison of performance to shallow methods for modeling
bioactivity data. J. Cheminform. 9(1) (2017). https://doi.org/10.1186/S13321-017-0226-Y
12. Rupapara, V., Rustam, F., Shahzad, H.F., Mehmood, A., Ashraf, I., Choi, G.S.: Impact of
SMOTE on Imbalanced Text Features for Toxic Comments Classification Using RVVC
Model. https://doi.org/10.1109/ACCESS.2021.3083638
13. Carta, S., Corriga, A., Mulas, R., Recupero, D.R., Saia, R.: A Supervised Multi-class Multi-
label Word Embeddings Approach for Toxic Comment Classification. https://doi.org/10.5220/
0008110901050112
Handwritten English Alphabets Recognition
System
1 Introduction
The Handwritten English Alphabets Recognition System represents a notable example
of the positive impact of machine learning advancements across various sectors. This
state-of-the-art system leverages machine learning methodologies to effectively identify
and interpret handwritten English alphabet characters, a task known for its complexity.
These technological innovations serve as a vital link between the traditional realm of
pen-and-paper and the digital domain, offering invaluable solutions for tasks such as
document digitization, process automation, and efficient data handling. In an era marked
by an increasing dependence on digital information, such technologies play a pivotal
role in enhancing productivity and streamlining operations.
The primary objective of the Handwritten English Alphabets Recognition System is
to autonomously identify and interpret handwritten alphabet characters, thereby elimi-
nating the requirement for manual transcription. This renders it an indispensable asset
across diverse sectors, including education, finance, healthcare, and logistics, where its
impact extends beyond enhancing operational efficiency to mitigating human errors [1].
The Handwritten English Alphabets Recognition System encompasses a diverse
array of machine learning techniques, notably featuring Convolutional Neural Networks
(CNNs) and Recurrent Neural Networks (RNNs) [2]. These deep learning models have
been incorporated due to their extensive training on sizable datasets comprising various
handwritten alphabet samples. As a result of this training, these models possess the capacity to generalize across diverse writing styles, sizes, and orientations.
2 Literature Review
The utilization of machine learning techniques for the Handwritten English Alphabets
Recognition System has sparked significant interest across various research domains and
has led to the development of practical applications, as evidenced by the existing body of
research. Among the various approaches, deep learning has emerged as a pivotal method,
contributing to enhanced accuracy and versatility in the recognition of handwritten letters
[4].
Deep learning models have significantly elevated the precision and versatility of
alphabet recognition. These models excel in effectively discerning intricate handwriting
patterns and variations, with numerous studies consistently showcasing their capacity to
generalize across diverse writing styles, sizes, and orientations (Fig. 1).
The content further underscores the diverse and valuable applications of the Hand-
written English Alphabets Recognition System [5]. These technologies enable the auto-
mated grading of handwritten assignments and assessments in educational contexts,
thereby alleviating the burden on educators and ensuring rapid and precise evaluation.
Additionally, these advancements play a crucial role in the extraction of handwritten
text from various sources, including forms, historical manuscripts, and other paper-
based documents, within the document digitization domain [6]. This not only safeguards
cultural heritage but also fosters research endeavors in a highly efficient manner.
The research findings additionally underscore the practical applicability of this sys-
tem within the corporate sector, specifically in domains such as finance and logistics.
It accelerates the pace of data input processes and enhances the overall efficiency of
information management systems in these industries. Moreover, it plays a vital role in
the healthcare sector by facilitating the conversion of patient records into digital formats,
effectively reducing errors stemming from manual data entry and elevating the quality
of healthcare provision [7].
In the realm of published research related to this subject, valuable insights are offered
into ongoing research endeavors aimed at pushing the boundaries of handwritten alphabet
recognition. To further enhance the system’s performance and resilience, researchers are
actively exploring innovative strategies, including but not limited to Transfer Learning,
Ensemble Models, and Attention Mechanisms. These endeavors represent a significant
avenue for advancing the capabilities of the Handwritten English Alphabets Recognition
System.
In the study authored by K. Gaurav, P. K. Bhatia, and their colleagues [8], an inves-
tigation is conducted into a diverse array of pre-processing techniques employed for
character recognition, encompassing a wide range of image types. These images include
not only straightforward handwritten form-based documents but also those featuring intri-
cate attributes like colored backgrounds and complex gradients of intensity. The research
extensively explores various preprocessing methodologies, comprising skew detection
and correction, contrast stretching, binarization, noise reduction, normalization, and
segmentation, along with morphological processing.
One significant revelation from the study is that a single preprocessing method alone
cannot comprehensively address the complexities of image enhancement. Consequently,
the research underscores the necessity of employing a combination of these aforemen-
tioned strategies to optimize pre-processing procedures. Nonetheless, the study acknowl-
edges that despite the application of these multifaceted techniques, achieving complete
accuracy in a pre-processing system remains a challenging endeavor, emphasizing the
intricacies involved in this crucial aspect of character recognition.
Salvador España-Boquera and colleagues proposed an innovative hybrid model in
their research [9] for the identification of unbounded offline handwritten texts. In their
approach, they leveraged a Multilayer Perceptron to estimate emission probabilities,
while Markov chains were harnessed to simulate the structural aspects within the optical
model.
The study also entailed the implementation of diverse techniques aimed at standard-
izing the dimensions of text images and mitigating variations in slope and inclination
within handwritten text. These normalization processes were executed using supervised
vast collections of handwritten samples. This evolution has substantially elevated recog-
nition accuracy levels. This breakthrough fundamentally reshaped our comprehension
of handwritten alphabets and their automated recognition capabilities.
During the late 2000s, there was a notable surge in the implementation of machine
learning methodologies in practical settings. This era witnessed the introduction of auto-
mated systems designed for the recognition of handwritten alphabets, particularly in
educational contexts. These systems played a pivotal role in streamlining grading pro-
cesses, offering a more efficient means of assessment, and alleviating the workload of
educators.
Ongoing research efforts continue to enhance these systems through techniques such as Transfer Learning and
Attention Mechanisms. The evolution of the Handwritten English Alphabets Recognition
System serves as a testament to the perpetual quest for efficiency and precision within
the realm of machine learning. This development underscores the escalating significance
of this field within our progressively technology driven society [16].
3 Proposed Methodology
The suggested methodology for the Handwritten English Alphabets Recognition System
employing Machine Learning Approaches adopts a thorough and structured strategy for
addressing the complex task of identifying handwritten alphabet characters. In order to
attain exceptional accuracy and adaptability, this approach harnesses the potential of
machine learning techniques, with a specific focus on Convolutional Neural Networks
(CNNs) and Recurrent Neural Networks (RNNs).
The initial phase of the methodology revolves around the meticulous gathering and
organization of data. An extensive dataset of handwritten English alphabets is meticu-
lously curated, encompassing a wide spectrum of writing sizes, orientations, and styles.
To ensure the optimal performance of the model, a suite of data processing techniques
is applied, including image expansion, standardization, and data augmentation. This
critical phase lays the foundation for the model’s ability to effectively generalize its
recognition capabilities across a diverse array of handwriting patterns [17].
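A minimal sketch of such augmentation with Keras preprocessing layers is shown below; the 28x28 grayscale format and the augmentation ranges are assumptions, since the paper does not specify the image size or the exact transformations.

```python
import numpy as np
import tensorflow as tf

# Placeholder batch of 28x28 grayscale alphabet images already scaled to [0, 1].
images = np.random.rand(32, 28, 28, 1).astype("float32")

# Small random rotations, shifts and zooms approximate variation in writing
# size, slant and orientation; the ranges below are illustrative.
augment = tf.keras.Sequential([
    tf.keras.layers.RandomRotation(0.05),          # roughly +/- 18 degrees
    tf.keras.layers.RandomTranslation(0.1, 0.1),
    tf.keras.layers.RandomZoom(0.1),
])
augmented = augment(images, training=True)
print(augmented.shape)   # (32, 28, 28, 1)
```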
Subsequent to the data preparation stage, the methodology proceeds with the selec-
tion of an appropriate model and the design of its architecture. In this context, Convo-
lutional Neural Networks (CNNs) are chosen for their proficiency in capturing spatial
hierarchies within image data, thus serving as the foundation for feature extraction. Con-
currently, Recurrent Neural Networks (RNNs) are introduced into the architecture to
account for temporal dependencies within the strokes and the sequential nature of hand-
written text. The fusion of these two neural network paradigms significantly enhances the
system’s precision in recognizing alphabetic characters, elevating its overall performance
and adaptability.
In the modeling phase, the preprocessed dataset is fed into the hybrid CNN-RNN
model. This allows the model to acquire the capability to discern diverse handwriting
styles by adjusting its internal parameters throughout the training process. To ensure
convergence and prevent overfitting, the training phase is iterative, with continuous
evaluation of the model’s performance (Fig. 2).
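A minimal Keras sketch of such a hybrid CNN-RNN is given below; the layer sizes, the 28x28 input, and the way the convolutional feature map is reshaped into a sequence for the bidirectional LSTM are illustrative assumptions rather than the authors' exact architecture.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Convolutional blocks extract spatial features; the feature map is then read as a
# short sequence by a bidirectional LSTM, and a softmax layer predicts one of the
# 26 alphabet classes. All sizes are illustrative.
inputs = layers.Input(shape=(28, 28, 1))
x = layers.Conv2D(32, 3, activation="relu", padding="same")(inputs)
x = layers.MaxPooling2D()(x)                       # -> (14, 14, 32)
x = layers.Conv2D(64, 3, activation="relu", padding="same")(x)
x = layers.MaxPooling2D()(x)                       # -> (7, 7, 64)
x = layers.Reshape((7, 7 * 64))(x)                 # treat image rows as a 7-step sequence
x = layers.Bidirectional(layers.LSTM(64))(x)       # sequential/temporal modelling
outputs = layers.Dense(26, activation="softmax")(x)

model = models.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()
```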
4 Proposed Approach
The suggested method for developing the Handwritten English Alphabets Recognition
System utilizing Machine Learning Approaches presents an innovative and methodical
approach to address the intricate task of recognizing handwritten alphabet characters.
The primary objective is to attain exceptional accuracy, adaptability, and versatility in
alphabet recognition through the utilization of machine learning techniques, specifically
harnessing the capabilities of Convolutional Neural Networks (CNNs) and Recurrent
Neural Networks (RNNs) (Fig. 3).
1. Data Collection and Preprocessing: The methodology begins with the assem-
bly of an extensive and diverse dataset comprising handwritten English alphabets.
This dataset encompasses a wide spectrum of writing styles, sizes, and orientations,
ensuring the model’s exposure to a rich variety of handwriting patterns. To elevate
the dataset’s quality and consistency, several preprocessing methods are employed,
including image scaling, normalization, and data augmentation. These crucial steps
are essential to facilitate the model’s effective generalization across a multitude of
diverse handwriting samples.
2. Model Architecture: Our approach employs a hybrid CNN-RNN architecture, com-
bining the strengths of these two distinct types of neural networks. CNNs excel in
capturing spatial hierarchies within images, a critical aspect for identifying unique
characteristics in handwritten alphabets, making them the ideal choice for feature
extraction. Conversely, RNNs are adept at considering the sequential structure of
characters and the temporal dependencies of strokes in handwritten text. This fusion
of CNNs and RNNs [18] enhances the system’s capability to accurately recognize
alphabets, even in the presence of varying writing styles, underscoring its robustness
and adaptability.
3. Training and Validation: In the training phase, the preprocessed dataset is fed into
the CNN-RNN hybrid model. This enables the model to continually adapt its internal
parameters to grasp and accommodate intricate handwriting patterns. To ensure the
model converges effectively and avoids overfitting, a continuous validation process is
employed, where performance metrics such as accuracy and loss are closely tracked.
Through this iterative and recurrent training process, the model attains a heightened
level of proficiency in the accurate recognition of handwritten alphabets (Fig. 4).
4. Testing and Deployment: Once the algorithm has undergone training and valida-
tion, it becomes operational for real-world applications. This algorithm processes samples of handwritten alphabets and subsequently generates predictions for the corresponding alphabet characters.
Fig. 5. Result
References
1. Ding, K., Liu, Z., Jin, L., Zhu, X.: A comparative study of GABOR feature and gradient fea-
ture for handwritten Chinese character recognition. In: International Conference on Wavelet
Analysis and Pattern Recognition, p. 11821186, Beijing, China, 2–4 November 2007
2. Charles, P.K., Harish, V., Swathi, M., Deepthi, C.H.: A review on the various techniques used
for optical character recognition. Int. J. Eng. Res. Appl. 2(1), 659–662 (2012)
3. Bahlmann, C., Haasdonk, B., Burkhardt, H.: Online handwriting recognition with support
vector machines-a kernel approach. In: IEEE Proceedings Eighth International Workshop on
Frontiers in Handwriting Recognition, pp. 49–54 (2002)
4. Neetu, B.: Optical character recognition techniques. Int. J.Adv. Res. Comput. Sci. Softw. Eng.
4(5) (2014)
5. Pradeep, J., Srinivasan, E., Himavathi, S.: Diagonal based feature extraction for handwritten
character recognition system using neural network. In: 3rd IEEE International Conference on
Electronics Computer Technology, vol. 4, pp. 364–368 (2011)
6. Navneet, D., Triggs, B.: Histograms of oriented gradients for human detection. In: Proceedings
of the CVPR2005 IEEE Computer Society Conference on Computer Vision and Pattern
Recognition, vol. 1. San Diego, CA, USA, 20–25 June 2005
7. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv (2014)
8. Bajaj, R., Dey, L., Chaudhury, S.: Devnagari numeral recognition by combining decision of
multiple connectionist classifiers. Sadhana, part. 1, 27, 59–72 (2002)
9. Lorigo, L.M., Govindaraju, V.: Offline Arabic handwriting recognition: a survey. IEEE Trans.
Pattern Anal. Mach. Intell. 28(5) (2006)
10. Kumar, G., Bhatia, P.K., Banger, I.: Analytical review of preprocessing techniques for offline
handwritten character recognition. In: 2nd International Conference on Emerging Trends in
Engineering & Management, ICETEM (2013)
11. Espana-Boquera, S., Castro-Bleda, M.J., Gorbe-Moya, J., Zamora-Martinez, F.: Improving
offline handwritten text recognition with hybrid HMM/ANN models. IEEE Trans. Pattern Anal. Mach. Intell. (2011)
12. Brakensiek, A., Rottland, J., Kosmala, A., Rigoll, G.: Offline handwriting recognition using
various hybrid modeling techniques & character N-Grams. http://irs.ub.rug.nl/dbi/4357a8469
5495
13. Kumar, G., Kumar, S.: CNN based handwritten Devanagari digits recognition. Int. J. Comput.
Sci. Eng. 5, 71–74 (2017)
14. Arora, S.: Combining multiple feature extraction techniques for handwritten Devanagari char-
acter recognition. In: IEEE Region 10 Colloquium and the Third ICIIS, Kharagpur, INDIA
(2008)
15. Singh, D., Khan, M. A., Bansal, A., Bansal, N.: An application of SVM in character recognition
with chain code. In: Communication, Control and Intelligent Systems (CCIS), pp. 167–171
(2015)
16. Som, T., Saha, S.: Handwritten character recognition using fuzzy membership function. Int.
J. Emerg. Technol. Sci. Eng. 5(2), 11–15 (2011)
17. Hanmandlu, M., Murthy, O.R.: Fuzzy model based recognition of handwritten numerals.
Pattern Recog. 40, 1840–1854 (2007)
18. Patnaik, S.S., Panda, A.K.: Particle swarm optimization and bacterial foraging optimization techniques for optimal current harmonic mitigation by employing active power filter. Appl. Comput. Intell. Soft Comput. 2012, 897127 (2012)
19. Jawad, H., Olivier, P., Jinchang, R., Jianmin, J.: Performance of hidden Markov model and
dynamic Bayesian network classifiers on handwritten Arabic word recognition. Knowl.-Based
Syst. 24, 680–688 (2011)
20. Plamondon, R., Srihari, S.: Online and off-line handwriting recognition: a comprehensive
survey. IEEE Trans. Pattern Anal. Mach. Intell. 22, 63–68 (2000)
Stock Price Prediction Using Time Series
Rahul Maurya(B) , Dashniet Kaur , Ajay Pal Singh , and Shashi Ranjan
1 Introduction
2 Related Work
The following actions are suggested to accomplish the goal: The data must be supplied
into the system first, after which it must be cleaned by making sure that any outliers
are removed, a filtration procedure is carried out, and the data’s null values are rejected.
Every month, information is taken from Yahoo! Finance. Then, the Augmented Dickey-Fuller (ADF) test is used to determine whether the data shows stationarity after being transformed into time series objects (a brief sketch is given below). Then the time series object should be decomposed to observe the various seasonality variables and patterns in order to acquire correct results, as the outcomes depend on many factors.
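A brief sketch of the stationarity check with statsmodels' adfuller is shown here; a synthetic random-walk series stands in for the monthly Apple prices, since downloading from Yahoo! Finance is outside the scope of this illustration.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller

# Synthetic random-walk "prices" stand in for the monthly Apple series from Yahoo! Finance.
rng = np.random.default_rng(0)
prices = pd.Series(150 + rng.normal(0, 3, 120).cumsum(),
                   index=pd.date_range("2014-01-01", periods=120, freq="MS"))

# ADF test: a p-value above 0.05 suggests non-stationarity, so the series should be differenced.
for label, series in [("raw series", prices), ("first difference", prices.diff().dropna())]:
    stat, pvalue = adfuller(series)[:2]
    print(f"{label}: ADF statistic = {stat:.3f}, p-value = {pvalue:.3f}")
```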
Subsequently, the time series objects are put into a variety of algorithms, including
neural networks, ARIMA, naive forecasting, exponential smoothing, and seasonal naive
forecasting.
3 Proposed Model
3.1 Arima
The Arima model is also known as the Box-Jenkins model, which was first presented by
George Box and Gwilym Jenkins [2]. By fusing the moving average with auto regression models, the Auto Regressive Integrated Moving Average hybrid model was produced. Its equation is

y′_t = c + φ_1·y′_{t−1} + … + φ_p·y′_{t−p} + θ_1·ε_{t−1} + … + θ_q·ε_{t−q} + ε_t    (1)

where the predictors include lagged values of y_t and lagged forecast errors, and the dependent variable y′_t is the (possibly repeatedly) differenced series.
This model is known as the ARIMA(p, d, q) model, where p and q give the orders of the autoregressive and moving average parts, respectively, and d is the degree of differencing involved [3].
This forecasting method allocates weights to the historical data in such a manner that they decay exponentially with time. The most recent observations receive the largest weights, and the weights shrink as the observations become older.
In naive forecasting, no model is fitted; the forecast is simply the most recent observation, as given by the equation below.

ŷ_{T+h|T} = y_T    (2)

in which y_T is the last observed value and ŷ_{T+h|T} denotes the forecast h steps ahead of time T.
With the exception of the forecast being based on past data from the same season, seasonal naive forecasting is comparable to the naive approach; the following equation is used to produce it.

ŷ_{T+h|T} = y_{T+h−m(k+1)}    (3)

where m is the seasonal period and k is the integer part of (h − 1)/m, so the forecast equals the last observed value from the same season.
This forecasting technique is called NNAR(p, x), where p denotes the number of lagged inputs and x denotes the number of nodes in the hidden layer; it is utilized in nonlinear and complex forecasting scenarios (Fig. 1).
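A compact sketch of fitting these models with statsmodels is shown below; the synthetic monthly price series, the ARIMA order, and the 12-month hold-out are assumptions, and the NNAR model is omitted for brevity.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Synthetic monthly close prices stand in for the Apple series used in Sect. 4.
rng = np.random.default_rng(1)
idx = pd.date_range("2018-01-01", periods=60, freq="MS")
prices = pd.Series(150 + rng.normal(1, 5, 60).cumsum(), index=idx)
train, test = prices[:-12], prices[-12:]           # hold out the last 12 months
h = len(test)

arima_fc = ARIMA(train, order=(1, 1, 1)).fit().forecast(h)           # ARIMA(p, d, q), order assumed
es_fc = ExponentialSmoothing(train, trend="add").fit().forecast(h)   # exponential smoothing
naive_fc = pd.Series(train.iloc[-1], index=test.index)               # naive: repeat last value
snaive_fc = pd.Series(train.iloc[-12:].values, index=test.index)     # seasonal naive, m = 12

for name, fc in [("ARIMA", arima_fc), ("Exp. smoothing", es_fc),
                 ("Naive", naive_fc), ("Seasonal naive", snaive_fc)]:
    print(name, "MAE =", round(float((fc - test).abs().mean()), 2))
```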
Fig. 2. Charting information obtained from Yahoo Finance (the Apple Stock Index)
4.1.6 Dividing the Dataset into Groups for Testing and Training
The data is usually divided into training and testing groups before a model is created.
The model is developed using the training subset, and its performance is assessed using
the test set.
The predictions made by each algorithm are returned after initializing them and feeding
them with Apple stock values.
4.2.1 ARIMA
The ARIMA model’s predicted values may be seen in Fig. 6.
Fig. 6. Displaying stock price forecasts from an ARIMA model
Figure 7 displays the Apple stock price forecasted by the ARIMA model.
Fig. 8. Values estimated using exponential smoothing on stock prices
Figure 9 shows how Apple stock values were projected using exponential smoothing.
Fig. 10. Displays the forecast values generated by the Naive model.
Figure 11 shows the Apple stock values as forecasted by Naive Forecasting.
Fig. 11. Displaying stock price forecasts made via naive forecasting.
Fig. 12. Projected stock price values from seasonal naive prediction
Figure 13 visualizes Seasonal Naive's predictions for Apple stock values.
Fig. 13. Using seasonal naive forecasting to visualize stock price predictions
Fig. 14. Neural networks anticipate values for the stock price
The Google search result for the Apple stock price is displayed below in Fig. 17. The
rankings place exponential smoothing on top: the exponential smoothing result (Hi 80)
closely matches the current price of Apple shares.
Exponential smoothing performed better than the other models in this case when predicting
the stock price, but performance can vary depending on the situation and the data. When
new data arrives, it is advisable to feed it to each of the corresponding models, compare
the outcomes, and then, based on the rankings of the results, choose the most
accurate result. Time series analysis for stock price prediction offers valuable insights
into historical trends and patterns. However, its inherent complexity, influenced by mul-
tifaceted market dynamics, renders precise forecasting challenging. While incorporating
advanced models and machine learning techniques enhances accuracy, unforeseen events
and external factors may still impact outcomes. Investors should approach predictions
with caution, recognizing the inherent uncertainties and potential risks associated with
stock market fluctuations. A comprehensive understanding of both quantitative and qual-
itative factors remains essential for informed decision-making in the dynamic realm of
financial markets. In summary, time series analysis provides valuable historical insight
for stock price prediction and advanced models improve accuracy, yet dynamic market
factors and unforeseen events mean that forecasts should always be treated with caution.
References
1. Huamin, T., Qiuqun, D., Shanzhu, X.: Reconstruction of time series with missing value using
2D representation-based denoising autoencoder. J. Syst. Eng. Electron. 31(6), 1087–1096
(2020). https://doi.org/10.23919/JSEE.2020.000081
2. Ariyo, A.A., Adewumi, A.O., Ayo, C.K.: Stock price prediction using the ARIMA model. In:
2014 UKSim-AMSS 16th International Conference on Computer Modelling and Simulation,
pp. 106–112 (2014). https://doi.org/10.1109/UKSim.2014.67
3. Gupta, A., Kumar, A.: Mid Term daily load forecasting using ARIMA, Wavelet- ARIMA
and machine learning. In: 2020 IEEE International Conference on Environment and Electri-
cal Engineering and 2020 IEEE Industrial and Commercial Power Systems Europe (EEEIC
/ I&CPS Europe), pp. 1–5 (2020). https://doi.org/10.1109/EEEIC/ICPSEurope49358.2020.
9160563
4. International Journal of Innovative Technology and Exploring Engineering (IJITEE) ISSN:
2278–3075, vol. 9(5) (2020). D1869029420/2020©BEIESP, https://doi.org/10.35940/ijitee.
D1869.039520
5. International Journal of Innovative Technology and Exploring Engineering (IJITEE) ISSN:
2278–3075, vol. 8(9S4) (2019). I11400789S419/19©BEIESP, https://doi.org/10.35940/ijitee.
I1140.0789S419
6. Fama, E.F.: Efficient capital markets: II. J. Financ. 46, 1575–1617 (1991)
7. Atsalakis, G.S., Dimitrakakis, E.M, Zopounidis, C.D.: Elliot wave theory and neuro-fuzzy
systems, stock market prediction: the WASP system. Expert Syst. Appl. 38, 9196–9206 (2011)
8. Mitra, S.K.: Optimal combination of trading rules using neural networks. Int. Bus. Res. 2(1),
86–99 (2009)
9. Dai, W., Shao, Y.E., Lu, C.-J.: Incorporating feature selection method into support vector
regression for stock index forecasting. Neural Comput. Appl.
10. Fama, E.F., French, K.R.: International tests of a five-factor asset pricing model. J. Financ.
Econ. 123, 441–463 (2015)
11. Zhang, J., Li, L., Chen, W.: Predicting stock price using two-stage machine learning
techniques. Comput. Econ. 57, 1237–1261 (2020)
12. Zhang, J., Teng, Y.-F., Chen, W.: Support vector regression with modified firefly algorithm
for stock price forecasting. Appl. Intell. 49, 1658–1674 (2018)
13. Zhao, Y., Yang, G.: Deep learning-based integrated framework for stock price movement
prediction. Appl. Soft Comput. 133, 10992 (2022)
14. Liu, Z., Li, Y., Liu, H.: Fuzzy time-series prediction model based on text features and network
features. Neural Comput. Appl. 35, 3639–3649 (2021)
15. Wang, H., Zhang, Y., Liang, J., Liu, L.: DAFA-BiLSTM: deep autoregression feature aug-
mented bidirectional LSTM network for time series prediction. Neural Netw. 157, 240–256
(2022). [PubMed]
16. Baruník, J., Kočenda, E., Vácha, L.: Asymmetric connectedness on the U.S. stock market:
bad and good volatility spillovers. J. Financ. Mark. 27, 55–78 (2015)
17. Nguyen, V.C., Nguyen, T.T.: Dependence between Chinese stock market and Vietnamese
stock market during the COVID-19 pandemic. Heliyon 8, e11090 (2022)
18. Cao, J., Li, Z., Li, J.: Financial time series forecasting model based on CEEMDAN and LSTM.
Phys. Stat. Mech. Appl. 519, 127–139 (2019)
19. Chen, S., Ge, L.: Exploring the attention mechanism in LSTM-based Hong Kong stock price
movement prediction. Quant. Financ.
20. Wang, C., Chen, Y., Zhang, S., Zhang, Q.: Stock market index prediction using deep
transformer model. Expert Syst. Appl. 208, 118128 (2022)
21. Chen, Y., Wu, J., Wu, Z.: China’s commercial bank stock price prediction using a novel
K-means- LSTM hybrid approach. Expert Syst. Appl. 202, 117370 (2022)
Multi-featured Speech Emotion Recognition
Using Extended Convolutional Neural Network
Arun Kumar Dubey, Yogita Arora, Neha Gupta, Sarita Yadav, Achin Jain(B) ,
and Devansh Verma
Abstract. There has been a significant increase in recent years in the investigation
of emotions expressed via speech signals; this field is known as Speech Emotion
Recognition (SER). SER holds immense potential across various applications and
serves as a pivotal bridge in enhancing Human-Computer Interaction. However,
prevailing challenges such as diminished model accuracy in noisy environments
have posed substantial obstacles in this field. To address the scarcity of robust
data for SER, we adopted data augmentation techniques, encompassing noise
injection, stretching, and pitch modification. Distinguishing our approach from
recent literature, we harnessed multiple audio features, including Mel-Frequency
Cepstral Coefficients (MFCCs), mel spectrograms, zero crossing rate, root mean
square, and chroma. This paper employs Convolutional Neural Networks (CNNs)
as the foundation for emotion classification. The Toronto Emotional Speech Set
(TESS) and the Ryerson Audio-Visual Database of Emotional Speech and Song
(RAVDESS) are two well-established datasets that we utilize. The accuracy of
our proposed model on the RAVDESS dataset is 72%, and on the TESS dataset, it
achieves an impressive 96.62%. These results surpass those of extant models that
have been customized for each specific dataset.
1 Introduction
Speech is one of the inherent means by which humans express their emotions. Speech
emotion recognition (SER) is the methodology by which an individual’s emotional con-
dition is deduced from their spoken language. Picard was the first person to study SER as
a model for enhancing human-machine interaction in 1997. Human speech [1] is a rich
source of semantic and personal information that can facilitate healthy communication
between humans or humans and machines. To accomplish this harmony, the emotional
state of a user is extracted from their speech. There is a correlation between the emotion
of speech and its acoustic characteristics, such as timing, voice quality, articulation, and
intonation. The variation between these characteristics forms the foundation of SER.
Speech Emotion Recognition [2] has numerous applications ranging from a diagnostic
tool for therapists for psychological evaluation, in-car board systems to predict the state
of mind of a driver, automatic translation systems to assess the emotional state of a
speaker for improved communication, robots, mobile services, call center applications,
aircraft cockpits, etc. There has been a tremendous increase in data and cost computation
in recent decades. Due to this, the deep learning approach is rapidly being adopted for
improving the quality of speech and emotion recognition. SER systems consist of three
elements:
1. Speech signal acquisition
2. Feature extraction
3. Classification of emotions
The Log-Mel spectrogram, Human Factor Cepstral Coefficients, Mel scale cepstral
analysis, Mel Frequency Cepstral Coefficient, and Short-Term Fourier Transform are
frequently utilized in speech analysis to extract features. The structure of the paper is as
follows. The subsequent segment provides an overview of the existing relevant research
in the domain of SER. The paper’s conceptual and technical framework is elucidated
in Sect. 2. The proposed methodology and model architecture are elaborated upon in
Sect. 3. The investigated data set, experimental particulars, and acquired results are
all encompassed in Sect. 4. To assess the proposed model’s robustness, we conducted a
comparative analysis of our empirical findings with the most recent approaches discussed
in the same section. The conclusion and discussion of prospective developments in SER
are provided in Sect. 5.
Recently, researchers and academics have developed a new interest in emotion recog-
nition. A survey of current research in speech emotion recognition systems was con-
ducted [5], and it discussed the various trends in emotion recognition, the problems
encountered, the classification techniques employed, and the essential design criteria
for emotional speech databases. Either speech [1, 2] or facial expressions [11] or both
can be used to determine an individual’s emotions. In the modern era, heart rate, blood
pressure, skin temperature, and blood volume pulse are also used to detect human emo-
tions. Speech-Emotion Recognition can be implemented in Consumer Electronics Home
Products [1], which analyzes the emotional state of speech and suggests an appropri-
ate method. Using 1D CNN in speaker-independent experiments makes the prediction
model more accurate and optimal for the deployment of in-home assistants. IEMOCAP,
TESS, EMO-DB, and RAVDESS are the standard datasets available on the market for
predicting emotions in speech. These datasets contain a variety of emotions, including
anger, sadness, joy, revulsion, pleasure, fear, etc. Nonetheless, as mentioned in the survey
[5] and summarized below, these datasets have their limitations.
1. The majority of speech emotion databases struggle to simulate emotions accurately
and naturally, and in some databases the quality of the recorded utterances is poor.
2. Real-world noise is not represented, since the audio is recorded in an artificial
environment.
Researchers use various feature extraction techniques, such as pitch, MFCC [3, 4],
speed [6], chroma [7], and mel spectrogram [5], to obtain the desired results. In
addition, researchers have utilized various deep learning and machine learning techniques
to predict emotions from audio files, such as 1D Convolutional Neural Networks [4],
MLP Classifier [8], SVM [6], Decision Tree [5], Random Forest [10], etc. To improve
the accuracy and robustness of speech emotion recognition on the IEMOCAP and
RAVDESS datasets [4], a method known as Head Fusion, based on the multi-head
attention mechanism, was used with an ACNN, achieving accuracies of 76.18% and
76.38%, respectively. The authors in [6] proposed an architecture that combines hybrid convolutional
neural network (CNN) and feedforward deep neural network (DNN) architectures. In
addition, the output of the proposed hybrid network is fed into a softmax layer to
generate a probability distribution over the categorical classes for speech emotion recognition.
Additionally, efforts have been made to increase recognition accuracy and reduce the
overall model cost and processing time [8]. Using radial basis function (RBF)-based
K-means clustering algorithms and a deep BiLSTM network, it offers a novel approach.
Using the STFT algorithm, a key segment is extracted from the entire cluster and then
converted into spectrograms. The extracted and normalized high-level discriminative
features are then passed to deep BiLSTM for classification. Standard IEMOCAP, EMO-
DB, and RAVDESS datasets are utilized for the evaluation. The system in [9] consists of
two components: a processing unit and a classifier. The processing unit is used to extract
the necessary speech characteristics, and the classifier classifies the emotion based on
these features. In addition, the use of autoencoders for dimensionality reduction and
their effect on classification are covered. The survey results indicate that the conclusions
drawn from different studies lack consistency [5]. The principal factor contributing to
this issue is the investigation of only a single emotive speech database in each study,
coupled with the lack of a database of adequate quality. The relevant works, their feature
extraction methodologies, and classifiers are enumerated in Table 1.
Table 1. Comparison with similar research works for speech emotion recognition:
2 Method Background
Convolutional Neural Network: CNN was inspired by the structure of the human brain.
It functions similarly to the way neurons in the brain process and transmit information.
A convolutional neural network, like a conventional neural network, consists of multiple
layers, but the presence of the convolutional layer and the pooling layer distinguishes it.
A CNN contains an input layer, an output layer, and numerous hidden layers. Figure 1
shows the flow of data through the neural network.
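As a hedged illustration of such a network (the layer sizes below are illustrative, not the
exact architecture used in this paper), a small 1D CNN classifier can be assembled with
Keras as follows.

from tensorflow.keras import layers, models

def build_cnn(n_features, n_classes):
    # stacked convolution + pooling layers followed by dense layers
    model = models.Sequential([
        layers.Input(shape=(n_features, 1)),
        layers.Conv1D(64, kernel_size=5, activation="relu"),
        layers.MaxPooling1D(pool_size=2),
        layers.Conv1D(128, kernel_size=5, activation="relu"),
        layers.MaxPooling1D(pool_size=2),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model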
2. Pre-emphasis: A filter is used to increase the energy of high frequencies since the
voice segment at high frequencies has a low magnitude compared to voice segments
at lower frequencies. It improves phone detection accuracy.
3. Windowing: the audio signal is sliced into segments of 25 ms width, with successive
segments spaced 10 ms apart.
4. DFT (Discrete Fourier Transform): To convert the signal from the time domain to
the frequency domain, DFT is applied.
5. Mel-Filter Bank: The mel scale is employed to convert the true frequency into the
frequency that is perceptible to humans.
6. Log: the logarithm of the Mel-filter bank output is taken in order to emulate the human
auditory system.
7. IDFT: Inverse of DFT operation is performed.
8. Dynamic features: From each audio signal sample, the MFCC technique will gen-
erate 39 dynamic features that are utilized as input for the speech recognition model.
The comprehensive extraction process is depicted in Fig. 2.
Mel Spectrogram: Higher frequencies are more difficult for humans to distinguish
from lower frequencies. Even for two sets of identically distant sounds, our perception
of distance may vary. A mel scale is a unit of pitch in which equal distances in pitch sound
are perceived to be equal distances by the receiver. Therefore, it assists in simulating the
human perception of sound. Figure 3 depicts this entire procedure. The y-axis (frequency)
was mapped onto the mel scale to create the mel spectrogram. We have obtained mel
spectrograms for different audio signals (including happy, fear, and disgust), which are
shown in Fig. 4.
Zero Crossing Rate: Zero Crossing Rate is a metric utilized to assess the degree of
signal consistency. It can also be expressed as the count of times a signal’s value is
converted from positive to negative or vice versa divided by the duration of the frame. It
Fig. 3. Process of obtaining the mel spectrogram from the input audio signal
represents the rate at which a signal transitions from positive to zero to negative or from
negative to zero to positive.
Root Mean Square: It approximates perceived loudness, which can be used for event
detection. Furthermore, it is much more robust to outliers, meaning that if we segment the
audio, we can detect new events (such as a new instrument, someone speaking etc.)
much more reliably. It can be computed for each frame, either from the audio sample y
or from a spectrogram S.
Chroma: Chroma is a tool used for analyzing audio and classifying it into twelve
different classes based on pitch. Chroma has two main features, namely the chroma vector
and the chroma deviation. The features explained above are used in our proposed model for
feature extraction.
Librosa: Librosa is a Python package used for the analysis of music and audio. It helps
in visualizing the audio signal and performs different feature extractions with the help
of various signal processing techniques. We have used librosa extensively for feature
extraction and data augmentation; the features are extracted from the audio by librosa
functions.
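A minimal sketch of the five-feature extraction described above, assuming librosa and
NumPy (the trimming and window settings are illustrative), is shown below; each feature
is averaged over time and the results are concatenated into one vector per audio file.

import numpy as np
import librosa

def extract_features(path):
    y, sr = librosa.load(path, duration=3, offset=0.5)   # illustrative trimming
    return np.hstack([
        np.mean(librosa.feature.zero_crossing_rate(y=y).T, axis=0),
        np.mean(librosa.feature.chroma_stft(y=y, sr=sr).T, axis=0),
        np.mean(librosa.feature.mfcc(y=y, sr=sr).T, axis=0),
        np.mean(librosa.feature.rms(y=y).T, axis=0),
        np.mean(librosa.feature.melspectrogram(y=y, sr=sr).T, axis=0),
    ])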
3 Proposed Methodology
The proposed emotion detection approach consists of four stages, as shown in Fig. 8.
Fig. 5. Performing Data Augmentation on an input audio signal: (a) Original Signal, (b) After
addition of noise, (c) After stretching audio, (d) After changing pitch
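The three augmentations illustrated in Fig. 5 can be sketched with librosa as follows
(the noise factor, stretch rate and pitch step are illustrative values, not necessarily those
used in our experiments).

import numpy as np
import librosa

def add_noise(y, noise_factor=0.005):
    # inject Gaussian noise into the waveform
    return y + noise_factor * np.random.randn(len(y))

def stretch(y, rate=0.8):
    # slow down (rate < 1) or speed up (rate > 1) the audio without changing pitch
    return librosa.effects.time_stretch(y=y, rate=rate)

def change_pitch(y, sr, n_steps=2):
    # shift the pitch by n_steps semitones without changing speed
    return librosa.effects.pitch_shift(y=y, sr=sr, n_steps=n_steps)

Applying each augmentation to every original recording is what triples the size of the
data, as mentioned in the result analysis.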
4 Result Analysis
4.1 Result Analysis on RAVDESS Dataset
With the Convolutional Neural Network, the accuracy achieved was 72%, as shown in
Fig. 7. The train size used for the data is 80% while the test size is 20%. Categorical
cross-entropy, which quantifies the difference between two probability distributions, is
used as the loss function. The ROC AUC score obtained is 0.8965. The confusion
matrix is given in Fig. 8.
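For reference, categorical cross-entropy for a one-hot target vector $y$ and predicted class
probabilities $\hat{y}$ over $C$ emotion classes is
$L(y, \hat{y}) = -\sum_{c=1}^{C} y_c \log \hat{y}_c$
which is smallest when the predicted probability of the true class approaches one.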
Fig. 7. Accuracy graph obtained from the RAVDESS Dataset.
Fig. 8. Confusion matrix for the RAVDESS Dataset.
4.2 Result Analysis on TESS Dataset
With the Convolutional Neural Network, the accuracy achieved was 96.62%, as shown in
Fig. 9. The train size used for the data is 80% while the test size is 20%. Categorical
cross-entropy is used as the loss function. The ROC AUC score obtained is 0.9979. The
confusion matrix is given in Fig. 10.
Fig. 9. Accuracy graph obtained from the TESS Dataset.
Fig. 10. Confusion matrix obtained from the TESS Dataset.
The confusion matrices for the RAVDESS and TESS datasets are shown in Fig. 10 and
Fig. 12, respectively; each compares the predicted class labels against the actual class
labels over all data instances. For instance, looking at Fig. 12, there were 462 instances
where Emotion 0 (Old Fear) was predicted correctly and 10 instances where it was not.
The confusion matrices show that most emotions are predicted accurately.
Our proposed approach is compared with the best existing models for speech emotion
recognition in Table 2, which contrasts the feature extraction techniques, the classifiers
used, and the accuracy achieved by the different models. The probable reason for
achieving the best accuracy is that we applied data augmentation techniques, namely
noise injection, stretching, and pitch shifting, to make our data more natural and to triple
its size. Features like MFCCs have consistently helped to obtain better results in the
past [1]. The works mentioned in Table 2 use only one or two feature extraction
techniques, whereas this work uses a total of five features (MFCC, mel spectrogram,
chroma, root mean square, and zero crossing rate). Furthermore, the use of a 1D CNN
classifier has led to better results and provided superior accuracy.
Table 2. Comparison of the best speech emotion recognition models with the proposed approach
5 Conclusion
References
1. Chatterjee, R., Mazumdar, S., Sherratt, R.S., Halder, R., Maitra, T., Giri, D.: Real-time speech
emotion analysis for smart home assistants. IEEE Trans. Consum. Electron. 67(1), 68–76
(2021). https://doi.org/10.1109/TCE.2021.3056421
2. Venkataramanan, K., Rajamohan, H.R.: Emotion recognition from speech (2019). arXiv:
1912.1045
3. Parry, J., et al.: Analysis of deep learning architectures for cross-corpus speech emotion
recognition. In: Proceedings of the Conference of the International Speech Communication
Association (Interspeech), pp. 1656–1660 (2019). https://doi.org/10.21437/Interspeech.2019-
2753
4. Xu, M., Zhang, F., Zhang, W.: Head fusion: improving the accuracy and robustness of speech
emotion recognition on the IEMOCAP and RAVDESS dataset. IEEE Access 9, 74539–74549
(2021). https://doi.org/10.1109/ACCESS.2021.3067460
5. El Ayadi, M., Kamel, M.S., Karray, F.: Survey on speech emotion recognition: features,
classification schemes, and databases. https://doi.org/10.1016/j.patcog.2010.09.020
6. Ezz-Eldin, M., Khalaf, A.A.M., Hamed, H.F.A., Hussein, A.I.: Efficient feature-aware hybrid
model of deep learning architectures for speech emotion recognition. IEEE Access 9, 19999–
20011 (2021). https://doi.org/10.1109/ACCESS.2021.3054345
7. Kanwal, S., Asghar, S.: Speech emotion recognition using clustering based GA-optimized
feature set. IEEE Access 9, 125830–125842 (2021). https://doi.org/10.1109/ACCESS.2021.
3111659
8. Mustaqeem, Sajjad, M., Kwon, S.: Clustering-based speech emotion recognition by incorpo-
rating learned features and deep BiLSTM. IEEE Access 8, 79861–79875 (2020). https://doi.
org/10.1109/ACCESS.2020.2990405
9. Patel, N., Patel, S., Mankad, S.H.: Impact of autoencoder based compact representation on
emotion detection from audio. J. Ambient Intell. Hum. Comput. 13, 867–885 (2021). https://
doi.org/10.1007/s12652-021-02979-3
10. Sonmez, Y.Ü., Varol, A.: New trends in speech emotion recognition. In: 2019 7th International
Symposium on Digital Forensics and Security (ISDFS), pp. 1–7 (2019). https://doi.org/10.
1109/ISDFS.2019.875752
11. Castellano, G., Kessous, L., Caridakis, G.: Emotion recognition through multiple modalities:
face, body gesture, speech. In: Peter, C., Beale, R. (eds.) Affect and Emotion in Human-
Computer Interaction. LNCS, vol. 4868, pp. 92–103. Springer, Heidelberg (2008). https://
doi.org/10.1007/978-3-540-85099-1_8
12. Shaqra, F.A., Duwairi, R., Al-Ayyoub, M.: Recognizing emotion from speech based on age
and gender using hierarchical models. Procedia Comput. Sci. 151, 37–44 (2019). https://doi.
org/10.1016/j.procs.2019.04.009
Large Language Models for Search Engine
Optimization in E-commerce
Abstract. The paper discusses how Large Language Models (LLMs) can be used
in search engine optimization activities dedicated to e-commerce. In the first part
the most important Search Engine Optimization (SEO) issues are discussed, such
as technical SEO aspects, keyword selection, and content optimization. Then the
study presents an in-depth look at OpenAI’s advancements, including ChatGPT
and DALL-E. The latter sections describe the capabilities of Large Language
Models into the realm of SEO, particularly in e-commerce. Firstly, a set of prompts
for LLMs that can be used to create content and HTML code for online shops is
proposed. Then advantages, and drawbacks of incorporating LLMs in SEO for
e-commerce are presented. The research concludes by synthesizing the potential
of merging AI with SEO practices, offering insights for future applications.
1 Introduction
The increasing competition in the e-commerce sector necessitates advanced tools and tech-
niques to improve rank in search engine results. A high ranking in the search engine
results page (SERP) not only guarantees an increase in organic website traffic [5], but
also reduces spending on paid advertising. The explosive growth of Large Language
Models (LLMs) in the last two years [19] has resulted in the increasing use of this tool
in SEO activities. LLMs not only allow the generation of large amounts of content but
can also support other elements of SEO.
There are several studies that summarize the previous work in the domain. One of
them examines how ChatGPT can affect search marketing and SEO [33]. Another study
compares ChatGPT to Spyfu as a keyword discovery tool, revealing that while Spyfu
outperforms in discovering efficient keywords, ChatGPT excels in predicting search
trends and identifying specific, user-targeted long-tail keywords and queries [34].
The main aim of this paper is to propose methods of applying LLMs to SEO activities
in e-commerce and to investigate their advantages and disadvantages. The structure
of the paper is the following: Sect. 2 provides a brief overview of SEO in the context of
content optimization. Section 3 focuses on the impact of Artificial Intelligence on content
creation. Here, innovations by OpenAI, like ChatGPT and DALL-E, are spotlighted. Our
core examination in Sect. 4 revolves around the infusion of LLMs in SEO, proposing a
set of prompts for e-commerce. It also analyses benefits, and potential pitfalls within the
application of LLMs in SEO. Section 5 shortly summarizes our insights and proposes
directions for further research in this area.
Fig. 1. Organic CTR for Google not including ads - Source: own elaboration based on data from
Backlinko [5].
A high position in the SERP may also reduce spending on paid ads because, in the case of
paid advertising in the Google Ads system, one of the criteria affecting the fee charged
is the quality score, of which the quality of the landing page is an element.
Long Tail theory originally described a market phenomenon where a wide range of
niche products can generate a similar return on sales as a relatively narrow range of
very popular products [6]. In the context of SEO, the term refers to longer and specific
keyword phrases that are less frequently used but have higher conversion rates [7] - a
percentage indicator showing the number of users who performed further actions on a
site after clicking on a search result. Long Tail phrases can generate about 70% of traffic
to a site [8], so their issue is important to discuss. For a more in-depth discussion of
key phrases, mention should be made of phrases like Fat Head and The Chunky Middle.
Long Tail in SEO are key phrases consisting of at least 3 words, characterized by higher
accuracy and lower popularity [1]. Fat Head are short key phrases consisting of one or
two words, generating more traffic [7]. The Chunky Middle are phrases that are more
detailed and descriptive than Fat Head, with less interest but still popular [7]. Long Tail
phrases are important for SEO, as they allow a site to be targeted at specific demographic
groups [1]. The use of Long Tail phrases is therefore crucial for effective SEO of a site.
The technical side of a website is crucial for SEO, so its most important elements will
be briefly discussed. Unique title is one of most crucial element of the website which
can improve its ranking in search results [9]. The title should be placed in the <title>
tag in the <head> section and is visible to both search engines and users [9]. It is meta-
information about a page that directly affects a site’s ranking [10]. It is recommended that
the title of the homepage include the name of the company or site, important information
such as location or main areas of activity [9]. It should be descriptive, unique, and
short, limited to 65 characters, so that it is still relevant to Google or other search engines
[10].
Another important tag is the metatag description. It has two main uses. The first use
is to accurately and concisely describe the content of a given page, and the second is to
encourage the user to click on the page on the SERP [10]. The content of the description
should be truthful, should not exceed 160 characters, should encourage the user to click
on the page, and should include keywords that are also in the title [10]. The description
of the page is given via the <meta> tag with the name property set to the description
value. The content should be placed in the <head> tag in the same way as the <title>.
The last metatag is keywords, containing a list of keywords relevant to the site.
Selecting the right keywords is significant for a successful SEO strategy. There are many
methods and tools that can help with keyword selection.
The first step is to clearly outline the scope of the business [11]. The next step is to
brainstorm keywords and phrases [10]; at this stage it is a good idea to collect as many
of them as possible [11]. Once the words and phrases have been collected, they need to
be organized from the most general to the most specific. One helpful method is
competitive research [11]: checking the top-ranked sites and including the keywords
of those sites in the previously created set.
To find better key phrases, trend analysis can be conducted [10]. Tracking trends makes
it possible to understand which phrases are currently popular among users and how their
preferences are changing, so that website content can be tailored to the current
needs and interests of customers. A popular tool for studying trends is Google Trends2 .
Once the key phrases have been collected, one can proceed to select the best ones.
There are many tools that can help with this [11]. One of the most popular is Google’s
Keyword Planner3 , which offers a lot of information about key phrases, such as average
monthly searches, competitiveness of phrases and keyword suggestions.
An alternative method, which will be discussed in the 4th section, is to use LLMs to
generate and select the best keywords for a particular business.
2 https://trends.google.pl/trends/.
3 https://ads.google.com/home/tools/keyword-planner/. To use the tool, Google Ads account is
required.
The recommended starting point when planning content is to analyze and select the best
keywords and phrases for the site and its individual pages [10]. Once the selection has
been made, it will be possible to create content so that it incorporates the words and
phrases selected earlier.
One of the most important points is that the content placed on the pages should be of
high quality, should engage users, and should be appropriately tailored to the length of
the page [2]. Preparing high-quality content requires understanding the target audience
and how and what they are looking for [10]. The quality of the site is influenced by
original and unique content, a clean structure, formatting, readability, and the presence
of CTA elements4 . It is advisable to use interactive multimedia, such as videos and
infographics, and, if possible, to integrate content with social media. In this way, users
are encouraged to stay on the site and interact with it. It is also necessary to update
content regularly, as fresh content is important for effective SEO [2].
Search engines place great importance on the uniqueness of content on pages [10].
It is important to avoid thin content, which refers to pages that do not provide enough
unique content to warrant inclusion in search results [12]. It is also important that the
content be different from other pages on the site, and that it have unique titles and meta
descriptions [10]. Unique video, audio and graphic content is also valued by search
engines [10].
3.1 OpenAI
OpenAI is an artificial intelligence research lab in San Francisco, founded in 2015 [13].
Its goal is to develop AI technology for the benefit of humanity [14] particularly in the
context of developing general artificial intelligence, that is, the ability of a computer to
perform tasks as a human would [15]. It does this through research, platform, and services
[13]. The public and the technology industry are showing great interest in OpenAI [13].
The best-known tools from OpenAI are ChatGPT and DALL-E [16].
3.2 ChatGPT
ChatGPT is built on the Transformer model, or more specifically on its successive
versions: GPT-1, GPT-2, GPT-3, GPT-3.5 and GPT-4 [18]. At the current time5 , ChatGPT
allows operation on the models GPT-3.5
and GPT-4. These models analyze context and predict the next words in a response
[16]. Although there have been reports of GPT-5, it is currently untrained, and GPT-4 is
powerful enough to meet needs for many years to come [12].
ChatGPT uses a multi-layered transformer network to generate answers to users’
questions [19]. A transformer is a type of neural network architecture [16]. The main
advantage of transformers is effective parallelism, which allows large models to be scaled
and trained [16]. Transformers allow the tool to understand the context and meaning of
ambiguous words based on the surrounding words in a sentence [19].
The tool’s model was trained on a huge set of texts collected from the internet from
2021, covering about 570 GB of data, including websites and books [16]. The model for
ChatGPT also underwent additional training based on reinforcement learning using the
RLHF6 method [19]. In this method, the model is given various tasks, and the results
are evaluated by humans. The evaluation is transformed into a score, which in turn is
fed back into the training process [19].
ChatGPT provides a free version, based on the GPT-3.5 generative model, and a paid
version7 based on the GPT-4 model. The paid version is more advanced, has earlier access
to new features, faster response times, and has priority in handling queries during heavy
load [16]. GPT-4 is an improved version of GPT-3.5. The latter model allows 175 billion
parameters to be included depending on the question received, while the former allows
at least 3 or 4 times as many [19]. This number of GPT-4 parameters allows it to answer
questions in a more human-like style than GPT-3 [20]. The GPT-4 model was trained
for an additional 6 months beyond the GPT-3.5 training, considering information from
humans and artificial intelligence, resulting in significant improvements [20]. In addition,
GPT-4 is trained on newer data, considering information as far back as September 2021,
making its answers more up to date than GPT-3.5, which only had data through June 2021
[20]. With increased parameters, GPT-4 is less prone to hallucinations than GPT-3.5,
but caution should still be exercised when using it [20].
the latent space [24]. Then, a decoder processes these abstract representations from the
hidden space and transforms them into images [24]. The hidden variables determine the
features of the image, such as shape, colour, or texture. Manipulation of this latent space
allows users to create images with different features and styles.
It is worth mentioning that AI-generated images are currently not protected by copy-
right in the US and several other countries, so they can be used freely without fear of
legal consequences, just as with ChatGPT [16]. This is particularly relevant for online
shops, where DALL-E can be used to generate creative backgrounds for products.
Communication with LLMs and the issuing of commands is done by means of prompts. The new
field of knowledge that deals with building prompts to use LLMs for specific purposes
is called prompt engineering [28]. On the one hand, it is based on trial and error research
and the undertaking of numerous experiments. On the other hand, prompt engineering
is about optimizing queries to LLMs based on knowledge of their performance and
learning processes [27].
Prompts for SEO in an online shop can be divided into the following categories: key-
word research and analysis, content creation, technical SEO. This division shows the dif-
ferent areas of SEO, but it is important to remember that these areas are not disconnected.
Prompts can also take a comprehensive form covering all the areas mentioned.
Examples of prompts that can be used are provided for each of the aforementioned
categories. Parameters of prompts are given in brackets.
Keyword Research and Analysis
What are the primary keywords associated with the [shop’s main products]?
Which long-tail keywords can be targeted to capture [e-shop sector]?
What are the most important keywords for product [product name]?
Content Creation
Generate the [number of words] description of the product [name of product]? The
description should include the following keywords: [keyword1, keyword2, …, keywordN]
in style [style definition].
Rephrase the description of the product [product name] which is the following [text
of product description]? The description should include the following keywords:
[keyword1, keyword2, …, keywordN] in style [style definition].
Generate [number of words] text concerning product [product name] including keywords
[keyword1, keyword2, …, keywordN] to the online store blog?
Generate [number of words] text concerning product [product name] including keywords
[keyword1, keyword2, …, keywordN] to the Facebook post concerning online store [name
of the store]?
If the internet shop software allows descriptions to be inserted in the form of HTML
code, LLMs can generate ready-made code containing SEO-relevant elements. Prompts
may look like the following:
Generate the product description [Product name] to internet shop [Internet Shop name]
in HTML format which include heading tag <H1> with keywords [keyword1, keyword2,
…, keywordN] and two heading tags <H2> with keywords [keyword1, keyword2, …,
keywordN].
Rephrase the product description [Product name] which is the following [text of product
description] to internet shop [Internet Shop name] in HTML format which include
heading tag <H1> with keywords [keyword1, keyword2, …, keywordN] and two heading
tags <H2> with keywords [keyword1, keyword2, …, keywordN].
The list of keywords can be provided by the user, but also suggested by LLM.
Technical SEO
Create Title tag of no more than 65 characters for product [product name] with the
following description [text of product description].
Create Description metatag of no more than 160 characters for product [product name]
with the following description [text of product description].
Create Keywords metatag of no more than 100 characters for product [product name]
with the following description [text of product description].
Comprehensive Prompt
Generate the product description [product name] to internet shop [Internet Shop name]
in HTML format which include heading tag <H1> with relevant keywords and two
heading tags <H2> with relevant keywords. The HTML code should include Title tag
of no more than 65 characters and Description metatag of no more than 160 characters
and Keywords metatag of no more than 100 characters.
The result of prompting ChatGPT 4.0 with the comprehensive prompt presented above,
for the product TV QLED 55" and the fictitious internet shop ShopwithTVs.com, is
presented below:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>ShopwithTVs.com - QLED 55" Ultra HD Smart TV</title>
<meta name="description" content="Experience unmatched clarity with the 55-
inch QLED Ultra HD Smart TV on ShopwithTVs.com. Dive into a world of vibrant
colours and immersive visuals.">
<meta name="keywords" content="QLED, 55-inch, TV, Ultra HD, Smart TV, Shop-
withTVs">
</head>
<body>
<h1>QLED 55" Ultra HD Smart TV on ShopwithTVs.com</h1>
<h2>QLED Display Technology</h2>
<p>The QLED technology offers superior picture quality, presenting vivid colours and
clear details that bring every scene to life. Powered by Quantum Dot technology, enjoy
breathtaking visuals like never before.</p>
<h2>Smart Features for the Modern Viewer</h2>
<p>With the integrated smart hub, seamlessly stream your favourite shows, movies,
and online content. Connect with leading streaming platforms, control your TV through
voice commands, and enhance your viewing experience.</p>
</body>
</html>
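For repeatable use, the same comprehensive prompt can also be sent programmatically;
the sketch below assumes the openai Python package, an API key available in the
environment, and a GPT-4 class model name, none of which are prescribed by this paper.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompt = (
    'Generate the product description TV QLED 55" to internet shop '
    "ShopwithTVs.com in HTML format which include heading tag <H1> with "
    "relevant keywords and two heading tags <H2> with relevant keywords. "
    "The HTML code should include Title tag of no more than 65 characters "
    "and Description metatag of no more than 160 characters and Keywords "
    "metatag of no more than 100 characters."
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)

The generated HTML can then be reviewed and pasted into the shop's product template.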
5 Conclusion
Using artificial intelligence-generated content brings many advantages. First, it accel-
erates the process of creating and publishing content, which is both time-saving and
cost-effective [25]. AI also enables personalization of content and optimization for SEO,
which can significantly impact the effectiveness of marketing and communications. In
addition, LLMs can create content in multiple languages, allowing for global content
customization. Another important advantage is overcoming the problem of content cre-
ators’ lack of inspiration and preventing misinformation, which minimizes the risk of
costly lawsuits or negative impact on a company’s reputation in the event of business
irregularities [25]. The use of LLMs in content generation also comes with some lim-
itations. First and foremost, the content generated may be characterized by a lack of
References
1. Ledford, J.: Search Engine Optimization Bible, 2nd edn. Wiley, Indianapolis (2009)
2. Shenoy, A., Prabhu, A.: Introducing SEO: Your Quick-start Guide to Effective SEO Practices,
1st edn. Apress, Govardhan Nagar (2016)
3. Enge, E., Spencer, S., Stricchiola, J.: The Art of SEO, 3rd edn. O’Reilly Media Inc, Sebastopol
(2016)
4. Marketbrew.ai, Understanding Query Parsers: How Search Engines Process Your
Searches. https://marketbrew.ai/understanding-query-parsers-how-search-engines-process-
your-searches. Accessed 28 Oct 2023
5. Backlinko, We analyzed 4 million Google Search Results. Here’s what we learned about
organic CTR. https://backlinko.com/google-ctr-stats. Accessed 28 Oct 2023
6. Anderson, C.: The Long Tail: Why the Future of Business is Selling Less of More, 1st edn.
Hyperion, New York (2006)
7. Ahrefs, Long-tail Keywords: What They Are and How to Get Search Traffic From Them.
https://ahrefs.com/blog/long-tail-keywords/. Accessed 28 Oct 2023
8. Mangools, How to find long-tail keywords (and why they’re important). https://mangools.
com/blog/long-tail-keywords/. Accessed 28 Oct 2023
9. Google Search Central, Search Engine Optimization (SEO) Starter Guide. https://developers.
google.com/search/docs/fundamentals/seo-starter-guide?hl=en. Accessed 28 Oct 2023
10. Enge, E., Spencer, S., Stricchiola, J.: The Art of SEO, 4th edn. O’Reilly Media Inc, Sebastopol
(2023)
11. Clay, B., Jones, K.: Search Engine Optimization All-in-One For Dummies, 4th edn. For
Dummies, Hoboken (2022)
12. Kent, P.: SEO for Dummies, 7th edn. For Dummies, New Jersey (2020)
13. Joel, T., Venkataraman, S.: A review about Artificial Intelligence (AI) with reference to open
AI. Int. J. Res. Trends Innov. 8(5), 335–337 (2023)
14. OpenAI, About. https://openai.com/about. Accessed 28 Oct 2023
15. OpenAI, Planning for AGI and beyond. https://openai.com/blog/planning-for-agi-and-
beyond. Accessed 28 Oct 2023
16. Baker, P.: ChatGPT For Dummies, 1st edn. For Dummies, Hoboken (2023)
17. Alto, V.: Modern Generative AI with ChatGPT and OpenAI Models, 1st edn. Packt Publishing,
Birmingham (2023)
18. Caelen, O., Blete, M.: Developing Apps with GPT-4 and ChatGPT, 1st edn. O’Reilly Media
Inc, Sebastopol (2023)
19. Loukides, M.: What Are ChatGPT and Its Friends?, 1st edn. O’Reilly Media Inc, Sebastopol
(2023)
20. Digital Trends, GPT-4 vs. GPT-3.5: how much difference is there? https://www.digitaltrends.
com/computing/gpt-4-vs-gpt-35. Accessed 28 Oct 2023
21. Phoenix, J., Taylor, M.: Prompt Engineering for Generative AI (Early Release, Raw &
Unedited), 1st edn. O’Reilly Media Inc, Sebastopol (2024)
22. OpenAI, DALL·E: Creating images from text. https://openai.com/research/dall-e. Accessed
28 Oct 2023
23. Cointelegraph, What is DALL-E, and how does it work? https://cointelegraph.com/news/
what-is-dall-e-and-how-does-it-work. Accessed 28 Oct 2023
24. AssemblyAi, How DALL-E 2 Actually Works. https://www.assemblyai.com/blog/how-dall-
e-2-actually-works/. Accessed 28 Oct 2023
25. HubSpot, The Pros and Cons of AI-Generated Content. https://blog.hubspot.com/marketing/
ai-generated-content. Accessed 28 Oct 2023
26. Wu, J., Gan, W., Chen, Z., Wan, S., Lin, H.: AI-generated content (AIGC): a survey. (2023).
arXiv preprint arXiv:2304.06632
27. Spasić, A.J., Janković, D.S.: Using ChatGPT standard prompt engineering techniques in
lesson preparation: role, instructions and seed-word prompts. In: 2023 58th International
Scientific Conference on Information, Communication and Energy Systems and Technologies
(ICEST), pp. 47–50. IEEE (2023)
28. Kocoń, J., et al.: ChatGPT: jack of all trades, master of none. Inf. Fusion, 101861 (2023)
29. Alkaissi, H., McFarlane, S.I.: Artificial hallucinations in ChatGPT: implications in scientific
writing. Cureus 15(2) (2023)
30. Alawida, M., Mejri, S., Mehmood, A., Chikhaoui, B., Isaac Abiodun, O.: A comprehensive
study of ChatGPT: advancements, limitations, and ethical considerations in natural language
processing and cybersecurity. Information 14(8), 462 (2023)
31. Dubey, P., Ghode, S., Sambhare, P., Vairagade, R.: A discussion with illustrations on world
changing ChatGPT–an open AI tool. In: Multimedia Data Processing and Computing,
pp. 135–153. CRC Press
32. Haleem, A., Javaid, M., Singh, R.P.: An era of ChatGPT as a significant futuristic support
tool: a study on features, abilities, and challenges. Bench Counc. Trans. Benchmarks Stand.
Eval. 2(4), 100089 (2022)
33. Cutler, K.: ChatGPT and search engine optimisation: the future is here. Appl. Market. Anal.
Peer-Rev. J. 9(1), 8–22 (2023)
34. Jiang, P.: Discovering efficient keywords – an exploratory study on comparing the use of
ChatGPT and other third-party tools. J. Emerg. Trends Market. Manag. 1(2), 40–45 (2023)
Handwritten Equation Solver: A Game-Changer
in Mathematical Problem Solving
Anmol Gupta(B) , Disha Mohini Pathak, Rohit Sharma, and Somya Srivastava
1 Introduction
Solving an equation accurately is not an easy task, and sooner or later everyone gets
stuck on a problem. The main reason behind this is the lack of availability of resources
when they are needed. The age-old tradition of asking a teacher, friend, or family member
is comparatively slower, as no one can be available at all times.
Recognition of handwriting is a tough task; sometimes even one's acquaintances cannot
figure out what one has written, and this is where technology comes into play. It can be
handled using Convolutional Neural Network (CNN) algorithms. In this paper, the
OpenCV library is used for pre-processing the image, after which the dataset is divided
and the algorithms are applied to solve the equations. The strategy is to evaluate the
equations by recognizing arithmetic symbols and digits and performing the desired
operations to give the result. A Kaggle dataset has been used for the images. Apart from
that, certain handwritten equations have been collected from various people, including
the authors, to test the accuracy of the system.
The accuracy of the system is the most important factor in the project, because a student
learns what they see. If the accuracy is low, the system will generate faulty outputs,
hampering the learner's progress and contributing to the failure of the project. All the
equations available in the dataset are scanned and then solved while ensuring accuracy.
The proposed technique depends on various classification and segmentation algorithms,
and the aim is to provide a detailed analysis of these algorithms to find out which is the
most reliable in this setting. Among the different techniques, the K-means algorithm is
used for segmentation, and K-nearest neighbors (KNN) and SVM are used for
classification.
The problem is significant in today's world, since people increasingly depend on
technology to ease their everyday lives. It is not easy to have manual support throughout,
but this technology makes it feasible for learners to work at their own convenience:
students cannot reach out for help at all times, whereas this system can be accessed at
any time.
Various solutions are already available on the market, but they suffer from limited
accuracy and the lack of a comparative study, as they rely on one or two algorithms
without comparing alternatives.
In this paper, the concepts of segmentation and classification have been used in
solving the problem. There are various segmentation and classification techniques that
are being used over here.
1.1 Segmentation
Dividing an image into multiple parts is referred to as segmentation. It aims to identify
the important information and the objects present in an image. Here it is used both for
segmenting the equation line and for segmenting individual characters.
Types of Segmentation
1. CCA: It refers to Canonical Correlation Analysis, which finds the linear relationship
between two given variables and segments and clusters the data.
2. Mean Shift: A clustering algorithm that iteratively shifts points toward regions of
higher density estimated with a kernel.
3. K-Means: This algorithm clusters a dataset into k different clusters. It minimizes the
sum of the squared distances between the data points and the centres of their assigned
clusters.
4. Spectral Clustering: A similarity matrix is derived from the data, and its eigenvalues
and eigenvectors are used. It is beneficial when dealing with non-linear or graph-like
data.
5. Graph Cuts: The data points are the nodes of a graph whose edge weights encode
similarity between them. Each cluster should be internally connected, and the graph is
partitioned so that the edges that are cut carry the least total weight.
6. Deep Learning: Features are learned automatically from the datasets, so raw data can
be used for complex segmentation tasks.
1.2 Classification
2 Related Work
Various works have been done in the past on handwritten equation solving systems.
After studying those papers and the research carried out in this field, a number of
important algorithms were identified, along with a certain gap present in each of the
papers studied. A total of 18 research papers related to the problem statement discussed
here were reviewed; of these, about 12 contain work similar to what has been done in
this paper. Various references were therefore already available for moving ahead with
the project. The importance that technology has gained in everyday life cannot be
overlooked. There is no question about the power of Machine Learning and Artificial
Intelligence, and the future they hold cannot be neglected. A lot of potential has been
opened up in the
market by the field of deep learning. It is also a fact that with passing times, the work
is becoming more and more automated. The solution to any given problem should be
available with a single click. The power of internet and image processing isn’t debatable.
Everything is becoming more and more technology dependent. The power of human
brain is well-known, but there’s quite an unawareness of the powers and the capabilities
of the computer’s brain. Nowadays, EdTech is blooming and with the rise of EdTech
there's a rise of opportunities in the fields that technology has to offer. Even a
mathematical equation can now be solved by simply taking a picture of the problem,
and an answer is produced in the blink of an eye. This is not as easy as it seems: it
requires a lot of operations and background processing that many users are not aware
of. The applications of machine learning and deep learning are visible, but a fixed
procedure and the use of multiple algorithms are required to process an image, segment
the individual characters of varying handwritings, and process them to produce the
answer to the given problem. Multiple works have already been
done in the same field, including that of Catherine Lu and Karanveer Mohan [1], who
used Convolutional Neural Networks (CNN) to recognize handwritten mathematical
expressions written online, processed the equations, and obtained the desired output
once the image processing was complete. [2] was an attempt at a similar task, where the aim was
to recognize the handwritten mathematical expressions and to distinguish between the
various characters. [3] aimed at classifying images based on various image processing
algorithms. A colored-image was converted into a binary image and was classified using
a similar approach. [4] worked on the recognition and segmentation of characters on a
license plate using the concepts of horizontal projection and linked component analysis.
In image processing, horizontal projection sums the pixel values of a binary image along
each row, producing a profile that shows where characters and text lines lie, which is
why the method is named horizontal projection. Linked component analysis is where
two or more connected components are linked with each other and are studied both
individually and as a group.
There are many mathematical symbols, each unique and performing a different function.
[5] aimed at segmenting the various mathematical symbols and identifying them uniquely
for processing purposes. [6] focused on ImageNet, i.e., a large database of images to be
processed with image processing techniques. It worked with deep CNNs, which are
feedforward neural networks, and applied the back-propagation algorithm, which adjusts
weights and biases so that the cost function associated with the network is reduced.
Mathematical formulas are central from a mathematical perspective, and identifying and
solving them accurately is a task of its own. [8] worked on developing an improved algorithm
to segment the various symbols from images containing mathematical formulas. It also
aimed to demonstrate a deep network architecture for image classification, where various
images were classified and differentiated from one another. [9] worked on
using Support Vector Machine (SVM) and projection histogram for the recognition of any
mathematical expression. SVM is a supervised machine learning algorithm where data
is plotted on a graph against various features that the data has. Projection Histogram
is a method of image projection where an image is projected either in horizontal or
vertical direction. The motive of [10] was to develop adequate strategies and methods to
achieve the goal of character segmentation: connected components must be segmented
accurately, and the primary aim of that paper was to find the success rate of character
segmentation for connected components. Different people have different handwriting,
and the way a certain character is drawn also varies from person to person. The aim of
[11] was to bridge this gap.
It aimed at distinguishing various handwritings uniquely and to process the mathematical
expressions that were written in that handwriting. [12] aimed at processing of images in
a system using the concepts of deep learning and neural information processing. Deep
learning is a multi-layer machine learning algorithm that simulates the power of human
brain to the machine and the computer processes raw data and learn from a wide volume
of data. Tracking certain features or objects in an image is a tedious process, and with a
large amount of image data it becomes even more so. [13] aimed at overcoming this
limitation: with the methodology proposed in that paper, points and objects can be
tracked easily in an image. Whenever there is a need
to learn anything or processing anything there is something which is often outlooked
and that’s patterns. [14] aimed at bridging this gap and aimed at extracting out various
patterns that are present in our data. The usage of back propagation and Artificial Neural
Network (ANN) were used in [15], where back propagation, as the name suggests, is the process of moving back from the output nodes towards the input nodes in order to propagate the error and correct the weights accordingly. An Artificial Neural Network is a collection of connected nodes, loosely modelled on an animal brain, that processes data through these connections. To the human eye an image appears as a single picture, but it is not a single-coloured structure; it is a grid of many pixels that together make the image look like an image, and this pixel grid is what a computer processes. [16] used a deep convolutional network to reconstruct a high-resolution image from its low-resolution, pixelated version (image super-resolution). Overfitting is a problem that must not be overlooked: it occurs when a model fits the training data too closely and therefore fails to generalize to new data. [18] aimed at solving the problem of overfitting in neural networks using dropout, while [17] used multi-column deep neural networks to classify the various images present in a dataset.
This is what could be inferred from studying various research papers (Table 1).
Table 1. Summary
The paper begins with the abstract and introduction, which give a brief overview of the project along with the literature review and the study carried out for this work.
The next section covers the proposed methodology, presenting the workflow diagram and the details of the procedure followed throughout the paper.
Towards the end of the paper we give the result analysis, comparing the accuracies of the various segmentation and classification techniques that we have used.
3 Proposed Methodology
The method deals with scanning an input image, which is pre-processed to convert it into a binary image and to remove any noise before features are extracted. Multiple segmentation techniques are then applied, their accuracies are compared, and the most accurate technique, K-means in this particular case, is chosen. After the image has been segmented, a Convolutional Neural Network (CNN) is used to extract features from the image. Similarly, multiple classification techniques are applied, leading to the conclusion that K-Nearest Neighbors (KNN) and Support Vector Machine (SVM) give the highest accuracy in this particular use case. After this, the symbols are aggregated to form a mathematical equation, and that equation is solved to give the desired output (Fig. 1).
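A minimal sketch of this workflow is given below. The callables passed in (preprocess, segment, extract_features) and the trained classifier stand for the stages described above; they are illustrative assumptions, not the exact implementation used in this work.

```python
# Sketch of the end-to-end workflow in Fig. 1 (illustrative only).
def solve_handwritten_equation(image_path, preprocess, segment, extract_features, classifier):
    binary = preprocess(image_path)                  # gray-scale + binarize + denoise
    crops = segment(binary)                          # K-means gave the best segmentation accuracy
    features = [extract_features(c) for c in crops]  # CNN used as the feature extractor
    symbols = classifier.predict(features)           # KNN / SVM gave the best classification accuracy
    expression = "".join(symbols)                    # aggregate symbols, e.g. "3+4*2"
    return eval(expression)                          # solve the assembled equation
```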
4 Implementation
4.1 Dataset Preparation
The very first and the most integral step is dataset preparation. Characters, operations,
mathematical symbols, digits, etc., are defined and the dataset is accurately defined.
4.2 Pre-Processing
The input image is modified in certain ways so that it becomes suitable for recognition purposes. The techniques that deal with such modification of the images are listed below.
Conversion of RGB Image to Gray-Scale. A gray-scale image has a single bitmap, Y, whereas an RGB image has three bitmaps, R, G and B. Detecting characters is much easier on a gray-scale image than on a colored image, and the conversion produces a single matrix. The equation governing this conversion is:
Y = 0.299R + 0.587G + 0.114B (1)
Binarization. The images are converted into pixel data of 0s and 1s, where 1 represents the black pixels and 0 represents the white pixels.
Noise Reduction. The unnecessary pixels present in an image are referred to as noise. Gaussian noise is removed from the image, whereas salt-and-pepper noise is often left untreated because it does not cause much disturbance in the system (Fig. 2).
4.3 Segmentation
Dividing an image into multiple parts is referred to as segmentation. It aims to identify the important information and the objects present in an image. Here it covers both equation-line segmentation and character segmentation.
Equation Line Segmentation. The multiple equations present in the input image are separated from each other (Fig. 3).
5 Result Analysis
The accuracies of the various segmentation techniques are given in Fig. 5 and Table 2.
Table 2. Accuracy of the segmentation techniques (%)
Technique Accuracy
CCA 57.429
Mean-Shift 90.429
K-Means 92.714
Spectral Clustering 91.629
Graph Cuts 91.743
Deep Learning 92.229
Based on the above data, the highest accuracy, 92.71371429%, is achieved by K-Means clustering; this was found by testing all the techniques on our own handwritten dataset and measuring the accuracy in each case.
The accuracies of the various classification techniques are given in Fig. 6 and Table 3.
Table 3. Accuracy of the classification techniques (%)
Technique Accuracy
Logistic Regression 91.667
Support Vector Machine 92.857
Decision Tree 64.286
Bayes Classifier 78.571
K-Nearest Neighbors 92.857
Random Forest 64.286
Based on the above data, the highest accuracy, 92.85714286%, is achieved by K-Nearest Neighbors (KNN) and Support Vector Machine (SVM); the accuracies of the classification techniques were again obtained by testing them on our own handwritten dataset.
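A sketch of how such a comparison could be produced with scikit-learn is shown below; X and y stand for the extracted symbol features and labels of the handwritten dataset, and the split ratio is an assumption since the exact protocol is not given in the text.

```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier

def compare_classifiers(X, y):
    """Train each classifier on the same split and return its test accuracy."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    models = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Support Vector Machine": SVC(),
        "Decision Tree": DecisionTreeClassifier(),
        "Bayes Classifier": GaussianNB(),
        "K-Nearest Neighbors": KNeighborsClassifier(),
        "Random Forest": RandomForestClassifier(),
    }
    return {name: m.fit(X_tr, y_tr).score(X_te, y_te) for name, m in models.items()}
```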
6 Conclusion
The system eliminates an individual's reliance on human sources for the solution of an equation. It helps the user to accurately recognize the various symbols and digits in an equation and to solve it accurately, which also saves time.
The system aims to use modern technology to solve everyday problems and to help students learn in an enjoyable way. It is not only helpful for students but can also help elderly people who are not comfortable with arithmetic. Anyone who knows how to use a cell phone can easily use it by simply taking a picture and viewing the output.
Hence, the system is very beneficial for learning purposes and bridges the gap between education and technology.
Based on our method, it can be concluded that the highest accuracy was achieved by using K-Means as the clustering algorithm and SVM and KNN as the classification algorithms. This paper covers both the application and the theoretical aspects of the research carried out to find the most suitable segmentation and classification techniques. It differs from earlier work in that we have compared all the candidate techniques in order to identify the most accurate one.
References
1. Lu, C., Mohan, K.: Recognition of online handwritten mathematical expressions using
convolutional neural networks (2015)
2. Matsakis, N.E.: Recognition of handwritten mathematical expressions. MIT Published (1999)
3. Kamavisdar, P., Saluja, S., Agrawal, S.: A survey on image classification approaches and
techniques. Int. J. Adv. Res. Comput. Commun. Eng. 2(1), 1005–1009 (2013)
4. Abderaouf, Z., Nadjia, B., Saliha, O.K.: License plate character segmentation based on
horizontal projection and connected component analysis. In: 2014 World Symposium on
Computer Applications & Research (WSCAR), pp. 1–5. IEEE (2014)
5. Jakjoud, W., Lazrek, A.: Segmentation method of offline mathematical symbols. In: 2011
International Conference on Multimedia Computing and Systems (ICMCS), pp. 1–7. IEEE
(2011)
6. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep CNNs. Adv.
Neural Inf. Process. Syst. (2012)
7. Wang, H., Wang, Y., Lu, L., Liu, J., Li, S., Zhang, Y.: An improved algorithm for symbol
segmentation of mathematical formula images. In: 2016 16th International Symposium on
Communications and Information Technologies (ISCIT), pp. 461–464. IEEE (2016)
8. Pauly, L., Hogg, D., Fuentes, R., Peel, H.: Deeper networks for pavement crack detection. In:
Proceedings of the 34th ISARC, pp. 479–485. IAARC (2017)
9. Gharde, S.S., Baviskar, P.V., Adhiya, K.P.: Identification of handwritten simple mathematical
equation based on SVM and projection histogram. Int. J. Soft Comput. Eng. 3(2), 425–429
(2013)
10. Casey, R.G., Lecolinet, E.: A survey of methods and strategies in character segmentation.
IEEE Trans. Pattern Anal. Mach. Intell. 18(7), 690–706 (1996)
11. Literature Review on Handwritten Mathematical Expression Recognition, Chapter 2
12. Zhou, B., Lapedriza, A., Xiao, J., Torralba, A., Oliva, A.: Learning deep features for scene
recognition using places database. Adv. Neural Inf. Process. Syst. 487–495 (2014)
13. Wang, N., Li, S., Gupta, A., Yeung, D.Y.: Transferring rich feature hierarchies for robust
visual tracking. arXiv preprint arXiv:1501.04587 (2015)
14. 2018 Joint 7th International Conference on Informatics, Electronics & Vision (ICIEV) and
2018 2nd International Conference on Imaging, Vision & Pattern Recognition (icIVPR)
15. Handwritten Mathematical Expressions Recognition using Back Propagation Artificial Neural
Network
16. Dong, C., Loy, C.C., He, K., Tang, X.: Learning a deep convolutional network for image
super-resolution. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) Computer Vision
(ECCV 2014). LNCS, vol. 8692 pp. 184–199. Springer, Cham (2014). https://doi.org/10.
1007/978-3-319-10593-2_13
17. Ciregan, D., Meier, U., Schmidhuber, J.: Multi-column deep neural networks for image
classification. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition,
pp. 3642–3649. IEEE (2012)
18. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple
way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(1), 1929–1958
(2014)
Unveiling the Next Frontier of AI
Advancement
Advancing Image Classification Through
Self-teachable Machine Models
and Transfer Learning
1 Introduction
The application of deep convolutional neural networks [25] (DCNNs), has wit-
nessed remarkable advancements, leading to significant breakthroughs in various
domains, most notably in image classification. DCNNs have demonstrated their
prowess in automatically learning intricate mappings from input data to out-
put labels. Their ability to process complex, high-dimensional data, like images,
has revolutionized pattern recognition systems and enabled the development of
increasingly accurate and efficient solutions.
However, despite their exceptional performance in well-curated datasets,
DCNNs often face a substantial challenge when confronted with real-world sce-
narios characterized by data that significantly deviates from the training distri-
bution. This limitation has spurred the pursuit of solutions that can enhance
the generalization capabilities of deep neural networks, by using the technique
of transfer learning which is a subfield of machine learning.
This research paper ventures into the realm of transfer learning and AutoML
within the context of image classification, employing scalable DCNNs. Further-
more, the growth of neural networks in depth introduces an intriguing and unex-
pected phenomenon: the degradation problem. As network depth increases, the
accuracy initially reaches a saturation point but subsequently declines, a phe-
nomenon distinct from overfitting. This raises questions regarding the ease of
optimizing neural networks, suggesting that not all networks are created equal.
The paper introduces a novel deep residual learning framework to tackle this
challenge, fundamentally altering how networks are trained. Instead of expect-
ing each layer to model the desired mapping directly, residual learning allows
these layers to focus on fitting a residual mapping, mitigating the vanishing
gradient problem with the introduction of shortcut connections. This approach
facilitates the construction of extraordinarily deep networks while maintaining
or even enhancing training and generalization performance.
Additionally, we explore its significance in medical image classification, where
interpretability and reliability are paramount. This paper also addresses the
practical techniques of knowledge transfer, emphasizing the role of transfer learn-
ing [19] in real-world scenarios, where adaptability to new conditions is indis-
pensable. The innovative concept of Google’s Teachable Machine [4] empowers
individuals, irrespective of their technical expertise, to train machine learning
models through intuitive interactions. The potential to democratize AI, enabling
custom models for image classification, sound recognition, and gesture control,
opens doors to uncharted possibilities and challenges. The motivation behind
this research lies in the recognition that there is a significant need to expand
and refine the capabilities of teachable machine-based models [15], making AI
more accessible and inclusive.
2 Related Work
In the past years, significant progress has been made in the fields of [33] trans-
fer learning, image classification using Convolutional Neural Networks, efficient
deep learning neural [33] networks, and efficient image processing. This section
provides an overview of the key contributions in these areas.
Transfer learning, the process of leveraging knowledge learned from one
domain to improve performance in another, has garnered substantial attention.
Notable works in this field include Pan and Yang (2010) who introduced transfer
component analysis, and more recently, Devlin et al. (2018) [36] demonstrated
the effectiveness of pre-trained language models, such as BERT [36], in a wide
range of natural language [36] understanding tasks. In the realm of computer
vision, the study by Yosinski et al. (2014) [35] laid the foundation for under-
standing transferability in deep convolutional networks. These breakthroughs
have set the stage for the development of more efficient and accurate transfer
learning techniques.
Convolutional Neural Networks (CNNs) have been [35] instrumental in image
classification tasks. Krizhevsky et al.’s AlexNet (2012) [17] marked a turning
point by significantly reducing error rates on the ImageNet dataset [26]. Sub-
sequent work by He et al. (2016) [12] introduced ResNet, which addressed the
vanishing gradient problem and enabled the training of exceptionally deep net-
works. More recently, Tan and Le (2019) [31] presented EfficientNet, a family
of models that demonstrated state-of-the-art [31] performance with fewer param-
eters, emphasizing the need for efficient architectures.
Efficiency in deep learning models has become paramount, especially with
the growing demand for edge and mobile device deployment. Howard et al. (2019)
proposed MobileNetV3, a family of lightweight neural networks that bal-
ance accuracy and speed, making them ideal for resource-constrained applica-
tions. Another noteworthy development is the work by Sandler et al. (2018) [27]
on MobileNetV2, which introduced novel depthwise separable convolutions to
reduce computational complexity. These efficient neural network architectures
have enabled real-time image processing on edge devices.
Efficient image processing techniques [32] play a pivotal role in enhancing the
overall performance of computer vision systems. Recent advancements include
the use of quantization and pruning methods for neural networks, as demon-
strated by Han et al. (2015) and Li et al. (2017). Additionally, hardware accel-
eration technologies like NVIDIA’s TensorRT (2017) and Google’s Edge TPU
(2018) have enabled efficient inference on specialized hardware, further optimiz-
ing image processing pipelines.
3 Proposed Model
Automated machine learning (AutoML) has arisen as a transformative solution that alleviates the intricate process of ML development. Its primary goal is to minimize the need for in-depth data science expertise, namely statistical and ML proficiency. Essentially, AutoML streamlines the construction of an ML pipeline within a constrained computational environment. The evolution of AutoML has
within a constrained computational environment. The evolution of AutoML has
resulted in comprehensive systems that seamlessly integrate various techniques,
providing user-friendly, end-to-end ML solutions. This movement is evident with
major tech firms, like Google, launching platforms such as Cloud AutoML, which
are geared towards empowering individuals with limited ML expertise to develop
top-tier custom models.
Amidst this backdrop, we introduce ‘Horizon’, an AutoML-centric desktop application designed for binary image classification. Conceptualized using the Tkinter library for its graphical interface, Horizon distinguishes itself by its inherent simplicity. It permits users to supply their own image sample directories
inherent simplicity. It permits users to supply their own image sample directories
for training, streamlining the often tedious task of image classification, especially
when only limited information is available for model implementation. By harness-
ing the power of deep learning and transfer learning, Horizon allows for versatile
model training on datasets as varied as ’cat-dog’ differentiation or even ‘facial
mask detection’. Once the model is trained for a binary classification task, it
becomes a permanent tool, primed for future usage and testing. The overarching
objective is to offer a scalable, user-friendly solution that bridges the expertise
gap, making cutting-edge machine learning accessible to a broader audience.
3.1 Datasets
The images in the dataset have been aggregated from various online repositories,
ensuring a diverse and representative sample for each classification challenge.
Unlike traditional models that demand meticulously curated and pre-processed
datasets, Horizon is architecturally designed to be dataset-agnostic, allowing
users the flexibility to upload raw, unprocessed data. Recognizing the challenges
posed by data variability and the inconsistency of real-world images, Horizon incorporates advanced image augmentation, resizing, normalization and color adjustments, ensuring the model's robustness and adaptability across varied scenarios while obviating the need for users to engage in any manual preprocessing. This design philosophy underscores Horizon's commitment to user-centricity, allowing individuals, irrespective of their technical proficiency, to harness the power of advanced machine learning without the intricacies of data curation.
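A minimal sketch of such on-the-fly augmentation, resizing and normalization with Keras is shown below; the directory name, augmentation parameters and validation split are illustrative assumptions rather than Horizon's actual configuration.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rescale=1.0 / 255,        # normalization
    rotation_range=15,        # simple augmentations for robustness
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True,
    validation_split=0.2,     # hold out part of the data for validation
)

train_data = datagen.flow_from_directory(
    "user_images/", target_size=(224, 224), batch_size=32,
    class_mode="categorical", subset="training",
)
val_data = datagen.flow_from_directory(
    "user_images/", target_size=(224, 224), batch_size=32,
    class_mode="categorical", subset="validation",
)
```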
Learning Rate: The learning rate, set to 0.0001, determines the step size during gradient descent optimization [24].
Number of Layers: There are a total of four types of layers: Input (implicit),
ResNet50 (not explicitly added), Flatten (explicit), and Dense (two explicit lay-
ers). The ResNet50 layers are considered part of the pre-trained model and are
not explicitly added.
Batch Size: The batch size is configured to 32 in the code. It defines the number
of data samples used in each training iteration.
4 Model Architecture
The model architecture is built upon a sequential design, which signifies a linear stacking of layers, ensuring that data flows from one layer to the next without branching or skipping, and it leverages the prowess of the renowned ResNet50 model as
its foundation. This design choice harnesses the deep convolutional capabilities
of ResNet50, known for its exceptional ability to extract intricate features from
images through its deep layers. By employing the sequential structure, the model
can efficiently stack layers and operations in a linear flow, ensuring a streamlined
processing pipeline. This strategic integration ensures that the foundational lay-
ers capture generalized image features, allowing the subsequent custom layers to
focus on task-specific nuances, thereby facilitating the development of a robust
and highly capable image classification model.
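The following sketch assembles such a model with the hyperparameters listed earlier (learning rate 0.0001, ResNet50 base, one Flatten layer and two Dense layers). The Dense widths, the two-class softmax output and freezing the pre-trained base are assumptions made for illustration.

```python
from tensorflow.keras import Sequential
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.optimizers import Adam

# Pre-trained ResNet50 foundation; its internal layers are not added explicitly
# and are kept frozen in this sketch so only the custom head is trained.
base = ResNet50(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False

model = Sequential([
    base,                               # generalized feature extraction
    Flatten(),                          # explicit Flatten layer
    Dense(128, activation="relu"),      # first explicit Dense layer (width assumed)
    Dense(2, activation="softmax"),     # second Dense layer: two-class probabilities
])

model.compile(optimizer=Adam(learning_rate=0.0001),  # learning rate from Sect. 3.1
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```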
As model training time grows, especially [20] with extensive datasets and complex models, efficiency becomes paramount. EarlyStopping plays a critical role in this regard.
Instead of training a model for a fixed number of epochs, regardless of its conver-
gence status, EarlyStopping monitors a specified performance metric and halts
training once the metric stops showing significant improvement. This not only
saves computational resources but also prevents overfitting, as prolonged training
past the point of optimal performance can lead the model to fit [10] too closely
to the training data, reducing its generalization capability on new, unseen data.
In the context of AutoML, where the aim is to rapidly and efficiently produce
models that are ready for deployment, incorporating strategies like EarlyStop-
ping ensures that models are trained just enough to achieve optimal performance
without unnecessary computational overhead or risking overfitting.
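In Keras this behaviour is obtained with an EarlyStopping callback, as in the sketch below; the monitored metric and patience value are illustrative choices, not values stated in the text.

```python
from tensorflow.keras.callbacks import EarlyStopping

# Halt training once the monitored metric stops showing significant improvement.
early_stop = EarlyStopping(
    monitor="val_loss",
    patience=3,
    restore_best_weights=True,  # keep the best weights seen before the plateau
)
```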
6 Training Procedure
During the training phase, the model [38] undergoes training on the provided
dataset, simultaneously validating its performance using a separate validation
set. This training is undertaken over several epochs, with the exact number
determined by an EarlyStopping criterion. This approach ensures that train-
ing is halted once the model’s performance plateaus or begins to degrade, thus
preventing overfitting and unnecessary computational overhead.
After the training concludes based on the EarlyStopping criteria, the model is
saved to local storage as an H5 file. The choice of the .h5 format, also referred to
as HDF5 (Hierarchical Data Format version 5), stems from its wide acceptance in
the machine learning community for its capability to store intricate hierarchical
data structures efficiently. It’s a go-to format for preserving and subsequently
loading trained machine learning models, particularly neural networks. Upon
successful saving of the model, a confirmation message is displayed, indicating
that the model is now available for future deployment, use, or further evaluation.
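A sketch of this training-and-saving step is given below; model, train_data, val_data and early_stop are the objects from the earlier sketches, and the epoch limit and file name are illustrative.

```python
history = model.fit(
    train_data,
    validation_data=val_data,
    epochs=50,                  # upper bound; EarlyStopping usually ends training sooner
    callbacks=[early_stop],
)

model.save("horizon_model.h5")  # persist the trained model in HDF5 format
print("Model trained and saved for future deployment.")
```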
The model, saved previously in the HDF5 file format, is loaded to make
predictions on the processed image. The prediction output is a probability dis-
tribution across the classes, which is subsequently interpreted. If the prediction
for the first class exceeds 0.5, it is selected; otherwise, the second class is chosen.
Additionally, the function showcases the confidence of the prediction by dis-
playing the maximum probability from the prediction array. The result is then
displayed on the user interface, showing the predicted class label along with its
confidence score. Moreover, a visual representation of the input image is provided
for reference.
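The prediction step might look like the sketch below; the file names, class names and pixel scaling are assumptions, while the 0.5 decision rule and the confidence taken as the maximum probability follow the description above.

```python
import numpy as np
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing import image

model = load_model("horizon_model.h5")           # reload the saved HDF5 model
class_names = ["class_a", "class_b"]             # e.g. ["cat", "dog"]

img = image.load_img("uploaded.jpg", target_size=(224, 224))
x = image.img_to_array(img)[np.newaxis] / 255.0  # batch of one, normalized

probs = model.predict(x)[0]                      # probability distribution over the classes
label = class_names[0] if probs[0] > 0.5 else class_names[1]
confidence = float(np.max(probs))                # maximum probability shown as confidence
print(f"{label} ({confidence:.2%})")
```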
The classification outcome, presented in the form of a class label and a con-
fidence score, gives users immediate feedback on the model’s prediction for the
uploaded image. Through this GUI-based approach, the application aims to sim-
plify complex deep learning workflows, making it accessible even to those without
extensive machine learning expertise (Fig. 4).
8 Discussion
References
1. Ahmad, A., Saraswat, D., El Gamal, A.: A survey on using deep learning techniques
for plant disease diagnosis and recommendations for development of appropriate
tools. Smart Agric. Technol. (2023)
2. Alhashim, I., Wonka, P.: High-quality monocular depth estimation via transfer
learning (2018)
3. Ashwath, V.A., Sikha, O.K., Benitez, R.: TS-CNN: a three-tier self-interpretable
CNN for multi-region medical image classification. IEEE Access 11 (2023)
4. Abou Baker, N., Zengeler, N., Handmann, U.: A transfer learning evaluation of
deep neural networks for image classification. Mach. Learn. Knowl. Extract. 4,
22–41 (2022)
5. Chen, W., Su, L., et al.: Rock image classification using deep residual neural net-
work with transfer learning. Front. Earth Sci. (2023)
6. Cirstea, B.-I.: Contributions to handwriting recognition using deep neural networks
and quantum computing (2018)
7. De Giacomo, G., Catala, A., et al. (eds.): Frontiers in Artificial Intelligence and
Applications, vol. 325 (2020)
8. Han, S., Mao, H., Dally, W.J.: Deep compression: compressing deep neural net-
works with pruning, trained quantization and Huffman coding. arXiv:1510.00149
[cs.CV] (2015)
9. He, K., Sun, J.: Convolutional neural networks at constrained time cost. In: CVPR
(2015)
10. He, K., Zhang, X., Ren, S., Sun, J.: Spatial pyramid pooling in deep convolutional
networks for visual recognition. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T.
(eds.) ECCV 2014. LNCS, vol. 8691, pp. 346–361. Springer, Cham (2014). https://
doi.org/10.1007/978-3-319-10578-9_23
11. He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: Surpassing human-
level performance on ImageNet classification. In: ICCV (2015)
12. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition.
arXiv preprint arXiv:1512.03385 (2016)
13. He, K., Zhang, X., Shaoqing, R., Sun, J.: Deep residual learning for image recog-
nition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition.
Microsoft Research (2016)
14. He, X., Zhao, K., Chu, X.: AutoML: a survey of the state-of-the-art.
arXiv:1908.00709v6 [cs.LG] (2021)
15. Hilal, A.M., Al-Wesabi, F.N., et al.: Deep transfer learning based fusion model for
environmental remote sensing image classification model. Eur. J. Remote Sens. 55,
12–23 (2022)
16. Hinton, G.E., Srivastava, N., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.R.:
Improving neural networks by preventing co-adaptation of feature detectors.
arXiv:1207.0580 (2012)
17. Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by
reducing internal covariate shift. In: ICML (2015)
18. Jin, H., Song, Q., Hu, X.: Auto-keras: an efficient neural architecture search system.
In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge
Discovery & Data Mining, KDD 2019, Anchorage, AK, USA, pp. 1946–1956. ACM
(2019)
19. Kocmi, T., Bojar, O.: Trivial transfer learning for low-resource neural machine
translation. In: Proceedings of the 3rd Conference on Machine Translation 2018
(2018). Accepted to WMT18 research paper
20. Koziarski, M.: Two-stage resampling for convolutional neural network training in
the imbalanced colorectal cancer image classification. In: 2021 International Joint
Conference on Neural Networks (IJCNN), pp. 1–8 (2021)
21. Kraft, D., Bieber, G., Jokisch, P., Rumm, P.: End-to-end premature ventricular
contraction detection using deep neural networks. Sensors 23, 8573 (2023)
22. Krishna, S.T., Kalluri, H.K.: Deep learning and transfer learning approaches for
image classification. Int. J. Recent Technol. Eng. 7, 427–432 (2019)
23. LeCun, Y., et al.: Backpropagation applied to handwritten zip code recognition.
Neural Comput. 1(4), 541–551 (1989)
24. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to
document recognition. Proc. IEEE 86(11), 2278–2324 (1998)
25. Rawat, W., Wang, Z.: Deep convolutional neural networks for image classification.
Neural Comput. 29(9), 1–98 (2017)
26. Russakovsky, O., et al.: ImageNet large scale visual recognition challenge. Int. J.
Comput. Vision (IJCV) 115, 211–252 (2015)
27. Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., Chen, L.-C.: MobileNetv 2:
inverted residuals and linear bottlenecks. arXiv:1801.04381 [cs.CV] (2018)
28. Sanida, T., Sideris, A., Sanida, M.V., Dasygenis, M.: Tomato leaf disease identifi-
cation via two-stage transfer learning approach. Smart Agric. Technol. 5, 100275
(2023)
29. Shao, S., McAleer, S., Yan, R., Baldi, P.: Highly accurate machine fault diagnosis
using deep transfer learning. IEEE Trans. Industr. Inf. 15(4), 2446–2455 (2019)
30. Szegedy, C., et al.: Going deeper with convolutions. In: 2015 IEEE Conference on
Computer Vision and Pattern Recognition (2015)
31. Tan, M., Le, Q.V.: EfficientNet: rethinking model scaling for convolutional neural
networks. arXiv:1905.11946v5 [cs.LG] (2020)
32. Vo, A.T., Tran, H.S., Le, T.H.: Advertisement image classification using convolu-
tional neural network. In: 2017 9th International Conference on Knowledge and
Systems Engineering (KSE). IEEE (2017)
33. Bjørn von Rimscha, M. (ed.): Management and Economics of Communication. De
Gruyter Mouton, Berlin, Boston (2020)
34. Xu, Z., Yu, H., Zheng, K., Gao, L., Song, M.: A novel classification framework
for hyperspectral image classification based on multiscale spectral-spatial convo-
lutional network. In: 2021 11th Workshop on Hyperspectral Imaging and Signal
Processing: Evolution in Remote Sensing (WHISPERS). IEEE (2021)
35. Yosinski, J., Clune, J., Bengio, Y., Lipson, H.: How transferable are features in
deep neural networks? arXiv:1411.1792 [cs.LG] (2014)
36. Zhang, D., et al.: Domain-oriented language modeling with adaptive hybrid mask-
ing and optimal transport alignment. In: Proceedings of the 27th ACM SIGKDD
Conference on Knowledge Discovery & Data Mining. ACM (2021)
37. Zhao, Q., Zhang, L., He, B., Liu, Z.: Semantic policy network for zero-shot object
goal visual navigation. IEEE Robot. Autom. Lett. 8(11), 7655–7662 (2023)
38. Zhao, Q., Zhang, L., He, B., Qiao, H., Liu, Z.: Zero-shot object goal visual naviga-
tion. In: 2023 IEEE International Conference on Robotics and Automation (ICRA),
pp. 2025–2031 (2023)
Analysis Effect of K Values Used in K Fold Cross
Validation for Enhancing Performance
of Machine Learning Model with Decision Tree
1 Introduction
Evaluating the performance of a machine learning model is much like evaluating the scores of students in schools, colleges and universities, which are used to decide whether a student is eligible for admission to a preferred course or for selection in campus interviews. The score indicates that the applicant has consistently performed well, and we expect the same of every machine learning model. When a machine learning model is tested, we expect it to achieve the intended results for prediction, forecasting or classification. There are a number of ways and mathematical formulas available to analyze and evaluate the accuracy of an ML model. The ML engineer has the responsibility to develop a generalized ML model so that the expected performance is obtained, and to satisfy customers by adding key benefits for the business [2, 13]. Every ML model's performance falls in some range of numeric values; for example, a developed model may show a performance between 85–90% during training and testing. In many cases the model shows the same accuracy in both training and testing, and sometimes it does not. Accuracy itself is just a numerical value that summarizes how well the model predicts unknown data for the given problem, so that we can correctly interpret the predictions. It is better to use different combinations of data when training an ML model [14, 20].
Fig. 1. Traditional approach: the dataset (100%) is split and the larger part (70%) is used to train the ML model.
Figure 1 shows the working of the traditional approach, which is based on two common splitting criteria: the data is split either 70:30 or 80:20. The major part of the split data is used to train the model and the smaller part is used to test it. Due to this traditional approach the model suffers from several problems, some of which are [8, 9, 15]:
• Problem with splitting: To achieve good accuracy and better performance, only the large part of the data is used for training; even when the dataset is large, the model never sees the nature of the data in the second part, and likewise only one part of the data is used for testing. This is a reason why models suffer from overfitting.
• Problem of generalization: Because the model is trained on a large portion and tested on a small portion of the data, it cannot be generalized.
• Reduced number of samples for training: Reducing the number of samples used for training reduces what the model can learn, so we cannot train a model with a smaller number of records.
Our objectives in this paper include applying K fold cross validation by splitting the data into a number of folds for training and testing, applying distinct values of k to determine whether the model is over- or under-fitted, evaluating the model and determining its accuracy, and finally determining whether the model generalizes well to the data. By using k-fold cross-validation, we can “test” the model on k different data sets, which helps to ensure that the model is generalizable.
Fig. 2. Select the 1st record for testing; the remaining records are used for training
Fig. 3. Select the 2nd record for testing; the remaining records are used for training
This approach is time consuming and is only suitable where the number of records in the dataset is limited. Its main disadvantage is that it requires a large number of iterations, depending on the number of records.
3 Literature Survey
We have studied several research papers related to our research work.
In 2015 R. Nithya et al. proposed a Bayes model using 10-fold cross validation on a heart disease patient dataset. They compared three classifiers, namely Naïve Bayes, Bayes Net and Multinomial, for their analysis. They applied the cross-validation parameter on the heart disease patient dataset and estimated the performance of the algorithms, using more than 25 different attributes from the dataset. Through experimental analysis they observed that the NB algorithm performed better than the other algorithms [1].
In 2016 Babatunde R. S. et al. proposed an approach based on random partitioning with K fold CV and applied it to face recognition. They explained the importance of face recognition and described applications where it plays an active and important role, such as pattern recognition, computer vision, credit cards, passports and security. They tested the proposed technique on randomly partitioned face data using quantitative experimental analysis [2]. In 2017 Max A. Little et al. discussed the strategies behind the CV technique. They applied a new idea, creating small groups of data sets, and used it in ML to predict diseases that have subject-specific features, constructing different cases with subject-specific variations [3]. In 2018 Daniel Berrar explained cross-validation and related data resampling approaches, covering resampling with K-fold CV, random subsampling, how the learning set is applied, the problem of overfitting, how to estimate prediction error, and the resubstitution error and validation set. They also used CV to address the overfitting problem and explained that the performance estimate of CV is inaccurate when small samples of data are used [4].
In 2018 Yoonsuh Jung proposed an approach based on K fold CV for model selection. They note that K-fold CV is a commonly accepted technique in model selection and that construction of the current model is more important than its validation. They proposed a CV technique in which (K − 1) folds are used for validation and the other fold is used for construction, and they provide some directions for choosing the right value of K [5]. In 2019 Nagadevi Darapureddy et al. studied ML algorithms with K-fold CV. They applied K fold CV for training and testing different ML algorithms such as logistic regression, decision tree, SVM and KNN, using a UCI dataset for implementation, and compared the accuracy of all these classifiers [6]. In 2019 Andrius Vabalas et al. proposed a technique based on a limited sample size. They applied small sample sizes to ML models to predict individuals with higher classification accuracy, used simulations to show K-fold performance when small sample sizes are used, showed that nested CV and train/test split techniques produce robust performance, and suggested that robust testing techniques can be designed with a small dataset [7]. In 2020 Sitefanus Hulu et al. presented a performance analysis of KNN using K fold CV. They showed that when data sharing is applied, CV provides a better percentage. They tested K-NN classifiers on the iris data set, varied the test values and found 100% accuracy, and by varying the value of K for K-Nearest Neighbor as well as the K of K-fold CV they found 95% accuracy [8].
In 2021 Muhammad Asrol et al. proposed an SVM with K fold CV for the industry's sustainability performance. They used SVM to create an ML model which assesses and classifies the industry's sustainability performance.
4 K Fold CV Approach
4.1 CV Process
While training a model we always require a good ratio of data for training as well as testing. Our prime need is to train the model with a large portion of the data, otherwise the model will fail to understand and recognise the trend of the data, which ultimately produces a higher bias. Figure 5 shows the working of 5-fold cross validation [16, 17].
Multiple repetitions of the training and testing process are required; this helps to validate the model effectively. In K-fold CV there are a few things that need to be taken care of, and the approach has the following steps.
1. The whole dataset is split randomly into k folds.
2. Use k−1 folds to train and build the model. To evaluate effectiveness, test the model using the kth fold.
3. For every iteration, record the accuracy and the predicted values of the test fold.
4. Repeat this process until every one of the k folds has served as the test fold.
5. Average the results over all iterations and generate a performance metric which shows the overall performance of the model.
Standard guidelines to follow while selecting the k value, i.e., the number of folds, are:
• If the size of the data is decent, we can use 10 as the standard k value.
• When the size of the data is large, we use 5 as the value of k.
• When the size of the data is small, the number of folds needs to be increased; it is found that run time increases for large k values.
• When a dataset has only a few records, we need to apply the leave-one-out approach.
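A minimal sketch of the five steps above with a decision tree classifier is given below; X and y are assumed to be NumPy arrays holding the feature matrix and binary labels of the dataset, and the shuffling seed is illustrative.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

def kfold_evaluate(X, y, k=10):
    """K-fold CV with a decision tree; returns the average of the four metrics."""
    kf = KFold(n_splits=k, shuffle=True, random_state=42)    # step 1: split into k folds
    scores = []
    for train_idx, test_idx in kf.split(X):                  # steps 2-4: rotate the test fold
        model = DecisionTreeClassifier().fit(X[train_idx], y[train_idx])
        y_pred = model.predict(X[test_idx])
        scores.append({
            "accuracy": accuracy_score(y[test_idx], y_pred),
            "precision": precision_score(y[test_idx], y_pred),
            "recall": recall_score(y[test_idx], y_pred),
            "f1": f1_score(y[test_idx], y_pred),
        })
    # step 5: average the per-fold metrics
    return {m: np.mean([s[m] for s in scores]) for m in scores[0]}
```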
Fig. 7. First round: the first 200 records are used for testing and the remaining 800 for training data.
Fig. 8. Second round: the next 200 records are used for testing and the remaining 800 for training data.
Fig. 9. The process is repeated for the remaining rounds of testing and training.
• 4 GB RAM (main memory)
• Inbuilt HDD: 500 GB
7 Result Analysis
7.1 Performance Analysis Using 3 Folds
Comparing Accuracy Scores for 3 Folds. Table 1 shows that the accuracy is 0.88947368 for the first fold, 0.93684211 for the second fold and 0.9047619 for the third fold. The accuracy for the first and third folds is lower and for the second fold it is higher. The average accuracy of the model using 3 folds is 91%.
Comparing Precision Score for 3 Folds. Table 2 shows that the precision is 0.84722222 for the first fold, 0.94029851 for the second fold and 0.825 for the third fold. The precision for the first and third folds is lower and for the second it is higher. The average precision of the model using 3 folds is 0.870840. Figure 10 shows the graphical representation of accuracy and precision for 3 folds, comparing training and testing.
Comparing Recall Score for 3 Folds. Table 3 shows that the recall is 0.85915493 for the first fold, 0.88732394 for the second fold and 0.94285714 for the third fold. The recall for the first and second folds is lower and for the third it is very high. The average recall of the model using 3 folds is 0.896445.
Fig. 10. Comparing accuracy scores and precision scores for 3 folds (training vs. testing).
Comparing F1 Score for 3 Folds. Table 4 shows that the F1 score is 0.85314685 for the first fold, 0.91304348 for the second fold and 0.88 for the third fold. The F1 score for the first and third folds is lower and for the second fold it is higher. The average F1 score of the model using 3 folds is 0.882063. Figure 11 shows the graphical representation of recall and F1 score for 3 folds, comparing training and testing.
Fig. 11. Comparing recall and F1 scores for 3 folds (training vs. testing).
Comparing Accuracy Scores for 5 Folds. From Table 5 it is observed that the accuracy is lower for the first and second folds and very high for the third, fourth and fifth folds. The average accuracy of the model using five folds is 0.934994564.
Comparing Precision Score for 5 Folds. From Table 6 it is observed that the precision is higher for the first and second folds and lower for the third, fourth and fifth. The average precision of the model using 5 folds is 0.912429. Figure 12 shows the graphical representation of accuracy and precision for 5 folds, comparing training and testing.
Fig. 12. Comparing accuracy scores and precision scores for 5 folds (training vs. testing).
Comparing Recall Score for 5 Folds. From Table 7 it is observed that the recall is lower for the first and second folds and very high for the third, fourth and fifth folds. The average recall of the model using five folds is 0.9158361.
Comparing F1 Score for 5 Folds. From Table 8 it is observed that the F1 score is lower for the first and second folds and higher for the third, fourth and fifth folds. The average F1 score of the model using 5 folds is 0.912161. Figure 13 shows the graphical representation of recall and F1 score for 5 folds (Table 9).
Fig. 13. Comparing recall and F1 scores for 5 folds (training vs. testing).
Comparing Precision Score for 7 Folds. From Table 10 it is observed that the precision is low for the second and sixth folds and high for the remaining folds. The average precision of the model using seven folds is 0.9029988. Figure 14 shows the graphical representation of accuracy and precision for 7 folds.
Comparing Recall Score for 7 Folds. From Table 11 it is observed that the recall is lower for the second and third folds and high for the remaining folds. The average recall of the model using 7 folds is 0.905837173.
Fig. 14. Comparing accuracy scores and precision scores for 7 folds (training vs. testing).
Comparing F1 Score for 7 Folds. From Table 12 it is observed that the F1 score is lower for the second, third and sixth folds and higher for the remaining folds. The average F1 score of the model using 7 folds is 0.9030012. Figure 15 shows the graphical representation of recall and F1 score for 7 folds.
Fig. 15. Comparing recall and F1 scores for 7 folds (training vs. testing).
Fig. 16. Comparing accuracy scores and precision scores for 10 folds (training vs. testing).
Comparing Recall Score for 10 Folds. From Table 15 it is observed that the recall is low for the second and fourth folds and high for the remaining folds. The average recall of the model using ten folds is 0.92012987.
Comparing F1 Score for 10 Folds. From Table 16 it is observed that the F1 score is low for the second and fourth folds and high for the remaining folds. The average F1 score of the model using ten folds is 0.91732288. Figure 17 shows the graphical representation of recall and F1 score for 10 folds.
Fig. 17. Comparing recall and F1 scores for 10 folds (training vs. testing).
From the implementation and experimental analysis, we calculate the average metric values for all fold settings (3, 5, 7 and 10). From Fig. 18 we can see that the average accuracy is highest when the value of K is 10 (10 folds). From Fig. 19 we can see that the average precision is highest when the value of K is 10. From Fig. 20 we can see that the average recall is highest when the value of K is 10. Finally, from Fig. 21 we can see that the average F1 score is highest when the value of K is 10.
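The comparison across k values can be reproduced with a short scikit-learn sketch such as the one below, where X and y stand for the feature matrix and labels used throughout the experiments (the classifier settings are illustrative).

```python
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

def compare_k_values(X, y, k_values=(3, 5, 7, 10)):
    """Average cross-validated accuracy of a decision tree for each value of k."""
    return {k: cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=k).mean()
            for k in k_values}
```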
Fig. 18. Comparison of average accuracy (%) for different k values.
Fig. 19. Comparison of average precision (%) for different k values.
Fig. 20. Comparison of average recall (%) for different k values.
Fig. 21. Comparison of average F1 score for different k values.
All feature values are recorded with four significant digits. We have used different K values for K fold CV and four performance parameters: accuracy, precision, recall and F1 score. Through the implementation and experimental analysis we showed that K = 10 is much better compared to 3, 5 and 7. With K set to 10, the average accuracy, average precision, average recall and average F1 score all show that the performance of the classifier is better, so we conclude that it is better to use 10 as the K value for K fold CV. We also found that selecting K as 5 performs well in comparison with 3 and 7. Instead of the traditional 80:20 or 70:30 approach, the K fold CV approach should be used because it is a better way to achieve the highest performance of any ML model. In future we will consider some other classifiers and check whether the accuracy remains stable, apply some other real-life data sets to check the performance of K fold CV, and also consider the computational cost of different K values.
References
1. Nithya, R., Ramyachitra, D., Manikandan, P.: An efficient Bayes classifiers algorithm on 10-
fold cross validation for heart disease dataset. Int. J. Comput. Intell. Inform. 5(3), 229–235
(2015)
2. Babatunde, R.S., Olabiyisi, S.O., Omidiora, E.O., Ganiyu, R.A., Isiaka, R.M.: Assessing
the performance of random partitioning and k-fold cross validation methods of evaluation
of a face recognition system. In: The Ninth International Conference on Applications of
Information Communication Technologies to Teaching, Research and Administration, p. 129
(2015). https://doi.org/10.14738/aivp.36.1460
3. Little, M.A., Varoquaux, G., Saeb, S.: Using and understanding cross-validation strategies. GigaScience (2017). https://doi.org/10.1093/gigascience/gix020
4. Berrar, D.: Cross-validation. In: Encyclopedia of Bioinformatics and Computational Biology, vol. 1. Elsevier (2018)
5. Jung, Y.: Multiple predicting K-fold cross-validation for model selection. J. Nonparametric
Stat. 30(1), 197–215 (2018). https://www.tandfonline.com/loi/gnst20
6. Darapureddy, N., Karatapu, N., Battula, T.K.: Research OF machine learning algorithms using
K-fold cross validation. Int. J. Eng. Adv. Technol. 8(6), 215–218 (2019)
7. Vabalas, A., Gowen, E., Poliakoff, E., Casson, A.J.: Machine learning algorithm validation
with a limited sample size. PLoS ONE 14(11), e0224365 (2019)
8. Hulu, S., Sihombing, P.: Sutarman, analysis of performance cross validation method and
K-nearest neighbor in classification data. Int. J. Res. Rev. 7, 69–73 (2020)
9. Asrol, M., Papilo, P., Gunawan, F.E.: Support vector machine with K-fold validation to
improve the industry’s sustainability performance classification. Procedia Comput. Sci. 179,
854–862 (2021)
10. Phinzi, K., Abriha, D., Szabó, S.: Classification efficacy using k-fold cross-validation and
bootstrapping resampling techniques on the example of mapping complex gully systems.
Remote Sens. 13(15), 2980 (2021). https://doi.org/10.3390/rs13152980
11. Lin, Z., Lai, J., Chen, X., Cao, L., Wang, J.: Curriculum reinforcement learning based on
K-fold cross validation. Entropy 24(12), 1787 (2022)
12. Prusty, S., Patnaik, S., Dash, S.K.: SKCV: stratified K-fold cross-validation on ML classifiers
for predicting cervical cancer. Front. Nanotechnol. 4, 972421 (2022). https://doi.org/10.3389/
fnano.2022.972421
13. Suzuki, K., Kambayashi, Y., Matsuzawa, T.: CrossSiam: k-fold cross representation learning.
In: Proceedings of the 14th International Conference on Agents and Artificial Intelligence
(ICAART 2022), vol. 1, pp. 541–547 (2022). https://doi.org/10.5220/0010972500003116
14. Wieczorek, J., Guerin, C., McMahon, T.: K-fold cross-validation for complex sample surveys.
Stat 11(1), e454 (2022). https://doi.org/10.1002/sta4.454
15. Nti, I.K., Nyarko-Boateng, O., Aning, J.: Performance of machine learning algorithms with
different K values in K-fold cross-validation. J. Inf. Technol. Comput. Sci. 6, 61–71 (2021).
https://doi.org/10.5815/ijitcs.2021.06.05
16. Zhang, X., Liu, C.A.: Model averaging prediction by K-fold cross-validation. J. Econom.
235(1), 280–301 (2023)
17. Aghbalou, A., Sabourin, A., Portier, F.: On the bias of K-fold cross validation with stable
learners. In: International Conference on Artificial Intelligence and Statistics, pp. 3775–3794.
PMLR (2023)
18. Anguita, D., Ghelardoni, L., Ghio, A., Oneto, L., Ridella, S.: The ‘K’ in K-fold cross
validation. In: ESANN, pp. 441–446 (2012)
19. Wong, T.T.: Performance evaluation of classification algorithms by k-fold and leave-one-out
cross validation. Pattern Recognit. 48(9), 2839–2846 (2015)
20. Oyedele, O.: Determining the optimal number of folds to use in a K-fold cross-validation: a
neural network classification experiment. Res. Math. 10(1), 2201015 (2023)
The Forward-Forward Algorithm: Analysis
and Discussion
Abstract. This study explores the potential and application of the newly proposed
Forward-Forward algorithm (FFA). The primary aim of this study is to analyze
the results achieved from the proposed algorithm and compare it with the existing
algorithms. What we are trying to achieve here is to know the extent to which FFA
can be effectively deployed in any neural network and to investigate its efficacy
in producing results that can be compared to those generated by the conventional
Backpropagation method. For diving into a deeper understanding of this new
algorithm’s benefits and limitations in the context of neural network training, this
study is conducted. In the process of experimentation, the four datasets used are
the MNIST dataset, COVID-19 X-ray, Brain MRI and the Cat vs. Dog dataset.
Our findings suggest that FFA has potential in certain tasks in CV. However, it is
yet far from replacing the backpropagation for common tasks. The paper describes
the experimental setup and process carried out to understand the efficacy of the
FFA and provides the obtained results and comparative analysis.
1 Introduction
The field of neural networks has seen a rapid expansion in recent years, with researchers
exploring new algorithms and techniques for training these models. Gradient descent
computed using backpropagation is one of the most powerful techniques used in ML
and has proved highly effective in generating representations. However, from the brain
perspective, it is still unclear as to how real neurons implement it. There is no clear and
convincing evidence of how the errors are propagated through the layers via backprop-
agation. Further, the traditional methods require proper knowledge of the computations
performed in the forward pass for computing the correct derivatives. If the non-linearities
in the forward pass are unknown, then it is not plausible to compute the backpropagation
and carry the error back.
When a perfect representation and model of the forward pass are unavailable, rein-
forcement learning can be used. However, RL techniques suffer from high variance. These
methods do not scale well, and large networks with billions of parameters can’t compete
with the backpropagation approach.
To that end, Hinton [5] proposed the Forward-Forward algorithm (FFA) inspired
by the Boltzmann machine. The FFA may prove superior to backpropagation in two
areas: firstly, as a way to understand and model learning in the human brain, especially
the cortex and secondly, in order to utilize the very low-power analogue hardware.
This study aims to investigate the potential of the FFA for training neural networks.
Specifically, we seek to evaluate the algorithm’s performance on four different datasets:
the MNIST dataset [1], the COVID-19 chest X-ray dataset [2], the Brain tumor [3] image
dataset and the Cat vs Dog dataset [4]. By testing the algorithm on these datasets, we aim
to analyze and discuss our findings regarding the effectiveness of FFA across a range
of different applications and scenarios [6, 7]. In FFA the forward and backward passes
of backpropagation is replaced with 2 forward passes, one with negative data and the
other with positive data. The objective function for each layer is given as to have low
goodness for negative data and high goodness for positive data [8–10]. By following
this method, we can significantly boost the learning process and make the training of
the neural network with multiple layers easier. Firstly, the algorithm’s performance is
evaluated on multiple datasets to conclude that it achieves a high level of accuracy
in different given scenarios [11–15]. Secondly, the results achieved by the FFA will be
compared to the existing conventional backpropagation algorithm, which would enhance
the understanding of the difference between the two algorithms [5, 16]. And at last, we
tend to examine the potential of the FFA, which could contribute a lot to the training
approach of the neural networks. In a nutshell, this study demonstrates the potential of
this algorithm and how it can change the training of neural networks.
The paper is organized in 7 sections. Section 2 discusses methods, followed by
Sect. 3 discusses algorithmic formulation. In Sect. 4 system design is presented and in
Sect. 5 experimental setup is described. Section 6 presents the results followed by the
conclusion and future work in Sect. 7.
2 Methods
2.1 Datasets
Four benchmark datasets are used in the study. The datasets are selected as they align with
our study objective of evaluating the performance of the FFA across various applications
and scenarios. These datasets cover diverse tasks, allowing us to gain insights into the
algorithm’s effectiveness and potential benefits (Table 1).
Boltzmann machines [26] and Noise Contrastive Estimation [25] served as inspiration for
the greedy multi-layer learning process that makes up the FFA. The goal is to replace the
backward and forward passes of backpropagation with 2 identically functioning forward
passes that work on distinct data sets and have diametrically opposed goals. The positive
pass uses actual data and modifies the weights to improve each hidden layer’s quality. The
negative passes operate on the “negative data” and adjust the weights for decreasing the
goodness of every hidden layer. This research paper explores the two different measures
of goodness. However, many other measures are possible. Let’s assume that the sum
of the squares of the activity of the rectified linear neurons inside a layer serves as the
goodness function for that layer. The squared length of the activity vector is chosen
as the goodness function for two primary reasons. First, it has very simple derivatives.
Second, layer normalization removes all traces of the goodness function.
Learning aims to raise goodness much over a predetermined threshold value for
actual data and significantly lower it for negative data. More specifically, the likelihood
that an input vector is positive (i.e., real) is determined by applying the logistic function
to the goodness minus a threshold, and the goal is to accurately categorize the input
vectors as positive or negative, as shown in Eq. (1).
$p(\text{positive}) = \sigma\Bigl(\sum_{j} y_j^2 - \theta\Bigr)$   (1)
where yj is the activity of hidden unit j before the layer normalization. The neural net
would predict the negative data using a top-down connections approach.
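To make the per-layer objective concrete, the following is a minimal PyTorch sketch of a single layer trained with the goodness of Eq. (1). The threshold value, optimizer, learning rate, and the softplus form of the loss are illustrative assumptions, not the settings used in this study.

```python
import torch
import torch.nn as nn

class FFLayer(nn.Module):
    """One fully connected layer trained with the Forward-Forward objective."""

    def __init__(self, in_dim, out_dim, threshold=2.0, lr=0.03):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        self.relu = nn.ReLU()
        self.threshold = threshold                    # theta in Eq. (1), assumed value
        self.opt = torch.optim.Adam(self.parameters(), lr=lr)

    def forward(self, x):
        # Normalizing the length of the input vector removes the previous layer's
        # goodness, so each layer has to find its own evidence.
        x = x / (x.norm(dim=1, keepdim=True) + 1e-8)
        return self.relu(self.linear(x))

    def train_step(self, x_pos, x_neg):
        # Goodness = sum of squared ReLU activities (Eq. 1): push it above the
        # threshold for positive data and below it for negative data.
        g_pos = self.forward(x_pos).pow(2).sum(dim=1)
        g_neg = self.forward(x_neg).pow(2).sum(dim=1)
        loss = torch.log1p(torch.exp(torch.cat([
            self.threshold - g_pos,   # penalize low goodness on positive data
            g_neg - self.threshold,   # penalize high goodness on negative data
        ]))).mean()
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()
        # Detach the outputs so no gradient flows between layers: learning stays local.
        return self.forward(x_pos).detach(), self.forward(x_neg).detach()
```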
3 Algorithmic Formulation
The understanding of Boltzmann Machines served as the inspiration for the idea [8,
17]. In order to change the network weights, backpropagation measures the discrepancy
between the output that was anticipated and the output that actually occurred. On the
other hand, the FFA makes the analogy of neurons that become “excited” upon viewing
a specifically recognized pairing of a picture and its accurate corresponding label. The
biological learning process that takes place in the cortex serves as some of the motivation
for this approach. The fact that this strategy eliminates the requirement for backpropa-
gation over the network and makes weight adjustments local to the layer itself is a big
benefit [6, 18, 19].
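As a concrete illustration of such image-label pairings, the sketch below follows the label-embedding scheme Hinton describes for MNIST [5], overlaying a one-hot label on the first pixels of each flattened image. The exact encoding used for the other image datasets in this study is not stated, so this construction is an assumption.

```python
import numpy as np

def embed_label(images, labels, num_classes=10):
    """Overlay a one-hot label on the first `num_classes` pixels of each flattened image."""
    data = images.reshape(len(images), -1).astype(np.float32).copy()
    data[:, :num_classes] = 0.0
    data[np.arange(len(images)), labels] = data.max()    # a bright "label" pixel
    return data

def make_negative(images, labels, num_classes=10, seed=0):
    """Pair each image with a deliberately wrong label to create negative data."""
    rng = np.random.default_rng(seed)
    offsets = rng.integers(1, num_classes, size=len(labels))
    wrong = (labels + offsets) % num_classes              # guaranteed != true label
    return embed_label(images, wrong, num_classes)
```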
Algorithmic Formulation of the Forward-Forward Algorithm:
1. Initialize the network architecture:
• Define the number of layers, neurons in each layer, and the activation functions
for every neuron.
Figure 1 shows the architecture for the proposed approach. To evaluate the potential of
the FFA for training neural networks, the following pipeline was followed:
A. Dataset Selection and Preprocessing
1. Identifying and selecting appropriate datasets for evaluation, including the MNIST
dataset, Cat vs Dog dataset, brain tumor dataset, and COVID-19 dataset.
[Figure: pipeline of the proposed approach. Data selection & pre-processing → Model design and training → Implementation of FFA → Performance evaluation → Results analysis and interpretation → Limitations and future directions]
2. Preprocessing the datasets by normalizing the input data, handling missing values
(if any), and partitioning the data into training set, validation set, and testing set.
B. Implementation of the FFA [18]
1. Developing the necessary software infrastructure and libraries for implementing
the FFA.
2. Designing and implementing the FFA, including the forward pass with positive
data, the forward pass with negative data, and the objective functions for each
layer.
3. Incorporating any additional modifications or adaptations to the algorithm based
on prior research or specific requirements of the datasets (Table 2).
C. Model Architecture and Training [6, 19, 20] consists of deciding the suitable NN
architecture for an individual dataset along with the hyper-parameters. Then, the
model training using FFA and monitoring the model convergence, accuracy and
loss and overfitting/underfitting and finding appropriate solutions to manage them
(Table 3).
D. Performance Evaluation [8, 21, 22] using standard evaluation parameters namely, F1
score, accuracy etc. is carried out. The comparison of the results obtained using FFA
with the traditional methods (using backpropagation) is also carried out (Table 4).
This pipeline concludes with results analysis and interpretation, enabling us to draw meaningful conclusions and identify future
research directions.
5 Experimental Setup
For the CatVsDog dataset, the model did not perform well. It started converging
after ten epochs, and the slope remained constant. Therefore, to achieve better results, training
was stopped after 150 epochs, achieving an accuracy of 67.69% (Refer to
Fig. 3). The accuracy achieved on the Brain tumour dataset was 62.29% after running
the model for 150 epochs. The graph did not converge, so, to validate whether higher
accuracy could be achieved, the model was run for more than 300 epochs, but this decreased
the accuracy from 62.29% to 58.59% (Refer to Fig. 4).
The algorithm could not deliver on the most complex dataset, which was the COVID-
19 chest X-ray dataset; the graph for the loss started converging just after 80 epochs,
and the final accuracy was achieved as 45.23% (Refer Fig. 5).
References
1. Scellier, B., Bengio, Y.: Equilibrium propagation: bridging the gap between energy-based
models and backpropagation. Front. Comput. Neurosci. 11, 24 (2017)
2. Carandini, M., Heeger, D.J.: Normalisation as a canonical neural computation. Nat. Rev.
Neurosci. 13(1), 51–62 (2013)
3. Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A simple framework for contrastive learning
of visual representations. In: Proceedings of the 37th International Conference on Machine
Learning, pp. 1597–1607 (2020)
4. Chen, T., Kornblith, S., Swersky, K., Norouzi, M., Hinton, G.: Big self-supervised models
are strong semi-supervised learners. arXiv preprint arXiv:2006.10029 (2020)
5. Pereyra, G., Tucker, G., Chorowski, J., Kaiser, Ł., Hinton, G.: Regularising neural networks
by penalising confident output distributions. arXiv preprint arXiv:1701.06548 (2017)
6. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple
way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(1), 1929–1958
(2014)
7. Lillicrap, T., Santoro, A., Marris, L., Akerman, C., Hinton, G.E.: Backpropagation and the
brain. Nat. Rev. Neurosci. 21, 335–346 (2020)
8. Ren, M., Kornblith, S., Liao, R., Hinton, G.: Scaling forward gradient with local losses. arXiv
preprint arXiv:2210.03310 (2022)
9. Lillicrap, T., Cownden, D., Tweed, D., Akerman, C.: Random synaptic feedback weights
support error backpropagation for deep learning. Nat. Commun. 7(1), 13276 (2016)
10. Welling, M., Williams, C., Agakov, F.: Extreme components analysis. Adv. Neural Inf.
Process. 16 (2003)
11. Kendall, J., Pantone, R., Manickavasagam, K., Bengio, Y., Scellier, B.: Training end-to-end
analog neural networks with equilibrium propagation. arXiv preprint arXiv:2006.01981
(2020)
12. Krizhevsky, A., Hinton, G.: Learning multiple layers of features from tiny images (2009)
13. Lillicrap, T., Cownden, D., Tweed, D., Akerman, C.: Synaptic feedback weights support error
backpropagation for deep learning. Nat. Commun. 7 (2016)
14. Lillicrap, T.P., Santoro, A., Marris, L., Akerman, C.J., Hinton, G.: Backpropagation and the
brain. Nat. Rev. Neurosci. 21(6), 335–346 (2020)
15. Löwe, S., O’Connor, P., Veeling, B.: Putting an end to end-to-end: gradient-isolated learning
of representations. Adv. Neural Inf. Process. 32 (2019)
16. Rao, R., Ballard, D.: Predictive coding in the visual cortex: a functional interpretation of some
extra-classical receptive-field effects. Nat. Neurosci. 2, 79–87 (1999)
17. Richards, B.A., Lillicrap, T.P.: Dendritic solutions to the credit assignment problem. Curr.
Opin. Neurobiol. 54, 28–36 (2019)
18. Rosenblatt, F.: The perceptron: a probabilistic model for information storage and organisation
in the brain. Psychol. Rev. 65(6), 386 (1958)
19. Scellier, B., Bengio, Y.: Equilibrium propagation: bridging the gap between energy-based
models and backpropagation. Front. Comput. Neurosci. 11 (2017)
20. van den Oord, A., Li, Y., Vinyals, O.: Representation learning with contrastive predictive
coding. arXiv preprint arXiv:1807.03748 (2018)
21. Goodfellow, I., et al.: Generative adversarial nets. Adv. Neural Inf. Process. 2672–2680 (2014)
22. Grathwohl, W., Wang, K.-C., Jacobsen, J.-H., Duvenaud, D., Norouzi, M., Swersky, K.: Your
classifier is secretly an energy based model and you should treat it like one. arXiv preprint
arXiv:1912.03263 (2019)
23. Grill, J.-B., et al.: Bootstrap your own latent: a new approach to self-supervised learning.
arXiv preprint arXiv:2006.07733 (2020)
24. Guerguiev, J., Lillicrap, T.P., Richards, B.A.: Towards deep learning with segregated dendrites
(2017)
25. Gutmann, M., Hyvärinen, A.: Noise-contrastive estimation: a new estimation principle for
unnormalised statistical models. In: Proceedings of the Thirteenth International Conference
on Artificial Intelligence and Statistics, pp. 297–304 (2010)
26. Hinton, G.E., Sejnowski, T.J.: Learning and relearning in Boltzmann machines. Parallel
Distrib. Process.: Explor. Microstruct. Cogn. 1(282–317), 2 (1986)
Texture Feature Extraction Using Local Optimal
Oriented Pattern (LOOP)
Abstract. Various descriptors are preferred to extract the local features of an
image, including the Local Binary Pattern, the Local Directional Pattern, and the Local Opti-
mal Oriented Pattern. This paper provides a comparative analysis of the LBP and
Local Optimal Oriented Pattern (LOOP) descriptors for local feature extraction,
further used for various applications. While tracking an object from a video, the
provided input video is sampled into subsequent frames. To remove noise
and enhance each frame's contrast, a median filter is applied to every frame.
Local features of the image are then extracted from these filtered images using the
Local Optimal Oriented Pattern (LOOP). The results of the LOOP descriptor are compared with
the Local Binary Pattern (LBP) in terms of histogram and execution time. Exper-
imental analysis shows a comparison with the specified feature extraction method
in terms of execution time and accuracy.
1 Introduction
Object detection has become one of the most attractive research topics in recent
years, closely related to video analysis and image feature extraction. Object occlusion,
multiple objects with the same color, and degradation or changes in luminance are some of
the issues that pose challenges for object trackers. There is a need for
an effective algorithm that extracts features from the image irrespective of illumination
changes and object occlusion. Global features are extracted by considering
the whole image, whereas local features are extracted from local regions of the
image [8]. An image's local features include color, spatial, shape, and texture features.
The texture feature is widely used in various applications like facial feature extraction,
image matching, pedestrian and vehicle tracking, biomedical analysis, etc. For extracting
texture features, the Local Binary Pattern, Median Binary Pattern, Adaptive LBP, LBP
variance, Local Directional Pattern, and many more methods that are advances on the
basic LBP method were used earlier.
Texture feature extraction is a crucial step in image analysis and computer vision
applications. It involves the process of capturing essential information from images to
characterize their texture patterns. One advanced technique for texture feature extraction
is Local Optimal Oriented Pattern (LOOP). LOOP is a method that focuses on capturing
the local orientation information of texture patterns in an image.
In the context of texture analysis, understanding the orientation patterns within an
image is vital for various tasks such as object recognition, image segmentation, and tex-
ture classification. Traditional texture analysis methods often struggle to capture intricate
details and local orientations effectively. This is where LOOP comes into play.
LOOP operates by computing the optimal orientations of local image patches and
forming patterns based on these orientations. By considering the local optimal orienta-
tions, LOOP can represent texture patterns more accurately, especially in cases where
textures exhibit complex and fine-grained structures. These patterns can then be used as
feature vectors for further analysis and classification tasks.
In this discussion, we will explore the fundamentals of texture feature extraction
using Local Optimal Oriented Pattern. We will delve into the methodology, its appli-
cations, and how it enhances the accuracy and reliability of texture analysis in various
real-world scenarios. Understanding the intricacies of LOOP is essential for researchers,
engineers, and practitioners working in the fields of computer vision, image processing,
and pattern recognition, as it opens up new possibilities for improving the analysis of
textured images.
Chakraborti et al. proposed the binary descriptor LOOP (Local Optimal Oriented
Pattern), widely used for texture classification, as an enhancement over LBP,
LDP, and their variants [4]. This paper describes the Local Optimal Oriented Pattern
method that overcomes the drawbacks of the Local Binary Pattern and the Local Direc-
tional Pattern. The proposed method's performance is tested through experiments, and
the results are compared with LBP to assess its effectiveness.
The motivation behind the development of Local Optimal Oriented Pattern (LOOP)
stems from the limitations of these traditional methods. Conventional techniques, such as
statistical approaches and co-occurrence matrices, may fail to capture local variations and
orientation information effectively. In real-world scenarios, textures can exhibit intricate
directional patterns, subtle variations, and irregularities that are crucial for accurate
analysis. Failing to capture these nuances can lead to misclassifications and inaccurate
results in applications like medical image analysis, remote sensing, and industrial quality
control.
LOOP addresses this gap by focusing on capturing local orientation information
within texture patterns. The motivation behind LOOP lies in its ability to accurately
represent texture patterns by identifying the optimal orientations at a local level. Unlike
traditional methods that often overlook subtle directional cues, LOOP excels in capturing
fine details, ensuring a more nuanced and precise characterization of textures.
The rest of the paper describes the key fundamentals of the Local Optimal Oriented Pattern,
the methodology used to apply the LOOP descriptor to images, its comparison with exist-
ing methods, experimental results in terms of histogram and execution time required
to generate images after applying the LOOP descriptor, the conclusion, and references.
2 Fundamentals of Local Optimal Oriented Pattern
Local Optimal Oriented Pattern (LOOP) is an advanced texture feature extraction tech-
nique designed to capture intricate local orientation information within texture pat-
terns [3]. Understanding the fundamental concepts of LOOP is crucial for its effective
application in texture analysis. Here are the key fundamentals of LOOP:
LOOP operates by computing local orientations within small image patches. These
orientations are essential for capturing the directional information present in textures.
Techniques such as gradient analysis or Gabor filters are commonly employed to compute
the orientations of image gradients within the localized regions. By analyzing gradients,
LOOP identifies the dominant orientations, forming the basis for subsequent processing
steps.
One of the distinctive features of LOOP is its focus on optimal orientations. Instead of
considering all computed orientations, LOOP selects the orientations with the highest
contrast or energy within the local neighborhood. These optimal orientations represent
the most significant directional cues in the texture pattern.
Based on the optimal orientations, LOOP forms patterns that encode the local texture
structure. Patterns are created by comparing the orientations of neighboring pixels or
image features. The arrangement of these orientations is translated into unique codes or
descriptors, forming characteristic patterns for each localized region. These patterns are
designed to be robust against variations in lighting conditions, noise, and other imaging
challenges, making them suitable for diverse real-world applications.
The generated patterns are encoded into feature vectors, which serve as compact repre-
sentations of the local texture information. Various encoding schemes, such as binary
encoding or histograms, are employed to convert the patterns into numerical feature
vectors. These feature vectors encapsulate the essential texture characteristics and can
be used for tasks such as texture classification, object recognition, and image retrieval.
The encoded features effectively capture the nuances of the texture patterns, enabling
accurate analysis and classification.
LOOP is highly adaptable and allows for the adjustment of parameters to suit specific
application requirements. Parameters such as patch size, threshold values for optimal
orientation selection and encoding methods can be fine-tuned. This adaptability ensures
that LOOP can handle a wide range of textures and imaging conditions. By optimiz-
ing these parameters, practitioners can tailor LOOP to achieve optimal performance in
different contexts.
In summary, the fundamentals of Local Optimal Oriented Pattern involve the com-
putation of local orientations, selection of optimal orientations, formation of distinctive
patterns based on these orientations, and encoding of patterns into feature vectors. This
methodology, with its emphasis on local orientation details and adaptability, empowers
LOOP to excel in capturing fine-texture patterns, making it a valuable tool for advanced
texture analysis in the field of computer vision and image processing.
3 Methodology
Applying Local Optimal Oriented Pattern (LOOP) on images extracted from videos
involves a series of steps to process the frames and extract texture features. Here’s a
methodology to apply LOOP on images extracted from video data:
The flow diagram of the proposed method is depicted in Fig. 1. When it comes to
object tracking, having diverse and challenging video sequences is crucial for testing
and evaluating tracking algorithms. MOT, VOT, TrackingNet, LaSOT, OBT, are some
Video datasets suitable for object tracking tasks. Datasets must align with the specific
challenges that tracking algorithm needs to address, such as occlusions, scale variations,
fast motion, and illumination changes.
The foremost step is to collect the frames from the input videos available in the
database. A video is a series of moving images, displayed in a sequential manner, typ-
ically accompanied by audio that creates the illusion of continuous motion. Videos are
composed of scenes, shots, and frames. Scenes are made up of one or more shots and are
often defined by changes in time, location, or characters. A shot is a continuous sequence
of frames captured by a camera without interruption. It represents a single view or cam-
era angle during a specific duration. A frame is a single still image in a sequence of
frames that make up a video. These frames are displayed rapidly in succession to create
the illusion of motion.
In Fig. 2, the mechanism of video-to-frame conversion is depicted. The input video
is fed to the video frame extraction block, which divides the video into the number of
frames.
The next step is preprocessing of each image. Preprocessing includes the contrast
adjustment of the image and removal of noise. A median filter enables the removal of
noise while preserving edges. The preprocessed image is further used for texture analysis.
Texture analysis is carried out using the Local Optimal Oriented Pattern, and the results are
compared with the Local Binary Pattern.
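A minimal OpenCV sketch of this frame extraction and median filtering step is shown below; the grayscale conversion and the 3 × 3 kernel size are illustrative assumptions.

```python
import cv2

def extract_and_filter_frames(video_path, ksize=3):
    """Split a video into frames and median-filter each frame for noise removal."""
    frames = []
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:                                   # no more frames in the video
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        frames.append(cv2.medianBlur(gray, ksize))   # removes noise, preserves edges
    cap.release()
    return frames
```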
4 Comparison with Existing Methods
Local Binary Pattern (LBP) is a popular texture descriptor used in computer vision and
image analysis. It was first introduced by Ojala et al., initially designed for monochrome
still images and further extended for color images.
LBP operates on grayscale images. For each pixel in the image, a binary code is
generated by comparing the pixel’s intensity value with the intensities of its neighboring
pixels. The comparison is done using threshold value. If a neighboring pixel’s intensity is
greater than or equal to the central pixel’s intensity, it is assigned a value of 1; otherwise,
it is assigned a value of 0. After comparing the central pixel with its neighbors, a binary
pattern is formed. This binary pattern is read in a clockwise or counterclockwise manner
to create a binary number. LBP calculates a histogram of these binary patterns over a local
neighborhood. The histogram represents the distribution of different texture patterns in
the image region under consideration [2].
The intensity value at pixel $(x_t, y_t)$ in an image $I$ is denoted as $i_t$, and $i_m$ represents
the intensity value of a pixel in the 3 × 3 neighborhood around the center pixel $(x_t, y_t)$,
where $m = 0, 1, 2, \ldots, 7$ [11–13].
The Local Binary Pattern (LBP) value of the central pixel is calculated using Eq. (1)
and (2):
$\text{LBP}(x_t, y_t) = \sum_{m=0}^{7} P(i_m - i_t) \cdot 2^m$   (1)
where
$P(x) = \begin{cases} 1, & x \ge 0 \\ 0, & \text{otherwise} \end{cases}$   (2)
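A straightforward NumPy sketch of the basic LBP operator of Eqs. (1) and (2) is given below; the neighbour ordering (clockwise from the top-left pixel) is one conventional choice, since, as noted next, basic LBP fixes no rule for the starting pixel.

```python
import numpy as np

def lbp_value(patch):
    """LBP code of the centre pixel of a 3x3 patch (Eqs. 1 and 2)."""
    center = patch[1, 1]
    # neighbours taken clockwise, starting from the top-left pixel
    neighbours = [patch[0, 0], patch[0, 1], patch[0, 2], patch[1, 2],
                  patch[2, 2], patch[2, 1], patch[2, 0], patch[1, 0]]
    return sum(int(n >= center) << m for m, n in enumerate(neighbours))

def lbp_image(img):
    """Apply the basic LBP operator to every interior pixel of a grayscale image."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2), dtype=np.uint8)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y - 1, x - 1] = lbp_value(img[y - 1:y + 2, x - 1:x + 2])
    return out
```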
One of the significant disadvantages of LBP is that it is rotation variant. Basic LBP
does not inherently handle rotation variations in texture patterns [10]. Although rotation-
invariant variants have been proposed, they can be more complex and computationally
intensive. There is no proper rule for assigning the starting pixel and the sequence of the
subsequent binary weights; a change in the starting pixel will result in different LBP values
for the same image.
Figure 3 shows the eight Kirsch masks in different directions. Each of the neighboring
pixel intensities $i_m$, oriented in the direction of the response of the corresponding Kirsch mask, is then
compared with $i_t$ as shown in Eq. (3), which returns either 1 or 0 according to Eq. (2).
This process forms an eight-digit binary number. According to the rank of the magnitude of
$k_m$ among the outputs of the eight Kirsch masks, weights are assigned to each calculated
binary digit [4].
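Under this reading, neighbour m receives the binary weight 2^{w_m}, where w_m is the rank of the magnitude of the m-th Kirsch mask response. A minimal NumPy sketch of that computation follows; the particular mask ordering and the mask-to-neighbour correspondence are illustrative assumptions rather than the exact convention of [4].

```python
import numpy as np

# Eight 3x3 Kirsch compass masks and the neighbour offset each direction points to
KIRSCH = [np.array(m) for m in (
    [[ 5,  5,  5], [-3, 0, -3], [-3, -3, -3]],   # N
    [[ 5,  5, -3], [ 5, 0, -3], [-3, -3, -3]],   # NW
    [[ 5, -3, -3], [ 5, 0, -3], [ 5, -3, -3]],   # W
    [[-3, -3, -3], [ 5, 0, -3], [ 5,  5, -3]],   # SW
    [[-3, -3, -3], [-3, 0, -3], [ 5,  5,  5]],   # S
    [[-3, -3, -3], [-3, 0,  5], [-3,  5,  5]],   # SE
    [[-3, -3,  5], [-3, 0,  5], [-3, -3,  5]],   # E
    [[-3,  5,  5], [-3, 0,  5], [-3, -3, -3]],   # NE
)]
OFFSETS = [(-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1), (0, 1), (-1, 1)]

def loop_value(patch):
    """LOOP code of the centre pixel of a 3x3 grayscale patch.

    Neighbour m gets the weight 2**w_m, where w_m is the rank of the magnitude of
    the m-th Kirsch mask response, so the code does not depend on an arbitrarily
    chosen starting pixel (unlike basic LBP).
    """
    patch = patch.astype(np.float64)
    center = patch[1, 1]
    responses = np.array([abs((patch * k).sum()) for k in KIRSCH])
    ranks = responses.argsort().argsort()          # w_m in 0..7
    code = 0
    for m, (dy, dx) in enumerate(OFFSETS):
        if patch[1 + dy, 1 + dx] >= center:        # thresholding of Eq. (2)
            code += 2 ** int(ranks[m])
    return code
```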
5 Experimental Results
Initially, the video was segmented into individual frames, and noise reduction was per-
formed using a median filter applied to each frame. Subsequently, LBP was applied to
the filtered images. The results obtained from LBP were then compared with those from
the LOOP descriptor based on histogram analysis. The findings from both LBP and
LOOP methods are presented and summarized in Fig. 4.
Figure 4A shows the original image of road traffic. Figure 4B shows the image
after applying the median filter. Both images look the same, but the histograms of the respective
images, shown in Fig. 4C and Fig. 4D, depict the difference between them.
Figure 4E gives the image after applying the LBP descriptor and Fig. 4F gives the image
after applying the LOOP descriptor. Figure 4G and Fig. 4H are the histograms
of the LBP and LOOP images. The results indicate that the LOOP results are rotation invariant.
Histograms of LBP and LOOP images are compared with four different measures
Correlation, Chi-square, Intersection and Bhattacharyya. The results are summarized in
the Table 1. Correlation measures the statistical relationship between two variables. It
indicates whether and how strongly pairs of variables are related. The chi-square test
is a statistical test used to determine if there is a significant association between two
categorical variables in a dataset. It compares the expected frequencies of different cat-
egories in the dataset with the observed frequencies and calculates a chi-square statistic.
Intersection generally refers to the common elements or data points that exist in two
Fig. 4. A) Original Image B) Filtered Image C) Histogram of original Image D) Histogram of filtered
Image E) LBP Image F) LOOP Image G) Histogram of LBP Image H) Histogram of LOOP Image
or more sets. Bhattacharyya distance is a measure of the similarity between two prob-
ability distributions. In the context of histogram matching, Bhattacharyya distance is
used to compare the similarity between two histograms. Minimizing this distance helps
in aligning or matching histograms for various image processing and computer vision
applications. For the Correlation and Intersection measures, the higher the metric, the more accu-
rate the match, while for the Chi-square and Bhattacharyya measures, the lower the value, the better
the match [9]. The results are as expected. The Correlation and Intersection measures provide
lower values, while the Chi-square and Bhattacharyya measures provide higher values,
indicating that the LOOP images contain more intensity values.
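These four measures are available in OpenCV's compareHist; a short sketch of how the LBP and LOOP image histograms can be compared is shown below (the bin count and normalization are assumptions).

```python
import cv2

def compare_texture_histograms(img_lbp, img_loop, bins=256):
    """Compare LBP and LOOP images with the four histogram distance measures."""
    h1 = cv2.calcHist([img_lbp], [0], None, [bins], [0, 256])
    h2 = cv2.calcHist([img_loop], [0], None, [bins], [0, 256])
    cv2.normalize(h1, h1)                     # normalize so the scales are comparable
    cv2.normalize(h2, h2)
    return {
        "Correlation":   cv2.compareHist(h1, h2, cv2.HISTCMP_CORREL),
        "Chi-square":    cv2.compareHist(h1, h2, cv2.HISTCMP_CHISQR),
        "Intersection":  cv2.compareHist(h1, h2, cv2.HISTCMP_INTERSECT),
        "Bhattacharyya": cv2.compareHist(h1, h2, cv2.HISTCMP_BHATTACHARYYA),
    }
```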
The execution time required for feature extraction on a single image, 5 images, and
10 images, respectively, using the LBP and LOOP methods is summarized in Table 2. The
execution time in seconds is measured using a Python implementation.
The results show that the execution time of the LOOP method is much less than that of the LBP
method.
Histogram measures and execution time are calculated using PyCharm Community
Edition 2020.2.1 x64.
6 Conclusion
In conclusion, the comparative study of Local Binary Pattern (LBP) and Local Optimal
Oriented Pattern (LOOP) revealed valuable insights into their respective strengths and
limitations in various applications, particularly in the context of object detection in road
traffic videos.
LBP, a well-established texture descriptor, demonstrated its robustness and effective-
ness in capturing local patterns in images. Its simplicity and computational efficiency
make it suitable for real-time applications. However, LBP’s performance is influenced by
noise and lacks spatial information, which might impact its accuracy in certain scenarios.
On the other hand, LOOP, an enhancement over LBP, showcased superior perfor-
mance, especially in situations where noise and variations in illumination posed chal-
lenges. Its ability to capture orientation-independent patterns and consider the strength
of intensity variations through Kirsch masks contributed to more accurate and reliable
results. LOOP’s consideration of local patterns in a broader context, coupled with its
computational efficiency, made it a promising choice for object detection tasks, especially
in complex environments like road traffic.
Ultimately, the choice between LBP and LOOP depends on the specific requirements
of the application. While LBP offers simplicity and efficiency, LOOP provides enhanced
accuracy and robustness, making it a preferred option in situations where intricate pattern
recognition and noise tolerance are crucial. The findings of this comparative study
underline the importance of selecting the most suitable texture descriptor based on the
specific challenges and goals of the given task in computer vision applications.
References
1. NarainPonraj, D., Christy, E., Aneesha, G., Susmitha, G., Sharu, M.: Analysis of LBP and
LOOP based textural feature extraction for the classification of CT lung images. In: 2018
4th International Conference on Devices, Circuits and Systems (ICDCS), pp. 309–312. IEEE
(2018)
2. Ke-Chen, S., Yun-Hui, Y.A.N., Wen-Hui, C.H.E.N., Zhang, X.: Research and perspective on
local binary pattern. Acta Automatica Sinica 39(6), 730–744 (2013)
3. Aminu, A.A., Agwu, N.N.: General purpose image tempering detection using convolutional
neural network and local optimal oriented pattern. Signal Image Process.: Int. J. 12(2), 13–32
(2021)
4. Chakraborti, T., McCane, B., Mills, S., Pal, U.: LOOP descriptor: local optimal-oriented
pattern. IEEE Signal Process. Lett. 25(5), 635–639 (2018)
5. Hadid, A.: The local binary pattern approach and its applications to face analysis. In: First
Workshops on Image Processing Theory, Tools and Applications (IPTA) - Sousse, Tunisia.
IEEE (2008)
6. Karis, M.S.: Local binary pattern (LBP) with application to variant object detection: a survey
and method. In: IEEE 12th International Colloquium on Signal Processing & Its Applications.
Melaka, Malaysia, pp. 221–226 (2016)
7. Ayami, Y.M., Shabat, A.: An acceleration scheme to the local directional pattern (2018). arXiv
preprint arXiv:1810.11518
8. Jabid, T., Kabir, M.H., Chae, O.: Local directional pattern (LDP) for face recognition. In:
2010 Digest of Technical Papers International Conference on Consumer Electronics (ICCE),
pp. 329–330. IEEE (2010)
9. Marin-Reyes, P.A., Lorenzo-Navarro, J., Castrillón-Santana, M.: Comparative study of
histogram distance measures for re-identification (2016). arXiv preprint arXiv:1611.08134
10. Subramanian, P., Ramesh, G.P.: Diagnosis of keratoconus with corneal features obtained
through LBP, LDP, LOOP and CSO. Trends Sci. 18(20), 22 (2021)
11. Hassaballah, M., Alshazly, H.A., Ali, A.A.: Robust local oriented patterns for ear recogni-
tion. Multimed. Tools Appl. 79, 31183–31204 (2020). https://doi.org/10.1007/s11042-020-
09456-7
12. Mame, A.B., Tapamo, J.R.: Parameter optimization of histogram-based local descriptors for
facial expression recognition. PeerJ Comput. Sci. 9, e1388 (2023)
13. Pietikäinen, M., Hadid, A., Zhao, G., Ahonen, T.: Local binary patterns for still images. In:
Computer Vision Using Local Binary Patterns. Computational Imaging and Vision, vol. 40,
pp. 13–47. Springer, London (2011). https://doi.org/10.1007/978-0-85729-748-8_2
Feature Fusion and Early Prediction of Mental
Health Using Hybrid Squeeze-MobileNet
Abstract. Mental health is strongly affected by stress, disease,
and sarcastic statements or people's comments, and it affects a person's health directly
or indirectly. People often cannot share or discuss their mental condition, and
frequently cannot even talk about it; first of all, they cannot accept that they are suffering from mental
illness. It is therefore very necessary to predict the mental health of a person at an early
stage. There is a need for new strategies for diagnosis and daily monitoring
of mental health conditions. The goal of our research is to develop a module
based on feature fusion, which will be performed based on the Soergel metric and
Deep Kronecker Network (DKN), and early prediction of mental health utilizing
Squeeze-MobileNet. This improves accuracy without sacrificing model efficiency.
Particle swarm cuckoo search (PS-CS) is effective and capable of capturing the
unpredictability of data. We find that the F1 score and validation score of the NN are good
compared to ML.
1 Introduction
bullying [3, 5, 18, 24]. Around 20% of the world population faces mental health
problems [11]. Due to a mental health problem, a person faces physical and emotional
health issues. It may affect the human brain with various disorders like bipolar disorder,
depression, schizophrenia, and attention-deficit hyperactivity. People who have
serious mental issues are very sick or at higher risk [7]. A system or machine that can
detect human mental conditions is therefore one of the most important health concerns in the world.
As per the report of the World Health Organization (WHO), depression is a universal
mental disorder that affects most people, irrespective of their age. There are, however, many
barriers to recognizing depression and treating it, owing
to the unavailability or lack of experts in this domain, social embarrassment, and wrong
or late diagnosis [12]. The mental health check is done by taking the Patient Health
Questionnaire (PHQ-9) test or by asking subjective questions. This type of evaluation
takes more time to diagnose; it is a very time-consuming and difficult task [5, 16–19].
It is very necessary to detect mental health issues at an early stage to understand the
problems. This is the only disease that does not require any laboratory or medical tests;
it only requires a questionnaire and the patient's responses to that specific
questionnaire.
Such systems aim to identify a person's actions and how that person interacts with
people [7]. People who are depressed can manage their symptoms and avoid more health
problems by analyzing and taking treatment after identifying such signs of depression
at the first stage [2]. Researchers have used people's tweets and their
responses to detect human mental conditions at an early stage [8]. Depression treated at
an early stage has a higher cure rate, but it still needs to be treated. To overcome this problem,
it is necessary to predict and prepare for the risk of depression. Therefore, we need to
prepare a model that will collect people’s daily routines, like their diet, stress, family
relations, and work status, and analyze the correlation between depression and the factors
based on the collected information [1]. In the health sector, deep learning has achieved
success, but for detecting mental health, the available collected data and evaluations of
mental health are rare. To work on psychological problems such as diagnosing, detecting,
and monitoring human health, a combined model of CNN and LSTM can be used [20]. The
LSTM technique is better than the recurrent neural network (RNN). It is very useful
for the classification of time series signals [5]. The DL can handle various issues, like
natural language processing tasks and the current sentiment analysis. The DL model
works on mental health problems using a dataset available on social media [14, 15]. Due
to new technology and the changes in society, researchers carry a huge burden. They
require a relationship between traditional and new technologies. The result of this is
that traditional health care systems are not efficient and effective enough to handle new
trends [3]. Hence, DL techniques are good for predicting mental health between two
individuals—those who require help online and are not aware of their condition [10].
A wearable device is used to detect issues in mental health, with a view to progressing
research in this field [4].
The LR classifier training data gives a good result with the NN, but the test data gives a
good result with ML. The XGBoost classifier, Decision Tree classifier, RF classifier,
and Support Vector Classifier (SVC) give the same results for the F1 score and validation
score for test and train data using ML and NN.
2 Literature Survey
The literature review based on early prediction of mental health by reviewing several
existing modules is discussed in this section.
Baek, J.W., and Chung, K. [1] developed the Context Deep Neural Network (Context-
DNN) method, which is used to predict the risks of depression. For prediction, a multiple
regression model is used. To identify depression correctly, we need to evaluate the person.
To prevent depression, correct monitoring is needed. A limitation is that the method works
on common issues only, not individual ones.
Jawad, K., et al. [2] suggested the particle swarm-cuckoo search (PS-CS) optimiza-
tion method. The developer combines these two algorithms to get more accurate results.
Deep learning models depend more on training and validation loss, which is a very diffi-
cult part of DL. This method verified a well-adjusted part of short training and validation
loss. The problem is that we are unable to incorporate multimodal data sources.
Zeberga, K., et al. [3] used deep learning methods such as the Bidirec-
tional Long Short-Term Memory (Bi-LSTM) algorithm and response-based knowledge
transfer using BERT. This method is used to improve the intelligent health care system by
detecting human mental health problems. The authors focus on converting unstructured
data collected from social sites into meaningful data. This module has high accuracy,
but a limitation is that the text in the given file was not represented by a bag of words.
Coutts, L.V., et al. [4] find that most health factors basically depend
on heart rate variability. Every health factor needs an identifier to train the
deep neural network, which uses LSTMs. Each health measure's heart rate variability (HRV)
data is unique. The authors take three data measures: time, frequency, and HRV. A
wearable device continuously captures the above measures and predicts mental health. The
problem is that no optimization has been done for this combined approach.
Shafiei, S.B., et al. [5] using the Keras dataset, the developer implemented a convo-
lutional neural network and long short-term memory (CNN-LSTM). This module was
capable of being integrated into applications for household mental health monitoring
to be used by patients after oncological operations to recognize patients in danger. The
drawback of this model is that it is not validated for objective and distant monitoring of
mental health.
Zhang, Z., et al. [6] identify the inconsistency and difficulty of audio-visual-textual
modes. The authors divide these modes into two groups: the first group contains the audio-
visual modes, which work at the frame level, and the second is the textual mode, which is managed at the
session level. This approach was effective for multimodal representation learning and
the ability to generalize across dissimilar mental disorders. It did not investigate the
semantic border between all audio-visual-textual modalities to address the discrepancy
and granularity and compare it with the early fusion strategy in the current framework.
Hassantabar, S., et al. [7] proposed a new model named Mental Health Deep Neural
Networks (MHDeep-DNN). This model can take data from sensors that are present in
wearable devices. Give the results based on three psychological conditions, such as mood
type, most depressing, and mania. The MHDeep strategy was employed for pervasive
diagnosis and daily monitoring. The problem is that it achieved high computational
complexity.
Kour, H., and Gupta, M.K. [8] use a combined hybrid CNN-BiLSTM model. CNN
handles images and LSTM works on text, so the authors work on both image and text data.
This scheme achieves low sensitivity. The approach
did not sense other psychological illnesses in combination with depression to capture
complex mental issues permeating an individual's life.
3 Proposed Methodology
Many people around the world are facing problems with mental health. There is a chal-
lenge to diagnosing the mental health problem. It is a difficult problem that relies on
questionnaires, self-reporting, behavior, and social interaction. There is a necessity for
new strategies for the diagnosis of mental health and the daily monitoring of mental
health conditions. The primary goal of this study is to plan and build a module based on
feature fusion and early prediction of mental health using Squeeze-Mobilenet. Squeeze-
Mobilenet will be formed by the amalgamation of SqueezeNet [22] and Mobilenet [23].
Initially, the input data acquired from the database [21] will be subjected to outlier
detection to detect the outlier in the database usually used in data analysis, which will
be performed by employing holo entropy. After that, outlier removal will be performed
to ensure the quality of the data. Thereafter, a pre-processing phase will be conducted
to remove the noise present in the data by utilizing missing data imputation [25] and
z-score normalization [24]. Then, the pre-processed data will be forwarded to feature
fusion, which will be performed based on the Soergel metric and the Deep Kronecker
Network (DKN) [26]. Moreover, the fused feature will allow for data augmentation,
which will be done by bootstrapping. Finally, the early prediction of mental health
will be accomplished by utilizing Squeeze-Mobilenet, where the layers will be modi-
fied. Moreover, the implementation of the proposed strategy will be carried out using a
Python tool using simulation. In addition, performance metrics will be utilized for early
prediction of mental health, namely accuracy, sensitivity, and specificity. Furthermore,
the proposed model will be compared with the current procedures in order to reveal the
effectiveness of the proposed method.
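For the standard parts of this pipeline, a minimal scikit-learn sketch of missing data imputation and z-score normalization is given below; the holo-entropy outlier detection and the Soergel/DKN feature fusion are specific to the proposed method and are not reproduced here, and the column names are placeholders.

```python
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

def preprocess(df: pd.DataFrame, feature_cols):
    """Missing data imputation followed by z-score normalization of selected columns."""
    values = df[feature_cols].to_numpy(dtype=float)
    values = SimpleImputer(strategy="mean").fit_transform(values)   # fill missing entries
    values = StandardScaler().fit_transform(values)                 # z = (x - mean) / std
    return values
```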
4 System Architecture
[Block diagram: Input data → Outlier detection (holo entropy) → Outlier removal → Pre-processing (missing data imputation, z-score normalization) → Feature fusion (Soergel metric) → Predicted output]
Fig. 1. Block diagram of feature fusion and early prediction of mental health
5 Results
Machine learning and neural networks are compared for different classifiers. The results
are reported as training, test, validation, and F1 scores (Fig. 2).
[Fig. 2. Training (%), Test (%), Validation (%), F1 score (%), ML vs NN (Random Forest classifier)]
The RF classifier gives the same results for training, testing, and validation. But the
F1 score of the NN model is better in the case of the RF classifier (Fig. 3).
[Fig. 3. Logistic Regression: Training (%), Test (%), Validation (%), F1 score (%), ML vs NN]
The LR classifier training data gives a good result in NN, but the test data gives a
good result in ML. The validation score and F1 score of NN are good as compared to
ML (Fig. 4).
[Fig. 4. Decision Tree: Training (%), Test (%), Validation (%), F1 score (%), ML vs NN]
The decision tree classifier training, test, validation, and F1 scores of NN are good as
compared to ML (Fig. 5).
[Fig. 5. Training (%), Test (%), Validation (%), F1 score (%), ML vs NN (SVC)]
The SVC classifier training, test, validation, and F1 scores of NN are good as compared
to ML (Fig. 6).
[Fig. 6. XGBoost Classifier: Training (%), Test (%), Validation (%), F1 score (%), ML vs NN]
The XGBoost classifier training, test, validation, and F1 scores of
NN are good as compared to ML.
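A compact sketch of how the five classical classifiers discussed above can be trained and scored on the same split is shown below; the hyper-parameters, the 80/20 split, and the weighted F1 averaging are assumptions, and the neural-network counterpart is omitted.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from xgboost import XGBClassifier

def compare_classifiers(X, y):
    """Train and validate the five classical classifiers and collect the reported metrics."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
    models = {
        "Random Forest": RandomForestClassifier(),
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Decision Tree": DecisionTreeClassifier(),
        "SVC": SVC(),
        "XGBoost": XGBClassifier(),
    }
    results = {}
    for name, model in models.items():
        validation = cross_val_score(model, X_tr, y_tr, cv=5).mean()   # validation score
        model.fit(X_tr, y_tr)
        results[name] = {
            "train": model.score(X_tr, y_tr),
            "test": model.score(X_te, y_te),
            "validation": validation,
            "f1": f1_score(y_te, model.predict(X_te), average="weighted"),
        }
    return results
```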
6 Conclusion
In this study, we have worked with several machine learning algorithms, namely random forest,
logistic regression, decision tree, support vector, and extreme gradient boosting, and obtained
accuracy and F1 scores for each. It is observed that, as compared to the ML algorithms, neural
networks give the best results. If we use optimization algorithms, we expect to get better
results than both ML and NN.
In the future, using optimization algorithms such as ant colony or cuckoo algorithms
should give better results than ML and NN alone.
References
1. Baek, J.W., Chung, K.: Context deep neural network model for predicting depression risk
using multiple regression. IEEE Access 8, 18171–18181 (2020)
2. Jawad, K., Mahto, R., Das, A., Ahmed, S.U., Aziz, R.M., Kumar, P.: Novel Cuckoo search-
based metaheuristic approach for deep learning prediction of depression. Appl. Sci. 13(9),
5322 (2023)
3. Zeberga, K., Attique, M., Shah, B., Ali, F., Jembre, Y.Z., Chung, T.S.: A novel text mining
approach for mental health prediction using Bi-LSTM and BERT model. Comput. Intell.
Neurosci. (2022)
4. Coutts, L.V., Plans, D., Brown, A.W., Collomosse, J.: Deep learning with wearable based
heart rate variability for prediction of mental and general health. J. Biomed. Inform. 112,
103610 (2020)
5. Shafiei, S.B., Lone, Z., Elsayed, A.S., Hussein, A.A., Guru, K.A.: Identifying mental health
status using deep neural network trained by visual metrics. Transl. Psychiatry 10(1), 430
(2020)
6. Zhang, Z., Lin, W., Liu, M., Mahmoud, M.: Multimodal deep learning framework for men-
tal disorder recognition. In: Proceedings of 2020 15th IEEE International Conference on
Automatic Face and Gesture Recognition (FG 2020), pp. 344–350 (2020)
7. Hassantabar, S., Zhang, J., Yin, H., Jha, N.K.: Mhdeep: mental health disorder detection
system based on wearable sensors and artificial neural networks. ACM Trans. Embed. Comput.
Syst. 21(6), 1–22 (2022)
8. Kour, H., Gupta, M.K.: An hybrid deep learning approach for depression prediction from
user tweets using feature-rich CNN and bi-directional LSTM. Multimed. Tools Appl. 81(17),
23649–23685 (2022)
9. Sairam, U., Voruganti, S.: Mental health prediction using deep learning. Int. J. Res. Appl. Sci.
Eng. Technol. 10 (2022)
10. Ameer, I., Arif, M., Sidorov, G., Gòmez-Adorno, H., Gelbukh, A.: Mental illness classification
on social media texts using deep learning and transfer learning. arXiv preprint arXiv:2207.
01012 (2022)
11. Steel, Z., et al.: The global prevalence of common mental disorders: a systematic review and
meta-analysis 1980–2013. Int. J. Epidemiol. 43(2), 476–493 (2014)
12. Shrestha, K.: Machine learning for depression diagnosis using twitter data. Int. J. Comput.
Eng. Res. Trends 5(2) (2018)
13. Tadesse, M.M., Lin, H., Xu, B., Yang, L.: Detection of suicide ideation in social media forums
using deep learning. Algorithms 13(1), 7 (2019)
14. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional
transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)
15. Kowsari, K., Jafari Meimandi, K., Heidarysafa, M., Mendu, S., Barnes, L., Brown, D.: Text
classification algorithms: a survey. Information 10(4), 150 (2019)
16. Hann, D., Winter, K., Jacobsen, P.: Measurement of depressive symptoms in cancer
patients: evaluation of the Center for Epidemiological Studies Depression Scale (CES-D).
J. Psychosom. Res. 46(5), 437–443 (1999)
17. Manne, S., Schnoll, R.: Measuring cancer patients’ psychological distress and well-being: a
factor analytic assessment of the Mental Health Inventory. Psychol. Assess. 13(1), 99 (2001)
18. Cassileth, B.R., Lusk, E.J., Strouse, T.B., Miller, D.S., Brown, L.L., Cross, P.A.: A
psychological analysis of cancer patients and their next-of-kin. Cancer 55(1), 72–76 (1985)
19. Mystakidou, K., Tsilika, E., Parpa, E., Galanos, A., Vlahos, L.: Brief cognitive assessment
of cancer patients: evaluation of the mini-mental state examination (MMSE) psychometric
properties. Psycho-Oncol. J. Psychol. Soc. Behav. Dimensions Cancer 16(4), 352–357 (2007)
20. Oh, S.L., Ng, E.Y., San Tan, R., Acharya, U.R.: Automated diagnosis of arrhythmia using
combination of CNN and LSTM techniques with variable length heart beats. Comput. Biol.
Med. 102, 278–287 (2018)
21. Mental health in tech survey. https://www.kaggle.com/datasets/osmi/mental-health-in-tech-
survey. Accessed Sept 2023
22. Iandola, F.N., Han, S., Moskewicz, M.W., Ashraf, K., Dally, W.J., Keutzer, K.: SqueezeNet:
AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size. arXiv preprint
arXiv:1602.07360 (2016)
23. Chen, H.Y., Su, C.Y.: An enhanced hybrid MobileNet. In: Proceedings of 2018 9th
International Conference on Awareness Science and Technology (iCAST), pp. 308–312
(2018)
24. Fei, N., Gao, Y., Lu, Z., Xiang, T.: Z-score normalization, hubness, and few-shot learning. In:
Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 142–151
(2021)
25. Khan, S.I., Hoque, A.S.M.L.: SICE: an improved missing data imputation technique. J. Big
Data 7(1), 1–21 (2020)
26. Feng, L., Yang, G.: Deep Kronecker network. arXiv preprint arXiv:2210.13327 (2022)
Exploring the Usability of Quantum
Machine Learning for EEG Signal
Classification
1 Introduction
BCIs (Brain-Computer Interfaces) are an emerging technology that aims to
form a relationship between the human brain and computers [4]. One of the pri-
mary applications of BCIs is in assistive technology for people with disabilities
or neurodegenerative diseases. BCIs provide a way to set up a direct communica-
tion channel between the brain and external devices [3] and can identify a user’s
intended commands or actions based on the brain activity patterns. Machine
learning methods are primarily employed for BCI implementation [2]. However,
the brain signals are nonlinear and highly unstructured in nature. Hence con-
ventional ML algorithms may prove insufficient and computationally demanding
[8]. Due to high computational costs, the porting of these techniques on edge
devices also proves challenging. To that end, in this work, we propose to explore
the use of Quantum Machine Learning for the classification of the BCI-generated
EEG signals [5–7].
Both traditional ML and QML techniques for classification tasks are tested.
Their performance is compared based on usability and efficacy. The findings will
guide the development of QML-based BCIs to ensure reliable and efficient
intention detection. Moreover, cross-disciplinary efforts will aid in creating BCIs
that empower those with disabilities and open doors in cognitive neuroscience
and healthcare.
By merging principles derived from quantum physics and machine learn-
ing, the aim is to enhance the detection of intentions. Proposed are compara-
tive assessments between QML and conventional machine learning algorithms
in BCI applications. The objective is to assess the performance, computational
efficiency, and practicality of QML in intention detection. The objective is to
provide guidance for the future advancement of efficient and dependable QML-
based BCIs. This progress will greatly benefit individuals with disabilities and
drive advancements in multiple industries.
The paper is organized into five sections. Section 2 offers a concise summary of
the dataset and the data preprocessing stage. In Sect. 3, the proposed framework
and ML and QML methods employed in this work are discussed. Section 4 focuses
on experimental results and discussion followed by Sect. 5 which concludes the
paper.
with various cognitive and physiological processes. The study of cognitive pro-
cesses, event-related potentials (ERPs), neurofeedback, and the diagnosis and
monitoring of neurological illnesses are just a few of the many clinical and scien-
tific settings where EEG is used [20]. Figure 1 shows the actual EEG readings taken
256 times per second. Figure 2 depicts the extracted features from each segment
of 256 values in the whole dataset. Figure 3 visualizes the EEG signal values,
showcasing the characteristic electrical patterns recorded during cognitive tasks.
In the data preprocessing stage of this intention detection pipeline, the focus was
on enhancing the quality and reliability of EEG signals. To achieve this, band-
pass filtering [12] was applied to eliminate unwanted noise and artefacts while
preserving the frequency components relevant to the intended brain activity.
Depending on the properties of the EEG signals, the band-pass filtering pro-
cedure required choosing the right parameters, such as the lower and upper
cutoff frequencies. It was essential to select the right filter design to obtain the
intended frequency response and reduce artefacts or distortions. Due to its con-
sistent performance and smooth frequency response, the Butterworth filter was
used in our implementation [12].
The filter parameters are carefully adjusted to attenuate any signals that fall
outside the desired range. These parameters consist of a lower cutoff frequency of
0.1 Hz, an upper cutoff frequency of 30 Hz, a sampling frequency of 1000 Hz, and
a pad length of 27. This careful adjustment yields a set of EEG signals that
is well suited for extracting features and conducting further analysis.
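A minimal SciPy sketch of this zero-phase Butterworth band-pass filtering, using the stated cutoffs, sampling frequency, and pad length, is shown below; the filter order is not stated in the text and is an assumption.

```python
from scipy.signal import butter, filtfilt

def bandpass_filter(signal, low=0.1, high=30.0, fs=1000.0, order=4, padlen=27):
    """Zero-phase Butterworth band-pass filtering of a single EEG channel."""
    b, a = butter(order, [low, high], btype="bandpass", fs=fs)
    return filtfilt(b, a, signal, padlen=padlen)

# e.g. filtered = bandpass_filter(raw_channel)
```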
Time Domain
The time-domain feature extraction method is based on the analysis of signals or
data with respect to time [11]. Quantifying a signal's evolution over time is made
feasible by this method, which is essential when working with EEG data because
these recordings usually span several hours at a time. This section will look
at a number of time-domain techniques particular to EEG.
1) Mean: The average electrical activity recorded from the brain establishes a
baseline for EEG data [10]. Studying the central tendency of these signals pro-
vides insight into typical brain function during focus, relaxation, or cognitive
tasks. By calculating the mean, researchers ascertain normal brain waves over
time, analyzing fluctuations across different mental states.
2) Variance: The variability (variance) of EEG data provides insight into the
spread of electrical brain activity. Higher variance indicates more fluctuation
in activity, while lower variance suggests more stability. Assessing variance
helps detect shifts in brain patterns, including unusual variability associated
with neurological conditions. Specifically, variance measures the deviation
of EEG signals from their average value. This statistic comprehensively
depicts the distribution and diversity of electrical rhythms across the brain.
In neurological assessments, variance offers a vital perspective for gauging
stability versus volatility in recordings.
3) Skewness and Kurtosis: Skewness and kurtosis show whether the brain's electrical
activity deviates from symmetry. Measuring skewness reveals abnormalities,
while kurtosis describes the structure of the brain's electrical activity. It captures
tiny details and odd events in the signals. Analyzing kurtosis helps classify
brain signals and also aids in diagnosing brain illnesses.
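A short sketch of how these four time-domain statistics can be computed for one EEG segment with NumPy and SciPy follows; the segment length and any windowing are assumptions.

```python
import numpy as np
from scipy.stats import kurtosis, skew

def time_domain_features(segment):
    """Mean, variance, skewness and kurtosis of one EEG segment (e.g. 256 samples)."""
    segment = np.asarray(segment, dtype=float)
    return {
        "mean": segment.mean(),
        "variance": segment.var(),
        "skewness": skew(segment),
        "kurtosis": kurtosis(segment),
    }
```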
technique brings advantages such as its ability to handle stationary signals, resis-
tance against noise, and reduced spectral leakage.
Instead of relying on analysis techniques that only consider the frequency content
of a signal, time-frequency domain analysis examines how the frequency compo-
nents change over time, providing a deeper understanding. Firstly, we utilized
Independent Component Analysis (ICA) to prepare the EEG data. Subsequently,
the EEG data was analyzed using the Hilbert-Huang Transform (HHT). In order
to distinguish between the underlying sources and locate independent compo-
nents that correspond to various brain activities, we use ICA. After ICA, HHT
was used to further analyze the retrieved independent components.
3 Proposed Framework
In [1], an integration of quantum computing principles with SVM for BCI is pro-
posed. For SVM implementation, the features were extracted using the scaled
and preprocessed data. The 80/20 split for train:test data is used. For identify-
ing the best hyperparameters for the SVM model, Grid-Search was used. The SVM
classifier is trained and validated through 5-fold cross-validation to get the opti-
mal set of hyperparameters obtained through the grid search. Our experiments’
outcomes showed that C = 100, gamma = “scale”, and kernel = “linear” was the
best parameter combination for the SVM classifier in sklearn library. It was dis-
covered that these parameter values worked well for assisting our SVM classifier
in achieving high accuracy and generalization capacity (Fig. 4).
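A minimal scikit-learn sketch of this pipeline (scaling, 80/20 split, grid search with 5-fold cross-validation) is shown below; the parameter grid is an assumption, while the best combination reported above was C = 100, gamma = "scale", kernel = "linear".

```python
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def fit_svm(features, labels):
    """Scale the features, split 80/20, and grid-search SVC hyperparameters with 5-fold CV."""
    X = StandardScaler().fit_transform(features)
    X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.2, random_state=0)
    grid = GridSearchCV(
        SVC(),
        param_grid={"C": [1, 10, 100], "gamma": ["scale", "auto"],
                    "kernel": ["linear", "rbf"]},
        cv=5,
    )
    grid.fit(X_tr, y_tr)
    return grid.best_params_, grid.score(X_te, y_te)
```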
3.3 QSVM
QSVM or Quantum SVM is one of the most popular algorithms used for han-
dling classification tasks. In order to improve classification performance, QSVM
extends the conventional support vector machine (SVM) method to a quan-
tum environment [1]. QSVM improves SVM’s classification powers by utilizing
quantum mechanics concepts like superposition and entanglement. It does this
by using quantum feature spaces, quantum kernels, quantum feature maps and
quantum circuits, which convert classical data into quantum states. Qiskit library
was used to implement QSVM on the dataset. Qiskit offers access to quantum
simulators and real quantum hardware, as well as tools and resources for creating
and running quantum algorithms.
By effectively navigating a huge feature space, parallelism and quantum inter-
ference in quantum computation can speed up the classification process (Fig. 5).
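A minimal sketch of a quantum-kernel SVC using the qiskit-machine-learning package is shown below; the feature map, its repetitions, and the fidelity-based kernel are illustrative assumptions, since the exact circuits used in this work are not specified here, and the kernel runs on a simulator by default.

```python
from qiskit.circuit.library import ZZFeatureMap
from qiskit_machine_learning.algorithms import QSVC
from qiskit_machine_learning.kernels import FidelityQuantumKernel

def fit_qsvc(X_train, y_train, X_test, y_test, num_features=6):
    """Quantum-kernel SVC: classical features are encoded into quantum states by a
    feature map, and the kernel matrix is estimated from pairwise state fidelities."""
    feature_map = ZZFeatureMap(feature_dimension=num_features, reps=2)
    kernel = FidelityQuantumKernel(feature_map=feature_map)
    qsvc = QSVC(quantum_kernel=kernel)
    qsvc.fit(X_train, y_train)
    return qsvc.score(X_test, y_test)
```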
The final part requires us to measure the states of the qubits onto a classical
bit. We then need to define a loss function and update parameters accordingly.
In our qiskit implementation we define the cost function as the average of the
cross-entropy loss of all predictions. The formula for cross entropy is given by:
$\text{CrossEntropy} = -\sum_{i=1}^{n} y_i \log(p_i)$   (2)
where $y_i$ is the true label indicator and $p_i$ is the predicted probability for class $i$.
$\text{MulticlassSVMLoss} = \frac{1}{N}\sum_{i=1}^{N}\sum_{j \neq y_i} \max(0,\; s_j - s_{y_i} + \Delta)$   (3)
Suppose we have three classes (A, B, C), the respective predicted scores by a model are (A: 3.2, B: 2.2, C: 1.2), the true class is A, and Δ = 1. Following Eq. (3), we sum the hinge terms over the incorrect classes: max(0, 2.2 − 3.2 + 1) = 0 for B and max(0, 1.2 − 3.2 + 1) = 0 for C, giving a loss of 0 for this sample. The per-sample losses are averaged over all N samples to obtain the final multiclass SVM loss.
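As a quick numeric check of the loss definitions above, the following hedged NumPy snippet (illustrative, not from the paper) evaluates Eqs. (2) and (3) for the worked example.

```python
import numpy as np

def cross_entropy(y_true_onehot, probs):
    # Eq. (2): sum of -y_i * log(p_i) over classes for one prediction.
    return -np.sum(y_true_onehot * np.log(probs))

def multiclass_svm_loss(scores, true_idx, delta=1.0):
    # Eq. (3), per-sample part: hinge terms over all classes except the true one.
    margins = np.maximum(0.0, scores - scores[true_idx] + delta)
    margins[true_idx] = 0.0          # the true class contributes no loss
    return margins.sum()

scores = np.array([3.2, 2.2, 1.2])   # predicted scores for classes A, B, C
print(multiclass_svm_loss(scores, true_idx=0))   # -> 0.0 for this sample, as in the text
```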
The algorithms have been tested on an M1 MacBook Air (2020) with 8
GB of RAM having 4 high-performance cores (3228 MHz) and 4 high-efficiency
(2064 MHz) cores. The platform used was Jupyter Notebook. The programs have
been built and simulated using Qiskit and Pennylane in a Python 3.9.13 envi-
ronment.
Model Type | Classification Type | Class Labels | No. of Features | Accuracy | F1 Score | Training Time (per sample) | Testing Time
SVM Classifier | Multiclass | F1, F2, F3, F4 | 4 | 0.6536 | 0.6083 | 0.0173 | 0.0016
SVM Classifier | Multiclass | F1, F2, F3, F4 | 6 | 0.6241 | 0.6265 | 1.5930 | 0.0020
Random Forest Classifier | Multiclass | F1, F2, F3, F4 | 4 | 0.6950 | 0.7304 | 0.4131 | 0.0177
Random Forest Classifier | Multiclass | F1, F2, F3, F4 | 6 | 0.6950 | 0.7304 | 0.4131 | 0.0177

Model Type | Classification Type | Class Labels | No. of Features | Accuracy | F1 Score | Training Time (per sample) | Testing Time
Qiskit QSVC | Multiclass | F1, F2, F3, F4 | 6 | 60.28 | 0.603 | 2.66 | 5
Custom Model (PennyLane) | Multiclass | F1, F2, F3, F4 | 4 | 25 | 0.19 | 0.007 | 0.03
Custom Model (PennyLane) | Multiclass | F1, F2, F3, F4 | 6 | 31.91 | 0.21 | 0.177 | 0.06
Custom Model (PennyLane) | Binary | F1, F2 | 2 | 48.8 | 0.22 | 0.022 | 0.007
Custom Model (Qiskit) | Binary | F1, F2 | 6 | 55 | 0.44 | 0.45 | 0.008
References
1. Li, Y., Zhou, R.G., Xu, R., Luo, J., Jiang, S.X.: A quantum mechanics-based frame-
work for EEG signal feature extraction and classification. IEEE Trans. Emerg. Top.
Comput. 10(1), 211–222 (2020)
2. Lotte, F., et al.: A review of classification algorithms for EEG-based brain-
computer interfaces: a 10 year update. J. Neural Eng. 15(3), 031005 (2018)
3. Xie, Y., Oniga, S.: A review of processing methods and classification algorithm for
EEG signal. Carpath. J. Electron. Comput. Eng. 13(1), 23–29 (2020)
4. Nicolas-Alonso, L.F., Gomez-Gil, J.: Brain computer interfaces, a review. Sensors
12(2), 1211–1279 (2012)
5. Zhang, Y., Ni, Q.: Recent advances in quantum machine learning. Quantum Eng.
2(1), e34 (2020)
6. Rakotomamonjy, A., Guigue, V.: BCI competition III: dataset II-ensemble of SVMs
for BCI P300 speller. IEEE Trans. Biomed. Eng. 55(3), 1147–1154 (2008)
7. Lal, T.N., et al.: Support vector channel selection in BCI. IEEE Trans. Biomed.
Eng. 51(6), 1003–1010 (2004)
8. Barnova, K., et al.: Implementation of artificial intelligence and machine learning-
based methods in brain-computer interaction. Comput. Biol. Med. 107135 (2023)
9. Benedetti, M., Lloyd, E., Sack, S., Fiorentini, M.: Parameterized quantum circuits
as machine learning models. Quantum Sci. Technol. 4(4), 043001 (2019)
10. Panat, A., Patil, A., Deshmukh, G.: Feature extraction of EEG signals in different
emotional states. In: IRAJ Conference (2014)
11. Singh, A.K., Krishnan, S.: Trends in EEG signal feature extraction applications.
Front. Artif. Intell. 5, 1072801 (2023)
12. Hussin, S.F., Birasamy, G., Hamid, Z.: Design of Butterworth band-pass filter.
Politeknik Kolej Komuniti J. Eng. Technol. 1(1) (2016)
13. Adcock, J., et al.: Advances in quantum machine learning. arXiv preprint
arXiv:1512.02900 (2015)
14. Abohashima, Z., Elhosen, M., Houssein, E.H., Mohamed, W.M.: Classification with
quantum machine learning: a survey. arXiv preprint arXiv:2006.12270 (2020)
15. Khan, T.M., Robles-Kelly, A.: Machine learning: quantum vs classical. IEEE Access
8, 219275–219294 (2020)
16. Amin, H.U., Mumtaz, W., Subhani, A.R., Saad, M.N.M., Malik, A.S.: Classification
of EEG signals based on pattern recognition approach. Front. Comput. Neurosci.
11, 103 (2017)
17. Alam, M.N., Ibrahimy, M.I., Motakabber, S.M.A.: Feature extraction of EEG signal
by power spectral density for motor imagery based BCI. In: 2021 8th International
Conference on Computer and Communication Engineering (ICCCE), pp. 234–237).
IEEE (2021)
18. Bergholm, V., et al.: PennyLane: automatic differentiation of hybrid quantum-
classical computations (2018). arXiv:1811.04968
19. Khosla, A., Khandnor, P., Chand, T.: A comparative analysis of signal process-
ing and classification methods for different applications based on EEG signals.
Biocybern. Biomed. Eng. 40(2), 649–690 (2020)
20. Yi, Y., Billor, N., Liang, M., Cao, X., Ekstrom, A., Zheng, J.: Classification of
EEG signals: an interpretable approach using functional data analysis. J. Neurosci.
Methods 376, 109609 (2022)
21. Rudolph, M.S., Miller, J., Motlagh, D., Chen, J., Acharya, A., Perdomo-Ortiz, A.:
Synergy between quantum circuits and tensor networks: short-cutting the race to
practical quantum advantage. arXiv preprint arXiv:2208.13673 (2022)
Adaptive Coronavirus Mask Protection
Algorithm Enabled Deep Learning for Brain
Tumor Detection and Classification
Abstract. Brain tumor (BT) is a dangerous disease and the process of detecting
BT is difficult. Early detection of this disease plays a critical role in protecting
the life of humans. Hence, this paper introduced an Adaptive Coronavirus Mask
Protection Algorithm (ACMPA)-enabled deep learning technique for detecting
and categorizing BT. First, the Magnetic Resonance Image (MRI) brain images
are pre-processed using Kalman filtering. After that, the BT is segmented by utilizing LadderNet, and features are extracted, which include mean, tumor size, entropy, kurtosis, variance, and Haralick texture features, namely Angular Second Moment (ASM), contrast, and the Spider Local Image Feature (SLIF). Following this,
BT is detected by the Deep Kronecker Network (DKN), where BT is categorized
into normal or abnormal. If the detection is abnormal, then BT is categorized
into Meningiomas, Gliomas, and pituitary tumors using DKN, which is tuned
by the ACMPA. The ACMPA is obtained by integrating the Adaptive concept
and Coronavirus Mask Protection Algorithm (CMPA). Furthermore, the proposed
ACMPA_DKN acquired the value of accuracy to 90.4%, and obtained the value
of TPR and TNR to 91.6% and 92.5%.
1 Introduction
In the human body, the brain is a significant and intricately structured organ. The presence of a skull around the brain makes it challenging to study the brain’s behavior, and also makes disease detection complicated [1]. Tumors that originate from brain cells or that cover the brain are called primary BTs, whereas cancer cells spreading to the brain from another part of the body result in secondary BTs [2]. A high-quality
brain image acquired by MRI is commonly used for analyzing tumors [1]. MRI is
a popular medical tool that is employed to detect and examine various diseases, like
epilepsy, neurological conditions, BT, and so on. Generally, a computer-aided method
assists in modernizing the process for acquiring correct and quick results [3, 4]. Unlike,
2 Literature Review
Khairandish, M.O., et al. [1] devised a hybrid Support Vector Machine (SVM) and CNN method for categorizing malignant and benign tumors. The features in this technique were extracted with high speed and effectiveness due to the model’s distinctive aspects. However, it failed to consider the dimension and exact position of the tumor. The LeU-Net technique was presented by Rai, H.M. and Chatterjee, K. [9] for identifying BT from MRI images. The LeU-Net model has low complexity and performs well with excellent simulation time. However, this method was only used with a small database and failed to produce better performance on a large amount of training data. Sharif, M.I., et al. [11] presented a You Only Look Once v2-Inception v3 (YOLOv2-Inceptionv3) approach for classifying BT. This method accurately identified the regions
where X specifies the database, the total quantity of images is denoted as v, and X_r signifies the vth data item, which is employed for the following process.
$$N_2 = \sum_{p=0}^{F-1} p \cdot K(p) \qquad (3)$$
Here, p represents the image grey level, the overall quantity of grey levels is denoted as
F, K(p) specifies the probability of p and N2 mentions mean.
Variance: Variance is termed as the variations of grey levels in an image related to
the mean grey level which is demonstrated in Eq. (4).
$$N_3 = \sum_{p=0}^{F-1} (p - N_2)^2 \cdot K(p) \qquad (4)$$
Here, N_5 denotes the contrast and L_I(z, ν) denotes the (z, ν)th element of the segmented image.
ASM: It is utilized to compute the local uniformity of gray levels. If the similarity
between pixels exists, then ASM will have a large value. This is given in Eq. (7).
$$N_6 = \sum_{z}\sum_{\nu} L_I^2(z, \nu) \qquad (7)$$
Here, DKN [15, 16] is employed to identify the brain tumor by taking the feature vector N as input. DKN is built on a Kronecker product structure, which implicitly imposes a piecewise-smooth property on the coefficients.
The image matrices J_e ∈ R^{c×x} and the scalar responses d_e are taken into consideration with q samples, where e = 1, 2, ..., q. The response d_e is expressed in Eq. (9). Here, the unknown target coefficient matrix is denoted as E ∈ R^{c×x}, and the definite known univariate functions are represented as α(·) and τ(·). Furthermore, the definite known link function s(·) is articulated in Eq. (10).
$$E = \sum_{h=1}^{T} W_m^h \otimes W_{m-1}^h \otimes \cdots \otimes W_1^h \qquad (11)$$
where the unknown matrices are denoted as $W_m^h \in \mathbb{R}^{c_m \times x_m}$, $m = 1, \ldots, V$, $h = 1, \ldots, T$, and the dimensions of $W_m^h$ are not specified. The Kronecker product is required to satisfy $c = \prod_{m=1}^{V} c_m$ and $x = \prod_{m=1}^{V} x_m$, and it is represented in Eqs. (12) and (13).
$$W_{m''} \otimes W_{m''-1} \otimes \cdots \otimes W_{m'} = \bigotimes_{m=m'}^{m''} W_m \qquad (12)$$
$$\sum_{h=1}^{T} \bigotimes_{m=V}^{1} W_m^h \qquad (13)$$
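As an illustration of the Kronecker structure in Eqs. (11)–(13), the following hedged NumPy snippet (shapes and values are arbitrary, not taken from the paper) builds E as a sum of T Kronecker-product terms.

```python
import numpy as np
from functools import reduce

def kronecker_sum(factors_per_term):
    # factors_per_term: list of T lists, each holding V small factor matrices
    # ordered as W_V^h, ..., W_1^h; their Kronecker products are summed over h.
    return sum(reduce(np.kron, factors) for factors in factors_per_term)

rng = np.random.default_rng(0)
T, shapes = 2, [(2, 3), (4, 5)]            # V = 2 factors; c = 2*4 = 8, x = 3*5 = 15
terms = [[rng.standard_normal(s) for s in shapes] for _ in range(T)]
E = kronecker_sum(terms)
print(E.shape)                              # (8, 15): (product of c_m, product of x_m)
```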
where H_w represents the wth solution and l denotes the size of the population.
Fitness measure: The fitness is exploited to measure the finest value based on Mean
Square Error (MSE). Here, the solution providing low MSE is specified as the best
solution and it is represented in Eq. (15).
$$MSE = \frac{1}{P}\sum_{n=1}^{P}\left( Z_n^{*} - Z_n \right)^2 \qquad (15)$$
Here, the outcome of DKN is signified as Z_n, the expected output is termed as Z_n^*, and the entire sample count is indicated as P.
Infection stage: In this stage, if the distance between a vulnerable and an infected person is small, the vulnerable person gets infected. The formulation of this stage is specified in Eqs. (16) and (17).
$$a_o(y+1) = S(a_o(y)) \qquad (16)$$
$$S(a_o(y)) = a_o(y) + \lambda_o(y) \times k(y) \times \left( a_o(y) - a_f(y) \right) \qquad (17)$$
Here, the location of the infected person in the yth public contact is denoted as a_f(y), the safety factor of masks for the oth person in the yth public contact is represented as λ_o(y), social distance is specified as k, and a_o(y + 1) is the health of the oth individual in the (y + 1)th public contact.
Diffusion stage: The expression for this diffusion is signified in Eqs. (18) and (19).
$$B(a_o(y)) = \left( a_o(y) + \omega_o(y) \times k(y) \times \left( a_o(y) - a_b(y) \right) \right) \times \frac{\varphi^{\varsigma/(\rho+1)} - 1}{\varphi - 1} \qquad (19)$$
Here, the time of virus spread and the social actions are signified as ς and ρ, the infection factor is mentioned as φ, the location of the immune person in the yth public interaction is denoted as a_b(y), and the corresponding factor of the oth person in the yth public interaction is termed as ω_o(y).
Immune stage: If one person gets affected by a disease, maintaining distance from that
person requires more awareness. In this stage, the expression is formulated in Eqs. (20)
and (21).
ao (y + 1) = Q + × (A − Q) (22)
The newly suggested ACMPA_DKN is executed using the PYTHON tool. The experimental parameter details are given in Table 1.

Parameter | Value
Epoch | 20
Learning rate | 0.01
Batch size | 32
In this work, the BRATS 2020 and Figshare datasets are employed. BRATS 2020 [18] provides multi-institutional pre-operative MRI images and mostly concentrates on the segmentation of tumors, such as gliomas, that are essentially heterogeneous in shape and appearance. Figshare [19] consists of T1-weighted contrast-enhanced images with three types of brain tumors. The entire dataset is divided into four groups in four .zip files, where each .zip file includes 766 slices. The dataset details are tabulated in Table 2.
Fig. 2. Experimental outcomes of (a) Input image (b) pre-processed image (c) Segmented image
Fig. 3. Analysis using BRATS 2020 (a) accuracy (b) TPR (c) TNR
The performance measures utilized for comparing the methods based on training data
are accuracy, TPR, and TNR. In the proposed method, Kalman filtering is used in the
pre-processing, which efficiently improves the image quality and removes the noise
present in the input image. Also, LadderNet is used for BT segmentation, which has
many pathways of information flow. Moreover, BT is detected successfully using the
DKN, which is trained by the proposed ACMPA. Thus, the proposed method achieves
better results than the comparative methods.
Fig. 4. Analysis using Figshare (a) accuracy (b) TPR (c) TNR
5 Conclusion
In this paper, an ACMPA_DKN model is employed for detecting and categorizing BTs.
For that, input is pre-processed and BT is segmented by utilizing LadderNet. Then, the
important features are extracted. After that, the BT detection is carried out by DKN
which is classified into normal or abnormal. If it is identified as abnormal, the BT is
classified. Moreover, the BT is categorized by employing DKN, which is tuned by the ACMPA obtained by integrating the Adaptive concept and CMPA. The BT classification with ACMPA_DKN attained an accuracy of 90.4%, a TPR of 91.6%, and a TNR of 92.5% on the BRATS 2020 dataset. Future work aims to apply both Machine Learning (ML) and image processing approaches with many images for categorizing BT from MRI, and to categorize BT in 3-dimensional images.
References
1. Khairandish, M.O., Sharma, M., Jain, V., Chatterjee, J.M., Jhanjhi, N.Z.: A hybrid CNN-SVM
threshold segmentation approach for tumor detection and classification of MRI brain images.
Irbm 43(4), 290–299 (2022)
2. Bhattacharyya, D., Kim, T.H.: Brain tumor detection using MRI image analysis. In: Pro-
ceedings of Ubiquitous Computing and Multimedia Applications: Second International Con-
ference (UCMA 2011), Daejeon, Korea, 13–15 April 2011, Proceedings, Part II, Springer,
Heidelberg, pp. 307–314 (2011)
3. Abd El Kader, I., et al.: Brain tumor detection and classification on MR images by a deep
wavelet auto-encoder model. Diagnostics 11(9), 1589 (2021)
4. Rahman, T., Saiful Islam, M.: MRI brain tumor detection and classification using parallel
deep convolutional neural networks. Meas. Sens. 26 (2023)
5. Tian, D., Fan, L.: A brain MR images segmentation method based on SOM neural network.
In: Proceedings of 2007 1st International Conference on Bioinformatics and Biomedical
Engineering, IEEE, pp. 686–689 (2007)
6. Bandhyopadhyay, D.S.K., Paul, T.U.: Segmentation of brain MRI image–a review. Int. J. Adv.
Res. Comput. Sci. Softw. Eng. 2(3) (2012)
7. Muhammad Arif, F., Ajesh, S.S., Geman, O., Izdrui, D., Vicoveanu, D.: Brain tumor detection
and classification by MRI using biologically inspired orthogonal wavelet transform and deep
learning techniques. Advances in Feature Transformation based Medical Decision Support
Systems for Health Informatics (2022)
8. Kumar Gupta, R., Bharti, S., Kunhare, N., Sahu, Y., Pathik, N.: Brain tumor detection and
classification using cycle generative adversarial networks. Interdiscip. Sci. Comput. Life Sci.
14, 485–502 (2022)
9. Rai, H.M., Chatterjee, K.: 2D MRI image analysis and brain tumor detection using deep
learning CNN model LeU-Net. Multimed. Tools Appl. 80, 36111–36141 (2021)
10. Fox, V., Hightower, J., Liao, L., Schulz, D., Borriello, G.: Bayesian filtering for location
estimation. IEEE Pervasive Comput. 2(3), 24–33 (2003)
11. Sharif, M.I., Li, J.P., Amin, J., Sharif, A.: An improved framework for brain tumor analysis
using MRI based on YOLOv2 and convolutional neural network. Complex Intell. Systems 7,
2023–2036 (2021)
12. Zhuang, J.: LadderNet: multi-path networks based on U-Net for medical image segmentation.
arXiv preprint arXiv:1810.07810 (2018)
13. Zayed, N., Elnemr, H.A.: Statistical analysis of haralick texture features to discriminate lung
abnormalities. J. Biomed. Imaging 2015, 12 (2015)
14. Fausto, F., Cuevas, E., Gonzales, A.: A new descriptor for image matching based on bionic
principles. Pattern Anal. Appl. 20, 1245–1259 (2017)
15. Jagtap, A.D., Shin, Y., Kawaguchi, K., Karniadakis, G.E.: Deep Kronecker neural networks:
a general framework for neural networks with adaptive activation functions. Neurocomputing
468, 165–180 (2022)
16. Feng, L., Yang, G.: Deep Kronecker Network. arXiv preprint arXiv:2210.13327 (2022)
17. Yuan, Y., et al.: Coronavirus mask protection algorithm: a new bio-inspired optimization
algorithm and its applications. J. Bionic Eng. 1–19 (2023)
18. BRATS 2020 dataset. https://www.kaggle.com/datasets/awsaf49/brats2020-training-data?select=BraTS20+Training+Metadata.csv. Accessed Oct 2023
19. Figshare dataset. https://figshare.com/articles/brain_tumor_dataset/1512427. Accessed Oct 2023
20. Lessa, V., Marengoni, M.: Applying artificial neural network for the classification of breast
cancer using infrared thermographic images. In: Chmielewski, L., Datta, A., Kozera, R.,
Wojciechowski, K. (eds.) Computer Vision and Graphics (ICCVG 2016). LNCS, vol. 9972,
pp. 429–438. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46418-3_38
Enhancing Hex Strategy: AI Based
Two-Distance Pruning Approach
with Pattern-Enhanced Alpha-Beta Search
Fr. Conceicao Rodrigues Institute of Technology (FCRIT), Navi Mumbai 400073, India
[email protected]
1 Introduction
Hex is a classic two-player board game, invented by Piet Hein in 1942 [1]. Hex is played
on a hexagonal grid with a goal to connect opposite sides before the opponent. While
simple in rules, it offers deep strategy and is used in AI research. There are no draws
[2] and a winning strategy exists for the first player. The game has been proven to be PSPACE-complete [3].
The intricate and complex nature of Hex has been a compelling source of motivation
for the interest in the game. As I delved into this captivating world, I discovered a plethora
of fascinating theorems that further ignited my curiosity. The theorems, outlined below,
offer a glimpse into the rich landscape of Hex and serve as a prelude to the purpose
behind the contributions from the author for this paper.
1. One of the earliest theorems asserts that on any board of regular size, there exists a
winning opening move [1].
2. Another fundamental result establishes that adding a friendly piece or removing an
enemy piece is never disadvantageous [4].
3. Beck’s theorem reveals that, on any board size, there exists a losing opening move.
Beck has found two opening moves that proved to be losses [5].
2 Related Work
Jack Van Rijswijck developed a program known as Queenbee [7], in which he introduced the concept of a two-distance evaluation function. Queenbee uses iterative-deepening α–β search with pruning at different depths, followed by a two-distance evaluation that captures the second-best alternative.
Young and Hayward [8] developed a strategy for playing Reverse Hex, a variant in which a player who connects two sides loses. Fabiano and Hayward [9] introduced new Hex fill patterns (mutual and near-dead), which have reduced the solving time for 8 × 8 openings.
Broderick, Hayward, and Philip [10] developed Monte Carlo Tree Search in Hex. Their MCTS algorithm is guided by the outcomes of random game simulations. MoHex’s MCTS is built on the code base of Fuego, the Go program developed by Martin Müller et al. at the University of Alberta.
Chao, Hayward and Müller [3] have used a deep convolutional neural network for move prediction in Hex. They used the self-play games of MoHex 2.0 to train the neural network by canonical maximum likelihood, and the trained model was evaluated against Wolve and MoHex 2.0. The improved version MoHex 2.0 uses a virtual connection engine that finds smaller connection sets in half the time. MoHex 2.0 also introduces the optimization tool CLOP to tune parameters, extend unstable searches, and improve the MCTS formula. MoHex 2.0 [11], an improved algorithm using MCTS, is the current champion of the Hex game, having won the latest competition.
Young, Vasan and Hayward [12] developed a Deep Q Learning (DQL) algorithm
to train NeuroHex, and with two weeks of training NeuroHex achieved a success rate of 20.4% as first player and 21% as second player against MoHex, the ICGA Olympiad Hex champion.
Apart from these, there are many algorithms and concepts for solving Hex and
making a good move selection. These include popular Hex programs like Queenbee [7]
and Hexy [13].
Chao, Siqi, Hayward, and Müller [14] have tested the transfer of neural-net knowledge learned on one board size to another. They have shown that when only board-independent neurons are used, the trained neural net can be used effectively for other board sizes, larger or smaller.
Woodcok, Uscategui, and Corrales [15] evaluated the Hex game through game theory and graph theory, analyzing short-term and long-term strategies and methods of attack, and discussed several important proofs. They have shown the existence of weakly dominated and strongly dominated strategies within the game.
Hedan Liu and Xiaofu Du [16] have developed strategies based on cell geometry
and chessboard region for Hex game and combined them with Monte Carlo tree search
to achieve optimal results at the current level for reducing the scope of search space.
They have developed human-AI and AI-AI interface to the Hex game.
The algorithms developed for the Hex game have primarily concentrated on board evaluation [7] and pruning of the search tree [1]. In this paper, an attempt has been made to use two-distance-heuristic-based sorting of the moves before board evaluation, thereby improving the move quality significantly. The core algorithm developed employs the two-distance method both for board evaluation and for sorting of the moves. For empty board positions, the sum of the two-distances from both ends is calculated to indicate the position’s weight and is used for sorting.
The two-distance concept is a unique way to measure distances between cells on a
board for the Hex game. In contrast to conventional distance metrics, which measure the
number of free moves needed to connect a cell to the edge, two-distance provides a more
sophisticated approach. It calculates distances by considering not only the direct path
but also the best second-best alternative, allowing it to account for strategic depth. This
means it factors in how a player can block their opponent’s optimal path, forcing them
into a less advantageous position. Two-distance considers the whole neighborhood of a
cell, accommodating different perspectives for White and Black due to existing game
pieces. This nuanced approach is invaluable for evaluating moves and developing AI for
Hex.
At the start of the game, the AI is biased to play in the central region by giving more weight to board positions using |i − j| and |(i + j/2) − mid|, where i and j are the horizontal and vertical coordinates of the board (Fig. 1)1. The center of the Hex board, and thus the central region, is important as it takes advantage of the symmetry properties [17].
1 All figures in this manuscript are from the “Simple Hex Board game with AI”. Link: https://play.
google.com/store/apps/details?id=com.SamgoGames.SimpleHex. The game can be installed
and played on android based mobile devices with OS 7.0 and above.
At every depth, two-distance pruning is used; the details are given in the subsequent sections. Pattern-based alpha-beta search pruning is used to simplify the move search [18]. Depth represents the current depth of the search or recursion in the algorithm and is often used to limit the depth of search or to determine when to stop recursive calls. The given algorithm performs two-distance-based ranked heuristic sorting at every depth and two-distance board evaluation at the zeroth depth, after which the AI picks the move to play. In easy mode the algorithm searches for the best possible move up to two depths, in medium mode up to four depths, and in hard mode up to six depths. Increasing the depth improves the accuracy and quality of the move at the cost of AI processing time.
At each level of depth in the algorithm, a strategy is employed to select a limited
number of sorted moves. This strategic move selection process serves the purpose of
narrowing down the set of potential moves, effectively reducing the overall count of
moves considered. The selection criteria vary depending on the game board size and the
difficulty mode.
For instance, when playing on a 7 × 7 board, at each of the available depths (2
depths for easy mode, 4 depths for medium mode and 6 depths for hard mode), the first
11 sorted moves for the initial 2 moves are considered. Similarly, on a 9 × 9 board, this
strategy is extended to the first 4 moves, again choosing the top 11 sorted moves at each
of the available depths depending on the difficulty mode; for an 11 × 11 board, 11 sorted moves are picked at each depth for the first 8 moves. After these initial moves, the first 21 sorted moves are picked at each depth for any mode and board size, except in hard mode.
For example, in the easy mode of the game, when the user makes a move on the
board, the AI begins by selecting a limited number of moves from the total available
moves during the first depth of the search. Then, it proceeds to the second depth, focusing
on the selected moves from the previous step. Next, the AI changes its perspective as
if it were the user and once again assesses the remaining moves on the board from that
position. It then sorts through these moves and chooses a few. Following this process,
the board is evaluated at the zeroth depth. Finally, the AI makes its move based on the
position that gives it the most favourable outcome.
In the algorithm, regardless of the board size and difficulty mode, the AI computes
several move calculations equal to fanOut raised to the power of depth in order to sort
and select a move to play. The term fanOut in the algorithm refers to the maximum
number of sorted moves selected at each depth.
4.1 TwoDistanceRankAndPrune
This method is responsible for ranking and sorting potential moves based on a two-distance heuristic evaluation (Algorithm 1), which is the primary contribution to the Hex game. The heuristic aims to estimate the desirability of a given move.
It first calculates distances and scores for various game-board cells using the twoDistanceForBoardPositions function (refer to Sect. 4.2). Specifically, it calculates the distances for a cell’s reachability towards the opposite ends of the board, and these are summed up. This is done both for the player and the opponent. The function then initializes a priority queue (a data structure that keeps elements in priority order) and uses a custom comparison function to order elements based on their heuristic scores. This allows the AI to prioritize moves with higher scores. It considers a certain number of moves (controlled by the fanOut variable) and returns this sorted list. Overall, the algorithm plays a crucial role in the AI’s decision-making process by evaluating the game state and selecting the most promising moves.
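A hedged Python sketch of this ranking-and-pruning idea (not the paper's actual code); the per-cell distance maps are assumed to come from the function in Sect. 4.2, and the scoring sign convention is an illustrative choice.

```python
import heapq

def two_distance_rank_and_prune(empty_cells, d_player, d_opponent, fan_out):
    # d_player / d_opponent: dicts mapping each cell to the sum of its two-distances
    # towards both relevant board edges (assumed precomputed, see Sect. 4.2).
    scored = []
    for cell in empty_cells:
        # A lower combined distance means better connectivity for the AI and more
        # obstruction of the opponent, so negate it to get a "higher is better" score.
        scored.append((-(d_player[cell] + d_opponent[cell]), cell))
    # Keep only the fan_out best-scoring moves (bounded priority-queue selection).
    return [cell for _, cell in heapq.nlargest(fan_out, scored)]
```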
4.2 TwoDistanceForBoardPositions
This function applies the concept of two-distance in the Hex game [7]. It works by
systematically exploring the game board and keeping track of distances from various
cells to potential winning conditions. Below is how this function works:
It initializes two important data structures: isTwoDistance (a boolean array) and
twoDistance (an integer array). These arrays are used to mark visited cells and store
distances, respectively. Initially, all cells are marked as unvisited, and their distances are
set to a large value.
The function uses a breadth-first search (BFS) approach. It starts by adding certain
initial cells to a queue for exploration. It then enters a loop where it dequeues cells from
the queue one by one and explores their neighbors. For each neighbor, it checks if it has
been visited and if it is a valid move according to the game rules. If the neighbor has not
been visited, it updates its distance and marks it as visited. If it belongs to the player,
it sets the distance to be the same as the previous cell’s distance. If it is a neutral cell
(white), it increments the distance by one. This process continues until all relevant cells
have been visited and distances calculated.
Finally, the function returns the twoDistance array, which contains the calculated
distances for each cell on the board. These distances are crucial for AI to assess the
game state which is used in the TwoDistanceRankAndPrune method for sorting moves.
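A minimal, hedged sketch of the BFS-style distance propagation just described; the board representation, start_cells and neighbors helper are assumptions for illustration, not the paper's implementation.

```python
from collections import deque

def two_distance_for_board_positions(board, player, start_cells, neighbors):
    # board: dict mapping each cell to its owner (player colour) or None for empty cells.
    INF = float("inf")
    dist = {cell: INF for cell in board}
    queue = deque()
    for cell in start_cells:                  # cells adjacent to the player's target edge
        dist[cell] = 0 if board[cell] == player else 1
        queue.append(cell)
    while queue:
        cell = queue.popleft()
        for nxt in neighbors(cell):           # the (on-board) hexagonal neighbours
            if board[nxt] not in (player, None):
                continue                      # opponent stone: not traversable
            step = 0 if board[nxt] == player else 1   # own stone keeps distance, empty cell adds one
            if dist[cell] + step < dist[nxt]:
                dist[nxt] = dist[cell] + step
                queue.append(nxt)             # re-explore whenever a shorter distance is found
    return dist
```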
This TwoDistanceRankAndPrune method is called by the Alpha-Beta MiniMax method (Algorithm 2). Algorithm 2 is similar to the algorithm developed by Rasmussen [18], where line 12 is the main contribution of this paper.
Further, the algorithm’s performance in terms of computing time has been studied with and without the fanOut parameter. Without fanOut, the AI took significantly longer to complete the game, with a total execution time of 2 min 28 s in medium mode on a 7 × 7 board, whereas with the fanOut parameter the AI completed the same move sequence in a significantly reduced execution time of only 58 s. The time taken in each case was recorded using a laptop with an i5 processor. However, it may be noted that the above timings were recorded where the AI has already ranked and
Table 1. Estimated number of moves required to be assessed with and without the fanOut parameter and the RankAndPrune concept in easy mode on a 7 × 7 board

Move No. | Total no. of moves without fanOut | No. of moves with fanOut | Played by
1 | 49 × 48 = 2352 | 11 × 11 = 121 | User
2 | 48 × 47 = 2256 | 11 × 11 = 121 | AI
3 | 47 × 46 = 2162 | 11 × 11 = 121 | User
4 | 46 × 45 = 2070 | 21 × 21 = 441 | AI
5 | 45 × 44 = 1980 | 21 × 21 = 441 | User
6 | 44 × 43 = 1892 | 21 × 21 = 441 | AI
7 | 43 × 42 = 1806 | 21 × 21 = 441 | User
8 | 42 × 41 = 1722 | 21 × 21 = 441 | AI
9 | 41 × 40 = 1640 | 21 × 21 = 441 | User
10 | 40 × 39 = 1560 | 21 × 21 = 441 | AI
11 | 39 × 38 = 1482 | 21 × 21 = 441 | User
12 | 38 × 37 = 1406 | 21 × 21 = 441 | User
13 | 37 × 36 = 1332 | 21 × 21 = 441 | AI
14 | 36 × 35 = 1260 | 21 × 21 = 441 | User
15 | 35 × 34 = 1190 | 21 × 21 = 441 | AI
16 | 34 × 33 = 1122 | 21 × 21 = 441 | User
17 | 33 × 32 = 1056 | 21 × 21 = 441 | AI
18 | 32 × 31 = 992 | 21 × 21 = 441 | User
19 | 31 × 30 = 930 | 21 × 21 = 441 | AI
20 | 30 × 29 = 870 | 21 × 21 = 441 | User
21 | 29 × 28 = 812 | 21 × 21 = 441 | AI
22 | 28 × 27 = 756 | 21 × 21 = 441 | User
23 | 27 × 26 = 702 | 21 × 21 = 441 | User
24 | 26 × 25 = 650 | 21 × 21 = 441 | AI
25 | 25 × 24 = 600 | 21 × 21 = 441 | User
26 | 24 × 23 = 552 | 21 × 21 = 441 | AI
27 | 23 × 22 = 506 | 21 × 21 = 441 | User
28 | 22 × 21 = 462 | 21 × 21 = 441 | AI
29 | 21 × 20 = 420 | 420 | User
30 | 20 × 19 = 380 | 380 | AI
... | ... | ... | ...
47 | 3 × 2 = 6 | 6 | User
48 | 2 × 1 = 2 | 2 | AI
49 | 0 | 0 | User
Total moves by AI | 19000 | 7284 |

Note: After move no. 28 the number of move possibilities is equal with and without the fanOut parameter, because the number of remaining moves is less than 441.
sorted the moves. The computing time of the algorithm and the results would change when the moves are not ranked and sorted. Figure 2 below shows the game with a specific sequence of moves played by both the user and the AI, where the first move was played by the user. The same sequence is played by the AI in both cases (with and without the fanOut parameter and RankAndPrune). This demonstrates the contribution of the TwoDistanceRankAndPrune function (Sect. 4.1), which makes the AI computation significantly faster.
Fig. 2. Move sequence played for demonstration of computation time with and without “fanOut”
and RankAndPrune concept in medium mode and 7 × 7 board size
This experiment underscores the crucial role of fanOut and rankAndPrune in opti-
mizing the performance of the algorithm. By limiting the number of considered moves
and employing a ranking mechanism, the algorithm effectively narrows down the set of
potential moves, reducing the computational load and speeding up the AI’s decision-
making process. The results highlight the effectiveness of these strategies in achieving a
balance between computational efficiency and quality gameplay. This finding provides
valuable insights into the practical implications of these components in the algorithm’s
design, their impact on overall gameplay speed, and the importance of hyperparameter tuning for optimal AI performance. Similar results are observed in all modes across different board sizes.
Fig. 3. Initial Moves of both Player and AI Algorithm in easy and hard Modes (Total four moves:
Blue moves by player – Move no. 1 and 3; Red moves by AI Algorithm – Move no. 2 and 4)
(Color figure online)
Fig. 4. Sixth move played by AI in response to fifth move by the player in easy (4a) and hard
modes (4b) (Color figure online)
• As shown in Fig. 4a, in easy mode the AI played a move (Red Tile 6) which would eventually give the opponent an advantage in connecting blue to the top of the board, resulting in a wasted move by the AI. Figure 5a and 5b
present the two combinations in which the opponent (Blue Tiles) connects to the top of the board.
Fig. 5. Demonstration of Player (Blue Tile) connecting to top of the board in response to move
no.6 by AI in easy Mode (Color figure online)
• Also, even if the AI had played move no. 6 as shown in Fig. 6a, the blue player connecting to the top of the board could not have been avoided.
Fig. 6. Demonstration of Player (Blue Tile) connecting to top of the board in response to move
no.6 by AI in easy Mode (Color figure online)
This shows that the player (Blue Tile), by playing move no. 5, ensured a connection to the top of the board irrespective of where the AI (Red Tile) played. Easy mode could not evaluate this, as it computes move combinations only up to depth 2, whereas the AI in hard mode, by estimating move combinations up to depth 6, chose to play move no. 6 in the lower portion of the board (Fig. 4b) to prevent blue from connecting to the bottom of the board. The two-distance board evaluation and two-distance sorting with depth 6 helped the AI choose a good move.
In this paper, a novel two-distance-based algorithm for the Hex board strategy game has been successfully implemented across various board sizes, including 7 × 7, 9 × 9, and 11 × 11. The developed algorithm employs the two-distance method not only for board
evaluation but also for move sorting, and combines it with an efficient fanOut parameter. The
results of the efficiency analysis clearly demonstrate the algorithm’s ability to reduce
computational load and expedite the AI’s decision-making process. Furthermore, the
integrated alpha-beta pruning and pattern search-based must-play region pruning in the
algorithm, enhanced its overall performance. This comprehensive approach has proven to
be a valuable addition to the Hex game’s AI strategy. The study highlights the significance
of these innovations in advancing game-playing AI and provides a strong foundation for
further exploration and improvement in this field.
In the future, the research will focus on a comprehensive exploration and fine-tuning of critical hyperparameters that have a substantial impact on the Hex algorithm’s performance, with experiments varying the values of fanOut, game depth, the alpha and beta parameters, and the weighting of factors within the evaluation function. Additionally, it is planned to extend the algorithm’s applicability to larger board sizes, such as 13 × 13, to assess its scalability and strategic capabilities. Applying reinforcement-learning concepts such as AlphaZero [19] to the Hex game, making the AI learn on its own and beat its own algorithm, is also an interesting direction.
Acknowledgments. This work has been carried out as part of Summer Internship under the
guidance of Mr. Naga Srinivas Vemuri, Google IT Services India Pvt Ltd, Hyderabad in his
personal capacity. The author is deeply indebted to Dr. Naga Srinivas Vemuri, who is the primary
developer of the code, for mentoring at every stage during the development of algorithm for
building AI for the Hex board strategy game and for support during testing and performance
evaluation.
References
1. Hayward, R.B., Toft, B.: HEX the full story. CRC Recreational Mathematics Series (2019)
2. Pierce, J.R.: Symbols, Signals and Noise. Harper and Brothers, pp. 10–13 (1961)
3. Chao, G., Hayward, R., Müller, M.: Move prediction using deep convolutional neural networks
in Hex. IEEE Trans. Games 10(4), 336–343 (2017)
4. Rijswijck, J.V.: Set colouring games. PhD Thesis, Department of Computing Science,
University of Alberta, Canada (2006)
5. Beck, A., Bleicher, M.N., Crowe, D.W.: Excursions into Mathematics, pp. 317–387. Chapter
Games, New York (1969)
6. Yang, J., Liao, S., Pawlak, M.: New winning and losing positions for 7×7 Hex. In: Schaeffer,
J., Müller, M., Björnsson, Y. (eds.) Computers and Games (CG 2002). LNCS, vol. 2883,
pp. 230–248. Springer, Heidelberg (2003). https://doi.org/10.1007/978-3-540-40031-8_16
7. Rijswijck, J.V.: Computer Hex: are bees better than fruitflies? Thesis of Master of Science,
p. 37, Department of Computing Science, University of Alberta, Canada (2000)
8. Young, K., Hayward, R.B.: A reverse Hex solver. In: Plaat, A., Kosters, W., van den Herik, J.
(eds.) Computers and Games (CG 2016), LNCS, vol. 10068, pp. 137–148. Springer, Cham
(2016). https://doi.org/10.1007/978-3-319-50935-8_13
9. Fabiano, N., Hayward, R.: New Hex patterns for fill and prune. In: Cazenave, T., van den
Herik, J., Saffidine, A., Wu, I.C. (eds.) Advances in Computer Games (ACG 2019). LNCS,
vol. 12516, pp. 79–90. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-65883-0_7
10. Broderick, A., Hayward, R.B., Philip, H.: Monte Carlo tree search in Hex. IEEE Trans.
Comput. Intel. AI Games 2(4), 251–258 (2010)
11. Huang, S.C., Arneson, B., Hayward, R.B., Müller, M., Pawlewicz, J.: MOHEX 2.0: a pattern-
based MCTS Hex player. In: International Conference on Computers and Games, Computers
and Games (CG 2013), pp. 60–71 (2013)
12. Young, K., Vasan, G., Hayward, R.: NeuroHex: a deep q-learning Hex agent. In: Workshop on
Computer Games, International Workshop on General Intelligence in Game Playing Agents
(CGW 2016, GIGA 2016), Computer Games (2016)
13. Anshelevich, V.V.: A hierarchical approach to computer Hex. Artif. Intell. 134(1–2), 101–120
(2002). https://doi.org/10.1016/S0004-3702(01)00154-0
14. Chao, G., Siqi, Y., Hayward, R., Müller, M.: A transferable neural network for Hex. ICGA J.
40(3), 224–233 (2018)
15. Woodcok, M., Uscategui, F., Corrales, D.: Basic analysis of Hex game. Econógrafos, Escuela
de Economía 13417, Universidad Nacional de Colombia, FCE, CID (2015)
16. Liu, H., Du, X.: Strategy and implementation of Hex. In: Proceedings of the 2020 4th Interna-
tional Conference on Electronic Information Technology and Computer Engineering (EITCE
2020), pp. 800–805 (2020)
17. Yang, J., Simon, L., Mirek, P.: A new solution for a 7×7 Hex game (2002)
18. Rasmussen, R.: Algorithmic approaches for playing and solving Shannon games. PhD Disser-
tation, Faculty of Information Technology, Queensland University of Technology, pp. 24–26,
49–52, 108–111 (2007)
19. David, S., et al.: A general reinforcement learning algorithm that masters chess, shogi, and Go
through self-play. Science 362, 1140–1144 (2018). https://doi.org/10.1126/science.aar6404
IRBM: Incremental Restricted Boltzmann
Machines for Concept Drift Detection
and Adaption in Evolving Data Streams
1 Introduction
In today’s big data era, advanced applications like weather forecasting, e-
commerce, fraud detection, and telecommunications produce vast amounts of
real-time data, often referred to as data streams [1–4]. These streams, owing to
their sheer volume, require real-time processing. However, they can be affected
by “concept drifts” that either alter the data distribution or the relationship
between target labels and input features [3,5,6]. Two categories of concept drift
are present. In the first, known as ‘real drift’, the relationship between the target
label and input features changes, affecting decision boundaries. The second, ‘vir-
tual drift’, only involves changes in the feature distribution. Drift can occur in
various patterns, including sudden, gradual, incremental, or recurrent [1,3,4]. If
drifts are left unaddressed, they can compromise the performance and reliability
of a Machine Learning (ML) model, leading to degraded performance over time.
Traditional ML and deep learning methods (e.g. Naive Bayes, SVM, Deci-
sion Tree, Long Short Term Memory, OzaBagging) [7–11,18–21] are built on
the assumption that data remains consistent over time, resulting in static model
structures. These structures can become less accurate when data evolves. Given
the changing nature of data streams, there’s a pressing need for adaptable
models that maintain consistent performance even when the data changes over
time. There are numerous significant advantages of using Restricted Boltzmann
Machines (RBMs) [12–15] for drift detection in changing data streams. Funda-
mentally, RBMs are designed to model the underlying probability distribution
of their input data. This property, combined with their free energy metric, offers
a direct and intuitive means to assess the consistency of new data against this
learned distribution. When the data starts to change or drift from the original
patterns, this causes noticeable shifts in the free energy. Unlike traditional drift
detection methods, which frequently rely on predetermined statistical tests or
sliding windows, RBMs inherently detect drift without the need for arbitrar-
ily chosen parameters or thresholds. The generative capabilities of RBMs also
enable them to adapt to evolving data trends.
The contributions can be summarized in the following points:
– Presents a Restricted Boltzmann Machine (RBM) for concept drift detection.
– Introduces the incremental RBM for adapting internal representations and
maintaining model performance, with a focus on handling evolving data
streams to ensure sustained accuracy.
– Demonstrated the ability to rapidly detect concept drifts, maintaining robust
performance.
– Compares the current methodologies and the proposed approach across a
range of diverse datasets.
The remaining paper is structured as follows: Sect. 2 presents the Literature
Review, while Sect. 3 provides an in-depth explanation of the Methodology. In
Sect. 4, we delve into the Experimental Design, focusing on the datasets used.
Section 5 encompasses the Results and Analysis, and the paper concludes in the
subsequent section.
2 Literature Survey
Data streams inherently impose numerous challenges due to their evolving char-
acteristics and the emergence of concept drift [2–4]. In this context, concept
drift signifies the moments of distribution shift of input feature attributes, like
means and standard deviations, or when relationships between target labels and
features evolve.
These challenges have opened the way for a variety of research opportu-
nities and led to the focus of researchers in this area. Over the years, both
machine learning Naive Bayes [6,8], Support Vector Machine (SVM) [7,8], Deci-
sion Tree [8,10], Incremental OzaBagging [20] and deep learning methodologies
such as Incremental LSTM [20], and Extreme Learning Machine [6,8,10,18,19]
have been explored extensively for drift detection and adaptation. Extreme
Learning Machine (ELM) is distinguished by its efficiency, broad approxima-
tion capabilities, generalization, and straightforwardness, leading to the develop-
ment of several ELM-centric techniques for identifying and adapting to concept
drift [18,19].
Amongst the wide range of methodologies, Restricted Boltzmann Machines
(RBMs) have gained notable attention [12–15]. Their intrinsic ability to address
drift without extensive parameter optimization makes them a favoured choice.
This literature review aims to delve into the various approaches and the promi-
nence of using RBMs to overcome the difficulties presented by dynamic streams.
The study [13] proposes the application of Restricted Boltzmann Machines
(RBMs) to detect drifts in the mining of time-varying data streams. RBMs can
learn joint probability distributions of attribute values and classes, providing a
condensed representation of the data distribution. Training an RBM on a portion
of the data stream allows the identification of potential changes in the proba-
bility distribution. Two assessment measures are utilized to detect sudden or
gradual shifts in the data stream. The efficacy of these techniques for detecting
concept drift is demonstrated by experimental results on synthetic datasets. The paper [14] introduced a new drift detector based on the Restricted Boltzmann Machine (RBM). This detector is capable of monitoring multiple classes, independently recognizing changes, managing imbalanced distributions, and adjusting to local concept drifts in underrepresented classes. By employing a skew-insensitive loss function, the detector efficiently handles multiple imbalanced distributions. Leveraging the reconstruction error of the RBM, it detects changes within individual classes, thus facilitating adjustments to shifting class dynamics and localized drifts in minority classes.
The study [15] showed how the Restricted Boltzmann Machine (RBM) can be optimized for data stream mining with incomplete datasets. By introducing two
novel modifications to the RBM algorithm, missing values are managed effi-
ciently. Experimental validation shows their effectiveness in detecting concept
drift in incomplete data streams. The study [16] presented an anomaly detection
system for video surveillance that is trained on unlabeled raw pixels while avoid-
ing conventional obstacles. Anomalies are found by contrasting input videos with
model reconstructions using RBMs and Deep Boltzmann Machines (DBMs). The
results demonstrate their ability in detection as well as simultaneous scene clus-
tering and reconstruction. The study [17] presented a concept drift detection
and adaptation algorithm that was proposed using an online RBM. The integra-
tion of an attention mechanism within the RBM allows for targeted updating
3 Methodology
Fig. 1. IRBM: Incremental Restricted Boltzmann Machines for Concept Drift Detec-
tion and Adaption
where a_s and b_t are the biases, and w_st is the weight between visible unit s and hidden unit t.
For drift detection, the key idea is to monitor how well the RBM can
reconstruct new incoming data compared to previous data. If the recon-
struction error increases significantly, it may be indicative of concept
drift. Given an input vector v, the RBM will activate the hidden units
shown in Eq. 2
$$P(h_t = 1 \mid v) = \sigma\Big(b_t + \sum_{s} v_s w_{st}\Big) \qquad (2)$$
where σ is the logistic sigmoid function. The RBM will then reconstruct a visible vector from these hidden activations:
$$P(v_s = 1 \mid h) = \sigma\Big(a_s + \sum_{t} h_t w_{st}\Big) \qquad (3)$$
Train the RBM on a batch of data. As new data comes in, pass it through
the RBM and compute the reconstruction error. If the error exceeds a
predefined threshold or shows a consistent increasing trend, it can be
taken as an indication of concept drift: if error(v_b, v̂_b) − error(v_{b−1}, v̂_{b−1}) > δ for a set of consecutive batches b (time steps or data batches), we might infer a drift. Here, v̂_b denotes the reconstruction of batch v_b and δ is a predefined threshold.
Step 4 Incremental RBM Fine-tuning
– If drift is detected, the RBM can be retrained or fine-tuned on the most
recent data to capture the new concept.
– The weights and biases are updated by considering Eqs. 2 and 3: modify the weights and biases using the differences between the input and its reconstruction, as shown in Eqs. 5, 6 and 7.
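A minimal, hedged sketch of the detect-and-adapt loop described above, using scikit-learn's BernoulliRBM as a stand-in implementation; the threshold value, hyperparameters, and the stream_of_batches generator are illustrative assumptions rather than the paper's exact settings.

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM

def reconstruction_error(rbm, batch):
    # One Gibbs step gives a reconstruction of the visible units.
    recon = rbm.gibbs(batch)
    return np.mean((batch - recon) ** 2)

rbm = BernoulliRBM(n_components=32, learning_rate=0.05, batch_size=64, n_iter=5)
delta = 0.02                              # drift threshold δ (hypothetical value)
prev_error = None

for batch in stream_of_batches:           # assumed generator of (n, d) arrays scaled to [0, 1]
    if prev_error is None:
        rbm.fit(batch)                    # initial training on the first batch
        prev_error = reconstruction_error(rbm, batch)
        continue
    err = reconstruction_error(rbm, batch)
    if err - prev_error > delta:          # rising reconstruction error signals concept drift
        rbm.partial_fit(batch)            # incremental fine-tuning on the most recent data
    prev_error = err
```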
4 Experiment Design
The section provides a detailed description of the datasets employed for testing
and the evaluation metrics applied.
4.1 Dataset
Below is a detailed overview of the datasets used for the study:
– The electricity dataset has 45,312 instances. The dataset contains 8 attributes,
with a target label describing the UP and DOWN in electricity prices. Prices
are influenced by supply and demand [22].
– This dataset includes harmful web pages as well as the nslkdd dataset. It
was primarily utilized to develop an intrusion detection system capable of
discriminating between malicious (attack) and benign (regular) connections.
It has two target label categories: ‘attack’ and ‘normal’. It consists of 46
features and 11,055 instances [25].
– The weather dataset is a real-world concept drift dataset that contains 18,159 instances, with labels indicating whether it will rain or not [24].
– In the synthetic dataset rotating hyperplane the position and orientation of a
hyperplane are continuously adjusted. The dataset contains ten features and
a class label [23].
In Fig. 2, the datasets are plotted along the x-axis, and their corresponding
accuracy is on the y-axis. This figure differentiates the performance of several
methodologies, with each method represented by its unique color. Additionally,
when compared to existing approaches, Fig. 2 indicates significant improvements
of 5% for electricity, 3% for weather, 4% for phishing, and another 4% for rotat-
ing hyperplane datasets. While accuracy is essential, other metrics such as preci-
sion, recall, and the F1 score are also used to assess a model’s true efficiency. The
proposed method not only shows better accuracy but also outstanding performance across different measures such as precision, recall and F1 score, promising an efficient and robust model. In comparison with other existing techniques, our approach outperforms them. Its inherent strength lies in its
models remain current, effective, and relevant in incremental drift detection and
adaption. In the future, the evolving Restricted Boltzmann Machine (RBM) could
be applied to concurrently tackle various forms of drift detection and adaptation.
References
1. Aggarwal, C.C.: Chapter 9 A survey of stream classification algorithms (2015)
2. Domingos, P., Hulten, G.: Mining high-speed data streams. In: Proceedings of the
6th ACM SIGKDD International Conference on Knowledge Discovery and Data
Mining, pp. 71–80 (2000)
3. Tsymbal, A.: The problem of concept drift: definitions and related work. Computer
Science Department, Trinity College, Dublin, Ireland, Technical report, vol. 106
(2004)
4. Žliobaitė, I., Pechenizkiy, M., Gama, J.: An overview of concept drift applications.
In: Japkowicz, N., Stefanowski, J. (eds.) Big Data Analysis: New Algorithms for
a New Society. SBD, vol. 16, pp. 91–114. Springer, Cham (2016). https://doi.org/
10.1007/978-3-319-26989-4 4
5. Hesse, G., Lorenz, M.: Conceptual survey on data stream processing systems, pp.
798–803 (2015)
6. Mehta, S.: Concept drift in streaming data classification algorithms, platforms and
issues. Procedia Comput. Sci. 122, 804–811 (2017)
7. Ayad, O.: Learning under concept drift with support vector machines. In: Wermter,
S., Weber, C., Duch, W., Honkela, T., Koprinkova-Hristova, P., Magg, S., Palm, G.,
Villa, A.E.P. (eds.) ICANN 2014. LNCS (LNAI and LNB), vol. 8681, pp. 587–594.
Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11179-7 74
8. Gama, J., Žliobaite, I., Bifet, A., Pechenizkiy, M., Bouchachia, A.: A survey on
concept drift adaptation. ACM Comput. Surv. 46(4), 44 (2014)
9. Cohen, L., Avrahami-Bakish, G., Last, M., Kandel, A., Kipersztok, O.: Real-time
data mining of non-stationary data streams from sensor networks. Inform. Fusion
9(3), 344–353 (2008)
10. Lu, J., Liu, A., Dong, F., Gu, F., Gama, J., Zhang, G.: Learning under concept
drift: a review. IEEE Trans. Knowl. Data Eng. 31, 2346–2363 (2018)
11. Ditzler, G., Roveri, M., Alippi, C., Polikar, R.: Learning in nonstationary environ-
ments: a survey. IEEE Comput. Intell. Mag. 10(4), 12–25 (2015). https://doi.org/
10.1109/MCI.2015.2471196
12. Fischer, A., Igel, C.: An introduction to restricted Boltzmann machines. In:
Alvarez, L., Mejail, M., Gomez, L., Jacobo, J. (eds.) CIARP 2012. LNCS, vol.
7441, pp. 14–36. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-
33275-3 2. ISBN 978-3-642-33274-6
13. Jaworski, M., Duda, P., Rutkowski, L.: On applying the restricted Boltzmann
machine to active concept drift detection (2017). https://doi.org/10.1109/SSCI.
2017.8285409
14. Korycki, L ., Krawczyk, B.: Concept drift detection from multi-class imbalanced
data streams. arXiv Learning (2021)
15. Jaworski, M., Duda, P., Rutkowska, D., Rutkowski, L.: On handling missing values
in data stream mining algorithms based on the restricted Boltzmann machine. In:
Gedeon, T., Wong, K.W., Lee, M. (eds.) ICONIP 2019. CCIS, vol. 1143, pp. 347–
354. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-36802-9 37
16. Vu, H., Nguyen, T.D., Phung, D.: Detection of unknown anomalies in streaming
videos with generative energy-based Boltzmann models. arXiv Computer Vision
and Pattern Recognition (2018)
17. Zhu, Q., Zhou, J., Wang, W.: Concept drift detection and update algorithm based
on online restricted Boltzmann machine. In: Liang, Q., Wang, W., Mu, J., Liu, X.,
Na, Z. (eds.) AIC 2022. LNEE, vol. 871, pp. 305–311. Springer, Singapore (2023).
https://doi.org/10.1007/978-981-99-1256-8 36
18. Xu, S., Wang, J.: Dynamic extreme learning machine for data stream classification.
Neurocomputing 238, 433–449 (2017)
19. Xiulin, Z., Peipei, L., Xindong, W.: Data stream classification based on extreme
learning machine: a review. Big Data Res. 30, 100356 (2022). https://doi.org/10.
1016/j.bdr.2022.100356
20. Neto, Á.C.L., Coelho, R.A., de Castro, C.L.: An incremental learning approach
using long short-term memory neural networks. J. Control Autom. Electr. Syst.
33, 1457–1465 (2020)
21. Oza, N.C., Russell, S.J.: Online bagging and boosting. In: Richardson, T.S.,
Jaakkola, T.S. (eds.) Proceedings of the Eighth International Workshop on Arti-
ficial Intelligence and Statistics, pp. 229–236. PMLR (2001). https://proceedings.
mlr.press/r3/oza01a.html
22. Dataset. Electricity dataset (2014). https://github.com/scikit-multiflow/
streaming-datasets/blob/master/elec.csv. Accessed September 2023
23. Dataset. Rotating hyperplane dataset (2014). https://github.com/scikit-
multiflow/streaming-datasets/blob/master/hyperplane.csv. Accessed Septem-
ber 2023
24. Dataset. Weather dataset (2014). https://github.com/scikit-multiflow/streaming-
datasets/blob/master/weather.csv. Accessed September 2023
25. Dataset. Phishing dataset (2014). https://github.com/ogozuacik/concept-drift-
datasets-scikit-multiflow/tree/master/real-world/phishing.csv. Accessed Septem-
ber 2023
Revisiting Class Imbalance: A Generalized
Notion for Oversampling
1 Introduction
2 Related Work
Sampling approaches intend to balance the class distribution by either increasing
the minority class data instances (i.e. oversampling), decreasing majority class
data instances (i.e. undersampling), or achieving a trade-off by combining both
oversampling and undersampling approaches (i.e. hybrid sampling) [4,24]. We
provide a brief overview of these sampling approaches below.
Oversampling Approaches: Chawla et al. [3] introduced SMOTE, which
is a novel approach to generating synthetic instances by interpolating minor-
ity data instances in the dataset. It produces synthetic samples by blending
attributes from minority samples with their neighbors, resulting in enhanced
generalizability compared to random oversampling methods. The rising acclaim
for this technique has led numerous subsequent studies to adopt SMOTE as their
foundational methodology. The method presented in [18] integrates the widely
recognized k-means clustering algorithm with SMOTE oversampling to tackle
the issue of imbalanced datasets. It stands out for its simplicity and strategic
sample distribution: a clustering phase first segments the input space, and
oversampling is then applied only within carefully selected clusters. He et al. [8]
introduced a pioneering method called
Adaptive Synthetic Sampling (ADASYN) to tackle minority oversampling using
density distribution. They assign varying weights to different minority instances,
considering their complexity in discrimination. Concentrating on the difficult
classification instances, ADASYN carefully generates synthetic data, improving
dataset representation with precision and effectiveness. Two innovative exten-
sions to SMOTE were presented by Han et al. [7], namely borderline-SMOTE1
and borderline-SMOTE2. These adaptations focus exclusively on oversampling
minority instances located near the borderline. These selective approaches tar-
get instances that are inherently challenging for machine learning algorithms to
distinguish.
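To make the interpolation step concrete, the following is a minimal NumPy sketch of SMOTE-style synthetic sample generation. It is an illustrative reconstruction rather than the original SMOTE implementation; the function name, the brute-force neighbour search, and the parameter k are our own choices.

```python
import numpy as np

def smote_like_oversample(X_min, n_new, k=5, seed=None):
    """Generate synthetic minority samples by SMOTE-style interpolation.

    X_min : (n, d) array of minority-class instances.
    n_new : number of synthetic samples to create.
    k     : number of nearest neighbours considered per seed instance.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    # Brute-force pairwise distances within the minority class (for clarity only).
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    # For each instance, indices of its k nearest neighbours, excluding itself.
    neighbours = np.argsort(dists, axis=1)[:, 1:k + 1]

    synthetic = []
    for _ in range(n_new):
        i = rng.integers(n)                 # random minority seed
        j = rng.choice(neighbours[i])       # one of its k nearest neighbours
        lam = rng.random()                  # interpolation factor in [0, 1]
        synthetic.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(synthetic)
```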
Other Sampling Approaches: While much attention has traditionally been
devoted to oversampling techniques, some research studies have diverged from
this by focusing on the reduction of majority data instances to achieve class
balance. These works concentrate on imbalanced classification through under-
sampling techniques [2,21,27,28]. Some works explore a combined approach that
incorporates both oversampling and undersampling methods [4,12,16].
Our work differs fundamentally from existing sampling approaches in the
literature in that we introduce a generalized notion of the oversampling process
rather than limiting ourselves to a specific technique. Our method can
accommodate a wide variety of oversampling functions.
3 Methodology
3.1 Preliminaries
We begin by introducing the notation that will be used throughout the paper.
Let D = {X, Y } be a dataset. X = {x1 , . . . xn } is a set of data instances where
\[ d_f : S \rightarrow \mathbb{R}^d \tag{1} \]
Fig. 1. (a) Oversampling function balances the class distribution (b) Different functions
(i.e. $d_f$ and $d_f^{*}$) generate samples with different characteristics
Mean. Calculation of x̃ using mean values takes the average value of the func-
tion over its domain.
\[ \tilde{x} = \frac{\sum_{x' \in \hat{N}(x)} x'}{|\hat{N}(x)|} \tag{4} \]
\[ \tilde{x} = \frac{\sum_{x' \in \hat{N}(x)} w_{x'} \times x'}{\sum_{x' \in \hat{N}(x)} w_{x'}} \tag{5} \]
\[ \mu_r = \frac{\sum_{x' \in \hat{N}(x)} x'_{1,r}}{|\hat{N}(x)|} \tag{8} \]
\[ \tilde{x}_{1,r} := \sqrt{\frac{\sum_{x' \in \hat{N}(x)} \left(x'_{1,r} - \mu_r\right)^2}{|\hat{N}(x)|}} \tag{9} \]
\[ \tilde{x} = \frac{\sum_{x' \in \hat{N}(x)} \left(w_{x'} \times \mathbb{1}_{x' \in S \cup \{x\}}\right) \times x'}{\sum_{x' \in \hat{N}(x)} \left(w_{x'} \times \mathbb{1}_{x' \in S \cup \{x\}}\right)} \tag{11} \]
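As an illustration of the generalized notion, the sketch below implements the neighbourhood aggregates of Eqs. (4), (5), (8)-(9) and the non-linear minimum/maximum variants as interchangeable oversampling functions. It is a minimal reading of the formulas, assuming that the neighbourhood N̂(x) is supplied as a (k, d) matrix of neighbour instances and that the weights for the weighted arithmetic mean are provided by the caller; it is not the authors' code.

```python
import numpy as np

def aggregate_neighbourhood(neigh, how="mean", weights=None):
    """Build one synthetic instance x~ from a (k, d) neighbourhood matrix.

    how = "mean" : plain average of the neighbours (Eq. 4)
          "wam"  : weighted arithmetic mean with caller-supplied weights (Eq. 5)
          "std"  : per-attribute standard deviation around the mean (Eqs. 8-9)
          "min" / "max" : non-linear element-wise aggregates
    """
    neigh = np.asarray(neigh, dtype=float)
    if how == "mean":
        return neigh.mean(axis=0)
    if how == "wam":
        w = np.ones(len(neigh)) if weights is None else np.asarray(weights, dtype=float)
        return (w[:, None] * neigh).sum(axis=0) / w.sum()
    if how == "std":
        mu = neigh.mean(axis=0)                           # Eq. (8), per attribute
        return np.sqrt(((neigh - mu) ** 2).mean(axis=0))  # Eq. (9)
    if how == "min":
        return neigh.min(axis=0)
    if how == "max":
        return neigh.max(axis=0)
    raise ValueError(f"unknown aggregation: {how!r}")
```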
4 Experimental Design
Through our experiments, we seek to answer the following questions.
– To what extent do the proposed oversampling functions alleviate the class
imbalance problem?
– Does the input size affect the performance of these functions?
– How well do these functions work with noisy input data?
In this section, we describe the datasets, training algorithms, and the implemen-
tation details of our experiments.
4.1 Datasets
We do not rely on measures such as accuracy and the ROC AUC score because they
do not effectively capture the nuances of imbalanced classification scenarios.
\[ Precision = \frac{TP}{TP + FP} \tag{12} \]
\[ Balanced\text{-}Accuracy = 0.5 \times \left( \frac{TN}{TN + FP} + \frac{TP}{TP + FN} \right) \tag{13} \]
\[ F1\text{-}score = \frac{TP}{TP + \frac{1}{2}(FP + FN)} \tag{14} \]
\[ Precision_{macro} = \frac{\sum_{class} \text{Precision of class}}{\text{Number of classes}} \tag{15} \]
\[ Balanced\text{-}Accuracy_{macro} = \frac{\sum_{class} \text{Balanced accuracy of class}}{\text{Number of classes}} \tag{16} \]
\[ F1\text{-}score_{macro} = \frac{\sum_{class} \text{F1-score of class}}{\text{Number of classes}} \tag{17} \]
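For reference, the macro-averaged scores of Eqs. (15)-(17) can be computed with scikit-learn as sketched below. Scikit-learn is an assumption on our part (the paper does not name its evaluation library), and its balanced_accuracy_score is the macro-average of per-class recall, which coincides with Eq. (13) in the binary case.

```python
from sklearn.metrics import balanced_accuracy_score, f1_score, precision_score

def imbalance_metrics(y_true, y_pred):
    """Macro-averaged scores corresponding to Eqs. (15)-(17)."""
    return {
        "precision_macro": precision_score(y_true, y_pred, average="macro", zero_division=0),
        "balanced_accuracy": balanced_accuracy_score(y_true, y_pred),
        "f1_macro": f1_score(y_true, y_pred, average="macro"),
    }

# Example usage on a toy imbalanced prediction.
print(imbalance_metrics([0, 0, 0, 1, 1], [0, 0, 1, 1, 0]))
```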
From the results, it is evident that there is no single function that per-
forms well for every dataset. The best-performing function always depends on
the dataset characteristics. However, non-linear functions such as minimum and
maximum show solid performance in all datasets compared to linear functions.
We attribute this to the high capacity of these functions as they can model
complex relationships between data instances using non-linearity. The weighted
arithmetic mean consistently outperforms other linear functions such as the mean
and SMOTE. Since the mean and SMOTE are special instances of the weighted
arithmetic mean, the latter inherently has higher modeling capacity and thus
performs comparably or better. Standard deviation is the worst-performing
function. We believe this is mainly due to its inability to capture the data
distribution accurately, especially when the data distributions are skewed.
Fig. 2. Classification performance (F1-Score, %) against input size k for (a) the Wisconsin
and (b) the Pima Diabetes datasets; curves compare SMOTE, SUM, MEAN, WAM, MIN,
MAX, and STD.
Fig. 3. Noise robustness (F1-Score) for (a) the Glass and (b) the Thyroid datasets,
comparing SMOTE, SUM, MEAN, WAM, MIN, MAX, and STD.
References
1. Asela, H.: Binary and multi-class classification using supervised machine learning
algorithms and ensemble model (2021)
2. Bunkhumpornpat, C., Sinapiromsaran, K., Lursinsap, C.: Mute: majority under-
sampling technique. In: 2011 8th International Conference on Information, Com-
munications & Signal Processing, pp. 1–4. IEEE (2011)
3. Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic
minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002)
4. Choirunnisa, S., Lianto, J.: Hybrid method of undersampling and oversampling for
handling imbalanced data. In: 2018 International Seminar on Research of Informa-
tion Technology and Intelligent Systems (ISRITI), pp. 276–280. IEEE (2018)
5. Dong, Y., Wang, X.: A new over-sampling approach: random-SMOTE for learning
from imbalanced data sets. In: Xiong, H., Lee, W.B. (eds.) KSEM 2011. LNCS
(LNAI), vol. 7091, pp. 343–352. Springer, Heidelberg (2011). https://doi.org/10.
1007/978-3-642-25975-3_30
6. Douzas, G., Bacao, F., Last, F.: Improving imbalanced learning through a heuristic
oversampling method based on k-means and SMOTE. Inf. Sci. 465, 1–20 (2018)
7. Han, H., Wang, W.-Y., Mao, B.-H.: Borderline-SMOTE: a new over-sampling
method in imbalanced data sets learning. In: Huang, D.-S., Zhang, X.-P., Huang,
G.-B. (eds.) ICIC 2005. LNCS, vol. 3644, pp. 878–887. Springer, Heidelberg (2005).
https://doi.org/10.1007/11538059_91
8. He, H., Bai, Y., Garcia, E.A., Li, S.: ADASYN: adaptive synthetic sampling app-
roach for imbalanced learning. In: 2008 IEEE International Joint Conference on
Neural Networks (IEEE World Congress on Computational Intelligence), pp. 1322–
1328. IEEE (2008)
9. He, H., Ma, Y.: Imbalanced learning: foundations, algorithms, and applications
(2013)
10. Hevapathige, A.: Evaluation of deep learning approaches for anomaly detection. In:
2021 5th SLAAI International Conference on Artificial Intelligence (SLAAI-ICAI),
pp. 1–5. IEEE (2021)
11. Imambi, S., Prakash, K.B., Kanagachidambaresan, G.: Pytorch. Programming with
TensorFlow: Solution for Edge Computing Applications, pp. 87–104 (2021)
12. Junsomboon, N., Phienthrakul, T.: Combining over-sampling and under-sampling
techniques for imbalance dataset. In: Proceedings of the 9th International Confer-
ence on Machine Learning and Computing, pp. 243–247 (2017)
1 Introduction
A financial institution (FI) plays a crucial role in providing loans to individ-
uals and businesses, helping them to meet their financial needs and achieve
their goals. Automation brings efficiency and convenience to FIs, prioritizing
customer service and ensuring personalized assistance for customers. It is
widely known that FIs rely on complex and time-consuming structures to
provide loan solutions [9,14]. Hence, these complex structures need to be
optimized to enhance the customer experience and maximize revenue generation.
Process mining is a technology that uses event data generated during process
execution to learn and gain insights about the processes. The foundation of PM
is the event log, which contains information such as activities, timestamps, and
unique identifiers for each event. PM encompasses three main building blocks:
process discovery, process conformance, and process enhancement. The process
discovery uncovers the underlying process model without any prior knowl-
edge, by analyzing the event logs to extract the process structure and behavior.
Process conformance focuses on comparing an existing process model with
the event log to evaluate their alignment by assessing the degree of confor-
mity between the observed behavior in the event log and the expected behavior
outlined in the process model. Finally, process enhancement improves and
optimizes the overall process by incorporating insights from the event logs [1–3].
By employing these building blocks in an organization’s functioning, inefficiencies,
deviations from usual behavior, bottlenecks, etc. can be identified, and the
organization can gain valuable insights into process behavior and patterns for
its smooth functioning.
Hence, the work proposes to use process mining for optimization of the loan
application process in the financial application domain. Through this approach,
various challenges like delayed loan approval, difficulty in fraud detection, etc
can be resolved. The proposed work focuses on developing a process model by
employing various process discovery algorithms on loan datasets [17,18]. The
developed model is then checked on different parameters using different process
conformance algorithms. The entire execution is deployed on the PM4PY tool [8].
Figure 1 represents the workflow of the proposed approach, which develops a
model, representing the actual activities being performed for the loan applica-
tion initiation to its final acceptance or rejection stage.
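The following is a minimal sketch of this workflow using PM4PY's simplified Python interface (pm4py 2.x). The file name is illustrative rather than the authors' actual path, the exact function names may differ across PM4PY versions, and the ILP miner is omitted here because it is exposed differently depending on the version.

```python
import pm4py

# Load an event log (the file name below is illustrative, not the authors' path).
log = pm4py.read_xes("BPI_Challenge_2012.xes")

# Discover Petri nets with three of the discovery algorithms discussed below.
alpha_net, alpha_im, alpha_fm = pm4py.discover_petri_net_alpha(log)
ind_net, ind_im, ind_fm = pm4py.discover_petri_net_inductive(log)
heu_net, heu_im, heu_fm = pm4py.discover_petri_net_heuristics(log)

# A quick token-based replay check of one discovered model against the log.
print(pm4py.fitness_token_based_replay(log, ind_net, ind_im, ind_fm))
```

The discovered nets can then be handed to the conformance-checking and quality-metric functions discussed in the later sections.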
In the rest of the paper, Sect. 2 reviews the existing research
on process mining in the loan application process domain. Section 3 provides
essential background on process mining by exploring the different
algorithms for process discovery and conformance. Section 4 outlines
the proposed methodology and the execution of process models on the PM4PY tool.
Section 5 presents and analyzes the results of the proposed implementation.
Finally, Sect. 6 concludes the work and outlines its future scope.
2 Related Work
The proposed work focuses on applying process mining for the optimization of
the loan application process. In this regard, the work in [13] focused on developing
a process model for the loan application process using the BPI Challenge 2017
dataset [17]. The process models are developed with four PM tools, namely
SQL, Power BI, Disco, and ProM. Similarly, the authors in [15] applied PM to
analyze the loan application process using the BPI Challenge 2017 dataset and
provided a critical analysis of the frequencies of accepted, approved, and cancelled
loans (cancelled either by the user or by the bank), along with denied applications,
and a relative comparison of these frequencies. In [14], the authors modeled a real-
life financial loan application process from the Netherlands that accurately
represents the loan application tasks. For the deployment and analysis, the BPI
Challenge 2012 dataset [18] is used, which consists of 262,200 event log entries from
13,087 credit applications.
The work [16], employed PM techniques, spreadsheet-based approaches,
Disco for PM capabilities, and Classification and Regression Tree (CART) for
exploratory analytics using BPI Challenge 2012 dataset. The analysis findings
are presented, highlighting a deep understanding of the process gained from the
event log data. The study identifies potential areas for operational efficiency
improvements within the bank and explores the use of process execution knowl-
edge to predict loan application outcomes.
This work has explored the state of the art for the loan application process,
and it has been observed that very limited research has been performed on the
loan application process in the financial domain. Hence, a substantial study of
the loan application process is required so that banking processes can be
standardized to a significant level. Additionally, none of the previous research
has performed an exploratory analysis of the possible PM models corresponding
to the loan application process. Therefore, in this proposed work, different
possible models are analyzed through the different available algorithms using
the PM4PY tool on the loan application event log.
3 Background
Process mining is a growing technology that deals with obtaining insights from
processes and optimizing them for the benefit of organizations. The optimization
in PM is data-driven, using the event logs generated from the activities
performed in organizations. PM is applicable in various domains, including loan,
healthcare, and e-commerce processes, and uses different algorithms for process
discovery and process conformance, as discussed below, for efficient process
modelling.
By applying token replay to each trace and aggregating the results, the overall
conformance of the event log with the Petri net model can be established; a small
numeric illustration of this computation is sketched after this list.
\[ Fitness(petrinet, trace) = \frac{1}{2}\left(1 - \frac{rts}{pts}\right) + \frac{1}{2}\left(1 - \frac{mts}{cts}\right) \tag{1} \]
where rts is the remaining tokens, pts is the processed tokens, mts is the
missing tokens, and cts is the consumed tokens.
2. Alignment Algorithm: Alignment involves matching an actual
trace with a process model to find the best possible match between the model
and the traces. The purpose of employing this alignment-based replay is to
determine the optimal pairing between model and traces, which yields a collection
of pairs for each trace. Each pair consists of a trace event (or a skip symbol
representing the absence of an event) and a model transition.
3. Footprint Algorithm: Footprints are matrices that depict the causal rela-
tionships between activities. Likewise, a process model also acquires a foot-
print through its execution and the recording of activity sequences. To ensure
thoroughness, all activities that can occur consecutively should be reflected in
the footprint matrix. Through a comparison of the footprints from the
event log to the discovered process model, discrepancies can be identified and
conformance between the event log and model can be evaluated.
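To make the two replay-based ideas above concrete, the sketches below are illustrative Python snippets and not part of the original work. The first evaluates Eq. (1) for hypothetical token counts; the second derives a footprint (directly-follows) matrix from a toy event log using the usual alpha-algorithm symbols.

```python
def token_replay_fitness(rts, pts, mts, cts):
    """Eq. (1): token-replay fitness of a single trace.

    rts: remaining tokens, pts: processed tokens,
    mts: missing tokens,   cts: consumed tokens.
    """
    return 0.5 * (1 - rts / pts) + 0.5 * (1 - mts / cts)

# Example (hypothetical counts): 1 remaining of 10 processed, 1 missing of 10 consumed.
print(token_replay_fitness(rts=1, pts=10, mts=1, cts=10))  # 0.9
```

A footprint matrix can be derived directly from the traces of an event log; the toy traces below use hypothetical activity names from a simplified loan process.

```python
from collections import defaultdict

def footprint_matrix(traces):
    """Derive a footprint matrix from a list of traces (each a list of activity names).

    Relations use the usual alpha-algorithm symbols:
    '->' a directly precedes b, '<-' b directly precedes a,
    '||' both directions observed, '#' never directly follow each other.
    """
    follows = defaultdict(bool)
    activities = sorted({a for trace in traces for a in trace})
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            follows[(a, b)] = True

    matrix = {}
    for a in activities:
        for b in activities:
            if follows[(a, b)] and follows[(b, a)]:
                matrix[(a, b)] = "||"
            elif follows[(a, b)]:
                matrix[(a, b)] = "->"
            elif follows[(b, a)]:
                matrix[(a, b)] = "<-"
            else:
                matrix[(a, b)] = "#"
    return matrix

toy_log = [["submit", "check", "approve"], ["submit", "check", "reject"]]
print(footprint_matrix(toy_log)[("submit", "check")])  # '->'
```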
4.1 Dataset
In the proposed work, the BPI Challenge 2017 [17] (2016 to 2017) and BPI Challenge
2012 [18] (October 2011 to March 2012) datasets are employed to generate loan
application process models. Table 1 presents the technical specifications of
these datasets.
In process mining, the process models generated for any application domain
undergo testing using four quality metrics: simplicity, fitness, generalization,
and precision, discussed as follows. The evaluation of a model is based
on the values obtained for these metrics, each ranging between 0 and 1 [6].
1. Fitness Metric: As its name suggests, it specifies how well the model fits
the event log, expressing whether every trace in the log is covered
by the process model.
2. Precision Metric: It addresses the concept of underfitting, emphasizing
that the model should not allow traces beyond those observed in the
event log and should only accept the already available traces.
3. Generalization Metric: It addresses overfitting to the event log, measuring
the presence of behavior that is valid but not recorded in the generated
event logs. It is the counterpart of precision and evaluates how well the
process model generalizes the observed behavior in the event log.
4. Simplicity Metric: Simplicity is a measure that quantifies the complexity
of a discovered model. A good process model should aim to minimize unnec-
essary complexities and entities, keeping the model as simple as possible.
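As a sketch of how these four metrics can be obtained for a discovered Petri net, the helper below uses PM4PY's evaluation functions. The module paths for the generalization and simplicity evaluators, and the "log_fitness" result key, follow pm4py 2.x and may differ in other versions.

```python
import pm4py
from pm4py.algo.evaluation.generalization import algorithm as generalization_evaluator
from pm4py.algo.evaluation.simplicity import algorithm as simplicity_evaluator

def evaluate_model(log, net, im, fm):
    """Score a discovered Petri net on the four quality metrics (each in [0, 1])."""
    fitness = pm4py.fitness_token_based_replay(log, net, im, fm)["log_fitness"]
    precision = pm4py.precision_token_based_replay(log, net, im, fm)
    generalization = generalization_evaluator.apply(log, net, im, fm)
    simplicity = simplicity_evaluator.apply(net)
    return {"fitness": fitness, "precision": precision,
            "generalization": generalization, "simplicity": simplicity}
```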
Fig. 2. Loan Process Model of BPI Challenge 2012 with Alpha Miner
The models generated using the discovery algorithms then go through con-
formance checking, where they are re-validated against the event logs. To perform
this, three conformance-checking algorithms, i.e. footprint matrix (FP), align-
ment (ALT), and token replay (TR), were used, in which four quality met-
rics, i.e. fitness (FT), precision (PR), generalization (GN), and simplicity (SM),
are employed to show the performance. The corresponding results are
shown in Table 2, which presents the comparative results of all three confor-
mance algorithms, leading to the following inferences.
i For the simplicity of the model, the Alpha Miner algorithm performs well
compared to the other discovery algorithms, showing significantly higher
values under the token replay, alignment, and footprint matrix checks for both
datasets.
Fig. 3. Loan Process Model of BPI Challenge 2017 with Alpha Miner
ii For the model’s fitness, the algorithms inductive miner and ILP outper-
form others with a value of 1.0 in every case.
iii For the precision of the models, the heuristic miner algorithm has shown
superior capability over the others in both datasets.
iv For generalization, it has been observed that the alpha, inductive, and heuris-
tic algorithms show almost similar performance, but the ILP algorithm
performs best compared with the other algorithms in both employed datasets.
Finally, after analyzing all the algorithms for process discovery and process
conformance, it has been observed that the discovery algorithms successfully
generate process models for the loan application process, and process
conformance verifies each model by checking the event logs and the model for
conformance between them. All the algorithms have demonstrated their
capability, each with its own strengths, in generating a process model for loan
application approval systems. To evaluate the quality of the model generated by
each algorithm, four quality metrics (fitness, precision, simplicity, and
generalization) are used, and their individual values corresponding to each
algorithm are calculated. Since all four quality metrics are important and none
of them can be universally prioritized, choosing the best algorithm for model
generation depends on which quality metrics the model should emphasize. Hence,
based on such quality metrics, a specific algorithm can be selected.
References
1. van der Aalst, W.M.P.: Process Mining: Data Science in Action, 2nd edn. Springer (2016). Accessed 28 Sep 2023
2. Aalst, W.: Process discovery: capturing the invisible. IEEE Comput. Intell. Mag.
5, 28–41 (2010)
3. Carmona, J., Dongen, B., Solti, A., Weidlich, M.: Conformance Checking, p. 56.
Springer, Switzerland (2018). https://doi.org/10.1007/978-3-319-99414-7
4. Leemans, S.J.J., Fahland, D., van der Aalst, W.M.P.: Discovering block-structured
process models from event logs – a constructive approach. In: Colom, J.M., Desel,
J. (eds.) Application and Theory of Petri Nets and Concurrency. PETRI NETS
2013. LNCS, vol. 7927, pp 311–329. Springer, Berlin, Heidelberg (2013). https://
doi.org/10.1007/978-3-642-38697-8_17
Unveiling AI Efficiency 499
5. Weijters, A.J.M.M., van der Aalst, W.M.P., Alves de Medeiros, A.K.: Process mining
with the heuristics miner algorithm. Technical report WP 166, Technische Universiteit
Eindhoven, pp. 1–34 (2006)
6. Buijs, J.C.A.M., van Dongen, B.F., van der Aalst, W.M.P.: On the role of fitness,
precision, generalization and simplicity in process discovery. In: Meersman, R.,
et al. (eds.) OTM 2012. LNCS, vol. 7565, pp. 305–322. Springer, Heidelberg (2012).
https://doi.org/10.1007/978-3-642-33606-5_19
7. Berti, A., van der Aalst, W.M.P.: A novel token-based replay technique to speed
up conformance checking and process enhancement. In: Koutny, M., Kordon, F.,
Pomello, L. (eds.) Transactions on Petri Nets and Other Models of Concurrency
XV. LNCS, vol. 12530, pp. 1–26. Springer, Heidelberg (2021). https://doi.org/10.
1007/978-3-662-63079-2_1
8. Berti, A., Van Zelst, S.J., van der Aalst, W.: Process mining for python
(pm4py): bridging the gap between process-and data science (2019). arXiv preprint
arXiv:1905.06169
9. Werner, M., Wiese, M., Maas, A.: Embedding process mining into financial state-
ment audits. Int. J. Account. Inf. Syst. 41, 100514 (2021)
10. Fraunhofer Institute for Applied Information Technology: pm4py - Process Mining
for Python. https://pm4py.fit.fraunhofer.de/docs. Accessed 28 Dec 2022
11. Van der Aalst, W., Weijters, T., Maruster, L.: Workflow mining: discovering process
models from event logs. IEEE Trans. Knowl. Data Eng. 16(9), 1128–1142 (2004)
12. Verbeek, H., Aalst, W.: Decomposed process mining: the ILP case. In: Business
Process Management Workshops: BPM 2014 International Workshops, Eindhoven,
The Netherlands, 7–8 September 2014, Revised Papers 12, pp. 264–276 (2015)
13. Blevi, L., Delporte, L., Robbrecht, J.: Process mining on the loan application pro-
cess of a Dutch Financial Institute. BPI Challenge, pp. 328–343 (2017)
14. Moreira, C., Haven, E., Sozzo, S., Wichert, A.: Process mining with real world
financial loan applications: improving inference on incomplete event logs. PLoS
ONE 13, e0207806 (2018)
15. Carvallo, A., et al.: Applying Process Mining for Loan Approvals in a Banking
Institution. Computer Science Department, School Of Engineering Pontificia Uni-
versidad Catolica De Chile, Santiago, Chile (2017)
16. Bautista, A., Wangikar, L., Akbar, S.: Process mining-driven optimization of a
consumer loan approvals process. BPI Challenge (2012)
17. van Dongen, B.: BPI Challenge 2017. Version 1. 4TU.ResearchData. dataset (2017).
https://doi.org/10.4121/uuid:5f3067df-f10b-45da-b98b-86ae4c7a310b
18. van Dongen, B.: BPI Challenge 2012. Version 1. 4TU.ResearchData. dataset (2012).
https://doi.org/10.4121/uuid:3926db30-f712-4394-aebc-75976070e91f
Author Index
A
Aadhithya, Aadharsh II-213
Agarwal, Mohit II-311, II-386
Alpuri, Kousar Nikhath II-67
Ambala, Srinivas II-171
Amjad, S. Abdul II-81
Ansari, Sakil Ahmad II-410
Anuradha, T. II-81
Aqeel, Khan Hassan I-427
Aquib, Mohd II-311
Arora, Yogita I-321
Ashisha, G. R. II-148
Awati, Anupama Sanjay I-439
B
Babu, Kancharagunta Kishan II-376
Babu, U. Mahesh II-81
Bagane, Pooja II-104, II-171, II-259
Bagyamani, J. II-26
Bahuguna, Renu I-13
Bajaj, Peehu I-284
Baldua, Shreya II-104
Banodha, Umesh I-374
Bansal, Payal II-53
Bathla, Priyanka II-286
Baviskar, Vaishali I-258, I-267
Bayas, Ameysingh I-193, II-322
Bedekar, Kalyani Ashok I-439
Bhalerao, Ritesh I-57
Bhandarkar, Mrunalini S. II-53
Bhardwaj, Saurabh I-13
Bhargava, Parth I-397
Bhasin, Harsh I-169, II-180
Bhatia, Abhay I-24
Bhosale, Amrut P. I-72
Bhosale, Devashri I-267
Bhosale, Digvijay G. I-72
Błażyczek, Klaudia I-333
Bongale, Arunkumar I-284
C
Chadha, Shikha II-90
Chauhan, Rosey II-90
Chen, Fulei I-82, I-91
Chimma, Akhil II-376
Chirag I-169
Chiwhane, Shwetambari I-178, II-171, II-322
Chodak, Grzegorz I-333
Choudhary, Dharna II-123
Choudhary, Pushpa II-188, II-410
Choudhury, Tanupriya II-135, II-148, II-286
Chowdhury, Subrata II-148
D
Dahotre, Rutuja Pravin I-36
Datta, T. Manasa II-81
Deshpande, Santosh. L. I-232
Devarajan, Ganesh Gopal II-197
Dewan, Basudha II-53
Dhawan, Reha I-397
Dhumane, Amol I-178, II-171, II-322
Divya Priya, R. II-26
Dongre, Sumedh II-104
Dubey, Arun Kumar I-100, I-321
Duraivelu, Vivek I-3
Dwivedi, Amit Kumar II-386
F
Fang, Tianrui I-91
G
Galustian, Andre I-242
Gandhi, Rashmi II-246
Gaurav, Shashadhar II-259
Gehlot, Vansh I-205
Gite, Shilpa I-193
Gogna, Manas II-322
Goswami, Anurag I-466
Gowda, Y. Jeevan I-47
Gull, Karuna C. I-111