
EAI Endorsed Transactions on Internet of Things
Research Article

Employee Attrition: Analysis of Data Driven Models

Manju Nandal1,*, Veena Grover2, Divya Sahu3, Mahima Dogra4

1,2,3,4 Assistant Professor, Noida Institute of Engineering & Technology, Greater Noida

Abstract

Companies constantly strive to retain their professional employees to minimize the expenses associated with recruiting and training new staff members. Accurately anticipating whether a particular employee is likely to leave or remain with the company can empower the organization to take proactive measures. Unlike physical systems, human resource challenges cannot be encapsulated by precise scientific or analytical formulas. Consequently, machine learning techniques emerge as the most effective tools for addressing this objective. In this paper, we present a comprehensive approach for predicting employee attrition using machine learning, ensemble techniques, and deep learning, applied to the IBM Watson dataset. We employed a diverse set of classifiers, including Logistic Regression, K-Nearest Neighbour (KNN), Decision Tree, Naïve Bayes, Gradient Boosting, AdaBoost, Random Forest, Stacking, XGBoost, Feedforward Neural Network (FNN), and Convolutional Neural Network (CNN), on the dataset. Our most successful model, which harnesses a deep learning technique known as the FNN, achieved superior predictive performance, with the highest accuracy, recall and F1-score of 97.5%, 83.93% and 91.26%, respectively.

Keywords: Employee attrition, Ensemble learning, Deep learning, Machine learning

Received on 02 November 2023, accepted on 23 December 2023, published on 03 January 2024

Copyright © 2023 M. Nandal et al., licensed to EAI. This is an open access article distributed under the terms of the CC BY-NC-SA
4.0, which permits copying, redistributing, remixing, transformation, and building upon the material in any medium so long as the
original work is properly cited.

doi: 10.4108/eetiot.4762

*Corresponding author. Email: [email protected]

1. Introduction

The competitiveness among organizations and companies hinges significantly on workforce productivity. Creating and sustaining an appropriate environment is the essential factor that fosters stable and cooperative employees. The Human Resource (HR) department plays a pivotal role in shaping such an environment through the analysis of employee database records [1]. A robust workforce contributes to heightened productivity, cost-efficiency, and overall profitability for a company. These benefits are unattainable without the pivotal role played by human resources. When an organization struggles to retain its employees, it can lead to sustained losses over the long term. This phenomenon, often referred to as employee attrition [2], occurs within an organization when employees choose to depart for a variety of reasons. These factors may encompass personal or professional motivations, an unsuitable work environment, excessively long office hours, and inadequate compensation. This deliberate decision to leave, initiated by the employees themselves, is commonly referred to as voluntary attrition. The primary aim of HR departments is to comprehend the underlying reasons for voluntary employee attrition and formulate a corresponding strategy for mitigation. Recognizing and harnessing the existing talent within an organization stands as one of the foremost challenges and critical priorities in talent management. In any organization, human resources assume a pivotal role in shaping strategic decisions. Contented, deeply motivated, and committed employees form the bedrock of a company, subsequently influencing the productivity of the entire organization.

The role of the Human Resources (HR) department in fostering such an atmosphere is pivotal, and it is achieved through the thorough examination of employee database records. This analysis equips the administration with the tools to improve decision-making processes, effectively addressing the challenge of employee attrition.

Historically, inquiries related to employee attrition and retention have been approached through qualitative and anecdotal methods. Typically, HR personnel conduct exit interviews when an employee tenders their resignation, aiming to uncover the underlying reasons behind their departure. In the current age marked by the fourth industrial revolution, powered by advanced technologies such as predictive analytics that employ statistical modelling techniques and machine learning, predicting the likelihood of an employee departing from an organization is now within reach. Organizations utilize machine learning algorithms to forecast the probability of employee attrition and proactively implement measures to prevent such occurrences [3].

Machine learning represents a facet of artificial intelligence (AI) technology that equips systems with the capability to autonomously acquire knowledge and refine their performance through experience, mirroring human-like intelligence without the need for explicit programming [4]. Machine learning (ML) stands as one of the most rapidly advancing research fields, showcasing successful development and application across a diverse array of real-world domains. Due to the expenses associated with hiring employees, providing training, and acquiring intellectual property, it becomes paramount to ensure a minimal attrition rate (employee turnover) within organizations [5]. Employee attrition imposes significant financial burdens on a company, encompassing expenses such as business disruption costs, recruitment and onboarding of new employees, and training of newcomers [6]. While recruiting top talent is vital for organizations, it is equally crucial to ensure their satisfaction and retention. Employees have their own criteria for selecting and committing to an organization, and if their expectations are not met, they may choose to resign. This can result in employee attrition, often referred to as the phenomenon of employee churn [7]. Lately, leading companies such as IBM, HCL, TCS, and others have grappled with employee attrition challenges. By gathering employee feedback regarding various aspects, including the company's culture, work environment, workload, job satisfaction, and more, organizations can employ statistical methods to predict attrition status. Hence, attrition must be treated with the utmost importance, and measures must be taken by organizations to prevent it [4].

Consequently, forecasting employee attrition and pinpointing the key factors that contribute to attrition emerge as crucial objectives for organizations seeking to bolster their human resource strategies. This paper delves into the application of classification and clustering techniques for analysing attrition. It conducts a comparative assessment to evaluate the accuracy of different data mining algorithms using Weka, a collection of machine learning algorithms employed for data mining purposes. In this study, we employed the IBM Human Resource Analytics Employee Attrition & Performance dataset, which is publicly accessible through the Kaggle Dataset Repository. This dataset was generated by IBM data scientists for research purposes and comprises four primary components: seniority, employee satisfaction, income, and demographic information. Inside the dataset, numerous attributes impact the predictive variable known as 'Attrition'. It consists of a total of 1,470 instances and encompasses 35 attributes, providing a comprehensive dataset for analysis.

2. Related Work

Employee attrition issues have been studied by researchers from various viewpoints. Researchers harnessed machine learning techniques to predict employee attrition by analysing data pertaining to the employees. This investigation involved the utilization of several machine learning methods, including Random Forests (RF), Support Vector Machines (SVM), and K-Nearest Neighbours (KNN), while exploring various parameter configurations [8]. The researchers in [9] chose to utilize Classification Trees and Random Forest for the purpose of predicting employee attrition. Their approach commenced with dataset pre-processing, where they excluded less influential variables based on Pearson correlation analysis.

A study [4] utilizing the IBM HR Employee Attrition & Performance dataset revealed an inherent data imbalance issue. During the data exploration phase, the researchers employed correlation plots and histogram visualizations to assess the relationships among continuous variables in the model. Following this analysis, the Synthetic Minority Oversampling Technique (SMOTE) was utilized to rectify the imbalance within the Attrition class [10]. To tackle the challenge of predicting employee turnover, the authors introduced a novel approach, a weighted quadratic random forest algorithm, which was applied to a dataset of employees gathered from a branch of a telecommunications company located in China [10]. The researchers presented a comprehensive three-stage framework for predicting attrition. In the first stage, they applied the "max-out" feature selection method to refine the data. Following this, in the second stage, a logistic regression model was trained for prediction. Finally, the third stage involved conducting confidence analysis to enhance the reliability of the prediction model. However, it is worth noting that the system faces challenges, including suboptimal accuracy and elevated complexity due to the preprocessing and postprocessing steps [11]. Taylor et al. [7] utilized tree-based models, specifically light gradient boosted trees and random forests, to make predictions regarding employee attrition. These models demonstrated robust performance, with the light gradient boosted trees exhibiting particularly strong results. The study utilized a custom dataset comprising 5,550 samples for their analysis. Machine learning serves a wide array of applications, encompassing tasks from prediction to the classification of various HR data parameters and features [12]; that study focused on the early prediction of employee turnover, considering variables like absenteeism, tardiness, and employee indifference as significant factors influencing employee performance forecasting. Fallucchi et al. [13] conducted research and used a variety of machine learning approaches to identify the circumstances that may cause an employee to leave the organization.


The best recall value was provided by the Gaussian Naïve Bayes classifier, which contributes to the classifier's capacity to detect positive occurrences. The study [14] provided a hybrid model for anticipating client attrition.

3. Dataset Description

The study used a dataset of 1,470 instances, which comprised detailed information about all employees and 35 features, including the target class. When the gender variable was examined, it was determined that 60% of the employees were male and 40% were female. Two critical characteristics examined in our trials were job satisfaction and job involvement. Attrition affected approximately 28.29% of employees with low job satisfaction or low job involvement. Furthermore, approximately 31.25% of employees with an unfavourable work-life balance left the organization, compared to 17.65% of departing employees with a good work-life balance. Figure 1 shows the correlation between the target variable, i.e., employee attrition, and the other variables. Heatmap analysis reveals a strong correlation between job satisfaction, overtime, job level, monthly income, job involvement and age with respect to attrition in the dataset.

Figure 1: Dataset
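The dataset statistics reported above can be reproduced from the public Kaggle copy of the IBM dataset. The following is a minimal sketch, assuming the Kaggle file name WA_Fn-UseC_-HR-Employee-Attrition.csv and its column names (Attrition, Gender, JobSatisfaction, JobInvolvement, WorkLifeBalance, with 1 taken as the lowest rating); these details are assumptions, not values taken from the paper.

```python
import pandas as pd

# Load the IBM HR Analytics Employee Attrition dataset
# (file name assumed from the public Kaggle release).
df = pd.read_csv("WA_Fn-UseC_-HR-Employee-Attrition.csv")

print(df.shape)                                   # expected: (1470, 35)
print(df["Gender"].value_counts(normalize=True))  # roughly 60% Male / 40% Female

# Attrition rate among employees with low job satisfaction or low involvement
low = df[(df["JobSatisfaction"] == 1) | (df["JobInvolvement"] == 1)]
print((low["Attrition"] == "Yes").mean())

# Attrition rate split by work-life balance (1 = worst rating, assumed scale)
print(df.groupby(df["WorkLifeBalance"] == 1)["Attrition"]
        .apply(lambda s: (s == "Yes").mean()))
```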

4. Data Analysis
Based on the heatmap, the attributes most highly correlated with the target variable, employee attrition, are overtime, job satisfaction, job level, monthly income, age and job involvement. Figure 2 shows bar plots of the correlated variables against the target variable, i.e., employee attrition. According to the plots, employee attrition is higher when overtime increases, and lower when job satisfaction, job level, age and job involvement are higher.
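As a rough illustration of the heatmap and bar-plot analysis described above, the sketch below computes the correlation of selected numeric attributes with an attrition flag and the attrition rate by overtime status; the column names follow the public Kaggle release and are assumptions.

```python
import pandas as pd

df = pd.read_csv("WA_Fn-UseC_-HR-Employee-Attrition.csv")   # path assumed
df["AttritionFlag"] = (df["Attrition"] == "Yes").astype(int)

# Correlation of numeric attributes with the attrition flag (heatmap input)
corr = df.corr(numeric_only=True)["AttritionFlag"]
print(corr[["JobSatisfaction", "JobLevel", "MonthlyIncome", "JobInvolvement", "Age"]])

# Attrition rate with and without overtime (bar-plot input)
print(df.groupby("OverTime")["AttritionFlag"].mean())
```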


Figure 2: Plot of various attributes with respect to attrition

5. Proposed Methodology

The proposed methodology (Figure 3) consists of five phases: data collection; data preprocessing; classification using ensemble machine learning, machine learning and deep learning algorithms with hyperparameter tuning; performance metric evaluation to assess the effectiveness of the various algorithms; and finally, selecting the best model of employee attrition based on a comparative study of the performance metrics. After Principal Component Analysis (PCA) feature selection, the data is divided into two parts: 75% for training and 25% for testing.


Figure 3: Proposed Methodology
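A minimal sketch of the first phases of the pipeline in Figure 3 (encoding, scaling, PCA-based feature reduction, and the 75/25 split) is given below; the number of retained components, the file path and the random seed are assumptions, since the paper does not report them.

```python
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, MinMaxScaler

df = pd.read_csv("WA_Fn-UseC_-HR-Employee-Attrition.csv")   # path assumed
y = (df["Attrition"] == "Yes").astype(int)
X = df.drop(columns=["Attrition"])

# Encode text-valued columns and scale every feature to [0, 1]
for col in X.select_dtypes(include="object").columns:
    X[col] = LabelEncoder().fit_transform(X[col])
X_scaled = MinMaxScaler().fit_transform(X)

# PCA feature reduction; the number of components kept is not reported
X_pca = PCA(n_components=10).fit_transform(X_scaled)

# 75% training / 25% test split, as described in Section 5
X_train, X_test, y_train, y_test = train_test_split(
    X_pca, y, test_size=0.25, random_state=42, stratify=y)
```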

5.1 Data Encoding

To enable machine learning algorithms to process categorical features such as 'Department,' 'Education,' 'Gender,' and 'Work-Life Balance' from the dataset, it was imperative to transform them into numerical representations. To accomplish this, label encoding was employed to create numeric representations for these categorical features. Following the feature selection process, these categorical features were further transformed into distinct binary columns containing values of 0 and 1. This expansion of dimensions was achieved by generating a separate column for each unique value present in every original column within our dataset.
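The two encoding steps described in Section 5.1 (label encoding of the text-valued columns, followed by expansion into separate 0/1 indicator columns) could be sketched as follows; the column names are taken from the public Kaggle release and are assumptions.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.read_csv("WA_Fn-UseC_-HR-Employee-Attrition.csv")   # path assumed
text_cols = df.select_dtypes(include="object").columns      # e.g. Department, Gender, OverTime

# Step 1: label encoding - each category becomes an integer code
encoded = df.copy()
for col in text_cols:
    encoded[col] = LabelEncoder().fit_transform(encoded[col])

# Step 2: expansion into 0/1 indicator columns, one column per unique value
one_hot = pd.get_dummies(df, columns=list(text_cols), dtype=int)
print(one_hot.filter(like="Department_").head())
```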

5.2 Model Training

Furthermore, the dataset is pre-processed to make it appropriate for model training. Following data pre-processing, the model moves on to the training phase, where the dataset is divided into 75% training and 25% test sets. Following that, data modelling occurs. Support Vector Machines (SVM), Decision Tree Classifier (DTC), Random Forest Classifier (RFC), Gaussian Naive Bayes (GNB), Logistic Regression (LR), and K-Nearest Neighbours (KNN) are among the machine learning algorithms used to determine the algorithm with the best accuracy. Multiple graphs were used to provide a more detailed study of the results and to visualize the relationships between different variables.
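A hedged sketch of the training phase of Section 5.2, using the scikit-learn counterparts of the listed classifiers; hyperparameters are left at their defaults rather than the tuned values used in the paper, so the printed scores will not match Table 1 exactly.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("WA_Fn-UseC_-HR-Employee-Attrition.csv")   # path assumed
for col in df.select_dtypes(include="object").columns:
    df[col] = LabelEncoder().fit_transform(df[col])
X, y = df.drop(columns=["Attrition"]), df["Attrition"]

# 75% training / 25% test split as described in Section 5.2
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)

models = {
    "SVM": SVC(),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Gaussian Naive Bayes": GaussianNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, round(accuracy_score(y_test, model.predict(X_test)), 3))
```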
5.3 Feature Scaling

In HR datasets, it is common for features to exhibit varying scales. For instance, in the IBM Attrition dataset, employee ages span a range from 18 to 60 years, while monthly income varies from $2,094 to $26,999. Such significant disparities in feature scales can impede the efficiency of optimization algorithms like gradient descent. Consequently, feature scaling plays a pivotal role in potentially enhancing both classification performance and learning efficiency in certain machine learning algorithms.

5.4 Dataset Preprocessing

There are 1,470 instances and 34 attributes with no missing values in the dataset. As the dataset contains no missing values, we applied Min-Max scaling to scale the numerical features within a specific range. This technique transforms the data in such a way that it falls within a specified range, often between 0 and 1.

Figure 4: Data Pre-processing Min-Max Scaling
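The Min-Max scaling step of Sections 5.3 and 5.4 maps each numeric feature onto [0, 1] via x' = (x - min) / (max - min). A small sketch, assuming the Kaggle column names:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.read_csv("WA_Fn-UseC_-HR-Employee-Attrition.csv")   # path assumed

# Age spans roughly 18-60 and MonthlyIncome roughly 2,094-26,999;
# after scaling, both lie between 0 and 1
scaler = MinMaxScaler(feature_range=(0, 1))
df[["Age", "MonthlyIncome"]] = scaler.fit_transform(df[["Age", "MonthlyIncome"]])
print(df[["Age", "MonthlyIncome"]].describe().loc[["min", "max"]])
```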


5.5 Machine Learning Models

5.5.1 Support Vector Machine: Support Vector Machine (SVM) is a technique used for classification tasks. It constructs a hyperplane to separate two classes and aims to maximize the margin between this hyperplane and the closest data points from each class. These data points are represented as vectors in a high-dimensional space, and SVM finds the hyperplane that best separates them.

5.5.2 Logistic Regression: It is a powerful analytical tool for predicting binary outcomes. It transforms the relationship between independent and dependent variables into probabilities, facilitating the assessment of event occurrence likelihood. Moreover, it provides a range of performance metrics that aid in evaluating and fine-tuning the model's predictive capabilities. Some of the key results that can be derived from logistic regression include accuracy, recall, F1 score, the ROC (Receiver Operating Characteristic) curve, precision, and the construction of a confusion matrix. These metrics help assess the model's performance, its ability to discriminate between the two classes, and its precision in predicting outcomes.

5.5.3 K-Nearest Neighbour (KNN): It is a straightforward algorithm that relies on the storage of all available data cases to classify new, unseen data points. KNN is often referred to as a "lazy learner" because it lacks a discriminative function derived from the training data. Instead, it retains and memorizes the entire training dataset without undergoing a traditional model learning phase.

5.5.4 Naive Bayes: It is a probabilistic machine learning technique used for classification tasks such as text classification. It is based on Bayes' theorem, which calculates the likelihood of a specific event occurring based on past knowledge of conditions that may be associated with the event. Naive Bayes calculates the probability that a given data point (such as a document or an item) belongs to a specific class or category. It is assumed that the features used for categorization are conditionally independent, which means that the presence or absence of one trait has no bearing on the presence or absence of another. This is a "naive" assumption that simplifies calculations and makes the algorithm more tractable.

5.5.5 Decision Tree: It is a graphical representation resembling a tree that helps in the decision-making process. Each branch of the tree represents a potential decision, event, or response. Decision Trees can be employed for both classification and regression tasks. In classification, they are used to categorize data into discrete classes, whereas in regression, they predict numerical or continuous values.

5.6 Ensemble Machine Learning

5.6.1 Gradient Boosting: Gradient Boosting is a machine learning ensemble technique used primarily for supervised learning tasks, such as classification and regression. It is designed to improve the predictive accuracy of a model by combining the predictions of multiple weaker models (typically decision trees) into a more powerful and accurate ensemble model.

5.6.2 AdaBoost: AdaBoost is a machine learning algorithm that uses boosting to improve the performance of weaker learners [27]. To begin, an initial classifier is trained on the original dataset. Then, new copies of the classifier are trained over several iterations, each with the explicit objective of fixing the errors made by its predecessor. Various subsets of the dataset are formed during these cycles by assigning variable weights to individual data points. Instances that were misclassified in previous rounds are given higher weights, improving their chances of inclusion in subsequent subsets. This iterative procedure is repeated numerous times, resulting in the sequential training of several models. To produce a robust classifier, these initially weaker classifiers are integrated using a specified cost function. The accuracy of each individual classifier influences the final prediction, with higher-accuracy classifiers bearing greater weight in the ensemble.

5.6.3 Random Forest: Random Forest is a classification system based on decision tree concepts. This method, true to its name, creates a forest out of several individual trees, and it falls under the umbrella of the ensemble algorithm category, which includes techniques that create predictions by combining various algorithms. Random Forest constructs an ensemble of decision trees from random subsets of the training dataset. This process is repeated with different random subsets, and a majority vote among these trees determines the final prediction.

5.6.4 Stacking Model: The stacking model, also known as stacked generalization or a stacking ensemble, is an advanced machine learning technique used for improving predictive performance. It combines the predictions of multiple base models (often diverse in nature) by training a meta-model, or "stacker," on top of them. Stacking can significantly improve predictive performance compared to individual base models because it leverages the strengths of different models and combines them to produce a more robust and accurate prediction. It is a powerful technique in machine learning and is often used in competitions and real-world applications where achieving the best possible predictive accuracy is crucial.

5.6.5 XGBoost: XGBoost, which stands for Extreme Gradient Boosting, is a highly popular and powerful machine learning algorithm that is widely used for both regression and classification tasks. It belongs to the ensemble learning family and is specifically designed to improve the accuracy and efficiency of decision-tree-based models.
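The ensemble methods of Section 5.6 have direct scikit-learn counterparts; the sketch below reuses the X_train/X_test/y_train/y_test split from the Section 5.2 sketch. The choice of base learners for the stacking ensemble is an assumption, since the paper does not list the exact combination, and XGBoost would come from the separate xgboost package.

```python
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Base learners combined by the stacking meta-model (an assumed combination)
base_learners = [
    ("dt", DecisionTreeClassifier(random_state=42)),
    ("knn", KNeighborsClassifier()),
    ("rf", RandomForestClassifier(n_estimators=200, random_state=42)),
]

ensembles = {
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
    "AdaBoost": AdaBoostClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
    "Stacking": StackingClassifier(estimators=base_learners,
                                   final_estimator=LogisticRegression(max_iter=1000)),
    # "XGBoost": xgboost.XGBClassifier() could be added analogously
}

for name, model in ensembles.items():
    model.fit(X_train, y_train)                  # split from the Section 5.2 sketch
    print(name, round(model.score(X_test, y_test), 3))
```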


5.7 Deep Learning Models

5.7.1 Feedforward Neural Network (FNN): A Feedforward Neural Network (FNN), sometimes known as a Multilayer Perceptron (MLP), is a deep learning artificial neural network. Its architecture is distinguished by many layers of neurons, including an input layer, one or more hidden layers, and an output layer. The network is made up of connected layers of artificial neurons and is intended to process and transform incoming data through a sequence of mathematical operations, ultimately producing an output.
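The paper does not report the FNN architecture or training configuration; the Keras sketch below shows one plausible feedforward network for this tabular task. The layer sizes, activations, optimizer and number of epochs are assumptions, and the scaled features and 75/25 split from the earlier sketches are reused.

```python
import tensorflow as tf

# A small feedforward (multilayer perceptron) network; architecture assumed
fnn = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(X_train.shape[1],)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),   # binary attrition output
])
fnn.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
fnn.fit(X_train, y_train, epochs=50, batch_size=32,
        validation_split=0.1, verbose=0)

loss, acc = fnn.evaluate(X_test, y_test, verbose=0)
print("FNN test accuracy:", round(acc, 3))
```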
5.7.2 Convolutional Neural Network (CNN): A Convolutional Neural Network (CNN) is a deep learning model specifically tailored for tasks involving visual data, characterized by its use of convolutional layers to automatically learn and extract features from images or other grid-like data.

To evaluate the effectiveness of a model, the following metrics are examined:

Accuracy: It is a performance metric used to assess the model's overall effectiveness when all classes carry equal significance. It is calculated as the ratio of correctly predicted instances to the total number of predictions made. This metric provides a measure of how well the model performs across all classes.

Accuracy = (tp + tn) / (tp + tn + fp + fn)    (1)

Recall: Also known as sensitivity or true positive rate, recall measures the model's ability to correctly recognize positive samples. It is derived by dividing the number of correctly classified positive samples by the total number of actual positive samples. A greater recall value suggests that the model accurately identifies more positive samples.

Recall = tp / (tp + fn)    (2)

Precision: Precision is a performance statistic that assesses the model's ability to categorize positive samples properly. It is derived by dividing the number of correctly classified positive samples by the total number of samples that the model predicted as positive, whether they were classified correctly or incorrectly. Precision gauges how effectively the model identifies true positives among all positive predictions.

Precision = tp / (tp + fp)    (3)

F1-score: The F1 score is a well-defined metric that represents the harmonic mean of precision and recall.

F1-score = 2 × (Precision × Recall) / (Precision + Recall)    (4)

where tp is the number of correctly predicted positive instances (true positives), fp is the number of incorrectly predicted positive instances (false positives), tn is the number of correctly predicted negative instances (true negatives), and fn is the number of incorrectly predicted negative instances (false negatives).
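Equations (1)-(4) map directly onto scikit-learn helpers. A short sketch, assuming y_test and the fitted models dictionary from the Section 5.2 sketch are available:

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Predictions from any fitted classifier above, e.g. logistic regression
y_pred = models["Logistic Regression"].predict(X_test)

tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
print("accuracy :", accuracy_score(y_test, y_pred))    # (tp+tn)/(tp+tn+fp+fn), Eq. (1)
print("recall   :", recall_score(y_test, y_pred))      # tp/(tp+fn), Eq. (2)
print("precision:", precision_score(y_test, y_pred))   # tp/(tp+fp), Eq. (3)
print("f1-score :", f1_score(y_test, y_pred))          # harmonic mean, Eq. (4)
```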
6. Results

This section provides an analysis of the results obtained from the different machine learning and deep learning classification models. The aim of this study is to evaluate the classification effectiveness of both machine learning and deep learning algorithms when applied to the task of categorizing employee attrition. In this study, a wide array of learning algorithms was utilized and assessed using the employee attrition dataset. The ML algorithms encompassed traditional methods such as SVM, logistic regression, KNN, decision tree, and naive Bayes. Additionally, ensemble machine learning algorithms, including gradient boosting, XGBoost, AdaBoost, random forest, and stacking, were employed. Furthermore, the study also incorporated deep learning techniques, specifically Convolutional Neural Networks (CNN) and Feedforward Neural Networks (FNN). To assess the performance of these models, multiple evaluation metrics were employed, namely recall (sensitivity), F1-score, precision, accuracy, and the area under the ROC and precision-recall curves; the corresponding scores are detailed in the tables below.

6.1. Machine Learning Models

Table 1 provides an extensive assessment of the machine learning models. Out of the various models evaluated, the Naïve Bayes model demonstrated superior performance, achieving an accuracy and F1-score of 0.908 and 0.541, respectively. Furthermore, the Naïve Bayes and Logistic Regression models provide the highest precision, reaching 0.769 and 0.727, respectively.

Table 1: Performance analysis of Machine Learning models for Employee Attrition

Model                  Accuracy  Precision  F1-Score  Recall
SVM                    0.878     0.636      0.236     0.146
KNN                    0.870     0.500      0.200     0.125
Naïve Bayes            0.908     0.769      0.541     0.417
Decision Tree          0.861     0.440      0.301     0.229
Logistic Regression    0.897     0.727      0.457     0.333


6.2. Ensemble Machine Learning Models

Table 2 provides an in-depth analysis of the ensemble machine learning models. Among these models, the Stacking model achieved the highest accuracy, F1-score, and recall. Additionally, the Gradient Boosting, AdaBoost and XGBoost techniques exhibited good accuracy levels.

Table 2: Performance analysis of Ensemble Machine Learning models for Employee Attrition

Model              Accuracy  Precision  F1-Score  Recall
Gradient Boosting  0.894     0.765      0.400     0.271
AdaBoost           0.883     0.619      0.377     0.271
Random Forest      0.872     1.00       0.41      0.021
Stacking           0.899     0.789      0.448     0.312
XGBoost            0.886     0.650      0.382     0.271

6.3. Deep Learning Models

Table 3 provides a detailed analysis of the deep learning models. Among the deep learning models, the FNN model outperforms with the highest accuracy, recall and F1-score of 97.5%, 83.93% and 91.26%, respectively. Furthermore, the FNN exhibited the highest precision score of 100%.

Table 3: Performance analysis of Deep Learning models for Employee Attrition

Model  Accuracy  Precision  F1-Score  Recall
FNN    0.975     1.00       0.9126    0.8393
CNN    0.942     0.8148     0.8000    0.7857

6.4. Performance of the Models Based on the ROC Plot

The Receiver Operating Characteristic (ROC) curve shows how changing the threshold affects the relationship between the true positive rate and the false positive rate. The ROC curve provides an assessment of the classifier's overall predictive performance. It quantifies the likelihood that the classifier will assign a higher rank to a randomly selected positive instance compared to a randomly selected negative instance. The closer the curve approaches the top-left corner, the more effective the classifier is. The ROC curves for each machine learning classifier are shown in Figure 5, for the ensemble machine learning classifiers in Figure 6, and for the deep learning classifiers in Figure 7. Among the machine learning models, logistic regression has the highest ROC score of 0.827. Among the ensemble machine learning methods, gradient boosting does even better, with a ROC score of 0.84. However, in the field of deep learning, the Feedforward Neural Network (FNN) outperforms them all, with the highest ROC score of 0.92.

Fig5. ROC Plot for ML models.

Fig6. ROC Plot for Ensemble Models.

Fig7. ROC Plot for DL models.
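The ROC curves summarised in Figures 5-7 can be reproduced along the following lines for any classifier that exposes class probabilities; this sketch reuses objects from the earlier examples and is not the exact plotting code used in the paper.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import auc, roc_curve

# Probability of the positive (attrition) class from a fitted classifier
y_score = models["Logistic Regression"].predict_proba(X_test)[:, 1]

fpr, tpr, _ = roc_curve(y_test, y_score)
plt.plot(fpr, tpr, label=f"Logistic Regression (AUC = {auc(fpr, tpr):.2f})")
plt.plot([0, 1], [0, 1], linestyle="--")   # chance line
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```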


7. Comparison of Existing Work

Author  Dataset      Method               Result (Accuracy)  Proposed Work (Accuracy)
[12]    IBM Dataset  KNN                  86.00%             87.00%
[3]     IBM Dataset  Naïve Bayes          80.9%              90.8%
[14]    IBM Dataset  Logistic Regression  79.60%             89.7%
[15]    IBM Dataset  Decision Tree        82.31%             86.1%
[16]    IBM Dataset  AdaBoost             87.03%             88.3%
[17]    IBM Dataset  Decision Tree        84.73%             86.1%

Conclusion

This paper explored the impact of voluntary attrition on organizations and underscored the significance of predictive modeling in addressing this issue. It provided an overview of various supervised learning classification algorithms employed to tackle the problem of predicting employee attrition, using the IBM HR dataset for evaluation. Initially, five foundational models were trained and assessed. Subsequently, five ensembles were created by leveraging various combinations of these five base models. Two deep learning models were also tested. The findings revealed that linear models outperformed others in terms of accuracy, recall, and AUC. Furthermore, deep learning models, particularly the FNN approach, exhibited exceptional accuracy. In contrast, the other machine learning models displayed a wider range of accuracy, spanning from 86% to 94%. These results emphasize the potential of both deep learning and ensemble machine learning techniques in achieving high classification accuracy. As a result, the authors recommend employing the FNN classifier for precise predictions of employee attrition within an organization. This approach empowers HR to take proactive measures in retaining employees identified as being at risk of leaving.


References

[1] Mohbey, K.: Employee's Attrition Prediction Using Machine Learning Approaches. Presented at the January 1 (2020). https://doi.org/10.4018/978-1-7998-3095-5.ch005.
[2] Alduayj, S.S., Rajpoot, K.: Predicting Employee Attrition using Machine Learning. In: 2018 International Conference on Innovations in Information Technology (IIT). pp. 93–98 (2018). https://doi.org/10.1109/INNOVATIONS.2018.8605976.
[3] Yedida, R., Reddy, R., Vahi, R., Jana, R., Gv, A., Kulkarni, D.: Employee Attrition Prediction. (2018).
[4] Mansor, N., S Sani, N., Aliff, M.: Machine Learning for Predicting Employee Attrition. Int. J. Adv. Comput. Sci. Appl. 12, 435–445 (2021). https://doi.org/10.14569/IJACSA.2021.0121149.
[5] Abdulkareem, A.B., Sani, N., Sahran, S., Abdi, Z., Alyessari, A., Adam, A., Rahman, A.H.A., Abdulkarem, A.: Predicting COVID-19 Based on Environmental Factors With Machine Learning. Presented at the (2021).
[6] Pratt, M., Boudhane, M., Cakula, S.: Employee Attrition Estimation Using Random Forest Algorithm. Balt. J. Mod. Comput. 9, (2021). https://doi.org/10.22364/bjmc.2021.9.1.04.
[7] El-rayes, N., Smith, M., Taylor, S.M.: An Explicative and Predictive Study of Employee Attrition using Tree-based Models. https://papers.ssrn.com/abstract=3397445 (2019). https://doi.org/10.2139/ssrn.3397445.
[8] Srivastava, P., Eachempati, P.: Intelligent Employee Retention System for Attrition Rate Analysis and Churn Prediction: An Ensemble Machine Learning and Multi-Criteria Decision-Making Approach. J. Glob. Inf. Manag. 29, 1–29 (2021). https://doi.org/10.4018/JGIM.20211101.oa23.
[9] Yadav, S., Jain, A., Singh, D.: Early Prediction of Employee Attrition using Data Mining Techniques. Presented at the December 1 (2018). https://doi.org/10.1109/IADCC.2018.8692137.
[10] Najafi, S., Shams Gharneh, N., Nezhad, A., Zolfani, S.: An Improved Machine Learning-Based Employees Attrition Prediction Framework with Emphasis on Feature Selection. (2021). https://doi.org/10.3390/MATH9111226.
[11] Gao, X., Wen, J., Zhang, C.: An Improved Random Forest Algorithm for Predicting Employee Turnover. Math. Probl. Eng. 2019, 1–12 (2019). https://doi.org/10.1155/2019/4140707.
[12] Bhatta, S., Zaman, I.U., Raisa, N., Fahim, S.I., Momen, S.: Machine Learning Approach to Predicting Attrition Among Employees at Work. In: Silhavy, R. (ed.) Artificial Intelligence Trends in Systems. pp. 285–294. Springer International Publishing, Cham (2022). https://doi.org/10.1007/978-3-031-09076-9_27.
[13] Fallucchi, F., Coladangelo, M., Giuliano, R., De Luca, E.: Predicting Employee Attrition Using Machine Learning Techniques. Computers. 9, 86 (2020). https://doi.org/10.3390/computers9040086.
[14] Joseph, R., Udupa, S., Jangale, S., Kotkar, K., Pawar, P.: Employee Attrition Using Machine Learning And Depression Analysis. Presented at the May 6 (2021). https://doi.org/10.1109/ICICCS51141.2021.9432259.
[15] Qutub, A., Al-Mehmadi, A., Al-Hssan, M., Aljohani, R., Alghamdi, H.: Prediction of Employee Attrition Using Machine Learning and Ensemble Methods. Int. J. Mach. Learn. Comput. 11, 110–114 (2021). https://doi.org/10.18178/ijmlc.2021.11.2.1022.
[16] Arqawi, S., Abu Rumman, M.A., Zitawi, E., Rabaya, A., Sadaqa, A., Abunasser, B., Abu-Naser, S.: Predicting Employee Attrition and Performance Using Deep Learning. 100, 6526–6536 (2022).
[17] Kamath, R., Jamsandekar, S., Naik, P.: Machine Learning Approach for Employee Attrition Analysis. Int. J. Trend Sci. Res. Dev. Special Issue, 62–67 (2019). https://doi.org/10.31142/ijtsrd23065.

