Nishanth Project
HR ANALYTICS FOR EMPLOYEE ATTRITION AND PERFORMANCE PREDICTION
A PROJECT REPORT
submitted to the
SRM Institute of Science and Technology (Deemed to be University), Chennai
in partial fulfillment of the requirements for the award of the Degree of
NISHANTH R
[DA22523050147]
Dr. G. Dinesh
(Assistant Professor, Department of Computational Intelligence)
BONAFIDE CERTIFICATE
This is to certify that the Project Work entitled “HR Analytics for Employee Attrition
and Performance Prediction” submitted by NISHANTH. R [DA2252305010147] of MBA,
Directorate of Distance Education, SRM Institute of Science and Technology, Kattankulathur
is a bonafide record of Project Work carried out by him in partial fulfillment of the requirements for the
award of the degree of Master of Business Administration.
SIGNATURE
Dr. Dinesh G
SUPERVISOR
Assistant Professor,
Dept. of CINTEL,
SRMIST, KTR

SIGNATURE
Dr. Daniel Rajkumar M
PROGRAM COORDINATOR,
Assistant Professor,
MBA Department,
SRMIST-DDE, KTR
DECLARATION
I hereby declare that the Project Work entitled “HR Analytics for Employee Attrition
and Performance Prediction”, submitted by me in partial fulfillment of the degree of Master
of Business Administration under the guidance of Dr. G. Dinesh (Assistant Professor,
Department of Computational Intelligence), SRMIST-DDE, KTR, SRM Institute of Science and
Technology, is my original work and has not been submitted earlier to any other
University/Institution. The matter presented in this project report has not been submitted
elsewhere for the award of any other degree/diploma. I declare that I have faithfully
acknowledged, given credit to, and referred to the research workers wherever their works
have been cited in the text and the body of the project. I further certify that I have not willfully
lifted others' work, paragraphs, text, data, results, etc., reported in journals, books,
magazines, reports, dissertations, theses, etc., or available on websites, and have not included
them in this project report or cited them as my own work.
Declaration:
I am aware of and understand the University’s policy on academic misconduct and
plagiarism and certify that this assessment is my own work, except where indicated by
referencing, and that I have followed the good academic practices noted above.
Nishanth R
DA22523050147
ACKNOWLEDGEMENT
I express my deep sense of gratitude to the Director, Dr. R. Rajagopal, and the Program
Coordinator, Dr. M. Daniel Rajkumar, for their wholehearted encouragement. I am indebted
to my Research Supervisor, Dr. G. Dinesh, Assistant Professor (CINTEL Department), for his
continuous guidance and encouragement to complete my Project Work in a successful
manner.
I am also thankful to all the faculty and staff members of the Department of Distance
Education for their support and guidance.
Nishanth R
[DA2252305010147]
TABLE OF CONTENTS
1. Introduction to the Study
1.1 Introduction
1.2 Employee Attrition
1.3 Employee Retention
1.4 Objective
1.5 Motivation
1.6 Scope of Work
1.7 Research Questions
1.8 Importance
1.9 Relevance
2. Literature Review
3. Problem Statement
3.1 Description
3.2 Importance to Solve the Problem
3.3 Background Information Necessary to Understand the Problem
4. Methodology
4.1 Approach
4.2 Techniques
5. System Architecture
5.1 Architecture Diagram
5.2 Explanation of Components and Interaction
6. Implementation
7. Data Exploration and Processing
8. Data Visualization
8.1 Dashboard Visuals
8.2 Attrition Page
8.3 Visualising Employee Attrition Rate
8.4 Analysing Employee Attrition by Gender
8.5 Analysing Employee Attrition by Age
8.6 Analysing Employee Attrition by Business Travel
8.7 Analysing Employee Attrition by Department
8.8 Analysing Employee Attrition by Daily Rate
8.9 Analysing Employee Attrition by Distance From Home
8.10 Analysing Employee Attrition by Education
8.11 Analysing Employee Attrition by Education Field
8.12 Analysing Employee Attrition by Environment Satisfaction
8.13 Analysing Employee Attrition by Job Roles
8.14 Analysing Employee Attrition by Job Level
8.15 Analysing Employee Attrition by Job Satisfaction
8.16 Analysing Employee Attrition by Marital Status
8.17 Analysing Employee Attrition by Monthly Income
8.18 Analysing Employee Attrition by Work Experience
8.19 Analysing Employee Attrition by Overtime
8.20 Analysing Employee Attrition by Salary Hike
8.21 Analysing Employee Attrition by Performance Rating
8.22 Analysing Employee Attrition by Relationship Satisfaction
8.23 Analysing Employee Attrition by Work Life Balance
8.24 Analysing Employee Attrition by Total Working Experience
8.25 Analysing Employee Attrition by Years at Company
8.26 Analysing Employee Attrition by Years in Current Role
8.27 Analysing Employee Attrition by Years since last promotion
8.28 Analysing Employee Attrition by Years with current Manager
9. Statistical Analysis
10. Data Modeling
11. Conclusion
ABSTRACT
Keywords: Machine Learning, Power BI, Data Visualization, Attrition rate prediction
CHAPTER 1
INTRODUCTION TO THE STUDY
1.1 INTRODUCTION
HR analytics is a business intelligence tool that allows Human Resource teams to track, analyze, and
report on HR KPIs. Modern, interactive dashboards built on an HR analytics platform make it easy
to combine data from all systems and to explore this data in depth directly within the dashboard. This way,
HR teams can quickly find insights that improve recruiting, optimize workplace management, and
enhance employee performance. Employee performance dashboards help HR teams and business
managers understand the effectiveness, satisfaction, and goal progress of their workforce. To analyze
compensation versus performance, this project shows the number of active employees by rating level and
salary by employee rating. Employees are the most important asset within an organization. The HR
dashboard in this project shows an HR leader training program metrics such as completion percentage,
hours, and cost. It takes the employee dataset and, based on it, predicts the attrition rate and the
employees due for promotion using the Random Forest algorithm. HR executives strive to maintain a diverse
and balanced workforce, so they need to fully understand the demographic characteristics of their
employees. HR dashboard analysis allows them to analyze data on age, gender, location, department, and
ethnic groups in depth. Using an interactive dashboard, HR professionals can dig deeper into demographic
data and analyze one variable, such as ethnic diversity. Managing and analyzing such vast amounts of HR
data manually is time-consuming and prone to mistakes. This is where Machine Learning comes in. These
decision-making models, when provided with the data and information, can support consistent, data-driven
decisions, catch important trends in the data, and provide actionable insights, which can be used to help
1.2 EMPLOYEE ATTRITION
Employee attrition refers to the percentage of employees who depart from an organization within
a specified timeframe due to various reasons such as illness, job dissatisfaction, inadequate wages,
marriage, retirement, or death, compared to the average number of employees on the payroll during the
same period. Essentially, it encompasses both the inflow and outflow of labor within an enterprise.
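The definition above can be expressed as a short calculation. The following is a minimal sketch, assuming that the "average number of employees on the payroll" is approximated by the mean of the headcount at the start and at the end of the period; the function name and the figures are hypothetical and are not taken from the project data.

```python
def attrition_rate(departures, headcount_start, headcount_end):
    """Return attrition as a percentage of the average headcount in the period.

    Assumption: the average headcount is approximated by the mean of the
    opening and closing headcount.
    """
    average_headcount = (headcount_start + headcount_end) / 2
    if average_headcount <= 0:
        raise ValueError("average headcount must be positive")
    return 100.0 * departures / average_headcount


# Hypothetical figures: 24 departures, 190 employees at the start, 210 at the end.
rate = attrition_rate(24, 190, 210)
print(f"Attrition rate: {rate:.1f}%")  # prints "Attrition rate: 12.0%"
```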
In today’s fiercely competitive business landscape, attrition poses challenges not only to the smooth
functioning of the organization but also to the morale of remaining employees. While a certain level of
attrition may be tolerable, exceeding a certain threshold incurs significant expenses associated with
replacing departed employees. These expenses include recruitment and selection costs, as well as training
and development expenditures. Moreover, the loss of key performers within the organization represents
a substantial setback, as finding suitable replacements for such individuals proves to be a daunting task.
1.3 EMPLOYEE RETENTION
Employee retention is a top priority for organizations, leading them to invest significant efforts in
devising and implementing retention strategies. It poses a considerable challenge for employers to create
and enforce effective retention policies. Retention efforts entail ongoing initiatives aimed at fostering an
environment conducive to meeting the diverse needs of employees, thereby incentivizing them to remain
Employees are the lifeblood of any organization, and their departure can precipitate a range of
issues, including financial costs, operational disruptions, and the loss of valuable talent, which in turn
impacts the morale of remaining staff. Recognizing these challenges, organizations place a strong emphasis
on employee retention as a means of mitigating such problems. Effective retention strategies focus on
attracting and retaining innovative, efficient, and dedicated personnel. It is not merely about managing
retention but rather about effectively managing the human resources of the organization, which inherently
1.4 OBJECTIVE
● Assess the degree of employee satisfaction regarding their job and working environment.
● Identify the elements that contribute to employee dissatisfaction with the company's policies and
guidelines.
● Pinpoint the areas where the company is falling short or facing shortcomings.
● Understand the underlying causes of attrition within the company.
● Develop strategies and methods to minimize attrition rates within the organization.
1.5 MOTIVATION
This project originates from the possibility of enhancing employee contentment, cutting down
expenses, elevating organizational efficiency, and fostering a favorable workplace atmosphere. It
represents a chance to leverage data and analytics to enact substantial improvements that are
advantageous to both employees and the entire organization.
1.6 SCOPE OF WORK
This project attempted to investigate the importance of Data Analytics and People Analytics to
improve efficiency in HR functions. “Due to useful analytics results regarding how organizations find, hire,
maintain, and retain employees, HR data analysis plays a significant role in operational activities of any
business. How can data analytics help HR departments work more efficiently?”
1.7 RESEARCH QUESTIONS
There are two main questions that will be discussed in this project:
1. What techniques have made an impact on HR functions in the last five years?
2. How can Data Analytics and Artificial Intelligence tools help to improve analytics in HR functions?
1.8 IMPORTANCE
The remainder of the thesis is organized as follows. First, previous work related to the prediction of
employee attrition and the challenges associated with this task will be discussed. Then, the general
approach used in this thesis will be explained by introducing and motivating the various computational
algorithms used. After this, the dataset and experimental setup will be described. Thereafter, the results
will be presented and discussed in a concise manner. The thesis will end with the discussion of the work
performed, directions for future work, and an overall conclusion.
1.9 RELEVANCE
Organizations face significant impacts on the achievement of their pre-defined goals as the
employee turnover rate rises. Employee attrition also affects ongoing work and the productivity of
current workers. The recruiting of new workers often consumes the organization's financial capital. In
this section, the causes of employee attrition found in the literature are mentioned. The algorithms
used to predict employee attrition, the methods for handling imbalanced data, and the selection of the
most contributing features are also discussed. Lastly, the contributions of this thesis to the existing
literature are formulated.
CHAPTER 2
LITERATURE REVIEW
2.1 INTRODUCTION
In the literature review, differing viewpoints and assessments were collected to give an idea of the
various factors behind employee attrition. The highlighted statistics are the outcomes of a focused
literature review, and the references helped in understanding employee engagement and organizational
structure and in conceptualizing the importance of HR Analytics in attrition reduction.
The reviewed research articles and projects are broadly grouped and presented here under six
heads, viz., causes of employee attrition, impact of employee attrition, measurement of employee attrition,
employee retention, employee attrition and retention, and models of employee attrition.
2.2 REVIEWS
Sandeep Yadav et al. (2018), in their research on “Early Prediction of Employee Attrition using Data
Mining Techniques”, focused on the importance of data mining techniques for forecasting attrition
patterns among employees. The researchers studied an IT company, covering different departments of the
organization, for their research on the prediction of employee attrition. The authors used the name of the
employee, number of projects handled, average monthly hours, job satisfaction level, time spent
in the organization (years), last evaluation, department, work accident, absence, promotions and rewards
in the last 5 years, compensation, and salary level as variables for measuring the reasons for attrition.
Techniques such as data pre-processing, feature engineering, data modelling, and model comparison are
employed in the research. One-hot encoding is used to encode the various departments, such as IT,
Accounting, Management, HR, Product Management, Sales, Support, and Technical. Logistic Regression and
SVM models are used to obtain the classification results on the one-hot encoded data. The outcome of this
study shows that employee attrition can affect an organization in many respects, such as reputation,
revenue, market position, and cost in terms of both time and money. So, if the organization is able to take
preventive measures during hiring, this will help it reduce attrition. It was understood that compensation,
career development, and promotions are not the only reasons behind employee attrition; the company
needs to consider other aspects during hiring as well. In this way the firm can build dependable and
accurate models that reduce the cost of hiring and retaining quality employees and make it possible to
establish the attrition status of an employee using appropriate data mining methods.
I Setiawan et al. (2020) carried out research on “HR Analytics-Employee Attrition using Logistic
Regression.” The researchers used Logistic Regression to analyze employee attrition. The authors used
RStudio for data mixing, exploratory data analysis, data preparation, logistic regression, model
assessment, and visualization. The study has 5 stages: data gathering and business understanding,
data pre-processing, exploratory data analysis, model selection and preparation, and analysis
and assessment of the model. Employee attrition is studied with the help of variables such as number of
companies worked for, total work experience, years with current supervisor, frequent business travel,
poor work environment satisfaction, HR department, marital status (separated), marital status (married),
poor job satisfaction, early logout, and working overtime. Employee-related and company-related
reasons were also considered. The study makes clear that, to improve retention rates, the company needs
to develop the human resource department by assessing the working atmosphere, job satisfaction,
employee workload, and communication between managers, leaders, and subordinates.
Christopher Boomhower et al. studied “Employee Attrition-What makes an
Employee Quit.” The study attempts to understand the underlying reasons why employees quit, with special
reference to civil workers. The authors examined publicly available data from the Office of Personnel
Management, the Bureau of Labor Statistics, and IBM. The researchers applied Principal Component
Analysis to explore the reasons for attrition. The study finds that pay scale is one
of the major reasons for quitting a job. Its significant highlights are that the chances of an employee
resigning fall significantly as his or her length of service increases, that the chances increase or
decrease depending on the employee's age, and that the chances of leaving are low if the employee is in
the specification pay plan.
The research focuses on evaluating the correlation between job attraction-selection and attrition with
the help of a model. The authors considered the public sector for the study and studied public service
motivation (PSM) among private and public sector lawyers. The other important aim is to understand the
influence of PSM and how PSM influences employees' choice of sector. The data were gathered by the
American Bar Association (ABA) to analyze the employment trends of lawyers. The authors designed the
survey and participants, and used panel respondent demographics by sector of employment, to test the job
attraction-selection-attrition model. The researchers concluded that the relationship between PSM and
sector of employment is not straightforward. Even though the data used in this research do not rule out
the possibility that these findings are due to adaptation rather than attraction-selection processes, the
results do not fully support either mechanism's expectation that employee reward preferences will
accord with the purpose each sector serves.
The objective of this research is to study the impact of organizational culture on
job satisfaction and intention to resign in the fitness industry. “The Cultural Index for Fitness
Organizations (CIFO)” was established to evaluate perceptions of organizational culture in the fitness
industry. To understand what matters to fitness organizations, the researchers conducted interviews with
many fitness company managers, leaders, and experienced staff. The data were collected by administering
a questionnaire with a 7-point Likert scale. The research included 11 dimensions, namely: presence of
organization, successful members, connections, reinforcement, innovation, marketing-sales,
organizational reliability, health and ability, maintenance, work ethic, and environment. Exploratory
factor analysis was employed, which revealed eight components that characterize the common dimensions of
culture in this context: staff proficiency, ambience, connections among employees, validation, sales,
service-gear, service systems, and organizational existence. The relationships among the organizational
culture dimensions, job satisfaction, and intention to quit were examined using path analysis. The
outcomes of the research explained 14.3% of the variance in job satisfaction and 50.3% of the variance in
intention to quit the organization. The study makes clear the multi-dimensional nature of organizational
culture and its complexity in the fitness industry.
The authors used the Social Cognitive Career Theory (SCCT) model to predict turnover intentions among
women employees. The SCCT model is used to predict the correlation among outcome expectations, chosen
career goals, effective actions, benefits, and self-efficacy, as they might play a role in career
development. For further understanding, the researchers drew on STEM theory and examined the
representation of women in engineering and the physical sciences. They also learned about women's
intentional withdrawal after their degree and from the occupation (Ref: NSF, 2012; Preston, 2004; Society
of Women Engineers, 2007).
The authors developed 6 hypotheses concerning job attitudes and the relationship between social
cognitive influences and job attitudes. Self-efficacy will facilitate the relationship between
developmental opportunities at work and job attitudes. Job attitudes and self-efficacy are positively
correlated with each other. Job attitudes should be related to outcome expectations. Outcome
expectations will facilitate the correlation between growth opportunities at work and job attitudes.
Self-efficacy and outcome expectations will build the correlation between growth opportunities at the
workplace and job attitudes.
Kashyap Bhuva et al. (2018) carried out research on machine learning techniques for forecasting the
employee attrition rate. The researchers considered an IT organization and used a sample of the employee
database of IBM, USA. The study is based on a workflow for an analytics project with the following
segments: 1) Raw Dataset 2) Data Processing 3) Feature Selection & Scaling 4) Modelling 5) Model
Evaluation & Tuning 6) Deployment & Monitoring. Ridge, Lasso, Logistic Regression, Decision Tree,
Random Forest, Linear Discriminant Analysis, and Support Vector Machine are
the analytical techniques used in the research. The researchers predicted attrition on the IBM, USA data
with 35 data mining techniques and machine learning algorithms, using different algorithms and
combinations of several target attributes. The authors explain why intelligent and effective prediction
of employee attrition using data mining matters, as employee attrition is one of the biggest business
problems. They conclude that, on the same dataset, Linear Discriminant Analysis outperforms the other
techniques, followed by the Logistic Regression model, if accuracy is the preferred metric.
The researchers studied job satisfaction and resignations, applying the wealth
maximization theory of quitting behavior to the German Socioeconomic Panel (1985-2003). The authors
observed that job satisfaction is one of the most important factors for employees who have exceptionally
good experience and competency. The researchers considered voluntary retirement, the level of
satisfaction with the job, surprises for the employees, and the wealth maximization model in explaining
the results. On the reasons for quitting, the authors explained that an employee's tendency to leave or
resign from the organization depends on comparing the present value of future outcomes in the current job
with that of outside opportunities. The results confirmed that, through a simple subjective survey
questionnaire about work structure and satisfaction level, economists can easily uncover plenty of hidden
information or evidence. In fact, through such a survey, a firm or industry can gain an opportunity to
improve itself instead of using any traditional tools.
Rupesh Khare et al. conducted their research on Employee Attrition Risk Assessment using Logistic
Regression Analysis. They applied the logistic regression technique to predict the probability of employee
attrition in an organization based on demographic data of separated employees. The researchers
carried out the research on a real-life project and collected demographic information from separated
and existing employees. The questionnaire the authors prepared evaluates the quitting probability of
current staff. The authors divided overall attrition risk into two parts: 1) Demographic Risk and
2) Behavioral Risk. In the study, Logistic Regression is mainly used to predict employee attrition risk
from demographic information. Along with this, a retention plan has been mapped to the risk
categories developed.
Reshma Sundrani (2019) carried out research titled “Study on Employee Attrition &
Retention Exploring the Issues and Challenges”. The research investigated the considerable economic and
intangible costs associated with losing expert and competent employees. The researcher observed that the
best practices of HR must be reflected in the organization and in the organizational policies that relate
to employees. The HR strategy should grow from a transactional support role to partnering in the
organization's business strategy.
CHAPTER 3
PROBLEM STATEMENT
As an element of performance management, it is difficult to make a hiring and recruitment plan (how long
does it take to hire employees?). It is difficult to measure employee performance and to identify patterns
of employee engagement, employee satisfaction, and performance. It is difficult to plan employee
learning and development and to tell whether learning and development initiatives are having an impact on
employee performance.
Organizations also need to track employment contract status, develop strategies, and make decisions that
will improve the work environment and engagement levels. Having data-backed evidence means that
organizations can focus on making the necessary improvements and plan for future initiatives.
3.1 DESCRIPTION
The project titled "IBM HR Analytics for Employee Attrition and Performance Prediction" aims to
address the challenge of employee turnover in companies, particularly in the software industry. The study
utilizes IBM's HR Analytics datasets to analyze and predict employee attrition and performance using
various statistical methods and machine learning algorithms. The analysis involves data cleaning, data
visualization, and the application of algorithms like Logistic Regression, Random Forest, Support Vector
Machine, XGBoost, CatBoost, AdaBoost, and LightGBM to understand and mitigate employee attrition.
3.2 IMPORTANCE TO SOLVE THE PROBLEM
Solving the problem of employee attrition is crucial for maintaining organizational stability and
performance. High attrition rates can lead to significant operational challenges, including decreased
productivity, disrupted team dynamics, and increased recruitment and training costs. By identifying the
factors contributing to employee dissatisfaction and attrition, companies can develop targeted strategies
to improve employee retention, thereby enhancing overall organizational efficiency and creating a more
stable and productive work environment.
3.3 BACKGROUND INFORMATION NECESSARY TO UNDERSTAND THE PROBLEM
To understand the problem of employee attrition and the project's approach to solving it, several key
background elements are necessary:
1. Employee Attrition: Understanding what employee attrition is and its impact on organizations. This
includes voluntary turnover (employees leaving by choice) and involuntary turnover (employees
leaving due to layoffs or dismissals).
2. HR Analytics: Familiarity with HR analytics and its role in analyzing workforce data to make informed
decisions about employee management and retention.
3. Data Analysis Techniques: Knowledge of statistical tests such as ANOVA and Chi-Square tests used
to analyze the significance of various factors in employee attrition.
4. Machine Learning Algorithms: Understanding how different machine learning algorithms (e.g.,
Logistic Regression, Random Forest, XGBoost, etc.) can be applied to predict employee behavior
and identify key factors leading to attrition.
5. Dataset Characteristics: An overview of the IBM HR Analytics dataset, including its features (both
numerical and categorical), and the process of data cleaning and preprocessing to ensure accurate
analysis.
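To make the ANOVA test named in item 3 concrete, the following is a minimal sketch of the one-way ANOVA F statistic for one numerical feature split by attrition outcome. The function name and the sample values are hypothetical, and only the F statistic is computed; obtaining a p-value additionally requires the F distribution (for example through `scipy.stats.f_oneway`, which is not used here so that the sketch stays dependency-free).

```python
from statistics import mean


def anova_f_statistic(groups):
    """One-way ANOVA F statistic for a list of numeric samples, one list per group."""
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total number of observations
    grand_mean = mean([x for g in groups for x in g])
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


# Hypothetical "years at company" values for employees who stayed and who left.
stayed = [5, 6, 7, 8]
left = [2, 3, 4, 3]
f_value = anova_f_statistic([stayed, left])
print(f"F = {f_value:.2f}")  # prints "F = 21.00"
```

A large F value indicates that the feature's mean differs between the two groups by more than the within-group spread would suggest.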
CHAPTER 4
METHODOLOGY
The methodology for IBM HR Analytics Employee Attrition and Performance Prediction is as follows
● Model Performance Comparison: The hvPlot library is used to visualize the ROC curve diagram
comparing the performance of all models used.
4.1 APPROACH
The project employs a structured approach to address the problem of employee attrition through data
analysis and predictive modeling. The main steps include:
1. Data Collection: Utilizing the IBM HR Analytics Attrition Dataset, which contains comprehensive
information on employee demographics, job satisfaction, and performance metrics.
2. Data Exploration and Preprocessing: Conducting initial exploration to understand the dataset,
followed by cleaning and preprocessing steps such as handling missing values, removing redundant
features, and converting categorical variables into numerical formats.
3. Data Visualization: Creating visual representations of the data to identify trends and patterns
related to employee attrition.
4. Statistical Analysis: Applying statistical tests to determine the significance of various features in
predicting attrition.
5. Machine Learning Model Implementation: Training and evaluating multiple machine learning
algorithms to predict employee attrition.
6. Model Evaluation and Comparison: Assessing the performance of different models using metrics
like accuracy, confusion matrix, and ROC curve.
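Two of the preprocessing steps in item 2, removing redundant features and converting categorical variables into numerical formats, can be sketched as follows. This is an illustration under stated assumptions rather than the project's actual code: "redundant" is interpreted here as a column with a single constant value, one-hot encoding is assumed as the conversion method, and the helper names and sample records are hypothetical.

```python
def drop_constant_columns(rows):
    """Remove columns whose value is identical in every row."""
    if not rows:
        return []
    keep = [c for c in rows[0] if len({row[c] for row in rows}) > 1]
    return [{c: row[c] for c in keep} for row in rows]


def one_hot(rows, column):
    """Replace a categorical column with one 0/1 indicator column per category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for category in categories:
            new_row[f"{column}_{category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded


# Hypothetical records; the column names are illustrative and the values are made up.
records = [
    {"Age": 41, "Department": "Sales", "StandardHours": 80},
    {"Age": 35, "Department": "HR", "StandardHours": 80},
    {"Age": 29, "Department": "Sales", "StandardHours": 80},
]
cleaned = one_hot(drop_constant_columns(records), "Department")
print(cleaned[0])  # prints {'Age': 41, 'Department_HR': 0, 'Department_Sales': 1}
```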
4.2 TECHNIQUES
2. Data Visualization:
● Matplotlib and Seaborn: Tools used for creating plots and charts to visualize the
relationships between different features and employee attrition.
● Visualization Types: Bar plots, histograms, and other charts to illustrate the distribution and
impact of various features on attrition.
3. Statistical Analysis:
● ANOVA Test: Used to identify the significance of numerical features in predicting attrition.
● Chi-Square Test: Applied to categorical features to determine their importance in the
prediction model.
4. Machine Learning Algorithms:
● Logistic Regression: A basic yet powerful algorithm for binary classification tasks.
● Random Forest: An ensemble learning method that uses multiple decision trees to improve
prediction accuracy.
● Support Vector Machine (SVM): A robust classifier that finds the optimal hyperplane for
separating data points.
● XGBoost, CatBoost, AdaBoost, LightGBM: Advanced boosting algorithms that enhance
model performance by combining weak learners to form a strong predictor.
5. Model Evaluation:
● Accuracy Score: Measures the proportion of correctly predicted instances out of the total
instances.
● Confusion Matrix: Provides insights into the true positives, true negatives, false positives,
and false negatives.
● ROC Curve: Plots the true positive rate against the false positive rate to visualize the
performance of classification models.
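The accuracy score and confusion matrix described under Model Evaluation can be computed directly from the true and predicted labels. The sketch below assumes binary labels with 1 meaning attrition and 0 meaning no attrition; the function names and the label vectors are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    """Proportion of correctly predicted instances out of all instances."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical labels: 1 = employee left, 0 = employee stayed.
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]
print(confusion_counts(y_true, y_pred))  # prints (2, 4, 1, 1)
print(accuracy(y_true, y_pred))          # prints 0.75
```

Because attrition datasets are usually imbalanced, accuracy alone can look high even when most leavers are missed, which is why the confusion matrix is reported alongside it.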
CHAPTER 5
SYSTEM ARCHITECTURE
5.1 ARCHITECTURE DIAGRAM:
5.2 EXPLANATION OF COMPONENTS AND INTERACTION
1. Data Collection
Interaction: The project begins with collecting data from IBM HR datasets, which include
comprehensive details about employees such as demographics, job roles, satisfaction levels, and
performance metrics. This data forms the foundation for subsequent analysis and model building.
2. Data Preprocessing
Interaction: Raw data often contains inconsistencies such as missing values, duplicates, and outliers. The
data preprocessing step involves cleaning the dataset to handle these issues. This includes:
Normalization: Adjusting numerical values to a common scale without distorting differences in the ranges
of values.
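The report does not state which normalization method is used. As one common possibility, min-max scaling maps each value to the range 0 to 1 while preserving the relative spacing of the values. The sketch below is an assumption for illustration, with a hypothetical function name and made-up income figures.

```python
def min_max_scale(values):
    """Rescale numeric values linearly so the minimum maps to 0 and the maximum to 1."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # A constant column carries no spread; map every value to 0.0
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]


# Hypothetical monthly income values.
monthly_income = [2000, 5000, 8000, 20000]
scaled = min_max_scale(monthly_income)
print([round(v, 3) for v in scaled])  # prints [0.0, 0.167, 0.333, 1.0]
```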
3. Feature Selection
Interaction: Feature selection is critical for model accuracy and performance. Statistical tests like ANOVA
and Chi-Square are conducted to identify significant features that influence employee attrition. These tests
help in understanding the relationship between different variables and employee attrition, ensuring that
only relevant features are used in the model.
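For a categorical feature, the Chi-Square test compares the observed counts in a contingency table with the counts expected if the feature and attrition were independent. The following minimal sketch computes only the Pearson chi-square statistic and its degrees of freedom; the function name and the counts are hypothetical, and converting the statistic to a p-value would need the chi-square distribution, which is outside this sketch.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table.

    `table` is a list of rows of observed counts.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of the two variables
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Made-up counts: rows = overtime (no, yes), columns = attrition (no, yes).
observed = [[90, 10], [60, 40]]
stat, dof = chi_square_statistic(observed)
print(f"chi-square = {stat:.1f}, dof = {dof}")  # prints "chi-square = 24.0, dof = 1"
```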
4. Model Training
Interaction: Various machine learning algorithms are trained on the preprocessed dataset. Each algorithm
interacts with the data to learn patterns and relationships that can predict employee attrition. The key
algorithms used are:
Logistic Regression: A statistical model that predicts the probability of a binary outcome.
Random Forest: An ensemble learning method using multiple decision trees to improve prediction
accuracy.
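How Logistic Regression turns feature values into a probability can be shown with the logistic (sigmoid) function. In the sketch below the weights, the bias, and the choice of features are hypothetical placeholders; in practice they would be learned from the training data rather than set by hand.

```python
import math


def predict_attrition_probability(features, weights, bias):
    """Logistic regression prediction: sigmoid of the weighted sum of the features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients for two features: overtime flag (0/1) and job satisfaction (1-4).
weights = [0.8, -0.5]
bias = -1.0

# An employee who works overtime and reports a job satisfaction of 2.
probability = predict_attrition_probability([1, 2], weights, bias)
print(f"Predicted attrition probability: {probability:.3f}")
```

The output always lies strictly between 0 and 1, and a threshold (commonly 0.5) converts it into a yes/no attrition prediction.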
5. Evaluation
Interaction: Once the models are trained, their performance is evaluated using various metrics. The ROC
(Receiver Operating Characteristic) curve is particularly useful as it illustrates the true positive rate
against the false positive rate at various threshold settings. Cross-validation is used to ensure that the
models generalize well to unseen data, providing a reliable measure of their predictive power.
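Cross-validation can be illustrated with a minimal k-fold sketch. The helper `k_fold_indices` is a hypothetical name, and the number of folds used in the project is not stated in the report, so 3 folds over 10 samples are assumed here purely for illustration.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most 1
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(10, 3))
```

Each sample is used for testing exactly once, and the model score is averaged over the folds.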
6. Data Visualization
Interaction: Visualizations are crucial for interpreting the results of the analysis. They include:
Attrition Rate Visualization: Graphs showing trends in employee attrition across different features such
as age, job role, and satisfaction level.
Model Performance Comparison: Charts comparing the ROC curves of different models, helping in
identifying the best-performing model.
7. Actionable Insights
Interaction: The final component involves deriving actionable insights from the data analysis and model
predictions. These insights help HR departments in formulating strategies to reduce employee attrition by
addressing the identified factors contributing to turnover.
CHAPTER 6
IMPLEMENTATION
6.1 OVERVIEW:
The implementation of this final year project involves several key stages, integrating R programming
for data analysis and Power BI for visualization. The project aims to analyze and predict employee attrition
using the IBM HR Analytics Attrition Dataset, providing actionable insights through interactive dashboards.
The steps in the implementation process include data collection, data cleaning, statistical analysis,
predictive modeling, and visualization. Below is a detailed description of each stage.
Dataset Description:
The features available for each employee in the dataset are described as follows:
Feature Name | Feature Description | Min Value | Max Value | Std. Dev
EnvironmentSatisfaction | It is all about an individual's feelings about the work environment and organization culture. | 1 | 4 | 1.09
Table 2: Categorical Features used in the Employee Attrition Analysis Model.
Feature Name | Feature Description | Number of Categorical Values
Attrition | Attrition in business describes a gradual but deliberate reduction in staff numbers that occurs as employees retire or resign. [NOTE: Target Variable] (0=No, 1=Yes) | 2
BusinessTravel | Business travel is travel undertaken for work or business purposes, as opposed to other types of travel. (1=No Travel, 2=Travel Frequently, 3=Travel Rarely) | 3
Department | Consists of three departments that contribute to the company's overall mission. (1=HR, 2=R&D, 3=Sales) | 3
EducationField | Education field of the employee. (1=HR, 2=Life Sciences, 3=Marketing, 4=Medical Sciences, 5=Others, 6=Technical) | 6
Gender | Gender of the employee. (1=Female, 2=Male) | 2
JobRole | The specific activities or work that the employee performs. (1=HC Rep, 2=HR, 3=Lab Technician, 4=Manager, 5=Manufacturing Director, 6=Research Director, 7=Research Scientist, 8=Sales Executive, 9=Sales Representative) | 9
MaritalStatus | Marital status of the employee. (1=Divorced, 2=Married, 3=Single) | 3
Over18 | (1=Yes, 2=No) | 2
OverTime | (1=No, 2=Yes) | 2
CHAPTER 7
DATA EXPLORATION AND PROCESSING
COMPUTE SIZE:
In the first step, we compute the size of the dataset to get a brief understanding of its structure.
DATASET:
Fig.7.1 Compute Size of Dataset.
The code reveals that the "employee_data" DataFrame contains 1,470 rows and 35
columns, providing a quick overview of its size and structure.
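A minimal sketch of the size computation is shown below. It uses only the standard library and a tiny hypothetical CSV sample; with pandas, the equivalent check on the "employee_data" DataFrame would be its `shape` attribute.

```python
import csv
import io


def dataset_shape(csv_text):
    """Return (number of data rows, number of columns) for CSV text with a header row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    return len(data), len(header)


# Hypothetical three-column sample; the real dataset has 35 columns
sample = "Age,Attrition,Department\n41,Yes,Sales\n49,No,Research & Development\n"
shape = dataset_shape(sample)
```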
DAX FUNCTION USED
CHAPTER 8
DATA VISUALIZATION
By analyzing employee data, we can identify factors that contribute to employee attrition, such
as job satisfaction, compensation, and work-life balance. HR analytics can also help identify high-
performing employees by analyzing performance metrics such as productivity, quality, and customer
satisfaction. Together, this information can be used to develop strategies to retain top talent, reduce
turnover rates, and improve overall organizational performance.
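The attrition rate per category, which underlies the charts in this chapter, can be computed as in the following sketch. The records are hypothetical and the helper name `attrition_rate_by` is an assumption for illustration; the dashboards in the report were built in Power BI.

```python
from collections import defaultdict


def attrition_rate_by(records, feature):
    """Return {category: share of employees in that category who left}."""
    total = defaultdict(int)
    left = defaultdict(int)
    for row in records:
        total[row[feature]] += 1
        if row["Attrition"] == "Yes":
            left[row[feature]] += 1
    return {category: left[category] / total[category] for category in total}


# Hypothetical employee records
records = [
    {"Gender": "Male", "Attrition": "Yes"},
    {"Gender": "Male", "Attrition": "No"},
    {"Gender": "Male", "Attrition": "No"},
    {"Gender": "Male", "Attrition": "No"},
    {"Gender": "Female", "Attrition": "No"},
    {"Gender": "Female", "Attrition": "Yes"},
]
rates = attrition_rate_by(records, "Gender")
```

Note that a category can have more leavers in absolute numbers and still have a lower attrition rate, so counts and rates should be read separately.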
8.2 DASHBOARD: ATTRITION PAGE
8.3 VISUALIZING THE EMPLOYEE ATTRITION RATE
8.4 ANALYZING EMPLOYEE ATTRITION BY GENDER
Inference:
✓ Male employees account for a larger share of the organization than female employees, by more than 20%.
✓ More male employees than female employees are leaving the organization.
8.5 ANALYZING EMPLOYEE ATTRITION BY AGE.
Inference:
8.6 ANALYZING EMPLOYEE ATTRITION BY BUSINESS TRAVEL.
Inference:
8.7 ANALYZING EMPLOYEE ATTRITION BY DEPARTMENT
Inference:
8.8 ANALYZING EMPLOYEE ATTRITION BY DAILY RATE
Inference:
✓ The numbers of employees with an average DailyRate and a high DailyRate are approximately equal.
✓ However, the attrition rate is much higher among employees with an average DailyRate than among
those with a high DailyRate.
✓ The attrition rate is also high among employees with a low DailyRate.
✓ Most of the employees leaving the organization are those who do not receive a high DailyRate.
8.9 ANALYZING EMPLOYEE ATTRITION BY DISTANCE FROM HOME.
Inference:
✓ The organization has employees living both close to and far from the office.
✓ The feature DistanceFromHome does not follow any clear trend in attrition rate.
✓ More of the employees who leave live close to the organization than far from it.
8.10 ANALYZING EMPLOYEE ATTRITION BY EDUCATION
Inference:
✓ Most of the employees in the organization hold a Bachelor's or Master's degree as their education
qualification.
✓ Very few employees in the organization hold a Doctorate as their education qualification.
✓ The attrition rate tends to decrease as the education qualification increases.
8.11 ANALYZING EMPLOYEE ATTRITION BY EDUCATION FIELD:
Inference:
✓ Most of the employees are from either the Life Sciences or the Medical education field.
✓ Very few employees are from the Human Resources education field.
✓ Education fields such as Human Resources, Marketing, and Technical have a very high
attrition rate.
✓ This may be due to workload, because these education fields have very few employees
compared with the education fields that have a lower attrition rate.
8.12 ANALYZING EMPLOYEE ATTRITION BY ENVIRONMENT SATISFACTION
Inference:
✓ Most of the employees have rated their environment satisfaction as High or Very High.
✓ Although environment satisfaction is high, attrition is still very high in this environment.
✓ The attrition rate increases as the level of environment satisfaction increases.
8.13 ANALYZING EMPLOYEE ATTRITION BY JOB ROLES
Inference:
8.14 ANALYZING EMPLOYEE ATTRITION BY JOB LEVEL.
Inference:
✓ Most of the employees in the organization are at Entry Level or Junior Level.
✓ Highest Attrition is at the Entry Level.
✓ As the level increases the attrition rate decreases.
8.15 ANALYZING EMPLOYEE ATTRITION BY JOB SATISFACTION.
Inference:
✓ Most of the employees have rated their job satisfaction as high or very high.
✓ Employees who rated their job satisfaction as low are the most likely to leave the organization.
✓ All job satisfaction categories have a high attrition rate.
8.16 ANALYZING EMPLOYEE ATTRITION BY MARITAL STATUS
Inference:
8.17 ANALYZING EMPLOYEE ATTRITION BY MONTHLY INCOME
Inference:
✓ Most of the employees in the organization are paid less than 10,000.
✓ The average monthly income of employees who have left is lower than that of employees who are
still working.
8.18 ANALYZING EMPLOYEE ATTRITION BY WORK EXPERIENCE
Inference:
8.19 ANALYZING EMPLOYEE ATTRITION BY OVERTIME
Inference:
8.20 ANALYZING EMPLOYEE ATTRITION BY SALARY HIKE
8.21 ANALYZING EMPLOYEE ATTRITION BY PERFORMANCE RATING
Inference:
8.22 ANALYZING EMPLOYEE ATTRITION BY RELATIONSHIP SATISFACTION
Inference:
✓ Most of the employees have high or very high relationship satisfaction.
✓ Although relationship satisfaction is high, the attrition rate is also high.
✓ All categories of this feature have a high attrition rate.
8.23 ANALYZING EMPLOYEE ATTRITION BY WORK LIFE BALANCE.
Inference:
✓ More than 60% of employees have a better work-life balance.
✓ Employees with a bad work-life balance have a very high attrition rate.
✓ The other categories also have a high attrition rate.
8.24 ANALYZING EMPLOYEE ATTRITION BY TOTAL WORKING EXPERIENCE
Inference:
✓ Most of the employees have a total of 5 to 10 years of working experience, but their
attrition rate is also very high.
✓ Employees with less than 10 years of working experience have a high attrition rate.
✓ Employees with more than 10 years of working experience have a lower attrition rate.
8.25 ANALYZING EMPLOYEE ATTRITION BY YEARS AT COMPANY.
Inference:
8.26 ANALYZING EMPLOYEE ATTRITION BY YEARS IN CURRENT ROLE
Inference:
✓ Most employees have worked for 2 to 10 years in the same role in the organization.
✓ Very few employees have worked for less than 1 year or more than 10 years in the same
role.
✓ Employees who have worked for up to 2 years in the same role have a very high attrition rate.
✓ Employees who have worked for 10+ years in the same role have a low attrition rate.
8.27 ANALYZING EMPLOYEE ATTRITION BY YEARS SINCE LAST PROMOTION
Inference:
8.28 ANALYZING EMPLOYEE ATTRITION BY YEARS WITH CURRENT MANAGER
Inference:
✓ Almost 51% of employees have worked for 2-5 years with the same manager.
✓ Almost 38% of employees have worked for 5-10 years with the same manager.
✓ Employees who have worked for 10+ years with the same manager have a very low
attrition rate.
✓ The other categories have a high attrition rate.
CHAPTER 9
STATISTICAL ANALYSIS
Statistical analysis plays a crucial role in HR analytics by helping organizations make informed
decisions about their human resources and workforce management. It enables evidence-based
decision-making, enhances workforce planning strategies, and fosters a deeper understanding of the
organization's human capital dynamics.
1. Perform ANOVA Test: The ANOVA test is used to analyze the impact of different numerical features on
a categorical response feature (here, Attrition).
Inference:
The following features show a strong association with attrition, as indicated by their high F-
scores and very low p-values.
1. Age
2. DailyRate
3. HourlyRate
4. MonthlyIncome
5. MonthlyRate
6. NumCompaniesWorked
7. PercentSalaryHike
8. TotalWorkingYears
9. TrainingTimesLastYear
10. YearsAtCompany
11. YearsWithCurrManager
The following features do not show a significant relationship with attrition, as indicated by their
moderate F-scores and extremely high p-values.
1. DistanceFromHome
2. StockOptionLevel
3. YearsInCurrentRole
4. YearsSinceLastPromotion
It is important for the organization to pay attention to the identified significant features and consider
them when implementing strategies to reduce attrition rates.
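To make the F-score concrete, the following minimal sketch computes the one-way ANOVA F-statistic by hand for one numerical feature split by attrition group. The ages are hypothetical values, not taken from the dataset, and the p-value (which requires the F distribution, for example via `scipy.stats.f_oneway`) is left out.

```python
def anova_f_statistic(groups):
    """One-way ANOVA F-statistic: between-group variance over within-group variance."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    k, n = len(groups), len(all_values)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Variation of the values around their own group mean
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical ages of employees who stayed and who left
stayed = [30, 35, 40, 45]
left = [22, 25, 28, 33]
f_score = anova_f_statistic([stayed, left])
```

A larger F-score means the group means differ more relative to the spread inside each group, which is why a high F-score with a low p-value is read as a strong association with attrition.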
2. Perform CHI-SQUARE Test: The Chi-Square test is used to analyze the association between different
categorical features and the categorical response feature (Attrition).
Inference:
The following features showed statistically significant associations with employee attrition:
1. Department
2. EducationField
3. EnvironmentSatisfaction
4. JobInvolvement
5. JobLevel
6. JobRole
7. JobSatisfaction
8. MaritalStatus
9. OverTime
10. WorkLifeBalance
The following features did not show statistically significant associations with attrition.
1. Gender
2. Education
3. PerformanceRating
4. RelationshipSatisfaction
It is important for the organization to pay attention to the identified significant features and consider
them when implementing strategies to reduce attrition rates.
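The Chi-Square statistic itself can be sketched as follows for a contingency table of one categorical feature against attrition. The counts are hypothetical and the function name `chi_square_statistic` is an assumption for illustration. The sketch applies no continuity correction, so for a 2x2 table its result will differ from `scipy.stats.chi2_contingency` with default settings, which applies Yates' correction.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two features were independent
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts: rows = OverTime (No, Yes), columns = Attrition (No, Yes)
table = [[90, 10], [60, 40]]
statistic, dof = chi_square_statistic(table)
```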
CHAPTER 10
DATA MODELING
Data modeling plays a significant role in HR analytics when integrating machine learning
techniques. Machine learning algorithms leverage data models to make predictions, classifications,
and recommendations based on patterns and relationships found in the HR data.
The dataset was split into 70% for training and 30% for testing, with Attrition as the target
feature.
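The 70/30 split can be sketched with the standard library as below. The helper `train_test_split_indices` is a hypothetical name, and the shuffling differs from scikit-learn's `train_test_split`, so the selected rows will not match the project's split; only the resulting set sizes are comparable.

```python
import random


def train_test_split_indices(n_samples, test_fraction=0.3, seed=0):
    """Shuffle row indices reproducibly and split them into train and test sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


# 1,470 rows, as reported for the dataset
train_idx, test_idx = train_test_split_indices(1470)
```

With 1,470 rows this gives 1,029 training rows and 441 test rows.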
Fitting the different machine learning models:
Random Forest Model.
Importing Libraries:
Lists all files in the input directory to understand the available datasets.
Loads the dataset into a DataFrame and displays the first few rows.
Provides the shape of the dataset, statistical summary of numerical columns, and checks for missing
values.
Out [7]:
Age 0
Attrition 0
BusinessTravel 0
DailyRate 0
Department 0
DistanceFromHome 0
Education 0
EducationField 0
EmployeeCount 0
EmployeeNumber 0
EnvironmentSatisfaction 0
Gender 0
HourlyRate 0
JobInvolvement 0
JobLevel 0
JobRole 0
JobSatisfaction 0
MaritalStatus 0
MonthlyIncome 0
MonthlyRate 0
NumCompaniesWorked 0
Over18 0
OverTime 0
PercentSalaryHike 0
PerformanceRating 0
RelationshipSatisfaction 0
StandardHours 0
StockOptionLevel 0
TotalWorkingYears 0
TrainingTimesLastYear 0
WorkLifeBalance 0
YearsAtCompany 0
YearsInCurrentRole 0
YearsSinceLastPromotion 0
YearsWithCurrManager 0
dtype: int64
In [7]:
attrition_count
Out[16]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f56383e4110>
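In [17]:
# Heatmap of pairwise correlations; numeric_only=True skips string columns (needed in pandas 2.0+)
plt.figure(figsize = (10,6))
sns.heatmap(df.corr(numeric_only=True))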
Out[17]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f56383a8850>
Data Preprocessing
Converting String columns into integers
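In [19]:
# Label-encode every non-numeric (string) column; numeric columns are left unchanged
from pandas.api.types import is_numeric_dtype
from sklearn.preprocessing import LabelEncoder
for column in df.columns:
    if is_numeric_dtype(df[column]):
        continue
    df[column] = LabelEncoder().fit_transform(df[column])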
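What label encoding does to a string column can be sketched in plain Python: each distinct label is sorted and replaced by an integer code. The function `label_encode` is a hypothetical helper for illustration. Codes produced this way start at 0, whereas Table 2 lists the categories starting at 1.

```python
def label_encode(values):
    """Map each distinct label to an integer code, in sorted label order."""
    classes = sorted(set(values))
    mapping = {label: code for code, label in enumerate(classes)}
    return [mapping[v] for v in values], mapping


# Hypothetical Department values
codes, mapping = label_encode(["Sales", "HR", "R&D", "Sales"])
```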
Model Building
Splits the data into training and testing sets, trains a RandomForestClassifier, and evaluates its performance on the
training data.
In [20]:
random_state = 0)
In [21]:
x = df.drop(['Yes'], axis = 1)
y = df['Yes']
In [22]:
random_state = 0)
In [23]:
x_train.head()
In [24]:
rf.fit(x_train, y_train)
Out[24]:
In [25]:
rf.score(x_train, y_train)
Out[25]: 0.9815354713313897
Predicting for x_test
In [26]:
pred = rf.predict(x_test)
In [27]:
accuracy_score(y_test, pred)
Out[27]:
0.8526077097505669
Makes predictions on the test set and calculates the accuracy of the model, which is 85.26%.
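The accuracy figures can be related back to row counts with the short sketch below. The set sizes of 1,029 and 441 rows follow from the 70/30 split of 1,470 rows; the counts of correct predictions are inferred from the reported accuracies and are not stated in the notebook output. The toy labels are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def implied_correct(reported_accuracy, n_samples):
    """Number of correct predictions implied by an accuracy on n_samples rows."""
    return round(reported_accuracy * n_samples)


n_train, n_test = 1029, 441  # 70% and 30% of 1,470 rows
train_correct = implied_correct(0.9815354713313897, n_train)
test_correct = implied_correct(0.8526077097505669, n_test)

# Hypothetical labels: three of four predictions are correct
toy_accuracy = accuracy([1, 0, 0, 1], [1, 0, 1, 1])
```

The gap between the training accuracy (about 98.15%) and the test accuracy (85.26%) may indicate some overfitting, which is why the test score is the figure to report.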
CHAPTER 11
CONCLUSION
REFERENCES
[1] Yadav, Sandeep, Aman Jain, and Deepti Singh. "Early Prediction of Employee Attrition using Data
Mining Techniques." 2018 IEEE 8th International Advance Computing Conference (IACC). IEEE, 2018.
[2] Setiawan, I., et al. "HR analytics: Employee attrition analysis using logistic regression." IOP
Conference Series: Materials Science and Engineering. Vol. 830. No. 3. IOP Publishing, 2020.
[3] Frye, Alex, et al. "Employee Attrition: What Makes an Employee Quit?" SMU Data Science Review
1.1 (2018): 9.
[4] Wright, Bradley E., and Robert K. Christensen. "Public service motivation: A test of the job
attraction–selection–attrition model." International Public Management Journal 13.2 (2010): 155-
176.
[5] MacIntosh, Eric W., and Alison Doherty. "The influence of organizational culture on job satisfaction
and intention to leave." Sport Management Review 13.2 (2010): 106-117.
[6] Singh, Romila, et al. "Stemming the tide: Predicting women engineers' intentions to leave." Journal
of Vocational Behaviour 83.3 (2013): 281-294.
[7] Brown, Larry K., et al. "Predictors of retention among HIV/hemophilia health care professionals."
General hospital psychiatry 24.1 (2002): 48-54.
[8] Bhuva, Kashyap, and Kriti Srivastava. "Comparative Study of the Machine Learning Techniques for
Predicting the Employee Attrition." IJRAR-International Journal of Research and Analytical Reviews
(IJRAR) 5.3 (2018): 568-577.
[9] Lévy-Garboua, Louis, Claude Montmarquette, and Véronique Simonnet.