
HR ANALYTICS FOR EMPLOYEE ATTRITION AND

PERFORMANCE PREDICTION

A PROJECT REPORT
submitted to the
SRM Institute of Science and Technology (Deemed to be University), Chennai
in partial fulfillment of the requirements for the award of the Degree of

MASTER OF BUSINESS ADMINISTRATION


Submitted by

NISHANTH R
[DA22523050147]

Under the guidance of

Dr. G. Dinesh
(Assistant Professor, Department of Computational Intelligence)

SRMIST-DDE, KTR, DEPARTMENT OF MANAGEMENT

DIRECTORATE OF ONLINE AND DISTANCE EDUCATION


SRM INSTITUTE OF SCIENCE AND TECHNOLOGY
(Deemed to be university u/s 3 of UGC Act, 1956) CHENGALPATTU – 603203
JUNE 2024
SRM INSTITUTE OF SCIENCE AND TECHNOLOGY
KATTANKULATHUR - 603203

BONAFIDE CERTIFICATE

This is to certify that the Project Work entitled “HR Analytics for Employee Attrition
and Performance Prediction” submitted by NISHANTH R [DA2252305010147] of MBA,
Directorate of Distance Education, SRM Institute of Science and Technology, Kattankulathur,
is a bonafide record of project work carried out by him in partial fulfillment of the requirements
for the award of the degree of Master of Business Administration.

SIGNATURE
Dr. Dinesh G
SUPERVISOR
Assistant Professor,
Dept. of CINTEL,
SRMIST, KTR

SIGNATURE
Dr. Daniel Rajkumar M
PROGRAM COORDINATOR,
Assistant Professor,
MBA Department,
SRMIST-DDE, KTR

Signature of the Internal Examiner

Signature of the External Examiner

DECLARATION

I hereby declare that the Project Work entitled “HR Analytics for Employee Attrition
and Performance Prediction”, submitted by me in partial fulfillment of the requirements for the degree
of Master of Business Administration under the guidance of Dr. G. Dinesh (Assistant Professor,
Department of Computational Intelligence), SRMIST-DDE, KTR, SRM Institute of Science and
Technology, is my original work and has not been submitted earlier to any other university or
institution. The matter presented in this project report has not been submitted elsewhere for the
award of any other degree or diploma. I declare that I have faithfully acknowledged, given credit to,
and referred to the research workers wherever their works have been cited in the text and the body
of the project. I further certify that I have not willfully lifted another person’s work, paragraphs,
text, data, results, etc., reported in journals, books, magazines, reports, dissertations, theses, etc.,
or available on websites, and presented them in this project report as my own work.

Declaration:
I am aware of and understand the University’s policy on academic misconduct and
plagiarism, and I certify that this assessment is my own work, except where indicated by
referencing, and that I have followed the good academic practices noted above.

Nishanth R
DA22523050147

ACKNOWLEDGEMENT

I express my humble gratitude to Dr. Muthamizhchelvan, Vice Chancellor, SRM Institute of
Science and Technology, for the facilities extended for the project work and his continued support.

I express my deep sense of gratitude to the Director, Dr. R. Rajagopal, and the Program
Coordinator, Dr. M. Daniel Rajkumar, for their wholehearted encouragement. I am indebted
to my Research Supervisor, Dr. G. Dinesh, Assistant Professor (CINTEL Department), for his
continuous guidance and encouragement in completing my Project Work successfully.

I am also thankful to all the faculty and staff members of the Department of Distance
Education for their support and guidance.

I also acknowledge, with a deep sense of reverence, my gratitude towards my parents
and the members of my family, who have always supported me morally as well as economically.
I take this opportunity to thank all those who have helped me to complete my Project Work
within the scheduled time.

Nishanth R
[DA2252305010147]

TABLE OF CONTENTS
1. Introduction to the Study
1.1 Introduction
1.2 Employee Attrition
1.3 Employee Retention
1.4 Objective
1.5 Motivation
1.6 Scope of Work
1.7 Research Questions
1.8 Importance
1.9 Relevance
2. Literature Review
3. Problem Statement
3.1 Description
3.2 Importance to Solve the Problem
3.3 Background Information Necessary to Understand the Problem
4. Methodology
4.1 Approach
4.2 Techniques
5. System Architecture
5.1 Architecture Diagram
5.2 Explanation of Components and Interaction
6. Implementation
7. Data Exploration and Processing
8. Data Visualization
8.1 Dashboard Visuals
8.2 Attrition Page
8.3 Visualising Employee Attrition Rate
8.4 Analysing Employee Attrition by Gender
8.5 Analysing Employee Attrition by Age
8.6 Analysing Employee Attrition by Business Travel
8.7 Analysing Employee Attrition by Department
8.8 Analysing Employee Attrition by Daily Rate
8.9 Analysing Employee Attrition by Distance From Home
8.10 Analysing Employee Attrition by Education
8.11 Analysing Employee Attrition by Education Field
8.12 Analysing Employee Attrition by Environment Satisfaction
8.13 Analysing Employee Attrition by Job Roles
8.14 Analysing Employee Attrition by Job Level
8.15 Analysing Employee Attrition by Job Satisfaction
8.16 Analysing Employee Attrition by Marital Status
8.17 Analysing Employee Attrition by Monthly Income
8.18 Analysing Employee Attrition by Work Experience
8.19 Analysing Employee Attrition by Overtime
8.20 Analysing Employee Attrition by Salary Hike
8.21 Analysing Employee Attrition by Performance Rating
8.22 Analysing Employee Attrition by Relationship Satisfaction
8.23 Analysing Employee Attrition by Work Life Balance
8.24 Analysing Employee Attrition by Total Working Experience
8.25 Analysing Employee Attrition by Years at Company
8.26 Analysing Employee Attrition by Years in Current Role
8.27 Analysing Employee Attrition by Years Since Last Promotion
8.28 Analysing Employee Attrition by Years with Current Manager
9. Statistical Analysis
10. Data Modeling
11. Conclusion

ABSTRACT

An HR dashboard is an advanced analytics tool that displays important HR metrics using
interactive data visualizations. It helps the human resources department improve recruiting
processes, optimize workplace management, and monitor and enhance overall employee performance.
HR analytics applies various analytic tools and generates reports, providing better insight into the
various issues related to HR activities. The aim of this project is to present the organization's history,
including the number of employees, gender distribution, salary structure, and performance, and to use
a machine learning algorithm to predict promotion chances. It also predicts the attrition rate of the
organization. The visualization presents statistical information on a dashboard. HR analytics has the
potential to add great value to the decisions HR leaders make about human and organizational capital.
Human resource analytics is useful for improving employee performance and obtaining an optimal
return on the organization's investment in its human capital.

Keywords: Machine Learning, Power BI, Data Visualization, Attrition rate prediction

CHAPTER 1
INTRODUCTION TO THE STUDY

1.1 INTRODUCTION
HR analytics is a business intelligence tool that allows Human Resource teams to track, analyze, and
report on HR KPIs. Modern interactive dashboards are built on an HR analytics platform that makes it easy
to combine data from all systems and to explore this data in depth directly within the dashboard. In this way,
HR teams can quickly find insights that will improve recruiting, optimize workplace management, and
enhance employee performance. Employee performance dashboards help HR teams and business
managers understand the effectiveness, satisfaction, and goal progress of their workforce. To analyze
compensation versus performance, this project shows the number of active employees by rating level and
salary by employee rating. Employees are the most important asset within an organization. This HR
dashboard project shows HR leaders training program metrics such as completion percentage, hours,
and cost. It takes an employee dataset and, based on it, predicts the attrition rate and which employees are
due for promotion using the Random Forest algorithm.

HR executives strive to maintain a diverse and balanced workforce, so they need to fully understand the
demographic characteristics of their employees. HR dashboard analysis allows them to analyze data on age,
gender, location, department, and ethnic group in depth. Using an interactive dashboard, HR professionals
can dig deeper into demographic data and analyze a single variable, such as ethnic diversity.

Managing and analyzing such vast amounts of HR data manually is time-consuming and prone to mistakes.
This is where machine learning comes in. When provided with data, these decision-making models can
produce accurate, consistent decisions, catch important trends in the data, and provide actionable insights
that can help propel the growth of the organization.

1.2 EMPLOYEE ATTRITION

Employee attrition refers to the percentage of employees who depart from an organization within
a specified timeframe due to various reasons such as illness, job dissatisfaction, inadequate wages,
marriage, retirement, or death, compared to the average number of employees on the payroll during the
same period. Essentially, it encompasses both the inflow and outflow of labor within an enterprise.
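The definition above amounts to a simple ratio: separations during a period divided by the average number of employees on the payroll in that period. As a minimal sketch, assuming the common convention of averaging the opening and closing headcounts (the report does not say how the average is taken), the attrition rate can be computed as follows. The function name and the figures in the example are hypothetical.

```python
def attrition_rate(separations: int, headcount_start: int, headcount_end: int) -> float:
    """Return the attrition rate (%) for a period.

    Assumption: the average payroll is the mean of the opening and closing
    headcounts; an organization may instead average monthly headcounts.
    """
    if separations < 0 or headcount_start < 0 or headcount_end < 0:
        raise ValueError("counts must be non-negative")
    average_headcount = (headcount_start + headcount_end) / 2
    if average_headcount == 0:
        raise ValueError("average headcount must be positive")
    return separations / average_headcount * 100


# Hypothetical period: 24 leavers, 200 employees at the start, 180 at the end.
print(f"{attrition_rate(24, 200, 180):.2f}%")  # prints 12.63%
```

Here the average payroll is (200 + 180) / 2 = 190, so 24 separations correspond to an attrition rate of about 12.6% for the period.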

In today’s fiercely competitive business landscape, attrition poses challenges not only to the smooth
functioning of the organization but also to the morale of remaining employees. While a certain level of
attrition may be tolerable, exceeding a certain threshold incurs significant expenses associated with
replacing departed employees. These expenses include recruitment and selection costs, as well as training
and development expenditures. Moreover, the loss of key performers within the organization represents
a substantial setback, as finding suitable replacements for such individuals proves to be a daunting task.

1.3 EMPLOYEE RETENTION

Employee retention is a top priority for organizations, leading them to invest significant effort in
devising and implementing retention strategies. Creating and enforcing effective retention policies poses a
considerable challenge for employers. Retention efforts entail ongoing initiatives aimed at fostering an
environment that meets the diverse needs of employees, thereby encouraging them to remain with the
organization for extended periods.

Employees are the lifeblood of any organization, and their departure can precipitate a range of
issues, including financial costs, operational disruptions, and the loss of valuable talent, which in turn
affects the morale of the remaining staff. Recognizing these challenges, organizations place a strong emphasis
on employee retention as a means of mitigating such problems. Effective retention strategies focus on
attracting and retaining innovative, efficient, and dedicated personnel. Retention is not merely about
managing turnover but about effectively managing the organization's human resources, which inherently
addresses the issue of employee retention.

1.4 OBJECTIVE

● Assess the degree of employee satisfaction regarding their job and working environment.
● Identify the elements that contribute to employee dissatisfaction with the company's policies and
guidelines.
● Pinpoint the areas in which the company is falling short.
● Understand the underlying causes of attrition within the company.
● Develop strategies and methods to minimize attrition rates within the organization.

1.5 MOTIVATION

This project originates from the possibility of enhancing employee contentment, cutting down
expenses, elevating organizational efficiency, and fostering a favorable workplace atmosphere. It
represents a chance to leverage data and analytics to enact substantial improvements that are
advantageous to both employees and the entire organization.

1.6 SCOPE OF WORK

This project investigates the importance of Data Analytics and People Analytics in improving
efficiency in HR functions. “Due to useful analytics results regarding how organizations find, hire,
maintain, and retain employees, HR data analysis plays a significant role in operational activities of any
business. How can data analytics help HR departments work more efficiently?”

1.7 RESEARCH QUESTIONS
There are two main questions that will be discussed in this project:
1. What techniques have made an impact on HR functions in the last five years?
2. How can Data Analytics and Artificial Intelligence tools help to improve analytics in HR functions?

1.8 IMPORTANCE
The remainder of the thesis is organized as follows. First, previous work related to the prediction of
employee attrition and the challenges associated with this task will be discussed. Then, the general
approach used in this thesis will be explained by introducing and motivating the various computational
algorithms used. After this, the dataset and experimental setup will be described. Thereafter, the results
will be presented and discussed in a concise manner. The thesis will end with the discussion of the work
performed, directions for future work, and an overall conclusion.

1.9 RELEVANCE
A rising employee turnover rate significantly affects an organization's ability to achieve its predefined
goals. Employee attrition also affects ongoing work and the productivity of current workers, and recruiting
new workers often drains the organization's valuable financial capital. This section mentions the causes of
employee attrition found in the literature. It also discusses the algorithms used to predict employee
attrition and the methods used to handle imbalanced data and to select the most contributing features.
Lastly, the contributions of this thesis to the existing literature are formulated.
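Attrition data are usually imbalanced, because far fewer employees leave than stay, so a classifier can look accurate simply by predicting that nobody leaves. One widely used remedy, shown here as a generic sketch rather than as the method of any study discussed in this report, is to weight each class inversely to its frequency using n_samples / (n_classes × count_of_class). This is the heuristic scikit-learn documents for `class_weight="balanced"`. The helper name and the labels below are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical labels: 8 employees stayed ("No"), 2 left ("Yes").
labels = ["No"] * 8 + ["Yes"] * 2
print(balanced_class_weights(labels))  # prints {'No': 0.625, 'Yes': 2.5}
```

With these weights each leaver counts four times as much as each stayer during training, so both classes contribute the same total weight (8 × 0.625 = 2 × 2.5 = 5).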

CHAPTER 2
LITERATURE REVIEW
2.1 INTRODUCTION

The literature reviewed here collects differing viewpoints and assessments that give an idea of the
various factors behind employee attrition. The highlighted statistics are the outcomes of a focused literature
review, and the references helped in understanding employee engagement and organizational structure and
in conceptualizing the importance of HR analytics in reducing attrition.

The reviewed research articles and projects are broadly grouped and presented here under six
heads, viz., causes of employee attrition, impact of employee attrition, measurement of employee attrition,
employee retention, employee attrition and retention, and models of employee attrition.

2.2 REVIEWS

Sandeep Yadav (2018)

Sandeep Yadav et al. (2018), in their research on “Early Prediction of Employee Attrition using Data
Mining Techniques”, focused on the importance of data mining techniques for forecasting attrition
patterns among employees. The researchers studied an IT organization across its different departments.
The authors used the following variables to measure the reasons for attrition: name of the employee,
number of projects handled, average monthly hours, job satisfaction level, time spent in the organization
(years), last evaluation, department, work accident, absence, promotions and rewards in the last 5 years,
compensation, and salary level. The research employed data pre-processing, feature engineering, data
modelling, and comparison. One-hot encoding was used to encode the departments (IT, Accounting,
Management, HR, Product Management, Sales, Support, Technical), and Logistic Regression and SVM
models were used to obtain classification results on the one-hot encoded data. The study shows that
employee attrition can affect an organization in many areas, such as reputation, revenue, market position,
and cost in terms of both time and money. If the organization is able to take preventive measures during
hiring, this will help it reduce attrition. Compensation, career development, and promotions are not the only
reasons behind employee attrition; the company also needs to consider other aspects during hiring. Through
this approach, the firm can build dependable and accurate models that reduce the cost of hiring and
retaining quality employees and that determine employees' attrition status using appropriate data mining
methods.

I Setiawan (2020)

I. Setiawan et al. conducted research on “HR Analytics-Employee Attrition using Logistic Regression.”
The researchers used logistic regression to analyze employee attrition, applying RStudio for data mixing,
exploratory data analysis, data preparation, logistic regression, model assessment, and visualization. The
study was organized in five stages: data gathering and business understanding, data preprocessing,
exploratory data analysis, model selection and preparation, and analysis and assessment of the model.
Employee attrition was studied with the help of variables such as number of companies worked at, total
work experience, years with current manager, frequent business travel, poor environment satisfaction,
HR department, marital status (separated), marital status (married), poor job satisfaction, early logout,
and working overtime. Reasons attributable to the employee and reasons attributable to the company
were also used. The study makes clear that, to improve retention rates, the company needs to develop its
human resource department by assessing the working atmosphere, job satisfaction, employee workload,
and communication between managers, leaders, and subordinates.

Christopher Boomhower (2018)

Christopher Boomhower et al., in “Employee Attrition: What Makes an Employee Quit?”, attempted to
understand the underlying reasons why employees quit, with special reference to civil service workers.
The authors examined publicly available data from the Office of Personnel Management, the Bureau of
Labor Statistics, and IBM, and applied Principal Component Analysis to explore the reasons for attrition.
The study found that pay scale is one of the major reasons for quitting a job. Its significant highlights are
that the chance of an employee resigning falls significantly as his or her length of service increases, that
this chance rises or falls depending on the employee's age, and that the chance of leaving is low if the
employee is on a specific pay plan.

Bradley E. Wright (2010)

This research evaluates the relationship between job attraction, selection, and attrition with the help
of a model. The authors focused on the public sector and compared public service motivation (PSM)
between private- and public-sector lawyers. A further aim was to understand the influence of PSM and how
it shapes an employee's choice of sector. Data gathered by the American Bar Association (ABA) were used to
analyze the employment trends of lawyers, and survey responses and panel respondent demographics by
sector of employment were used to test the attraction-selection-attrition model. The researchers concluded
that the relationship between PSM and sector of employment is not straightforward. Although the data do
not rule out the possibility that these results reflect adaptation rather than attraction-selection processes,
the results do not fully support either mechanism's prediction that employees' reward preferences will
align with the purpose each sector serves.

Eric W. MacIntosh (2009)

This research studied the impact of organizational culture on job satisfaction and on intention to quit
in the fitness industry. The Cultural Index for Fitness Organizations (CIFO) was developed to evaluate
perceptions of organizational culture in the fitness industry. To understand fitness organizations, the
researchers interviewed many fitness company managers, leaders, and experienced staff. Data were
collected by administering a questionnaire with a 7-point Likert scale. The instrument included 11
dimensions: presence of organization, successful members, connections, reinforcement, innovation,
marketing-sales, organizational reliability, health and ability, maintenance, work ethic, and environment.
Exploratory factor analysis revealed eight components that characterize common dimensions of culture in
this context: staff proficiency, ambience, connections among employees, validation, sales, service gear,
service systems, and organizational existence. Path analysis was then used to relate these organizational
culture dimensions to job satisfaction and intention to quit. The results accounted for 14.3% of the variance
in job satisfaction and 50.3% of the variance in intention to quit the organization. The findings highlight the
multidimensional nature of organizational culture and its complexity in the fitness industry.

Romila Singh (2013)

The authors used the Social Cognitive Career Theory (SCCT) model to predict turnover intentions among
women employees. The SCCT model was used to predict the relationships among outcome expectations,
career goals, actions, interests, and self-efficacy, as these may play a role in career development. The
researchers situated the study in the STEM context, examining the representation of women in engineering
and the physical sciences and their departure from the field after earning a degree and from the
occupation itself (Ref: NSF, 2012; Preston, 2004; Society of Women Engineers, 2007). The authors developed
six hypotheses concerning job attitudes, predicting that social cognitive influences would be related to job
attitudes; that self-efficacy would facilitate the relationship between developmental opportunities at work
and job attitudes; that job attitudes and self-efficacy would be positively correlated; that job attitudes would
be related to outcome expectations; that outcome expectations would facilitate the relationship between
growth opportunities at work and job attitudes; and that self-efficacy and outcome expectations together
would facilitate the relationship between growth opportunities at work and job attitudes.

Kashyap Bhuva (2018)

Kashyap Bhuva et al. (2018) researched machine learning techniques for forecasting the employee
attrition rate. The researchers considered an IT organization and used a sample from the employee
database of IBM, USA. The study follows an analytics project workflow with six segments: 1) raw dataset,
2) data processing, 3) feature selection and scaling, 4) modelling, 5) model evaluation and tuning, and
6) deployment and monitoring. The analytical techniques used in the research include Ridge, Lasso,
Logistic Regression, Decision Tree, Random Forest, Linear Discriminant Analysis, and Support Vector
Machine. The researchers predicted attrition on the IBM (USA) data, which has 35 attributes, using data
mining techniques and machine learning algorithms with different combinations of target attributes. The
authors argue that intelligent and effective attrition prediction using data mining is needed because
employee attrition is one of the biggest business problems. They concluded that, on this dataset, Linear
Discriminant Analysis outperformed the other techniques, followed by the Logistic Regression model, when
accuracy is the preferred metric.

Louis Lévy-Garboua (2008)

The researchers studied job satisfaction and resignations, applying the wealth maximization theory of
quitting behavior to the German Socio-Economic Panel (1985-2003). The authors observed that job
satisfaction is one of the most important factors for employees with exceptionally good experience and
competency. The researchers considered voluntary retirement, job satisfaction level, surprises experienced
by employees, and the wealth maximization model to explain their findings. In explaining the reasons for
quitting, the authors argue that an employee's decision to leave the organization depends on comparing the
present value of future outcomes in the current job with outside opportunities. The results confirmed that,
through a simple subjective survey questionnaire on work structure and satisfaction level, economists can
easily uncover a great deal of otherwise hidden information. In fact, through such a survey a firm or
industry can find opportunities to improve instead of relying on traditional tools.

Rupesh Khare (2019)

Rupesh Khare et al. conducted research on Employee Attrition Risk Assessment using Logistic
Regression Analysis, applying logistic regression to predict the probability of employee attrition in an
organization based on the demographic data of separated employees. The research was based on a real-life
project, with demographic information collected from separated and existing employees. The authors
prepared a questionnaire to evaluate the quitting probability of current staff. They divided overall attrition
risk into two parts: 1) demographic risk and 2) behavioral risk. In the present study, logistic regression was
mainly used to predict employee attrition risk from demographic information. In addition, a retention plan
was mapped to focus on the risk categories developed.
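For reference, logistic regression of the kind described above models the probability that employee i leaves as a logistic function of a linear combination of that employee's features. In the standard textbook form below, x_i is the employee's (for example, demographic) feature vector and β0 and β are coefficients estimated from past data; this is the general formulation, not the specific model reported in that study.

```latex
P(\text{Attrition}_i = 1 \mid \mathbf{x}_i)
  = \frac{1}{1 + \exp\!\left(-(\beta_0 + \boldsymbol{\beta}^{\top}\mathbf{x}_i)\right)},
\qquad
\log\frac{P(\text{Attrition}_i = 1 \mid \mathbf{x}_i)}{1 - P(\text{Attrition}_i = 1 \mid \mathbf{x}_i)}
  = \beta_0 + \boldsymbol{\beta}^{\top}\mathbf{x}_i
```

Each coefficient therefore shifts the log-odds of leaving; a positive coefficient on a feature raises the predicted attrition probability, all else being equal, which is what allows employees to be ranked into risk categories.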

Resham Sundrani (2019)

The author, Reshma Sundrani (2019), conducted her research on “Study on Employee Attrition &
Retention Exploring the Issues and Challenges”. The research investigated the considerable economic and
intangible costs associated with losing expert and competent employees. The researcher observed that HR
best practices must be reflected in the organization and in its employee-related policies. HR strategy should
evolve from a transactional support role to a partner in the organization's business strategy.

The study highlights that employees are retained:

a) if the organization acts as a source of pride and affiliation,
b) when they respect their supervisors,
c) if they are well compensated, and
d) when they perceive their work as meaningful.
The researcher concludes that HR policies, strategies, and programs should go hand in hand with the
vision and mission of the business. Budget control measures that include cost cutting will only create
negative sentiment if employee satisfaction is overlooked.

CHAPTER 3
PROBLEM STATEMENT

Several elements of performance management are difficult to handle. It is difficult to make hiring and
recruitment plans (for example, knowing how long it takes to hire employees). It is difficult to measure
employee performance and to identify patterns in employee engagement, employee satisfaction, and
performance. It is also difficult to plan employee learning and development and to tell whether learning and
development initiatives are having an impact on employee performance.

HR teams also need to track employment contract status and to develop strategies and make decisions that
will improve the work environment and engagement levels. Having data-backed evidence means that
organizations can focus on making the necessary improvements and plan for future initiatives.

What amount of investment is needed to get employees up to a fully productive speed?


Which of our employees are most likely to leave within the year?

3.1 DESCRIPTION

The project titled "IBM HR Analytics for Employee Attrition and Performance Prediction" aims to
address the challenge of employee turnover in companies, particularly in the software industry. The study
utilizes IBM's HR Analytics datasets to analyze and predict employee attrition and performance using
various statistical methods and machine learning algorithms. The analysis involves data cleaning, data
visualization, and the application of algorithms like Logistic Regression, Random Forest, Support Vector
Machine, XGBoost, CatBoost, AdaBoost, and LightGBM to understand and mitigate employee attrition.

3.2 IMPORTANCE TO SOLVE THE PROBLEM

Solving the problem of employee attrition is crucial for maintaining organizational stability and
performance. High attrition rates can lead to significant operational challenges, including decreased
productivity, disrupted team dynamics, and increased recruitment and training costs. By identifying the
factors contributing to employee dissatisfaction and attrition, companies can develop targeted strategies
to improve employee retention, thereby enhancing overall organizational efficiency and creating a more
stable and productive work environment.

3.3 BACKGROUND INFORMATION NECESSARY TO UNDERSTAND THE PROBLEM

To understand the problem of employee attrition and the project's approach to solving it, several key
background elements are necessary:

1. Employee Attrition: Understanding what employee attrition is and its impact on organizations. This
includes voluntary turnover (employees leaving by choice) and involuntary turnover (employees
leaving due to layoffs or dismissals).
2. HR Analytics: Familiarity with HR analytics and its role in analyzing workforce data to make informed
decisions about employee management and retention.
3. Data Analysis Techniques: Knowledge of statistical tests such as ANOVA and Chi-Square tests used
to analyze the significance of various factors in employee attrition.
4. Machine Learning Algorithms: Understanding how different machine learning algorithms (e.g.,
Logistic Regression, Random Forest, XGBoost, etc.) can be applied to predict employee behavior
and identify key factors leading to attrition.
5. Dataset Characteristics: An overview of the IBM HR Analytics dataset, including its features (both
numerical and categorical), and the process of data cleaning and preprocessing to ensure accurate
analysis.
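The two statistical tests named in item 3 can be sketched with SciPy. This is a minimal illustration, assuming a pandas DataFrame with an `Attrition` column of "Yes"/"No" values; the column names `MonthlyIncome` and `OverTime` follow the IBM HR Analytics dataset, but the eight rows below are invented purely so the snippet runs, and the helper names are hypothetical. A one-way ANOVA asks whether the mean of a numerical feature differs between leavers and stayers, while a chi-square test of independence asks whether a categorical feature is associated with attrition.

```python
import pandas as pd
from scipy.stats import chi2_contingency, f_oneway

# Invented toy data for illustration; replace with the real HR dataset.
df = pd.DataFrame({
    "Attrition": ["Yes", "No", "No", "Yes", "No", "No", "Yes", "No"],
    "MonthlyIncome": [2100, 5200, 6100, 2500, 4800, 7300, 1900, 5600],
    "OverTime": ["Yes", "No", "No", "Yes", "No", "Yes", "Yes", "No"],
})


def anova_pvalue(data: pd.DataFrame, feature: str, target: str = "Attrition") -> float:
    """One-way ANOVA: does the mean of a numerical feature differ across target groups?"""
    groups = [group[feature].to_numpy() for _, group in data.groupby(target)]
    return f_oneway(*groups).pvalue


def chi_square_pvalue(data: pd.DataFrame, feature: str, target: str = "Attrition") -> float:
    """Chi-square test of independence between a categorical feature and the target."""
    table = pd.crosstab(data[feature], data[target])  # observed counts
    _, p_value, _, _ = chi2_contingency(table)
    return p_value


print("MonthlyIncome ANOVA p-value:", anova_pvalue(df, "MonthlyIncome"))
print("OverTime chi-square p-value:", chi_square_pvalue(df, "OverTime"))
```

A small p-value (commonly below 0.05) suggests that the feature is associated with attrition. On the full dataset, the same two functions can be looped over every numerical and every categorical column to rank features.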

CHAPTER 4
METHODOLOGY

The methodology for IBM HR Analytics Employee Attrition and Performance Prediction is as follows:

● Load the Dataset: The IBM HR Analytics Attrition Dataset is loaded using the pd.read_csv()
function. The head() and info() methods are used to display the first few rows and to get information
about the dataset, respectively.
● Knowing the Dataset: Basic information about the dataset is generated, and the numerical and
categorical attributes are listed.
● Data Cleaning: Any rows with missing values are dropped using the dropna() method.
● Data Visualization: Matplotlib and Seaborn libraries are used to visualize the data.
● Statistical Analysis: The ANOVA test is performed to analyze the importance of numerical features
in employee attrition, while the Chi-Square test is used to analyze the importance of categorical
features in employee attrition.
● Data Preprocessing: The target variable 'Attrition' is mapped to binary values (1 for 'Yes' and 0 for
'No'). Selected features are extracted from the dataset and one-hot encoded using the
pd.get_dummies() function.
● Splitting the Dataset: The dataset is split into training and testing sets using the train_test_split()
function from scikit-learn.
● Implementing Machine Learning Algorithms: Logistic Regression, XGBoost, CatBoost, AdaBoost,
LightGBM, Decision Tree, and Random Forest classifiers are initialized and trained using the training
data.
● Model Evaluation: The accuracy score and confusion matrix are computed to evaluate the
performance of each algorithm on the testing data.
● Results: The results, including the accuracy and confusion matrix, are printed for each algorithm.

● Model Performance Comparison: The hvPlot library is used to visualize the ROC curve diagram
comparing the performance of all models used.
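To make the Data Preprocessing step concrete, the sketch below reproduces its two operations in plain Python: mapping 'Attrition' to 1/0 and expanding a categorical column into 0/1 indicator columns. It is an illustration under stated assumptions, not the author's code: the project itself uses pandas, the helper names encode_target and one_hot are hypothetical, and the '<column>_<category>' naming only mirrors the default column naming of pd.get_dummies().

```python
from typing import Dict, List


def encode_target(labels: List[str]) -> List[int]:
    """Map Attrition labels to binary values: 'Yes' -> 1, 'No' -> 0."""
    mapping = {"Yes": 1, "No": 0}
    return [mapping[label] for label in labels]


def one_hot(column: str, values: List[str]) -> Dict[str, List[int]]:
    """Expand one categorical column into 0/1 indicator columns.

    Categories are sorted and the new columns are named '<column>_<category>'
    (hypothetical helper; the naming follows pd.get_dummies conventions).
    """
    categories = sorted(set(values))
    return {
        f"{column}_{category}": [1 if value == category else 0 for value in values]
        for category in categories
    }


if __name__ == "__main__":
    print(encode_target(["Yes", "No", "No"]))  # [1, 0, 0]
    print(one_hot("OverTime", ["Yes", "No", "Yes"]))
    # {'OverTime_No': [0, 1, 0], 'OverTime_Yes': [1, 0, 1]}
```

One-hot encoding creates one column per category, whereas the LabelEncoder used in Chapter 10 replaces each category with a single integer code.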

4.1 APPROACH
The project employs a structured approach to address the problem of employee attrition through data
analysis and predictive modeling. The main steps include:

1. Data Collection: Utilizing the IBM HR Analytics Attrition Dataset, which contains comprehensive
information on employee demographics, job satisfaction, and performance metrics.
2. Data Exploration and Preprocessing: Conducting initial exploration to understand the dataset,
followed by cleaning and preprocessing steps such as handling missing values, removing redundant
features, and converting categorical variables into numerical formats.
3. Data Visualization: Creating visual representations of the data to identify trends and patterns
related to employee attrition.
4. Statistical Analysis: Applying statistical tests to determine the significance of various features in
predicting attrition.
5. Machine Learning Model Implementation: Training and evaluating multiple machine learning
algorithms to predict employee attrition.
6. Model Evaluation and Comparison: Assessing the performance of different models using metrics
like accuracy, confusion matrix, and ROC curve.

4.2 TECHNIQUES

1. Data Cleaning and Preprocessing:


● Handling Missing Values: Removing rows with missing data to ensure dataset integrity.
● Feature Selection: Dropping features with single unique values and those irrelevant to the
analysis.
● Encoding Categorical Variables: Using one-hot encoding to convert categorical variables into
a numerical format suitable for machine learning models.

2. Data Visualization:
● Matplotlib and Seaborn: Tools used for creating plots and charts to visualize the
relationships between different features and employee attrition.
● Visualization Types: Bar plots, histograms, and other charts to illustrate the distribution and
impact of various features on attrition.
3. Statistical Analysis:
● ANOVA Test: Used to identify the significance of numerical features in predicting attrition.
● Chi-Square Test: Applied to categorical features to determine their importance in the
prediction model.
4. Machine Learning Algorithms:
● Logistic Regression: A basic yet powerful algorithm for binary classification tasks.
● Random Forest: An ensemble learning method that uses multiple decision trees to improve
prediction accuracy.
● Support Vector Machine (SVM): A robust classifier that finds the optimal hyperplane for
separating data points.
● XGBoost, CatBoost, AdaBoost, LightGBM: Advanced boosting algorithms that enhance
model performance by combining weak learners to form a strong predictor.
5. Model Evaluation:
● Accuracy Score: Measures the proportion of correctly predicted instances out of the total
instances.
● Confusion Matrix: Provides insights into the true positives, true negatives, false positives,
and false negatives.
● ROC Curve: Plots the true positive rate against the false positive rate to visualize the
performance of classification models.
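All three evaluation metrics can be derived from the true labels and the model's predicted labels or scores. The plain-Python sketch below shows how: the confusion matrix counts the four outcomes, accuracy is (TN + TP) / total, and the ROC curve is traced by lowering the decision threshold one score at a time and recording (false positive rate, true positive rate) at each step, with the area under it computed by the trapezoidal rule. The helper names are hypothetical; in practice these values come from sklearn.metrics (accuracy_score, confusion_matrix, roc_curve, roc_auc_score). Note that scikit-learn's confusion_matrix returns a 2 x 2 array rather than the flat (TN, FP, FN, TP) tuple used here.

```python
from typing import List, Tuple


def confusion_counts(y_true: List[int], y_pred: List[int]) -> Tuple[int, int, int, int]:
    """Return (TN, FP, FN, TP) for binary labels, where 1 means the employee left."""
    pairs = list(zip(y_true, y_pred))
    tn = pairs.count((0, 0))
    fp = pairs.count((0, 1))
    fn = pairs.count((1, 0))
    tp = pairs.count((1, 1))
    return tn, fp, fn, tp


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """Share of predictions that match the true label: (TN + TP) / total."""
    tn, fp, fn, tp = confusion_counts(y_true, y_pred)
    return (tn + tp) / (tn + fp + fn + tp)


def roc_points(y_true: List[int], scores: List[float]) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs from sweeping the threshold over every distinct score.

    Assumes both classes are present in y_true.
    """
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        y_pred = [1 if s >= threshold else 0 for s in scores]
        _, fp, _, tp = confusion_counts(y_true, y_pred)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve using the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    y_true = [0, 0, 1, 1]
    print(confusion_counts(y_true, [0, 1, 1, 1]))  # (1, 1, 0, 2)
    print(accuracy(y_true, [0, 1, 1, 1]))  # 0.75
    print(auc(roc_points(y_true, [0.1, 0.4, 0.35, 0.8])))  # 0.75
```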

CHAPTER 5

SYSTEM ARCHITECTURE
5.1 ARCHITECTURE DIAGRAM:

Fig 5.1: System Architecture Diagram

5.2 EXPLANATIONS OF THE COMPONENTS AND INTERACTION:

1. Data Collection

Component: IBM HR Datasets

Interaction: The project begins with collecting data from IBM HR datasets, which include
comprehensive details about employees such as demographics, job roles, satisfaction levels, and
performance metrics. This data forms the foundation for subsequent analysis and model building.

2. Data Preprocessing

Component: Data Cleaning and Normalization

Interaction: Raw data often contains inconsistencies such as missing values, duplicates, and outliers. The
data preprocessing step involves cleaning the dataset to handle these issues. This includes:

Removing missing values: Ensuring that the dataset is complete.

Normalization: Adjusting numerical values to a common scale without distorting differences in the ranges
of values.
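The report does not specify which normalization was applied. Assuming min-max scaling, each value x becomes (x - min) / (max - min), which maps a feature linearly onto [0, 1] and therefore preserves the relative differences between values. A minimal sketch follows; min_max_normalize is a hypothetical helper, and scikit-learn's MinMaxScaler is the usual library equivalent.

```python
from typing import List


def min_max_normalize(values: List[float]) -> List[float]:
    """Rescale values linearly to [0, 1] using (x - min) / (max - min).

    A constant column (max == min) is mapped to all zeros to avoid dividing by zero.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


if __name__ == "__main__":
    # DistanceFromHome ranges from 1 to 29 in Table 1
    print(min_max_normalize([1, 15, 29]))  # [0.0, 0.5, 1.0]
```

A constant feature such as EmployeeCount (always 1) carries no information after scaling, which is consistent with dropping single-valued features during feature selection.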

3. Feature Selection

Component: Statistical Analysis (ANOVA and Chi-Square tests)

Interaction: Feature selection is critical for model accuracy and performance. Statistical tests like ANOVA
and Chi-Square are conducted to identify significant features that influence employee attrition. These tests
help in understanding the relationship between different variables and employee attrition, ensuring that
only relevant features are used in the model.

4. Model Training

Component: Machine Learning Algorithms (Random Forest)

Interaction: Various machine learning algorithms are trained on the preprocessed dataset. Each algorithm
interacts with the data to learn patterns and relationships that can predict employee attrition. The key
algorithms used are:

Logistic Regression: A statistical model that predicts the probability of a binary outcome.

Random Forest: An ensemble learning method using multiple decision trees to improve prediction
accuracy.

5. Evaluation

Component: Model Performance Metrics (ROC Curve, Cross-Validation)

Interaction: Once the models are trained, their performance is evaluated using various metrics. The ROC
(Receiver Operating Characteristic) curve is particularly useful as it illustrates the true positive rate
against the false positive rate at various threshold settings. Cross-validation is used to ensure that the
models generalize well to unseen data, providing a reliable measure of their predictive power.
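The section does not say how many folds were used or whether they were stratified. As an assumption, the sketch below shows plain k-fold cross-validation on row indices: a model would be trained on the training indices and scored on the validation indices of every split, and the k scores averaged. k_fold_indices is a hypothetical name; in scikit-learn, KFold or StratifiedKFold combined with cross_val_score performs the same procedure.

```python
from typing import List, Tuple


def k_fold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split row indices 0..n_samples-1 into k contiguous folds.

    Each fold is used once as the validation set while the other k-1 folds
    form the training set. The first n_samples % k folds receive one extra
    row, which is how scikit-learn's KFold sizes folds when shuffle=False.
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    splits = []
    start = 0
    for size in fold_sizes:
        stop = start + size
        validation = list(range(start, stop))
        training = list(range(0, start)) + list(range(stop, n_samples))
        splits.append((training, validation))
        start = stop
    return splits


if __name__ == "__main__":
    for training, validation in k_fold_indices(10, 3):
        print(len(training), len(validation))  # 6 4, then 7 3, then 7 3
```

With the 1,470 rows of this dataset and k = 5, each validation fold would hold 294 rows.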

6. Data Visualization

Component: Graphs and Charts

Interaction: Visualizations are crucial for interpreting the results of the analysis. They include:

Attrition Rate Visualization: Graphs showing trends in employee attrition across different features such
as age, job role, and satisfaction level.

Model Performance Comparison: Charts comparing the ROC curves of different models, helping in
identifying the best-performing model.

7. Insights and Actionable Recommendations

Component: Analytical Insights

Interaction: The final component involves deriving actionable insights from the data analysis and model
predictions. These insights help HR departments in formulating strategies to reduce employee attrition by
addressing the identified factors contributing to turnover.

CHAPTER 6
IMPLEMENTATION
6.1 OVERVIEW:

The implementation of this final year project involves several key stages, using Python for data analysis
and Power BI for visualization. The project aims to analyze and predict employee attrition
using the IBM HR Analytics Attrition Dataset, providing actionable insights through interactive dashboards.
The steps in the implementation process include data collection, data cleaning, statistical analysis,
predictive modeling, and visualization. Below is a detailed description of each stage.

Dataset Description:

The IBM HR Analytics Attrition Dataset contains the numerical and categorical features described
below:

Table 1: Numerical features used in the employee attrition analysis model.

| Feature Name | Feature Description | Min Value | Max Value | Std. Dev. |
|---|---|---|---|---|
| Age | Age of the employee | 18 | 60 | 9.13 |
| DailyRate | Billing cost of an individual's services for a single day | 102 | 1499 | 403.50 |
| DistanceFromHome | Distance between the company and the employee's home | 1 | 29 | 8.10 |
| Education | Education qualification of the employee | 1 | 5 | 1.02 |
| EmployeeCount | Count of employees | 1 | 1 | 0.0 |
| EmployeeNumber | Unique number assigned to each current and former employee | 1 | 2068 | 602.02 |
| EnvironmentSatisfaction | Employee's feelings about the work environment and organizational culture | 1 | 4 | 1.09 |
| HourlyRate | Amount paid to the employee for every hour worked | 30 | 100 | 20.32 |
| JobInvolvement | Degree to which the job is central to the employee's identity | 1 | 4 | 0.71 |
| JobLevel | Category of authority of the job in the organization | 1 | 5 | 1.10 |
| JobSatisfaction | Employee's satisfaction with the job, including perceived job stability | 1 | 4 | 1.10 |
| MonthlyIncome | Gross income the employee earns in one month | 1009 | 19999 | 4707.95 |
| MonthlyRate | Rate paid for the normal monthly working hours of a full-time worker | 2094 | 26999 | 7117.78 |
| NumCompaniesWorked | Number of other companies the employee previously worked for | 0 | 9 | 2.49 |
| PercentSalaryHike | Salary increase of the employee, in percent | 11 | 25 | 3.65 |
| PerformanceRating | Rating that gauges and compares the employee's performance | 3 | 4 | 0.36 |
| RelationshipSatisfaction | Satisfaction with the employer-employee relationship | 1 | 4 | 1.08 |

Table 2: Categorical features used in the employee attrition analysis model.

| Feature Name | Feature Description | Number of Categorical Values |
|---|---|---|
| Attrition | Gradual but deliberate reduction in staff numbers that occurs as employees retire or resign. Target variable (0=No, 1=Yes) | 2 |
| BusinessTravel | Travel undertaken for work or business purposes, as opposed to other types of travel (1=No Travel, 2=Travel Frequently, 3=Travel Rarely) | 3 |
| Department | The three departments that contribute to the company's overall mission (1=HR, 2=R&D, 3=Sales) | 3 |
| EducationField | Education field of the employee (1=HR, 2=Life Sciences, 3=Marketing, 4=Medical Sciences, 5=Other, 6=Technical) | 6 |
| Gender | Gender of the employee (1=Female, 2=Male) | 2 |
| JobRole | Specific activities or work that the employee performs (1=Healthcare Representative, 2=HR, 3=Laboratory Technician, 4=Manager, 5=Manufacturing Director, 6=Research Director, 7=Research Scientist, 8=Sales Executive, 9=Sales Representative) | 9 |
| MaritalStatus | Marital status of the employee (1=Divorced, 2=Married, 3=Single) | 3 |
| Over18 | Whether the employee is over 18 (1=Yes, 2=No) | 2 |
| OverTime | Whether the employee works overtime (1=No, 2=Yes) | 2 |

CHAPTER 7
DATA EXPLORATION AND PROCESSING

COMPUTE SIZE:

As a first step, we briefly examine the dataset's size and structure by computing its dimensions.

DATASET:

Fig 7.1: Compute Size of Dataset.

The code reveals that the "employee_data" DataFrame contains 1,470 rows and 35
columns, providing a quick overview of its size and structure.
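The code behind Fig 7.1 appears only as an image. In pandas, the size is normally read from employee_data.shape after pd.read_csv(). As a dependency-free sketch (an assumption, not the author's code), the same (rows, columns) pair can be computed with Python's csv module; csv_shape and the toy sample below are illustrative only.

```python
import csv
import io
from typing import Iterable, Tuple


def csv_shape(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (rows, columns) of CSV text, not counting the header row.

    This is the same pair that DataFrame.shape reports after pd.read_csv();
    blank lines are skipped, as read_csv does by default.
    """
    reader = csv.reader(lines)
    header = next(reader)
    n_rows = sum(1 for row in reader if row)
    return n_rows, len(header)


if __name__ == "__main__":
    # Toy three-column sample, not the full dataset file
    sample = io.StringIO("Age,Attrition,BusinessTravel\n41,Yes,Travel_Rarely\n49,No,Travel_Frequently\n")
    print(csv_shape(sample))  # (2, 3)
```

Applied to the dataset file, the result should match the 1,470 rows and 35 columns reported above.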

DAX FUNCTION USED

CHAPTER 8
DATA VISUALIZATION

By analyzing employee data, we can identify factors that contribute to employee attrition, such
as job satisfaction, compensation, and work-life balance. This information can be used to develop
strategies to retain top talent and reduce turnover rates. HR analytics can also help identify
high-performing employees by analyzing performance metrics such as productivity, quality, and
customer satisfaction, which supports efforts to improve overall organizational performance.

8.1 DASHBOARD VISUALS:

Fig 8.1: HR Attrition Dashboard

8.2 DASHBOARD: ATTRITION PAGE

Fig 8.2: HR Attrition Page

8.3 VISUALIZING THE EMPLOYEE ATTRITION RATE

Fig 8.3: Employee Attrition Rate


Inference:
✓ The employee attrition rate of this organization is 16.12%.
✓ According to experts in the field of Human Resources, an attrition rate of 4% to 6% is considered
normal for an organization.
✓ This means the attrition rate of this organization is at a dangerous level.
✓ Therefore, the organization should take measures to reduce the attrition rate.
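The 16.12% figure is the share of rows whose Attrition label is 'Yes'. With the 1,470 employees reported in Chapter 7, it corresponds to about 237 leavers (237 / 1,470 ≈ 16.12%). A minimal standard-library sketch of the calculation follows; attrition_rate is a hypothetical helper, and in pandas the same proportion comes from df['Attrition'].value_counts(normalize=True).

```python
from typing import List


def attrition_rate(labels: List[str]) -> float:
    """Percentage of employees whose Attrition label is 'Yes'."""
    return 100 * sum(1 for label in labels if label == "Yes") / len(labels)


if __name__ == "__main__":
    # 237 leavers out of 1,470 employees (derived from the reported rate)
    labels = ["Yes"] * 237 + ["No"] * 1233
    print(round(attrition_rate(labels), 2))  # 16.12
```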

8.4 ANALYZING EMPLOYEE ATTRITION BY GENDER

Fig 8.4: Employee Attrition by Gender

Inference:

✓ Male employees make up a larger proportion of the organization than female employees, by more
than 20%.

✓ Male employees leave the organization more often than female employees.

8.5 ANALYZING EMPLOYEE ATTRITION BY AGE.

Fig 8.5: Employee Attrition by age

Inference:

✓ Most of the employees are between 30 and 40 years old.
✓ There is a clear trend: as age increases, attrition decreases.
✓ The boxplot also shows that the median age of employees who left the organization is lower
than that of employees who are still working in the organization.
✓ Younger employees leave the company more often than older employees.

8.6 ANALYZING EMPLOYEE ATTRITION BY BUSINESS TRAVEL.

Fig 8.6: Employee Attrition by Business Travel

Inference:

✓ Most of the employees in the organization travel rarely.
✓ Attrition is highest among employees who travel frequently.
✓ Attrition is lowest among employees who do not travel.

8.7 ANALYZING EMPLOYEE ATTRITION BY DEPARTMENT

Fig 8.7: Employee Attrition by Department

Inference:

✓ Most of the employees are in the Research & Development department.
✓ Attrition is highest in the Sales department.
✓ The attrition rate in the Human Resources department is also very high.
✓ Although Research & Development has the most employees, it has the lowest attrition of all the
departments.
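Comparisons such as "lowest attrition of all the departments" are about rates, that is, leavers divided by headcount within each category, not raw counts. A large department can therefore have as many leavers as a small one and still show a lower rate. The sketch below (attrition_rate_by is a hypothetical helper, and the data are toy values, not taken from the dataset) computes per-category rates; in pandas, pd.crosstab(df['Department'], df['Attrition'], normalize='index') gives the same proportions as fractions.

```python
from collections import defaultdict
from typing import Dict, List


def attrition_rate_by(groups: List[str], labels: List[str]) -> Dict[str, float]:
    """Attrition rate (%) within each category of a feature such as Department."""
    totals: Dict[str, int] = defaultdict(int)
    leavers: Dict[str, int] = defaultdict(int)
    for group, label in zip(groups, labels):
        totals[group] += 1
        if label == "Yes":
            leavers[group] += 1
    return {group: 100 * leavers[group] / totals[group] for group in totals}


if __name__ == "__main__":
    # Toy data: both departments lose one employee, but R&D is larger
    departments = ["Sales", "Sales", "R&D", "R&D", "R&D", "R&D"]
    attrition = ["Yes", "No", "No", "No", "Yes", "No"]
    print(attrition_rate_by(departments, attrition))  # {'Sales': 50.0, 'R&D': 25.0}
```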

8.8 ANALYZING EMPLOYEE ATTRITION BY DAILY RATE

Fig 8.8: Employee Attrition by Daily Rate

Inference:

✓ The numbers of employees with an average DailyRate and with a high DailyRate are approximately
equal.
✓ However, the attrition rate is much higher for employees with an average DailyRate than for
employees with a high DailyRate.
✓ The attrition rate is also high for employees with a low DailyRate.
✓ Most of the employees who leave the organization do not have a high DailyRate.
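The report does not state the cut-off points for the low, average and high DailyRate bands. One common choice, assumed here, is three equal-width bands over the range in Table 1 (102 to 1,499). The sketch below (bin_equal_width is a hypothetical helper) assigns the bands; pd.cut(df['DailyRate'], bins=3, labels=['Low', 'Average', 'High']) is the pandas counterpart, although pd.cut places a value lying exactly on an inner boundary in the lower band because its intervals are closed on the right by default.

```python
from typing import List


def bin_equal_width(values: List[float], lo: float, hi: float, labels: List[str]) -> List[str]:
    """Assign each value to one of len(labels) equal-width bands covering [lo, hi].

    A value on an inner boundary goes to the upper band; values outside
    [lo, hi) are clamped into the first or last band.
    """
    width = (hi - lo) / len(labels)
    bands = []
    for value in values:
        index = int((value - lo) // width)
        index = min(max(index, 0), len(labels) - 1)
        bands.append(labels[index])
    return bands


if __name__ == "__main__":
    # Range from Table 1 (DailyRate min 102, max 1499); equal-width bands are an assumption
    print(bin_equal_width([102, 800, 1499], 102, 1499, ["Low", "Average", "High"]))
    # ['Low', 'Average', 'High']
```

Once each employee has a band label, attrition rates can be compared across bands just like any categorical feature.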

8.9 ANALYZING EMPLOYEE ATTRITION BY DISTANCE FROM HOME.

Fig 8.9: Employee Attrition by Distance from Home


Inference:

✓ The organization has employees living both close to and far from the office.
✓ DistanceFromHome does not follow a consistent trend in attrition rate.
✓ Employees living close to the office leave more often than employees living far from it.

8.10 ANALYZING EMPLOYEE ATTRITION BY EDUCATION

Fig 8.10: Employee Attrition by Education

Inference:

✓ Most of the employees in the organization hold a Bachelor's or Master's degree as their education
qualification.

✓ Very few employees in the organization hold a Doctorate as their education qualification.
✓ The attrition rate decreases as the education qualification increases.

8.11 ANALYZING EMPLOYEE ATTRITION BY EDUCATION FIELD:

Fig 8.11: Employee Attrition by Education Field

Inference:

✓ Most of the employees come from the Life Sciences or Medical education fields.
✓ Very few employees come from the Human Resources education field.
✓ Education fields such as Human Resources, Marketing, and Technical have very high attrition
rates.

✓ This may be due to workload, since there are very few employees in these education fields
compared with the fields that have lower attrition rates.

8.12 ANALYZING EMPLOYEE ATTRITION BY ENVIRONMENT SATISFACTION

Fig 8.12: Employee Attrition by Environment Satisfaction

Inference:

✓ Most of the employees have rated the organization's environment satisfaction as High or Very
High.
✓ Even though environment satisfaction is high, attrition in this environment is still very high.
✓ The attrition rate increases as the level of environment satisfaction increases.

8.13 ANALYZING EMPLOYEE ATTRITION BY JOB ROLES

Fig 8.13: Employee Attrition by Job Roles

Inference:

✓ Most employees in this organization work as Sales Executives, Research Scientists, or Laboratory
Technicians.
✓ The highest attrition rates are among Research Directors, Sales Executives, and Research
Scientists.

8.14 ANALYZING EMPLOYEE ATTRITION BY JOB LEVEL.

Fig 8.14: Employee Attrition by Job Level

Inference:

✓ Most of the employees in the organization are at Entry Level or Junior Level.
✓ Attrition is highest at the Entry Level.
✓ As the job level increases, the attrition rate decreases.

8.15 ANALYZING EMPLOYEE ATTRITION BY JOB SATISFACTION.

Fig 8.15: Employee Attrition by Job Satisfaction

Inference:

✓ Most of the employees have rated their job satisfaction as High or Very High.
✓ Employees who rated their job satisfaction as Low leave the organization most often.
✓ All job satisfaction categories have a high attrition rate.

8.16 ANALYZING EMPLOYEE ATTRITION BY MARITAL STATUS

Fig 8.16: Employee Attrition by Marital Status

Inference:

✓ Most of the employees in the organization are married.
✓ The attrition rate is very high for employees who are divorced.
✓ The attrition rate is low for employees who are single.

8.17 ANALYZING EMPLOYEE ATTRITION BY MONTHLY INCOME

Fig 8.17: Employee Attrition by Monthly Income

Inference:

✓ Most of the employees in the organization earn less than 10,000 per month.

✓ The average monthly income of employees who have left is lower than that of employees who
are still working.

✓ As monthly income increases, attrition decreases.

8.18 ANALYZING EMPLOYEE ATTRITION BY WORK EXPERIENCE

Fig 8.18: Employee Attrition by Work Experience

Inference:

✓ Most of the employees have worked for fewer than 2 companies.
✓ The attrition rate is high for employees who have worked for fewer than 5 companies.

8.19 ANALYZING EMPLOYEE ATTRITION BY OVERTIME

Fig 8.19: Employee Attrition by Overtime

Inference:

✓ Most of the employees do not work overtime.
✓ The OverTime feature has a strong class imbalance, so we cannot draw meaningful insights from
it.

8.20 ANALYZING EMPLOYEE ATTRITION BY SALARY HIKE

Fig 8.20: Employee Attrition by Salary Hike


Inference:

✓ Very few employees receive a high percentage salary hike.
✓ As the percentage salary hike increases, the attrition rate decreases.

8.21 ANALYZING EMPLOYEE ATTRITION BY PERFORMANCE RATING

Fig 8.21: Employee Attrition by Performance Rating

Inference:

✓ Most of the employees have an Excellent performance rating.
✓ Both categories of this feature have the same attrition rate.
✓ Therefore, we cannot draw any meaningful insights from this feature.

8.22 ANALYZING EMPLOYEE ATTRITION BY RELATIONSHIP SATISFACTION

Fig 8.22: Employee Attrition by Relationship Satisfaction

Inference:

✓ Most of the employees have High or Very High relationship satisfaction.
✓ Even though relationship satisfaction is high, the attrition rate is high.
✓ All the categories of this feature have a high attrition rate.

8.23 ANALYZING EMPLOYEE ATTRITION BY WORK LIFE BALANCE.

Fig 8.23: Employee Attrition by Work Life Balance

Inference:

✓ More than 60% of employees have a Better work-life balance rating.
✓ Employees with a Bad work-life balance have a very high attrition rate.
✓ The other categories also have a high attrition rate.

8.24 ANALYZING EMPLOYEE ATTRITION BY TOTAL WORKING EXPERIENCE

Fig 8.24: Employee Attrition by Total Work Experience

Inference:

✓ Most of the employees have 5 to 10 years of total working experience, but their attrition rate is
also very high.

✓ Employees with less than 10 years of working experience have a high attrition rate.

✓ Employees with more than 10 years of working experience have a lower attrition rate.

8.25 ANALYZING EMPLOYEE ATTRITION BY YEARS AT COMPANY.

Fig 8.25: Employee Attrition by Years at Company

Inference:

✓ Most employees have worked for 2 to 10 years in the organization.
✓ Very few employees have worked for less than 1 year or more than 10 years.
✓ Employees who have worked for 2 to 5 years have a very high attrition rate.
✓ Employees who have worked for more than 10 years have a low attrition rate.

8.26 ANALYZING EMPLOYEE ATTRITION BY YEARS IN CURRENT ROLE

Fig 8.26: Employee Attrition by Years in Current Role

Inference:

✓ Most employees have worked in the same role for 2 to 10 years.
✓ Very few employees have worked in the same role for less than 1 year or more than 10 years.
✓ Employees who have worked in the same role for up to 2 years have a very high attrition rate.
✓ Employees who have worked in the same role for more than 10 years have a low attrition rate.

8.27 ANALYZING EMPLOYEE ATTRITION BY YEARS SINCE LAST PROMOTION

Fig 8.27: Employee Attrition by Years Since Last Promotion

Inference:

✓ Almost 36% of employees have not been promoted for 2 to 5 years.
✓ Almost 8% of employees have not been promoted for more than 10 years.
✓ All categories of this feature have a high attrition rate, especially employees who have not been
promoted for more than 5 years.

8.28 ANALYZING EMPLOYEE ATTRITION BY YEARS WITH CURRENT MANAGER

Fig 8.28: Employee Attrition by Years with Current Manager

Inference:

✓ Almost 51% of employees have worked with the same manager for 2 to 5 years.
✓ Almost 38% of employees have worked with the same manager for 5 to 10 years.
✓ Employees who have worked with the same manager for more than 10 years have a very low
attrition rate.
✓ The other categories have a high attrition rate.

CHAPTER 9

STATISTICAL ANALYSIS

Statistical analysis plays a crucial role in HR analytics by helping organizations make informed
decisions about their human resources and workforce management. It enables evidence-based
decision-making, enhances workforce planning strategies, and fosters a deeper understanding of the
organization's human capital dynamics.

1. Perform ANOVA Test: The ANOVA test is used to analyze the impact of different numerical features on
a categorical response feature.
Inference:

The following features show a strong association with attrition, as indicated by their high F-
scores and very low p-values.
1. Age
2. DailyRate
3. HourlyRate
4. MonthlyIncome
5. MonthlyRate
6. NumCompaniesWorked
7. PercentSalaryHike
8. TotalWorkingYears
9. TrainingTimesLastYear
10. YearsAtCompany
11. YearsWithCurrManager

The following features do not show a significant relationship with attrition, as indicated by their
moderate F-scores and very high p-values.
1. DistanceFromHome
2. StockOptionLevel
3. YearsInCurrentRole
4. YearsSinceLastPromotion
It is important for the organization to pay attention to the identified significant features and consider
them when implementing strategies to reduce attrition rates.
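For one numerical feature, the ANOVA F-score compares the variation between the group means (Attrition = Yes vs. No) with the variation inside each group: F = [SS_between / (k - 1)] / [SS_within / (N - k)], where k is the number of groups and N the number of employees. A large F, with a correspondingly small p-value, means the group means differ by more than within-group noise would explain. The report does not say which library produced its F-scores, so the sketch below is an assumption that computes F only; anova_f is a hypothetical name. scipy.stats.f_oneway and sklearn.feature_selection.f_classif both return the F-statistic together with its p-value.

```python
from statistics import fmean
from typing import List


def anova_f(*groups: List[float]) -> float:
    """One-way ANOVA F-statistic for one numerical feature split into groups,
    e.g. the Age values of employees with Attrition = 'Yes' and with 'No'."""
    all_values = [v for group in groups for v in group]
    grand_mean = fmean(all_values)
    means = [fmean(group) for group in groups]
    k, n = len(groups), len(all_values)
    # Between-group variation: how far each group mean lies from the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group variation: spread of values around their own group mean
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


if __name__ == "__main__":
    print(anova_f([1, 2, 3], [4, 5, 6]))  # 13.5
```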

2. Perform Chi-Square Test: The Chi-Square test is used to analyze the association between different
categorical features and attrition.

Inference:

The following features showed statistically significant associations with employee attrition:
1. Department
2. EducationField
3. EnvironmentSatisfaction
4. JobInvolvement
5. JobLevel
6. JobRole
7. JobSatisfaction
8. MaritalStatus
9. OverTime
10. WorkLifeBalance

The following features did not show statistically significant associations with attrition.

1. Gender
2. Education
3. PerformanceRating
4. RelationshipSatisfaction

It is important for the organization to pay attention to the identified significant features and consider
them when implementing strategies to reduce attrition rates.
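The Chi-Square test compares the observed counts in a contingency table (categories of a feature against Attrition No/Yes) with the counts expected if the feature and attrition were independent, where expected = row total x column total / grand total, and sums (observed - expected)^2 / expected over all cells. The sketch below, using toy counts rather than the dataset, computes the statistic and its degrees of freedom only; chi_square and degrees_of_freedom are hypothetical helpers. The p-value comes from the chi-square distribution, for example via scipy.stats.chi2_contingency, which by default applies Yates' continuity correction to 2 x 2 tables (pass correction=False to match this plain statistic).

```python
from typing import List


def chi_square(table: List[List[int]]) -> float:
    """Pearson chi-square statistic for a contingency table of observed counts.

    Rows are the categories of one feature (e.g. OverTime = No / Yes) and
    columns are Attrition = No / Yes.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the feature and attrition were independent
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


def degrees_of_freedom(table: List[List[int]]) -> int:
    """(rows - 1) * (columns - 1), needed to turn the statistic into a p-value."""
    return (len(table) - 1) * (len(table[0]) - 1)


if __name__ == "__main__":
    toy = [[10, 20], [20, 10]]  # toy counts, not from the dataset
    print(round(chi_square(toy), 4), degrees_of_freedom(toy))  # 6.6667 1
```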

CHAPTER 10
DATA MODELING

Data modeling plays a significant role in HR analytics when integrating machine learning
techniques. Machine learning algorithms leverage data models to make predictions, classifications,
and recommendations based on patterns and relationships found in the HR data.

Data splitting to train and test:

The dataset was split into 70% for training and 30% for testing, with Attrition as the target
feature.
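A 70/30 split of the 1,470 rows leaves 1,029 rows for training and 441 for testing, which is consistent with the test accuracy reported later (0.8526 ≈ 376 / 441). The standard-library sketch below illustrates the idea; train_test_split_indices is a hypothetical name, and because it shuffles with Python's random module it will not select the same rows as scikit-learn's train_test_split(..., random_state=0).

```python
import math
import random
from typing import List, Tuple


def train_test_split_indices(n_samples: int, test_size: float = 0.3,
                             seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle row indices and hold out a fraction of them for testing.

    The number of test rows is rounded up, as scikit-learn's train_test_split does.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = math.ceil(n_samples * test_size)
    return indices[n_test:], indices[:n_test]


if __name__ == "__main__":
    train_idx, test_idx = train_test_split_indices(1470)
    print(len(train_idx), len(test_idx))  # 1029 441
```

Because only about 16% of the rows are 'Yes', passing stratify=y to scikit-learn's train_test_split would keep that proportion equal in both parts.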
Fitting the different machine learning models:
Random Forest Model.
Importing Libraries:

● numpy: A library for numerical operations.
● pandas: A library for data manipulation and analysis.
● seaborn: A library for statistical data visualization.
● matplotlib.pyplot: A library for creating static, animated, and interactive visualizations.

Lists all files in the input directory to understand the available datasets.

Loads the dataset into a DataFrame and displays the first few rows.

56
Provides the shape of the dataset, statistical summary of numerical columns, and checks for missing
values.

Out [7]:

Age 0
Attrition 0
BusinessTravel 0
DailyRate 0
Department 0
DistanceFromHome 0
Education 0
EducationField 0
EmployeeCount 0
EmployeeNumber 0
EnvironmentSatisfaction 0
Gender 0
HourlyRate 0
JobInvolvement 0
JobLevel 0
JobRole 0
JobSatisfaction 0
MaritalStatus 0
MonthlyIncome 0
MonthlyRate 0
NumCompaniesWorked 0
Over18 0
OverTime 0
PercentSalaryHike 0
PerformanceRating 0
RelationshipSatisfaction 0
StandardHours 0
StockOptionLevel 0
TotalWorkingYears 0
TrainingTimesLastYear 0
WorkLifeBalance 0
YearsAtCompany 0
YearsInCurrentRole 0
YearsSinceLastPromotion 0
YearsWithCurrManager 0

dtype: int64

In [7]:

attrition_count = pd.DataFrame(df['Attrition'].value_counts())

attrition_count

In [16]:

sns.barplot(x = 'BusinessTravel', y = 'Yes', data = df)

In [17]:

plt.figure(figsize = (10,6))

# numeric_only=True limits the correlation matrix to numeric columns;
# pandas 2.0 and later raise an error on text columns without it
sns.heatmap(df.corr(numeric_only=True))

df = df.drop(['Age', 'JobLevel'], axis = 1)

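Data Preprocessing
Converting String columns into integers

In [19]:

import pandas as pd
from sklearn.preprocessing import LabelEncoder

for column in df.columns:
    # Leave numeric columns unchanged. is_numeric_dtype detects integer and
    # float columns alike; comparing a dtype with np.number via == does not
    # reliably do so.
    if pd.api.types.is_numeric_dtype(df[column]):
        continue
    # Replace each text column (e.g. BusinessTravel, Department) with integer codes
    df[column] = LabelEncoder().fit_transform(df[column])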

Model Building
Splits the data into training and testing sets, trains a RandomForestClassifier, and evaluates its performance on the
training data.

In [20]:

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(n_estimators = 10, criterion = 'entropy', random_state = 0)

In [21]:

x = df.drop(['Yes'], axis = 1)
y = df['Yes']

In [22]:

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.3, random_state = 0)

In [23]:

x_train.head()

In [24]:

rf.fit(x_train, y_train)

Out[24]:

RandomForestClassifier(criterion='entropy', n_estimators=10, random_state=0)

In [25]:

rf.score(x_train, y_train)

Out[25]: 0.9815354713313897

Predicting for x_test
In [26]:

pred = rf.predict(x_test)

In [27]:

from sklearn.metrics import accuracy_score

accuracy_score(y_test, pred)

Out[27]:

0.8526077097505669

Makes predictions on the test set and calculates the accuracy of the model, which is 85.26%.
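Two reference points help put these scores in context. The training accuracy reported above (0.9815) is well above the test accuracy (0.8526), which suggests the forest fits its training rows much more closely than unseen rows. And because only 16.12% of employees left, a trivial model that always predicts 'No' would already be right for about 83.88% of the full dataset (roughly 1,233 of 1,470 rows), so accuracy alone overstates how well leavers are detected; the confusion matrix and ROC curve listed in the methodology give a fuller picture. The helper below, a hypothetical name not used in the report, computes that baseline; on the actual split, scikit-learn's DummyClassifier(strategy='most_frequent') scored on y_test would give the comparable test-set figure.

```python
from collections import Counter
from typing import Hashable, List


def majority_baseline_accuracy(y: List[Hashable]) -> float:
    """Accuracy of a trivial model that always predicts the most frequent class."""
    most_common_count = Counter(y).most_common(1)[0][1]
    return most_common_count / len(y)


if __name__ == "__main__":
    # About 237 'Yes' and 1,233 'No' labels, derived from the 16.12% rate on 1,470 rows
    labels = ["Yes"] * 237 + ["No"] * 1233
    print(round(100 * majority_baseline_accuracy(labels), 2))  # 83.88
```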

CHAPTER 11

CONCLUSION

In conclusion, we embarked on a comprehensive analysis of the IBM HR Analytics Attrition
Dataset, from data loading to model evaluation. By implementing and evaluating various machine
learning algorithms, we gained insights into which models are effective for predicting employee
attrition. The results and visualizations generated throughout the process provide valuable
information for decision-makers and HR professionals seeking to understand and mitigate employee
attrition within the organization. This project showcases the power of data analysis and machine
learning in addressing real-world business challenges.
