CV-Based Personality Prediction Using AI
All content following this page was uploaded by Nongmeikapam Thoiba Singh on 06 October 2023.
Authorized licensed use limited to: NATIONAL INSTITUTE OF TECHNOLOGY SILCHAR. Downloaded on October 06,2023 at 17:15:38 UTC from IEEE Xplore. Restrictions apply.

Abstract - Organisations want to recruit expert candidates for their development, yet their primary concern is choosing the right candidate for a particular position. Traditionally, they went through candidates' CVs or resumes manually and recruited from them. Nowadays this is no longer feasible: they receive so many applications every year that it is difficult to go through all of them and recruit the best candidates. Today's corporate environment focuses not only on a potential employee's abilities but also on their overall personality, and personality helps one achieve success in professional as well as personal life. The recruiter or the HR department should therefore know the personality traits an applicant has. With a sharp increase in job seekers and a decrease in the number of positions, it is challenging to shortlist the best-fit candidate for a suitable job by examining CVs manually. In this paper, the proposed system helps recruit the right candidates by parsing the information in CVs and resumes and by conducting tests to predict the applicant's personality. Logistic Regression is used to build the model that processes this information. The model thus helps determine the personality and other details of the applicants, such as skills and experience. Using this framework, organisations can find expert candidates and make the recruitment office's work simpler. This paper examines different AI/ML approaches for efficiently predicting personality through CV analysis.

Keywords - Personality Prediction; CV; Myers-Briggs Type Indicator (MBTI); Machine Learning; Random Forest.

I. INTRODUCTION

One of the most significant parts of the recruitment process is examining the factors that decide whether an individual is an ideal candidate for a specific position. A person's capabilities can be judged by determining whether they can effectively influence and communicate with others, which is crucial for the development and growth of an organisation [1].

Whenever a company has a vacancy and posts the opening on various job portals, in advertisements, and through job consultancies, it may receive a large number of applications. This makes it difficult for the HR department to sift through the CVs, address each applicant, and find the most qualified candidate using traditional techniques such as technical tests, interviews, and group discussions [3]. Therefore, in the first round itself, candidates are eliminated on the basis of factors such as their suitability for the position, their abilities, an improper CV, and their skills [2]. To reduce the difficulty of the hiring process, we propose a novel method that makes the selection and shortlisting of candidates easier for organisations: personality prediction based on CV analysis.

Personality analysis and comprehension are therefore the most important considerations. Our project's primary objective is to create a system that can conduct reasonable analysis and make fair decisions when selecting candidates, by making personality predictions based on an individual's MBTI test result [4]. Many job seekers apply when a business publishes explicit job requirements and information. Consequently, applicants first fill in their online CV before taking the test, which in our case is the MBTI test. We learn a person's personality from their scores in each domain, i.e., serious, extraverted, lively, dependable, and responsible. We used a simple resume parser to extract data from the CV, including name, age, and gender. After extracting the information from the CV and the test score, we generate the individual's score [5]. Finally, once the score is obtained, the CV is analysed.

II. LITERATURE SURVEY

Aseel Kmail et al. [4] observed that various automatic recruitment systems on the market perform well. However, they occasionally miss important information, particularly those that rely on keyword pattern matching. The authors therefore introduced an automatic recruitment system that highlights relevant concepts that were not initially recognised by semantic resources.

The system proposed by Aishwarya Popat Bondre [5] identifies the most qualified candidate based on an eligibility score. The score is computed after the candidate takes an aptitude test and uploads a CV or resume. TF-IDF is used to construct the model. Candidates' characteristics can be evaluated from their scores, and a graphical representation of the scores helps evaluate their personalities and CVs accurately.

Atharva Kulkarni [6] developed a system for predicting candidate personalities using Natural Language Processing and a variety of machine learning algorithms. In terms of accuracy, Random Forest outperformed the other algorithms tested: KNN, Logistic Regression, Support Vector Machine, and Naive Bayes.

Md. Tanzim Reza [7] used natural language processing and machine learning to examine a resume. The resume was converted to HTML, the HTML code was reverse-engineered to finalise the segments, and qualifying features were extracted. The model reads a CV, pulls out relevant data, and organises it into categories. The resumes were classified into categories using multivariate logistic regression. However, the dataset was rather small.

Shruti Maheshwari [8] created a system using the Logistic Regression machine learning technique. Psychometric testing and the OCEAN model are used to determine the applicant's emotional aptitude and personality, respectively. Candidate information is protected by a password-encryption mechanism, and the passwords are known only to the necessary parties. A dashboard and SMS notifications let candidates learn whether they have been selected for an interview.

Srujami Palkar [9] developed a web-based tool for personality evaluation and CV analysis. The system analyses resumes through natural language processing and predicts personalities through machine learning. The candidates it filters out are useful for predicting their skills and outlook.

To solve the challenge of matching people with jobs, Nikolaos D. Almalis et al. [10] devised a content-based recommendation system that adapts and extends the Minkowski distance. The proposed algorithm is known as FoDRA (Four Dimensions Recommendation Algorithm). It uses a structured form of both the job description and the candidate's profile, the latter derived from a content analysis of the former, to evaluate more accurately the applicant's potential for success in the open position. An experiment was performed to determine FoDRA's reliability and performance. The authors' main results show that FoDRA works and that it opens up new possibilities for research on job recommender systems (JRSs).

Firoz Ahmed et al. [11] proposed an automated hiring process combined with psychometric testing, in which the processing of resumes and job applications is automated. Instead of the conventional job search and application process, the authors created a social networking website for job seekers and employers. The site automatically forwards resumes to the appropriate companies or organisations by matching the required criteria.

III. METHODOLOGY

According to numerous scientific studies on "Personality as a Prognosticator of Future Job Success", personality is the most reliable predictor of job performance. Employers can use it to judge a candidate's professional demeanour and to determine whether the person will fit in well with the company's core values and mission. A personality test gives recruiters specific information that greatly improves candidate selection, so it surpasses more conventional information-gathering methods [6]. The Myers-Briggs Type Indicator (MBTI) is an introspective self-report questionnaire used in personality typology. It categorises people according to their innate tendencies in how they take in information and use it to guide their decisions. The test measures four dichotomies: extraversion/introversion, sensing/intuition, thinking/feeling, and judging/perceiving. The result is a four-letter type, such as "INTJ" or "ESFP", formed by taking one letter from each pair.

The major goal of this work is to help the HR department select the most qualified candidate for a given job description. It does this by drawing attention to qualities beyond education and experience, such as personality [7]. The project's ultimate goal is to make it easier for the HR department to hire qualified people from among the many applicants for each open position.

Focusing on finding the best possible applicants improves both the quality of work and the likelihood of retaining employees. It is crucial for a corporation to retain its talented employees and reduce turnover. Personality tests allow individuals to be screened more efficiently for personality and skill. They also help estimate a candidate's likelihood of staying in the position and their compatibility with the organisation's culture. Automating the HR phase of the electronic hiring procedure saves time for both the employer and the candidates [8]. The project can be adapted to automate the e-recruitment process as needed. For example, by setting psychological and personality questions for the medical profession, it could be used to predict psychological effects when addressing patients' concerns.
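The four-letter MBTI result described above can be illustrated with a small sketch. The names `DICHOTOMIES`, `compose_type`, and `all_types` are hypothetical and do not come from the authors' code. The sketch takes one pole from each of the four pairs; four binary choices give 2^4 = 16 possible types.

```python
from itertools import product

# The four MBTI dichotomies, in the conventional letter order (illustrative).
DICHOTOMIES = [("I", "E"), ("N", "S"), ("T", "F"), ("J", "P")]


def compose_type(choices):
    """Build a four-letter type from one choice (0 or 1) per dichotomy."""
    if len(choices) != len(DICHOTOMIES):
        raise ValueError("expected one choice per dichotomy")
    return "".join(pair[c] for pair, c in zip(DICHOTOMIES, choices))


def all_types():
    """Enumerate every possible four-letter type (2**4 = 16 combinations)."""
    return ["".join(letters) for letters in product(*DICHOTOMIES)]


if __name__ == "__main__":
    print(compose_type([0, 0, 0, 0]))  # INTJ
    print(compose_type([1, 1, 1, 1]))  # ESFP
    print(len(all_types()))            # 16
```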
Because all processes are automated using machine learning techniques, this work helps the HR department find competent candidates quickly. It also enables applicants to assess their own potential and interests so that they can choose a job with clarity [9]. An organisation's selection of candidates is just as important as a candidate's selection of the appropriate position, and the effectiveness of its candidates determines the organisation's ability to expand.

The proposed work offers a more efficient approach to screening out candidates. It conducts a personality test and decides whether or not each candidate is qualified for the post, using logistic regression and a random forest classifier. The approach assigns each job role a score between 1 and 8 based on the knowledge and skills the role requires. The system's predicted rating policy then allows the HR department to identify suitable candidates quickly [10].

The Big Five, the Rorschach, and the Myers-Briggs Type Indicator are just a few of the many personality tests available. In this research, personality predictions are made using the MBTI test. The process workflow is shown in Fig. 1.

[Figure: process workflow and database (labels: Login and Registration, Aptitude Assignment, Personality Test, Database, Student Profile, Instructor Details, Questions Details, Answers given)]

A. Aptitude Assessment

The aptitude assessment helps reveal the underlying patterns of a candidate's interests and predict the stream the candidate is interested in. Understanding a candidate's inherent aptitude is crucial for an organisation. Candidates take the aptitude test, after which a report is generated that assesses their interests. Based on this report, the Human Resource Manager can place a candidate in the right team and identify the right candidate for a particular job.

• Question Structure
• All the questions are MCQs. Four types of questions are supported by this application, including:
• Normal MCQ questions
• Questions with an image and options
• Questions and options with images

Questions come from four main sections: Science, Commerce, Humanities, and Aptitude. To prepare the question paper, the human resource personnel choose about fifteen questions from each section, so each question paper contains sixty questions. Each question is given a weightage according to the category it falls in, and the weightage is used to evaluate the candidate's assessment [12]. Students must attempt all sixty questions. The information stored in the database is shown in Fig. 2.
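The weighted evaluation described above can be sketched as follows. This is a minimal sketch under stated assumptions: the `(section, weight, correct_option)` question format, the per-question weights, and the helper names are hypothetical. The text does not specify how weightage is combined, so each section's score is taken here as earned weight divided by possible weight. The suggested stream is the section with the highest score.

```python
from collections import defaultdict

# The four sections named in the text; the weights used below are hypothetical.
SECTIONS = ("Science", "Commerce", "Humanities", "Aptitude")


def section_scores(questions, answers):
    """Return each section's weighted score as a fraction of its maximum.

    questions: list of (section, weight, correct_option) tuples (hypothetical format)
    answers:   options chosen by the candidate, in the same order
    """
    if len(answers) != len(questions):
        # The text requires every question to be attempted.
        raise ValueError("all questions must be attempted")
    earned = defaultdict(float)
    possible = defaultdict(float)
    for (section, weight, correct), chosen in zip(questions, answers):
        if section not in SECTIONS:
            raise ValueError(f"unknown section: {section}")
        possible[section] += weight
        if chosen == correct:
            earned[section] += weight
    # earned[s] defaults to 0.0 for sections with no correct answers.
    return {s: earned[s] / possible[s] for s in possible}


def suggested_stream(scores):
    """Suggest the stream whose section has the highest weighted score."""
    return max(scores, key=scores.get)


if __name__ == "__main__":
    paper = [("Science", 2.0, "a"), ("Science", 1.0, "b"),
             ("Commerce", 1.0, "c"), ("Humanities", 1.0, "d"),
             ("Aptitude", 3.0, "a")]
    scores = section_scores(paper, ["a", "c", "c", "a", "b"])
    print(scores)
    print(suggested_stream(scores))  # Commerce
```

A real question paper would hold about fifteen questions per section (sixty in total). The five-question paper above only keeps the example short.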
• Handling Imbalanced Dataset
• Vectorization of Text Data
• Model Creation
• Model Training
• Model Evaluation

C. CV Analysis

Recruiters must analyse CVs carefully so that they do not miss strong prospects or put forward the wrong ones. They must also process all applications quickly so they can get back to clients and move on to the interview and placement stages [14]. Our project's ultimate objective was to analyse the whole resume and locate specific terms within it. With Pyresparser, a straightforward resume parser, we extracted candidate information such as name, email ID, description, and skills from submitted CVs. Pyresparser supports document formats including PDF and DOCX.

IV. RESULTS AND DISCUSSIONS

A. Dataset

This investigation used the 8675-row Myers-Briggs personality type dataset, which is publicly available on Kaggle. Each row has two columns. The first lists the person's MBTI personality type, and the second lists fifty of their social media posts, separated by three pipe characters ("|||") [15]. The data were collected from users of an online forum in two stages. First, the users filled out a questionnaire to determine their MBTI type; second, they engaged in conversation with other forum members. The Python 2D plotting library matplotlib was used to preview the data and to examine how the MBTI personality types were distributed across the sample.

B. Discussions

Fig. 3. Different Personality Types according to MBTI

The representation of MBTI types in Fig. 3 does not reflect their actual distribution in the dataset. It was clear that the dataset needed substantial cleaning so that the proportional representation of each MBTI type would be more accurate.

To understand how the type indicators are distributed throughout the dataset, four categories were constructed. Introversion (I)/extraversion (E) formed the first category, intuition (N)/sensing (S) the second, thinking (T)/feeling (F) the third, and judging (J)/perceiving (P) the fourth, as illustrated in Fig. 4. Each category is thus represented by a single letter, and the four letters together identify one of the 16 personality types described by the MBTI. For example, the INTJ type is indicated when the first category yields I, the second N, the third T, and the fourth J.

In the first category, Introversion (I)/Extraversion (E), introversion is much more common than extraversion. Similarly, in the second category, Intuition (N)/Sensing (S), intuition is far more frequent than sensing. In the third category, Thinking (T)/Feeling (F), feeling is only slightly more common than thinking. In the fourth category, Judging (J)/Perceiving (P), perceiving is more frequent than judging.

After analysing the content of this forum-derived dataset, it became clear that certain words had to be removed. The main issue was that the distribution of MBTI types in the dataset was not consistent with their distribution in the population as a whole. This was traced back to the fact that the data came from a forum where members discussed their personality types exclusively and commonly referred to themselves by their MBTI types. Such explicit mentions could also reduce the model's reliability, since they reveal the label directly in the text. The MBTI type names were therefore removed from the posts using NLTK, after which the distribution of MBTI personality types in the dataset was recomputed. The dataset was further cleaned by removing all URLs and stop words. The text was also lemmatized (converted from inflected forms to their base words) to improve its interpretability. Class imbalance was handled with a random oversampler, which ensured that the classes within each of the four type-indicator dimensions were equally represented.
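The cleaning and balancing steps described above can be sketched without external dependencies, under stated assumptions. The stop-word list is a tiny placeholder for the NLTK resources the paper used, lemmatization is omitted, and all helper names are hypothetical. `type_to_binary` shows one way each four-letter type could become four binary targets, one per dimension. `random_oversample` duplicates randomly chosen minority-class rows until every class matches the largest one.

```python
import random
import re
from collections import Counter

# All 16 four-letter MBTI codes, used to strip self-references from posts.
MBTI_TYPES = [a + b + c + d for a in "IE" for b in "NS" for c in "TF" for d in "JP"]
TYPE_PATTERN = re.compile(r"\b(?:" + "|".join(MBTI_TYPES) + r")s?\b", re.IGNORECASE)
URL_PATTERN = re.compile(r"https?://\S+")
# Tiny placeholder stop-word list (assumption); the paper used NLTK resources.
STOP_WORDS = {"a", "an", "the", "and", "or", "is", "are", "i", "to", "of", "in", "it", "my"}


def clean_posts(raw):
    """Split a '|||'-separated row of posts and return cleaned tokens.

    Lemmatization is deliberately omitted to keep the sketch dependency-free.
    """
    text = " ".join(raw.split("|||")).lower()
    text = URL_PATTERN.sub(" ", text)    # drop links
    text = TYPE_PATTERN.sub(" ", text)   # drop type mentions such as "intj" or "infps"
    tokens = re.findall(r"[a-z']+", text)
    return [t for t in tokens if t not in STOP_WORDS]


def type_to_binary(mbti):
    """Map a type such as 'INTJ' to four 0/1 targets, where I, N, T, J -> 1."""
    return [int(mbti[0] == "I"), int(mbti[1] == "N"),
            int(mbti[2] == "T"), int(mbti[3] == "J")]


def random_oversample(samples, labels, seed=0):
    """Duplicate random minority-class samples until all classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for label, count in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == label]
        for _ in range(target - count):
            out_x.append(rng.choice(pool))
            out_y.append(label)
    return out_x, out_y


if __name__ == "__main__":
    print(clean_posts("I am an INTJ|||see https://example.com now|||INFPs are great"))
    # ['am', 'see', 'now', 'great']
    print(type_to_binary("INTJ"), type_to_binary("ESFP"))
    # [1, 1, 1, 1] [0, 0, 0, 0]
```

Oversampling is applied separately to each of the four binary targets, since each dimension has its own imbalance.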
Fig. 4. Type Indicators in four Dimensions

V. MODEL TRAINING AND TESTING

Models were trained and predictions made with the classification methods described below.

A. Random Forest

Random Forest is a classification method that combines many decision trees, each built on a different subset of the instances. Each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The predictions of the individual trees are then combined, and the class that receives the most votes is output [12].

B. Naive Bayes

Naive Bayes is a family of classifiers based on Bayes' theorem and the assumption that predictors are independent. In their simplest form, Naive Bayes classifiers assume that the presence of a given feature in a class is unrelated to the presence of any other feature in that class [13].

C. Support Vector Machine

A support vector machine (SVM) is a supervised machine learning model for two-class classification problems. The SVM maps training examples to points in space so as to maximise the width of the margin between the two categories. It then assigns new samples to one of the two categories, which makes it a non-probabilistic binary linear classifier. Methods such as Platt scaling nevertheless allow SVMs to be used in a probabilistic classification setting [14].

D. Decision Tree

A decision tree is a flowchart-like classification structure in which each path from the root to a leaf reflects a different classification rule. Each internal node represents a test on a feature. Each leaf node represents a class label (the decision taken after computing all features), and branches represent conjunctions of features that lead to those class labels [15].

E. XGBoost

In supervised learning, XGBoost (eXtreme Gradient Boosting) is a well-known method for classifying large datasets. It builds shallow decision trees sequentially, each new tree correcting the errors of the trees before it. It also applies regularisation to limit overfitting, which helps it produce reliable results [16].

VI. RESULTS

Model performance was measured with accuracy, precision, recall, F1-score, and ROC-AUC. These metrics, reported in Table I, determine the best classification algorithm for predicting each type indicator.

Random Forest is the most effective algorithm for the first category, Introversion (I)/Extraversion (E). Its accuracy of 0.952 exceeds that of all the other algorithms, so it is selected for prediction. Random Forest is also selected for the second category, Intuition (N)/Sensing (S), where its accuracy was 0.993. For the third category, Thinking (T)/Feeling (F), the Support Vector Machine (SVM) is the model of choice because it outperforms the other classifiers (accuracy 0.856). For the fourth category, Judging (J)/Perceiving (P), XGBoost is chosen for its superior performance (accuracy 0.841).
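The Naive Bayes independence assumption described above can be illustrated with a from-scratch multinomial Naive Bayes over token lists. The class name and the add-one (Laplace) smoothing are illustrative choices; the paper does not state which Naive Bayes variant or library it used. Because features are assumed independent given the class, the log-probabilities of the individual words can simply be added.

```python
import math
from collections import Counter


class MultinomialNaiveBayes:
    """Minimal multinomial Naive Bayes over token lists, with add-one smoothing."""

    def fit(self, docs, labels):
        labels = list(labels)
        self.classes = sorted(set(labels))
        n = len(labels)
        # log P(class): relative class frequency in the training data
        self.log_prior = {c: math.log(labels.count(c) / n) for c in self.classes}
        self.word_counts = {c: Counter() for c in self.classes}
        for tokens, c in zip(docs, labels):
            self.word_counts[c].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_joint(self, tokens, c):
        """Unnormalised log posterior: log P(c) + sum of log P(word | c)."""
        v = len(self.vocab)
        score = self.log_prior[c]
        for w in tokens:
            if w in self.vocab:  # words never seen in training are skipped
                score += math.log((self.word_counts[c][w] + 1) / (self.totals[c] + v))
        return score

    def predict(self, tokens):
        return max(self.classes, key=lambda c: self.log_joint(tokens, c))


if __name__ == "__main__":
    docs = [["quiet", "alone", "book"], ["alone", "quiet"],
            ["party", "friends", "talk"], ["friends", "party"]]
    model = MultinomialNaiveBayes().fit(docs, ["I", "I", "E", "E"])
    print(model.predict(["quiet", "book"]))  # I
    print(model.predict(["party", "talk"]))  # E
```

In the pipeline described above, one such binary classifier would be trained per type-indicator dimension on the cleaned posts.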
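TABLE I. MEASURE OF SCORES OBTAINED BY DIFFERENT ALGORITHMS

| Metric | Type Indicator | Naïve Bayes | SVM | Decision Tree | Random Forest | XGBoost |
|---|---|---|---|---|---|---|
| Accuracy | E-I | 0.823 | 0.896 | 0.796 | 0.952 | 0.936 |
| Accuracy | N-S | 0.905 | 0.953 | 0.797 | 0.993 | 0.972 |
| Accuracy | F-T | 0.812 | 0.856 | 0.749 | 0.845 | 0.845 |
| Accuracy | J-P | 0.732 | 0.81 | 0.735 | 0.84 | 0.841 |
| Precision | E-I | 0.827 | 0.882 | 0.835 | 0.986 | 0.914 |
| Precision | N-S | 0.908 | 0.98 | 0.765 | 0.991 | 0.998 |
| Precision | F-T | 0.806 | 0.851 | 0.747 | 0.831 | 0.843 |
| Precision | J-P | 0.728 | 0.801 | 0.772 | 0.91 | 0.827 |
| Recall | E-I | 0.818 | 0.915 | 0.74 | 0.918 | 0.964 |
| Recall | N-S | 0.903 | 0.926 | 0.863 | 0.995 | 0.947 |
| Recall | F-T | 0.82 | 0.861 | 0.751 | 0.866 | 0.846 |
| Recall | J-P | 0.735 | 0.821 | 0.662 | 0.752 | 0.859 |
| F1-Score | E-I | 0.823 | 0.888 | 0.785 | 0.951 | 0.938 |
| F1-Score | N-S | 0.905 | 0.953 | 0.749 | 0.848 | 0.845 |
| F1-Score | F-T | 0.813 | 0.856 | 0.749 | 0.848 | 0.845 |
| F1-Score | J-P | 0.731 | 0.811 | 0.713 | 0.824 | 0.843 |
| ROC-AUC Score | E-I | 0.823 | 0.896 | 0.797 | 0.952 | 0.936 |
| ROC-AUC Score | N-S | 0.905 | 0.954 | 0.797 | 0.993 | 0.973 |
| ROC-AUC Score | F-T | 0.812 | 0.856 | 0.749 | 0.845 | 0.845 |
| ROC-AUC Score | J-P | 0.732 | 0.81 | 0.734 | 0.84 | 0.841 |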
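The accuracy, precision, recall, and F1 metrics in Table I follow the standard definitions for one binary type indicator. A minimal stdlib sketch of them is shown below; the helper names are hypothetical. ROC-AUC is left out because it needs ranked prediction scores rather than hard labels. Since F1 is the harmonic mean of precision and recall, reported F1 values can be cross-checked. For example, for the Decision Tree on E-I, 2(0.835)(0.74)/(0.835 + 0.74) ≈ 0.785, which matches the table.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for one binary type indicator."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
    }


if __name__ == "__main__":
    print(binary_metrics([1, 1, 0, 0], [1, 0, 0, 1]))
    # Cross-check against Table I (Decision Tree, E-I): P = 0.835, R = 0.74
    print(round(f1_score(0.835, 0.74), 3))  # 0.785
```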
VII. CONCLUSION AND FUTURE SCOPE

This paper discusses how analysing a person's personality has become increasingly important in modern society. The MBTI test is used to assess an individual's identity and personality. The proposed system outperforms the previous system in terms of accuracy, and the Random Forest algorithm is a useful tool for improving results. The examined CVs also include suggestions for improvement in the relevant fields. Many businesses may use the proposed method to speed up the hiring process by taking potential candidates' personalities into account. In future work, the system's effectiveness and performance can be improved to make more accurate predictions of a person's personality based on their CV.

REFERENCES

[1] Nuthalapati Leena, Normala Sandhya, Reddem Sai Archana Reddy, and Shaik Kasim Saheb, "Personality Prediction Through CV Analysis," International Research Journal of Modernization in Engineering Technology and Science, vol. 04, pp. 1423-1429, July 2022.
[2] T. S. Kanchana and B. S. E. Zoraida, "A Framework for Automated Personality Prediction from Social Media Tweets," 2022 IEEE World Conference on Applied Intelligence and Computing (AIC), 2022, pp. 698-701.
[3] G. Sudha, S. K. K, S. J. S, N. D, S. S, and K. T. G, "Personality Prediction Through CV Analysis using Machine Learning Algorithms for Automated E-Recruitment Process," 2021 4th International Conference on Computing and Communications Technologies (ICCCT), 2021, pp. 617-622.
[4] A. Kmail, M. Maree, M. Belkhatir, and S. Alhashmi, "An Automatic Online Recruitment System Based on Exploiting Multiple Semantic Resources and Concept-Relatedness Measures," IEEE 27th International Conference on Tools with Artificial Intelligence (ICTAI), 2015, pp. 620-627.
[5] J. Rout, S. Bagade, P. Yede, and N. Patil, "Personality Evaluation and CV Analysis Using Machine Learning Algorithm," International Journal of Computer Sciences and Engineering, vol. 07, pp. 1852-1857, May 2019.
[6] Atharva Kulkarni, Tanuj Shankarwar, and Siddharth Thorat, "Personality Prediction Via CV Analysis using Machine Learning," International Journal of Engineering Research & Technology (IJERT), vol. 10, pp. 544-547, September 2021.
[7] Md. Tanzim Reza and Md. Sakib Zaman, "Analyzing CV/Resume using Natural Language Processing and Machine Learning," BRAC University, 2017.
[8] Gagandeep Kaur and Shruti Maheshwari, "Personality Prediction through Curriculam Vitae Analysis involving Password Encryption and Prediction Analysis," International Journal of Advanced Science and Technology, vol. 28, no. 16, pp. 1-10, November 2019.
[9] Rutuja Narwade, Srujami Palkar, Isha Zade, and Nidhi Sanghavi, "Personality Prediction with CV Analysis," International Research Journal of Engineering and Technology (IRJET), vol. 09, pp. 3220-3225, April 2022.
[10] N. D. Almalis, G. A. Tsihrintzis, N. Karagiannis, and A. D. Strati, "FoDRA — A new content-based job recommendation algorithm for job seeking and recruiting," 2015 6th International Conference on Information, Intelligence, Systems and Applications (IISA), 2015, pp. 1-7.
[11] F. Ahmed, M. Anannya, T. Rahman, and R. T. Khan, "Automated CV processing along with psychometric analysis in job recruiting process," 2015 International Conference on Electrical Engineering and Information Communication Technology (ICEEICT), 2015, pp. 1-5.
[12] A. Arora, N. K. Arora, YC. Hu, S. Tiwari, M. Trivedi, and K. Mishra, "Personality Prediction System Through CV Analysis," in Ambient Communications and Computer Systems, Advances in Intelligent Systems and Computing, vol. 1097. Singapore: Springer, 2020.
[13] A. Bruno and G. Singh, "Personality Traits Prediction from Text via Machine Learning," 2022 IEEE World Conference on Applied Intelligence and Computing (AIC), 2022, pp. 588-594.
[14] Evanthia Faliagka, Athanasios Tsakalidis, and Giannis Tzimas, "An Integrated E-recruitment System for Automated Personality Mining and Applicant Ranking," Internet Research, vol. 22, no. 5, October 2012.
[15] J. Xu, W. Tian, G. Lv, S. Liu, and Y. Fan, "Prediction of the Big Five Personality Traits Using Static Facial Images of College Students With Different Academic Backgrounds," IEEE Access, vol. 9, pp. 76822-76832, 2021.
[16] R. K. Cherukuru, A. Kumar, S. Srivastava, and V. Kumar Verma, "Prediction of Personality Trait using Machine Learning on Online Texts," 2022 International Conference for Advancement in Technology (ICONAT), 2022, pp. 1-8.