
2023 9th International Conference on Advanced Computing and Communication Systems (ICACCS)

Automated Resume Classification System Using Ensemble Learning
2023 9th International Conference on Advanced Computing and Communication Systems (ICACCS) | 979-8-3503-9737-6/23/$31.00 ©2023 IEEE | DOI: 10.1109/ICACCS57279.2023.10112917

Spoorthi M
Department of Computer Science and Engineering
Gokaraju Rangaraju Institute of Engineering and Technology
Hyderabad, India
rajoslinc@[Link]

Indu Priya B
Department of Computer Science and Engineering
Gokaraju Rangaraju Institute of Engineering and Technology
Hyderabad, India
[Link]@[Link]

Meghana Kuppala
Department of Computer Science and Engineering
Gokaraju Rangaraju Institute of Engineering and Technology
Hyderabad, India
kuppalameghana3102@[Link]

Vaishnavi Sunilkumar Karpe
Department of Computer Science and Engineering
Gokaraju Rangaraju Institute of Engineering and Technology
Hyderabad, India
[Link]@[Link]

Divya Dharavath
Department of Computer Science and Engineering
Gokaraju Rangaraju Institute of Engineering and Technology
Hyderabad, India
divya773181@[Link]

Abstract—One of the biggest challenges for job recruiters is selecting suitable resumes from a large pool. For a single role, thousands of candidates send their resumes. Manually selecting resumes from a large number of applicants and assigning them suitable positions is time-consuming and not feasible. An automated system can make this process easy and efficient. This system takes candidates’ resumes in Word, PDF, or other formats and classifies them according to the skill set mentioned in the resume. We propose an ensemble deep-learning model to classify the resumes.

Keywords—resume; classification; job; deep learning; ensemble learning; skill set.

I. INTRODUCTION

Recruitment in the Information Technology field has been increasing rapidly. Recruiters must properly screen resumes to hire suitable candidates. The process of checking whether a candidate is suitable for a particular role according to the information in their CV/resume is called resume screening. Recruiters have to screen a large amount of resume data quickly and reliably.

The most important and basic tool in any selection process is the candidate’s resume. Interviewing has become a time-consuming affair. The number of applications is in the millions, making it time-consuming to sort through them. Here we need a machine learning algorithm that offers a better way of screening and fulfills the requirements of the industry.

The world of Artificial Intelligence and Machine Learning (ML) has grown significantly. In machine learning, a dataset is used to train a model to predict the intended outcome from incoming data. The large amounts of data now available have contributed to significant recent growth in the performance of ML models. We can take advantage of this growth to automate, and increase productivity in, areas that require manual human labor. We propose a mechanism that allows recruiters to recruit based on the skill set and information mentioned in candidates’ resumes, using an ensemble deep-learning model that classifies a given resume into one of several categories. This reduces the time needed to screen a resume manually.

II. LITERATURE SURVEY

The research paper “A Machine Learning Approach for Automation of Resume Recommendation System” describes an algorithm that analyses the characteristics extracted from the resume and categorizes them according to the job description. The categorized resume is then mapped, and the candidate most suitable for the position is recommended. The authors built two models. A classification model was built using several algorithms: Random Forest, Multinomial Naïve Bayes, Logistic Regression, and Linear Support Vector Machine classifiers. Among these, the SVM model performed best. The recommendation model was built on content-based recommendation and K-Nearest Neighbours [1].

The research paper “Automated Tool for Resume Classification Using Semantic Analysis” presents the development of a resume classification application. It uses a voting classifier based on ensemble learning. It categorizes a candidate’s profile into an appropriate domain according to the work experience and other details given by the applicant in the profile [2].

The research paper “Resume Classification using various Machine Learning Algorithms” describes a model using Naïve Bayes, Random Forest, and SVM, which extracts skills and shows diverse capabilities under appropriate job profile classes. Random Forest gave the best accuracy of the three [3].

The research paper “Differential Hiring using a Combination of NER and Word Embedding” describes a
methodology using NLP and Word2Vec, a pre-trained word embedding layer. Word embedding is a method of expressing words as real-valued vectors that convey their meaning, such that words close to one another in the vector space are assumed to have similar meanings. This helps to find the most relevant resume for the skill set provided [4].

The research paper “Resume Screening Using Machine Learning and NLP: A Proposed System” proposes a machine learning model which takes a student's resume and, according to the skills and other details mentioned in it, shows suitable job roles and the resume's relevance to the job description [5].

“Resume Ranking based on Job Description using SpaCy NER model” devised a method that lowers hiring costs and speeds up the process of selecting the best candidate for the job role [6].

“Domain Adaptation for Resume Classification Using Convolutional Neural Networks” employs a classifier to categorize resume data after training it on a large number of openly accessible job description excerpts. Despite having only a small amount of labeled resume data at their disposal, the authors empirically confirmed a respectable classification performance for the approach [7].

The research work “A Hybrid Approach to Conceptual Classification and Ranking of Resumes and Their Corresponding Job Posts” presents a hybrid approach using a concept-based classification of resumes and a ranking system that ranks the candidates according to the corresponding job offers. The authors collected 2000 resumes from online sites and 10,000 different job postings for the experiment. They used job titles and skill sets in the classification process and reported higher precision [8].

In “Towards an automated system for intelligent screening of candidates for recruitment using ontology mapping EXPERT”, EXPERT, an intelligent ontology-mapping-based candidate screening tool, was used to construct an automated system for the intelligent screening of prospects for recruitment, improving the precision with which candidates are matched to the job criteria [9].

III. METHODOLOGY

A. Data Collection and Visualization

The data was collected from Kaggle. It is in comma-separated values (CSV) format with two columns, Category and Resume. The Category column is the resume’s sector or field, and the Resume column is the content of the resume. There are 962 resumes in 25 different categories.

B. Data Preprocessing

Data preprocessing converts the raw data in the dataset into a form suitable for our task. The information in each curriculum vitae is cleaned, unnecessary data is removed, and the text is then converted into vectors. The following steps were performed in the preprocessing of the resume data.

1) Data Cleaning: In the cleaning process, numbers, special characters, and single-letter words are removed. This yields the cleaned resumes.

2) Tokenization: Tokenization was performed on the resume data using the tokenizer class of TensorFlow.

3) Removal of stop words: Words such as ‘is’, ‘are’, and ‘was’ are called stop words. They appear in most text. Stop words that do not provide any important information for the classification task are removed from the generated tokens. English stop words were imported from the NLTK corpus and used for the stop word removal process.

4) Label encoding: Label encoding assigns a numerical label to each category. The sklearn LabelEncoder was used. Fig. 2 shows how, after label encoding, the categories are given unique numeric values.

The number of instances of each domain in the dataset is shown in Fig. 1.

Fig. 1. Number of instances of different domains in the dataset.

Fig. 2. After label encoding
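The four preprocessing steps of Section III-B can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the paper uses the TensorFlow tokenizer, the NLTK English stop-word list, and the sklearn LabelEncoder, whereas this sketch substitutes a whitespace tokenizer, a tiny hand-written stop-word set, and a sorted-name label mapping (sorted order is also how LabelEncoder arranges its classes). The sample rows and all function names are hypothetical.

```python
import re
from collections import Counter

# Small illustrative stop-word set; the paper imports the full English list from NLTK.
STOP_WORDS = {"is", "are", "was", "the", "a", "an", "and", "in", "of", "with", "to"}


def clean_text(text):
    """Step 1: drop numbers, special characters, and single-letter words."""
    text = re.sub(r"[^A-Za-z\s]", " ", text)   # keep letters and whitespace only
    text = re.sub(r"\b[A-Za-z]\b", " ", text)  # remove one-letter words
    return re.sub(r"\s+", " ", text).strip().lower()


def tokenize(text):
    """Step 2: whitespace tokenization (stand-in for the TensorFlow tokenizer)."""
    return text.split()


def remove_stop_words(tokens):
    """Step 3: filter out tokens that carry little information."""
    return [t for t in tokens if t not in STOP_WORDS]


def encode_labels(labels):
    """Step 4: map each category to an integer, in sorted order of category name."""
    classes = sorted(set(labels))
    index = {name: i for i, name in enumerate(classes)}
    return [index[name] for name in labels], classes


# Hypothetical rows in the (Category, Resume) layout described in Section III-A.
rows = [
    ("Data Science", "Skills: Python 3, pandas & R; 5 years of experience in ML."),
    ("Java Developer", "Worked with Java 8 and Spring; a team lead."),
    ("Data Science", "Built a model in Python."),
]

tokens = [remove_stop_words(tokenize(clean_text(resume))) for _, resume in rows]
label_ids, classes = encode_labels([category for category, _ in rows])
category_counts = Counter(category for category, _ in rows)  # per-category counts, as plotted in Fig. 1
```

Note that removing single-letter words also discards one-letter skill names such as "R" or "C", which may matter for a skill-based classifier.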


C. Model Architecture

We created an ensemble model using a 1D Convolutional Neural Network (CNN) and a Bi-directional Gated Recurrent Unit (GRU), as in Fig. 3. They act as two channels in our model. Each input text is mapped to real-valued vectors using pre-trained word embeddings trained with a skip-gram model on the 3-billion-word Google News corpus [10]. The input sequences are fed to the embedding layer.

In channel 1, the embedding layer’s output is fed into a 1D convolutional layer. The number of filters is 100 and the kernel size is 3. The rectified linear unit (ReLU) is used as the activation function. ReLU helps limit the growth in computation required by the network. The convolutional layer convolves the input feature space, which is then down-sampled by a max pooling layer with a pool size of 2. Each dimension of the output can be considered an ‘extracted feature’. A flatten layer follows, and its output is fed to a dropout layer with a dropout rate of 0.5, which regularizes learning and prevents overfitting. Finally, a softmax layer is added.

In channel 2, the output from the embedding layer feeds into a GRU layer. The output is then fed to a dropout layer with a dropout rate of 0.5, again to regularize learning and prevent overfitting. Finally, a softmax layer is added.

The outputs from both channels are combined to get the final output. Fig. 3 shows the model architecture.
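To make the shapes in channel 1 concrete, the following plain-Python sketch walks a toy sequence through convolution with ReLU, max pooling with pool size 2, and flattening. It is only an illustration: the paper builds these layers in Keras, the padding mode is not stated (this sketch assumes no padding, so a kernel of size 3 shortens a length-T sequence to T - 2), and the toy embeddings and the two filters are hypothetical (the paper uses 100 filters).

```python
def conv1d_relu(seq, kernels):
    """1D convolution over time without padding, followed by ReLU.

    seq:     list of T embedding vectors, each of length D
    kernels: list of F filters, each a list of K weight vectors of length D
    returns: list of T-K+1 feature vectors, each of length F
    """
    k = len(kernels[0])
    out = []
    for t in range(len(seq) - k + 1):
        window = seq[t:t + k]
        feats = []
        for kernel in kernels:
            s = sum(w * x for wv, xv in zip(kernel, window) for w, x in zip(wv, xv))
            feats.append(max(0.0, s))  # ReLU
        out.append(feats)
    return out


def max_pool1d(feature_map, pool_size=2):
    """Non-overlapping max pooling over time; a trailing partial window is dropped."""
    steps = len(feature_map) // pool_size
    return [
        [max(col) for col in zip(*feature_map[i * pool_size:(i + 1) * pool_size])]
        for i in range(steps)
    ]


def flatten(feature_map):
    """Concatenate the per-step feature vectors into one flat list."""
    return [v for row in feature_map for v in row]


# Toy input: 6 tokens with 2-dimensional embeddings (real embeddings are much wider).
seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0], [1.0, -1.0], [-1.0, 0.0]]
# Two hypothetical filters with kernel size 3.
kernels = [
    [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]],     # sums the first embedding dimension
    [[0.0, -1.0], [0.0, -1.0], [0.0, -1.0]],  # negated sum of the second dimension
]

conv = conv1d_relu(seq, kernels)  # 6 - 3 + 1 = 4 time steps, 2 features each
pooled = max_pool1d(conv, 2)      # 4 // 2 = 2 time steps
flat = flatten(pooled)            # 2 * 2 = 4 values fed to dropout and softmax
```

Under the same no-padding assumption, an input of length T with the paper's 100 filters would flatten to 100 * ((T - 2) // 2) values.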

Fig. 3. Model Architecture
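The paper states that the softmax outputs of the two channels are combined but does not give the rule. One common choice, shown here purely as an assumption, is to average the two probability distributions and pick the most probable category. The logits below are hypothetical and use 3 categories for brevity (the dataset has 25).

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combine_by_average(p_cnn, p_gru):
    """Assumed combination rule: element-wise mean of the two distributions."""
    return [(a + b) / 2.0 for a, b in zip(p_cnn, p_gru)]


# Hypothetical pre-softmax scores from each channel for one resume.
p_cnn = softmax([2.0, 1.0, 0.1])
p_gru = softmax([0.5, 2.5, 0.3])
p_final = combine_by_average(p_cnn, p_gru)

# Index of the most probable category after combining.
predicted = max(range(len(p_final)), key=p_final.__getitem__)
```

Averaging keeps the result a valid probability distribution. Other rules, such as concatenating the channel outputs before a shared softmax layer, are equally compatible with the description in the text.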

IV. EXPERIMENT

We used Keras with the TensorFlow backend. We split the dataset 80:20 into training and testing sets. Accuracy and other metrics are shown in Table I and Table II.

TABLE I. PERFORMANCE OF THE MODEL

           Precision  Recall  F1-Score
Macro      0.84       0.85    0.83
Weighted   0.86       0.88    0.86
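An 80:20 split like the one described in this section can be sketched as below. The paper does not say whether the split was shuffled or stratified by category, so the fixed-seed shuffle is an assumption, and the function name is hypothetical.

```python
import random


def train_test_split_80_20(samples, seed=42):
    """Shuffle a copy of the samples and cut it at the 80% mark."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed for a reproducible split
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


# Stand-in for the 962 (resume, label) pairs; integers are used as placeholders.
samples = list(range(962))
train, test = train_test_split_80_20(samples)
```

With 962 samples this gives 769 training items and 193 testing items. With 25 categories and under a thousand resumes, a stratified split would keep small categories represented in both sets.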
V. RESULT

Table I gives precision, recall, and F1-score values. Table II gives accuracy. Fig. 4 is the confusion matrix.

TABLE II. TRAINING AND TESTING ACCURACY

           Accuracy (%)
Training   90.377
Testing    88.083
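The quantities reported in Table I and Table II can all be derived from a confusion matrix such as the one in Fig. 4. The sketch below shows the standard definitions of accuracy and of macro-averaged and weighted-averaged precision, recall, and F1. The labels are hypothetical and cover only 3 classes; they are not the paper's data.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """cm[i][j] counts samples of true class i predicted as class j."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def summarize(cm):
    """Return accuracy plus macro- and weighted-averaged (precision, recall, F1)."""
    n = len(cm)
    total = sum(map(sum, cm))
    support = [sum(row) for row in cm]  # number of true samples per class
    predicted = [sum(cm[i][j] for i in range(n)) for j in range(n)]
    per_class = []
    for c in range(n):
        tp = cm[c][c]
        precision = tp / predicted[c] if predicted[c] else 0.0
        recall = tp / support[c] if support[c] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class.append((precision, recall, f1))
    # Macro: every class counts equally. Weighted: classes count by their support.
    macro = tuple(sum(m[k] for m in per_class) / n for k in range(3))
    weighted = tuple(
        sum(m[k] * s for m, s in zip(per_class, support)) / total for k in range(3)
    )
    accuracy = sum(cm[c][c] for c in range(n)) / total
    return accuracy, macro, weighted


# Hypothetical labels for 3 classes, for illustration only.
y_true = [0, 0, 0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 2, 0, 2]
cm = confusion_matrix(y_true, y_pred, 3)
accuracy, macro, weighted = summarize(cm)
```

Support-weighted recall is always equal to overall accuracy, which is consistent with the weighted recall of 0.88 in Table I and the testing accuracy of 88.083 in Table II.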
Creating a web app: After training the model, we created a web application using Streamlit, where the user can upload resumes in .pdf, .docx, and .txt formats. The resume is classified into a suitable category when the user clicks the submit button. The web app is shown in Fig. 5 and its output in Fig. 6.


VI. CONCLUSION AND FUTURE WORK


We have studied the performance of a CNN + GRU model for the resume classification task.

The resume classification model will improve the effectiveness of the recruitment process. This strategy will help organizations to streamline the hiring process and save time. We created an Automated Resume Classification model which classifies resumes with an accuracy of 88%.

We will further explore other aspects, such as other neural network structures and different types of word embedding layers.

Future work includes adding resume ranking to the classification performed by the ensemble model.

We classified resumes based only on skill set; more criteria can be added to the classification.

Fig. 4. Confusion Matrix

Fig. 5. Web app

Fig. 6. Output in web app

REFERENCES
[1] Roy, P. K., Chowdhary, S. S., & Bhatia, R. (2020). A Machine Learning approach for automation of Resume Recommendation system. International Conference on Computational Intelligence and Data Science, 167(Elsevier B.V), 2318–2327.
[2] Gopalakrishna, S. T., & Varadharajan, V. (2019). Automated Tool for Resume Classification. International Journal of Artificial Intelligence and Applications, 10. Bengaluru.
[3] Pal, R., Shaikh, S., Bhagwat, S., & Satpute, S. (2022). Resume Classification using various Machine Learning Algorithms. International Conference on Automation, Computing and Communication, 44. Navi Mumbai.
[4] (2020). Differential Hiring using a Combination of NER and Word Embedding. International Journal of Recent Technology and Engineering.
[5] Kinge, B., Mandhare, S., Chavan, P., & Chaware, S. M. (2022). Resume Screening Using Machine Learning and NLP: A Proposed System. International Journal of Scientific Research in Computer Science, Engineering and Information Technology, 8(2), 253-258.
[6] [Link], [Link], [Link], [Link], [Link], & [Link]. (2020). Resume Ranking based on Job Description using SpaCy NER model. International Research Journal of Engineering and Technology, 07(05), 74-77.
[7] Sayfullina, L., Malmi, E., Liao, Y., & Jung, A. (2017). Domain Adaptation for Resume Classification Using Convolutional Neural Networks. Springer, Cham.
[8] Zaroor, A., Maree, M., & Sabha, M. (2017). A Hybrid Approach to Conceptual Classification and Ranking of Resumes and Their Corresponding Job Posts. doi:10.1007/978-3-319-59421-7_10
[9] V, Senthil kumaran & Annamalai, Sankar. (2013). Towards an automated system for intelligent screening of candidates for recruitment using ontology mapping EXPERT. International Journal of Metadata, Semantics and Ontologies, 8, 56-64. doi:10.1504/IJMSO.2013.054184
[10] Zhang, Z., Robinson, D., & Tepper, J. (2018). Detecting Hate Speech on Twitter Using a Convolution-GRU Based Deep Neural Network (Vol. 48). Europe: Springer, Cham, June 2018. doi:10.1007/978-3-319-93417-4_48

