DOI: 10.1002/pra2.324
SHORT PAPERS
Community-based data integration of course and job
data in support of personalized
career-education recommendations
Guoqing Zhu1 | Naga Anjaneyulu Kopalle2 | Yongzhen Wang2 |
Xiaozhong Liu2 | Kemi Jona3 | Katy Börner2
1 School of Maritime Economics and Management, Dalian Maritime University, China
2 Indiana University Bloomington
3 Northeastern University

Correspondence
Guoqing Zhu, School of Maritime Economics and Management, Dalian Maritime University, Dalian, 116026, China.
Email: [email protected]

Abstract
How does your education impact your professional career? Ideally, the courses you take help you identify, get hired for, and perform the job you always wanted. However, not all courses provide skills that transfer to existing and future jobs; skill terms used in course descriptions might be different from those listed in job advertisements; and there might exist a considerable skill gap between what is taught in courses and what is needed for a job. In this study, we propose a novel method to integrate extensive course description and job advertisement data by leveraging heterogeneous data integration and community detection. The heterogeneous graph approach, along with the identified skill communities, enables cross-domain information recommendation. For example, given an educational profile, job recommendations can be provided together with suggestions on education opportunities for re- and upskilling in support of lifelong learning.

Note: This work was partially supported by the National Science Foundation under award 1936656. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NSF.
KEYWORDS
career, data/graph mining, education, information recommendation
1 | INTRODUCTION

Lifelong learning, the pursuit of knowledge for either personal or professional reasons, enhances employees'/professionals' competitiveness and career satisfaction. Students often explore various kinds of educational opportunities to reach their career goals. Generally, the education system should serve the career ecosystem, and skill discrepancies between research, education, and jobs should be minimized (Börner et al., 2018).

From an information recommendation perspective, although existing job recommendation systems meet some of the needs of students, their impact is limited. Generally, job/career and course/education recommendation systems implement the same multi-task learning framework: Given a user's career, educational, or hybrid information needs,
83rd Annual Meeting of the Association for Information Science & Technology October 25-29, 2020.
Author(s) retain copyright, but ASIS&T receives an exclusive publication license
Proc Assoc Inf Sci Technol. 2020;57:e324. wileyonlinelibrary.com/journal/pra2
https://doi.org/10.1002/pra2.324
FIGURE 1 Heterogeneous graph index schema. Orange circles represent job nodes, blue circles represent skill nodes, and green circles represent course nodes. Directed orange edges represent the job-to-skill relation; directed blue edges connect a skill required by a job to a skill covered by a course via the linked relation; directed green edges represent the course-to-skill covered relation; and directed red edges represent the course-to-course pre-required relation. The dotted circle indicates the skill community to which the job and course belong.
recommend various types of jobs/courses simultaneously. Notwithstanding a rich body of literature on course recommendation and job recommendation, few studies consider both of them simultaneously (Li et al., 2017).

In this study, we propose a novel method to integrate job/career and course/education data by employing heterogeneous graph indexation. As shown in Figure 1, two different sources of data on education and career development are integrated into a heterogeneous graph, with skills serving as the bridge. Because job skills and course skills are different, skill communities computed by the Infomap algorithm help connect job and course data. Given a properly indexed heterogeneous graph, we can then apply a random walk algorithm to generate personalized suggestions for courses and jobs by considering future career goals or planned educational experiences.

To showcase that students are able to benefit from this graph-based data integration, we conducted preliminary experiments using course data from Indiana University and IT industry job postings. Results demonstrate that the proposed approach can effectively provide course recommendations in light of given career goals. The proposed method can be generalized to different education/career environments.

2 | LITERATURE REVIEW

Recommender systems have been broadly applied in the context of course planning. In most of these studies, courses were recommended to target users based on other users' feedback, overall user performance, or similarities between course materials (Li et al., 2017; Wang, Liu, & Chen, 2017). For instance, Nguyen, Pham, Vo, Vo, and Quan (2018) applied sequential rule mining to pairs of courses and grades and recommended the course with the best performance. Generally, few course recommendation systems consider users' future career goals or target jobs (Ma & Ye, 2018). Many graph-based course recommender systems have been developed. For instance, Bridges et al. (2018) made personalized suggestions about which courses a student should enroll in, based on a directed graph that gathered students' grades and enrollment history. However, existing graph-based course recommendation research focuses on the education domain only. To the best of our knowledge, this study is one of the first investigations of graph-based cross-domain recommendation that leverages massive education and career data via community-based data fusion.
In recent decades, similar to course recommendation systems, job recommendation systems have generated tremendous interest in the research community. Some researchers have studied job recommendations from the perspective of career paths (Patel, Kakuste, & Eirinaki, 2017). Some approaches used social networks to generate job recommendations. Shalaby et al. (2017) proposed a graph-based approach that uses the relationship between user-work interaction and job posting content for real-time job recommendations. Broadly speaking, prior research work on job recommendations mainly focuses on information about the user's professional experience, without considering the user's educational history. Besides, although some studies use graph-based methods, they only focus on a single career domain.

This study goes beyond the existing work by presenting a novel graph-based approach that recommends rank-ordered courses or jobs for a student (or junior employee) by considering his/her education/career history and leveraging a heterogeneous graph that integrates education and career data.

3 | RESEARCH METHODS

In this section, we discuss the proposed method in detail, which includes: collecting career data and education data (Section 3.1), integrating the two datasets using skill communities computed with Infomap plus heterogeneous graph indexing (3.2), ranking courses (as a case study) via a graph-enabled cross-domain ranking function that uses a random walk algorithm (3.3), and running a preliminary user study (3.4).

3.1 | Data collection

The dataset collected in this project includes two types of data:

Courses/Education data was gathered from the course enrollment logs of the Luddy School of Informatics, Computing, and Engineering (SICE), Indiana University Bloomington (IUB), covering four academic years from 2016 to 2019. This data consists of 7,824 students, 371 courses, and 188,881 records of course enrollment from five departments over 16 academic semesters. The original course data does not have information on the skills taught by a course. In order to interlink the courses and skills, we extracted 957 MOOCs¹ and 1,011 related skills in the fields of computer science, informatics, information and library science, intelligent system engineering, and statistics via automated web scraping techniques. By leveraging the greedy match algorithm, all the IUB-SICE courses are projected to related skills from MOOCs. At the end, each course in the educational dataset has four features (i.e., the course ID, the course name, the course description, and all related skills). A total of 266 university courses and 376 skills are included in the educational dataset.

Job/Career data was compiled from Careerbuilder² job advertisements downloaded in December 2019. Popular IT job titles³ were used as the search query, redundant jobs were removed, and the final data comprises a total of 20,000 jobs and the 1,611 skills associated with them. The job advertisements were analyzed and five features were extracted: the job ID, the job title, the company, the location, and the list of required skills.

3.2 | Heterogeneous graph-based data indexation and skill community detection

The main bridge for integrating career and education data is a set of skills required by a profession and a set of skills covered by each course (Li et al., 2017). However, skills listed in courses and skills listed in jobs differ: they use different vocabularies. In our dataset, we identified only 79 overlapping skills across job and course data. In order to address this challenge, we utilize the Infomap algorithm (Rosvall & Bergstrom, 2008) to detect skill communities in the target graph M. The Infomap method works as follows: it simulates a random walker wandering on the graph for m steps and indexes the walker's path via a two-level codebook (a global index codebook plus one codebook per community). The goal is to generate a community partition with the minimum random walk description length, which is calculated as follows:

$$L(\pi) = \sum_{i=1}^{m} q_i H(Q) + \sum_{i=1}^{m} p_i H(P_i) \qquad (1)$$

where $L(\pi)$ is the description length for a random walker under the current community partition $\pi$; $q_i$ and $p_i$ are the jumping rates between and within the $i$th community in each step; $H(Q)$ is the frequency-weighted average length of codewords in the global index codebook; and $H(P_i)$ is the frequency-weighted average length of codewords in the $i$th community codebook.

We first create a career graph and an education graph individually. By using Infomap, all the skill nodes are grouped into communities (e.g., a programming community and an AI community). Then, the most similar career and education communities (measured by counting the overlapping skills) are merged along with the associated jobs and courses (see Figure 1).
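The merging step described in Section 3.2 (pairing career and education skill communities by counting overlapping skills) can be illustrated with a minimal sketch. It assumes the communities have already been detected and are available as plain sets of skill names; the function name `merge_communities`, the community labels, and the toy skills are hypothetical and are not taken from the study's data or code.

```python
from typing import Dict, List, Set, Tuple


def merge_communities(
    career: Dict[str, Set[str]],
    education: Dict[str, Set[str]],
) -> List[Tuple[str, str, Set[str]]]:
    """Pair each education community with the career community that shares
    the most skills, and return the union of their skills.

    Assumption: an education community with no overlapping skill in any
    career community is left unmerged (the paper does not specify this case).
    """
    merged = []
    for edu_id, edu_skills in sorted(education.items()):
        best_id, best_overlap = None, 0
        for car_id, car_skills in sorted(career.items()):
            overlap = len(edu_skills & car_skills)  # count shared skill terms
            if overlap > best_overlap:
                best_id, best_overlap = car_id, overlap
        if best_id is not None:
            merged.append((best_id, edu_id, career[best_id] | edu_skills))
    return merged


# Hypothetical toy communities (not from the study's dataset).
career = {
    "job_programming": {"java", "python", "sql"},
    "job_ai": {"machine learning", "python", "deep learning"},
}
education = {
    "course_programming": {"java", "sql", "data structures"},
    "course_ai": {"machine learning", "neural networks"},
}

merged = merge_communities(career, education)
for car_id, edu_id, skills in merged:
    print(car_id, "+", edu_id, "->", sorted(skills))
```

In this toy input, the education programming community shares two skills with the career programming community and none with the career AI community, so those two are merged; the AI communities are paired through their single shared skill.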
TABLE 1 Nodes and relations in the constructed heterogeneous graph

Nodes and relations        Description
C                          The course nodes
J                          The job nodes
S                          The skill nodes
$C \xrightarrow{p} C$      The course to course edge via the pre-required relation
$C \xrightarrow{c} S$      The course to skill edge via the covered relation
$J \xrightarrow{r} S$      The job to skill edge via the required relation
$S \xrightarrow{l} S$      Skill to skill edge (skill-skill text similarity within each community based on BM25)

By using this method, all the education and career data (skills, courses, and jobs) are integrated into the same heterogeneous graph $G = (V, E)$. In this graph, we define a node type mapping function $\tau : V \to O$ and an edge type mapping function $\phi : E \to R$, where each node $v \in V$ belongs to one particular object type $\tau(v) \in O$, and each edge $e \in E$ belongs to one particular relation $\phi(e) \in R$. If two links belong to the same relation type, the two links share the same starting object type and the same ending object type. The nodes and relations are described in Table 1.

For any node on the graph, the weights of its outgoing links of the same type sum to 1. For instance, the weight of the link from $C_i$ to $C_j$ is defined as

$$w(C_i \xrightarrow{p} C_j) = \frac{d(C_i \xrightarrow{p} C_j)}{d(C_i \xrightarrow{p} C)},$$

where $d(C_i \xrightarrow{p} C_j)$ is the number of students who enrolled in course $C_j$ before enrolling in course $C_i$, and $d(C_i \xrightarrow{p} C)$ is the total number of students who enrolled in any course before enrolling in course $C_i$. The weight of $C_i \xrightarrow{c} S_j$ is defined as

$$w(C_i \xrightarrow{c} S_j) = \frac{1}{d(C_i \xrightarrow{c} S)},$$

where $d(C_i \xrightarrow{c} S)$ is the total number of skills covered by course $C_i$. The weight of $J_i \xrightarrow{r} S_j$ is defined as

$$w(J_i \xrightarrow{r} S_j) = \frac{d(J_i \xrightarrow{r} S_j)}{d(J_i \xrightarrow{r} S)},$$

where $d(J_i \xrightarrow{r} S_j)$ is the number of times job $J_i$ required skill $S_j$, and $d(J_i \xrightarrow{r} S)$ is the total number of times job $J_i$ required any skill. The relation $S \xrightarrow{l} S$ is the key to internally connecting the entire heterogeneous graph. The weight of $S \xrightarrow{l} S$ is the normalized similarity score (via BM25) between skills within each community. Because of the community restriction, noisy similar skill pairs will not pollute the graph accuracy.

The complete network graph has a total of 22,253 nodes and 95,712 edges. There are 20,000 jobs, 266 university courses, and 1,987 course and job skills, and the numbers of the various relations are: 73,560 $J \xrightarrow{r} S$, 11,155 $C \xrightarrow{p} C$, 641 $C \xrightarrow{c} S$, and 10,356 $S \xrightarrow{l} S$, respectively.

3.3 | Heterogeneous graph-enabled cross-domain recommendation with community restriction

In this section, we apply a graph-enabled cross-domain ranking approach to make education/career recommendations. For each query node in the graph, we retrieve target candidate nodes and make suggestions based on their ranking scores. The ranking score of each candidate node comes from the meta-path-based ranking function (Liu, Yu, Guo, & Sun, 2014) along with the community structure. On the heterogeneous graph, the meta-path defines the connection between query nodes and result nodes. For the same recommendation task (for example, recommending courses to different types of users), there are usually multiple meta-paths on the heterogeneous graph. Besides, when we change the type of query node and recommended node, this method can be generalized to other recommendation tasks, for example, recommending a job to a student or professional.

To quantify the ranking score of a candidate node following the meta-path, a random walk-based measure is used (Liu et al., 2014). It can be represented by:

$$s\left(v_i^{(1)} \to v_j^{(l+1)}\right) = \sum_{t = v_i^{(1)} \to v_j^{(l+1)}} RW(t) \qquad (2)$$

where $v_i^{(1)}$ represents the seed node, $v_j^{(l+1)}$ is a queried candidate node, and $t$ is a tour from $v_i^{(1)}$ to $v_j^{(l+1)}$. Suppose $t = \left(v_{i_1}^{(1)}, v_{i_2}^{(2)}, \ldots, v_{i_{l+1}}^{(l+1)}\right)$. The random walk probability is then $RW(t) = \prod_{j} w\left(v_{i_j}^{(j)}, v_{i_{j+1}}^{(j+1)}\right)$, where $w\left(v_{i_j}^{(j)}, v_{i_{j+1}}^{(j+1)}\right)$ is the weight of edge $v_{i_j}^{(j)} \to v_{i_{j+1}}^{(j+1)}$.

As a proof of concept, we conducted a course recommendation user study and defined different meta-paths to recommend courses for three scenarios:

Scenario 1: A first-year undergraduate/graduate student has a career goal (job node), and he/she is looking for education suggestions (course nodes) to achieve this career goal.
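To make the random walk-based measure of Equation 2 concrete for a job query such as the one in Scenario 1, the following minimal sketch sums $RW(t)$ over all tours that follow a fixed sequence of edge types. Several things here are assumptions rather than details from the study: the helper names `normalize` and `meta_path_scores`, the toy job, skills, and course IDs, and in particular the weighting of the reverse skill-to-course step (normalized over the courses that cover a skill), which the paper does not specify. The community restriction is not modeled.

```python
from collections import defaultdict


def normalize(counts):
    """Turn raw counts {source: {target: count}} into edge weights so that
    the outgoing weights of each source node sum to 1."""
    weights = {}
    for src, targets in counts.items():
        total = sum(targets.values())
        weights[src] = {dst: c / total for dst, c in targets.items()}
    return weights


def meta_path_scores(seed, steps):
    """Sum RW(t) over all tours t that start at `seed` and follow `steps`,
    a list of weighted edge maps (one map per relation in the meta-path).

    Propagating probability mass step by step is equivalent to summing the
    product of edge weights over every individual tour.
    """
    frontier = {seed: 1.0}
    for edges in steps:
        nxt = defaultdict(float)
        for node, prob in frontier.items():
            for dst, w in edges.get(node, {}).items():
                nxt[dst] += prob * w
        frontier = nxt
    return dict(frontier)


# Hypothetical toy graph (names are illustrative only).
required = normalize({"data_scientist": {"python": 3, "statistics": 1}})
linked = normalize({
    "python": {"python programming": 1},
    "statistics": {"statistical inference": 1},
})
# Assumed reverse "covered" step: from a skill to the courses that cover it.
covered_by = normalize({
    "python programming": {"COURSE-A": 1, "COURSE-B": 1},
    "statistical inference": {"COURSE-C": 1},
})

scores = meta_path_scores("data_scientist", [required, linked, covered_by])
ranked = sorted(scores, key=lambda c: (-scores[c], c))
print(ranked, scores)
```

Because every step uses weights that sum to 1 per source node, the scores over all reachable candidates also sum to 1 in this toy example, which makes them easy to compare across candidate courses.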
TABLE 2 Preliminary result of course recommendation for different ranking features

                      Precision   MAP    MAP@5   Precision@10   MAP@10
Vector space          0.41        0.50   0.50    0.48           0.51
Probabilistic model   0.38        0.55   0.55    0.44           0.59
Graph-enabled         0.39        0.57   0.63    0.48           0.62
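The metrics reported in Table 2 can be computed from binary "useful"/"not useful" judgments along the lines of the sketch below. This is a hedged illustration using common textbook definitions, not the study's evaluation code: the function names are hypothetical, the judgment lists are made up, and average precision is normalized by the number of useful courses actually retrieved, which is an assumption because the paper does not state which variant was used.

```python
def precision_at_k(judgments, k=None):
    """Fraction of the top-k ranked courses judged useful (1) vs. not (0).
    With k=None the whole ranked list is used."""
    top = judgments if k is None else judgments[:k]
    return sum(top) / len(top) if top else 0.0


def average_precision(judgments, k=None):
    """Average of precision values taken at each rank holding a useful course.
    Assumption: normalized by the number of useful courses retrieved."""
    top = judgments if k is None else judgments[:k]
    hits, total = 0, 0.0
    for rank, useful in enumerate(top, start=1):
        if useful:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_average_precision(runs, k=None):
    """Mean of average precision over all queries."""
    return sum(average_precision(r, k) for r in runs) / len(runs)


# Made-up judgments for two queries, in ranked order (1 = useful).
runs = [
    [1, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
]
map_score = mean_average_precision(runs)
map_at_3 = mean_average_precision(runs, k=3)
print(round(map_score, 3), round(map_at_3, 3))
```

Passing a cut-off `k` restricts the computation to the topmost candidates, which corresponds to the @5 and @10 columns of the table.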
For Scenario 1, the input is the student's career goal (job node), and the output is a set of recommended candidate courses (nodes). The corresponding meta-path functions are:

$$\mathcal{C}_J \mid J \xrightarrow{r} S \xrightarrow{l} S \xleftarrow{c} C^{?}$$

$$\mathcal{C}_{/J} \mid C^{?} \xleftarrow{p} C^{p?}$$

where $J$ is the query job node, and $C^{?}$ is the candidate course node. By walking through the relationship between the job and courses via skills, this path function retrieves the candidate courses related to the career goal. Note that the first function operates only on the target community $\mathcal{C}_J$ to which job $J$ belongs, and the second function retrieves all required courses $C^{p?}$ from the communities $\mathcal{C}_{/J}$ to which job $J$ does not belong.

Scenario 2: A student already took some courses $C^{p}$, and he/she is looking for new courses $C^{?}$ to achieve his/her career goal $J$. This is similar to Scenario 1, but we add another function: $\mathcal{C}_{/J} \mid C^{p} \xrightarrow{c} S \xleftarrow{c} C^{?}$, where $C^{p}$ provides an additional chance to help locate relevant courses $C^{?}$.

Scenario 3: An employee/professional has a current job $J$, and he/she is looking for career acceleration. That is, he/she is looking for a course that helps to upskill. The ranking function could be $J \xrightarrow{r} S \xrightarrow{l} S \xleftarrow{c} C \xrightarrow{p} C^{?}$. Because the user's information need is to upskill, the last step will be $C \xrightarrow{p} C^{?}$; a foundation course (like programming) can walk to a more advanced course (like machine learning).

Unlike prior studies on this track, community restriction is critical for the proposed random walk function, as it helps reduce noise and enhance the recommendation accuracy.

3.4 | Preliminary experimental result

A preliminary experiment was run for Scenario 1, where two graduate students at IUB were asked to use the course recommendation system built on the proposed ranking algorithm. Each student entered five text queries (e.g., "Database Administrator", "Java Developer", and "Data Scientist") and rated each recommended course as "useful" or "not useful". Mean Average Precision (MAP) and Precision were used as the evaluation metrics. For MAP, a binary judgment is provided for each candidate course. We also evaluate the ranking performance with a given cut-off rank, considering only the topmost candidate courses returned by the experiment. In Table 2, we report the performance of different recommendation functions (overall and top-ranked education opportunities performance). In the experiment, two baseline methods, the Vector Space model and the Probabilistic Model (Truyen, Phung, & Venkatesh, 2014), are employed for comparison.

Based on this table, the proposed method outperforms both baselines on MAP, MAP@5, and MAP@10, ties the vector space model on Precision@10 (0.48), and trails it on overall Precision (0.39 vs. 0.41). The result suggests that the proposed method is promising for cross-domain recommendation and that students/professionals can potentially benefit from it. Note that, in this experiment, we did not utilize sophisticated graph ranking models or learning-to-rank algorithms. Prior studies (Liu et al., 2014) showed that these methods could further enhance the recommendation performance. We plan to explore more sophisticated graph ranking models in future investigations.

4 | CONCLUSIONS

In this study, we proposed a novel method for career and education data integration and indexation by using a heterogeneous graph and community detection. The generated graph enables different information retrieval/recommendation scenarios to address various kinds of information needs from students and professionals. From a data science perspective, career data and education data need to be cross-walked, and community detection along with data fusion provides a promising method for data integration.

A limitation of this method is that few skills overlap across jobs and courses: only 79 of the 376 skills identified in the course data could be mapped to skill terms listed in job advertisements.

In the future, our efforts on this project will be threefold. Firstly, we would like to enhance the graph quality by adding novel nodes and edges, for example, company, location, and enrollment information. Secondly, we would like to investigate a sophisticated graph ranking algorithm to enhance the recommendation performance. Last but not least, we plan to conduct more comprehensive user evaluations employing both professionals and
students. User feedback will be used for model training and user interface optimization.

ENDNOTES

1 https://www.coursera.org/
2 https://www.careerbuilder.com/
3 https://money.usnews.com/money/careers/slideshows/discover-the-best-technology-jobs

REFERENCES

Börner, K., Scrivner, O., Gallant, M., Ma, S., Liu, X., Chewning, K., … Evans, J. A. (2018). Skill discrepancies between research, education, and jobs reveal the critical need to supply soft skills for the data economy. Proceedings of the National Academy of Sciences, 115(50), 12630–12637.

Bridges, C., Jared, J., Weissmann, J., Montanez-Garay, A., Spencer, J., & Brinton, C. G. (2018). Course recommendation as graphical analysis. 2018 52nd Annual Conference on Information Sciences and Systems, CISS 2018, 1–6.

Li, N., Naren, S., Gao, Z., Xia, T., Börner, K., & Liu, X. (2017). Enter a job, get course recommendations. iConference, 2, 118–122.

Liu, X., Yu, Y., Guo, C., & Sun, Y. (2014). Meta-path-based ranking with pseudo relevance feedback on heterogeneous graph for citation recommendation. Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, pp. 121–130.

Ma, X., & Ye, L. (2018). Career goal-based e-learning recommendation using enhanced collaborative filtering and prefixspan. International Journal of Mobile and Blended Learning (IJMBL), 10(3), 23–37.

Nguyen, H.-Q., Pham, T.-T., Vo, V., Vo, B., & Quan, T.-T. (2018). The predictive modeling for learning student results based on sequential rules. International Journal of Innovative Computing, Information & Control, 14(6), 2129–2140.

Patel, B., Kakuste, V., & Eirinaki, M. (2017). Capar: A career path recommendation framework. 2017 IEEE Third International Conference on Big Data Computing Service and Applications (BigDataService), pp. 23–30. IEEE.

Rosvall, M., & Bergstrom, C. T. (2008). Maps of random walks on complex networks reveal community structure. Proceedings of the National Academy of Sciences, 105(4), 1118–1123.

Shalaby, W., AlAila, B., Korayem, M., Pournajaf, L., AlJadda, K., Quinn, S., & Zadrozny, W. (2017). Help me find a job: A graph-based approach for job recommendation at scale. 2017 IEEE International Conference on Big Data (Big Data), 1544–1553. IEEE.

Truyen, T. T., Phung, D. Q., & Venkatesh, S. (2014). Preference networks: Probabilistic models for recommendation systems. arXiv, 1407.5764.

Wang, Y., Liu, X., & Chen, Y. (2017). Analyzing cross-college course enrollments via contextual graph mining. PLoS One, 12(11).

How to cite this article: Zhu G, Kopalle NA, Wang Y, Liu X, Jona K, Börner K. Community-based data integration of course and job data in support of personalized career-education recommendations. Proc Assoc Inf Sci Technol. 2020;57:e324. https://doi.org/10.1002/pra2.324