
DW & DM

This overview study synthesizes findings from 33 review studies on predicting student academic performance (SAP) in higher education using educational data mining (EDM) methods from 2013 to September 2021. It identifies key trends, including the predominance of Decision Tree algorithms for prediction accuracy and the common use of student demographics as input variables. The study highlights the lack of explicit sample sizes in most reviews and the focus on specific subject areas like maths, science, and computer science.

Uploaded by

kotalsujay89
Copyright
© All Rights Reserved

Educational Data Mining for Student Performance Prediction

Abstract:-
This overview study set out to compare and synthesise the findings of review studies
conducted on predicting student academic performance (SAP) in higher education using
educational data mining (EDM) methods, EDM algorithms, and EDM tools from 2013 to
September 2021. It conducted multiple searches for suitable and relevant peer-reviewed
articles on two online search engines, on nine online databases, and on two online academic
social networks. It then selected 33 eligible articles from 2,500 articles. Some of the
findings of this overview study are worth mentioning. First, only 3 studies explicitly stated
their precise sample sizes, and only 5 studies explicitly mentioned their subject areas, with
maths, science, computer science, and engineering as the four most mentioned subject areas.
Second, 20 review studies had purposes related to EDM techniques, EDM methods, EDM models,
or EDM algorithms employed to predict SAP and student success in the higher education sector.
Third, the 33 review studies reported six commonly used typologies of input variables, of
which student demographics was the most commonly utilised for predicting SAP. Fourth and
last, seven common EDM algorithms employed for predicting SAP were identified, of which
Decision Tree emerged both as the most used algorithm and as the algorithm with the highest
prediction accuracy rate for predicting SAP.

Table of Contents:-
1. Introduction
2. Methodology
   i) Contextualising issues
   ii) Literature review related to predicting student academic performance using EDM techniques
   iii) Purpose of the study
   iv) Literature search strategy
   v) Findings
3. Discussion
4. Conclusion
5. References

Introduction:-
The last few years have witnessed an exponential increase in review studies exploring
educational data mining (EDM) methods, algorithms, and tools for predicting student
academic performance (SAP) (Khasanah, 2018; Saa et al., 2019; Shahiri et al., 2015). This is
the case across diverse disciplinary fields, even though researchers in fields such as computer
science and engineering seem to have conducted more such studies than others (Ashenafi, 2017).
Most EDM review studies on predicting SAP have been conducted as reviews (Ameen et al.,
2019; Cui et al., 2019; Del Río & Insuasti, 2016; Durga & Thangakumar, 2019; Moreno-
Marcos et al., 2019; Muttathil & Rahman, 2016; Shahiri et al., 2015); literature reviews
(Alyahyan & Düştegör, 2020; Manjarres et al., 2018; Saqr, 2018); systematic literature
reviews (Alban & Mauricio, 2019; Liz-Domínguez et al., 2019; Namoun & Alshanqiti, 2021;
Papamitsiou & Economides, 2014); systematic reviews (Agrusti et al., 2019; Alamri &
Alharbi, 2021; Aydogdu, 2020; López Zambrano et al., 2021; Zulkifli et al., 2019); review
syntheses (Aldowah et al., 2019); or surveys (Alturki et al., 2020; Ganesh & Christy, 2015;
Jindal & Borah, 2013). While these review study types are not exhaustive, they represent a
broad spectrum of the types of review studies that the current paper was able to locate.

Methodology:-
1. Contextualising issues
This paper uses the term overview of reviews in the same sense as review of reviews. In an
overview of reviews (hereafter an overview or an overview study), review studies or aspects
featuring in review studies become key units or foci of analysis as opposed to aspects of
primary studies (Polanin et al., 2016). There are different terms used to refer to a review of
reviews. These include review of reviews, second-order review, umbrella review, tertiary
review, meta-meta-analysis, synthesis of meta-analysis, synthesis of systematic reviews,
summary of systematic reviews, or systematic review of systematic reviews (Grant & Booth,
2009; Moonsamy et al., 2021; Pieper et al., 2012). These terms constitute typologies of
overviews. These typologies reflect the roles played by the respective overviews and the
purposes these overviews are meant to serve. The benefits of utilising overviews include:
• retrieving, identifying, assessing and integrating findings from several review studies,
thereby leveraging previous research syntheses;
• aggregating the evidence provided by multiple reviews or contrasting multiple treatments on
the same topic; and
• identifying gaps in existing reviews (Pieper et al., 2012; Polanin et al., 2016).

2. Literature review related to predicting student academic performance using EDM techniques
Student academic performance (SAP) is a crucial construct employed to determine student
academic success at different educational levels (Khanna et al., 2016; Papadogiannis et al.,
2020; Shahiri et al., 2015). Even though it has multiple definitions, at a basic level, SAP is
the performance that students display in their academic tasks (e.g., assignments, tests and
examinations). It is often reflected in students’ cumulative grade point average (CGPA) or
grade point average (GPA) from previous semesters and in their expected GPA in the current
semester. If the term performance is disaggregated from the phrase student
academic performance, it embodies achievement in relation to assignments and courses,
continuous progress in programmes, and a successful completion of programmes (Hellas et
al., 2018; Khasanah, 2018). Moreover, it entails persistence, retention, progression, wastage,
and success or progress (Hamoud et al., 2017). In this sense, SAP should be seen in the same
way as student academic achievement (Alyahyan & Düştegör, 2020). However, SAP is a complex
construct that is affected by multiple factors, including students’ historical academic
performance and socio-economic background. In this context, some of the factors (also known
as attributes) employed to predict SAP are: academic factors (historical and current); student
demographics; socio-economic factors; psychological factors; student e-learning activities;
student environments; and extracurricular activities (Kumar & Salal, 2019). Most scholars
utilise these superordinate factors to predict SAP (Alturki et al., 2020; Khasanah, 2018).
These superordinate factors are further categorised into specific subordinate factors, with
the former serving as input variables or performance features, and the latter serving as
output variables or performance metrics [18]. Nonetheless, there are at times overlaps between
the superordinate and subordinate factors, as certain scholars tend to conflate them (Alyahyan
& Düştegör, 2020; Ashenafi, 2017; Hellas et al., 2018; Kumar & Salal, 2019).

Moreover, certain
methods (or tasks) such as association rule mining, clustering, classification and regression
are used for building models for predicting SAP. Such methods are at times referred to as
techniques (Aldowah et al., 2019; Hellas et al., 2018), while Saa et al. (2019) call them EDM
approaches. Of these methods, classification tends to be the most widely used.
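
A minimal Python sketch can make the classification method concrete. The code below fits a one-level decision tree (a decision stump) that labels students as pass (1) or fail (0) from a single input variable, prior GPA, and then reports prediction accuracy. The GPA values, labels, and function names are hypothetical and are not taken from the reviewed studies.

```python
# Sketch of classification for SAP: a decision stump (one-level decision tree)
# predicting pass (1) / fail (0) from one numeric input variable.
# All data and names here are hypothetical illustrations.

def fit_stump(xs, ys):
    """Return the threshold t that maximises training accuracy for the rule
    'predict pass (1) if x >= t, else fail (0)'. Only observed values are
    tried as thresholds; on ties the smallest such threshold is kept."""
    best_t, best_acc = None, -1.0
    for t in sorted(set(xs)):
        acc = accuracy(predict(t, xs), ys)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

def predict(t, xs):
    """Apply the stump rule to every input value."""
    return [1 if x >= t else 0 for x in xs]

def accuracy(preds, ys):
    """Share of predictions that match the true labels."""
    return sum(p == y for p, y in zip(preds, ys)) / len(ys)

# Hypothetical input variable (prior GPA) and output variable (pass/fail).
prior_gpa = [1.8, 2.1, 2.4, 2.9, 3.1, 3.5, 2.0, 3.8]
passed = [0, 0, 1, 1, 1, 1, 0, 1]

threshold = fit_stump(prior_gpa, passed)
print(threshold, accuracy(predict(threshold, prior_gpa), passed))
```

The accuracy here is measured on the same records used for fitting, which overstates real performance; a fairer estimate would use held-out data, for example a train/test split or cross-validation.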
Furthermore, various algorithms are employed to predict SAP. Among them are Artificial Neural
Network (ANN), Bayesian Network (BN), Decision Tree (DT), K-Nearest Neighbours (K-NN),
K-Means, Naïve Bayesian (NB) classifiers, Neural Network (NN), and Support Vector Machine
(SVM) (Alamri & Alharbi, 2021; Ashenafi, 2017; López-Zambrano et al., 2021; Namoun &
Alshanqiti, 2021). In certain instances, these algorithms are referred to as EDM techniques
(Ashenafi, 2017), or as tasks or methods (Alturki et al., 2020). The choice of prediction
algorithm depends on the SAP outcome to be predicted. For instance,
classification algorithms such as DT, NN and NB classifiers are commonly used for predicting
a binary outcome, such as pass/fail, with an associated probability (Alamri & Alharbi, 2021;
Ashenafi, 2017; Shahiri et al., 2015). By contrast, SVM and linear regression are often
employed for predicting numerical scores (Ashenafi, 2017). Furthermore, software tools such
as WEKA, RapidMiner, MATLAB, KNIME, Apache Mahout and Rattle GUI are used for predicting
SAP. Of these, WEKA appears to be the most frequently used tool (Alyahyan & Düştegör, 2020;
Alturki et al., 2020; Khasanah, 2018; Kumar & Salal, 2019).
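
To illustrate the contrast drawn above between binary and numerical outcomes, the sketch below shows the regression side: ordinary least-squares linear regression with one predictor, using the closed-form slope and intercept. The predictor (weekly hours of e-learning activity), the scores, and the function name are hypothetical, chosen so that the fitted line is exact.

```python
# Sketch of regression for SAP: ordinary least squares with one predictor.
#   slope     = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)
#   intercept = mean_y - slope * mean_x
# Data and names are hypothetical illustrations.

def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) of the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

hours = [2, 4, 6, 8, 10]        # hypothetical weekly e-learning hours
scores = [50, 58, 66, 74, 82]   # hypothetical exam scores (exactly 42 + 4 * hours)

slope, intercept = fit_simple_linear_regression(hours, scores)
predicted_score = intercept + slope * 7   # numerical prediction for 7 hours
print(slope, intercept, predicted_score)
```

Replacing the numeric target with a pass/fail label would turn this into a classification problem, where, according to the reviews cited above, algorithms such as DT, NN or NB are the more usual choice.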

3. Purpose of the study


The purpose of this paper was to compare and synthesise the findings of review studies
conducted on predicting SAP in higher education using EDM methods, algorithms, and tools
from 2013 to September 2021. The major focus was on review studies
related to the higher education sector. The following served as research questions (RQs) for
this study:
• RQ1: What are the primary purposes of the review studies investigated in this overview?
• RQ2: What common input (predictor) and common output (target) variables do these
review studies employ to predict SAP?
• RQ3: What common educational data mining (EDM) techniques (or methods) and
algorithms do they employ in predicting SAP?
• RQ4: What algorithms are reported to have the highest prediction accuracy for SAP?
• RQ5: What common EDM tools do these studies employ in predicting SAP?
• RQ6: What are the key results of these review studies?

4. Literature search strategy

The search for relevant review studies was conducted online from March 2020 to September
2021 and started by locating search engines, databases, and academic social networking sites.
Subsequently, two online search engines (Google and Bing), nine online databases (Google
Scholar, Microsoft Academic, Semantic Scholar, IEEE Xplore, ERIC, ScienceDirect, Emerald,
JSTOR, SpringerLink), and two online academic social networks (ResearchGate and
Academia.edu) were identified (Figure 1). Search strings were arranged into super- and
sub-strings in keeping with the major area of focus of the overview: predicting SAP using
EDM methods, algorithms, and tools. These search strings
consisted of the following keywords: predicting student academic performance; educational
data mining techniques; educational data mining algorithms; and educational data mining
software tools. To ensure that a wide range of review studies on the major focus area of this
overview was covered in all the search combinations, two commonly used Boolean operators,
AND and OR, together with parentheses and double quotation marks (where necessary), were
employed in the search strategy. Examples of these search combinations were as follows:
• predicting student academic performance AND educational data mining techniques AND
educational data mining algorithms AND educational data mining software tools
• predicting student academic performance OR educational data mining techniques OR
educational data mining algorithms OR educational data mining software tools.
In certain instances, the word techniques was replaced with methods or tasks. These keyword
combinations, together with their relevant iterations, were queried in the two search engines,
the nine online databases, and the two online academic social networking sites mentioned
earlier. Moreover, dependency and snowball search strategies were employed based on the
bibliographies of the journal articles obtained from the three sets of online search platforms.
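
The keyword combinations described above can be generated mechanically. The following Python sketch builds the AND and OR strings from the four keywords and applies the substitution of techniques with methods and tasks. The function names and the optional phrase quoting are illustrative assumptions; each search engine and database has its own query syntax, so the output would need adapting per platform.

```python
# Sketch: generate the AND/OR keyword combinations from the search strategy,
# including variants where "techniques" is replaced by "methods" or "tasks".
# Function names are hypothetical; real platforms differ in query syntax.

KEYWORDS = [
    "predicting student academic performance",
    "educational data mining techniques",
    "educational data mining algorithms",
    "educational data mining software tools",
]
TECHNIQUE_VARIANTS = ["techniques", "methods", "tasks"]

def build_query(keywords, operator, quote_phrases=False):
    """Join keyword phrases with a Boolean operator ("AND" or "OR"),
    optionally wrapping each phrase in double quotation marks."""
    terms = [f'"{k}"' if quote_phrases else k for k in keywords]
    return f" {operator} ".join(terms)

def search_combinations(keywords=KEYWORDS, variants=TECHNIQUE_VARIANTS):
    """Return one query string per (variant, operator) pair."""
    queries = []
    for variant in variants:
        kws = [k.replace("techniques", variant) for k in keywords]
        for operator in ("AND", "OR"):
            queries.append(build_query(kws, operator))
    return queries

for query in search_combinations():
    print(query)
```

The AND strings narrow results to records matching every keyword, while the OR strings widen them to records matching any keyword, which is why the strategy used both.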
4.1 Eligibility criteria and selection of studies
The criteria for including and excluding review studies are listed below. They were
formulated to address the major focus area of the current overview.
• review studies focusing on predicting SAP using EDM methods (techniques or tasks),
algorithms and tools;
• focus on higher education;
• review studies published between 2013 and September 2021;
• review studies published in peer-reviewed journals and by (internationally) recognised
conference organisations;
• mention of the specific years or duration covered (e.g., 2010 to 2015); and
• review studies published in English.
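
One way to apply these criteria consistently during screening is to code each retrieved article and test it against every criterion. The Python sketch below does this with a hypothetical Candidate record; the field names, the sample records, and the assumption that the window runs from January 2013 to September 2021 are illustrative, not the authors' actual screening procedure.

```python
# Sketch: screen coded articles against the eligibility criteria.
# The Candidate fields and sample records are hypothetical.
from dataclasses import dataclass

@dataclass
class Candidate:
    title: str
    is_review: bool              # a review study rather than a primary study
    predicts_sap_with_edm: bool  # predicts SAP using EDM methods, algorithms or tools
    higher_education: bool
    year: int
    month: int
    peer_reviewed: bool          # peer-reviewed journal or recognised conference
    states_period: bool          # mentions the years/duration covered
    language: str

def is_eligible(c: Candidate) -> bool:
    # Assumed window: January 2013 to September 2021 inclusive.
    in_window = (2013, 1) <= (c.year, c.month) <= (2021, 9)
    return (c.is_review and c.predicts_sap_with_edm and c.higher_education
            and in_window and c.peer_reviewed and c.states_period
            and c.language == "English")

candidates = [
    Candidate("Review A", True, True, True, 2019, 5, True, True, "English"),
    Candidate("Review B", True, True, True, 2021, 11, True, True, "English"),        # too late
    Candidate("Primary study C", False, True, True, 2018, 3, True, True, "English"),  # not a review
    Candidate("Review D", True, True, False, 2016, 7, True, True, "English"),        # not higher education
]

eligible = [c.title for c in candidates if is_eligible(c)]
print(eligible)
```

Comparing (year, month) tuples keeps the date-window check to a single expression, since Python compares tuples element by element.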
5. Findings
The findings presented in this section of the overview are grounded in the data extracted
from the 33 full-text articles and are informed by the manner in which the extracted data were
coded, as highlighted in the relevant section above. Additionally, the findings respond to the
six research questions stated earlier.

Discussion:-
In this section, the discussion of the findings is structured in response to the six research
questions of the study. As pointed out above, 33 review studies constituted the focal point of
the present overview. Except for four studies, the rest (n = 29) were reviews of different
typologies: systematic reviews (n = 9); classical reviews (n = 8); systematic reviews (n = 5);
surveys (n = 4); and literature reviews (n = 3). In their review of reviews, Kim et al. (2018)
investigated qualitative reviews (narrative and thematic reviews) and quantitative reviews
(systematic and meta-analysis reviews) as part of the articles (n = 171) included in their study
on hospitality and tourism. Concerning subject areas, maths and science, and computer
science and engineering featured among the subject areas mentioned by 5 studies. Regarding
sample sizes, 3 studies reported sample sizes that together totalled 46,695 participants.
Kim et al. (2018), a review of reviews in a different but related area, also reports the
subject areas on which its reviews focused. Of the 13 reviews this overview reviewed,
economics and finance (n = 29), customer behaviour (n = 24), and marketing (n = 22) are
reported as the top three subject areas mentioned by the reviewed studies, in that order. The
overview mentions that the sample sizes of its 171 reviews ranged from fewer than 10 to more
than 10,000, with systematic reviews having the largest sample sizes. In the current overview,
the 3 reviews that mentioned specific sample sizes were a systematic literature review, a
literature review, and a meta-analysis (see Figure 4 and Appendix A). Pertaining to the
purposes of the 33 reviews, it
emerged that the purposes of 20 reviews had to do with either EDM techniques, EDM
methods, EDM models, or EDM algorithms utilised to predict SAP and student success in
higher education. By contrast, of the remaining 13 studies, 10 reviewed or surveyed EDM
techniques and tools, whereas 2 focused on student dropout prediction. One study that treated
purposes (or objectives) as a focal point of analysis is Khanna et al.’s (2016) systematic
review of 13 articles. Among the purposes of the 13 articles it analysed, educational data
mining (EDM) methods or techniques employed for predicting student performance featured
prominently in 10. Another such study, Papamitsiou and Economides’ (2014) systematic
literature review of 40 articles, reported six purposes, of which prediction of student
performance was the second most common after student behaviour modelling.

Conclusions:-
The purpose of this overview was to compare and synthesise the findings of review studies
conducted on predicting SAP in higher education using EDM methods, algorithms, and tools
from 2013 to September 2021. For subject areas, maths and science, and computer science
and engineering were cited by the review studies that explicitly mentioned their fields of
study. Humanities and social science subjects did not feature in any of these review studies.
Concerning sample size, only 3 studies explicitly stated their precise sample sizes, which
together totalled 46,695 participants. Among the EDM methods used for predicting SAP, four
emerged as the most commonly used: classification, clustering, regression, and association
rules. Classification was the most commonly used of the four methods. Naïve Bayes was the
least utilised method. Of the seven commonly used EDM algorithms identified by the 33
review studies for predicting SAP, DT was the most commonly employed, followed by Support
Vector Machine (SVM) and Artificial Neural Network (ANN), with Naïve Bayes (NB) as the least
used algorithm. Nevertheless, as a cluster of algorithms, Bayesian classifiers were the most
widely used. Moreover, DT was reported as the algorithm with the highest prediction accuracy
rate for predicting SAP. With respect to EDM software tools, WEKA was the most commonly
utilised tool, followed by SPSS and RapidMiner.
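
Rankings such as the most used algorithm rest on counting how many review studies report each item. The Python sketch below shows that tallying step with collections.Counter; the three coded reviews are invented for illustration and do not reproduce the overview's data or counts.

```python
# Sketch: rank algorithms by the number of review studies that report them.
# The coding sheet below is hypothetical, not the overview's data.
from collections import Counter

coded_reviews = {
    "Review A": {"DT", "SVM", "NB"},
    "Review B": {"DT", "ANN"},
    "Review C": {"DT", "SVM", "ANN", "K-NN"},
}

def rank_mentions(coded):
    """Return (algorithm, number of reviews mentioning it), most frequent first."""
    counts = Counter()
    for algorithms in coded.values():
        counts.update(algorithms)  # a set, so each review counts once per algorithm
    return counts.most_common()

ranking = rank_mentions(coded_reviews)
print(ranking)
```

Items with equal counts (here SVM and ANN) may appear in either order, so ties are best reported explicitly rather than read off the list order.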

References:-
1. Agrusti, F., Bonavolontà, G., & Mezzini, M. (2019). University dropout prediction through
educational data mining techniques: A systematic review. Journal of E-Learning and
Knowledge Society, 15(3), 161–182. https://doi.org/10.20368/1971-8829/1135017
2. Alamri, R., & Alharbi, B. (2021). Explainable student performance prediction models: A
systematic review. IEEE Access, 9, 33132–33143. https://doi.org/10.1109/ACCESS.2021.3061368
3. Alban, M., & Mauricio, D. (2019). Predicting university dropout through data mining: A
systematic literature. Indian Journal of Science and Technology, 12(4), 1–12.
https://doi.org/10.17485/ijst/2019/v12i4/139729
4. Aldowah, H., Al-Samarraie, H., & Fauzy, W. M. (2019). Educational data mining and
learning analytics for 21st century higher education: A review and synthesis. Telematics and
Informatics, 37, 13–49. https://doi.org/10.1016/j.tele.2019.01.007
5. Alturki, S., Hulpus, I., & Stuckenschmidt, H. (2020). Predicting academic outcomes: A
survey from 2007 till 2018. Technology, Knowledge and Learning, 1–33.
https://doi.org/10.1007/s10758-020-09476-0
