DW & DM
Performance Prediction
Abstract:-
This overview study set out to compare and synthesise the findings of review studies
conducted on predicting student academic performance (SAP) in higher education using
educational data mining (EDM) methods, EDM algorithms, and EDM tools from 2013 to
September 2021. It conducted multiple searches for suitable and relevant peer-reviewed
articles on two online search engines, on nine online databases, and on two online academic
social networks. It then selected 33 eligible articles from 2,500 articles. Some of the
findings of this overview study are worth mentioning. First, only 3 studies explicitly stated
their precise sample sizes, and only 5 studies explicitly mentioned their subject areas with
maths and science, and computer science and engineering as the four most mentioned subject
areas. Second, 20 review studies had purposes related to either EDM techniques, EDM
methods, EDM models, or EDM algorithms employed to predict SAP and student success in
the higher education sector. Third, there are six commonly used typologies of input variables
reported by 33 review studies, of which student demographics was the most commonly
utilised variable for predicting SAP. Fourth and last, seven common EDM algorithms
employed for predicting SAP were identified, of which Decision Tree emerged both as the
most used algorithm and as the algorithm with the highest prediction accuracy rate for
predicting SAP.
Table Of Contents:-
1. Introduction
2. Methodology:-
i) Contextualising issues
ii) Literature review related to predicting student academic performance using EDM techniques
iii) Purpose of the study
iv) Literature search strategy
v) Findings
3. Discussion
4. Conclusion
5. References
Introduction:-
The last few years have witnessed an exponential increase in review studies exploring
educational data mining (EDM) methods, algorithms, and tools for predicting student
academic performance (SAP) (Khasanah, 2018; Saa et al., 2019; Shahiri et al., 2015). This is
the case for diverse disciplinary fields, even though fields such as computer science and
engineering seem to have conducted more such studies than others (Ashenafi, 2017). Most
EDM review studies on predicting SAP have been conducted as either reviews (Ameen et al.,
2019; Cui et al., 2019; Del Río & Insuasti, 2016; Durga & Thangakumar, 2019; Moreno-
Marcos et al., 2019; Muttathil & Rahman, 2016; Shahiri et al., 2015); literature reviews
(Alyahyan & Düştegör, 2020; Manjarres et al., 2018; Saqr, 2018); systematic literature
reviews (Alban & Mauricio, 2019; Liz-Domínguez et al., 2019; Namoun & Alshanqiti, 2021;
Papamitsiou & Economides, 2014); systematic reviews (Agrusti et al., 2019; Alamri &
Alharbi, 2021; Aydogdu, 2020; López Zambrano et al., 2021; Zulkifli et al., 2019); review
syntheses (Aldowah et al., 2019); or surveys (Alturki et al., 2020; Ganesh & Christy, 2015;
Jindal & Borah, 2013). While these review study types are not exhaustive, they represent a
broad spectrum of the types of review studies that the current paper was able to locate.
Methodology:-
1. Contextualising issues
This paper uses an overview of reviews in the same sense as a review of reviews. In an
overview of reviews (hereafter an overview or an overview study), review studies or aspects
featuring in review studies become key units or foci of analysis as opposed to aspects of
primary studies (Polanin et al., 2016). There are different terms used to refer to a review of
reviews. These include review of reviews, second-order review, umbrella review, tertiary
review, meta-meta-analysis, synthesis of meta-analysis, synthesis of systematic reviews,
summary of systematic reviews, or systematic review of systematic reviews (Grant & Booth,
2009; Moonsamy et al., 2021; Pieper et al., 2012). These terms constitute typologies of
overviews. These typologies reflect the roles played by the respective overviews and the
purposes these overviews are meant to serve. The benefits of utilising overviews are:
• retrieving, identifying, assessing, and integrating findings from several review studies,
leveraging previous research syntheses;
• aggregating the evidence provided by multiple reviews or contrasting multiple treatments on
the same topic; and
• identifying a gap in existing reviews (Pieper et al., 2012; Polanin et al., 2016).
4. Literature search strategy
The search for relevant review studies was conducted online from March 2020 to
September 2021, and started by locating search engines, databases, and academic social
networking sites. Subsequently, two online search engines (Google and Bing), nine online
databases (Google Scholar, Microsoft Academic, Semantic Scholar, IEEE Xplore, ERIC,
ScienceDirect, Emerald, JSTOR, SpringerLink), and two online academic social networks
(ResearchGate and Academia.edu), were identified (Figure 1). Search strings were arranged
into super- and sub-strings in keeping with the major area of focus of the overview:
predicting SAP through using EDM methods, algorithms, and tools. These search strings
consisted of the following keywords: predicting student academic performance; educational
data mining techniques; educational data mining algorithms; and educational data mining
software tools. To ensure that a wide range of review studies on the major focus area of this
overview was covered in all the search combinations, two commonly used Boolean operators,
AND and OR, together with parentheses and double quotation marks (where necessary), were
employed in the search strategy. Examples of these search combinations were as follows:
• predicting student academic performance AND educational data mining techniques AND
educational data mining algorithms AND educational data mining software tools
• predicting student academic performance OR educational data mining techniques OR
educational data mining algorithms OR educational data mining software tools
In certain instances, the word "techniques" was replaced with "methods" or "tasks". The
aforesaid keyword combinations, together with their relevant iterations, were queried in the
two search engines, in the nine online databases, and in the two online academic social
networking sites mentioned earlier. Moreover, dependency and snowball search strategies were
employed based on the bibliographies of the journal articles obtained from the three sets of
online search platforms.
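The way these keyword combinations expand can be illustrated with a minimal Python sketch. It is not the authors' actual procedure: the function names `build_query` and `all_queries` are hypothetical, and quoting every keyword is an assumption (the overview says double quotation marks were used only where necessary). The keywords, the two Boolean operators, and the substitutes for "techniques" are taken from the description above.

```python
from itertools import product

# Keywords quoted from the search strategy described above.
KEYWORDS = [
    "predicting student academic performance",
    "educational data mining techniques",
    "educational data mining algorithms",
    "educational data mining software tools",
]

# The word "techniques" plus the two substitutes the overview reports using.
TECHNIQUE_VARIANTS = ["techniques", "methods", "tasks"]


def build_query(keywords, operator):
    """Join keywords with one Boolean operator (hypothetical helper).

    Each keyword is wrapped in double quotation marks; this is an
    assumption, not something the overview prescribes for every case.
    """
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    return f" {operator} ".join(f'"{kw}"' for kw in keywords)


def all_queries():
    """Yield one query string per (operator, variant) combination."""
    for operator, variant in product(("AND", "OR"), TECHNIQUE_VARIANTS):
        keywords = [kw.replace("techniques", variant) for kw in KEYWORDS]
        yield build_query(keywords, operator)


if __name__ == "__main__":
    for query in all_queries():
        print(query)
```

Under these assumptions the sketch yields six query strings (two operators times three wordings), each of which could then be entered into a search platform by hand.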
4.1 Eligibility criteria and selection of studies
The criteria for including and excluding review studies are as listed below. They were
formulated to respond to the major focus area of the current overview.
• review studies focusing on predicting SAP using EDM methods (techniques or tasks),
algorithms and tools;
• focus on higher education;
• review studies published between 2013 and September 2021;
• review studies published in peer-reviewed journals and by (internationally) recognised
conference organisations;
• mention of the specific years or duration covered (e.g., 2010 to 2015); and
• review studies published in English.
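The criteria above can be read as a conjunction of checks applied to each candidate review. The following Python sketch shows one way such a screen might be expressed. The `ReviewRecord` fields and the function names are hypothetical, and the exact window boundaries (1 January 2013 to 30 September 2021) are an assumption, since the overview states only "between 2013 and September 2021".

```python
from dataclasses import dataclass
from datetime import date

# Assumed boundaries for "between 2013 and September 2021".
WINDOW_START = date(2013, 1, 1)
WINDOW_END = date(2021, 9, 30)


@dataclass(frozen=True)
class ReviewRecord:
    """Hypothetical metadata for one candidate review study."""
    title: str
    predicts_sap_with_edm: bool   # predicts SAP using EDM methods, algorithms, tools
    higher_education: bool        # focuses on higher education
    published: date               # publication date
    peer_reviewed: bool           # peer-reviewed journal or recognised conference
    states_years_covered: bool    # mentions the years covered, e.g. 2010 to 2015
    language: str                 # language of publication


def is_eligible(record):
    """Return True only if the record meets every inclusion criterion."""
    return (
        record.predicts_sap_with_edm
        and record.higher_education
        and WINDOW_START <= record.published <= WINDOW_END
        and record.peer_reviewed
        and record.states_years_covered
        and record.language.strip().lower() == "english"
    )


def select_eligible(records):
    """Keep the records that pass the screen, preserving their order."""
    return [record for record in records if is_eligible(record)]
```

A record failing any single criterion is excluded, which mirrors how the inclusion list is phrased. In practice several of these fields would be judged by a human reader rather than computed.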
5. Findings
The findings presented in this section of the overview are grounded on the data extracted
from the 33 full-text articles and are informed by the manner in which the extracted data were
codified as highlighted in the relevant section above. Additionally, the findings respond to the
six research questions stated earlier.
Discussion:-
In this section, the discussion of the findings is structured in response to the six research
questions of the study. As pointed out above, 33 review studies constituted the focal point of
the present overview. Except for four studies, the rest (n = 29) were reviews of different
typologies: systematic reviews (n = 9); classical reviews (n = 8); systematic literature
reviews (n = 5); surveys (n = 4); and literature reviews (n = 3). In their review of reviews,
Kim et al. (2018)
investigated qualitative reviews (narrative and thematic reviews) and quantitative reviews
(systematic and meta-analysis reviews) as part of the articles (n = 171) included in their study
on hospitality and tourism. Concerning subject areas, maths and science, and computer
science and engineering featured among the subject areas mentioned by 5 studies. In this
case, 3 studies mentioned sample sizes that together totalled 46,695 participants. A review of
reviews in a different but related area that offers subject areas on which its reviews focused is
Kim et al. (2018). Of the 171 reviews this overview examined, economics and finance (n = 29),
customer behaviour (n = 24), and marketing (n = 22) are reported as the top three subject
areas mentioned by the reviewed studies, respectively. The overview mentions that sample
sizes of its 171 reviews ranged from less than 10 to more than 10,000, with systematic
reviews having the highest sample sizes. In the current overview, the 3 reviews that
mentioned specific sample sizes were a systematic literature review, a literature review, and
a meta-analysis (see Figure 4 and Appendix A). Pertaining to the purposes of the 33 reviews, it
emerged that the purposes of 20 reviews had to do with either EDM techniques, EDM
methods, EDM models, or EDM algorithms utilised to predict SAP and student success in
higher education. By contrast, of the remaining 13 studies, 10 reviewed or surveyed EDM
techniques and tools, whereas 2 focused on student dropout prediction. A study that had
purposes (or objectives) as one of its focal points of analysis is Khanna et al.’s (2016)
systematic review, which had reviewed 13 articles. Among the purposes of the 13 articles it
analysed, educational data mining (EDM) methods or techniques employed for predicting
student performance featured prominently in the purposes of 10 of these articles. The other
study, Papamitsiou and Economides’ (2014) systematic literature review of 40 articles, had
six purposes, of which prediction of student performance was the second most common
purpose after student behaviour modelling.
Conclusions:-
The purpose of this overview was to compare and synthesise the findings of review studies
conducted on predicting SAP in higher education using EDM methods, algorithms, and tools
from 2013 to September 2021. For subject areas, maths and science, and computer science
and engineering were cited by the review studies that explicitly mentioned their fields of
study. Humanities and social sciences subjects did not feature in any of these review studies.
Concerning sample size, only 3 studies explicitly stated their precise sample sizes, of which
the total number was 46,695. Among the EDM methods used for predicting SAP, four
emerged as the most commonly used: classification, clustering, regression, and association
rules. Classification was the most commonly used of the four methods. Naïve Bayes was the
least utilised method. Of the seven commonly used EDM algorithms identified by the 33
review studies for predicting SAP, Decision Tree (DT) was the most commonly employed,
followed by Support Vector Machine (SVM) and Artificial Neural Network (ANN), with
Naïve Bayes (NB) as the least used algorithm. Nevertheless, as a cluster of algorithms,
Bayesian classifiers were the predominantly used algorithms. Moreover, DT was the EDM
algorithm reported as having the highest prediction accuracy rate for predicting SAP.
With respect to EDM software tools, WEKA was the most commonly utilised tool, followed
by both SPSS and RapidMiner.
References:-
1. Agrusti, F., Bonavolontà, G., & Mezzini, M. (2019). University dropout prediction through
educational data mining techniques: A systematic review. Journal of E-Learning and
Knowledge Society, 15(3), 161–182. https://doi.org/10.20368/1971-8829/1135017
2. Alamri, R., & Alharbi, B. (2021). Explainable student performance prediction models: A
systematic review. IEEE Access, 9, 33132–33143. https://doi.org/10.1109/ACCESS.2021.3061368
3. Alban, M., & Mauricio, D. (2019). Predicting university dropout through data mining: A
systematic literature. Indian Journal of Science and Technology, 12(4), 1–12.
https://doi.org/10.17485/ijst/2019/v12i4/139729
4. Aldowah, H., Al-Samarraie, H., & Fauzy, W. M. (2019). Educational data mining and
learning analytics for 21st century higher education: A review and synthesis. Telematics and
Informatics, 37, 13–49. https://doi.org/10.1016/j.tele.2019.01.007
5. Alturki, S., Hulpus, I., & Stuckenschmidt, H. (2020). Predicting academic outcomes: A
survey from 2007 till 2018. Technology, Knowledge and Learning, 1–33.
https://doi.org/10.1007/s10758-020-09476-0