Predicting Student Performance From Online Engagement Activities Using Novel Statistical Features
https://doi.org/10.1007/s13369-021-06548-w
Received: 25 August 2021 / Accepted: 26 December 2021 / Published online: 17 January 2022
© King Fahd University of Petroleum & Minerals 2022
Abstract
Predicting students’ performance during their years of academic study has been investigated extensively. It offers important insights that can help and guide institutions to make timely decisions and changes leading to better student outcomes. In the post-COVID-19 pandemic era, the adoption of e-learning has gained momentum and has increased the availability of online learning data. This has encouraged researchers to develop machine learning (ML)-based models to predict students’ performance during online classes. The study presented in this paper focuses on predicting student performance during a series of online interactive sessions by considering a dataset collected using the digital electronics education and design suite (DEEDS). The dataset tracks the interaction of students during online lab work in terms of text editing, number of keystrokes, time spent in each activity, etc., along with the exam score achieved per session. Our proposed prediction model consists of extracting a total of 86 novel statistical features, which were semantically grouped into three broad categories: (1) activity type, (2) timing statistics, and (3) peripheral activity count. This set of features was further reduced during the feature selection phase, and only influential features were retained for training purposes. Our proposed ML model aims to predict whether a student’s performance will be low or high. Five popular classifiers were used in our study, namely: random forest (RF), support vector machine, Naïve Bayes, logistic regression, and multilayer perceptron. We evaluated our model under three different scenarios: (1) an 80:20 random data split for training and testing, (2) fivefold cross-validation, and (3) training the model on all sessions but one, which is held out for testing. Results showed that our model achieved its best classification accuracy of 97.4% with the RF classifier. We demonstrated that, under a similar experimental setup, our model outperformed other existing studies.
Keywords Machine learning · Random forest · Student performance prediction · Feature extraction · Binary classification
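The three evaluation scenarios listed in the abstract can be sketched with plain index arithmetic. The sketch below is illustrative only: the sample count and per-record session labels are invented for the example, not taken from the DEEDS data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100                                   # hypothetical number of student records
session = np.repeat(np.arange(1, 6), 20)  # hypothetical session label per record

# Scenario 1: 80:20 random split for training and testing.
idx = rng.permutation(n)
train_1, test_1 = idx[:int(0.8 * n)], idx[int(0.8 * n):]

# Scenario 2: fivefold cross-validation; each fold serves once as the test set.
folds = np.array_split(rng.permutation(n), 5)

# Scenario 3: train on all sessions but one; the held-out session is the test set.
held_out = 3
test_3 = np.flatnonzero(session == held_out)
train_3 = np.flatnonzero(session != held_out)
```

In scenario 3 the model never sees the held-out session during training, which probes how well it generalizes across lab sessions rather than across randomly shuffled records.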
10226 Arabian Journal for Science and Engineering (2022) 47:10225–10243
outside the classroom, to name but a few [2]. Though most of the performance prediction work tends to focus on previous exam scores, very little work seems to target the analysis of data wherein student interactions with online systems are logged and analyzed [3]. Using past scores has two disadvantages: first, it predicts performance only in the long term, such as predicting a student’s performance in junior or sophomore year courses based on exam grades and course grades from the freshman or sophomore year; second, even if grades of major 1 and major 2 exams are used to predict performance in the current course, it is then too far into the semester for any corrective actions to be taken to help the student succeed. Therefore, in the work introduced herein, we propose focusing on performance prediction problems while exploring data describing students’ online interactions during online exam sessions. We experiment with a dataset that has been collected using the digital electronics education and design suite (DEEDS), a simulation software used for e-learning in a Computer Engineering course (Digital Electronics) [1]. The DEEDS platform logs student activities and actions during exams (such as viewing/studying exam content, working on a specific exercise, using a text editor or IDE, or viewing exam-related material). Our literature survey indicates that this DEEDS dataset was explored twice, in [1] and [4], wherein the authors attempted to predict student performance based on exam complexity and to predict exam difficulty based on student activities from prior sessions, respectively.

The aim of this research work is to build a prediction model based on newly extracted statistical features, aimed at predicting students’ performance from their online activities. To build and refine the model, we proceeded as follows. Initially, we proposed new features that were categorized into three broad categories based on different criteria: (1) activity-type count-based, (2) timing statistics-based, and (3) peripheral activity count-based, resulting in a total of 86 features. We then further improved the model by reducing the feature set and keeping only the most influential (significant) features using an entropy-based feature selection method [35, 36]. The proposed model was then evaluated and compared with existing similar research work addressing the same problem and using the same DEEDS dataset. We show that our proposed model outperforms existing ones in terms of classification accuracy.

The key contributions of this research work can be summarized as follows.

• Categorization of student academic performance-related features in existing datasets.
• Statistical analysis of the DEEDS dataset, which supported the feature extraction process.
• The design of a student performance prediction model based on the extraction of a set of statistical features categorized into three broad categories: (1) activity-type based, (2) timing statistics-based, and (3) peripheral activity count-based. These features undergo an extraction phase followed by a feature selection phase using an entropy-based selection method.
• Performance evaluation of the proposed model by considering the following classifiers: random forest (RF), support vector machine (SVM), Naïve Bayes (NB), logistic regression (LR), and multilayer perceptron (MLP).
• Comparative performance analysis between our proposed model and some of the existing published research proposing students’ performance prediction models using the DEEDS dataset [4, 30, 31].

The rest of the paper is organized as follows. Section 2 presents the background of the student performance prediction domain: its importance, applicable prediction metrics, dataset categorization, and an overview of prediction models. Section 3 describes our proposed approach considering student engagement data, wherein the DEEDS dataset is presented along with the feature extraction process. Section 4 details the model performance in terms of prediction accuracy, and then a comparative study is presented in relation to the existing work. Finally, in Sect. 5, conclusions are drawn, and future research directions are suggested.

2 Student Performance Prediction: Overview

This section starts with a brief background about the importance of student performance prediction. Then it overviews the performance prediction targets in terms of prediction goals. Next, it surveys and categorizes the set of features commonly considered in most of the datasets used during the prediction process. Finally, it overviews the prominent approaches and models being used along with their achieved performance.

2.1 Importance of Student Performance Prediction

The problem of predicting student performance has been extensively studied by the research community as part of the learning analytics topic due to its importance for many academic disciplines [2, 3, 5–7]. Based on the goal of the performance prediction model, the benefits may include the following: (1) improved planning and accurate adjustments in education management strategies to yield enhanced attainment rates in program learning outcomes [8]; (2) the ability to identify, track, and improve student learning outcomes and their impact on classroom activities. For instance, prediction models could be tuned to classify student performance as low,
average, or high. Based on the classification results, concerted measures may be taken by the education managers to support the low-performing students [7]; (3) proposing new, formative learning approaches for the students based on their predicted performance [8]. For instance, students may be advised to adopt different learning strategies, such as placing more emphasis on the practical aspects of the course material; (4) allocating resources to students based on their predicted performance. For instance, the identification and prediction of high-performing students will help institutions estimate the number of awarded scholarships [9]; (5) minimizing student dropout rates, which are considered a resource black hole that impacts graduation rates, quality, and even institutional ranking [10].

2.2 What to Predict?

Student performance prediction models have targeted several metrics, which are both quantitative and qualitative in nature. The amount of research work to predict quantitative metrics outweighs that for qualitative metrics [8]. Qualitative metrics have mainly focused on Pass/Fail or letter-grade classifications of students in particular courses [11], overall student assessment prediction in terms of high/average/low (an assessment that could be performed per course, major, topic, etc. [3]), student knowledge accomplishment levels (First/Second/Third/Fail) [4, 12], or classifying students into low/medium/high risk [7]. By contrast, quantitative metrics have mainly attempted to predict scores or course/exam/assignment grades [5], ranges of course/exam/assignment grades [6], major dropout/retention rates [10], the time needed for exam completion, on-time/delayed graduation, and student engagement as well [13].

2.3 Dataset Features Categorizations

Most of the datasets that have been used to machine-learn student performance have considered historic data that can be categorized into three broad categories based on the attribute types [2]: (1) student historic performance attributes, (2) student demographic attributes, and (3) student learning platform interaction (engagement) attributes. These categorizations were further extended into a more comprehensive classification of features to include two more categories [3], namely: (4) personality, to better describe the subject’s capability and ability (such as efficacy, commitment, efficiency, etc.), and (5) institutional, to better describe the teaching methods, strategies, and qualities [14]. We have surveyed several datasets that have been considered for student performance prediction studies and have summarized them in Table 1, where we list the five categories along with most of the common and relevant features being used in each of these five categories. Table 1 also includes the stated aim of the prediction study per category.

2.4 Related Work

Several methods and approaches were considered in predicting student performance; most of these approaches are statistical in nature and designed for machine learning (ML) models. The models attempt to estimate an inherent correlation between input variables and identify patterns within the input data. Following our review of most of the existing datasets, their attributes can be classified under one of five categories, namely: (1) student historic performance attributes, (2) student demographic attributes, (3) student learning platform interaction attributes, (4) personality attributes, and (5) institutional attributes, as detailed in Table 1.

Among the two existing types of ML models, supervised learning techniques are a better fit for handling classification and regression problems and were more widely used for the student prediction problem than unsupervised learning techniques. Classification approaches attempt to classify entities into known classes, which are two in the case of binary classification (for example, classifying students into passing or failing classes) or more in the case of multinomial classification. In regression-based approaches, on the other hand, the model attempts to predict a continuous value (for instance, the final exam score, which could be a real number between 0 and 100). This makes regression techniques more challenging than classification problems.

In the context of student performance prediction, several supervised learning models were used while considering datasets with each of the five feature categories as well as their combinations and targeting the prediction of specific performance features (as specified in Table 1). For instance, the authors in [2] have studied the impact of each of three categories of features (student engagement, demographics, and performance data) on predicting student performance, using binary classification-based models to predict at-risk students and regression techniques to predict student scores. They also studied the prediction performance at different time instances before the final exam. The analysis was performed on a public open university learning analytics dataset (OULAD) while using support vector machine (SVM), decision tree (DT), artificial neural network (ANN), Naïve Bayes (NB), K-nearest neighbor (K-NN), and logistic regression (LR) models for classification, and SVM, ANN, DT, Bayesian regression (BN), K-NN, and linear regression models for regression analysis. For the classification task, better performance results were obtained with ANN while considering engagement and exam score data (F1-score ~ 96%). The same performance was also
Table 1 (excerpt) Feature categories, sub-categories, features, and targeted/predicted attributes

Feature category | Sub-categories | Features | Targeted/predicted attributes
Historic performance [2] | Pre-course performance; course performance; feeling based | Exam/assignment/project/lab/quiz/seminar results; previous semester performance; admission score; cumulative GPA; topic performance; average course grade; course Pass/Fail | Course/program dropout and retention; assessment grades; GPA range; actual GPA; course Pass/Fail; course letter grade; course grade; assignment grade
Institutional [14] | Teaching strategy; institution type | University education; high school; pedagogical mode; teaching mode; formal/informal education; training mode; type of material available; deep/shallow learning approach | Course/program dropout and retention; at-risk students
obtained for the regression analysis task, where ANN outperformed other algorithms when the model was fed more historic assessment scores (RMSE ~ 14.59).

The authors in [5] have focused on predicting student performance based on historic exam grades and in-progress course exam grades. The goal of such a study is to identify, per area and per subject, the “at-risk” students and hence provide real-time feedback about their current status. Such feedback would drive the appropriate remedial strategies to support these students and eventually help improve retention rates. The authors conducted their studies on 335 students and 6358 assessment grades while considering 68 subjects categorized into 7 different knowledge areas. The prediction model used the decision tree algorithm to classify students into passing and failing categories. In their effort to identify the most influential variable, the authors ran their model while considering all possible combinations of final grades and their weighted partial (1 to 3) ones. The best model accuracy (96.5%) was reported when all partial grades were included in the prediction.

In a different study, Huang and Fang [7] focused on predicting student performance in a specific field: engineering dynamics. The authors conducted their research while considering four predictive models (multiple linear regression (MLR), multilayer perceptron network, radial basis function network, and a support vector machine-based model). Their study aimed to identify the best prediction model and the most influential variable among a set of six considered ones, leading to a more accurate prediction model. The dataset considered in this study included only the historic performance type of data; it was a collection of 323 students over 4 semesters and included nine different grades from dynamics and pre-requisite dynamics courses. Results showed that the type of mathematical model being used had little impact on the prediction accuracy. Regarding the most influential variables, results showed that they vary based on the goal of the instructor in terms of what to predict (average performance of the entire dynamics class or individual student performance). The best accuracy results were achieved with MLR (reaching 89.7%). Similar to the model proposed in [5], this model considered only one category of student data (historic performance-related data) and did not include other categories such as engagement or demographic data.

As online teaching is gaining more and more popularity, it is becoming necessary for all schools to provide e-learning options to their students, especially after the COVID-19 pandemic. Many research works have studied and evaluated the performance of such learning in a virtual environment. For instance, Hussain et al. [15] have focused on studying the impact of student engagement in a virtual learning environment on performance in terms of attained exam scores. This study considered various variables including demographics, assessment scores, and student-system engagements such as the number of clicks that a student executes to access certain URLs, resources, homepages, etc. The authors considered several machine learning classification algorithms to predict low-engagement instances, such as decision trees (DT), JRIP, J48, gradient-boosted tree (GBT), CART, and Naïve Bayes. The best performance in terms of predicting students with low engagement was obtained with the first four algorithms (topping 88.5%). These results also identified the best predictive variables, in terms of the number of clicks students executed per activity, for the identification of low-engagement students.

In the same context, Vahdat et al. [1] aimed to study the impact of student behavior during online assessment on the scores obtained. The authors used complexity matrix and process mining techniques to identify any correlation between attained scores and student engagement. The dataset was collected using the digital electronics education and design suite (DEEDS), a simulation software used for e-learning in a Computer Engineering course, namely Digital Electronics. Analysis has shown that: (1) the complexity matrix and student scores are positively correlated, and (2) the complexity matrix and session difficulties are negatively correlated. Additionally, the authors demonstrated that the proposed process discovery could provide useful information about student learning practices.

In a different study, Elbadrawy et al. [16] built a model to accurately predict student grades based on several types of features (past grade performance, course characteristics, and student interaction with the online learning management system, i.e., student engagement). The built model relies on a weighted sum of collaborative multi-regression-based models, which were able to improve the prediction accuracy by over 20%.

Along the same directions as in [16], Liu and d’Aquin [17] attempted to predict student performance based on two categories of features: demographics and student engagement with the online learning system. They applied supervised learning-based algorithms on the Open University Learning Analytics dataset [18] and investigated the relationship between demographic features and the achieved performance. Analysis has shown that the best-performing students were those who had acquired a higher education level and were residing in the most privileged areas.

Hussain et al. in [4] investigated predicting difficulties that students may face during Digital Electronics lab sessions based on previous student activities. They identified the best predicting model, as well as the most influential features. The authors considered only the engagement type of data using the digital electronics education and design suite (DEEDS) simulator [1]. They conducted their study considering the following five features: average time, average idle time, the total number of activities, total related activity, and the average number of keystrokes. Five classification algorithms were explored: support vector machines (SVMs), artificial neural networks (ANNs), Naïve Bayes classifiers, logistic regression, and decision trees. While considering fivefold cross-validation and random data division, the best accuracy results were obtained with the ANN- and SVM-based models (75%). This performance was later improved and reached 80% when applying the Alpha Investing technique to the SVM-based model.

The DEEDS dataset has also been used in other research work where researchers attempted to predict the performance of students and the difficulties they face by analyzing students’ behavior during interactive online learning sessions. For instance, in [30], the authors attempted to predict student performance using the DEEDS dataset by considering regression models. The input variables of the model were the students’ current study for all sessions, while the model output is the student’s grade for a specific session. Among the three models used (linear regression, artificial neural networks, and support vector machine (SVM)), SVM performed best and achieved an accuracy of 95%.

In a different research work, the authors in [31] also considered the DEEDS dataset to perform a comparative analysis using various machine learning models, namely artificial neural network, logistic regression, decision tree, support vector machine, and Naïve Bayes. In their study, the authors extracted and considered five different types of features, including average time, total activities, average mouse clicks, related activities in an exercise, average keystrokes, and average idle time per exercise. In this study, SVM performed best compared to the rest of the models and achieved an accuracy of 94%.

In [32], the authors considered different datasets in their attempt to predict the performance of students in two different courses (namely Mathematics and Portuguese language). The datasets have a total of 33 attributes each and a total of 396 and 649 records, respectively. The authors applied two models, support vector machines and random forest, to classify passing students from failing ones. Both datasets comprise 16 features, including historic performance, demographic data, and personality types of features. Experimental results showed an accuracy of more than 91% and that the historic features were most influential during classification. It is also worth noting that random forest performed better in the case of the larger dataset (Portuguese language).

The authors in [33] proposed a model to predict students’ performance in higher educational institutions using video learning analytics and data mining techniques. A sample of 722 students was considered in the collection of the dataset. Among the five categories highlighted in Table 1, only the historic performance, engagement, personality, and institutional categories were used. Out of the eight algorithms used, RF performed best and achieved an accuracy of 88% following the feature reduction step.

In another study [34], the authors used artificial neural networks (ANN) to predict the performance of students in an online environment. The dataset was collected with the participation of 3518 students. Out of the five categories of features described in Table 1, only the historic performance and learning platform interaction categories were considered. Results showed that the ANN-based model was able to predict the students’ performance with an accuracy of 80%.

A summary of the prediction of student performance using machine learning-based models is presented in Table 2. It is worth noting that the last entry in Table 2 captures the performance achieved with the proposed model in predicting students’ performance while considering student engagement and historic types of features.

Our proposed research explores the area of predicting student performance in an e-learning environment, which is gaining popularity, especially after the COVID-19 pandemic. We propose exploring the DEEDS dataset, which has, to the best of our knowledge, been studied twice, by Vahdat et al. [1] and Hussain et al. [4]. DEEDS is a technology learning environment platform that logs the real-time interactions of students during classwork as well as exam performance in terms of grades; these logs were collected in six different interactive sessions. We intend to conduct a statistical study on the DEEDS dataset followed by the design of a new prediction model based on a new set of statistical features to predict student performance using their interaction logs registered by DEEDS. We propose assessing our model’s performance using five different types of classifiers in different experimental setups. We will also compare the achieved performance in terms of accuracy and F1-score to that reported by Hussain et al. [4].

3 Proposed Methodology

The proposed method aiming to classify student performance follows a typical machine learning classification approach. Initially, we start with a statistical analysis and feature engineering of the DEEDS dataset, which resulted in the reduction of DEEDS activity features from 15 to 9 activities. This reduction (as will be described in Sect. 3.2) consists of aggregating semantically similar activities into a single category. The next step of this process is the data pre-processing phase, where entries showing some discrepancy were discarded, and DEEDS logs with no corresponding lab or final exam grades were also excluded. This is further explained in Sect. 3.3. The next step is the feature extraction phase, which resulted in extracting three broad categories of features: (1) activity-type count-based, (2) timing statistics-based, and (3) peripheral activity count-based.
Table 2 Summary of student performance prediction studies using ML-based models

Ref. | Dataset | Feature categories | Models | Best performance
[2] | Open University Learning Analytics (OULAD) dataset, 3166 students | Student engagement, demographics, historic performance | Support vector machine (SVM), decision tree (DT), artificial neural network (ANN), Naïve Bayes (NB), K-nearest neighbor (K-NN), logistic regression (LR) | ANN (96% F1-score)
[5] | Ecuador University, 335 students | Historic performance | Decision tree | 96.5% accuracy
[7] | Engineering dynamics course, 323 students | Historic performance | Multiple linear regression (MLR), multilayer perceptron (MLP) network, radial basis function (RBF) network, SVM | MLR (89.7% accuracy)
[15] | OU University, United Kingdom, 384 students | Demographics, assessment scores, student-system engagement | DT, JRIP, J48, gradient-boosted tree (GBT), CART, Naïve Bayes | J48 (88.5% accuracy)
[16] | University of Minnesota’s Moodle, 11,556 students | Student engagement, historic performance | Multi-regression | 0.145 RMSE
[4] | DEEDS, 115 students | Student engagement, historic performance | SVMs, ANNs, Naïve Bayes classifiers, logistic regression, decision trees | SVM (80% accuracy)
[30] | DEEDS, 115 students | Student engagement, historic performance | Linear regression, ANN, SVM | SVM (95% accuracy)
[31] | DEEDS, 115 students | Student engagement, historic performance | ANN, logistic regression, decision tree, SVM, Naïve Bayes | SVM (94.8% accuracy)
[32] | Mathematics dataset, 696 records | Historic performance, demographic data, personality | SVM, random forest | SVM (92% accuracy)
[32] | Portuguese dataset, 649 records | Historic performance, demographic data, personality | SVM, random forest | RF (94% accuracy)
[33] | 722 students | Historic performance, engagement, personality, institutional | Neural network, KNN, LR, CN2 rule inducer, Naïve Bayes, SVM | RF (88% accuracy)
[34] | 3518 students | Historic performance, student engagement | Artificial neural network (ANN) | 80% accuracy
Proposed work | DEEDS, 115 students | Student engagement, historic performance | Random forest (RF), SVM, Naïve Bayes (NB), logistic regression (LR), multilayer perceptron (MLP) | 97.4% accuracy
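Several of the surveyed studies, as well as the proposed model, reduce the feature set before training. As a rough sketch (not the paper's code), an entropy-based score such as information gain, one common form of the entropy-based selection criterion cited as [35, 36], can be computed as follows; the toy arrays are invented for illustration:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a discrete label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(feature, labels):
    """Entropy reduction of `labels` after splitting on discrete `feature` values."""
    gain = entropy(labels)
    for v in np.unique(feature):
        mask = feature == v
        gain -= mask.mean() * entropy(labels[mask])
    return gain

# Toy example: y is the low/high performance class.
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
informative = np.array([0, 0, 0, 0, 1, 1, 1, 1])   # mirrors the class
noisy       = np.array([0, 1, 0, 1, 0, 1, 0, 1])   # independent of the class

# Keep only features whose gain exceeds a threshold.
scores = {"informative": information_gain(informative, y),
          "noisy": information_gain(noisy, y)}
selected = [name for name, s in scores.items() if s > 0.1]
```

Here the perfectly class-aligned feature scores a full bit of gain and is retained, while the independent one scores zero and is dropped; a real pipeline would discretize the continuous statistical features before scoring them.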
Year: 2015
Topic: Digital Electronics (Computer Engineering)
Performance metrics: session grades, final exam grades
Dataset location: https://sites.google.com/site/learninganalyticsforall/data-sets/epm-dataset
Table 4 Dataset features description (excerpt)

Order | Feature | Description
11 | Mouse click right | Number of right mouse clicks during activity
12 | Mouse movement | Distance covered by mouse
13 | Keystrokes | Keystrokes hit during activity

Table 5 Statistical analysis of the DEEDS dataset (excerpt)

DEEDS details | Statistics
Freq. of non-zero entries for “Keystrokes” | 33,706 (14%)

Each log entry is a collection of 13 comma-separated features arranged in the order indicated in Table 4. Figure 2 shows a snapshot of the comma-separated log file corresponding to student ID 21, collected during the 4th session.

3.2 Statistical Analysis of the DEEDS Dataset

As a result of the preliminary data pre-processing and metadata analysis, we are able to better describe the dataset, as shown in Table 5. Though 115 students were expected to participate in this experiment and eventually take the exams, 7 students ended up not showing up in all sessions, and an average of 86 students were registered per lab session. Our dataset included more than 230,000 entries, which were uniformly distributed across all sessions, as shown in Fig. 3: sessions 1 through 5 each included 13–17% of the entire dataset, except for the last session, which included 23% of all data.

Statistical analysis has also shown that some dataset features logged fewer activities, and most of the registered values were “0”. For the “number of mouse wheel” feature, 90.7% of the logged values were zeros; for the “number of mouse wheel click” feature, 99.9% of the logged values were zeros; for the “number of mouse wheel click right” feature, 95% of the logged values were zeros; and 86% of logged “keystrokes” were zeros. On the other hand, the “mouse wheel click left” and “mouse movement” features were well distributed across ranges of 0 to 1096 and 0 to 85,945, respectively.

Our next analysis focused on the representation of each of the activities in the entire dataset. Some activities did not have sufficient representation throughout the entire dataset and were represented with less than 1% occurrence, such as “Text Editor no exercise” (0.02%), “Deeds no exercise” (0.2%), “Deeds other activity” (0.4%), and “FSM related” (0.1%). This is in comparison with the rest of the activities, which had representation rates between 7.5 and 16.3% of the entire dataset.
Fig. 2 Raw data snapshot 4, 21, Es_4_1, Study_Es_4_1, 13.11.2014 11:8:17, 13.11.2014 11:12:51, 35205806, 5, 0, 2, 0, 94, 0
4, 21, Es_4_5, TextEditor_Es_4_5, 13.11.2014 11:50:49, 13.11.2014 11:50:57, 5986, 0, 0, 8, 0, 257, 6
4, 21, Es_4_5, Diagram, 13.11.2014 11:51:15, 13.11.2014 11:54:8, 89410, 0, 0, 149, 0, 6088, 0
4, 21, Es_4_5, Aulaweb, 13.11.2014 12:44:51, 13.11.2014 12:47:27, 10947165, 103, 0, 2, 0, 138, 0
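Each raw log row in Fig. 2 can be split back into the 13 fields of Table 4. A minimal parsing sketch follows; the field names are our paraphrase of Table 4's ordering, not the dataset's exact headers, and the placement of "idle_time" as the seventh field is an assumption:

```python
# Parse one raw DEEDS log line (Fig. 2) into the 13 comma-separated fields.
# Field names are illustrative paraphrases of Table 4, not official headers.
from datetime import datetime

FIELDS = ["session", "student_id", "exercise", "activity",
          "start_time", "end_time", "idle_time", "mouse_wheel",
          "mouse_wheel_click", "mouse_click_left", "mouse_click_right",
          "mouse_movement", "keystrokes"]

def parse_log_line(line: str) -> dict:
    values = [v.strip() for v in line.split(",")]
    record = dict(zip(FIELDS, values))
    # Timestamps in Fig. 2 use day.month.year hour:minute:second.
    for key in ("start_time", "end_time"):
        record[key] = datetime.strptime(record[key], "%d.%m.%Y %H:%M:%S")
    return record
```

Applied to the first row of Fig. 2, this yields student 21's "Study_Es_4_1" activity with its start and end timestamps and peripheral counters.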
10234 Arabian Journal for Science and Engineering (2022) 47:10225–10243
[Figure: representation rate of each activity type in the dataset; shares shown range from 0.02 to 14.60%]
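The zero-entry statistics reported in Sect. 3.2 reduce to a per-column fraction over the log table; a sketch with pandas (column names and values here are illustrative, not the actual DEEDS log):

```python
# Fraction of "0" entries per logged feature, as analyzed in Sect. 3.2.
import pandas as pd

def zero_fraction(df: pd.DataFrame) -> pd.Series:
    """Per-column share of entries equal to zero."""
    return (df == 0).mean()

# Illustrative mini-log; the real analysis runs over the full DEEDS table.
log = pd.DataFrame({
    "mouse_wheel": [0, 0, 0, 5],
    "keystrokes": [0, 12, 0, 7],
})
```

Running this over the real DEEDS columns is how figures such as 90.7% zeros for "number of mouse wheel" are obtained.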
3.3 Data Pre-processing

Pre-processing is a typical first step in the data classification and pattern recognition process. During this phase,

As stated in our dataset description, DEEDS defines 15 types of activities. Based on our statistical analysis of the log count distribution of each of these activities (Fig. 3), along with their significance, we propose reducing this list
[Figure: log-count distribution of the nine retained activity types; shares between 4 and 18%]
For the next 9 features (F37 through F45), we track the occurrence count of each of the 9 activities across all exercises. These are captured in equation set 2. F37, for instance, represents the occurrence count of the "Editing" activity in all 4 exercises:

F37 = Σ_{e=1→4} a_{e,1};  F38 = Σ_{e=1→4} a_{e,2};  …;  F45 = Σ_{e=1→4} a_{e,9}    (2)

The next 4 features track the occurrence count of all 9 activities together in each of the 4 exercises. These are captured in equation set 3. F46, for instance, represents the aggregated occurrence count of all 9 activities in exercise 1:

F46 = Σ_{i=1→9} a_{1,i};  F47 = Σ_{i=1→9} a_{2,i};  F48 = Σ_{i=1→9} a_{3,i};  F49 = Σ_{i=1→9} a_{4,i}    (3)

The final feature in this activity-count-based category tracks the aggregated sum of occurrences of all activities across all exercises. This is captured in Eq. 4:

F50 = Σ_{i=1→9} Σ_{e=1→4} a_{e,i}    (4)

3.4.2 Timing Statistics Based Features

This category of features captures the timing performance of a student while working on the exercises. We used 2 sets of timing features of 4 features each. The first set, {F51, F52, F53, F54}, describes the time spent in each exercise. This is calculated by taking the difference between the maximum end time and the minimum start time for each of the 4 exercises, as indicated in Eq. 5. F51, for instance, corresponds to the period of time spent by a student in exercise 1:

F_{j=51→54} = Tmax_e − Tmin_e, where e = 1 → 4    (5)

The next set of 4 features (F55, F56, F57, F58) describes the total idle time registered in each of the 4 exercises. F55, for instance, corresponds to the total idle time registered in exercise 1.

3.4.3 Peripheral Activity Count Based Features

The peripheral activities are {MouseWheelCount, MouseWheelClickCount, MouseClickLeftCount, MouseClickRightCount, MouseMovementCount, KeystrokeCount}: five mouse-related activities along with the single keyboard activity KeystrokeCount. We used 24 new features to track the total occurrence count of each of these 6 peripheral-related activities in each of the 4 exercises.

Each of these 6 peripheral activities is mapped to its order of appearance in the Peripheral_Activity_type set. For instance, peripheral activity 3 represents "Mouse Click Left Count" and activity 6 represents "Keystroke Count". Similar to the case of the activity-based features, we introduce the following peripheral activity matrix, which captures statistics about student interactions using the computer peripherals in all 4 exercises per session:

    ⎡ p_{1,1}  p_{1,2}  p_{1,3}  …  p_{1,6} ⎤
P = ⎢ p_{2,1}  p_{2,2}  p_{2,3}  …  p_{2,6} ⎥
    ⎢ p_{3,1}  p_{3,2}  p_{3,3}  …  p_{3,6} ⎥
    ⎣ p_{4,1}  p_{4,2}  p_{4,3}  …  p_{4,6} ⎦

In matrix P, which is a 4-by-6 matrix, element p_{i,j} represents the occurrence count of peripheral activity j in exercise i. For instance, p_{1,3} represents the occurrence count of peripheral activity 3 ("Mouse Click Left Count") in exercise 1.

Each element of the peripheral activity matrix maps to a different feature, which will be considered during the classification phase. These constitute a set of 24 new features (F59 through F82). F59, for instance, describes the total occurrence count of mouse wheel events in exercise 1.

The next set of features within this category defines 4 more features to describe the level of utilization of the computer peripherals in each of the 4 exercises. These are described in equation set 7. F83, for instance, specifies the total occurrence count of the 5 mouse activities and the keystroke activity in exercise 1:

F83 = Σ_{k=1→6} p_{1,k};  F84 = Σ_{k=1→6} p_{2,k};  F85 = Σ_{k=1→6} p_{3,k};  F86 = Σ_{k=1→6} p_{4,k}    (7)
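The count-based features above (Eqs. 2–4 and 7) are simple row, column, and grand sums over the activity and peripheral matrices. A NumPy sketch, in which the matrix layout follows the text but the function name is ours:

```python
# Activity-count and peripheral-count features as sums over the two matrices
# described in the text: A[e, i] = count of activity i in exercise e (4x9),
# P[e, k] = count of peripheral activity k in exercise e (4x6).
import numpy as np

def count_features(A: np.ndarray, P: np.ndarray) -> np.ndarray:
    f37_45 = A.sum(axis=0)   # Eq. 2: each activity summed over the 4 exercises
    f46_49 = A.sum(axis=1)   # Eq. 3: each exercise summed over the 9 activities
    f50 = A.sum()            # Eq. 4: grand total of all activity counts
    f59_82 = P.ravel()       # one feature per matrix element p_{i,j}
    f83_86 = P.sum(axis=1)   # Eq. 7: peripheral utilization per exercise
    return np.concatenate([f37_45, f46_49, [f50], f59_82, f83_86])
```

The returned vector holds 9 + 4 + 1 + 24 + 4 = 42 of the count-based features; the timing features (Eq. 5 and the idle times) are computed separately from the timestamps.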
between variables. We propose assessing the performance of our prediction model with the full set of extracted features and with a reduced set obtained after eliminating the non-influential (low-ranked) features.

3.6 Classification Algorithms

Among the many existing classification models, we have evaluated our model using five different classifiers. Many factors influence the suitability of a machine learning model, and they may differ from one problem to another: the type of data, the dataset size, the number of features, the data distribution, etc. Also, a model may work best for some problems but not others. For instance, SVM is known to perform well on relatively small datasets, which is the case for DEEDS. In this work, models were chosen based on two criteria: (1) current existing work dealing with the same problem and using the same DEEDS dataset (for instance, SVM, LR, and NB), and (2) covering the broad spectrum of classifier categories: RF as a representative of ensemble classifiers, NB as a probabilistic model, and MLP as a Neural Network-based model. The model selection process also included a step where models that experienced high error rates were eliminated.

Next, a brief description of each of the classification models used in this work is presented.

3.6.1 Random Forest (RF)

RF is an ensemble of Decision Trees bundled together [19]. Training these bundles of trees consists of executing the bagging process on a dataset of N entities: a set of N training samples is drawn with replacement and used to train a decision tree, and this process is repeated T times. The prediction for an unseen entity x' is eventually made through a majority vote of the T trees in the case of classification, or as the average value in the case of regression, as given by Eq. 8:

y = (1/T) Σ_{i=1}^{T} f_i(x')    (8)

where y is the predicted value, x' is the unseen sample, f_i(x') is the prediction of the decision tree trained on the i-th bootstrap sample, and T is the number of trees.

The RF technique has shown its ability to handle large datasets with a large number of attributes while weighing the importance of each of the problem features, and it is robust to noise, outliers, and overfitting [20]. Contrary to other classification techniques, RF relies on a combination of classifiers, each contributing a single vote during the classification process.

3.6.2 Multilayer Perceptron (MLP)

MLP is a supervised learning-based approach [21]. It is based on the concept of the perceptron in Neural Networks, which generates a single output from a multi-dimensional input through a linear (in some cases non-linear) combination of the inputs with their corresponding weights, as follows:

y = α( Σ_{i=1}^{n} w_i x_i + β )    (10)

where w_i are the weights, x_i the input variables, β the bias, and α the non-linear activation function. The MLP is composed of three or more layers of nodes: the input and output layers and one or more hidden layers. The training phase of an MLP consists of adjusting the model parameters (biases and weights) with respect to the prediction error through a back-and-forth mechanism (a feed-forward pass followed by a backward, error-propagation pass).

3.6.3 Support Vector Machine (SVM)

SVM is a supervised ML model for solving classification and regression problems; it has demonstrated efficiency in solving a variety of linear and non-linear problems. The idea of SVM lies in creating a hyperplane that distinctly categorizes the data into classes [22]. SVM works well for multi-domain applications with large datasets; however, the model has a high computational cost.

3.6.4 Naïve Bayes (NB)

NB is a probabilistic algorithm based on Bayes' theorem. It is naïve in the sense that each feature is assumed to make an equal and independent contribution in determining the probability of the target class. NB has the advantage of noise immunity [23]. It is proven to perform well in the case of large, high-dimensional datasets. It is fast (computational-complexity-wise) and relatively easy to implement.

3.6.5 Logistic Regression (LR)

LR is an ML algorithm based on probability concepts used for classification: identifying success and failure events. LR can be considered a Linear Regression model with a more complex cost function, defined as the sigmoid function (compared to a linear function in the case of linear regression) [24].
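For concreteness, the five classifiers of Sect. 3.6 could be instantiated as follows with scikit-learn. The paper does not name its toolkit, and the hyperparameters below are illustrative placeholders, not the authors' tuned values:

```python
# Illustrative scikit-learn setup for the five classifiers discussed above.
# Hyperparameters are placeholder assumptions, not the authors' configuration.
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

classifiers = {
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),  # ensemble vote (Eq. 8)
    "MLP": MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0),
    "SVM": SVC(kernel="rbf", random_state=0),
    "LR": LogisticRegression(max_iter=1000),
    "NB": GaussianNB(),
}
```

Each entry exposes the same fit/predict interface, so the five models can be trained and compared in a single loop over `classifiers.items()`.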
LR has the advantage of being computationally efficient and relatively simple to implement, with good performance on various types of problems. However, its main disadvantage is the assumption of linearity between the independent and dependent variables [25].

4 Results and Analysis

4.1 Experiment Setup

Three sets of experiments, aiming at three different goals, were conducted during this study: (1) Goal_1: evaluate the performance of the proposed model while considering the full set of extracted features; (2) Goal_2: study the importance of each of the extracted features in the classification process using the entropy-based ranking approach; (3) Goal_3: compare the performance of our model to those proposed in [4, 30, 31].

The general experiment setup consists of using DEEDS log data from five different sessions (sessions 2 through 6) along with the corresponding intermediate grades attained by all 115 students. Only the first 4 exercises from each session were considered, resulting in a total dataset size of 575 entries. We proceeded in the same direction as [4] and [31] in terms of data labeling: a student achieving a grade higher than 2 is labeled as a class "A" student (a student with "no difficulty"); otherwise, the student is labeled as a class "B" student (a student with "difficulty"). By adopting this labeling strategy, 74% of our dataset consisted of category A students and 26% of category B students.

For the training and evaluation phase, we considered three sets of experiments. The first experiment consists of a random distribution-based setup where we randomly chose 80% of the data for training (460 records, roughly 4 data sessions) and 20% for testing (115 entries, roughly 1 data session). The second experiment is a more generic approach and consists of the classic fivefold cross-validation (again resulting in 80% of the data for training and 20% for testing in each fold). The third experiment consists of independently assessing the performance of our model per session: 4 sessions' data were used for training (equivalent to 80% of the data) and the remaining one (equivalent to 20%) for testing. In the classification phase, we used five well-known classifiers (MLP, RF, SVM, LR, and NB) and then selected the most accurate one. Our configuration-parameter tuning phase led to running all classifiers with a batch size of 100, a learning rate of 0.3, and a loss of 0.1.

We evaluated the effectiveness of the proposed classification model through the analysis of the following four metrics: (1) accuracy, (2) precision, (3) recall (a.k.a. sensitivity), and (4) F1-score. These are briefly described in Table 6, where Tp, TN, Fp, and FN represent the True Positive, True Negative, False Positive, and False Negative testing cases, respectively. Along with these four metrics, we also considered the Receiver Operating Characteristic (ROC) metric to analyze the proposed model's ability to distinguish between classes by looking at the True Positive rate versus the False Positive rate under different settings.

Table 6 Metrics definitions [26]

Metric | Formula | Description
Accuracy | (Tp + TN) / (Tp + TN + Fp + FN) | Ratio of the sum of correct classifications to the total number of classifications
Precision | Tp / (Tp + Fp) | Proportion of positive test cases that were classified correctly
Recall (sensitivity) | Tp / (Tp + FN) | Fraction of actual positive test cases that were properly classified
F1-score | 2Tp / (2Tp + Fp + FN) | Harmonic mean of precision and recall
ROC | f(TPR, FPR) | A probability curve that indicates how well a model can distinguish between classes

4.2 Model Performance Evaluation

The proposed models were evaluated based on the metrics mentioned in Sect. 4.1. Initially, we tested our model with the full set of extracted features (86); next, we studied the level of influence of each of the 86 features on the overall model accuracy through the application of the entropy-based ranking approach.

4.2.1 Model Performance Analysis

Table 7 shows a summary of the obtained results in terms of averages of the Accuracy, Precision, Recall, F1-score, and ROC metrics.

Table 7 Model average performance (experiment 1: random distribution case)

Classifier | Accuracy | Precision | Recall | F1-score | ROC
MLP | 0.957 | 0.959 | 0.957 | 0.957 | 0.98
RF | 0.974 | 0.974 | 0.974 | 0.974 | 0.988
SVM | 0.948 | 0.948 | 0.948 | 0.946 | 0.908
LR | 0.921 | 0.934 | 0.922 | 0.924 | 0.918
NB | 0.826 | 0.887 | 0.826 | 0.837 | 0.982
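The Table 6 formulas translate directly into code; a small sketch (the function name is ours). With the RF confusion counts later reported in Table 9 (84 of 86 class A and 28 of 29 class B entries correct), it reproduces the 97.4% accuracy of Table 7:

```python
# The four Table 6 metrics computed from raw confusion counts.
def classification_metrics(tp, tn, fp, fn):
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)           # a.k.a. sensitivity
    f1 = 2 * tp / (2 * tp + fp + fn)  # harmonic mean of precision and recall
    return accuracy, precision, recall, f1
```

Here class A is treated as the positive class, so tp = 84, fn = 2, tn = 28, and fp = 1 for the RF example.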
These results were collected following our first set of experiments (randomly choosing 80% of the data for training and the remaining 20% for testing), where the 5 classifiers (MLP, RF, SVM, LR, and NB) were applied. Results show that the RF classifier, an ensemble of tree predictors, achieved the best performance, with 97.4% accuracy and high Recall and Precision values resulting in a high F1-score (97.4%). MLP and SVM (which are known to make good predictions in binary classification problems) also performed well, with accuracies of 95.7% and 94.8% and F1-scores of 95.7% and 94.6%, respectively. On the other hand, NB, as expected, did not perform well due to the nature of our dataset, wherein the features (described in Sect. 3.4) are not completely independent of each other; as a case in point, there is a dependency between the activity-type features and the corresponding timing statistics (in terms of duration). NB achieved relatively low Accuracy and F1-score values (82.6% and 83.7%, respectively).

In line with Table 7, Fig. 7 shows a breakdown of the average performance of all five classifiers in terms of their effectiveness in predicting the correct classes (A for students with no difficulty versus B for students with difficulty). Results demonstrate a pattern of improved prediction rates for class A entries versus class B entries across all classifiers. For example, RF achieved a 98.8% F1-score for class A entries compared to 95% for class B entries. By contrast, NB consistently showed low performance, especially for class A entries, where a 73.7% F1-score was attained.

4.2.2 Features Relevance Analysis

In this set of experiments, we studied the relevance and level of influence of the extracted features through the application of the entropy-based ranking approach. Feature ranking results are captured in Table 8. Results show that about 20% of the total features received a low ranking (less than 0.21), indicating that these features may not influence the performance of the classification model. In fact, a re-run of our prediction model after excluding these 18 features from our original dataset resulted in very similar accuracy performance for all classifiers. For instance, for our best performing classifier (the RF algorithm), in the case of a random distribution of training and test data (randomly choosing 80% of the data for training and the remaining 20% for testing), we achieved an accuracy of 96.7% compared to 97.4% when the full list of features was considered. This insignificant variation in accuracy could be attributed to the size of our dataset. It is worth highlighting that, though the achieved accuracy was nearly the same, running the model with a reduced number of features contributed to reducing the overall complexity of the model.

Our next results are depicted in Table 9, where we show the corresponding confusion matrices following the execution of the MLP, RF, SVM, LR, and NB classifiers. The results are in line with those presented in Fig. 7. For class A entries, and when considering the SVM classifier, Table 9 shows that 85 out of 86 were classified properly, which is slightly better than the performance of RF, where 84 out of 86 were classified correctly. However, RF showed better performance than SVM when classifying class B entries (28 out of 29 versus 24 out of 29). It is noteworthy that Table 9 also shows that we are in possession of unbalanced data, where class A entries outnumber class B entries (86 versus 29, respectively).

In contrast to the results reported in Table 7, where 80% of the data was chosen randomly for training and the remaining 20% for testing, Table 10 shows the performance achieved by all five classifiers under fivefold cross-validation, a more conservative approach. The results reported in Table 10 are generally in line with those reported in Table 7 in terms of classifier performance. The RF classifier, for instance, performed best in terms of Accuracy and F1-score (93.37% and 95.4%, respectively), and NB continues to show the lowest performance. It is also noticeable that there was a slight decrease of about 4% in the overall performance reported in Table 10 compared to Table 7, which could be attributed to the random nature of the fivefold cross-validation technique.

Our next set of results aims at studying the performance of our model in a setup where four sessions' data are used for training and the remaining unseen session's data are used for testing.
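This per-session experiment is a leave-one-group-out split with the session ID as the group label. A sketch with scikit-learn follows; the classifier choice and function name are our assumptions, not the authors' code:

```python
# Train on four sessions, test on the held-out one, for each session in turn.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneGroupOut

def per_session_accuracy(X, y, sessions):
    """Accuracy per held-out session, keyed by the session ID left out."""
    scores = {}
    for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=sessions):
        clf = RandomForestClassifier(n_estimators=100, random_state=0)
        clf.fit(X[train_idx], y[train_idx])
        held_out = sessions[test_idx][0]  # single session in the test fold
        scores[held_out] = clf.score(X[test_idx], y[test_idx])
    return scores
```

With five sessions, each split uses roughly 80% of the rows for training and 20% for testing, matching the setup described above.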
Table 10 Model average performance (experiment 2: fivefold cross-validation)

Classifier | Accuracy | Precision | Recall | F1-score | ROC
MLP | 0.9304 | 0.954 | 0.948 | 0.950 | 0.935
RF | 0.9337 | 0.950 | 0.958 | 0.954 | 0.966
SVM | 0.8990 | 0.896 | 0.970 | 0.932 | 0.847
LR | 0.8936 | 0.943 | 0.904 | 0.923 | 0.898
NB | 0.8222 | 0.946 | 0.796 | 0.863 | 0.920

These results are captured in Table 11, which shows the outcome of five different experiments. For instance, the set of results for "session ID for testing" equal to 2 represents the performance of the model trained on sessions 1, 3, 4, and 5 and then tested on session 2. NB had the poorest performance (between 85 and 88% F1-score), followed by LR, then MLP, then SVM, then RF (between 94 and 97% F1-score). However, the difference in performance on specific metrics varied from one session to another. This was manifested in the Recall and Accuracy metrics, where the variations reached 20% in session 1 and 14% in sessions 1 and 5, respectively.

4.2.3 Comparative Analysis

The final set of results compares the performance of our proposed model with those of [4, 30, 31], where the DEEDS dataset has been used to predict students' academic performance under the same experimental setup. Figure 8 illustrates the performance comparison of all four sets of research results (ours against three others from the recent literature, labeled Hussain, Sriram, and Maksud in Fig. 8) in terms of the Accuracy, Precision, Recall, and F1-score achieved by the best performing classifiers (RF in the case of our model, ANN in the case of the model of Hussain et al. [4], and SVM in the case of the models by Sriram et al. and Maksud et al. [30, 31], respectively).
[Fig. 8: performance comparison, in terms of Accuracy, Precision, Recall, and F1-score, of RF with the proposed features against Hussain [4], Sriram [30], and Maksud [31]]
Results show that our proposed model outperformed all three existing models in terms of accuracy, with an improvement ranging between 2 and 22% compared to that achieved in [4, 30, 31]. The F1-score was 12% higher than that achieved in [4] using the ANN classifier and 2% higher than those of the SVM classifiers used in [30, 31]. We believe that such improvement is attributed to the extended set of features introduced by our model compared to the reduced and abstract list of five features per exercise proposed in [4]. While in [4] the authors did not differentiate between the types of activities within a single exercise, our model provisioned for the different types of activities, resulting in 9 distinct features along with the total activity occurrence count per exercise, that is, a total of 10 activity-related features per exercise. Also, contrary to the model in [4], where the interaction of students with DEEDS was captured by only a single feature counting the number of keystrokes, our model took into account the different types of student interactions with DEEDS through the input peripherals (mouse and keyboard), leading to a total of 6 peripheral features per exercise. The extra information provided to the prediction model explains the significant classification performance improvement captured in Fig. 8.

5 Conclusion

In this article, we demonstrated the ability to predict student performance by analyzing the interaction logs of students in the DEEDS dataset. We extracted a total of 86 statistical features, categorized into three main categories based on different criteria: (1) activity-type-based, (2) timing-statistics-based, and (3) peripheral-activity-count-based features. This set of features was further reduced during the feature selection phase, where we applied the entropy-based selection technique and retained only influential features for training purposes. We trained our model considering three different scenarios: (1) an 80:20 random data split for training and testing, (2) fivefold cross-validation, and (3) training the model on all sessions but one, which is used for testing. We then collected performance results in terms of Accuracy, Precision, Recall, F1-score, and ROC using five prominent classifiers (RF, SVM, MLP, LR, and NB). Results showed that the best performance was obtained using the RF classifier, with a classification accuracy of 97% and an F1-score of 97%. The poorest results were achieved with NB, due to its assumption of feature independence, which the proposed features do not fully satisfy. When comparing our model with the benchmark models proposed by Hussain et al. [4], Sriram et al. [30], and Maksud et al. [31], we were able to demonstrate that, under a similar experimental setup, our model outperformed the existing models in terms of classification accuracy and F1-score.

Future work For future work, we propose exploring various research directions as follows:

1. Modify and compare the proposed model against models that consider more sophisticated Machine Learning algorithms for feature extraction and classification, such as Decision Trees, fuzzy entropy-based analysis, and transfer learning.
2. Enhance the prediction model into a multi-label problem aimed at classifying students into four broad categories: (1) very weak, (2) weak, (3) average, and (4) good.
3. Consider proposing a regression model to predict exam grades, rather than classifying students' performance using just the binary classification approach.

Funding Not applicable.

Availability of Data and Materials The dataset used in this study is publicly published at https://sites.google.com/site/learninganalyticsforall/data-sets/epm-dataset.

Declarations

Conflict of interest The author has no conflicts of interest.

References

1. Vahdat, M.; Oneto, L.; Anguita, D.; Funk, M.; Rauterberg, M.: A learning analytics approach to correlate the academic achievements of students with interaction data from an educational simulator. In: Design for Teaching and Learning in a Networked World, pp. 352–366. Springer, Cham (2015)
2. Tomasevic, N.; Gvozdenovic, N.; Vranes, S.: An overview and comparison of supervised data mining techniques for student exam performance prediction. Comput. Educ. 143, 103676 (2020)
3. Hellas, A.; Ihantola, P.; Petersen, A.; Ajanovski, V.; Gutica, M.; Hynninen, T.; Liao, S.N.: Predicting academic performance: a systematic literature review. In: Proceedings Companion of the 23rd Annual ACM Conference on Innovation and Technology in Computer Science Education, pp. 175–199 (2018)
4. Hussain, M.; Zhu, W.; Zhang, W.; Abidi, S.M.R.; Ali, S.: Using machine learning to predict student difficulties from learning session data. Artif. Intell. Rev. 52(1), 381–407 (2019)
5. Buenaño-Fernández, D.; Gil, D.; Luján-Mora, S.: Application of machine learning in predicting performance for computer engineering students: a case study. Sustainability 11(10), 2833 (2019)
6. Ofori, F.; Maina, E.; Gitonga, R.: Using machine learning algorithms to predict students' performance and improve learning outcome: a literature based review. J. Inf. Technol. 4(1), 33–55 (2020)
7. Huang, S.; Fang, N.: Predicting student academic performance in an engineering dynamics course: a comparison of four types of predictive mathematical models. Comput. Educ. 61, 133–145 (2013)
8. Rastrollo-Guerrero, J.L.; Gomez-Pulido, J.A.; Duran-Dominguez, A.: Analyzing and predicting students' performance by means of machine learning: a review. Appl. Sci. 10(3), 1042 (2020)
9. Sundar, P.P.: A comparative study for predicting students' academic performance using Bayesian network classifiers. IOSR J. Eng. (IOSRJEN), e-ISSN: 2250-3021 (2013)
10. Burgos, C.; Campanario, M.L.; de la Peña, D.; Lara, J.A.; Lizcano, D.; Martínez, M.A.: Data mining for modeling students' performance: a tutoring action plan to prevent academic dropout. Comput. Electr. Eng. 66, 541–556 (2018)
11. Ma, X.; Zhou, Z.: Student pass rates prediction using optimized support vector machine and decision tree. In: 2018 IEEE 8th Annual Computing and Communication Workshop and Conference (CCWC), pp. 209–215. IEEE (2018)
12. Masci, C.; Johnes, G.; Agasisti, T.: Student and school performance across countries: a machine learning approach. Eur. J. Oper. Res. 269(3), 1072–1085 (2018)
13. Pardo, A.; Han, F.; Ellis, R.A.: Combining university student self-regulated learning indicators and engagement with online learning events to predict academic performance. IEEE Trans. Learn. Technol. 10(1), 82–92 (2016)
14. Gray, G.; McGuinness, C.; Owende, P.: An application of classification models to predict learner progression in tertiary education. In:
2014 IEEE International Advance Computing Conference (IACC), pp. 549–554. IEEE (2014)
15. Hussain, M.; Zhu, W.; Zhang, W.; Abidi, S.M.R.: Student engagement predictions in an e-learning system and their impact on student course assessment scores. Comput. Intell. Neurosci. (2018)
16. Elbadrawy, A.; Studham, R.S.; Karypis, G.: Collaborative multi-regression models for predicting students' performance in course activities. In: Proceedings of the Fifth International Conference on Learning Analytics and Knowledge, pp. 103–107 (2015)
17. Liu, S.; d'Aquin, M.: Unsupervised learning for understanding student achievement in a distance learning setting. In: 2017 IEEE Global Engineering Education Conference (EDUCON), pp. 1373–1377. IEEE (2017)
18. Kuzilek, J.; Hlosta, M.; Herrmannova, D.; Zdrahal, Z.; Vaclavek, J.; Wolff, A.: OU Analyse: analysing at-risk students at The Open University. Learn. Analyt. Rev. 1–16 (2015)
19. Ho, T.K.: Random decision forests. In: Proceedings of the 3rd International Conference on Document Analysis and Recognition, vol. 1, pp. 278–282. IEEE (1995)
20. Bauer, E.; Kohavi, R.: An empirical comparison of voting classification algorithms: bagging, boosting, and variants. Mach. Learn. 36(1), 105–139 (1999)
21. Latif, G.; Iskandar, D.A.; Alghazo, J.M.; Mohammad, N.: Enhanced MR image classification using hybrid statistical and wavelets features. IEEE Access 7, 9634–9644 (2018)
22. Suthaharan, S.: Machine learning models and algorithms for big data classification. Integr. Ser. Inf. Syst. 36, 1–12 (2016)
23. Misra, S.; Li, H.; He, J.: Machine Learning for Subsurface Characterization. Gulf Professional Publishing, Oxford (2019)
24. Bewick, V.; Cheek, L.; Ball, J.: Statistics review 14: logistic regression. Crit. Care 9(1), 1–7 (2005)
25. Meurer, W.J.; Tolles, J.: Logistic regression diagnostics: understanding how well a model predicts outcomes. JAMA 317(10), 1068–1069 (2017)
26. Rehman, A.; Naz, S.; Razzak, M.I.; Hameed, I.A.: Automatic visual features for writer identification: a deep learning approach. IEEE Access 7, 17149–17157 (2019)
27. Trstenjak, B.; Ðonko, D.: Determining the impact of demographic features in predicting student success in Croatia. In: 2014 37th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), pp. 1222–1227. IEEE (2014)
28. Kursa, M.B.; Rudnicki, W.R.: Feature selection with the Boruta package. J. Stat. Softw. 36(11), 1–13 (2010)
29. Shaw, R.G.; Mitchell-Olds, T.: ANOVA for unbalanced data: an overview. Ecology 74(6), 1638–1645 (1993)
30. Sriram, K.; Chakravarthy, T.; Anastraj, K.: A comparative analysis of student performance prediction using machine learning techniques with DEEDS lab. J. Compos. Theory XII(VIII) (2019)
31. Maksud, M.; Nesar, A.: Machine learning approaches to digital learning performance analysis. Int. J. Comput. Digit. Syst. 10, 2–9 (2020)
32. Leena, H.A.; Ranim, S.A.; Mona, S.A.; Dana, K.A.; Irfan, U.K.; Nida, A.: Predicting student academic performance using support vector machine and random forest. In: 3rd International Conference on Education Technology Management, pp. 100–107 (2020)
33. Hasan, R.; Sellappan, P.; Salman, M.; Ali, A.; Kamal, U.S.; Mian, U.S.: Predicting student performance in higher educational institutions using video learning analytics and data mining techniques. Appl. Sci. 10(11), 3894 (2020)
34. Aydoğdu, Ş.: Predicting student final performance using artificial neural networks in online learning environments. Educ. Inf. Technol. 25(3), 1913–1927 (2020)
35. Biesiada, J.; Włodzisław, D.; Adam, K.; Krystian, M.; Sebastian, P.: Feature ranking methods based on information entropy with Parzen windows. In: International Conference on Research in Electrotechnology and Applied Informatics, vol. 1, p. 1 (2005)
36. Horino, H.; Hirofumi, N.; Elisa, C.A.C.; Toru, H.: Development of an entropy-based feature selection method and analysis of online reviews on real estate. In: IEEE International Conference on Industrial Engineering and Engineering Management (IEEM), pp. 2351–2355. IEEE (2017)