Predicting Student Dropouts in Kebbi

As an essential component of modern life, big data has drawn a lot of interest from practitioners, scholars, and businesses. Given the significance of the education sector, there is a current trend to investigate how big data might be used in this sector to forecast learning outcomes. Student dropout is a significant issue in higher education, affecting both universities and polytechnics.
Volume 8, Issue 10, October 2023 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165

Using Big Data to Determine Potential Dropout of Students in Some Selected Tertiary Institutions in Kebbi State, Nigeria
Bashar Badamasi Lailaba1*, Shamsu Sani2*, Saifullahi Ahmad Tijjani3*, Hassan A4*
1 Department of Sciences, Kebbi State Polytechnic, Dakin-gari, Nigeria
2 Department of Computer Science, Kebbi State Polytechnic, Dakin-gari, Nigeria
3 Department of Computer Science, College of Advance Studies, Yelwa-Yauri, Nigeria
4 Department of Mathematics, Federal University Birnin Kebbi, Nigeria

Abstract:- As an essential component of modern life, big data has drawn a lot of interest from practitioners, scholars, and businesses. Given the significance of the education sector, there is a current trend to investigate how big data might be used in this sector to forecast learning outcomes. Student dropout is a significant issue in higher education, affecting both universities and polytechnics. Time to graduation (TTG), which correlates directly with student dropout, is one of the key measures of university achievement, even though there is no universally accepted way to measure the quality of education (Pineda Lezama, O., & Gómez Dorta, R. 2017). The dropout rate results in losses of millions to billions of dollars at the global and state level. Moreover, because society depends on the contributions of the population with higher education, such as innovation, knowledge production, and scientific discovery, dropping out affects not only the nation's economy and educational quality but also the advancement of society. This study offers a straightforward method for predicting potential dropouts from their academic and demographic traits using fundamental statistical learning techniques. The study will be carried out at a few chosen tertiary institutions in Kebbi State.

Keywords:- Big Data, Demography, Dakin-Gari

I. INTRODUCTION

In recent times, there has been a sharp rise in the number of students admitted to postsecondary schools, accompanied by a steep decline in the number of graduates from these institutions. In higher institutions in Kebbi State, multiple surveys determine dropout rates. Most of them deal with identifying the causes of dropout, counting the number of students who drop out, and developing strategies to lower the rate (Ahuja, R.; Kankane, Y. 2017). In this work, we present two schemes that calculate the likelihood that a student will graduate or drop out: the first is based on the percentage of students who graduate within a specific time frame, which corresponds to the time it takes to receive a diploma or degree; the second simply counts the number of students who drop out of their studies. These studies suggest strategies for early detection of probable dropout students in an effort to decrease dropout rates. Numerous scholars have applied statistical learning methods to investigate course completion or dropout rates in order to address the problem of students leaving school early. Their techniques include, but are not limited to, logistic regression, k-nearest neighbors, decision trees with random forests, Bayesian networks, and neural networks. These studies, however, are somewhat deficient in clarity and interpretability. The methodology proposed in this research aims to strike a compromise between interpretability and precision. Naïve Bayes and k-nearest neighbors are two methods that offer high precision, and decision trees and logistic regression will be the two methods employed in this work to create the models. When these four approaches are used, a compromise solution between precision and comprehensibility will be produced, with the latter being assessed predominantly by the proportion of dropouts that were found (Bucci, et al., 2018). In this effort, we will apply the previously mentioned techniques to datasets gathered from various state organizations and create an early-detection framework for prospective dropouts. There is little research on student dropouts from Nigerian tertiary institutions, particularly in Kebbi State, despite the abundance of results in the literature regarding the factors that have been shown to impact student dropout. This research presents a number of machine learning algorithms to model student dropout using student admission data from 2016–2022, which was gathered from academic databases. It also provides comprehensive data about the number of students who dropped out or completed the course, as well as analyses of the causes of dropping out.

II. LITERATURE REVIEW

Schools that have a higher percentage of graduates tend to have more highly qualified teachers and fewer children from poor backgrounds (Allensworth, 2005; Balfanz, Herzog, & Mac Iver, 2007). Additional factors that also significantly predict high school graduation include being older than average, having subpar grades, and having low attendance. According to Allensworth's research,

IJISRT23OCT1694 www.ijisrt.com 2023


in the past, academics have identified the specific characteristics that influence a person's propensity to drop out using analytical techniques like logistic regression. One of the most notable attempts at this was the creation of a method to categorize ninth-grade pupils in Chicago's public school system as "on track" or "not on track" for graduation.

A risk measure proposed by Neild and Balfanz (2006) accurately predicted eighth graders' high school graduation in 75% of cases. Despite system flaws, risk prediction analyses have led to the widespread implementation of "early warning" systems in US school districts.

One effective method of reducing dropout rates is to forecast future student dropout. In studies on student retention, Tinto's model [2] is most frequently applied. Tinto concluded that a student's decision to continue or discontinue their studies is significantly influenced by their level of academic and social integration at the institution.

After putting the Tinto model to the test in [10], Brunsden et al. concluded that it might not be the ideal option for dropout studies. Durso and Cunha [11] conducted a study to find the explanatory variables for dropout from the undergraduate accounting program at a public institution in Brazil. The survey database contained socioeconomic and demographic information about 371 students. The study's logistic regression model correctly predicted dropout or completion for 77% of the sample. Five semi-structured interviews with sample members who dropped out were used in a qualitative investigation. The findings have improved our understanding of undergraduate dropout from accounting programs and have drawn attention to the need to reassess policies designed to keep talented people in the country, especially those students who work to finance their education.

Kim and Kim [12] conducted a study to examine the possible causes of South Korean university dropout rates. Resources, students, faculty, and university features were the four main areas of focus. They estimated the effects with nonlinear panel data models, using three-year balanced panel data from 2013 to 2015. The findings demonstrated considerable effects of teacher quality and quantity, institution size and type, and the cost and burden on students' financial resources on university dropout. Numerous other studies have also employed data mining techniques to forecast student dropout rates.

Tan and Shao [13] used Artificial Neural Network (ANN), Decision Tree (DT), and Bayesian Network (BN) techniques as prediction models, with the academic achievement and personal traits of the students as input attributes. The outcomes demonstrated that while all three machine learning techniques were successful in predicting student dropout, DT performed better.

In the meantime, Mustafa, Chowdhury, and Kamal [14] created a dynamic dropout prediction model for colleges, universities, and institutes using data mining. Gender, financial status, and year of dropout were used as classification factors to separate successful from unsuccessful students. Following data separation, the factors were examined using Classification and Regression Tree (CART) and CHAID. With the highest percentage of accurate classification overall, CART outperformed CHAID in tree growth.

In a different study, Yukselturk, Ozekes, and Türel [9] studied the data mining techniques used to forecast dropout in an online program. Gender, age, education level, prior online experience, occupation, self-efficacy, readiness, prior knowledge, locus of control, and dropout status were the variables included in this study. Four data mining techniques, k-Nearest Neighbor (k-NN), Decision Tree (DT), Naïve Bayes (NB), and Neural Network (NN), were used to classify students who had dropped out. Every approach was trained and tested using 10-fold cross-validation. The 3-NN, DT, NN, and NB classifiers had detection sensitivities of 87%, 79.7%, 76.8%, and 73.9%, respectively. Using a Genetic Algorithm (GA), the most important factors in predicting dropout were found to be self-efficacy, readiness for online learning, and prior online experience.

Abu-Oda and El-Halees [15] examined a total of 1290 computer science graduates from Al-Aqsa University between 2005 and 2011, using various data mining techniques to analyze and predict student dropout. Various classifiers, including Decision Tree and Naive Bayes, were applied to the data sets and tested using 10-fold cross-validation. The classifiers achieved accuracies of 98.14% and 96.89%, respectively. The underlying links between students' dropout status and persistence in their enrollment were also discovered using the FP-growth algorithm. The findings demonstrated a strong correlation between the "digital design" and "algorithm analysis" courses and student success.

Bharadwaj and Pal [4] used EDM to evaluate the performance of 300 undergraduate computer course participants from five different universities. The results of a senior secondary test, place of residence, different habits, annual family income, and family status were found to be significant predictors of academic achievement in their Bayesian classification of 17 variables.

In order to predict academic achievement, Bharadwaj and Pal [5] created a new data set in a follow-up analysis that includes test, seminar, and assignment marks in addition to student attendance. Kovacic [6] proposed a similar study, using EDM to determine which enrollment information may be used to forecast students' academic achievement.
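Several of the studies reviewed above evaluate their classifiers with 10-fold cross-validation and report detection sensitivity. As a minimal illustration of those two ideas, the Python sketch below splits sample indices into k folds and computes sensitivity as the share of actual dropouts that a classifier flags. The function names `k_fold_indices` and `sensitivity` and the toy labels are assumptions made for illustration; they are not taken from any of the cited studies.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs so that each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding by k gives k folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def sensitivity(y_true, y_pred):
    """Share of actual dropouts (label 1) that were predicted as dropouts."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    false_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    total = true_pos + false_neg
    return true_pos / total if total else 0.0


if __name__ == "__main__":
    for train_idx, test_idx in k_fold_indices(25, k=10):
        print(len(train_idx), len(test_idx))
    # Hypothetical labels: three real dropouts, two of them detected.
    print(sensitivity([1, 1, 1, 0], [1, 0, 1, 1]))
```

Sensitivity is the natural metric here because a missed dropout (a false negative) is the costly error for an early-warning system.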

Kovacic applied the CART and CHAID algorithms to a student enrolment dataset.

In an attempt to improve the quality of higher education, Al-Radaideh et al. [16] evaluated student academic data (student gender, student age, department, high school grade, lecturer degree, and lecturer gender, among others) using a classification model built with the decision tree approach. They found that the feature with the largest gain ratio, the high school graduation rate, was taken as the root node of the decision tree. The Holdout method and the k-fold cross-validation method (k-CV) were used to evaluate the model. However, they found that the collected samples and attributes were not sufficient to generate a high-quality classification model.

In a case study, Gerben et al. [17] predicted student success using machine learning techniques and features extracted from students' pre-university academic records. According to their test results, decision trees, an easy-to-understand and intuitive classifier, yielded practical results with accuracy levels ranging from 75% to 80%. One of their findings was that linear algebra was the best indicator of success, even though it was not regarded as the essential course. Despite these findings, it is not apparent which data mining techniques work best in this scenario.

Luan, for example, developed predictors in [18] by using clustering as a method for data exploration and classification. One of the findings of Romero and Ventura's survey on EDM [19] was that association analysis has become a popular technique.

Aulck et al. [20] used the largest available database of higher education attrition to estimate student dropout from transcript and demographic data for 32,500 students at one of the major public universities in the nation. According to the results, several early indicators of student attrition and dropout can be forecast accurately, even when predictions are based on academic transcript data spanning only one term. The study raised awareness of the implications of using artificial intelligence for student retention and success.

III. METHODOLOGY

This section presents the process of data collection and analysis of the collected data.

• Data Collection

Data were collected from the admission portals of the selected schools, including demographic information (gender, age, support and resources, occupation, and challenges and barriers), together with examination records from the selected schools. The examination records were collected with authorized permission from the Examination Departments of those institutions. The two sets of records were joined for data cleansing and analysis.

The data collected from the selected schools comprised 99,867 records in all, corresponding to the enrolments of 20,807 students. Records that were incomplete or contained incorrect information were deleted during data cleaning, resulting in a final sample of 85,527 records for 17,720 students. In order to predict dropouts, four assumptions were used, each of which involved the analysis of a different number of records:

• The first assumption uses data from all records for students who enrolled in semesters between 2016 and 2022. A dropout is assumed to be a student who has not yet graduated and has spent at least two years without enrolling in the program, while a non-dropout is an active student or one who finished his or her studies between 2016 and 2022. A total of 85,527 records satisfied these criteria, of which 28.1% belonged to students who were classified as dropouts.
• In the second assumption, a dropout is defined in the same way as in the first assumption; however, a non-dropout is defined as a student who finished his or her studies. The objective here is to eliminate the noise of active students when training the algorithm, since it is not known beforehand whether they will graduate or drop out. A total of 35,132 records satisfied these criteria, of which 43.7% belonged to students who were defined as dropouts.
• In the third assumption, dropouts and non-dropouts are defined as in the first assumption. The difference is that only one semester (one period) is used for each student who enrolled between 2016 and 2022. For dropouts, only information from the last semester before dropping out is used, and for non-dropouts, a semester is chosen at random. The purpose of this assumption is to eliminate noise from previous semesters of the dropout, on the premise that the most recent semester provides the most up-to-date information for predicting whether he or she is going to drop out. A total of 15,720 records satisfied these criteria, of which 28% belonged to students who were defined as dropouts.
• In the fourth assumption, the definition of dropouts and non-dropouts is the same as in the second assumption, but only one semester per student is used, as in the third assumption. The objective is to eliminate noise from active students and from previous semesters of students who drop out. A total of 7,936 records satisfied these criteria, of which 55.7% belonged to students who were defined as dropouts.

Table 1 presents the percentages of all participants’ demographic characteristics. The number of male students (70%) was greater than the number of female students (30%), and the students’ ages ranged from 20 to 50 with an average of 24. The majority of the students were undergraduate and graduate students (Diploma or HND students) (60.3%). Nearly half of the students (49.7%) have full-time or part-time jobs, and only a few of them (10.5%) have sponsorship support.
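The four assumptions can be read as four ways of building a training set from the same semester records. The Python sketch below is one possible reading, not the authors' implementation: the record layout `(student_id, year, semester)`, the names `student_status` and `build_dataset`, and the rule that the two-year gap is measured against the end of the 2016-2022 window are all assumptions introduced here for illustration.

```python
import random
from collections import defaultdict

CUTOFF_YEAR = 2022  # assumed end of the observation window
GAP_YEARS = 2       # years without enrolment before a student counts as a dropout


def student_status(graduated, last_enrolled_year):
    """Classify a student as 'graduate', 'dropout', or 'active'."""
    if graduated:
        return "graduate"
    if CUTOFF_YEAR - last_enrolled_year >= GAP_YEARS:
        return "dropout"
    return "active"


def build_dataset(records, graduated_ids, assumption, seed=0):
    """Return (student_id, year, semester, is_dropout) rows for assumption 1 to 4.

    records: iterable of hypothetical (student_id, year, semester) tuples.
    graduated_ids: set of ids of students who finished their studies.
    """
    rng = random.Random(seed)
    by_student = defaultdict(list)
    for student_id, year, semester in records:
        by_student[student_id].append((year, semester))

    rows = []
    for student_id, semesters in by_student.items():
        semesters.sort()
        status = student_status(student_id in graduated_ids, semesters[-1][0])
        # Assumptions 2 and 4 drop active students, whose outcome is unknown.
        if assumption in (2, 4) and status == "active":
            continue
        is_dropout = int(status == "dropout")
        # Assumptions 3 and 4 keep one semester per student: the last one
        # for dropouts, a randomly chosen one for everyone else.
        if assumption in (3, 4):
            semesters = [semesters[-1]] if is_dropout else [rng.choice(semesters)]
        rows.extend((student_id, y, s, is_dropout) for y, s in semesters)
    return rows


if __name__ == "__main__":
    records = [
        ("A", 2016, 1), ("A", 2016, 2), ("A", 2017, 1),                   # stopped in 2017
        ("B", 2018, 1), ("B", 2018, 2), ("B", 2019, 1), ("B", 2019, 2),  # graduated
        ("C", 2021, 1), ("C", 2021, 2), ("C", 2022, 1),                   # still active
    ]
    for assumption in (1, 2, 3, 4):
        rows = build_dataset(records, {"B"}, assumption)
        print(assumption, len(rows), sum(r[3] for r in rows))
```

Under this reading, moving from the first to the fourth assumption shrinks the dataset and raises the dropout share, which matches the direction of the record counts and percentages reported in the list above.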

Table 1: The Demographic Characteristics of Participants

                    Number of       Number of       Percentage of   Percentage
                    registered      dropout         registered      of dropout
                    participants    participants    participants
Gender
First Assumption
  Female            25658.1         2163.00         30              8.43
  Male              59868.9         3309.08         70              19.67
Second Assumption
  Female            9074.60         1189.68         25.83           13.11
  Male              26057.40        7971.00         74.17           30.59
Third Assumption
  Female            2751            231.10          17.50           8.4
  Male              12969           2541.92         82.50           19.6
Fourth Assumption
  Female            3902.13         652.05          49.17           16.71
  Male              4033.87         1572.81         50.83           38.99
Age
  20-39             70,559.78       55,780.71       82.50           65.22
  30-50             10,690.88       23,554.12       12.50           27.54
Sponsorship
  YES               8980.335
  NO                76,546.67
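The registered-participant counts in Table 1 are fractional, which suggests they were derived from percentages rather than counted directly. As a rough check, and only as one reading of the table rather than the authors' stated procedure, the Python sketch below multiplies each assumption's record total (from the Methodology section) by the female share in Table 1 and compares the result with the printed count. The dictionary names and the helper `implied_count` are introduced here for illustration.

```python
# Record totals per assumption, as stated in the Methodology section.
totals = {"first": 85_527, "second": 35_132, "third": 15_720, "fourth": 7_936}

# Female share of registered participants (percent), from Table 1.
female_share = {"first": 30.0, "second": 25.83, "third": 17.50, "fourth": 49.17}

# Female registered counts as printed in Table 1.
female_registered = {
    "first": 25658.1,
    "second": 9074.60,
    "third": 2751.0,
    "fourth": 3902.13,
}


def implied_count(total, share_percent):
    """Count implied by applying a percentage share to a record total."""
    return total * share_percent / 100.0


if __name__ == "__main__":
    for name, total in totals.items():
        implied = implied_count(total, female_share[name])
        print(f"{name}: implied {implied:.2f}, table {female_registered[name]:.2f}")
```

For all four assumptions the implied and printed counts agree to within rounding, so the counts are best read as shares of the record totals rather than as head counts.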

IV. DISCUSSION OF THE RESULTS

In this section, results obtained from the analysis are presented to show the percentage and distribution of student dropout.

Fig 1: Percentage dropout for male and female students based on the four assumptions

Figure 1 shows the trend in the dropout rate of students admitted in the period 2016-2022, classified by gender under the four assumptions above. The dropout rate for male students is high compared to that of female students; however, the dropout rates of female students are still high percentages, especially if translated into absolute amounts, in part because the number of female students admitted is always less than that of male students.

Fig 2: Distribution of scholarship students against non-scholarship students

Figure 2 shows the distribution of scholarship and non-scholarship students. From the distribution one can clearly see that non-scholarship students outnumber those with scholarships; this is one of the contributing factors that lead to the high number of dropouts around the

schools due to economic factors that affect the living standard of those students.

Fig 3: Distribution of dropout by age

From the distribution above it can be seen that students between the ages of 20 and 30 have a high number of dropouts, because most of them lack the academic, financial, and emotional support needed to persevere in their studies. Students between the ages of 40 and 50 have a lower number of dropouts than the former group; here, dropout results from challenges they face at their workplaces, family challenges, and age-related health issues.

V. CONCLUSIONS

The dropout rate among students is a key performance indicator for higher institutions. Students who are told that they are at risk are more likely to study more intensely and better organize their semester workload. The level coordinator can use an accurate prediction to decide whether to approve or deny students’ requests to repeat failed courses.

ACKNOWLEDGEMENT

This work was funded by a grant for Institution-Based Research (IBR) 2022 and 2023 merger for Kebbi State Polytechnic Dakingari from Nigeria’s Tertiary Education Trust Fund (TETFund).

REFERENCES

[1]. Tinto, V.: Dropout from higher education: a
theoretical synthesis of recent research. Rev. Educ.
Res. 45(1), 89-125 (1975) 22. Wirth, R.: CRISP-DM:
towards a standard process model for data mining. In:
Proceedings of the Fourth International Conference
on the Practical Application of Knowledge Discovery
and Data Mining, pp. 29-39 (2000)
[2]. Yadav, S. K., Bharadwaj, B., & Pal, S. (2012).
Mining Education data to predict student's retention:
a comparative study. arXiv preprint arXiv:1203.2987.
[3]. Baradwaj, B.K., Pal, S.: Mining educational data to
analyze students' performance. arXiv preprint
arXiv:1201.3417 (2012)
[4]. Bhardwaj, B.K., Pal, S. (2012) Data mining: a
prediction for performance improvement using
classification. arXiv preprint arXiv:1201.3418
[5]. Kovacic, Z. (2010). Early prediction of student
success: Mining students' enrolment data.
[6]. Devasia, T., Vinushree, T. P., & Hegde, V. (2016,
March). Prediction of students performance using
Educational Data Mining. In 2016 International
Conference on Data Mining and Advanced
Computing (SAPIENCE) (pp. 91-95). IEEE
[7]. Tekin, A. (2014). Early prediction of students' grade
point averages at graduation: A data mining
approach. Eurasian Journal of Educational Research,
54, 207-226.
[8]. Yukselturk, E., Ozekes, S., & Türel, Y. K. (2014).
Predicting dropout student: an application of data
mining methods in an online education program.
European Journal of Open, Distance and e-learning,
17(1), 118-133.
[9]. Brunsden, V., Davies, M., Shevlin, M., & Bracken,
M. (2000). Why do HE students drop out? A test of
Tinto's model. Journal of further and Higher
Education, 24(3), 301-310.
[10]. Durso, S. D. O., & Cunha, J. V. A. D. (2018).
Determinant factors for undergraduate student's
dropout in an accounting studies department of a
Brazilian public university. Educação em Revista, 34.
[11]. Kim, D., & Kim, S. (2018). Sustainable education:
Analyzing the determinants of university student
dropout by nonlinear panel data models.
Sustainability, 10(4), 954.
[12]. Tan, M. & Shao P. (2015). Prediction of student
dropout in e-learning program through the use of
machine learning method. International Journal of
Emerging Technologies in Learning (iJET), 10(1),
11-17.
[13]. Mustafa M. N., Chowdhury L., & Kamal M. S.
(2012, May). Students dropout prediction for
intelligent system from tertiary level in developing
country. In 2012 International Conference on
Informatics, Electronics & Vision (ICIEV) (pp. 113-
118). IEEE.

[14]. Abu-Oda G. S. & El-Halees A. M. (2015). Data
mining in higher education : University student
dropout case study. International Journal of Data
Mining & Knowledge Management Process (IJDKP),
10(1), 15-27.
[15]. Al-Radaideh, Q. A., Al-Shawakfa, E. M., & Al-
Najjar, M. I. (2006, December). Mining student data
using decision trees. In International Arab
Conference on Information Technology
(ACIT'2006), Yarmouk University, Jordan.
[16]. Dekker, G. W., Pechenizkiy, M., & Vleeshouwers, J.
M. (2009). Predicting Students Drop Out: A Case
Study. International Working Group on Educational
Data Mining.
[17]. Jing, L.: Data mining and its applications in higher
education. New Dir. Inst. Res. 2002(113), 17-36
(2002). https://doi.org/10.1002/ir.35,
https://onlinelibrary.wiley.com/doi/abs/10.1002/ir.35
[18]. Romero, C., Ventura, S.: Educational data mining: a
survey from 1995 to 2005. Expert Syst. Appl. 33(1),
135-146 (2007).
https://doi.org/10.1016/j.eswa.2006.04.005,
http://www.sciencedirect.com/science/article/pii/S0957417406001266
[19]. Aulck, L., Velagapudi, N., Blumenstock, J., & West,
J. (2016). Predicting student dropout in higher
education. arXiv preprint arXiv:1606.06364.
[20]. Herzog, S. (2006). Estimating student retention and
degree-completion time: Decision trees and neural
networks vis-à-vis regression. New directions for
institutional research, 2006(131), 17-33.
[21]. Chapman, P., Clinton, J., Kerber, R., Khabaza, T.,
Reinartz, T., Shearer, C., & Wirth, R. (2000). CRISP-
DM 1.0: Step-by-step data mining guide. SPSS inc,
16.Seidman, A.: Retention revisited: R= E, Id+ E &
In, Iv. Coll. Univ. 71(4), 18-20 (1996)
[22]. Hosmer Jr, D. W., Lemeshow, S., & Sturdivant, R.
X. (2013). Applied logistic regression (Vol. 398).
John Wiley & Sons.
