Student name:
Student Id:
Institution name:
Course:
Date:
Topic selected:
Business intelligence
Data mining techniques in database system
Abstract:
Data mining has become increasingly important and is now used widely across
many types of business, including marketing, e-commerce, e-business,
healthcare, and retail. The education sector also holds a vast quantity of
data, including details about student performance, the impact of
extracurricular activities on student performance, and the relationship
between teacher quality and student performance. Conventional methods,
however, cannot retrieve the information hidden in this data. Advanced data
mining techniques can be used to uncover it. Educational data mining (EDM)
is the practice of analyzing massive amounts of student data with data
mining methods at educational institutions. Researchers explore which data
mining methodologies give the most reliable and accurate results for
forecasting student performance, identifying undesirable student behavior,
classifying students, and modeling students. EDM aims to enhance the
learning experience by identifying and resolving problems in the educational
system. This study focuses on data mining techniques such as artificial
neural networks and decision trees applied to large amounts of educational
data, and on approaches to knowledge discovery in databases.
Keywords:
Data mining, Clustering, Pattern Analysis, Educational systems, Web mining,
Web-based educational systems, Classification
Table of contents:
Introduction
Problem statement
Key objectives
Premise
Definitions
Delimitations
Limitations
Secondary analysis
Primary analysis
Findings
Conclusion
References
Chapter 1:
Introduction:
Business intelligence is the systematic procedure of collecting, refining,
structuring, evaluating, and delivering data so that it can be applied in
formulating strategic decisions inside an organization. Business
intelligence (BI) gives managers insights into the usage and performance
patterns of their goods and services.
The ideas behind data mining are often traced back to the 1930s, when Alan
Turing described a universal computing machine (Bradley, 2021); the term
itself came into common use only decades later. Over time, significant
progress has been made in business intelligence, highlighting the need to
extract information from data through information retrieval1. Data mining
is the systematic and rigorous process of deriving valuable and significant
insights from large volumes of data. Imagine a scenario in which a vast
amount of data exists but there is no means of analyzing or examining it.
Several decades ago this situation was common, and it paved the way for
advances in technology.
1
Bradley, V. M. (2021). Learning Management System (LMS) used with
online instruction. International Journal of Technology in
Education, 4(1), 68–92.
The term "educational data mining" (EDM) describes methods, tools, and
studies intended to automatically derive insights from massive data
repositories from or connected to individuals' educational activities. EDM is a
burgeoning field concerned with creating methods for finding connections
among the distinct and progressively more significant amounts of data
produced by educational domains and then using those methods to get
deeper insights into student behavior and learning 2.
2
Brijesh Kumar, S., & Sourabh, P. (2011). Mining educational data to
analyze students' performance. IJACSA, 2(6).
Educational systems are progressively gathering and storing data on user
activities. Statistical analysis, machine learning, and data mining can be
used to study these data (such as trace data, system log data, and massive
data sets) (Brijesh Kumar & Sourabh, 2011).
The development of computational tools for data analysis, the
standardization of data recording formats, and increased processing
capability are opening up new areas of study for education researchers to
investigate 3.
3
Freeman, R. (2017). The relationship between extracurricular activities
and academic achievement.
EDM can spot and predict patterns in areas such as learner performance and
behavior, subject-matter expertise, test scores, lesson plans, and practical
applications. LMSs, for example, record details such as when and for how
long each student views a particular learning object and the total amount of
time the item is visible on their screen. Intelligent tutoring systems
record details such as the time a solution was submitted, whether or not it
matched the expected solution, how long it took to submit subsequent
solutions, and in what order the solution's components were entered into the
interface (Freeman, 2017). Because this data is so precise and broad, even a
relatively brief session (say, 30 minutes) in a computer-based learning
environment can provide a large amount of process data for examination. In
other contexts the data is coarser. For example, a university transcript may
detail the student's coursework, final grades, and the date they declared or
changed their major. 4
The following four criteria are what the researchers used to assess the
different EDM implementations (Kumar & Pal, 2011).
4
Kumar, B., & Pal, S. (2011). Mining Educational Data to Analyze
Student's Performance. International Journal of Advanced Computer
Science and Applications, 2(6). doi:10.14569/ijacsa.2011.02060
The prediction of the student's performance was the first requirement. This
criterion aims to enhance learning, foresee academic failure in students, and
fortify the educational process. In addition, it helps universities and colleges
plan for the future. According to the article, the most popular methods for
forecasting student performance are Bayesian classification, decision trees,
neural networks, rule-based, and feature selection 5. According to the results
of the research, the Rule-Based model is the most effective prediction model.
Undesirable student behavior is the focus of the second criterion. Student
misconduct, lack of motivation, cheating, and academic failure are just some
of the issues that these indicators can reveal. 6 The most widely used
techniques in this category include decision trees, neural networks,
classification, clustering, outlier identification, and feature selection
(Hernández-Blanco et al., 2019).
6
Hernández-Blanco, A., Herrera-Flores, B., Tomás, D., & Navarro-
Colorado, B. (2019). A systematic review of deep learning approaches
to educational data mining. Complexity, 2019.
Student grouping was the third criterion that the researchers discussed. It
attempts to group or classify students according to the characteristics of
their knowledge. This application is popular with stakeholders in education
because it is used in many activities to enhance the learning process.
Function filtering, neural networks, and clustering are the most often used
techniques in this area.
The fourth and final criterion is student modeling, which characterizes a
number of a student's traits. These include cognition, skills, feelings,
domain awareness, learning strategies, achievements, characteristics,
learning habits, outcomes, and assessment.
Problem statement:
Data is widely regarded as essential to success. In the modern world,
analysis and information extraction from data are used to make critical
judgments7. To give an idea of how challenging information extraction can
be, consider that data may be stored in several alternative formats,
including relational databases, data warehouses, transactional databases,
and object-relational databases8. These formats were developed solely to
manage the massive amounts of data being generated and to store and retrieve
it as required. The different kinds of data held in each of these databases
complicate matters further: text, multimedia, numerical data, and other
formats can all be stored in them. This is where data mining becomes useful
(Jung & Huh, 2019).
7
Jung, S., & Huh, J. H. (2019). An Efficient LMS Platform and Its Test
Bed. Electronics, 8(2), 154. Retrieved from
https://www.mdpi.com/2079-9292/8
What happens if the conclusions drawn from the data are false? This question
points to the most common problem the sector runs into when trying to get
useful information out of data. Because so many kinds of data are mixed
together, the data is often noisy or incomplete. These anomalies make the
mining process more difficult and can lead to inaccurate findings.
A large company can generate billions of records a day. It takes an
effective model to process such volumes of data and extract knowledge from
them. There is no guarantee that the data in databases is structured;
structured, semi-structured, and unstructured data are all mixed together.
It is therefore essential to classify the data according to its nature.
Numerous anomalies and hidden patterns can be found in databases (Kumar &
Pal, 2011).
Hidden patterns and abnormalities must be identified to ensure that the
recovered data is accurate. For a future result to be predicted, every
variable must be present. Discrepancies in the data invalidate the forecast,
and the consequence is wrong decisions. All of these possible mistakes
advanced the idea of data mining9. Data has become essential to business,
and evaluating it manually is no longer feasible. There was an urgent need
to automate the information retrieval process: every step, from gathering
data to creating reports, needed to be automated.
9
Kumar, B., & Pal, S. (2011). Mining Educational Data to Analyze
Student's Performance. International Journal of Advanced Computer
Science and Applications, 2(6). doi:10.14569/ijacsa.2011.02060
Data mining is the last stage of this process. Before data is designated as
mineable, it undergoes several preparation phases. Data preprocessing should
be finished before the data mining process is integrated with a database
management system (DBMS) or any other target data source. The effectiveness
of the model depends mainly on the integration schema that is used10.
10
Mueen, A., Zafar, B., & Manzoor, U. (2016). Modeling and predicting
students' academic performance using data mining
techniques. International Journal of Modern Education and Computer
Science, 8(11), 36.
Determining the usefulness of the data is another issue. The goal of a data
mining process is to produce concrete results, which can be predictive or
descriptive, to aid in making important decisions. It is crucial to
determine whether the data will help achieve the intended outcome, so the
kind of information to be derived from the data should be known before
execution. Each element plays a part in producing high-quality work and
facilitating effective decision-making (Mueen et al., 2016).
Categorization and visualization can then be carried out on the basis of the
analysis. These are essential steps in developing a robust data mining
model. Parameter identification is necessary for models to be accurate and
useful.
In summary, the challenges faced when attempting to extract significant and
trustworthy information from data include, but are not restricted to:
• The existence of anomalous data across the different combinations of data
types stored in each database.
• Determining the best data sources, techniques for data extraction, and
the kind of data mining strategy to employ (Makhtar et al., 2017).
• Recognizing irregularities and discrepancies in data that lead to noisy
and incomplete information11.
Key objectives of EDM:
While the main goal of EDM is to improve our understanding of how students
learn, it can also serve a variety of informational purposes for
researchers, instructors, administrators, and students (Mueen et al., 2016).
When students are the center of attention, the aim of EDM is to deliver
pertinent assignments, learner activities, and resources that maximize
student learning12. Web-based educational systems that are intelligent,
adaptable, and real-time use students' goals, preferences, and assessed
aptitude to provide the best possible learning content.
11
Makhtar, M., Nawang, H., & WAN SHAMSUDDIN, S. N. (2017). ANALYSIS
OF STUDENTS PERFORMANCE USING NAÏVE BAYES CLASSIFIER. Journal
of Theoretical & Applied Information Technology, 95(16).
More generally, students can be given suggestions about what helps (or
harms) learning and engagement. Examples include course sequencing, use of
university facilities, and online and social media behavior.
When teachers are the focus, the aim is to obtain feedback on the subject
matter, the mode of delivery, and the organization of instruction 13. This
feedback can draw attention to frequent misconceptions and erratic learning
styles, enabling educators to improve their teaching methods. One such use
is working out the optimal order of lessons to help pupils learn.
13
Mueen, A., Zafar, B., & Manzoor, U. (2016). Modeling and Predicting
Students’ Academic Performance Using Data Mining Techniques.
International Journal of Modern Education and Computer Science,
8(11), 36–42. doi:10.5815/ijmecs.2016.11.05
Focusing on administrative personnel can help improve learning management
systems, servers, and user interfaces while also providing a better
understanding of student attendance and retention. Using information
gleaned from a university's LMS, statisticians could build models to predict
students' levels of participation and persistence (Razaque et al., 2017).
Researchers can also be the focus of EDM 14. Data mining researchers work
in any of the domains above, developing data mining approaches and assessing
their effectiveness.
14
Razaque, F., Soomro, N., Shaikh, S. A., Soomro, S., Samo, J. A., Kumar,
N., & Dharejo, H. (2017). Using naïve Bayes algorithm to students'
bachelor academic performances analysis. In 2017 4th IEEE
International Conference on Engineering Technologies and Applied
Sciences (ICETAS) (pp. 1–5). IEEE.
Premise:
The primary purpose of data mining techniques in databases is to retrieve
information from data. Imagine a world where data is abundant but the tools
to analyze it are lacking. A few decades ago, this was the norm, and it
paved the way for technological advances. The primary purpose of this
research is therefore to introduce techniques for retrieving that
information.
Definitions:
Business intelligence:
With the use of modern computing resources, business intelligence (BI) can
be used to analyze data and reveal useful insights for decision-makers at any
level of an organization (Razaque et al., 2017). Organizations collect data
from internal and external IT systems, prepare it for analysis, run queries
against the data, and create data visualizations, BI dashboards, and reports
as part of the BI process to provide analytics results to business users for
operational decision-making and strategic planning 15.
15
Razaque, F., Soomro, N., Shaikh, S. A., Soomro, S., Samo, J. A., Kumar,
N., & Dharejo, H. (2017). Using naïve Bayes algorithm to students'
bachelor academic performances analysis.
BI is the process of gathering and analyzing data to make more informed
company decisions, which in turn leads to increased profits, smoother
operations, and a competitive advantage. Business intelligence accomplishes
this by combining analytics, data management, and reporting technologies.
Data mining:
Data mining is the search through large data sets for useful patterns and
relationships that may help solve business problems 16. Thanks to data
mining tools and methodologies, companies can benefit from improved
foresight and decision-making (Makhtar et al., 2017).
16
Makhtar, M., Nawang, H., & WAN SHAMSUDDIN, S. N. (2017). ANALYSIS
OF STUDENTS PERFORMANCE USING NAÏVE BAYES CLASSIFIER. Journal
of Theoretical & Applied Information Technology, 95(16).
Data mining techniques:
Preprocessing techniques, which prepare data for mining, are methods of
transforming large-scale data into a format that is suitable for data
mining. They reduce the redundancies and inconsistencies that result from
the large data size, find associations within the data, normalize the data,
and truncate outliers, so that high-quality data useful for the intended
purpose can be extracted 17. Data mining techniques themselves fall into two
groups: descriptive and predictive. The precision and accuracy of each
technique depend on the kind of data and the environment.
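The preprocessing steps named above (normalizing the data and truncating outliers) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the sample scores, and the choice of Tukey fences (1.5 times the interquartile range) as the truncation rule are assumptions made here, not methods taken from the cited sources.

```python
from statistics import quantiles

def clip_outliers(values, k=1.5):
    """Truncate values to the Tukey fences [Q1 - k*IQR, Q3 + k*IQR].

    The fence rule is an assumed choice; other truncation rules exist.
    """
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]

def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical exam scores; 300 is a data-entry error acting as an outlier.
scores = [55, 60, 62, 65, 70, 72, 75, 300]
clipped = clip_outliers(scores)
normalized = min_max_normalize(clipped)
print(clipped)
print(normalized)
```

Clipping before normalizing keeps a single extreme value from squeezing all the other scores into a narrow band near zero.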
17
Reddy, C. (2021). Data Mining: Purpose, Characteristics, Benefits &
Limitations. https://content.wisestep.com/data-mining-purpose-
characteristics-benefits-limitations
Cluster analysis:
Clustering is an unsupervised technique: using no prior knowledge of labels
or information classes, it identifies related components in large amounts of
data and groups them into clusters. Following its descriptive methodology,
the algorithm then determines the distance of each item to the center of
each cluster and analyses the results to find commonalities. K-means,
hierarchical clustering, and density-based spatial clustering of
applications with noise (DBSCAN) are only a few of the clustering methods
available (Reddy, 2021). Because it scales easily to large datasets, K-means
is among the most widely used of them.
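The K-means procedure mentioned above, which assigns each item to its nearest cluster center and then recomputes the centers, can be sketched as follows. The sample data (hours studied, grade) and the naive choice of the first k points as starting centers are assumptions for illustration only.

```python
import math

def k_means(points, k, iterations=20):
    """Plain K-means on tuples of numbers.

    Initial centers are simply the first k points (an assumed, naive
    initialization chosen to keep the sketch deterministic).
    """
    centroids = [points[i] for i in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: each center moves to the mean of its cluster.
        new_centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments have stabilized
        centroids = new_centroids
    return centroids, clusters

# Hypothetical students described by (hours studied per week, grade).
students = [(1, 50), (9, 90), (2, 55), (8, 85), (1.5, 52), (9.5, 92)]
centroids, clusters = k_means(students, k=2)
print(centroids)
print(clusters)
```

In practice the starting centers are usually chosen at random or with a seeding scheme, because the result of K-means depends on the initialization.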
Regression analysis:
Regression analysis, of which linear regression is the most common form, is
one of the most important data mining techniques because it shows the
relationships between different data variables and can present them as a
line on a graph 18. It then predicts the target value using the model's
best-fitting line. It is the most often used approach to statistical
analysis and the most fundamental of the data mining approaches. The primary
objective of such procedures is to find an equation relating the dependent
variable to one or more independent variables (Jung & Huh, 2019).
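As a minimal sketch of the "best-fitting line" idea, the following Python code fits a line by ordinary least squares for one independent variable. The function names and the sample data (study hours and scores) are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(x, slope, intercept):
    """Read the predicted value off the fitted line."""
    return slope * x + intercept

# Hypothetical data: weekly study hours and the resulting exam scores.
hours = [1, 2, 3, 4, 5]
exam_scores = [52, 54, 56, 58, 60]
slope, intercept = fit_line(hours, exam_scores)
predicted = predict(6, slope, intercept)
print(slope, intercept, predicted)
```

Here the dependent variable is the exam score and the independent variable is study time; the fitted equation is what the text refers to as the equation between the two.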
18
Jung, S., & Huh, J. H. (2019). An Efficient LMS Platform and Its Test
Bed. Electronics, 8(2), 154. Retrieved from
https://www.mdpi.com/2079-9292/8
Classification analysis:
Classification is the technique data miners employ most frequently. It uses
pre-categorized instances to build a model that can then classify the wider
population of records.
According to Khanna, Singh, and Alam (2016), this technique learns how to
classify data from a training set. It then uses the learned model to
classify new data that was not part of the training set.
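The train-then-classify pattern described above can be illustrated with a k-nearest-neighbours classifier. This particular algorithm is chosen here only because it is short; it is not named in the sources cited above, and the training records are invented for the example.

```python
import math
from collections import Counter

def knn_classify(training_set, new_point, k=3):
    """Label new_point by majority vote among its k nearest training items."""
    neighbours = sorted(
        training_set, key=lambda item: math.dist(item[0], new_point)
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical pre-categorized instances: (hours studied, attendance %) -> label.
training_set = [
    ((2, 40), "fail"), ((3, 45), "fail"), ((4, 50), "fail"),
    ((8, 80), "pass"), ((9, 85), "pass"), ((10, 90), "pass"),
]

# New records that were not part of the training set.
label_a = knn_classify(training_set, (9, 82))
label_b = knn_classify(training_set, (3, 42))
print(label_a, label_b)
```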
Decision tree:
The decision tree technique repeatedly splits a large set of discrete data
into smaller groups. The tree consists of internal nodes and leaf nodes,
where each internal node stands for a test on one of the input variables.
When dealing with complex data, decision trees are frequently used to select
the best variables and limit the number of variables. The tree shows how
strongly each child node is linked to its parent node. The most important
factors are found at the top of the tree, with less important ones further
down until a leaf node is reached (Zhenzhen et al., 2021). The hierarchy of
EDM is different from that of traditional data mining, despite the
appearance that both use the same methodologies19.
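One common way to decide which variable sits at the top of a decision tree is to pick the attribute with the highest information gain. The sketch below shows that selection step only, not a full tree builder; information gain is an assumed criterion (the text does not say which one the cited work uses), and the student records are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )

def information_gain(rows, attribute, target):
    """Reduction in entropy of `target` after splitting rows on `attribute`."""
    base = entropy([row[target] for row in rows])
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

def best_split(rows, attributes, target):
    """Attribute that would be placed at the root of the tree."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))

# Hypothetical student records.
rows = [
    {"attendance": "high", "extracurricular": "yes", "result": "pass"},
    {"attendance": "high", "extracurricular": "no", "result": "pass"},
    {"attendance": "low", "extracurricular": "yes", "result": "fail"},
    {"attendance": "low", "extracurricular": "no", "result": "fail"},
]
root = best_split(rows, ["attendance", "extracurricular"], "result")
gain_attendance = information_gain(rows, "attendance", "result")
gain_extra = information_gain(rows, "extracurricular", "result")
print(root, gain_attendance, gain_extra)
```

A full tree is obtained by applying the same selection recursively to each subset until the subsets are pure or no attributes remain.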
19
Zhenzhen, M., Zhu, J., & Zhang, S. (2021). Probabilistic-based
expressions in behavioral multiattribute decision making considering
pre-evaluation. Fuzzy Optimization and Decision Making, 20(1), 145–
173. http://dx.doi.org/10.1007/s10700-020-09335-
Naïve Bayes:
Naïve Bayes has attracted the attention of numerous academics lately
because of its reliable results on practical problems. The technique is
founded on Bayes' Theorem, which gives the posterior probability of a
target class given a predictor (Hernández-Blanco et al., 2019). In the
reported results, Naïve Bayes also yielded the most accurate predictions:
the Naïve Bayes classifier was 74% accurate in identifying hidden
relationships between the factors that affected student performance20.
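To make the use of Bayes' Theorem concrete, the following is a minimal sketch of a Naïve Bayes classifier for categorical attributes. The add-one (Laplace) smoothing, the function names, and the student records are assumptions made for this example and are not taken from the cited studies.

```python
from collections import Counter, defaultdict

def train(rows, target):
    """Count how often each attribute value occurs within each class."""
    class_counts = Counter(row[target] for row in rows)
    value_counts = defaultdict(Counter)   # (attribute, class) -> value counts
    attribute_values = defaultdict(set)   # attribute -> distinct values seen
    for row in rows:
        for attribute, value in row.items():
            if attribute != target:
                value_counts[(attribute, row[target])][value] += 1
                attribute_values[attribute].add(value)
    return class_counts, value_counts, attribute_values

def posterior(model, sample):
    """P(class | sample) via Bayes' Theorem, assuming independent attributes."""
    class_counts, value_counts, attribute_values = model
    total = sum(class_counts.values())
    scores = {}
    for cls, n in class_counts.items():
        score = n / total  # prior P(class)
        for attribute, value in sample.items():
            # Likelihood P(value | class) with add-one smoothing (assumed).
            count = value_counts[(attribute, cls)][value]
            score *= (count + 1) / (n + len(attribute_values[attribute]))
        scores[cls] = score
    norm = sum(scores.values())
    return {cls: score / norm for cls, score in scores.items()}

# Hypothetical student records.
records = [
    {"attendance": "high", "homework": "done", "result": "pass"},
    {"attendance": "high", "homework": "done", "result": "pass"},
    {"attendance": "high", "homework": "missed", "result": "pass"},
    {"attendance": "low", "homework": "missed", "result": "fail"},
    {"attendance": "low", "homework": "done", "result": "fail"},
]
model = train(records, target="result")
probabilities = posterior(model, {"attendance": "high", "homework": "done"})
prediction = max(probabilities, key=probabilities.get)
print(probabilities, prediction)
```

The "naïve" part is the assumption that attributes are independent given the class, which is what allows the likelihoods to be multiplied together.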
Delimitations of study:
i. Aid in predicting future trends
A data mining framework retains the informative variables of the systems
and structures it analyzes. One advantage is that this supports predictive
analysis of upcoming trends, for example by tracking changes in individual
behavior and innovation.
20
Hernández-Blanco, A., Herrera-Flores, B., Tomás, D., & Navarro-
Colorado, B. (2019). A systematic review of deep learning approaches
to educational data mining. Complexity, 2019.
ii. Making a decision
In the decision-making process, selecting the appropriate course of action
is crucial. Data mining techniques can be applied to find variations and
trends in marketing strategies, which supports better decisions later on.
With recent technology, almost any data can be assessed easily. This makes
it possible to reach a precise decision even about situations that are
unfamiliar or unexpected.
iii. Increased business revenue
Data mining is a process that prepares the ground for different kinds of
innovation. Decisions can be derived from the data by interpreting it from
a business perspective. Numerous corporate sectors and commercial
organizations can apply these techniques 21. Any business with data and
processes can be analyzed; data mining makes it possible to extract
important information from these assets, improve business processes, boost
efficiency, and boost productivity (Mueen et al., 2016).
21
Mueen, A., Zafar, B., & Manzoor, U. (2016). Modeling and predicting
students' academic performance using data mining
techniques. International Journal of Modern Education and Computer
Science, 8(11), 36.
The business may be in advertising, marketing, e-commerce, supply chain
management, healthcare, or hospitality. In essence, data mining aims to look
beneath the surface of the data so that informed decisions can be made that
will profoundly influence future decisions (Mueen et al., 2016).
iv. Identifying fraud
Data breaches are among the most serious hazards on the internet22. With
data mining, artificial intelligence can categorize and partition
information and automatically identify fraud and rules within the data,
suggesting patterns that may include fraud-related ones. Data mining makes
it simple to identify credit card transaction fraud. Most data mining
components are developed using market research data.
Through the use of these investigative techniques, counterfeit items and
fraudulent activity in the market can be found.
22
Mueen, A., Zafar, B., & Manzoor, U. (2016). Modeling and predicting
students' academic performance using data mining
techniques. International Journal of Modern Education and Computer
Science, 8(11), 36.
v. Customer preferences
Data mining procedures help marketing professionals understand customer
behavior. Because data mining systems can handle vast amounts of pertinent
information, data mining approaches are being developed that help track
customer habits.
Limitations:
While data mining can be advantageous for an individual or organization
during decision-making, there are significant limitations when utilizing real-
time data. The following are some drawbacks or limitations of the data
mining technique (Reddy, 2021).
• The main components of the data mining process are the gathering of user
data and the application of diverse marketing strategies within
businesses.23
• Because these analysis techniques handle a large volume of user data,
there is a significant chance that user privacy will be infringed and data
will be compromised.
• There are no limits on the amount of user data used for analysis and
marketing with these tactics.24
23
Reddy, C. (2021). Data Mining: Purpose, Characteristics, Benefits &
Limitations. https://content.wisestep.com/data-mining-purpose-
characteristics-benefits-limitations
24
Razaque, F., Soomro, N., Shaikh, S. A., Soomro, S., Samo, J. A., Kumar,
N., & Dharejo, H. (2017). Using naïve Bayes algorithm to students'
bachelor academic performances analysis. In 2017 4th IEEE
International Conference on Engineering Technologies and Applied
Sciences (ICETAS) (pp. 1–5). IEEE.
• Because this involves gathering massive amounts of user data and storing
it in conventional databases or cloud-based systems, there is a significant
risk of data misuse.
• The data mining process guarantees data accuracy for individuals or teams
inside an organization, regardless of limitations (Razaque et al., 2017).
• One of the main obstacles to data mining is security. Data theft is a
significant risk because these firms keep large amounts of customer data,
such as SSNs, addresses, and phone numbers.
• Any company can have a data breach; big businesses like Ford, Equifax,
Capital One, and others have recently suffered one.
• For businesses, the initial cost of setting up the infrastructure for
data mining is high. However, this expense is recovered as the companies
advance with data mining technologies or frameworks that use them.
• These data are fully accessible to governments for better governance
strategies.
• Everything related to a client, including chat conversations, is fully
accessible to the government.
• The government can obtain information about a person at any time and keep
tabs on every aspect of that person's social and personal life. Prominent
members of society have expressed significant concern and disagreement over
this.25
• The solution to these issues lies in transparent policies between the
government, organizations, and users (Razaque et al., 2017).
• Governments, corporations, and individuals can all gain immensely from
data mining; yet, if it is not properly handled, it can raise serious
issues with user data security and privacy.
25
Razaque, F., Soomro, N., Shaikh, S. A., Soomro, S., Samo, J. A., Kumar,
N., & Dharejo, H. (2017). Using naïve Bayes algorithm to students'
bachelor academic performances analysis.
Chapter 2
Secondary analysis of data:
The contemporary era of information technology is characterized by a
significant surge in the volume of data and information within companies.
The vast bulk of this material is available in digital format and is stored
in large databases. Data mining plays a crucial role in identifying and
extracting anomalies, correlations, and patterns within extensive datasets,
which makes it possible to forecast future outcomes. Organizations that
employ this method can increase their revenues, reduce their expenses, and
mitigate their investment risk 26.
26
Zhenzhen, M., Zhu, J., & Zhang, S. (2021). Probabilistic-based
expressions in behavioral multiattribute decision making considering
pre-evaluation. Fuzzy Optimization and Decision Making, 20(1), 145–
173. http://dx.doi.org/10.1007/s10700-020-09335-
Data mining, which involves extracting knowledge from large-scale data
sets, is widely acknowledged as a significant research topic within database
systems and machine learning. The field is of considerable importance to
business because it has the capacity to generate revenue (Zhenzhen et al.,
2021). Data mining has attracted significant attention from academics and
scholars across several disciplines. The advent of data-providing services,
such as data warehousing and internet services, has made it easier to
understand user behavior in order to enhance service provision and
capitalize on business prospects. An organization can receive several forms
of media, including text, audio files, video, and photographs (Zhenzhen et
al., 2021).
The fundamental objective of data mining is to acquire data, cleanse it,
and employ contemporary data mining methodologies to enhance decision-making
processes. Data mining is characterized by ongoing growth and a dynamic
nature, as numerous scholars have documented the emergence of innovative
methods. Providing a comprehensive overview of data mining techniques and
applications within the confines of a short paper is challenging. Data
mining is a relatively young and promising area of investigation that
presents numerous unanswered questions and obstacles, and these open
questions warrant further exploration 27.
This paper offers a comprehensive viewpoint on data mining challenges and
techniques, as seen through the lens of a database researcher. Various data
mining methodologies have been extensively studied and contrasted in
academic studies. This paper examines the incorporation and influence of
data mining processes inside relational database systems. It explores the
accessibility of data mining services in databases and identifies the
appropriate tools required for implementation.28
27
Zhenzhen, M., Zhu, J., & Zhang, S. (2021). Probabilistic-based
expressions in behavioral multiattribute decision making considering
pre-evaluation. Fuzzy Optimization and Decision Making, 20(1), 145–
173. http://dx.doi.org/10.1007/s10700-020-09335-
28
Jung, S., & Huh, J. H. (2019). An Efficient LMS Platform and Its Test
Bed. Electronics, 8(2), 154. Retrieved from
https://www.mdpi.com/2079-9292/8
Furthermore, the paper highlights the significant advantages and
disadvantages of data mining across various data sets. This inquiry has
demonstrated the integration of database technology with data mining
strategies (Jung & Huh, 2019).
Data mining is common practice in many sectors of the economy. It extracts
details and potential patterns from volumes of data too large to analyze
manually. Data mining methods, including regression, classification,
clustering, association rules, and time series analysis, vary in efficiency
and usability. This paper covers decision trees, Naïve Bayes,
classification analysis, regression analysis, cluster analysis, and
association rules. Data mining techniques support teaching, and the
literature indicates that they are an effective way to anticipate students'
academic progress. Khan & Ghosh (2020) emphasized that, to provide the
education sector with more accurate and dependable data, it is imperative to
understand the factors that may impact a student's overall academic
performance 29. Abizada et al. (2020) considered extracurricular activities
to be one such factor. Students' academic success is also influenced by
overall teaching quality, which includes monitoring, student-tutor contact,
and other elements. As a last step, clearly defined attributes give
researchers a conceptual understanding of the algorithms. The data is then
examined using a suitable data mining technique, producing forecasts of
students' academic success that are reliable and trustworthy.
29
Khan, A., & Ghosh, S. K. (2020). Student performance analysis and
prediction in classroom learning: A review of educational data mining
studies. Education and Information Technologies, 26(1), 205–240.
doi:10.1007/s10639-020-10230-3
Chapter 3
Primary analysis of data:
These days, there are a lot of data mining frameworks available. While some
data mining systems are more comprehensive and flexible, others are
specific frameworks dedicated to a particular information source or limited to
specific data mining features.
Data mining is an iterative process: each pass clarifies the mining task and
incorporates a large amount of new information, producing more systematic
results. It requires practical, flexible, and adaptive data analytics30 and
could be considered a standard assessment of data innovation. Data
preparation and data mining tasks together complete the data mining process
as a knowledge discovery method (Zhenzhen et al., 2021). Any kind of
information, from ordinary databases to sophisticated data sets such as time
series, can be the basis for data mining cycles. The data mining process
poses its own difficulties, but it can reveal information that is both
previously unknown and potentially useful.
30
Zhenzhen, M., Zhu, J., & Zhang, S. (2021). Probabilistic-based
expressions in behavioral multiattribute decision making considering
pre-evaluation. Fuzzy Optimization and Decision Making, 20(1), 145–173.
http://dx.doi.org/10.1007/s10700-020-09335-
Whether a discovered pattern is new, valuable, or interesting is subjective
and depends on the user and the application. Data mining can produce a very
large number of patterns, and that number grows easily. To reduce an
oversized result set, organizations may add a metadata mining stage that
minimizes the number of discovered rules and patterns likely to be of little
interest to the person who has to assign a value to them (Watson & Watson,
2012).
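One common way to keep the number of reported rules manageable is to discard
association rules that fall below minimum support and confidence
thresholds. The sketch below illustrates that idea only; the function name,
the thresholds, and the toy student records are hypothetical and are not
taken from the studies cited in this paper.

```python
from itertools import combinations


def mine_rules(transactions, min_support=0.5, min_confidence=0.8):
    """Return one-to-one rules (lhs -> rhs) that pass both thresholds."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    rules = []
    for a, b in combinations(items, 2):
        # Check the rule in both directions: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            both = sum(1 for t in transactions if lhs in t and rhs in t)
            lhs_count = sum(1 for t in transactions if lhs in t)
            if lhs_count == 0:
                continue
            support = both / n             # share of records with both items
            confidence = both / lhs_count  # share of lhs records that have rhs
            if support >= min_support and confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules


# Hypothetical toy data: each set lists attributes observed for one student.
transactions = [
    {"attends", "club", "pass"},
    {"attends", "pass"},
    {"attends", "club", "pass"},
    {"club"},
]

all_rules = mine_rules(transactions, min_support=0.0, min_confidence=0.0)
kept_rules = mine_rules(transactions)  # default thresholds
print(len(all_rules), "candidate rules,", len(kept_rules), "kept")
```

Raising either threshold shrinks the rule set further, at the cost of
possibly discarding rare but interesting patterns.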
Evaluating mined information, and the Knowledge Discovery in Databases (KDD)
process as a whole, depends on identifying and quantifying the
interestingness of the patterns found or yet to be found. Some recent
assessments suggest that measuring the interestingness of discovered data
remains a major research issue31.
Data mining offers numerous techniques for identifying hidden patterns in
vast amounts of data. Which techniques and algorithms are used depends on
the application. Predictive data mining techniques are appropriate when
there is a particular target that we want to predict from the data.
Combining data mining techniques with online analytical processing (OLAP) is
desirable for obtaining valuable information from large business datasets.
31
Watson, R., & Watson, S. (2012). An argument for clarity: What are
learning management systems, what are they not, and what should
they become? TechTrends, 51(2), 28 – 3
This paper identifies and tackles the most prevalent challenges in
extracting accurate and valuable information from collected data. The study
focuses on the issues that arise and on the data mining methods suited to
the available data type and the intended outcome of data mining. Because
they yield valuable information, data mining techniques help organizations
make decisions (Wang & Miao, 2020). We examined the issues that arise when
extracting data from extensive databases using data mining and identified
some restrictions and boundaries, in addition to the problem statements, in
this paper. Organizations must adhere to specific security protocols to
ensure that data extraction does not expose them to cyberattacks 32. We
propose some sound hypotheses to address the problems associated with data
mining, and we keep looking for more workable approaches to employing
different data mining techniques to extract valuable information from
massive datasets in businesses.
32
Wang, G., & Miao, J. (2020). Design of data mining algorithm based on
rough entropy for us stock market abnormality. Journal of Intelligent &
Fuzzy Systems, 39(4), 5213-5221.
Data mining application:
Data mining has recently alleviated stress and workload in various areas.
Data mining is a tool used by large corporations to reduce labor costs and
increase profits. It also has a significant impact on the business community
(Thamilselvana & Sathiaseelan, 2015). Data mining is widely used in
education to enhance and add value to the educational process. As a result,
data mining has a wide range of applications and operates in various
departments and sectors33.
Educational data mining:
33
Thamilselvana, P., & Sathiaseelan, J. G. R. (2015). A Comparative
Study of Data Mining Algorithms for Image Classification. International
Journal of Education and Management Engineering, 5(2), 1.
http://dx.doi.org/10.5815/ijeme.2015.02.01
Educational data mining (EDM) applies data mining techniques in the
educational sector to student data such as grades and attendance. It can
provide accurate projections of student performance and behavior. Its tasks
include data analysis and visualization, student behavior detection, student
classification, social network analysis, instructor feedback, student
recommendations, grade prediction, and content development.
Taxonomy & application of educational data mining:
The EDM process follows a set of standardized steps to achieve the desired
goals. First, unprocessed educational data is gathered from academic
surveys, registration, and learning management systems. Next, the collected
data is preprocessed, and records with null values are removed. The intended
objective of the EDM is then addressed using the proper data mining
techniques, whether predictive or descriptive (Abu Saa et al., 2019).
Although the same data mining techniques may be used in EDM and traditional
data mining, the hierarchy of EDM distinguishes it from conventional data
mining 34.
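The preprocessing step described above, removing records with null values
before mining, can be sketched in a few lines. This is a minimal
illustration: the field names and the sample records are hypothetical, and a
real pipeline might impute missing values instead of dropping them.

```python
def drop_incomplete(records, required_fields):
    """Keep only records in which every required field has a value."""
    return [
        r for r in records
        if all(r.get(f) not in (None, "") for f in required_fields)
    ]


# Hypothetical raw records, as they might be exported from an LMS.
raw = [
    {"student_id": 1, "grade": 78, "attendance": 0.92},
    {"student_id": 2, "grade": None, "attendance": 0.85},  # missing grade
    {"student_id": 3, "grade": 64, "attendance": ""},      # missing attendance
    {"student_id": 4, "grade": 88, "attendance": 0.97},
]

clean = drop_incomplete(raw, ["grade", "attendance"])
print([r["student_id"] for r in clean])
```

Only the complete records are passed on to the predictive or descriptive
technique chosen in the next step.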
Academic performance:
Academic performance is defined as the educational outcome of a student,
learner, tutor, or institution in meeting their educational objectives 35.
It is also a key criterion for assessing the level of education and for
deciding on corrective actions.
34
Abu Saa, A., Al-Emran, M., & Shaalan, K. (2019). Factors Affecting Students’
Performance in Higher Education: A Systematic Review of Predictive Data
Mining Techniques. Technology, Knowledge and Learning, 24(4), 567–598.
doi:10.1007/s10758-019-094
35
Adnan, K., & Akbar, R. (2019). An analytical study of information
extraction from unstructured and multidimensional big data. J Big Data
6, 91. https://doi.org/10.1186/s40537-019-0254-8
Furthermore, accurately predicting low academic performance is critical for
assisting those in need.
Factors affecting student academic process:
To process and predict data efficiently, it is imperative to identify the
various factors that influence student academic performance, often in
complex ways. These factors include a student's age, previous schooling,
neighborhood, and behavior (Adnan & Akbar, 2019). Other potential influences
include parent occupation, student demographics, and social background. The
quality of instruction could also impact student achievement, which could
indirectly affect students' motivation, satisfaction, and other aspects of
the course.
Prior knowledge of the subject matter is also likely to affect how well
students perform academically: prior knowledge and course achievement are
positively correlated. Lastly, extracurricular activities may affect
students' academic achievement, and a correlation between participation in
extracurricular activities and academic achievement has been found 36.
However, the only elements highlighted as impacting students' academic
achievement were extracurricular activities and the quality of the
instruction they received (Adnan & Akbar, 2019). These data types could be
easily gathered in similar prerequisites, which implied prior knowledge of
the courses or, in this case, the location.
Extracurricular activities affecting student academic process:
A positive association exists between participation in extracurricular
activities and language and mathematics scores. Furthermore, the number
of clubs joined reliably predicts academic outcomes 37.
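An association of this kind is usually quantified with the Pearson
correlation coefficient. The sketch below shows the calculation only; the
club counts and scores are hypothetical illustration values, not data from
Freeman (2017).

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical values: clubs joined and mathematics score per student.
clubs_joined = [0, 1, 2, 3, 4]
math_scores = [55, 60, 58, 70, 72]

r = pearson(clubs_joined, math_scores)
print("positive association" if r > 0 else "no positive association")
```

A coefficient near +1 indicates a strong positive association, near 0 no
linear association, and near -1 a strong negative one. Correlation alone
does not show that joining clubs causes higher scores.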
36
Adnan, K., & Akbar, R. (2019). An analytical study of information
extraction from unstructured and multidimensional big data. J Big Data
6, 91. https://doi.org/10.1186/s40537-019-0254-8
37
Freeman, R. (2017). The relationship between extracurricular activities
and academic achievement.
The Influence of Teaching Quality on Students' Academic
Performance
Teaching quality, measured in terms of connection with students, a
supportive relationship, and sufficient assistance and feedback, was studied
to determine its relationship with overall satisfaction with the tutoring
process (Freeman, 2017). As expected, a positive correlation was found
between the quality of teaching and students' academic performance. This
demonstrates the critical role of education quality in students' overall
satisfaction with the taught material, which in turn affects student
performance.
Effect of educational data mining on academic performance
Chapter 4:
Findings:
Data classification techniques to predict student performance:
Classification builds a model from the data at hand and uses it to make
forecasts. This makes it the most appropriate method for analyzing
educational data: the educational sector uses the academic performance of
earlier students as a criterion to forecast students' future academic
performance (Hernández-Blanco et al., 2019). Most research studies apply
classification to forecasting based on admissions data, the progression of
individual students through a course, grade inflation, and the anticipated
percentage of failing students, and to supporting the grading system. We
chose classification because educational data mining classification
approaches aim to identify the factors that most impact student
achievement 38. Classification also helps assign students to final grade
categories.
Decision tree classifier to predict the performance of students:
Classifying student educational data with decision trees supports students'
overall academic achievement: the tree highlights the parts of the students'
educational data that predict their academic performance, which facilitates
decision-making. The approach suits an educational sector focused on
improving students' academic performance because, as per Hamsa et al.
(2016), a decision tree approach helped identify at-risk students,
motivating tutors to focus on the teaching process and thereby improving the
quality of teaching.
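To make the idea concrete, the following is a minimal sketch of a decision
tree that splits numeric features by Gini impurity. It is not the model used
by Hamsa et al.; the features (attendance percentage and internal assessment
score), the labels, and all function names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature_index, threshold) that lowers weighted Gini, or None."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Build a tree as nested tuples; leaves are plain class labels."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # majority label
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (
        f,
        t,
        build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    )


def predict(tree, row):
    """Walk the tree until a leaf label is reached."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


# Hypothetical training data: (attendance %, internal assessment score).
rows = [(95, 80), (90, 72), (60, 45), (55, 50), (85, 40), (70, 65)]
labels = ["pass", "pass", "at_risk", "at_risk", "at_risk", "pass"]

tree = build_tree(rows, labels)
print(predict(tree, (92, 48)))
```

On this toy data the tree needs a single split on the internal assessment
score. A tutor reading the tree can see directly which attribute and
threshold separate at-risk students from the rest, which is the
interpretability benefit the paragraph above describes.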
Naïve Bayes classifier to predict the performance of students:
38
Hernández-Blanco, A., Herrera-Flores, B., Tomás, D., & Navarro-
Colorado, B. (2019). A systematic review of deep learning approaches
to educational data mining. Complexity, 2019.
Makhtar et al. (2017) successfully retrieved hidden relationships between
attributes that affect students' academic achievement and then examined the
potential use of Naïve Bayes as a trustworthy classifier and predictor to
enhance academic performance. Student performance on final exams was
characterized by hidden data extracted using the Naïve Bayes classifier 39.
Consequently, this method helped identify students who needed more
attention, predicted dropout rates, and enabled tutors to step in and
provide the necessary support 40. Its support for EDM and the trustworthy
data it provides on students' academic performance make Naïve Bayes a
suitable approach for the educational sector (Hamsa et al., 2016).
39
Makhtar, M., Nawang, H., & Wan Shamsuddin, S. N. (2017). Analysis of
students performance using Naïve Bayes classifier. Journal of
Theoretical & Applied Information Technology, 95(16).
40
Hamsa, H., Indiradevi, S., & Kizhakkethottam, J. J. (2016). Student
academic performance prediction model using decision tree and fuzzy
genetic algorithm. Procedia Technology, 25, 326-332.
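The following is a minimal sketch of a categorical Naïve Bayes classifier
with add-one (Laplace) smoothing, included to show how such a predictor
combines per-attribute evidence. It is not the implementation used by
Makhtar et al.; the attributes, labels, and function names are hypothetical.

```python
from collections import Counter, defaultdict
from math import log


def train_nb(rows, labels):
    """Count class and per-feature value frequencies from training data."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    values = defaultdict(set)            # feature index -> distinct values seen
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            value_counts[(i, y)][v] += 1
            values[i].add(v)
    return class_counts, value_counts, values


def predict_nb(model, row):
    """Return the class with the highest log posterior for one record."""
    class_counts, value_counts, values = model
    total = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for y, count_y in class_counts.items():
        score = log(count_y / total)  # log prior
        for i, v in enumerate(row):
            # Add-one smoothing keeps unseen values from zeroing the product.
            score += log((value_counts[(i, y)][v] + 1) / (count_y + len(values[i])))
        if score > best_score:
            best_label, best_score = y, score
    return best_label


# Hypothetical training data: (attendance level, assignments submitted).
rows = [
    ("high", "yes"), ("high", "yes"), ("high", "no"),
    ("low", "no"), ("low", "no"), ("low", "yes"),
]
labels = ["pass", "pass", "pass", "fail", "fail", "fail"]

model = train_nb(rows, labels)
print(predict_nb(model, ("high", "yes")), predict_nb(model, ("low", "no")))
```

The "naïve" part is the assumption that attributes are independent given
the class, which is why the per-attribute log probabilities are simply
added.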
Conclusion:
Academic data may come from a face-to-face (traditional) tutoring system or
from a virtual one that stores information in an LMS. Attributes such as
grades, attendance, online quizzes, and feedback on the quality of material
and education measure academic performance. All of these attributes aid in
forming the conceptual and technical understanding needed to build relevant
and efficient educational data mining applications. To construct a user
knowledge model, the student's knowledge and the caliber of their studies
(that is, the duration of their study sessions or the number of correct
answers they provide) must be considered. Furthermore, attitude and
absenteeism help develop a User Behaviour Model that helps predict academic
progress 41. The educational system has recently emphasized two factors,
internal assessments and completed semester exams, that it believes improve
the veracity of outcomes regarding academic performance. The tutor
administers internal assessments through homework, participation, and the
outcomes of in-class exams (Hernández-Blanco et al., 2019).
41
Hernández-Blanco, A., Herrera-Flores, B., Tomás, D., & Navarro-
Colorado, B. (2019). A systematic review of deep learning approaches
to educational data mining. Complexity, 2019.
References:
Abu Saa, A., Al-Emran, M., & Shaalan, K. (2019). Factors Affecting Students’
Performance in Higher Education: A Systematic Review of Predictive
Data Mining Techniques. Technology, Knowledge and Learning, 24(4),
567–598. doi:10.1007/s10758-019-094
Adnan, K., & Akbar, R. (2019). An analytical study of information extraction
from unstructured and multidimensional big data. J Big Data 6, 91.
https://doi.org/10.1186/s40537-019- 0254-8
Bradley, V. M. (2021). Learning Management System (LMS) used with online
instruction. International Journal of Technology in Education, 4(1), 68–
92.
Brijesh Kumar, S., & Sourabh, P. (2011). Mining educational data to analyze
students' performance. IJACSA, 2(6).
Freeman, R. (2017). The relationship between extracurricular activities and
academic achievement.
Hamsa, H., Indiradevi, S., & Kizhakkethottam, J. J. (2016). Student academic
performance prediction model using decision tree and fuzzy genetic
algorithm. Procedia Technology, 25, 326-332.
Hernández-Blanco, A., Herrera-Flores, B., Tomás, D., & Navarro-Colorado, B.
(2019). A systematic review of deep learning approaches to
educational data mining. Complexity, 2019.
Jung, S., & Huh, J. H. (2019). An Efficient LMS Platform and Its Test Bed.
Electronics, 8(2), 154. Retrieved from
https://www.mdpi.com/2079-9292/8
Kumar, B., & Pal, S. (2011). Mining Educational Data to Analyze Student's
Performance. International Journal of Advanced Computer Science and
Applications, 2(6). doi:10.14569/ijacsa.2011.02060
Khan, A., & Ghosh, S. K. (2020). Student performance analysis and prediction
in classroom learning: A review of educational data mining studies.
Education and Information Technologies, 26(1), 205–240.
doi:10.1007/s10639-020-10230-3
Makhtar, M., Nawang, H., & Wan Shamsuddin, S. N. (2017). Analysis of
students performance using Naïve Bayes classifier. Journal of
Theoretical & Applied Information Technology, 95(16).
Mueen, A., Zafar, B., & Manzoor, U. (2016). Modeling and Predicting
Students’ Academic Performance Using Data Mining Techniques.
International Journal of Modern Education and Computer Science,
8(11), 36–42. doi:10.5815/ijmecs.2016.11.05
Razaque, F., Soomro, N., Shaikh, S. A., Soomro, S., Samo, J. A., Kumar, N., &
Dharejo, H. (2017). Using naïve Bayes algorithm to students' bachelor
academic performances analysis. In 2017 4th IEEE International
Conference on Engineering Technologies and Applied Sciences
(ICETAS) (pp. 1–5). IEEE.
Reddy, C. (2021). Data Mining: Purpose, Characteristics, Benefits &
Limitations. https://content.wisestep.com/data-mining-purpose-
characteristics-benefits-limitations
Thamilselvana, P., & Sathiaseelan, J. G. R. (2015). A Comparative Study of
Data Mining Algorithms for Image Classification. International Journal of
Education and Management Engineering, 5(2), 1.
http://dx.doi.org/10.5815/ijeme.2015.02.01
Wang, G., & Miao, J. (2020). Design of data mining algorithm based on rough
entropy for us stock market abnormality. Journal of Intelligent & Fuzzy
Systems, 39(4), 5213-5221.
Watson, R., & Watson, S. (2012). An argument for clarity: What are learning
management systems, what are they not, and what should they
become? TechTrends, 51(2), 28 – 3
Zhenzhen, M., Zhu, J., & Zhang, S. (2021). Probabilistic-based expressions in
behavioral multiattribute decision making considering pre-evaluation.
Fuzzy Optimization and Decision Making, 20(1), 145–173.
http://dx.doi.org/10.1007/s10700-020-09335-