
Final Research Paper: Data Mining

The document discusses the significance of data mining techniques in the educational sector, particularly through Educational Data Mining (EDM), which analyzes student data to enhance learning experiences and predict performance. It highlights various data mining methods such as artificial neural networks and decision trees, and addresses challenges like data noise and the need for effective data retrieval. The study aims to improve educational outcomes by identifying patterns in student behavior and performance through advanced data analysis techniques.

Uploaded by

shanerabi90
Copyright
© © All Rights Reserved


Student name:

Student Id:

Institution name:

Course:

Date:

Topic selected:

Business intelligence

Data mining techniques in database system



Abstract:

Data mining has become increasingly important and is now widely used by many types of businesses, including those in marketing, e-commerce, e-business, healthcare, and retail. The education sector also holds vast quantities of data, including details about student performance, the impact of extracurricular activities on student performance, and the relationship between teacher quality and student performance. Much of the information hidden in this data cannot be retrieved by conventional means, but advanced data mining techniques can uncover it. Educational data mining (EDM) is the practice of analyzing massive amounts of student data with data mining methods at educational institutions. Researchers explore which data mining methodologies give the most reliable and accurate conclusions when forecasting student performance, identifying undesirable student behavior, classifying students, and modeling students. EDM aims to enhance the learning experience by identifying and resolving problems in the educational system. This study focuses on data mining techniques such as artificial neural networks and decision trees applied to large amounts of educational data, as well as approaches to knowledge discovery in databases.

Keywords:

Data mining, Clustering, Pattern Analysis, Educational systems, Web mining,

Web-based educational systems, Classification


5

Table of contents:

Introduction

Problem statement

Key objectives

Premise

Definitions

Delimitations

Limitations

Secondary analysis

Primary analysis

Findings

Conclusion

References

Chapter 1:

Introduction:

Business intelligence (BI) is the systematic process of collecting, refining, structuring, evaluating, and delivering data so that it can be applied in formulating strategic decisions inside an organization. BI gives managers insights into the usage and performance patterns of their goods and services.

The conceptual roots of data mining are often traced to the 1930s and Alan Turing's work on a universal computing machine (Bradley, 2021), although the term itself came into common use only decades later. Over time, significant progress has been made in business intelligence, highlighting the need to extract information from data through information retrieval 1. Data mining is the systematic and rigorous process of deriving valuable and significant insights from large volumes of data. Imagine a scenario in which a vast amount of data exists but there is no means of analyzing or examining it. Several decades ago this situation was common, and it paved the way for the advancement of the technology.

1 Bradley, V. M. (2021). Learning Management System (LMS) used with online instruction. International Journal of Technology in Education, 4(1), 68–92.

The term "educational data mining" (EDM) describes methods, tools, and

studies intended to automatically derive insights from massive data

repositories from or connected to individuals' educational activities. EDM is a

burgeoning field concerned with creating methods for finding connections

among the distinct and progressively more significant amounts of data

produced by educational domains and then using those methods to get

deeper insights into student behavior and learning 2.

2 Brijesh Kumar, S., & Sourabh, P. (2011). Mining educational data to analyze students' performance. IJACSA, 2(6).

Educational systems are progressively gathering and storing data on user

activities. Statistical analysis, machine learning, and data mining can be

used to study these data (such as trace data, system log data, and massive

data sets) (Brijesh Kumar & Sourabh, 2011).

The development of computational tools for data analysis, the standardization of data recording formats, and increased processing capability are opening up new areas of study for researchers to investigate 3.

EDM can spot and predict patterns in areas such as learner performance and behavior, subject-matter expertise, test scores, lesson plans, and practical applications. For example, learning management systems (LMSs) record details such as when and for how long each student views a particular learning object and the total amount of time the item is visible on their screen. Intelligent tutoring systems likewise record details such as the time a solution was submitted, whether or not it matched the expected solution, how long it took to submit subsequent solutions, and in what order the solution's components were entered into the interface (Freeman, 2017). Because this data is so precise and broad, even a relatively brief session (say, 30 minutes) in a computer-based learning environment can provide a large amount of process data for examination. In other contexts the data can be coarser. For example, a university transcript may detail the student's coursework, final grades, and the date they declared or changed their major 4.

3 Freeman, R. (2017). The relationship between extracurricular activities and academic achievement.

The following four criteria are what the researchers used to assess the

different EDM implementations (Kumar & Pal, 2011).

4 Kumar, B., & Pal, S. (2011). Mining Educational Data to Analyze Student's Performance. International Journal of Advanced Computer Science and Applications, 2(6). doi:10.14569/ijacsa.2011.02060

The first criterion was the prediction of student performance. This criterion aims to enhance learning, foresee academic failure in students, and strengthen the educational process. In addition, it helps universities and colleges plan for the future. According to the article, the most popular methods for forecasting student performance are Bayesian classification, decision trees, neural networks, rule-based methods, and feature selection 5. According to the results of the research, the rule-based model is the most effective prediction model.

The second criterion focuses on undesirable student behavior. Issues such as student misconduct, lack of motivation, cheating, and academic failure can be discovered by employing these indicators 6. The most widely used techniques in this category include decision trees, neural networks, classification, clustering, outlier identification, and feature selection (Hernández-Blanco et al., 2019).

6 Hernández-Blanco, A., Herrera-Flores, B., Tomás, D., & Navarro-Colorado, B. (2019). A systematic review of deep learning approaches to educational data mining. Complexity, 2019.

The third criterion the researchers discussed was student grouping. This criterion attempts to group or classify students according to the qualities of their knowledge. This application is popular with stakeholders in education because it is used in many activities to enhance the learning process. Function filtering, neural networks, and clustering are the most often used techniques in this area.

The fourth and final criterion was student modeling. This criterion describes a number of the student's traits, including cognition, skills, feelings, domain awareness, learning strategies, achievements, characteristics, learning habits, outcomes, and assessment.

Problem statement:

Data has been highlighted as essential to success. In the modern world, analysis and information extraction from data are used to make critical judgments 7. Information extraction can nevertheless be challenging. Data can be stored in several alternative formats, including relational databases, data warehouses, transactional databases, and object-relational databases 8. These technologies were intended solely to manage the massive amounts of data generated and to store and retrieve that data as required. The different kinds of data stored in each of these databases further complicate matters: text, multimedia, numerical data, and other formats can all be stored in them. This is where data mining becomes useful (Jung & Huh, 2019).

What happens if the conclusions drawn from the data are false? This question identifies the most common problem the sector runs into when trying to get useful information out of data. Because so many kinds of data are present, the data is often noisy or incomplete. These anomalies make the mining process more difficult and could lead to inaccurate findings.

7 Jung, S., & Huh, J. H. (2019). An Efficient LMS Platform and Its Test Bed. Electronics, 8(2), 154. Retrieved from https://www.mdpi.com/2079-9292/8

A large company can generate enormous numbers of records every day, and it takes an effective model to process such volumes of data and extract knowledge. There is no guarantee that the data in databases is structured; structured, semi-structured, and unstructured data are all mixed together. In addition, classifying the data according to its nature is essential. Numerous anomalies and hidden patterns can be found in databases (Kumar & Pal, 2011).

Hidden patterns and abnormalities must be identified to ensure that the recovered data is accurate. For a future result to be predicted, each variable must be present. If there are discrepancies in the data, the forecast is invalidated, and this has consequences because it leads to wrong decisions. All of these possible mistakes advanced the idea of data mining 9. Data has become essential to business, and evaluating data manually is no longer feasible. There was an urgent need to automate the information retrieval process: every step, from gathering data to creating reports, needed to be automated.

9 Kumar, B., & Pal, S. (2011). Mining Educational Data to Analyze Student's Performance. International Journal of Advanced Computer Science and Applications, 2(6). doi:10.14569/ijacsa.2011.02060

Data mining is the process's last stage. Before being designated as

mineable, the data undergoes several preparation phases. Data

preprocessing should be finished before the data mining process is

integrated with database management systems (DBMS) or any other target

data source. The effectiveness of the model is mainly dependent on the

integration schema that is used10.

Determining the usefulness of the data is another issue. The goal of a data mining process is to produce concrete results, which can be predictive or descriptive, to aid in making important decisions. It is therefore crucial to determine whether the data will help achieve the intended outcome. The information expected from the data should be identified before execution. Each element plays a part in producing high-quality work and facilitating effective decision-making (Mueen et al., 2016).

10 Mueen, A., Zafar, B., & Manzoor, U. (2016). Modeling and predicting students' academic performance using data mining techniques. International Journal of Modern Education and Computer Science, 8(11), 36.

Based on the analysis, the categorization and visualization can be carried

out. These are the essential steps in developing a robust data mining model.

Parameter identification is necessary for models to be accurate and useful.

In summary, the challenges faced when attempting to extract significant and trustworthy information from data include, but are not restricted to:

• The existence of anomalous data across the different formats and types of data stored in each database.

• Determining the best data sources, techniques for data extraction, and the kind of data mining strategy to employ (Makhtar et al., 2017).

• Recognizing irregularities and discrepancies in data that lead to noisy and incomplete information 11.
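
As an illustration of the last challenge, the following minimal Python sketch flags incomplete records and numerically anomalous values in a small set of student records. The field names (`id`, `score`), the function name `audit_records`, and the cutoff of 3.5 on a median-based score are assumptions made for this example; they do not come from the studies cited above.

```python
from statistics import median


def audit_records(records, numeric_field, required_fields, threshold=3.5):
    """Return (indexes of incomplete rows, indexes of anomalous rows).

    numeric_field is assumed to be listed in required_fields, so every
    complete row has a usable numeric value.
    """
    # A row is incomplete when any required field is missing or empty.
    incomplete = {
        i for i, row in enumerate(records)
        if any(row.get(field) in (None, "") for field in required_fields)
    }
    complete = [
        (i, row[numeric_field])
        for i, row in enumerate(records)
        if i not in incomplete
    ]
    values = [value for _, value in complete]

    # Median and median absolute deviation (MAD) are robust to the very
    # outliers we are trying to detect.
    med = median(values)
    mad = median(abs(value - med) for value in values)

    anomalous = []
    if mad > 0:
        anomalous = [
            i for i, value in complete
            if 0.6745 * abs(value - med) / mad > threshold
        ]
    return sorted(incomplete), anomalous


if __name__ == "__main__":
    sample = [
        {"id": 1, "score": 72},
        {"id": 2, "score": 68},
        {"id": 3, "score": None},   # incomplete record
        {"id": 4, "score": 75},
        {"id": 5, "score": 690},    # likely a data-entry error
        {"id": 6, "score": 70},
    ]
    print(audit_records(sample, "score", ("id", "score")))
```

Rows flagged in this way would normally be reviewed or corrected before mining, rather than silently dropped.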

Key objectives of EDM:

While improving our understanding of how students learn is the main goal of EDM, it can also serve a variety of informational purposes for researchers, instructors, administrators, and students (Mueen et al., 2016).

When students are the center of attention, the aim of EDM is to deliver pertinent assignments, learner activities, and resources that maximize student learning 12. Intelligent, adaptive, real-time web-based educational systems use goals, preferences, and assessments of student aptitude to provide the best possible learning content.

11 Makhtar, M., Nawang, H., & Wan Shamsuddin, S. N. (2017). Analysis of students performance using naïve Bayes classifier. Journal of Theoretical & Applied Information Technology, 95(16).

More generally, suggestions about what can be helpful (or harmful) for

learning and engagement in general might be given to students. Examples

include course sequencing, using university facilities, online and social media

behavior, etc.

When teachers are the focus, the aim is to obtain feedback on the subject matter, mode of delivery, and organization of instruction 13. This feedback can draw attention to frequent misconceptions and irregular learning patterns, enabling educators to improve their teaching methods.

One such use is in figuring out the optimal order of lessons to help pupils

learn.

13 Mueen, A., Zafar, B., & Manzoor, U. (2016). Modeling and Predicting Students’ Academic Performance Using Data Mining Techniques. International Journal of Modern Education and Computer Science, 8(11), 36–42. doi:10.5815/ijmecs.2016.11.05

Focusing on administrative personnel can help improve learning

management systems, servers, and user interfaces while also providing a

better understanding of student attendance and retention. Using information

gleaned from a university's LMS, statisticians could build models to foretell

students' levels of participation and persistence (Razaque et al., 2017).

Researchers might also be the focus of EDM 14. Data mining researchers

operate in any of the domains above and develop and assess data mining

approaches for effectiveness.

Premise:

14 Razaque, F., Soomro, N., Shaikh, S. A., Soomro, S., Samo, J. A., Kumar, N., & Dharejo, H. (2017). Using naïve Bayes algorithm to students' bachelor academic performances analysis. In 2017 4th IEEE International Conference on Engineering Technologies and Applied Sciences (ICETAS) (pp. 1–5). IEEE.

The primary purpose of data mining techniques in databases is data

retrieval. Imagine a world where data is abundant, but analysis tools still

need to be improved. A few decades ago, this was the norm, and it paved

the way for technological advances. Therefore, the primary purpose of

research is to introduce techniques to retrieve the data.

Definitions:

Business intelligence:

With the use of modern computing resources, business intelligence (BI) can

be used to analyze data and reveal useful insights for decision-makers at any

level of an organization (Razaque et al., 2017). Organizations collect data

from internal and external IT systems, prepare it for analysis, run queries

against the data, and create data visualizations, BI dashboards, and reports

as part of the BI process to provide analytics results to business users for

operational decision-making and strategic planning 15.

15 Razaque, F., Soomro, N., Shaikh, S. A., Soomro, S., Samo, J. A., Kumar, N., & Dharejo, H. (2017). Using naïve Bayes algorithm to students' bachelor academic performances analysis. In 2017 4th IEEE International Conference on Engineering Technologies and Applied Sciences (ICETAS) (pp. 1–5). IEEE.

BI is the process of gathering and analyzing data to make more informed business decisions, which in turn leads to increased profits, smoother operations, and a competitive advantage. Business intelligence accomplishes this by combining analytics, data management, and reporting technologies.

Data mining:

Data mining is the search through large data sets for useful patterns and relationships that may help solve business problems 16. Thanks to data mining tools and methodologies, companies can benefit from improved foresight and decision-making (Makhtar et al., 2017).

16 Makhtar, M., Nawang, H., & Wan Shamsuddin, S. N. (2017). Analysis of student's performance using naïve Bayes classifier.

Data mining techniques:

Data mining techniques begin with preprocessing, that is, methods of transforming large-scale data into a format that is suitable for data mining. Preprocessing reduces the redundancies and inconsistencies that result from the large data size, finds associations between the data, normalizes the data, and truncates outliers so that high-quality data can be extracted for the intended purpose 17. Data mining techniques fall into two broad groups: descriptive and predictive. Each technique has a different precision and accuracy rate depending on the kind of data and the environment.
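
The preprocessing steps named above (removing redundant records, truncating outliers, and normalizing) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `preprocess`, the `(student_id, score)` record layout, and the valid score range of 0 to 100 are hypothetical and are not taken from the cited sources.

```python
def preprocess(rows, lower=0.0, upper=100.0):
    """Deduplicate by student id, clip scores to [lower, upper], scale to [0, 1]."""
    # 1. Redundancy: keep only the first record seen for each student id.
    first_seen = {}
    for student_id, score in rows:
        first_seen.setdefault(student_id, score)

    # 2. Outliers: truncate values that fall outside the assumed valid range.
    clipped = {
        student_id: min(max(score, lower), upper)
        for student_id, score in first_seen.items()
    }

    # 3. Normalization: min-max scaling so that all values lie in [0, 1].
    lowest, highest = min(clipped.values()), max(clipped.values())
    span = highest - lowest
    return {
        student_id: (score - lowest) / span if span else 0.0
        for student_id, score in clipped.items()
    }


if __name__ == "__main__":
    raw = [("a", 50), ("b", 120), ("a", 50), ("c", 0), ("d", 75)]
    print(preprocess(raw))
```

Min-max scaling is only one possible normalization; other choices, such as standardizing to zero mean and unit variance, would fit the same step.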

Cluster analysis:

Clustering is an unsupervised technique: using no prior knowledge of labels or information classes, it identifies related components of large amounts of data and groups them into clusters. Following its descriptive methodology, the algorithm then determines the distance to the center of each cluster and analyzes the findings to find commonalities. K-means, hierarchical clustering, and density-based spatial clustering of applications with noise (DBSCAN) are only a few of the clustering methods available (Reddy, 2021). K-means is among the most widely used of these because it can easily cluster large datasets.

17 Reddy, C. (2021). Data Mining: Purpose, Characteristics, Benefits & Limitations. https://content.wisestep.com/data-mining-purpose-characteristics-benefits-limitations
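
To make the idea of assigning points to the nearest cluster center concrete, a minimal K-means sketch in plain Python is given below. The function name `k_means`, the choice of the first k points as starting centers, and the sample points are assumptions made for illustration; practical implementations use more careful initialization.

```python
from math import dist


def k_means(points, k, iterations=20):
    """Cluster a list of coordinate tuples into k groups."""
    # Naive initialization (an assumption of this sketch): first k points.
    centers = [points[i] for i in range(k)]
    clusters = [[] for _ in range(k)]

    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k), key=lambda c: dist(point, centers[c]))
            clusters[nearest].append(point)

        # Update step: move each center to the mean of its assigned points.
        new_centers = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # assignments are stable, so the algorithm has converged
        centers = new_centers

    return centers, clusters


if __name__ == "__main__":
    data = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0),
            (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
    centers, clusters = k_means(data, k=2)
    print(centers)
    print(clusters)
```

In an educational setting the coordinates could stand for, say, attendance and average grade, so that each cluster groups students with similar profiles.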

Regression analysis:

Regression analysis is one of the most important data mining techniques because it shows the relationships between different data variables; in its simplest form, linear regression, it presents the relationship as a straight line 18. It then predicts the value of the target variable using the model's best-fitting line. This is the most often used approach to statistical analysis and the most fundamental method among data mining approaches. The primary objective of such procedures is to find an equation that relates the dependent variable to one or more independent variables (Jung & Huh, 2019).
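
For the single-variable case, the best-fitting line can be found with ordinary least squares. The short sketch below is illustrative only; the function name `fit_line` and the study-hours data are invented for the example.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    hours_studied = [1, 2, 3, 4, 5]      # independent variable (hypothetical)
    exam_scores = [52, 54, 56, 58, 60]   # dependent variable (hypothetical)
    slope, intercept = fit_line(hours_studied, exam_scores)
    # Use the fitted equation to predict the score for 6 hours of study.
    print(slope, intercept, slope * 6 + intercept)
```

With several independent variables the same idea generalizes to multiple regression, which is usually solved with a numerical library rather than by hand.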

Classification analysis:

18 Jung, S., & Huh, J. H. (2019). An Efficient LMS Platform and Its Test Bed. Electronics, 8(2), 154. Retrieved from https://www.mdpi.com/2079-9292/8

Classification is the technique data miners employ most frequently. It uses pre-categorized instances to build a model that can then classify the wider population of records.

According to Khanna, Singh, and Alam (2016), this technique learns how to classify data from a training set and then applies what it has learned to classify new data that was not part of the training set.

Decision tree:

The decision tree technique repeatedly splits a large data set into smaller groups. The resulting tree consists of internal nodes, each of which tests one of the input variables, and leaf nodes, each of which stands for a class. When dealing with complex data, decision trees are frequently used to select the best variables and limit the number of variables. The tree shows the degree of linkage between parent nodes and their child or leaf nodes. The most important factors are found at the top of the tree, with less important ones further down until a leaf node is reached (Zhenzhen et al., 2021). Although EDM and traditional data mining appear to use the same methodologies, the hierarchy of EDM is different from that of traditional data mining 19.
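
One common way to decide which variable belongs at the top of the tree is to pick the attribute with the highest information gain. The source does not name a splitting criterion, so the entropy-based criterion below is an assumption, as are the function names and the toy student records.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Reduction in entropy of `target` obtained by splitting on `attribute`."""
    base = entropy([row[target] for row in rows])
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


def best_split(rows, attributes, target):
    """Attribute that would be tested at the root of the tree."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


if __name__ == "__main__":
    students = [
        {"attendance": "high", "tutoring": "yes", "result": "pass"},
        {"attendance": "high", "tutoring": "no", "result": "pass"},
        {"attendance": "low", "tutoring": "yes", "result": "fail"},
        {"attendance": "low", "tutoring": "no", "result": "fail"},
    ]
    print(best_split(students, ["attendance", "tutoring"], "result"))
```

A full tree would be built by applying the same selection recursively to each subset until the subsets are pure or no attributes remain.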

Naïve Bayes:

Naïve Bayes has attracted the attention of numerous academics lately because of its reliable results on practical problems. The foundation of the Naïve Bayes technique is Bayes' theorem, a method for calculating the posterior probability, that is, the likelihood of an outcome given a predictor (Hernández-Blanco et al., 2019). Naïve Bayes also yielded the most accurate result: 74% of the findings produced by the Naïve Bayes classifier were accurate in identifying hidden relationships between the factors that affected student performance 20.

19 Zhenzhen, M., Zhu, J., & Zhang, S. (2021). Probabilistic-based expressions in behavioral multiattribute decision making considering pre-evaluation. Fuzzy Optimization and Decision Making, 20(1), 145–173. http://dx.doi.org/10.1007/s10700-020-09335-
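
The posterior probability mentioned above can be computed directly for categorical data. The sketch below is a simplified illustration, not the classifier used in the cited studies: the function names `train` and `posterior`, the add-one smoothing, and the six sample records are all assumptions.

```python
from collections import Counter, defaultdict


def train(rows, target):
    """Count classes and per-class attribute values from labelled rows."""
    class_counts = Counter(row[target] for row in rows)
    feature_counts = defaultdict(Counter)  # (class, attribute) -> value counts
    values = defaultdict(set)              # attribute -> distinct values seen
    for row in rows:
        for attribute, value in row.items():
            if attribute != target:
                feature_counts[(row[target], attribute)][value] += 1
                values[attribute].add(value)
    return class_counts, feature_counts, values


def posterior(model, evidence):
    """P(class | evidence) for each class, assuming independent attributes."""
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    scores = {}
    for cls, count in class_counts.items():
        score = count / total  # prior P(class)
        for attribute, value in evidence.items():
            # Likelihood P(value | class) with add-one (Laplace) smoothing.
            seen = feature_counts[(cls, attribute)][value]
            score *= (seen + 1) / (count + len(values[attribute]))
        scores[cls] = score
    norm = sum(scores.values())
    return {cls: score / norm for cls, score in scores.items()}


if __name__ == "__main__":
    records = (
        [{"attendance": "high", "result": "pass"}] * 3
        + [{"attendance": "low", "result": "pass"}]
        + [{"attendance": "low", "result": "fail"}] * 2
    )
    model = train(records, "result")
    print(posterior(model, {"attendance": "low"}))
```

The "naïve" part of the method is the assumption that attributes are independent given the class, which is what allows the likelihoods to be multiplied.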

Delimitations of study:

i. An aid to predicting future trends

A data mining framework retains the informative variables of the components and structures it analyzes. One of its advantages is that it helps anticipate upcoming trends, which becomes attainable by tracking changes in individual behavior and innovation.

ii. Making a decision

In the decision-making process, selecting the appropriate course of action is crucial. Data mining techniques can be applied to find variations and trends, for example in marketing strategies, which makes them beneficial in the long run.

With recent technology, almost any data can be easily assessed. This allows one to make a precise decision even about something unknown and unexpected.

20 Hernández-Blanco, A., Herrera-Flores, B., Tomás, D., & Navarro-Colorado, B. (2019). A systematic review of deep learning approaches to educational data mining. Complexity, 2019.

iii. Increases in business income

Data mining is a process that prepares the ground for different kinds of innovation. Decisions can be derived from the data by interpreting it from a business perspective. Numerous corporate sectors and commercial organizations can apply these techniques 21. Any firm with data and processes can be analyzed: data mining enables us to take important information out of these assets, improve business processes, and boost efficiency and productivity (Mueen et al., 2016).

This may concern advertising, marketing, e-commerce, supply chain management, healthcare, or hospitality. In essence, data mining aims to understand the market well enough to enable informed decisions that will profoundly influence decisions made in the future (Mueen et al., 2016).

21 Mueen, A., Zafar, B., & Manzoor, U. (2016). Modeling and predicting students' academic performance using data mining techniques. International Journal of Modern Education and Computer Science, 8(11), 36.

iv. Identifying fraud

• Data breaches are among the most serious hazards on the internet 22. With data mining, artificial intelligence can categorize and divide information and automatically identify fraud and rules within the data, suggesting patterns that may include fraud-related ones. Data mining makes it simple to identify credit card transaction fraud. Most data mining components are developed using market research data.

• Through the use of these investigative techniques, counterfeit items and fraudulent acts in the market can be found.

22 Mueen, A., Zafar, B., & Manzoor, U. (2016). Modeling and predicting students' academic performance using data mining techniques. International Journal of Modern Education and Computer Science, 8(11), 36.

v. Customer preferences

With the use of data mining procedures, marketing professionals will be able to understand patterns in customer data. Since data mining systems can handle vast amounts of pertinent information, data mining approaches are being developed that help track customer habits.

Limitations:

While data mining can be advantageous for an individual or organization

during decision-making, there are significant limitations when utilizing real-

time data. The following are some drawbacks or limitations of the data

mining technique (Reddy, 2021).


• The gathering of user data and the application of diverse marketing strategies within businesses constitute the main components of the data mining process 23.

• Because these analysis techniques handle a large volume of user data, there is a significant chance that user privacy will be infringed and data will be compromised.

• There are no limitations on the amount of user data used for analysis and marketing with these tactics 24.

23 Reddy, C. (2021). Data Mining: Purpose, Characteristics, Benefits & Limitations. https://content.wisestep.com/data-mining-purpose-characteristics-benefits-limitations

24 Razaque, F., Soomro, N., Shaikh, S. A., Soomro, S., Samo, J. A., Kumar, N., & Dharejo, H. (2017). Using naïve Bayes algorithm to students' bachelor academic performances analysis. In 2017 4th IEEE International Conference on Engineering Technologies and Applied Sciences (ICETAS) (pp. 1–5). IEEE.


• Because this involves gathering massive amounts of user data and storing it in conventional databases or cloud-based systems, there is a significant risk of data misuse.

• The data mining process guarantees data accuracy for individuals or teams inside an organization, regardless of limitations (Razaque et al., 2017).

• Security is one of the main obstacles to data mining. Data theft is a significant risk because these firms keep large amounts of customer data, such as SSNs, addresses, and phone numbers.

• Any company can have a data breach; big businesses such as Ford, Equifax, Capital One, and others have recently suffered one.

• For businesses, the initial cost of setting up the infrastructure for data mining will be high. However, this expense will be recovered as the companies advance with data mining technologies or frameworks that use them.


• Governments have full access to these data for better governance strategies.

• Everything related to a client, including chat conversations, is fully accessible to the government.

• The government can obtain information about a person at any time and keep tabs on every aspect of that person's social and personal life. Prominent members of society have expressed significant concern and disagreement over this 25.

• The solution to these issues lies in transparent policies between the government, organizations, and users (Razaque et al., 2017).

• Governments, corporations, and individuals can all gain immensely from data mining; yet, if its use is not properly managed, it can raise serious issues with user data security and privacy.

25 Razaque, F., Soomro, N., Shaikh, S. A., Soomro, S., Samo, J. A., Kumar, N., & Dharejo, H. (2017). Using naïve Bayes algorithm to students' bachelor academic performances analysis.

Chapter 2

Secondary analysis of data:

The contemporary era of information technology is characterized by a significant surge in the volume of data and information within companies. The vast bulk of the material is available in digital format and is stored within substantial databases. Data mining plays a crucial role in identifying and extracting correlations and patterns within extensive datasets, making it possible to forecast future outcomes. Organizations that employ it can increase their revenues, reduce their expenses, and mitigate their investment risk 26.

Data mining, which involves extracting information and knowledge from large-scale data sets, is widely acknowledged as a significant research topic within database systems and machine learning. The field is also of considerable importance to business because of its capacity to generate revenue (Zhenzhen et al., 2021). Data mining has attracted significant attention among academics and scholars across several disciplines. The advent of data-providing services, such as data warehousing and internet services, has made it easier to understand user behavior in order to improve service provision and capitalize on business opportunities. An organization can receive data in several forms of media, including text, audio files, video, and photographs (Zhenzhen et al., 2021).

26 Zhenzhen, M., Zhu, J., & Zhang, S. (2021). Probabilistic-based expressions in behavioral multiattribute decision making considering pre-evaluation. Fuzzy Optimization and Decision Making, 20(1), 145–173. http://dx.doi.org/10.1007/s10700-020-09335-

The fundamental objective of data mining is to acquire data, cleanse it, and employ contemporary data mining methodologies to enhance decision-making processes. Data mining is characterized by ongoing growth and a dynamic nature, and numerous scholars have documented the emergence of innovative methods. Providing a comprehensive overview of data mining techniques and applications within the confines of a concise study is therefore challenging. Data mining remains a relatively young and promising area of investigation that presents numerous unanswered questions and obstacles, and it warrants further exploration 27.

This paper offers a comprehensive view of data mining challenges and

techniques, as seen through the lens of a database researcher. Various

data mining methodologies have been extensively studied and compared in

the academic literature. This paper examines the incorporation and

influence of data mining processes within relational database

frameworks. It explores the availability of data mining services in

databases and identifies the appropriate tools required for

implementation 28.

27
Zhenzhen, M., Zhu, J., & Zhang, S. (2021). Probabilistic-based

expressions in behavioral multiattribute decision making considering

pre-evaluation. Fuzzy Optimization and Decision Making, 20(1), 145–

173. http://dx.doi.org/10.1007/s10700-020-09335-

28
Jung, S., & Huh, J. H. (2019). An Efficient LMS Platform and Its Test

Bed. Electronics, 8(2), 154. Retrieved from

https://www.mdpi.com/2079-9292/8

Furthermore, the paper highlights the significant advantages and

disadvantages of data mining across various data sets. This inquiry has

demonstrated the integration of database technology with data mining

strategies (Jung & Huh, 2019).

Data mining, a common practice in many economic sectors, concentrates on

extracting specific details and potential patterns from volumes of data

that are too large to analyze manually. Data mining methods vary in

efficiency and usability and include regression, classification,

clustering, association rules, and time series analysis. This paper

covers decision trees, Naïve Bayes, classification analysis, regression

analysis, cluster analysis, and association rules. Data mining techniques

support teaching and are an effective way to anticipate students'

academic progress. Khan & Ghosh (2020) emphasized that to provide the

education sector with more accurate and dependable data, it is imperative

to understand the factors that may affect a student's overall academic

performance 29. Abizada et al. (2020) considered extracurricular

activities to be one factor that can influence students' academic

performance. Students' overall academic success is also influenced by

overall teaching quality, which includes monitoring, student-tutor

contact, and other elements. Finally, clearly defined attributes give

researchers a conceptual understanding of the algorithms. The data are

then examined using a suitable data mining technique, producing reliable

forecasts of students' academic success.

29
Khan, A., & Ghosh, S. K. (2020). Student performance analysis and

prediction in classroom learning: A review of educational data mining

studies. Education and Information Technologies, 26(1), 205– 240.

doi:10.1007/s10639-020-10230-3

Chapter 3

Primary analysis of data:

Many data mining frameworks are now available. While some data mining

systems are comprehensive and flexible, others are specialized frameworks

dedicated to a particular information source or limited to specific data

mining functions.

Data mining is an iterative process: each pass refines the mining task

and incorporates a large amount of new information, which yields more

systematic results. Practical, flexible, and adaptive data analytics are

necessary for data mining 30. It could be considered a standard

assessment of data

30
Zhenzhen, M., Zhu, J., & Zhang, S. (2021). Probabilistic-based

expressions in behavioral multiattribute decision making considering

pre-evaluation. Fuzzy Optimization and Decision Making, 20(1), 145–

173. http://dx.doi.org/10.1007/s10700-020-09335-

innovation. Data preparation and data mining tasks together complete the

data mining process as a knowledge discovery method (Zhenzhen et al.,

2021). Any information source, including databases and complex data sets

such as time series, can be the basis for data mining. The data mining

process poses its own difficulties. Both previously unknown and

potentially useful information can be revealed through data mining.

Whether mined data is novel, valuable, or interesting is subjective and

depends on the user and the application. Data mining can undoubtedly

produce a very large number of patterns, and that number can easily grow

further. To reduce oversized mining results, organizations can consider

adding a metadata mining stage that minimizes the number of discovered

rules and patterns likely to be of little interest to the person who has

to assign a value to them (Watson & Watson, 2012).

Evaluating mined information, and the Knowledge Discovery in Databases

(KDD) process as a whole, depends on identifying and quantifying the

interestingness of the rules and patterns found or yet to be found.

Although some measures currently exist, assessing the interestingness of

the discovered knowledge remains a major research issue 31.

Data mining offers numerous techniques for identifying hidden patterns in

vast amounts of data. Various techniques and algorithms can be used, and

the choice among them depends on the application. Predictive data mining

techniques are appropriate when there is a particular target that we want

to predict from the data. Combining data mining techniques with online

analytical processing is desirable for obtaining valuable information

from large business datasets.

31
Watson, R., & Watson, S. (2012). An argument for clarity: What are

learning management systems, what are they not, and what should

they become? TechTrends, 51(2), 28 – 3



This paper identifies and addresses the most prevalent challenges in

extracting accurate and valuable information from collected data. The

study focuses on the issues that arise and on the data mining methods

that suit the available data type and the intended outcome of data

mining. The valuable information obtained through data mining helps

organizations make decisions (Wang & Miao, 2020). We examined the issues

that arise when extracting data from extensive databases using data

mining, and we identify some restrictions and boundaries, in addition to

problem statements, in this paper. Organizations must adhere to specific

security protocols to ensure that data extraction does not expose them to

cyberattacks 32. We also propose some hypotheses to address the problems

associated with data mining, and we keep looking for more workable

approaches to

32
Wang, G., & Miao, J. (2020). Design of data mining algorithm based on

rough entropy for US stock market abnormality. Journal of Intelligent &

Fuzzy Systems, 39(4), 5213-5221.



employing different data mining techniques to extract valuable information

from massive datasets in businesses.

Data mining application:

Data mining has recently reduced workload in various areas. Large

corporations use it to reduce labor costs and increase profits, and it

has a significant impact on the business community (Thamilselvana &

Sathiaseelan, 2015). Data mining is also widely used in education to

enhance and add value to the educational process. It therefore has a wide

range of applications across various departments and sectors 33.

Educational data mining:

33
Thamilselvana, P., & Sathiaseelan, J. G. R. (2015). A Comparative

Study of Data Mining Algorithms for Image Classification. International

Journal of Education and Management Engineering, 5(2), 1.

http://dx.doi.org/10.5815/ijeme.2015.02.01

Educational data mining (EDM) applies data mining techniques in the

educational sector to student data such as grades and attendance. EDM

can also provide accurate projections of student performance and

behavior. Its activities include data analysis and visualization, student

behavior detection, student classification, social network analysis,

instructor feedback, student recommendations, grade prediction, and

content development.

Taxonomy & application of educational data mining:

The EDM process follows a set of standardized steps to achieve its goals.

First, raw educational data is gathered from academic surveys,

registration records, and learning management systems. Next, the

collected data is preprocessed, and null data is removed. The intended

objective of the EDM is then addressed using the appropriate data mining

techniques, whether predictive or descriptive (Abu Saa et al., 2019).

Although the same data mining techniques may be used in EDM and

traditional data mining, the hierarchy of EDM distinguishes EDM from

conventional data mining 34.
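
The preprocessing step described above can be illustrated with a minimal Python sketch. It is not taken from any of the cited studies: the record fields, the sample values, the helper names drop_incomplete and label_at_risk, and the pass mark of 50 are all assumptions made for illustration only.

```python
# Hypothetical raw records, e.g. exported from an LMS; None marks a missing value.
raw_records = [
    {"student": "s1", "grade": 78.0, "attendance": 0.92},
    {"student": "s2", "grade": None, "attendance": 0.85},
    {"student": "s3", "grade": 55.0, "attendance": None},
    {"student": "s4", "grade": 41.0, "attendance": 0.60},
]


def drop_incomplete(records):
    """Preprocessing: keep only records in which every field has a value."""
    return [r for r in records if all(v is not None for v in r.values())]


def label_at_risk(records, pass_mark=50.0):
    """Simple descriptive step: flag students whose grade is below pass_mark."""
    return {r["student"]: r["grade"] < pass_mark for r in records}


clean = drop_incomplete(raw_records)  # s2 and s3 are removed
flags = label_at_risk(clean)
print(flags)
```

Dropping incomplete records is the simplest way to handle null data; imputing missing values is a common alternative when too many records would otherwise be lost.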

Academic performance:

Academic performance is defined as the educational outcome of a student,

learner, tutor, or institution in meeting their educational objectives 35.

Furthermore, academic performance is a key criterion for assessing the

level of education and for deciding on corrective actions.

34
Abu Saa, A., Al-Emran, M., & Shaalan, K. (2019). Factors Affecting Students’

Performance in Higher Education: A Systematic Review of Predictive Data

Mining Techniques. Technology, Knowledge and Learning, 24(4), 567–598.

doi:10.1007/s10758-019-094

35
Adnan, K., & Akbar, R. (2019). An analytical study of information

extraction from unstructured and multidimensional big data. J Big Data

6, 91. https://doi.org/10.1186/s40537-019-0254-8



Furthermore, accurately predicting low academic performance is critical for

assisting those in need.

Factors affecting student academic process:

To process and predict data efficiently, it is imperative to identify the

various factors that influence student academic performance, often in

complex ways. Factors affecting a student's performance include age,

previous schooling, neighborhood, and behavior (Adnan & Akbar, 2019).

Other potential influences include parent occupation, student

demographics, and social background. The quality of instruction could

also affect student achievement, and it could indirectly affect students'

motivation, satisfaction, and other aspects of the course.

Prior knowledge of the subject matter is also likely to affect how well

students perform academically: there is a positive correlation between

students' prior knowledge and their course achievement. Lastly,

extracurricular activities may affect students' academic achievement, and

a correlation between participation in extracurricular activities and

academic achievement has been found 36.

However, the only factors highlighted as affecting students' academic

achievement were extracurricular activities and the quality of the

instruction they received (Adnan & Akbar, 2019). These types of data can

be gathered easily under similar prerequisites, which imply prior

knowledge of the courses or, in this case, the location.

Extracurricular activities affecting student academic process:

A positive association exists between participation in extracurricular

activities and language and mathematics scores. Furthermore, the number

of clubs joined reliably predicts academic outcomes 37.
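
One common way to quantify such an association is the Pearson correlation coefficient. The sketch below is only illustrative: the clubs_joined and math_scores values are synthetic numbers invented for this example, not data from Freeman (2017) or any other cited study, and pearson_r is a helper name introduced here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Synthetic, illustrative data: clubs joined and mathematics score per student.
clubs_joined = [0, 1, 1, 2, 3, 4]
math_scores = [52, 58, 61, 67, 70, 79]

r = pearson_r(clubs_joined, math_scores)
print(round(r, 3))
```

A value of r close to +1 indicates a strong positive linear association. It does not by itself show that joining clubs causes higher scores.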

36
Adnan, K., & Akbar, R. (2019). An analytical study of information

extraction from unstructured and multidimensional big data. J Big Data

6, 91. https://doi.org/10.1186/s40537-019-0254-8

37
Freeman, R. (2017). The relationship between extracurricular activities

and academic achievement.



The Influence of Teaching Quality on Students' Academic

Performance

Teaching quality, in terms of connecting with students, developing a

supportive relationship, and offering enough assistance and feedback, was

studied to determine its relationship with overall satisfaction with the

tutoring process (Freeman, 2017). As expected, teaching quality

correlated positively with students' academic performance, demonstrating

the critical role of education quality in overall student satisfaction

with the taught material, which in turn affects student performance.



Effect of educational data mining on academic performance



Chapter 4:

Findings:

Data classification techniques to predict student performance:

The classification technique builds a model from the data at hand and

uses it to make forecasts. This makes it particularly appropriate for

analyzing educational data, and the educational sector applies it by

using the academic performance of earlier students as a basis for

forecasting students' future academic performance (Hernandez-Blanco et

al., 2019). Most research studies use a classification approach for

forecasting based on admissions data, the progression of students in

individual courses, grade inflation, and the anticipated percentage of

failing students, and for supporting the grading system. We chose

classification because educational data mining classification approaches

aim to identify the factors that most affect student achievement 38.

Classification also aids in the final grade classification of students.

Decision tree classifier to predict the performance of students:

Classifying student educational data with decision trees supports

students' overall academic achievement: the trees facilitate

decision-making by highlighting the attributes in the educational data

that predict academic performance. The approach suits an educational

sector focused on improving students' academic performance because, as

per Hamsa et al. (2016), a decision tree approach identified more at-risk

students, motivating tutors to focus on the teaching process and thereby

improving the quality of teaching.
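
The core operation of a decision tree, choosing the split that best separates the classes, can be sketched as a single-split tree (a decision stump) that minimizes Gini impurity. This is a simplified illustration rather than the model used by Hamsa et al.: the attendance values, the pass/fail labels, and the function names are assumptions introduced for this example.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best rule 'value <= threshold'."""
    best = (None, float("inf"))
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best


def majority(labels):
    """Most frequent label in a non-empty list."""
    return max(set(labels), key=labels.count)


# Synthetic, illustrative data: attendance rate and final outcome per student.
attendance = [0.95, 0.90, 0.85, 0.80, 0.60, 0.55, 0.50, 0.40]
outcome = ["pass", "pass", "pass", "pass", "fail", "fail", "fail", "fail"]

threshold, impurity = best_split(attendance, outcome)
left_label = majority([l for v, l in zip(attendance, outcome) if v <= threshold])
right_label = majority([l for v, l in zip(attendance, outcome) if v > threshold])


def predict(value):
    """Classify a student by the learned attendance threshold."""
    return left_label if value <= threshold else right_label


print(predict(0.45), predict(0.90))
```

A full decision tree applies the same split search recursively to each resulting group and over several attributes, stopping when a group is pure or too small to split.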

Naïve Bayes classifier to predict the performance of students:

38
Hernández-Blanco, A., Herrera-Flores, B., Tomás, D., & Navarro-

Colorado, B. (2019). A systematic review of deep learning approaches

to educational data mining. Complexity, 2019.



Makhtar et al. (2017) successfully retrieved hidden information among the

attributes that affect students' academic achievement and then examined

the potential of Naïve Bayes as a trustworthy classifier/predictor to

enhance academic performance. Student performance on final exams was

characterized using hidden data extracted with the Naïve Bayes classifier

39. Consequently, this method helped identify students who needed more

attention, predicted dropout rates, and enabled tutors to step in and

provide the necessary support 40. EDM's support and trustworthy data on

students' academic

39
Makhtar, M., Nawang, H., & Wan Shamsuddin, S. N. (2017). Analysis of

students performance using naïve Bayes classifier. Journal of Theoretical

& Applied Information Technology, 95(16).

40
Hamsa, H., Indiradevi, S., & Kizhakkethottam, J. J. (2016). Student

academic performance prediction model using decision tree and fuzzy

genetic algorithm. Procedia Technology, 25, 326-332.



performance make Naïve Bayes a suitable approach for the educational

sector (Hamsa et al., 2016).
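
To make the Naïve Bayes approach concrete, the following is a minimal sketch of a categorical Naïve Bayes classifier with Laplace (add-one) smoothing. It does not reproduce the setup of Makhtar et al.: the two features (attendance level and assignment status), the training rows, and the function names are hypothetical and serve only as an illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class frequencies of each feature value."""
    class_counts = Counter(labels)
    # value_counts[(feature_index, class_label)] -> Counter of observed values
    value_counts = defaultdict(Counter)
    # feature_values[feature_index] -> set of all values seen for that feature
    feature_values = defaultdict(set)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            feature_values[i].add(value)
    return class_counts, value_counts, feature_values


def predict(model, row):
    """Return the class with the highest posterior log-probability."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        log_prob = math.log(count / total)  # log prior
        for i, value in enumerate(row):
            # Laplace smoothing avoids zero probability for unseen combinations.
            numerator = value_counts[(i, label)][value] + 1
            denominator = count + len(feature_values[i])
            log_prob += math.log(numerator / denominator)
        scores[label] = log_prob
    return max(scores, key=scores.get)


# Synthetic, illustrative training data: (attendance level, assignment status).
rows = [
    ("high", "done"), ("high", "done"), ("high", "late"),
    ("low", "late"), ("low", "late"), ("low", "done"),
]
labels = ["pass", "pass", "pass", "fail", "fail", "fail"]

model = train_naive_bayes(rows, labels)
print(predict(model, ("high", "done")))
print(predict(model, ("low", "late")))
```

The classifier treats the features as conditionally independent given the class, which is the "naïve" assumption. Working with log-probabilities avoids numerical underflow when many attributes are multiplied together.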

Conclusion:

Academic data from a face-to-face (traditional) tutoring system, or from a

virtual one that stores information in an LMS, such as grades,

attendance, online quizzes, or even feedback on the quality of material

and education, provide attributes that measure academic performance. All

of these attributes aid in forming the conceptual and technical

understanding needed to build relevant and efficient educational data

mining applications. To construct a user knowledge model, the student's

knowledge and the quality of their studies (that is, the duration of

their study sessions or the number of correct answers they provide) must

be considered. Furthermore, attitude and absenteeism help develop a User

Behaviour Model that helps predict academic progress 41. The educational

system has recently


emphasized two factors, internal assessments and end-of-semester exams,

that it believes improve the reliability of outcomes regarding academic

performance. The tutor administers internal assessments through homework,

participation, and the outcomes of in-class exams (Hernandez-Blanco et

al., 2019).

41
Hernández-Blanco, A., Herrera-Flores, B., Tomás, D., & Navarro-

Colorado, B. (2019). A systematic review of deep learning approaches

to educational data mining. Complexity, 2019.

References:

Abu Saa, A., Al-Emran, M., & Shaalan, K. (2019). Factors Affecting Students’

Performance in Higher Education: A Systematic Review of Predictive

Data Mining Techniques. Technology, Knowledge and Learning, 24(4),

567–598. doi:10.1007/s10758-019-094

Adnan, K., & Akbar, R. (2019). An analytical study of information extraction

from unstructured and multidimensional big data. J Big Data 6, 91.

https://doi.org/10.1186/s40537-019-0254-8



Bradley, V. M. (2021). Learning Management System (LMS) used with online

instruction. International Journal of Technology in Education, 4(1), 68–

92.


Freeman, R. (2017). The relationship between extracurricular activities and

academic achievement.

Hamsa, H., Indiradevi, S., & Kizhakkethottam, J. J. (2016). Student academic

performance prediction model using decision tree and fuzzy genetic

algorithm. Procedia Technology, 25, 326-332.

Hernández-Blanco, A., Herrera-Flores, B., Tomás, D., & Navarro-Colorado, B.

(2019). A systematic review of deep learning approaches to

educational data mining. Complexity, 2019.



Jung, S., & Huh, J. H. (2019). An Efficient LMS Platform and Its Test Bed.

Electronics, 8(2), 154. Retrieved from

https://www.mdpi.com/2079-9292/8

Kumar, B., & Pal, S. (2011). Mining Educational Data to Analyze Student's

Performance. International Journal of Advanced Computer Science and

Applications, 2(6). doi:10.14569/ijacsa.2011.02060

Khan, A., & Ghosh, S. K. (2020). Student performance analysis and prediction

in classroom learning: A review of educational data mining studies.

Education and Information Technologies, 26(1), 205– 240.

doi:10.1007/s10639-020-10230-3


Makhtar, M., Nawang, H., & Wan Shamsuddin, S. N. (2017). Analysis of

students performance using naïve Bayes classifier. Journal of

Theoretical & Applied Information Technology, 95(16).



Mueen, A., Zafar, B., & Manzoor, U. (2016). Modeling and Predicting

Students’ Academic Performance Using Data Mining Techniques.

International Journal of Modern Education and Computer Science,

8(11), 36–42. doi:10.5815/ijmecs.2016.11.05

Razaque, F., Soomro, N., Shaikh, S. A., Soomro, S., Samo, J. A., Kumar, N., &

Dharejo, H. (2017). Using naïve Bayes algorithm to students' bachelor

academic performances analysis. In 2017 4th IEEE International

Conference on Engineering Technologies and Applied Sciences

(ICETAS) (pp. 1–5). IEEE.

Reddy, C. (2021). Data Mining: Purpose, Characteristics, Benefits &

Limitations.

https://content.wisestep.com/data-mining-purpose-characteristics-benefits-limitations

Thamilselvana, P., & Sathiaseelan, J. G. R. (2015). A Comparative Study of

Data Mining Algorithms for Image Classification. International Journal of

Education and Management Engineering, 5(2), 1.

http://dx.doi.org/10.5815/ijeme.2015.02.01

Wang, G., & Miao, J. (2020). Design of data mining algorithm based on rough

entropy for US stock market abnormality. Journal of Intelligent & Fuzzy

Systems, 39(4), 5213-5221.

Watson, R., & Watson, S. (2012). An argument for clarity: What are learning

management systems, what are they not, and what should they

become? TechTrends, 51(2), 28 – 3

Zhenzhen, M., Zhu, J., & Zhang, S. (2021). Probabilistic-based expressions in

behavioral multiattribute decision making considering pre-evaluation.

Fuzzy Optimization and Decision Making, 20(1), 145–173.

http://dx.doi.org/10.1007/s10700-020-09335-
