Computer Science Students Academic Performance Prediction Using Ai[1]
ABSTRACT
Student performance is one of the key success factors in educational
institutions. Understanding the performance of students helps identify issues,
enabling real-time action where and when necessary. Further, early
identification and improvement of students' academic performance at all levels
has been a major challenge in these institutions. Students experience
difficulties which impair their study and negatively impact their academic
performance. These issues can be addressed efficiently if students' data is
analysed in advance and students' performance is predicted early enough to allow
immediate decisions on support. Early prediction assists educators and policy
makers in improving student and class performance. This work applied machine
learning (ML) algorithms to analyse significant factors that influence students'
academic performance, which could be used to inform decisions on support, or to
identify and notify the students who require assistance, thus taking effective
steps towards improving their performance. Predicting academic outcomes is
complex and influenced by factors like socioeconomic background, motivation,
and prior academic performance. The work highlights several challenges,
including a lack of standardization in performance metrics, limited model
generalizability, and potential bias in training data. It also notes the impact
of individual and environmental factors on academic performance, emphasizing
the role of instructors and policymakers in improving educational outcomes and
student academic success.
CHAPTER 1
INTRODUCTION
1.1 History
Many studies in the learning field have investigated ways of applying machine
learning techniques for various educational purposes. One focus of these
studies is to identify high-risk students, as well as the features that affect
student performance. Students are the major strength of universities, and
universities and students together play a significant part in producing
graduates with superior academic achievements. However, academic achievement
varies, as different kinds of students have different levels of performance.
Machine learning is the ability of a system to automatically learn from
experience and improve its performance. Machine learning for education is now
gaining more attention, where it is used to analyse information based on
experience and to predict future performance.
The primary objective of this research is to develop a predictive model that can
accurately forecast student academic performance based on a comprehensive set
of data. By identifying the key factors that influence student success, educators
can implement targeted interventions and personalized learning strategies to
support students throughout their educational journey. Using ML, this involves:
- Identifying the most relevant variables that impact student performance
- Collecting and processing large datasets from diverse sources
- Developing accurate and reliable predictive models
- Ensuring the ethical and responsible use of student data
LITERATURE SURVEY
Nguyen Thai-Nghe, Andre Busche, and Lars Schmidt-Thieme [3] applied machine
learning techniques to improve the prediction results of academic
performance for two real case studies. Three methods were used to address
the class imbalance problem, and all of them showed satisfactory results.
They first re-balanced the datasets and then used both cost-insensitive and
cost-sensitive learning, with SVM for the small datasets and with Decision
Tree for the larger datasets. The models were initially deployed on the web.
Vi et al. [7] worked with snapshot data from the Computer Science students'
programming course at Helsinki University and attempted to predict whether
or not a student will fail the introductory mathematics course.
Bayer et al. [8] predicted whether or not a bachelor student will drop out
of university. They worked with the data of Applied Informatics bachelor
students from Masaryk University, covering students' study records and their
social activities with other university students through email or
discussion. They found that students who communicate with students having
good grades can successfully graduate with a higher probability than
students with similar performance who do not communicate with successful
students. In this case, the J48 decision tree learner, IB1 lazy learner,
PART rule learner, and SMO support vector machines were used.
Bhardwaj and Pal [9] predicted student performance and found that living
location has a high impact on students' final grade. They used student data
from the University Department of Computer Applications and a Bayesian
classifier for prediction.
[10] studied the C++ course at Yarmouk University, Jordan. Three different
classification methods, namely ID3, C4.5 and Naïve Bayes, were used. The
results indicated that one of these gave better prediction than the other
models.
CHAPTER 3
EXISTING ALGORITHM
1. Objective:
The primary goal is to develop a system that can analyze various factors
influencing academic performance and predict outcomes based on historical
data.
2. Data Collection:
3. Data Preprocessing:
5. Model Selection:
- Linear Regression
- Random Forests
- Neural Networks
6. Model Training:
Training Data: Split the dataset into training and testing sets to evaluate the
model's performance.
7. Model Evaluation:
Metrics:- Use evaluation metrics such as accuracy, precision, recall, and F1-
score to assess the model's performance.
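As a hedged illustration of the metrics named above, the following pure-Python sketch computes accuracy, precision, recall, and F1-score for a binary task. The function name `binary_metrics` and the label lists are made up for the example and are not project results; in the project itself, scikit-learn's metric functions would normally be used instead.

```python
# Minimal sketch of the four classification metrics for one positive class.
# The labels below are made-up illustration data, not project results.

def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 as a dictionary."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Precision penalises false alarms and recall penalises missed positives, so reporting both alongside accuracy matters when the grade categories are unevenly represented.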
8. Implementation:
User Interface:- Design a user-friendly interface for students and educators to
input data and receive predictions.
9. Deployment:
Bias and Fairness: Ensure the model is free from bias and does not unfairly
disadvantage any group of students.
Privacy:- Safeguard personal data and comply with data protection regulations.
Conclusion
2. Limited Features
3. Modeling Limitations
Overfitting:- Complex models can overfit the training data, resulting in poor
generalization to new or unseen data.
5. Ethical Concerns
Privacy Issues:- Handling sensitive student data raises privacy concerns, which
can affect data availability and ethical compliance.
Equity and Fairness:- Without careful design, predictions can reinforce existing
biases and inequalities in the academic environment.
7. Integration Challenges
Generic Predictions:- Predictions may lack the depth needed for personalized
recommendations, limiting their usefulness for individual student support.
Conclusion
While traditional systems for predicting academic performance using AI and
ML can provide valuable insights, they also face significant limitations.
Addressing these challenges requires a comprehensive approach that combines
robust data practices, algorithmic fairness, user engagement strategies, and
ethical considerations to enhance the effectiveness of the predictive system.
CHAPTER 4
PROPOSED SYSTEM
4.1 Overview
Objective:
Key Components:
1. Data Collection:
2. Data Preprocessing:
Clean and preprocess the data to handle missing values, outliers, and
inconsistencies.
3. Model Selection:
Linear Regression
Decision Trees
Random Forests
Support Vector Machines (SVM)
Neural Networks
Split the dataset into training and testing sets to evaluate model performance.
7. Impact Assessment:
Gather feedback from educators to refine the model and improve its predictive
capabilities.
Benefits
Challenges
2. Data Collection
Data Cleaning:
o Handle Missing Values: Fill in missing values with the mean, median,
or mode for numerical data, and use common or frequent labels for
categorical data.
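The missing-value rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the helper `impute`, the column values, and the use of `None` as the missing marker are hypothetical, and a pandas-based project would more likely use `fillna` for the same purpose.

```python
from statistics import mean, median, mode

def impute(values, strategy="mean"):
    """Replace None entries with the mean, median or mode of the observed values."""
    observed = [v for v in values if v is not None]
    if strategy == "mean":
        fill = mean(observed)
    elif strategy == "median":
        fill = median(observed)
    elif strategy == "mode":
        fill = mode(observed)      # most frequent label, usable for categories
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]


# Hypothetical columns: a numeric score and a categorical label.
scores = [70, None, 80, 90, None]
prep = ["none", "completed", None, "none"]

scores_filled = impute(scores, "mean")
prep_filled = impute(prep, "mode")
print(scores_filled)
print(prep_filled)
```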
Data Transformation:
Data Reduction:
Train-Test Split: Split data into training and testing sets (typically 70/30
or 80/20).
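To make the split concrete, here is a minimal pure-Python sketch of an 80/20 split. The helper name `split_rows` and the toy row list are assumptions for illustration; the project code relies on scikit-learn's `train_test_split`, which performs the same shuffling and partitioning.

```python
import random

def split_rows(rows, test_size=0.2, seed=42):
    """Shuffle a copy of the rows and return (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)   # fixed seed gives a repeatable split
    n_test = round(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(100))                     # stand-in for 100 student records
train, test = split_rows(rows, test_size=0.2)
print(len(train), len(test))
```

Fixing the seed keeps the split reproducible, so that differences between models are not caused by a different partition of the data.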
Regression Task: If you want to predict a continuous variable like final grade
(on a percentage or GPA scale), use regression models.
4. Model Training
Data Splitting: Split the data into training and test sets (e.g., 80/20 split). You
might also reserve a portion for validation if using deep learning models.
Regression Metrics: For regression tasks, use Mean Absolute Error (MAE),
Mean Squared Error (MSE), and R² score.
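The three regression metrics can be written out directly from their definitions. The sketch below is illustrative only: the function name `regression_metrics` and the grade values are made up, not taken from the project.

```python
def regression_metrics(y_true, y_pred):
    """Return (MAE, MSE, R^2) for two equally long lists of numbers."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return mae, mse, r2


# Hypothetical final grades (percent) and model predictions.
y_true = [60.0, 70.0, 80.0, 90.0]
y_pred = [62.0, 68.0, 83.0, 87.0]
mae, mse, r2 = regression_metrics(y_true, y_pred)
print(mae, mse, r2)
```

MAE is expressed in the same units as the grade, MSE penalises large errors more heavily, and R² compares the model against always predicting the mean grade.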
Key Concepts:
Linear SVM: Involves finding the optimal hyperplane that separates the data
into two classes with the maximum margin.
Non-linear SVM: Uses kernel tricks (e.g., Radial Basis Function - RBF) to
map data into higher-dimensional space to make it linearly separable.
Support Vectors: Data points that lie closest to the decision boundary, which
directly influence the positioning of the hyperplane.
Margin: The distance between the support vectors and the decision boundary.
A larger margin is considered better as it generally leads to better
generalization.
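The concepts above can be illustrated numerically. The sketch below assumes a hypothetical, already-trained linear hyperplane (the weights `w` and bias `b` are made up) and shows the decision value, the distance of a point to the boundary, and the RBF kernel; it does not train an SVM.

```python
import math

def decision_value(w, b, x):
    """Signed score w.x + b; its sign gives the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def distance_to_boundary(w, b, x):
    """Geometric distance from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(decision_value(w, b, x)) / norm

def rbf_kernel(x, z, gamma=0.5):
    """RBF similarity exp(-gamma * ||x - z||^2), equal to 1 for identical points."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)


w, b = [3.0, 4.0], -5.0          # hypothetical hyperplane 3*x1 + 4*x2 - 5 = 0
point = [3.0, 4.0]
score = decision_value(w, b, point)
dist = distance_to_boundary(w, b, point)
similarity = rbf_kernel([0.0, 0.0], [1.0, 1.0])
print(score, dist, similarity)
```

For the support vectors, the distance computed by `distance_to_boundary` is the margin, which training tries to make as large as possible.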
Key Concepts:
Neurons: Basic units that take in inputs, apply weights, and pass the output
through an activation function.
Layers: Neurons are organized into layers—input, hidden, and output layers.
Activation Function: Functions like Sigmoid, ReLU, and Tanh that introduce
non-linearity to help the network learn complex patterns.
Backpropagation: A technique used to train the network by minimizing error
through gradient descent.
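A single neuron is enough to show how these pieces fit together. The following sketch is a simplified illustration, assuming a squared-error loss and a sigmoid activation; the function names, inputs, and learning rate are hypothetical, and real networks are normally built with a library rather than by hand.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    return max(0.0, z)

def neuron(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus bias, passed through an activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

def train_step(inputs, target, weights, bias, lr=0.1):
    """One gradient-descent update for a sigmoid neuron with squared error."""
    out = neuron(inputs, weights, bias, sigmoid)
    # Chain rule: dL/dz for L = 0.5 * (out - target)^2 and out = sigmoid(z)
    delta = (out - target) * out * (1.0 - out)
    new_weights = [w - lr * delta * x for w, x in zip(weights, inputs)]
    new_bias = bias - lr * delta
    return new_weights, new_bias


inputs, target = [1.0, 2.0], 1.0
weights, bias = [0.0, 0.0], 0.0
before = neuron(inputs, weights, bias, sigmoid)
weights, bias = train_step(inputs, target, weights, bias)
after = neuron(inputs, weights, bias, sigmoid)
print(before, after)
```

After one update the output moves closer to the target; backpropagation applies the same chain-rule step layer by layer.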
4.5 Proposed Algorithm:
Use Case: Used for both classification and regression and performs well on
structured/tabular data.
Use Case: Suitable for both classification and regression tasks when there is
a large amount of data and non-linear relationships among features.
Use Case: Best for categorical prediction tasks (e.g., high, medium, low
performance).
BLOCK DIAGRAM
CHAPTER 6
SOFTWARE ENVIRONMENTS
What is Python?
The biggest strength of Python is its huge standard library, which can be used
for the following:
Machine Learning
Test frameworks
Multimedia
Advantages of Python
1. Extensive Libraries
Python ships with an extensive library that contains code for various
purposes like regular expressions, documentation generation, unit testing, web
browsers, threading, databases, CGI, email, image manipulation, and more. So,
we don’t have to write the complete code for that manually.
2. Extensible
As we have seen earlier, Python can be extended to other languages. You can
write some of your code in languages like C++ or C. This comes in handy,
especially in projects.
3. Embeddable
4. Improved Productivity
5. IOT Opportunities
Since Python forms the basis of new platforms like Raspberry Pi, its future
looks bright for the Internet of Things. This is a way to connect the language
with the real world.
When working with Java, you may have to create a class to print ‘Hello World’.
But in Python, just a print statement will do. It is also quite easy to learn,
understand, and code. This is why when people pick up Python, they have a
hard time adjusting to other more verbose languages like Java.
7. Readable
Because it is not such a verbose language, reading Python is much like reading
English. This is the reason why it is so easy to learn, understand, and code. It
also does not need curly braces to define blocks, and indentation is mandatory.
This further aids the readability of the code.
8. Object-Oriented
As mentioned earlier, Python is freely available. Not only can you download
Python for free, but you can also download its source code, make changes to it,
and even distribute it. It comes with an extensive collection of libraries to
help you with your tasks.
10. Portable
When you code your project in a language like C++, you may need to make
some changes to it if you want to run it on another platform. But it isn’t the
same with Python. Here, you need to code only once, and you can run it
anywhere. This is called Write Once Run Anywhere (WORA). However, you
need to be careful enough not to include any system-dependent features.
11. Interpreted
Lastly, Python is an interpreted language. Since statements are executed
one by one, debugging is easier than in compiled languages.
1. Less Coding
Almost all tasks done in Python require less code than when the same task is
done in other languages. Python also has excellent standard library support, so
you often don’t have to search for third-party libraries to get your job done.
This is why many people suggest Python to beginners.
2. Affordable
The 2019 Github annual survey showed us that Python has overtaken Java in
the most popular programming language category.
Disadvantages of Python
So far, we’ve seen why Python is a great choice for your project. But if you
choose it, you should be aware of its consequences as well. Let’s now see the
downsides of choosing Python over another language.
1. Speed Limitations
We have seen that Python code is executed line by line. But since Python is
interpreted, it often results in slow execution. This, however, isn’t a problem
unless speed is a focal point for the project. In other words, unless high speed is
a requirement, the benefits offered by Python are enough to distract us from its
speed limitations.
The reason it is not so famous despite the existence of Brython is that it isn’t
that secure.
3. Design Restrictions
As you know, Python is dynamically-typed. This means that you don’t need to
declare the type of variable while writing the code. It uses duck-typing. But
wait, what’s that? Well, it just means that if it looks like a duck, it must be a
duck. While this is easy on the programmers during coding, it can raise run-time
errors.
5. Simple
No, we’re not kidding. Python’s simplicity can indeed be a problem. Take my
example. I don’t do Java, I’m more of a Python person. To me, its syntax is so
simple that the verbosity of Java code seems unnecessary.
History of Python
What do the alphabet and the programming language Python have in common?
Right, both start with ABC. If we are talking about ABC in the Python context,
it's clear that the programming language ABC is meant. ABC is a general-
purpose programming language and programming environment, which had been
developed in the Netherlands, Amsterdam, at the CWI (Centrum Wiskunde &
Informatica). The greatest achievement of ABC was to influence the design of
Python. Python was conceptualized in the late 1980s. At that time Guido van
Rossum worked at the CWI on a project called Amoeba, a distributed operating
system. In an interview with Bill Venners, Guido van Rossum said: "In the
early 1980s, I worked as an implementer on a team building a language called
ABC at Centrum voor Wiskunde en Informatica (CWI). I don't know how well
people know ABC's influence on Python. I try to mention ABC's influence
because I'm indebted to everything I learned during that project and to the
people who worked on it." Later on in the same interview, Guido van Rossum
continued: "I remembered all my experience and some of my frustration with
ABC. I decided to try to design a simple scripting language that possessed some
of ABC's better properties, but without its problems. So I started typing. I
created a simple virtual machine, a simple parser, and a simple runtime. I made
my own version of the various ABC parts that I liked. I created a basic syntax,
used indentation for statement grouping instead of curly braces or begin-end
blocks, and developed a small number of powerful data types: a hash table (or
dictionary, as we call it), a list, strings, and numbers."
Guido van Rossum published the first version of the Python code (version 0.9.0)
at alt.sources in February 1991. This release already included exception
handling, functions, and the core data types list, dict, str and others. It was
also object oriented and had a module system.
Python version 1.0 was released in January 1994. The major new features
included in this release were the functional programming tools lambda, map,
filter and reduce, which Guido van Rossum never liked. Six and a half years
later, in October 2000, Python 2.0 was introduced. This release included list
comprehensions, a full garbage collector, and Unicode support. Python
flourished for another 8 years in the 2.x versions before the next major
release, Python 3.0 (also known as "Python 3000" and "Py3K"), was released.
Python 3 is not backwards compatible with Python 2.x. The emphasis in Python 3
was on the removal of duplicate programming constructs and modules, thus
fulfilling or coming close to fulfilling the 13th law of the Zen of Python:
"There should be one -- and preferably only one -- obvious way to do it."
Some changes in Python 3:
- There is only one integer type left, i.e., int. long is int as well.
- The division of two integers returns a float instead of an integer. "//" can
be used to have the "old" behaviour.
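The two changes listed above are easy to check at a Python 3 prompt. The short snippet below is just an illustration; the variable names are arbitrary.

```python
true_div = 7 / 2       # "/" always performs true division in Python 3
floor_div = 7 // 2     # "//" keeps the old integer (floor) behaviour
big = 2 ** 100         # arbitrarily large values are still plain int

print(true_div, type(true_div).__name__)
print(floor_div, type(floor_div).__name__)
print(type(big).__name__)
```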
Purpose
Python
Python is Interactive − you can actually sit at a Python prompt and interact
with the interpreter directly to write your programs.
NumPy
Besides its obvious scientific uses, NumPy can also be used as an efficient
multi-dimensional container of generic data. Arbitrary datatypes can be defined
using NumPy which allows NumPy to seamlessly and speedily integrate with a
wide variety of databases.
Pandas
Matplotlib
There have been several updates to Python over the years. Installing Python can
be confusing for a beginner, so the steps are described here. At the time this
guide was written, the latest version was Python 3.7.4, a release of Python 3.
Before starting the installation, you need to know your system requirements:
the Python build to download depends on your operating system and processor.
The system used here runs a 64-bit Windows operating system, so the steps below
install Python 3.7.4 on a Windows 7 device. The steps to install Python on
Windows 10, 8 and 7 are divided into 4 parts to make them easier to follow.
Now, check for the latest and the correct version for your operating system.
Step 3: You can either select the yellow Download Python 3.7.4 button for
Windows, or scroll further down and click on the download link for the specific
version you need. Here, we download the most recent Python version for
Windows, 3.7.4.
Step 4: Scroll down the page until you find the Files option.
Step 5: Here you see a different version of python along with the operating
system.
To download Windows 32-bit python, you can select any one from the three
options: Windows x86 embeddable zip file, Windows x86 executable
installer or Windows x86 web-based installer.
To download Windows 64-bit python, you can select any one from the three
options: Windows x86-64 embeddable zip file, Windows x86-64 executable
installer or Windows x86-64 web-based installer.
Here we will install the Windows x86-64 web-based installer. This completes the
first part, choosing which version of Python to download. Now we move ahead
with the second part, the installation.
Note: To see the changes or updates made in this version, you can click on the
Release Notes option.
Installation of Python
Step 1: Go to Downloads and open the downloaded Python installer to carry out
the installation process.
Step 2: Before you click on Install Now, make sure to tick Add Python 3.7 to
PATH.
Step 3: Click on Install Now. After the installation is successful, click on
Close.
With these three steps, you have successfully and correctly installed Python.
Now is the time to verify the installation.
Step 4: Let us test whether Python is correctly installed. Type python -V and
press Enter.
Note: If you have an earlier version of Python already installed, you must
first uninstall the earlier version and then install the new one.
Step 3: Click on IDLE (Python 3.7 64-bit) and launch the program.
Step 4: To go ahead with working in IDLE you must first save the file. Click on
File > Save.
Step 5: Name the file; the save-as type should be Python files. Click on Save.
Here the file is named Hey World.
Step 6: Now, for example, enter print("Hey World") and press Enter.
You will see that the command is executed. This completes the installation of
Python on Windows.
CHAPTER 7
SYSTEM REQUIREMENTS
SOFTWARE REQUIREMENTS
Jupyter (or)
Google Colab
HARDWARE REQUIREMENTS
RAM : minimum 4 GB
FUNCTIONAL REQUIREMENTS
OUTPUT DESIGN
Internal Outputs whose destination is within organization and they are the
OUTPUT DEFINITION
Input design is a part of overall system design. The main objective during the
input design is as given below:
INPUT STAGES
Data recording
Data transcription
Data conversion
Data verification
Data control
Data transmission
Data validation
Data correction
INPUT TYPES
INPUT MEDIA
At this stage a choice has to be made about the input media. In deciding on
the input media, consideration has to be given to:
Type of input
Flexibility of format
Speed
Accuracy
Verification methods
Rejection rates
Ease of correction
Security
Easy to use
Portability
Keeping in view the above description of the input types and input media, it can
be said that most of the inputs are internal and interactive. As input data is
to be keyed in directly by the user, the keyboard can be considered the most
suitable input device.
ERROR AVOIDANCE
At this stage care is to be taken to ensure that input data remains accurate
from the stage at which it is recorded up to the stage at which the data is
accepted by the system. This can be achieved only by means of careful control
each time the data is handled.
ERROR DETECTION
Even though every effort is made to avoid the occurrence of errors, a small
proportion of errors is always likely to occur. These errors can be discovered
by using validations to check the input data.
DATA VALIDATION
Procedures are designed to detect errors in data at a lower level of detail. Data
validations have been included in the system in almost every area where there is
a possibility for the user to commit errors. The system will not accept invalid
data. Whenever invalid data is keyed in, the system immediately prompts the
user, who has to key in the data again; the system accepts the data only if it
is correct. Validations have been included where necessary.
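A validation step of this kind could look roughly like the sketch below. The field names (`gender`, `attendance`, `test_preparation`), the allowed values, and the function `validate_record` are hypothetical and chosen only for illustration; the actual system may validate different fields.

```python
# Hypothetical field names and allowed ranges, for illustration only.
ALLOWED_TEST_PREP = {"none", "completed"}

def validate_record(record):
    """Return a list of error messages; an empty list means the record is valid."""
    errors = []
    attendance = record.get("attendance")
    if not isinstance(attendance, (int, float)) or not 0 <= attendance <= 100:
        errors.append("attendance must be a number between 0 and 100")
    if record.get("test_preparation") not in ALLOWED_TEST_PREP:
        errors.append("test_preparation must be 'none' or 'completed'")
    if not str(record.get("gender", "")).strip():
        errors.append("gender must not be empty")
    return errors


good = {"gender": "female", "attendance": 92, "test_preparation": "completed"}
bad = {"gender": "", "attendance": 130, "test_preparation": "maybe"}
good_errors = validate_record(good)
bad_errors = validate_record(bad)
print(good_errors)
print(bad_errors)
```

Returning all messages at once, rather than stopping at the first problem, lets the interface prompt the user about every field that has to be re-entered.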
The system is designed to be a user friendly one. In other words the system has
been designed to communicate effectively with the user. The system has been
designed with popup menus.
It is essential to consult the system users and discuss their needs while
designing the user interface
In the user-initiated interface, the user is in charge, controlling the progress
of the user/computer dialogue. In the computer-initiated interface, the computer
selects the next stage in the interaction.
Computer initiated interfaces
Forms oriented interface: The user calls up an image of the form to his/her
screen and fills in the form. The forms-oriented interface is chosen because
it is the best choice.
COMPUTER-INITIATED INTERFACES
The menu system, in which the user is presented with a list of alternatives
and chooses one of them.
Question-answer type dialog system, where the computer asks a question
and takes action on the basis of the user's reply.
Right from the start the system is going to be menu driven, the opening menu
displays the available options. Choosing one option gives another popup menu
with more options. In this way every option leads the users to data entry form
where the user can key in the data.
The design of error messages is an important part of the user interface design.
As the user is bound to commit some errors while using the system, the system
should be designed to be helpful by providing the user with information
regarding the error he/she has committed.
PERFORMANCE REQUIREMENTS
The requirement specification for any system can be broadly stated as given
below:
The existing system is completely dependent on the user to perform all the
duties.
CHAPTER 9
SOURCE CODE
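import os
import warnings

import joblib
import numpy as np
import pandas as pd

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Scaling and encoding
from sklearn.preprocessing import LabelEncoder

# Models
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB

# Evaluation
from sklearn.metrics import accuracy_score, classification_report

warnings.filterwarnings("ignore")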
df = pd.read_csv(r'StudentsPerformance.csv')
df
df['total']
df.head()
df
df.isnull().sum()
df.info()
# Label-encode every text (object) column
le = LabelEncoder()
for column in df.columns:
    if df[column].dtype == 'object':
        df[column] = le.fit_transform(df[column])
df.head()
df.describe()
df.isnull().sum()
df = pd.DataFrame(df)
df.head()
# Encode the target column
le = LabelEncoder()
df1 = le.fit_transform(df['grade_category'])
df1 = pd.DataFrame(df1)
df1.head()
df.head()
df.drop(['grade_category', 'total'], axis=1)
df['grade_category'] = df1
df.head()
df = df.drop('total', axis=1)
df = pd.DataFrame(df)
df
df.info()
#X_test=X_test.drop('total',axis=1)
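plt.figure(figsize=(10, 6))
ax = sns.countplot(data=df, x='grade_category')
plt.xlabel('grade_category')
plt.ylabel('Count')
# Write each bar's count just above it (the annotate call is reconstructed;
# only its last argument survives in the listing)
for p in ax.patches:
    ax.annotate(f'{p.get_height():.0f}',
                (p.get_x() + p.get_width() / 2, p.get_height()),
                ha='center', va='bottom', xytext=(0, 5),
                textcoords='offset points')
plt.show()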
X = df.drop(columns=['grade_category'])
y = df['grade_category']
y
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
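from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, classification_report, confusion_matrix)
# target_names must follow the sorted order of the encoded classes
labels = ['E', 'D', 'C', 'B', 'A']
precision = []
recall = []
fscore = []
accuracy = []
# Reconstructed helper: the listing omits the function header and the lines
# that compute p, r and f; the name and macro averaging are assumptions.
def calculate_metrics(algorithm, predict, testY):
    a = accuracy_score(testY, predict) * 100
    p = precision_score(testY, predict, average='macro') * 100
    r = recall_score(testY, predict, average='macro') * 100
    f = f1_score(testY, predict, average='macro') * 100
    accuracy.append(a)
    precision.append(p)
    recall.append(r)
    fscore.append(f)
    # y_true comes first in scikit-learn's metric functions
    report = classification_report(testY, predict, target_names=labels)
    print(algorithm, report)
    conf_matrix = confusion_matrix(testY, predict)
    ax = sns.heatmap(conf_matrix, xticklabels=labels, yticklabels=labels,
                     annot=True, fmt="g")
    ax.set_ylim([0, len(labels)])
    plt.ylabel('True class')
    plt.xlabel('Predicted class')
    plt.show()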
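df.info()
import joblib
import os
from sklearn.neighbors import KNeighborsClassifier

os.makedirs('model', exist_ok=True)  # make sure the cache directory exists
if os.path.exists('model/KNNClassifier.pkl'):
    # Reuse the previously trained model
    knn = joblib.load('model/KNNClassifier.pkl')
else:
    # The listing omits the constructor; default hyperparameters are an assumption
    knn = KNeighborsClassifier()
    knn.fit(X_train, y_train)
    joblib.dump(knn, 'model/KNNClassifier.pkl')
predict = knn.predict(X_test)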
import joblib
import os
from sklearn.naive_bayes import GaussianNB

if os.path.exists('model/NaiveBayesClassifier.pkl'):
    # Reuse the previously trained model
    nb = joblib.load('model/NaiveBayesClassifier.pkl')
else:
    # Train the model and cache it on disk
    nb = GaussianNB()
    nb.fit(X_train, y_train)
    joblib.dump(nb, 'model/NaiveBayesClassifier.pkl')
predict = nb.predict(X_test)
1. Initial setup
Reading Data: The data is read from a CSV file (StudentsPerformance.csv) using
Pandas' read_csv method. The first few rows are displayed to get a glimpse of
the data structure.
Missing values: Any missing values in the dataset are dropped using
dropna() to avoid inconsistencies during model training.
3. Data visualization
5. Train-Test split
Defining Features and Target Variables: The dataset is split into features (X) and
target variable (y), where X contains all columns except grade_category, and y
contains the grade_category values.
Splitting the Data: The data is split into training and testing sets using an 80-20
ratio (train_test_split). This ensures that the model is trained on 80% of the data
and evaluated on 20%.
6. KNN classifier
1. Gender
2. Race/ethnicity
3. Study
4. Test preparation
5. Attendance
Accuracy (55.41)
Precision (20.0)
CONCLUSION
FUTURE SCOPE
The intersection of artificial intelligence (AI) and machine learning (ML) with
education offers a promising future for predicting student academic
performance. This technology-driven approach has the potential to revolutionize
the way educational institutions identify at-risk students, personalize learning
experiences, and optimize resource allocation.
Here are some key areas where AI and ML can significantly impact student
academic performance prediction:
1. Early Identification of At-Risk Students:
While the potential benefits are significant, there are challenges to overcome:
Data Quality and Privacy: Ensuring the quality and privacy of student data
is crucial for accurate predictions and ethical considerations.
Model Bias: It's essential to address potential biases in the data and models
to avoid unfair outcomes.
Future Directions:
By addressing these challenges and leveraging the power of AI and ML, we can
create a future where education is more personalized, effective, and equitable.
REFERENCES
[1] Pushpa S.K, Manjunath T.N, “Class result prediction using machine
learning”, International Conference on Smart Technology for Smart Nation,
2017, pp. 1208-1212.
[2] Nguyen Thai-Nghe, Andre Busche, and Lars Schmidt-Thieme, “Improving
Academic Performance Prediction by Dealing with Class Imbalance”, 2009 Ninth
International Conference on Intelligent Systems Design and Applications.
[4] P. Cortez, and A. Silva, “Using Data Mining To Predict Secondary School
Student Performance”, In EUROSIS,A. Brito and J. Teixeira (Eds.), 2008, pp.5-
12
[14] Su Y-S, Lin YD, Liu TQ. Applying machine learning technologies to
explore students’ learning features and performance prediction. Frontiers in
Neuroscience. 2022; 16: 1018005. Available from:
https://doi.org/10.3389/fnins.2022.1018005.
[15] Mesarić J, Šebalj D. Decision trees for predicting the academic success of
students. Croatian Operational Research Review. 2016; 7(2): 367-388.
[16] Yadav SK, Bharadwaj B, Pal S. Mining education data to predict student’s
retention: A comparative study. International Journal of Computer Science and
Information Security. 2012; 10(2): 113-117. Available from:
https://doi.org/10.48550/arXiv.1203.2987