CERTIFICATE
This is to certify that the project entitled "Optimal Feature Selection Algorithm for Predicting Students' Academic Performance" is being submitted during the academic year 2024-25, in partial fulfillment of the requirements for the award of the degree of Bachelor of Technology in Computer Science and Engineering offered by Sai Spurthi Institute of Technology, affiliated to Jawaharlal Nehru Technological University, Hyderabad.
We hereby declare that the project report entitled "Optimal Feature Selection Algorithm for Predicting Students' Academic Performance" is carried out in partial fulfillment of the requirements for the award of the degree of Bachelor of Technology in Computer Science and Engineering, affiliated to Jawaharlal Nehru Technological University, Hyderabad. This project has not been submitted anywhere else.
CH.UMASHANKAR (21C51A0564)
N.CHARAN TEJA(21C51A0558)
ACKNOWLEDGMENT
On the submission of our project entitled "Optimal Feature Selection Algorithm for Predicting Students' Academic Performance", we would like to extend our gratitude and sincere thanks to our supervisor, Mr. V.V. Siva Prasad, Assistant Professor, Department of Computer Science and Engineering, for his guidance and support throughout the project.
We would also like to extend our sincere thanks to Dr. SK. YAKOOB, Associate Professor, Head of the Department, Computer Science and Engineering, for his valuable suggestions and encouragement.
We endow our sincere thanks to Principal Dr. VSR KUMARI for his consistent cooperation and encouragement.
We would like to thank the entire CSE Department faculty, who helped us directly and indirectly in the completion of the project. We sincerely thank our friends and family for their support and encouragement.
CH.SANDHYA RANI(21C51A0509)
CH.UMASHANKAR (21C51A0564)
N.CHARAN TEJA(21C51A0558)
INDEX
1. INTRODUCTION
2. REVIEW OF RELATED WORK
3. SOFTWARE REQUIREMENT SPECIFICATIONS
4. SYSTEM DESIGN
4.1 INTRODUCTION
4.2 UML DIAGRAMS
4.3 E-R DIAGRAM
4.4 DATA DICTIONARY
5. SYSTEM STUDY
6. SYSTEM IMPLEMENTATION
7. SYSTEM TESTING
9. CONCLUSION
10. REFERENCES
1. INTRODUCTION
The principal purpose of feature selection (FS) algorithms is to select the most predictive features from the chosen dataset for analysis and to ignore the remaining attributes, which are non-predictive. Non-predictive attributes do not affect the actual result, but removing them reduces the complexity of the analysis. The accuracy and effectiveness of a student-performance prediction model can be improved with the help of these feature selection algorithms. Feature selection algorithms can be further divided into three groups, namely filter, wrapper and embedded methods. Filter methods are among the primary techniques; they depend on the general characteristics of the training data and are applied during the pre-processing phase of the dataset. Wrapper methods evaluate feature subsets using a learning algorithm. Embedded methods are executed during the classifier's learning process and are specific to particular learning algorithms.
1. Enhanced Prediction Accuracy: By selecting the most relevant features, the algorithm helps in building models that better capture the factors influencing academic performance, leading to more accurate predictions.
2. Improved Model Performance: A streamlined set of features can lead to faster and more efficient model training and execution.
3. Better Interpretability: Models with fewer, more relevant features are easier to interpret, helping educators and administrators understand the key factors affecting student performance.
In essence, an optimal feature selection algorithm helps create more reliable, interpretable,
and efficient models for predicting students' academic performance, ultimately aiding in
making informed decisions to enhance educational outcomes.
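The filter approach described above can be sketched as code. The following is a minimal, self-contained information-gain scorer for one categorical feature against the class label; the class and method names are ours, for illustration only:

```java
import java.util.*;

public class InfoGain {
    // Shannon entropy (in bits) of a label distribution.
    static double entropy(Collection<Integer> counts) {
        int total = counts.stream().mapToInt(Integer::intValue).sum();
        double h = 0.0;
        for (int c : counts) {
            if (c == 0) continue;
            double p = (double) c / total;
            h -= p * (Math.log(p) / Math.log(2));
        }
        return h;
    }

    // Information gain of one categorical feature with respect to the class.
    static double infoGain(String[] feature, String[] label) {
        Map<String, Integer> classCounts = new HashMap<>();
        for (String y : label) classCounts.merge(y, 1, Integer::sum);
        double baseEntropy = entropy(classCounts.values());

        // Partition the labels by feature value, then take the weighted entropy.
        Map<String, Map<String, Integer>> partitions = new HashMap<>();
        for (int i = 0; i < feature.length; i++) {
            partitions.computeIfAbsent(feature[i], k -> new HashMap<>())
                      .merge(label[i], 1, Integer::sum);
        }
        double conditional = 0.0;
        for (Map<String, Integer> part : partitions.values()) {
            int size = part.values().stream().mapToInt(Integer::intValue).sum();
            conditional += ((double) size / feature.length) * entropy(part.values());
        }
        return baseEntropy - conditional;
    }
}
```

A feature that perfectly separates passing from failing students scores the full label entropy, while a constant feature scores zero, so ranking features by this score and keeping the top few is a simple filter-style selection.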
1.3 EXISTING SYSTEM & ITS DISADVANTAGES
EXISTING SYSTEM:
The following section is a short review of work done in the area of feature selection algorithms by different researchers. Many authors have used feature selection (FS) algorithms in combination with classification algorithms to compare prediction accuracy on varying student datasets. Some of the existing work in this field of EDM is reviewed below.
Siva Kumar S, Venkataraman S, et al., in "Predictive Modeling of Student Dropout Indicators in Educational Data Mining using Improved Decision Tree," proposed an improved version of the decision tree algorithm to predict dropout students. A dataset of 240 students was collected by the authors via survey, and a correlation-based feature selection algorithm was then applied for pre-processing of the dataset. The classification accuracy on this dataset is more than 90%. K. W. Stephen et al., in the study "Data Mining Model for Predicting Student Enrolment in STEM Courses in Higher Education Institutions," predict fresh students' enrolment in STEM (Science, Technology, Engineering and Mathematics) courses. They selected 18 different features and collected data from students through a questionnaire. For pre-processing of the data, the authors used the Chi-Square and IG feature selection algorithms and found the best prediction with the CART decision tree algorithm. E. Osmanbegović et al., in the study "Determining Dominant Factor for students Performance Prediction by using Data Mining Classification Algorithms," calculate the academic performance of secondary school students at Tuzla. For the pre-processing phase of the collected dataset, they used the Gain Ratio (GR) feature selection algorithm and found the best prediction accuracy with the Random Forest (RF) algorithm compared to other classification algorithms. A. Figueira et al., in "Predicting Grades by Principal Component Analysis: A Data Mining Approach to Learning Analytics," predict students' academic grades in a Bachelor's degree program. For the pre-processing phase, the authors used the Principal Component Analysis (PCA) feature selection algorithm; PCA was then used to build a decision tree, which predicts the academic grade of a student. N. Rachburee and W. Punlumjeak, in the study "A comparison of feature selection approach between greedy, IG-ratio, Chi-square, and mRMR in educational mining," compare different feature selection algorithms such as IG-ratio, Chi-Square, Greedy Forward selection and mRMR. This work was conducted on a first-year students' dataset (with 15 attributes) from the University of Technology, Thailand. In this research, the authors found the best prediction accuracy using Greedy Forward (GF) selection with an Artificial Neural Network (ANN), compared to other classification algorithms (Decision Tree, K-NN and Naive Bayes). M. Zaffar, M. A. Hashmani et al., in the study "Performance analysis of feature selection algorithm for educational data mining," implemented different filter feature selection algorithms on selected student datasets. In this research, the authors used two different student datasets with a number of feature selection algorithms and analyzed the results for prediction accuracy.
Disadvantages:
• Here, four FS algorithms, namely CfsSubsetEval, GainRatioAttributeEval, InfoGainAttributeEval and ReliefAttributeEval, are evaluated.
• The classification algorithms Naive Bayes (NB), Logistic Regression (LR), DecisionTable (DT), JRip, J48 and Random Forest (RF) have been evaluated on the academic dataset.
• cfsSubsetEval: attribute subsets are evaluated based on both the predictive ability and the degree of redundancy of each feature.
• Features that are highly correlated with the class but have low intercorrelation with each other are preferred.
• Attribute Subset Evaluator (cfsSubsetEval) + Search Method (Best first (forwards)): in Table 2, the best seven attributes (gender, Relation, raisedhands, VisITedResources, AnnouncementsView, ParentAnsweringSurvey, StudentAbsenceDays) are selected based on the FS algorithm mentioned above.
• It provides the most correctly classified instances, with accuracy up to 77.29%, while the weakest is JRip, with accuracy up to 73.75%.
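The preference behind cfsSubsetEval can be made concrete. A standard formulation of the CFS merit of a k-feature subset is merit = (k · r̄cf) / √(k + k(k−1) · r̄ff), where r̄cf is the average feature-class correlation and r̄ff the average feature-feature intercorrelation. The sketch below assumes this textbook formula; it is not code from the system itself:

```java
public class CfsMerit {
    // CFS merit of a feature subset: k is the subset size, avgFeatureClassCorr
    // the mean feature-class correlation, avgFeatureFeatureCorr the mean
    // feature-feature intercorrelation. Higher merit means the subset is
    // predictive of the class yet internally non-redundant.
    static double merit(int k, double avgFeatureClassCorr, double avgFeatureFeatureCorr) {
        return (k * avgFeatureClassCorr)
                / Math.sqrt(k + k * (k - 1) * avgFeatureFeatureCorr);
    }
}
```

A subset of features strongly related to the class but weakly intercorrelated receives the higher merit, which is exactly the preference stated in the bullets above.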
PROPOSED SYSTEM:
In the proposed system, the principal purpose of the FS algorithms is to select the most predictive features from the chosen dataset for analysis and to ignore the remaining attributes, which are non-predictive. Non-predictive attributes do not affect the actual result, but removing them reduces the complexity of the analysis. The accuracy and effectiveness of the student-performance prediction model can be improved with the help of these feature selection algorithms. These feature selection algorithms can be further divided into three groups, namely filter, wrapper and embedded methods. Filter methods are among the primary techniques; they depend on the general characteristics of the training data and are applied during the pre-processing phase of the dataset. Wrapper methods evaluate feature subsets using a learning algorithm. Embedded methods are executed during the classifier's learning process and are specific to particular learning algorithms.
Advantages
➢ The proposed methodology implements a hashing technique, which is a faster and more reliable method of processing the data.
➢ The proposed system implements a clustering-chain technique based on hashing to improve system performance.
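Since the report does not show the hashing implementation, the following is only a sketch of what a clustering-chain structure based on hashing might look like: records that share a category hash code are chained into one bucket, so grade lookups by hash code touch a single chain. All names, and the choice of String.hashCode as the hash function, are assumptions:

```java
import java.util.*;

public class HashClusterIndex {
    // Clustering chain: records hashing to the same bucket are chained
    // together in a list, so a lookup by category hash code scans one chain
    // instead of the whole dataset.
    private final Map<Integer, List<String>> buckets = new HashMap<>();

    // Hash code for a category name (assumed, not specified in the report).
    static int categoryHash(String category) {
        return category.toLowerCase().hashCode();
    }

    void add(String category, String record) {
        buckets.computeIfAbsent(categoryHash(category), k -> new ArrayList<>())
               .add(record);
    }

    List<String> findByHash(int hashCode) {
        return buckets.getOrDefault(hashCode, Collections.emptyList());
    }
}
```

This mirrors the "Find Student Grade By Hash code" operation described later in the Modules section: the user supplies a hash code and receives the chain of matching records.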
2. REVIEW OF RELATED WORK
The following section is a short review of work done in the area of feature selection algorithms by different researchers. Many authors have used feature selection (FS) algorithms in combination with classification algorithms to compare prediction accuracy on varying student datasets. Some of the existing work in this field of EDM is reviewed below.
• Siva Kumar S, Venkataraman S, et al., in "Predictive Modeling of Student Dropout Indicators in Educational Data Mining using Improved Decision Tree," proposed an improved version of the decision tree algorithm to predict dropout students. A dataset of 240 students was collected by the authors via survey, and a correlation-based feature selection algorithm was then applied for pre-processing of the dataset. The classification accuracy on this dataset is more than 90%.
• K. W. Stephen et al., in the study "Data Mining Model for Predicting Student Enrolment in STEM Courses in Higher Education Institutions," predict fresh students' enrolment in STEM (Science, Technology, Engineering and Mathematics) courses. They selected 18 different features and collected data from students through a questionnaire. For pre-processing of the data, the authors used the Chi-Square and IG feature selection algorithms and found the best prediction with the CART decision tree algorithm.
• E. Osmanbegović et al., in the study "Determining Dominant Factor for students Performance Prediction by using Data Mining Classification Algorithms," calculate the academic performance of secondary school students at Tuzla. For the pre-processing phase of the collected dataset, they used the Gain Ratio (GR) feature selection algorithm. They found the best prediction accuracy with the Random Forest (RF) algorithm compared to other classification algorithms.
• A. Figueira et al., in "Predicting Grades by Principal Component Analysis: A Data Mining Approach to Learning Analytics," predict students' academic grades in a Bachelor's degree program. For the pre-processing phase, the authors used the Principal Component Analysis (PCA) feature selection algorithm. In this study, PCA was used to build a decision tree, which predicts the academic grade of a student.
• N. Rachburee and W. Punlumjeak, in the study "A comparison of feature selection approach between greedy, IG-ratio, Chi-square, and mRMR in educational mining," compare different feature selection algorithms such as IG-ratio, Chi-Square, Greedy Forward selection and mRMR. This work was conducted on a first-year students' dataset (with 15 attributes) from the University of Technology, Thailand. In this research, the authors found the best prediction accuracy using Greedy Forward (GF) selection with an Artificial Neural Network (ANN), compared to other classification algorithms (Decision Tree, K-NN and Naive Bayes).
• M. Zaffar, M. A. Hashmani et al., in the study "Performance analysis of feature selection algorithm for educational data mining," implemented different filter feature selection algorithms on selected student datasets. In this research, the authors used two different student datasets with a number of feature selection algorithms and analyzed the results for prediction accuracy.
3. SOFTWARE REQUIREMENT SPECIFICATIONS
With most programming languages, you either compile or interpret a program so that you can
run it on your computer. The Java programming language is unusual in that a program is both
compiled and interpreted. With the compiler, first you translate a program into an
intermediate language called Java byte codes —the platform-independent codes interpreted
by the interpreter on the Java platform. The interpreter parses and runs each Java byte code
instruction on the computer. Compilation happens just once; interpretation occurs each time
the program is executed. The following figure illustrates how this works.
You can think of Java byte codes as the machine code instructions for the Java Virtual
Machine (Java VM). Every Java interpreter, whether it’s a development tool or a Web
browser that can run applets, is an implementation of the Java VM. Java byte codes help
make “write once, run anywhere” possible. You can compile your program into byte codes
on any platform that has a Java compiler. The byte codes can then be run on any
implementation of the Java VM. That means that as long as a computer has a Java VM, the
same program written in the Java programming language can run on Windows 2000, a
Solaris workstation, or on an iMac.
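The compile-once, run-anywhere workflow described above can be illustrated with a minimal program (the class name is ours):

```java
// HelloJvm.java — compiled once with `javac HelloJvm.java` into
// platform-independent byte codes (HelloJvm.class), then run on any
// Java VM with `java HelloJvm`, whether on Windows, Solaris, or a Mac.
public class HelloJvm {
    static String greeting() {
        return "write once, run anywhere";
    }

    public static void main(String[] args) {
        System.out.println(greeting());
    }
}
```

The same HelloJvm.class file produced on one platform runs unchanged on any other platform's Java VM; only the VM itself is platform-specific.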
You’ve already been introduced to the Java VM. It’s the base for the Java platform and is
ported onto various hardware-based platforms.
The Java API is a large collection of ready-made software components that provide many useful capabilities, such as graphical user interface (GUI) widgets. The Java API is grouped into libraries of related classes and interfaces; these libraries are known as packages. The next section, What Can Java Technology Do?, highlights what functionality some of the packages in the Java API provide.
The following figure depicts a program that’s running on the Java platform. As the figure
shows, the Java API and the virtual machine insulate the program from the hardware.
Native code is code that, once compiled, runs on a specific hardware platform. As a platform-independent environment, the Java platform can be a bit slower than native code. However, smart compilers, well-tuned interpreters, and just-in-time byte code compilers can bring performance close to that of native code without threatening portability.
What Can Java Technology Do?
The most common types of programs written in the Java programming language are applets
and applications. If you’ve surfed the Web, you’re probably already familiar with applets.
An applet is a program that adheres to certain conventions that allow it to run within a Java-
enabled browser.
However, the Java programming language is not just for writing cute, entertaining applets for
the Web. The general-purpose, high-level Java programming language is also a powerful
software platform. Using the generous API, you can write many types of programs.
An application is a standalone program that runs directly on the Java platform. A special kind
of application known as a server serves and supports clients on a network. Examples of
servers are Web servers, proxy servers, mail servers, and print servers. Another specialized
program is a servlet. A servlet can almost be thought of as an applet that runs on the server
side. Java Servlets are a popular choice for building interactive web applications, replacing
the use of CGI scripts. Servlets are similar to applets in that they are runtime extensions of
applications. Instead of working in browsers, though, servlets run within Java Web servers,
configuring or tailoring the server.
How does the API support all these kinds of programs? It does so with packages of software components that provide a wide range of functionality. Every full implementation of the Java platform gives you the following features:
• The essentials: Objects, strings, threads, numbers, input and output, data structures,
system properties, date and time, and so on.
• Applets: The set of conventions used by applets.
• Networking: URLs, TCP (Transmission Control Protocol), UDP (User Datagram Protocol) sockets, and IP (Internet Protocol) addresses.
• Internationalization: Help for writing programs that can be localized for users worldwide.
Programs can automatically adapt to specific locales and be displayed in the appropriate
language.
• Security: Both low level and high level, including electronic signatures, public and
private key management, access control, and certificates.
• Software components: Known as JavaBeans™, these can plug into existing component architectures.
• Object serialization: Allows lightweight persistence and communication via Remote
Method Invocation (RMI).
• Java Database Connectivity (JDBCTM): Provides uniform access to a wide range of
relational databases.
The Java platform also has APIs for 2D and 3D graphics, accessibility, servers, collaboration,
telephony, speech, animation, and more. The following figure depicts what is included in the
Java 2 SDK.
• Distribute software more easily: You can upgrade applets easily from a central server.
Applets take advantage of the feature of allowing new classes to be loaded “on the fly,”
without recompiling the entire program.
ODBC
Microsoft Open Database Connectivity (ODBC) is a standard programming interface for
application developers and database systems providers. Before ODBC became a de facto
standard for Windows programs to interface with database systems, programmers had to use
proprietary languages for each database they wanted to connect to. Now, ODBC has made
the choice of the database system almost irrelevant from a coding perspective, which is as it
should be. Application developers have much more important things to worry about than the
syntax that is needed to port their program from one database to another when business needs
suddenly change.
Through the ODBC Administrator in Control Panel, you can specify the particular database
that is associated with a data source that an ODBC application program is written to use.
Think of an ODBC data source as a door with a name on it. Each door will lead you to a
particular database. For example, the data source named Sales Figures might be a SQL
Server database, whereas the Accounts Payable data source could refer to an Access
database. The physical database referred to by a data source can reside anywhere on the
LAN.
The ODBC system files are not installed on your system by Windows 95. Rather, they are installed when you set up a separate database application, such as SQL Server Client or Visual Basic 4.0. When the ODBC icon is installed in Control Panel, it uses a file called
ODBCINST.DLL. It is also possible to administer your ODBC data sources through a stand-
alone program called ODBCADM.EXE. There is a 16-bit and a 32-bit version of this
program and each maintains a separate list of ODBC data sources.
From a programming perspective, the beauty of ODBC is that the application can be written
to use the same set of function calls to interface with any data source, regardless of the
database vendor. The source code of the application doesn’t change whether it talks to Oracle
or SQL Server. We only mention these two as an example. There are ODBC drivers available
for several dozen popular database systems. Even Excel spreadsheets and plain text files can
be turned into data sources. The operating system uses the Registry information written by
ODBC Administrator to determine which low-level ODBC drivers are needed to talk to the
data source (such as the interface to Oracle or SQL Server). The loading of the ODBC
drivers is transparent to the ODBC application program. In a client/server environment, the
ODBC API even handles many of the network issues for the application programmer.
The advantages of this scheme are so numerous that you are probably thinking there must be
some catch. The only disadvantage of ODBC is that it isn’t as efficient as talking directly to
the native database interface. ODBC has had many detractors make the charge that it is too
slow. Microsoft has always claimed that the critical factor in performance is the quality of the
driver software that is used. In our humble opinion, this is true. The availability of good
ODBC drivers has improved a great deal recently. And anyway, the criticism about
performance is somewhat analogous to those who said that compilers would never match the
speed of pure assembly language. Maybe not, but the compiler (or ODBC) gives you the
opportunity to write cleaner programs, which means you finish sooner. Meanwhile,
computers get faster every year.
JDBC
In an effort to set an independent database standard API for Java, Sun Microsystems developed Java Database Connectivity, or JDBC. JDBC offers a generic SQL database
access mechanism that provides a consistent interface to a variety of RDBMSs. This
consistent interface is achieved through the use of “plug-in” database connectivity modules,
or drivers. If a database vendor wishes to have JDBC support, he or she must provide the
driver for each platform that the database and Java run on.
To gain a wider acceptance of JDBC, Sun based JDBC’s framework on ODBC. As you
discovered earlier in this chapter, ODBC has widespread support on a variety of platforms.
Basing JDBC on ODBC will allow vendors to bring JDBC drivers to market much faster
than developing a completely new connectivity solution.
JDBC was announced in March of 1996. It was released for a 90 day public review that
ended June 8, 1996. Because of user input, the final JDBC v1.0 specification was released
soon after.
The remainder of this section will cover enough information about JDBC for you to know
what it is about and how to use it effectively. This is by no means a complete overview of
JDBC. That would fill an entire book.
JDBC Goals
Few software packages are designed without goals in mind, and JDBC is no exception: its many goals drove the development of the API. These goals, in conjunction with early reviewer feedback, finalized the JDBC class library into a solid framework for building database applications in Java.
The goals that were set for JDBC are important. They will give you some insight as to why
certain classes and functionalities behave the way they do. The eight design goals for JDBC
are as follows:
5. Keep it simple
The design should allow only one method of completing a task per mechanism. Allowing duplicate functionality only serves to confuse the users of the API.
6. Use strong, static typing wherever possible
Strong typing allows more error checking to be done at compile time; correspondingly, fewer errors appear at runtime.
7. Keep the common cases simple
Because more often than not, the usual SQL calls used by the programmer are simple SELECTs, INSERTs, DELETEs and UPDATEs, these queries should be simple to perform with JDBC. However, more complex SQL statements should also be possible.
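As an illustration of such a simple, parameterized SELECT through JDBC (the table name, column names and any connection URL here are assumptions for the sketch, not details from the report):

```java
import java.sql.*;

public class GradeDao {
    // Hypothetical query; the table and column names are illustrative only.
    static final String QUERY =
            "SELECT grade FROM student_grades WHERE student_id = ?";

    // Runs the parameterized SELECT against any JDBC connection, e.g. one
    // obtained via DriverManager.getConnection(...) with the driver in use.
    static String fetchGrade(Connection con, String studentId) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(QUERY)) {
            ps.setString(1, studentId);            // bind the parameter safely
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString("grade") : null;
            }
        }
    }
}
```

Because the query text never changes between vendors, the same code works whether the driver behind the connection talks to MS Access, SQL Server or Oracle, which is exactly the portability the goals above describe.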
Finally, we decided to proceed with the implementation using Java networking. For dynamically updating the cache table, we chose an MS Access database.
Java has two things: a programming language and a platform.
Java is a high-level programming language that is all of the following:
▪ Simple
▪ Architecture-neutral
▪ Object-oriented
▪ Portable
▪ Distributed
▪ High-performance
▪ Interpreted
▪ Multithreaded
▪ Robust
▪ Dynamic
▪ Secure
Java is also unusual in that each Java program is both compiled and interpreted. With a compiler, you translate a Java program into an intermediate language called Java byte codes: platform-independent code instructions that are passed to and run on the computer.
Compilation happens just once; interpretation occurs each time the program is executed. The
figure illustrates how this works.
You can think of Java byte codes as the machine code instructions for the Java Virtual
Machine (Java VM). Every Java interpreter, whether it’s a Java development tool or a Web
browser that can run Java applets, is an implementation of the Java VM. The Java VM can
also be implemented in hardware.
Java byte codes help make "write once, run anywhere" possible. You can compile your Java program into byte codes on any platform that has a Java compiler. The byte codes can then be run on any implementation of the Java VM. For example, the same Java program can run on Windows NT, Solaris, and Macintosh.
4.1 INTRODUCTION
The System Design of the Optimal Feature Selection Algorithm for Predicting Students'
Academic Performance provides a blueprint for how the system will function and operate. It
focuses on defining the architecture, components, and workflow of the system to ensure it
meets functional and non-functional requirements effectively.
1. Translate the functional requirements into a structured and efficient system architecture.
2. Define how data flows through various components and how the modules interact.
This design will guide the development process, ensuring the system fulfills its objective of
improving predictive accuracy by selecting optimal features from student data.
• Data Input and Preprocessing: Handling data formats, cleaning, and preparing for
analysis.
• Feature Selection Module: Identifying the most relevant features for prediction using
various algorithms.
• Model Training and Evaluation: Training machine learning models with selected features
and evaluating their performance.
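As a small illustration of the evaluation step above, classification accuracy can be computed by comparing predicted and actual grade labels; the helper below is a sketch with names of our choosing:

```java
public class Evaluation {
    // Classification accuracy: the fraction of predictions that match the
    // actual labels. This is the metric quoted elsewhere in the report
    // (e.g. accuracy "up to 77.29%").
    static double accuracy(String[] predicted, String[] actual) {
        if (predicted.length != actual.length)
            throw new IllegalArgumentException("length mismatch");
        int correct = 0;
        for (int i = 0; i < predicted.length; i++)
            if (predicted[i].equals(actual[i])) correct++;
        return (double) correct / predicted.length;
    }
}
```

Training a model on the selected features and reporting this accuracy on held-out students is how competing feature subsets are compared.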
This design will support educators and researchers in analyzing factors influencing academic
performance while enabling timely interventions for students at risk.
Design Approach
1. Modular Architecture: The system is divided into discrete modules (e.g., data
preprocessing, feature selection, model training) for ease of development, testing, and
maintenance.
2. Technology Stack: The system is implemented using Java, leveraging its robust
ecosystem for building scalable, platform-independent applications.
3. Three-Tier Architecture:
o Data Layer: A database for managing datasets, feature selection results, and
model outputs.
4. Scalability: Designed to handle large datasets and adapt to new data formats or
algorithms.
4.2.1 CLASS DIAGRAM:
A Class Diagram is a type of static structure diagram in Unified Modeling Language (UML)
that describes the structure of a system by showing its classes, their attributes, operations (or
methods), and the relationships among the classes. It is one of the most commonly used
diagrams in object-oriented design.
A Data Flow Diagram (DFD) is a graphical representation of the flow of data through an
information system. It visually shows how data moves from input to output and through
processes within the system. DFDs help to understand the system’s functionality, data
requirements, and interactions between different components.
The feasibility of the project is analyzed in this phase, and a business proposal is put forth with a very general plan for the project and some cost estimates. During system analysis, the feasibility study of the proposed system is carried out. This is to ensure that the proposed system is not a burden to the company. For feasibility analysis, some understanding of the major requirements for the system is essential.
ECONOMICAL FEASIBILITY
This study is carried out to check the economic impact that the system will have on the organization. The amount of funds that the company can pour into the research and development of the system is limited, so the expenditure must be justified. Thus the developed system is well within the budget, and this was achieved because most of the technologies used are freely available; only the customized products had to be purchased.
TECHNICAL FEASIBILITY
This study is carried out to check the technical feasibility, that is, the technical requirements of the system. Any system developed must not place a high demand on the available technical resources, as this would lead to high demands being placed on the client. The developed system must have modest requirements, as only minimal or no changes are required for implementing this system.
SOCIAL FEASIBILITY
This aspect of the study checks the level of acceptance of the system by the user. This includes the process of training the user to use the system efficiently. The user must not feel threatened by the system, but must instead accept it as a necessity. The level of acceptance by the users depends solely on the methods employed to educate the user about the system and to make him familiar with it. His level of confidence must be raised so that he is also able to offer some constructive criticism, which is welcomed, as he is the final user of the system.
Modules
Admin:
In this module, the Admin has to log in using a valid user name and password. After a successful login, he can perform operations such as Login, View All Users and Authorize, Add Category, View Category Hash Code, View All Datasets, View All Datasets By Classification Mining, and View Grade Results.
Remote User:
In this module, n numbers of users are present. A user should register before performing any operations. Once a user registers, their details are stored in the database. After successful registration, the user has to log in using the authorized user name and password. Once login is successful, the user can perform operations such as Register and Login, My Profile, Upload Datasets, View All Uploaded Datasets, Find Student Grade, and Find Student Grade By Hash Code.
7. SYSTEM TESTING
The purpose of testing is to discover errors. Testing is the process of trying to discover every conceivable fault or weakness in a work product. It provides a way to check the functionality of components, subassemblies, assemblies and/or a finished product. It is the process of exercising software with the intent of ensuring that the software system meets its requirements and user expectations and does not fail in an unacceptable manner. There are various types of tests; each test type addresses a specific testing requirement.
TYPES OF TESTS
Unit testing
Unit testing involves the design of test cases that validate that the internal program logic is functioning properly, and that program inputs produce valid outputs. All decision branches and internal code flow should be validated. It is the testing of individual software units of the application; it is done after the completion of an individual unit and before integration. This is structural testing that relies on knowledge of the unit's construction and is invasive. Unit tests perform basic tests at component level and test a specific business process, application, and/or system configuration. Unit tests ensure that each unique path of a business process performs accurately to the documented specifications and contains clearly defined inputs and expected results.
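A unit test of this kind can be sketched with plain Java assertions; the grade-banding function under test and its thresholds are illustrative assumptions, not taken from the report:

```java
public class GradeBandTest {
    // Unit under test: maps a mark to a grade band (thresholds illustrative).
    static String gradeBand(int marks) {
        if (marks < 0 || marks > 100)
            throw new IllegalArgumentException("marks out of range");
        if (marks >= 70) return "H";   // high
        if (marks >= 40) return "M";   // medium
        return "L";                    // low
    }

    public static void main(String[] args) {
        // One test case per decision branch, including the boundaries.
        // Run with `java -ea GradeBandTest` so assertions are enabled.
        assert gradeBand(85).equals("H");
        assert gradeBand(70).equals("H");
        assert gradeBand(40).equals("M");
        assert gradeBand(39).equals("L");
        System.out.println("all unit tests passed");
    }
}
```

Each assertion exercises one path through the unit's control structure, which is exactly the branch coverage the paragraph above calls for; in practice a framework such as JUnit would play this role.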
Integration testing
Integration tests are designed to test integrated software components to determine if they actually run as one program. Testing is event driven and is more concerned with the basic outcome of screens or fields. Integration tests demonstrate that although the components were individually satisfactory, as shown by successful unit testing, the combination of components is correct and consistent. Integration testing is specifically aimed at exposing the problems that arise from the combination of components.
Functional test
Functional tests provide systematic demonstrations that functions tested are available as
specified by the business and technical requirements, system documentation, and user manuals.
Functional testing is centered on the following items:
Valid Input: identified classes of valid input must be accepted.
Invalid Input: identified classes of invalid input must be rejected.
Functions: identified functions must be exercised.
Output: identified classes of application outputs must be exercised.
Systems/Procedures: interfacing systems or procedures must be invoked.
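The valid/invalid input classes above can be exercised mechanically. The following sketch assumes a hypothetical roll-number field (the validation rule and the sample values are illustrative, not taken from the implemented system):

```python
# Hypothetical functional check for a roll-number input field:
# each identified class of valid input must be accepted and each
# identified class of invalid input rejected.

def validate_roll_number(roll):
    """Illustrative rule: exactly 10 alphanumeric characters."""
    return isinstance(roll, str) and len(roll) == 10 and roll.isalnum()

# Identified classes of valid input.
valid_cases = ["21C51A0564", "21C51A0509"]
# Identified classes of invalid input: empty, too short, symbol, non-string.
invalid_cases = ["", "21C5", "21C51A05!4", None]

results = {
    "valid_accepted": all(validate_roll_number(r) for r in valid_cases),
    "invalid_rejected": not any(validate_roll_number(r) for r in invalid_cases),
}
print(results)
```

Enumerating the input classes as data keeps the functional test traceable back to the requirement that defined each class.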
System Test
System testing ensures that the entire integrated software system meets requirements. It tests a
configuration to ensure known and predictable results. An example of system testing is the
configuration-oriented system integration test. System testing is based on process descriptions
and flows, emphasizing pre-driven process links and integration points.
Unit testing is usually conducted as part of a combined code-and-unit-test phase of the
software lifecycle, although it is not uncommon for coding and unit testing to be conducted as
two distinct phases.
Test objectives
• All field entries must work properly.
• Pages must be activated from the identified link.
• The entry screen, messages and responses must not be delayed.
Features to be tested
• Verify that the entries are of the correct format
• No duplicate entries should be allowed
• All links should take the user to the correct page.
Test Results: All the test cases mentioned above passed successfully. No defects encountered.
TESTING METHODOLOGIES
The following are the Testing Methodologies:
o Unit Testing.
o Integration Testing.
o User Acceptance Testing.
o Output Testing.
o Validation Testing.
Unit Testing
Unit testing focuses verification effort on the smallest unit of software design: the module.
Unit testing exercises specific paths in a module's control structure to ensure complete
coverage and maximum error detection. This test focuses on each module individually, ensuring
that it functions properly as a unit; hence the name unit testing.
During this testing, each module is tested individually and the module interfaces are
verified for consistency with the design specification. All important processing paths are
tested for the expected results, and all error-handling paths are also tested.
Integration Testing
Integration testing addresses the issues associated with the dual problems of verification
and program construction. After the software has been integrated, a set of high-order tests is
conducted. The main objective in this testing process is to take unit-tested modules and build a
program structure that has been dictated by the design.
The following type of integration testing was used:
Bottom-up Integration
This method begins construction and testing with the modules at the lowest level in
the program structure. Since the modules are integrated from the bottom up, the processing
required for modules subordinate to a given level is always available, and the need for stubs is
eliminated.
The bottom-up integration strategy may be implemented with the following steps:
▪ The low-level modules are combined into clusters that perform a specific software
sub-function.
▪ A driver (i.e., a control program for testing) is written to coordinate test case input and
output.
▪ The cluster is tested.
▪ Drivers are removed and clusters are combined, moving upward in the program
structure.
The bottom-up approach tests each module individually, and then each module is
integrated with a main module and tested for functionality.
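The steps above can be sketched in miniature. In this illustration (the module names and data are hypothetical), two low-level modules form a cluster, a throwaway driver coordinates their test input and output, and the cluster is then exercised through the higher-level module that integrates them:

```python
# Bottom-up integration sketch: low-level modules first, then the
# integrated call through a higher-level module.

def clean_record(record):
    """Low-level module: normalise the keys of a raw student record."""
    return {k.strip().lower(): v for k, v in record.items()}

def average_score(scores):
    """Low-level module: mean of a list of marks."""
    return sum(scores) / len(scores) if scores else 0.0

def summarise_student(raw_record, scores):
    """Higher-level module integrating the two units above."""
    record = clean_record(raw_record)
    record["average"] = average_score(scores)
    return record

def driver():
    """Driver: coordinates test-case input and output for the cluster."""
    # Test each low-level module in isolation first.
    assert clean_record({" Name ": "Asha"}) == {"name": "Asha"}
    assert average_score([60, 70, 80]) == 70.0
    # Then exercise the integrated structure; the driver is later removed.
    return summarise_student({" Name ": "Asha"}, [60, 70, 80])

print(driver())
```

Because the subordinate modules are tested before integration, any failure in `summarise_student` can be attributed to the combination rather than to the units themselves.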
Output Testing
After performing validation testing, the next step is output testing of the proposed
system, since no system can be useful if it does not produce the required output in the
specified format. The outputs generated or displayed by the system under consideration are
tested by asking the users about the format they require. Hence the output format is considered
in two ways: one is on screen and the other is in printed format.
Validation Checking
Validation checks are performed on the following fields.
Text Field:
The text field can contain only a number of characters less than or equal to its size.
The text fields are alphanumeric in some tables and alphabetic in others. An incorrect entry
always flashes an error message.
Numeric Field:
The numeric field can contain only numbers from 0 to 9. An entry of any other character
flashes an error message. The individual modules are checked for accuracy against what they
have to perform. Each module is subjected to a test run along with sample data. The individually
tested modules are then integrated into a single system. Testing involves executing the program
with real data; the existence of any program defect is inferred from the output. The testing
should be planned so that all the requirements are individually tested.
A successful test is one that brings out the defects for inappropriate data and produces
output revealing the errors in the system.
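The text-field and numeric-field checks described above can be sketched as simple validators (the field sizes, rules, and error messages below are illustrative assumptions, not the system's actual ones):

```python
# Hypothetical field validators mirroring the validation checks above.

def check_text_field(value, size, alphabetic_only=False):
    """Text field: at most `size` characters; alphabetic or alphanumeric."""
    if len(value) > size:
        return "Error: field exceeds maximum size"
    if alphabetic_only and not value.isalpha():
        return "Error: only alphabetic characters allowed"
    if not value.isalnum():
        return "Error: only alphanumeric characters allowed"
    return "OK"

def check_numeric_field(value):
    """Numeric field: digits 0 to 9 only."""
    return "OK" if value.isdigit() else "Error: only digits 0-9 allowed"

print(check_text_field("Asha", 10, alphabetic_only=True))
print(check_numeric_field("2024"))
print(check_numeric_field("20a4"))
```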
Using Live Test Data:
It is difficult to obtain live data in sufficient amounts to conduct extensive testing. And,
although realistic data will show how the system performs for the typical processing
requirement, assuming that the live data entered are in fact typical, such data generally will
not test all the combinations or formats that can enter the system. This bias toward typical
values therefore does not provide a true systems test and in fact ignores the cases most likely
to cause system failure.
Using Artificial Test Data:
Artificial test data are created solely for test purposes, since they can be generated to test
all combinations of formats and values. In other words, the artificial data, which can quickly
be prepared by a data-generating utility program in the information systems department, make
possible the testing of all logic and control paths through the program.
The most effective test programs use artificial test data generated by persons other than those
who wrote the programs. Often, an independent team of testers formulates a testing plan, using
the system specifications.
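A minimal sketch of such a data-generating utility is shown below (the field names and value ranges are illustrative assumptions). Unlike live data, it can deliberately emit boundary values that typical records rarely contain:

```python
import random

# Sketch of an artificial test-data generator: it can cover boundary
# and invalid combinations that live data is biased against.

def generate_student_records(n, seed=42):
    """Produce n synthetic student records with boundary-value marks."""
    random.seed(seed)  # fixed seed keeps test runs reproducible
    records = []
    for i in range(n):
        records.append({
            "roll": f"21C51A{i:04d}",
            "attendance": random.randint(0, 100),
            # Deliberate boundary values for the marks field.
            "internal_marks": random.choice([0, 1, 49, 50, 99, 100]),
        })
    return records

data = generate_student_records(5)
print(data)
```

Seeding the generator makes failures reproducible, which is why generated data is often preferred over ad hoc keyed-in samples.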
The package “Optimal feature selection Algorithm for predicting student’s academic
performance” has satisfied all the requirements specified in the software requirement
specification and was accepted.
USER TRAINING
Whenever a new system is developed, user training is required to educate users about the
working of the system so that it can be put to efficient use by those for whom it has been
primarily designed. For this purpose, the normal working of the project was demonstrated to
the prospective users. Its working is easily understandable, and since the expected users are
people who have good knowledge of computers, the system is very easy to use.
MAINTENANCE
Maintenance covers a wide range of activities, including correcting code and design errors. To
reduce the need for maintenance in the long run, we have defined the user’s requirements more
accurately during the process of system development. Depending on the requirements, this system
has been developed to satisfy the needs to the largest possible extent. With developments in
technology, it may be possible to add many more features based on future requirements. The
coding and design are simple and easy to understand, which will make maintenance easier.
TESTING STRATEGY :
A strategy for system testing integrates system test cases and design techniques into a well-
planned series of steps that results in the successful construction of software. The testing
strategy must incorporate test planning, test case design, test execution, and the resultant
data collection and evaluation. A strategy for software testing must accommodate low-level tests
that are necessary to verify that a small source code segment has been correctly implemented,
as well as high-level tests that validate major system functions against user requirements.
Software testing is a critical element of software quality assurance and represents the ultimate
review of specification, design, and coding. Thus, a series of tests is performed on the
proposed system before it is ready for user acceptance testing.
SYSTEM TESTING:
Once validated, software must be combined with other system elements (e.g., hardware, people,
databases). System testing verifies that all the elements mesh properly and that overall system
function and performance are achieved. It also tests to find discrepancies between the system
and its original objective, current specifications, and system documentation.
UNIT TESTING:
In unit testing, different modules are tested against the specifications produced during the
design of the modules. Unit testing is essential for verification of the code produced during
the coding phase; hence the goal is to test the internal logic of the modules. Using the
detailed design description as a guide, important control paths are tested to uncover errors
within the boundary of the modules. This testing is carried out during the programming stage
itself. In this testing step, each module was found to be working satisfactorily with regard to
the expected output from the module.
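The idea of exercising both the normal and the error-handling control paths within a module's boundary can be sketched as follows (the module and its guard conditions are hypothetical):

```python
# Sketch: covering the normal path and each error-handling path of a
# hypothetical module boundary.

def attendance_ratio(present, total):
    """Hypothetical module: fraction of classes attended."""
    if total <= 0:
        raise ValueError("total classes must be positive")
    if present < 0 or present > total:
        raise ValueError("present must be between 0 and total")
    return present / total

# Normal path.
assert attendance_ratio(45, 50) == 0.9

# Error-handling paths: one probe per guard condition.
for bad_args in [(10, 0), (-1, 50), (60, 50)]:
    try:
        attendance_ratio(*bad_args)
        raise AssertionError("expected ValueError for %r" % (bad_args,))
    except ValueError:
        pass  # guard fired as designed

print("all control paths exercised")
```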
In due course, the latest technology advancements will be taken into consideration. As part
of the technical build-up, many components of the system will be generic in nature so
that future projects can either use or interact with them. The future holds a lot to offer to
the development and refinement of this project.
Screenshots: Admin login, Admin’s homepage, and result screens.
CONCLUSION
In this work, different FS algorithms are evaluated and analyzed with different classification
algorithms (Random Forest, JRip, J48, Decision Tree, and Linear Regression). The implementation
results of these FS algorithms do not show any significant change, with accuracies ranging from
67.9167% to 77.2917% using the WEKA toolkit. The CfsSubsetEval algorithm with the Random Forest
classifier gave the highest accuracy, 77.2917%, and the ReliefFAttributeEval algorithm with the
Decision Tree classifier gave the lowest accuracy, 67.9167%. From Figure 1, it is clear that
Random Forest, in combination with almost all feature selection algorithms, shows better
accuracy than the other algorithms. In future work, more feature selection algorithms will be
analyzed with different classification algorithms to obtain better efficiency. The same work can
also be done on different student academic datasets. Apart from this, we cannot overlook the
benefits of feature selection techniques in data mining.
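The experiment was run in WEKA, but the underlying idea of correlation-driven feature selection can be illustrated in a few lines of plain Python. The toy dataset and feature names below are invented for illustration; this is a loose sketch in the spirit of correlation-based evaluators like CfsSubsetEval, not a reimplementation of it:

```python
import math

# Sketch: rank features by the absolute Pearson correlation of each
# feature with the class label, then keep the top k.

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def select_top_k(features, target, k):
    """Return the k feature names most correlated with the target."""
    ranked = sorted(features,
                    key=lambda f: abs(pearson(features[f], target)),
                    reverse=True)
    return ranked[:k]

# Toy data: attendance tracks the pass/fail label; roll parity does not.
features = {
    "attendance": [90, 30, 85, 40, 95, 20],
    "roll_parity": [0, 1, 0, 1, 1, 0],
}
target = [1, 0, 1, 0, 1, 0]
print(select_top_k(features, target, 1))
```

The selected subset would then be fed to a classifier, which is the pattern the WEKA experiments above follow at full scale.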
REFERENCES: