CERTIFICATE
This is to certify that the project entitled "Optimal Feature Selection Algorithm for Predicting Students' Academic Performance" is being submitted during the academic year 2024-25, in partial fulfillment of the requirements for the award of the degree of Bachelor of Technology in Computer Science and Engineering offered by Sai Spurthi Institute of Technology, affiliated to Jawaharlal Nehru Technological University, Hyderabad.
We hereby declare that the project report entitled "Optimal Feature Selection Algorithm for Predicting Students' Academic Performance" is carried out in partial fulfillment of the requirements for the award of the degree of Bachelor of Technology in Computer Science and Engineering, affiliated to Jawaharlal Nehru Technological University, Hyderabad. This project has not been submitted anywhere else.
CH.UMASHANKAR (21C51A0564)
N.CHARAN TEJA(21C51A0558)
ACKNOWLEDGMENT
On the submission of our project entitled "Optimal Feature Selection Algorithm for Predicting Students' Academic Performance", we would like to extend our gratitude and sincere thanks to our supervisor, Mr. V.V. Siva Prasad, Assistant Professor, Department of Computer Science and Engineering, for his guidance and support throughout the project.
We would also like to extend our sincere thanks to Dr. SK. YAKOOB, Associate Professor, Head of the Department, Computer Science and Engineering, for his valuable suggestions and encouragement.
We endow our sincere thanks to Principal Dr. VSR KUMARI for his consistent cooperation and encouragement.
We would like to thank the entire CSE Department faculty, who helped us directly and indirectly in the completion of the project. We sincerely thank our friends and family for their support and encouragement.
CH.SANDHYA RANI(21C51A0509)
CH.UMASHANKAR (21C51A0564)
N.CHARAN TEJA(21C51A0558)
INDEX
1. INTRODUCTION
2. REVIEW OF RELATED WORK
3. SOFTWARE REQUIREMENT SPECIFICATIONS
4. SYSTEM DESIGN
4.1 INTRODUCTION
4.2 UML DIAGRAMS
4.3 E-R DIAGRAM
4.4 DATA DICTIONARY
5. SYSTEM STUDY
6. SYSTEM IMPLEMENTATION
7. SYSTEM TESTING
9. CONCLUSION
10. REFERENCES
1. INTRODUCTION
The principal purpose of feature selection (FS) algorithms is to select the most predictive features from the chosen dataset for analysis and to ignore the remaining attributes, which are non-predictive. Non-predictive attributes do not affect the actual result, but removing them reduces the complexity of the analysis. The accuracy and effectiveness of a student-performance prediction model can be improved with the help of these feature selection algorithms. Feature selection algorithms can be further divided into three groups, namely filter, wrapper and embedded methods. Filter methods are among the primary techniques; they depend on the general characteristics of the training data and are applied during the pre-processing phase of the dataset. Wrapper methods evaluate feature subsets using a learning algorithm. Embedded methods are executed during the classifier's learning process and are specific to particular learning algorithms.
1. Enhanced Prediction Accuracy: By selecting the most relevant features, the algorithm helps in building models that better capture the factors influencing academic performance, leading to more accurate predictions.
2. Improved Model Performance: A streamlined set of features can lead to faster and more efficient model training and execution.
3. Better Interpretability: Models with fewer, more relevant features are easier to interpret, helping educators and administrators understand the key factors affecting student performance.
In essence, an optimal feature selection algorithm helps create more reliable, interpretable,
and efficient models for predicting students' academic performance, ultimately aiding in
making informed decisions to enhance educational outcomes.
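The filter approach described above can be sketched as code. The following is a minimal, self-contained information-gain scorer for one categorical feature against the class label; the class and method names are ours, for illustration only:

```java
import java.util.*;

public class InfoGain {
    // Shannon entropy (in bits) of a label distribution.
    static double entropy(Collection<Integer> counts) {
        int total = counts.stream().mapToInt(Integer::intValue).sum();
        double h = 0.0;
        for (int c : counts) {
            if (c == 0) continue;
            double p = (double) c / total;
            h -= p * (Math.log(p) / Math.log(2));
        }
        return h;
    }

    // Information gain of one categorical feature with respect to the class.
    static double infoGain(String[] feature, String[] label) {
        Map<String, Integer> classCounts = new HashMap<>();
        for (String y : label) classCounts.merge(y, 1, Integer::sum);
        double baseEntropy = entropy(classCounts.values());

        // Partition the labels by feature value, then take the weighted entropy.
        Map<String, Map<String, Integer>> partitions = new HashMap<>();
        for (int i = 0; i < feature.length; i++) {
            partitions.computeIfAbsent(feature[i], k -> new HashMap<>())
                      .merge(label[i], 1, Integer::sum);
        }
        double conditional = 0.0;
        for (Map<String, Integer> part : partitions.values()) {
            int size = part.values().stream().mapToInt(Integer::intValue).sum();
            conditional += ((double) size / feature.length) * entropy(part.values());
        }
        return baseEntropy - conditional;
    }
}
```

A feature that perfectly separates passing from failing students scores the full label entropy, while a constant feature scores zero, so ranking features by this score and keeping the top few is a simple filter-style selection.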
1.3 EXISTING SYSTEM & ITS DISADVANTAGES
EXISTING SYSTEM:
The following section is a short review of work done in the area of feature selection algorithms by different researchers. Many authors have used feature selection (FS) algorithms in combination with classification algorithms to compare prediction accuracy on varying student datasets. Some of the existing work in this field of EDM is reviewed below.
Siva Kumar S, Venkataraman S, et al., in "Predictive Modeling of Student Dropout Indicators in Educational Data Mining using Improved Decision Tree," proposed an improved version of the decision tree algorithm to predict dropout students. A dataset of 240 students was collected by the authors via survey, and a correlation-based feature selection algorithm was then applied for pre-processing of the dataset. The classification accuracy on this dataset is more than 90%. K. W. Stephen et al., in the study "Data Mining Model for Predicting Student Enrolment in STEM Courses in Higher Education Institutions," predict fresh students' enrolment in STEM (Science, Technology, Engineering and Mathematics) courses. They selected 18 different features and collected data from students through a questionnaire. For pre-processing of the data, the authors used the Chi-Square and IG feature selection algorithms and found the best prediction with the CART decision tree algorithm. E. Osmanbegović et al., in the study "Determining Dominant Factor for students Performance Prediction by using Data Mining Classification Algorithms," calculate the academic performance of secondary school students at Tuzla. For the pre-processing phase of the collected dataset, they used the Gain Ratio (GR) feature selection algorithm and found the best prediction accuracy with the Random Forest (RF) algorithm compared to other classification algorithms. A. Figueira et al., in "Predicting Grades by Principal Component Analysis: A Data Mining Approach to Learning Analytics," predict students' academic grades in a Bachelor's degree program. For the pre-processing phase, the authors used the Principal Component Analysis (PCA) feature selection algorithm; PCA was then used to build a decision tree, which predicts the academic grade of a student. N. Rachburee and W. Punlumjeak, in the study "A comparison of feature selection approach between greedy, IG-ratio, Chi-square, and mRMR in educational mining," compare different feature selection algorithms such as IG-ratio, Chi-Square, Greedy Forward selection and mRMR. This work was conducted on a first-year students' dataset (with 15 attributes) from the University of Technology, Thailand. In this research, the authors found the best prediction accuracy using Greedy Forward (GF) selection with an Artificial Neural Network (ANN), compared to other classification algorithms (Decision Tree, K-NN and Naive Bayes). M. Zaffar, M. A. Hashmani et al., in the study "Performance analysis of feature selection algorithm for educational data mining," implemented different filter feature selection algorithms on selected student datasets. In this research, the authors used two different student datasets with a number of feature selection algorithms and analyzed the results for prediction accuracy.
Disadvantages:
• Here, four FS algorithms, namely CfsSubsetEval, GainRatioAttributeEval, InfoGainAttributeEval and ReliefAttributeEval, are evaluated.
• The classification algorithms Naive Bayes (NB), Logistic Regression (LR), DecisionTable (DT), JRip, J48 and Random Forest (RF) have been evaluated on the academic dataset.
• cfsSubsetEval: attribute subsets are evaluated based on both the predictive ability and the degree of redundancy of each feature.
• Features that are highly correlated with the class but have low intercorrelation with each other are preferred.
• Attribute Subset Evaluator (cfsSubsetEval) + Search Method (Best first (forwards)): in Table 2, the best seven attributes (gender, Relation, raisedhands, VisITedResources, AnnouncementsView, ParentAnsweringSurvey, StudentAbsenceDays) are selected based on the FS algorithm mentioned above.
• It provides the most correctly classified instances, with accuracy up to 77.29%, while the weakest is JRip, with accuracy up to 73.75%.
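The preference behind cfsSubsetEval can be made concrete. A standard formulation of the CFS merit of a k-feature subset is merit = (k · r̄cf) / √(k + k(k−1) · r̄ff), where r̄cf is the average feature-class correlation and r̄ff the average feature-feature intercorrelation. The sketch below assumes this textbook formula; it is not code from the system itself:

```java
public class CfsMerit {
    // CFS merit of a feature subset: k is the subset size, avgFeatureClassCorr
    // the mean feature-class correlation, avgFeatureFeatureCorr the mean
    // feature-feature intercorrelation. Higher merit means the subset is
    // predictive of the class yet internally non-redundant.
    static double merit(int k, double avgFeatureClassCorr, double avgFeatureFeatureCorr) {
        return (k * avgFeatureClassCorr)
                / Math.sqrt(k + k * (k - 1) * avgFeatureFeatureCorr);
    }
}
```

A subset of features strongly related to the class but weakly intercorrelated receives the higher merit, which is exactly the preference stated in the bullets above.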
PROPOSED SYSTEM:
In the proposed system, the principal purpose of the FS algorithms is to select the most predictive features from the chosen dataset for analysis and to ignore the remaining attributes, which are non-predictive. Non-predictive attributes do not affect the actual result, but removing them reduces the complexity of the analysis. The accuracy and effectiveness of the student-performance prediction model can be improved with the help of these feature selection algorithms. These feature selection algorithms can be further divided into three groups, namely filter, wrapper and embedded methods. Filter methods are among the primary techniques; they depend on the general characteristics of the training data and are applied during the pre-processing phase of the dataset. Wrapper methods evaluate feature subsets using a learning algorithm. Embedded methods are executed during the classifier's learning process and are specific to particular learning algorithms.
Advantages
➢ The proposed methodology implements a hashing technique, which is a faster and more reliable method of processing the data.
➢ The proposed system implements a clustering-chain technique based on hashing to improve system performance.
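Since the report does not show the hashing implementation, the following is only a sketch of what a clustering-chain structure based on hashing might look like: records that share a category hash code are chained into one bucket, so grade lookups by hash code touch a single chain. All names, and the choice of String.hashCode as the hash function, are assumptions:

```java
import java.util.*;

public class HashClusterIndex {
    // Clustering chain: records hashing to the same bucket are chained
    // together in a list, so a lookup by category hash code scans one chain
    // instead of the whole dataset.
    private final Map<Integer, List<String>> buckets = new HashMap<>();

    // Hash code for a category name (assumed, not specified in the report).
    static int categoryHash(String category) {
        return category.toLowerCase().hashCode();
    }

    void add(String category, String record) {
        buckets.computeIfAbsent(categoryHash(category), k -> new ArrayList<>())
               .add(record);
    }

    List<String> findByHash(int hashCode) {
        return buckets.getOrDefault(hashCode, Collections.emptyList());
    }
}
```

This mirrors the "Find Student Grade By Hash code" operation described later in the Modules section: the user supplies a hash code and receives the chain of matching records.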
2. REVIEW OF RELATED WORK
The following section is a short review of work done in the area of feature selection algorithms by different researchers. Many authors have used feature selection (FS) algorithms in combination with classification algorithms to compare prediction accuracy on varying student datasets. Some of the existing work in this field of EDM is reviewed below.
• Siva Kumar S, Venkataraman S, et al., in "Predictive Modeling of Student Dropout Indicators in Educational Data Mining using Improved Decision Tree," proposed an improved version of the decision tree algorithm to predict dropout students. A dataset of 240 students was collected by the authors via survey, and a correlation-based feature selection algorithm was then applied for pre-processing of the dataset. The classification accuracy on this dataset is more than 90%.
• K. W. Stephen et al., in the study "Data Mining Model for Predicting Student Enrolment in STEM Courses in Higher Education Institutions," predict fresh students' enrolment in STEM (Science, Technology, Engineering and Mathematics) courses. They selected 18 different features and collected data from students through a questionnaire. For pre-processing of the data, the authors used the Chi-Square and IG feature selection algorithms and found the best prediction with the CART decision tree algorithm.
• E. Osmanbegović et al., in the study "Determining Dominant Factor for students Performance Prediction by using Data Mining Classification Algorithms," calculate the academic performance of secondary school students at Tuzla. For the pre-processing phase of the collected dataset, they used the Gain Ratio (GR) feature selection algorithm. They found the best prediction accuracy with the Random Forest (RF) algorithm compared to other classification algorithms.
• A. Figueira et al., in "Predicting Grades by Principal Component Analysis: A Data Mining Approach to Learning Analytics," predict students' academic grades in a Bachelor's degree program. For the pre-processing phase, the authors used the Principal Component Analysis (PCA) feature selection algorithm. In this study, PCA was used to build a decision tree, which predicts the academic grade of a student.
• N. Rachburee and W. Punlumjeak, in the study "A comparison of feature selection approach between greedy, IG-ratio, Chi-square, and mRMR in educational mining," compare different feature selection algorithms such as IG-ratio, Chi-Square, Greedy Forward selection and mRMR. This work was conducted on a first-year students' dataset (with 15 attributes) from the University of Technology, Thailand. In this research, the authors found the best prediction accuracy using Greedy Forward (GF) selection with an Artificial Neural Network (ANN), compared to other classification algorithms (Decision Tree, K-NN and Naive Bayes).
• M. Zaffar, M. A. Hashmani et al., in the study "Performance analysis of feature selection algorithm for educational data mining," implemented different filter feature selection algorithms on selected student datasets. In this research, the authors used two different student datasets with a number of feature selection algorithms and analyzed the results for prediction accuracy.
3. SOFTWARE REQUIREMENT SPECIFICATIONS
With most programming languages, you either compile or interpret a program so that you can
run it on your computer. The Java programming language is unusual in that a program is both
compiled and interpreted. With the compiler, first you translate a program into an
intermediate language called Java byte codes —the platform-independent codes interpreted
by the interpreter on the Java platform. The interpreter parses and runs each Java byte code
instruction on the computer. Compilation happens just once; interpretation occurs each time
the program is executed. The following figure illustrates how this works.
You can think of Java byte codes as the machine code instructions for the Java Virtual
Machine (Java VM). Every Java interpreter, whether it’s a development tool or a Web
browser that can run applets, is an implementation of the Java VM. Java byte codes help
make “write once, run anywhere” possible. You can compile your program into byte codes
on any platform that has a Java compiler. The byte codes can then be run on any
implementation of the Java VM. That means that as long as a computer has a Java VM, the
same program written in the Java programming language can run on Windows 2000, a
Solaris workstation, or on an iMac.
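The compile-once, run-anywhere workflow described above can be illustrated with a minimal program (the class name is ours):

```java
// HelloJvm.java — compiled once with `javac HelloJvm.java` into
// platform-independent byte codes (HelloJvm.class), then run on any
// Java VM with `java HelloJvm`, whether on Windows, Solaris, or a Mac.
public class HelloJvm {
    static String greeting() {
        return "write once, run anywhere";
    }

    public static void main(String[] args) {
        System.out.println(greeting());
    }
}
```

The same HelloJvm.class file produced on one platform runs unchanged on any other platform's Java VM; only the VM itself is platform-specific.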
You’ve already been introduced to the Java VM. It’s the base for the Java platform and is
ported onto various hardware-based platforms.
The Java API is a large collection of ready-made software components that provide many useful capabilities, such as graphical user interface (GUI) widgets. The Java API is grouped into libraries of related classes and interfaces; these libraries are known as packages. The next section, What Can Java Technology Do?, highlights what functionality some of the packages in the Java API provide.
The following figure depicts a program that’s running on the Java platform. As the figure
shows, the Java API and the virtual machine insulate the program from the hardware.
Native code is code that, once compiled, runs on a specific hardware platform. As a platform-independent environment, the Java platform can be a bit slower than native code. However, smart compilers, well-tuned interpreters, and just-in-time byte code compilers can bring performance close to that of native code without threatening portability.
What Can Java Technology Do?
The most common types of programs written in the Java programming language are applets
and applications. If you’ve surfed the Web, you’re probably already familiar with applets.
An applet is a program that adheres to certain conventions that allow it to run within a Java-
enabled browser.
However, the Java programming language is not just for writing cute, entertaining applets for
the Web. The general-purpose, high-level Java programming language is also a powerful
software platform. Using the generous API, you can write many types of programs.
An application is a standalone program that runs directly on the Java platform. A special kind
of application known as a server serves and supports clients on a network. Examples of
servers are Web servers, proxy servers, mail servers, and print servers. Another specialized
program is a servlet. A servlet can almost be thought of as an applet that runs on the server
side. Java Servlets are a popular choice for building interactive web applications, replacing
the use of CGI scripts. Servlets are similar to applets in that they are runtime extensions of
applications. Instead of working in browsers, though, servlets run within Java Web servers,
configuring or tailoring the server.
How does the API support all these kinds of programs? It does so with packages of software components that provide a wide range of functionality. Every full implementation of the Java platform gives you the following features:
• The essentials: Objects, strings, threads, numbers, input and output, data structures,
system properties, date and time, and so on.
• Applets: The set of conventions used by applets.
• Networking: URLs, TCP (Transmission Control Protocol), UDP (User Datagram Protocol) sockets, and IP (Internet Protocol) addresses.
• Internationalization: Help for writing programs that can be localized for users worldwide.
Programs can automatically adapt to specific locales and be displayed in the appropriate
language.
• Security: Both low level and high level, including electronic signatures, public and
private key management, access control, and certificates.
• Software components: Known as JavaBeans™, these can plug into existing component architectures.
• Object serialization: Allows lightweight persistence and communication via Remote
Method Invocation (RMI).
• Java Database Connectivity (JDBCTM): Provides uniform access to a wide range of
relational databases.
The Java platform also has APIs for 2D and 3D graphics, accessibility, servers, collaboration,
telephony, speech, animation, and more. The following figure depicts what is included in the
Java 2 SDK.
• Distribute software more easily: You can upgrade applets easily from a central server.
Applets take advantage of the feature of allowing new classes to be loaded “on the fly,”
without recompiling the entire program.
ODBC
Microsoft Open Database Connectivity (ODBC) is a standard programming interface for
application developers and database systems providers. Before ODBC became a de facto
standard for Windows programs to interface with database systems, programmers had to use
proprietary languages for each database they wanted to connect to. Now, ODBC has made
the choice of the database system almost irrelevant from a coding perspective, which is as it
should be. Application developers have much more important things to worry about than the
syntax that is needed to port their program from one database to another when business needs
suddenly change.
Through the ODBC Administrator in Control Panel, you can specify the particular database
that is associated with a data source that an ODBC application program is written to use.
Think of an ODBC data source as a door with a name on it. Each door will lead you to a
particular database. For example, the data source named Sales Figures might be a SQL
Server database, whereas the Accounts Payable data source could refer to an Access
database. The physical database referred to by a data source can reside anywhere on the
LAN.
The ODBC system files are not installed on your system by Windows 95. Rather, they are installed when you set up a separate database application, such as SQL Server Client or Visual Basic 4.0. When the ODBC icon is installed in Control Panel, it uses a file called
ODBCINST.DLL. It is also possible to administer your ODBC data sources through a stand-
alone program called ODBCADM.EXE. There is a 16-bit and a 32-bit version of this
program and each maintains a separate list of ODBC data sources.
From a programming perspective, the beauty of ODBC is that the application can be written
to use the same set of function calls to interface with any data source, regardless of the
database vendor. The source code of the application doesn’t change whether it talks to Oracle
or SQL Server. We only mention these two as an example. There are ODBC drivers available
for several dozen popular database systems. Even Excel spreadsheets and plain text files can
be turned into data sources. The operating system uses the Registry information written by
ODBC Administrator to determine which low-level ODBC drivers are needed to talk to the
data source (such as the interface to Oracle or SQL Server). The loading of the ODBC
drivers is transparent to the ODBC application program. In a client/server environment, the
ODBC API even handles many of the network issues for the application programmer.
The advantages of this scheme are so numerous that you are probably thinking there must be
some catch. The only disadvantage of ODBC is that it isn’t as efficient as talking directly to
the native database interface. ODBC has had many detractors make the charge that it is too
slow. Microsoft has always claimed that the critical factor in performance is the quality of the
driver software that is used. In our humble opinion, this is true. The availability of good
ODBC drivers has improved a great deal recently. And anyway, the criticism about
performance is somewhat analogous to those who said that compilers would never match the
speed of pure assembly language. Maybe not, but the compiler (or ODBC) gives you the
opportunity to write cleaner programs, which means you finish sooner. Meanwhile,
computers get faster every year.
JDBC
In an effort to set an independent database standard API for Java, Sun Microsystems developed Java Database Connectivity, or JDBC. JDBC offers a generic SQL database
access mechanism that provides a consistent interface to a variety of RDBMSs. This
consistent interface is achieved through the use of “plug-in” database connectivity modules,
or drivers. If a database vendor wishes to have JDBC support, he or she must provide the
driver for each platform that the database and Java run on.
To gain a wider acceptance of JDBC, Sun based JDBC’s framework on ODBC. As you
discovered earlier in this chapter, ODBC has widespread support on a variety of platforms.
Basing JDBC on ODBC will allow vendors to bring JDBC drivers to market much faster
than developing a completely new connectivity solution.
JDBC was announced in March of 1996. It was released for a 90 day public review that
ended June 8, 1996. Because of user input, the final JDBC v1.0 specification was released
soon after.
The remainder of this section will cover enough information about JDBC for you to know
what it is about and how to use it effectively. This is by no means a complete overview of
JDBC. That would fill an entire book.
JDBC Goals
Few software packages are designed without goals in mind, and JDBC is no exception: its many goals drove the development of the API. These goals, in conjunction with early reviewer feedback, finalized the JDBC class library into a solid framework for building database applications in Java.
The goals that were set for JDBC are important. They will give you some insight as to why
certain classes and functionalities behave the way they do. The eight design goals for JDBC
are as follows:
5. Keep it simple
The design should allow only one method of completing a task per mechanism. Allowing duplicate functionality only serves to confuse the users of the API.
6. Use strong, static typing wherever possible
Strong typing allows more error checking to be done at compile time; correspondingly, fewer errors appear at runtime.
7. Keep the common cases simple
Because more often than not, the usual SQL calls used by the programmer are simple SELECTs, INSERTs, DELETEs and UPDATEs, these queries should be simple to perform with JDBC. However, more complex SQL statements should also be possible.
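As an illustration of such a simple, parameterized SELECT through JDBC (the table name, column names and any connection URL here are assumptions for the sketch, not details from the report):

```java
import java.sql.*;

public class GradeDao {
    // Hypothetical query; the table and column names are illustrative only.
    static final String QUERY =
            "SELECT grade FROM student_grades WHERE student_id = ?";

    // Runs the parameterized SELECT against any JDBC connection, e.g. one
    // obtained via DriverManager.getConnection(...) with the driver in use.
    static String fetchGrade(Connection con, String studentId) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(QUERY)) {
            ps.setString(1, studentId);            // bind the parameter safely
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString("grade") : null;
            }
        }
    }
}
```

Because the query text never changes between vendors, the same code works whether the driver behind the connection talks to MS Access, SQL Server or Oracle, which is exactly the portability the goals above describe.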
Finally, we decided to proceed with the implementation using Java networking. For dynamically updating the cache table, we chose an MS Access database.
Java has two things: a programming language and a platform.
Java is a high-level programming language that is all of the following:
▪ Simple
▪ Architecture-neutral
▪ Object-oriented
▪ Portable
▪ Distributed
▪ High-performance
▪ Interpreted
▪ Multithreaded
▪ Robust
▪ Dynamic
▪ Secure
Java is also unusual in that each Java program is both compiled and interpreted. With a compiler, you translate a Java program into an intermediate language called Java byte codes: platform-independent code instructions that are passed to and run on the computer.
Compilation happens just once; interpretation occurs each time the program is executed. The
figure illustrates how this works.
You can think of Java byte codes as the machine code instructions for the Java Virtual
Machine (Java VM). Every Java interpreter, whether it’s a Java development tool or a Web
browser that can run Java applets, is an implementation of the Java VM. The Java VM can
also be implemented in hardware.
Java byte codes help make "write once, run anywhere" possible. You can compile your Java program into byte codes on any platform that has a Java compiler. The byte codes can then be run on any implementation of the Java VM. For example, the same Java program can run on Windows NT, Solaris, and Macintosh.
4.1 INTRODUCTION
The System Design of the Optimal Feature Selection Algorithm for Predicting Students'
Academic Performance provides a blueprint for how the system will function and operate. It
focuses on defining the architecture, components, and workflow of the system to ensure it
meets functional and non-functional requirements effectively.
1. Translate the functional requirements into a structured and efficient system architecture.
2. Define how data flows through various components and how the modules interact.
This design will guide the development process, ensuring the system fulfills its objective of
improving predictive accuracy by selecting optimal features from student data.
• Data Input and Preprocessing: Handling data formats, cleaning, and preparing for
analysis.
• Feature Selection Module: Identifying the most relevant features for prediction using
various algorithms.
• Model Training and Evaluation: Training machine learning models with selected features
and evaluating their performance.
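As a small illustration of the evaluation step above, classification accuracy can be computed by comparing predicted and actual grade labels; the helper below is a sketch with names of our choosing:

```java
public class Evaluation {
    // Classification accuracy: the fraction of predictions that match the
    // actual labels. This is the metric quoted elsewhere in the report
    // (e.g. accuracy "up to 77.29%").
    static double accuracy(String[] predicted, String[] actual) {
        if (predicted.length != actual.length)
            throw new IllegalArgumentException("length mismatch");
        int correct = 0;
        for (int i = 0; i < predicted.length; i++)
            if (predicted[i].equals(actual[i])) correct++;
        return (double) correct / predicted.length;
    }
}
```

Training a model on the selected features and reporting this accuracy on held-out students is how competing feature subsets are compared.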
This design will support educators and researchers in analyzing factors influencing academic
performance while enabling timely interventions for students at risk.
Design Approach
1. Modular Architecture: The system is divided into discrete modules (e.g., data
preprocessing, feature selection, model training) for ease of development, testing, and
maintenance.
2. Technology Stack: The system is implemented using Java, leveraging its robust
ecosystem for building scalable, platform-independent applications.
3. Three-Tier Architecture:
o Data Layer: A database for managing datasets, feature selection results, and
model outputs.
4. Scalability: Designed to handle large datasets and adapt to new data formats or
algorithms.
4.2.1 CLASS DIAGRAM:
A Class Diagram is a type of static structure diagram in Unified Modeling Language (UML)
that describes the structure of a system by showing its classes, their attributes, operations (or
methods), and the relationships among the classes. It is one of the most commonly used
diagrams in object-oriented design.
A Data Flow Diagram (DFD) is a graphical representation of the flow of data through an
information system. It visually shows how data moves from input to output and through
processes within the system. DFDs help to understand the system’s functionality, data
requirements, and interactions between different components.
The feasibility of the project is analyzed in this phase, and a business proposal is put forth with a very general plan for the project and some cost estimates. During system analysis, the feasibility study of the proposed system is carried out. This is to ensure that the proposed system is not a burden to the company. For feasibility analysis, some understanding of the major requirements for the system is essential.
ECONOMICAL FEASIBILITY
This study is carried out to check the economic impact that the system will have on the organization. The amount of funds that the company can pour into the research and development of the system is limited, so the expenditure must be justified. Thus the developed system is well within the budget, and this was achieved because most of the technologies used are freely available; only the customized products had to be purchased.
TECHNICAL FEASIBILITY
This study is carried out to check the technical feasibility, that is, the technical requirements of the system. Any system developed must not place a high demand on the available technical resources, as this would lead to high demands being placed on the client. The developed system must have modest requirements, as only minimal or no changes are required for implementing this system.
SOCIAL FEASIBILITY
This aspect of the study checks the level of acceptance of the system by the user. This includes the process of training the user to use the system efficiently. The user must not feel threatened by the system, but must instead accept it as a necessity. The level of acceptance by the users depends solely on the methods employed to educate the user about the system and to make him familiar with it. His level of confidence must be raised so that he is also able to offer some constructive criticism, which is welcomed, as he is the final user of the system.
Modules
Admin:
In this module, the Admin has to log in using a valid user name and password. After a successful login, he can perform operations such as Login, View All Users and Authorize, Add Category, View Category Hash Code, View All Datasets, View All Datasets By Classification Mining, and View Grade Results.
Remote User:
In this module, n numbers of users are present. A user should register before performing any operations. Once a user registers, their details are stored in the database. After successful registration, the user has to log in using the authorized user name and password. Once login is successful, the user can perform operations such as Register and Login, My Profile, Upload Datasets, View All Uploaded Datasets, Find Student Grade, and Find Student Grade By Hash Code.
7. SYSTEM TESTING
The purpose of testing is to discover errors. Testing is the process of trying to discover every conceivable fault or weakness in a work product. It provides a way to check the functionality of components, subassemblies, assemblies and/or a finished product. It is the process of exercising software with the intent of ensuring that the software system meets its requirements and user expectations and does not fail in an unacceptable manner. There are various types of tests; each test type addresses a specific testing requirement.
TYPES OF TESTS
Unit testing
Unit testing involves the design of test cases that validate that the internal program logic is functioning properly, and that program inputs produce valid outputs. All decision branches and internal code flow should be validated. It is the testing of individual software units of the application; it is done after the completion of an individual unit and before integration. This is structural testing that relies on knowledge of the unit's construction and is invasive. Unit tests perform basic tests at component level and test a specific business process, application, and/or system configuration. Unit tests ensure that each unique path of a business process performs accurately to the documented specifications and contains clearly defined inputs and expected results.
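A unit test of this kind can be sketched with plain Java assertions; the grade-banding function under test and its thresholds are illustrative assumptions, not taken from the report:

```java
public class GradeBandTest {
    // Unit under test: maps a mark to a grade band (thresholds illustrative).
    static String gradeBand(int marks) {
        if (marks < 0 || marks > 100)
            throw new IllegalArgumentException("marks out of range");
        if (marks >= 70) return "H";   // high
        if (marks >= 40) return "M";   // medium
        return "L";                    // low
    }

    public static void main(String[] args) {
        // One test case per decision branch, including the boundaries.
        // Run with `java -ea GradeBandTest` so assertions are enabled.
        assert gradeBand(85).equals("H");
        assert gradeBand(70).equals("H");
        assert gradeBand(40).equals("M");
        assert gradeBand(39).equals("L");
        System.out.println("all unit tests passed");
    }
}
```

Each assertion exercises one path through the unit's control structure, which is exactly the branch coverage the paragraph above calls for; in practice a framework such as JUnit would play this role.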
Integration testing
Integration tests are designed to test integrated software components to determine if they actually run as one program. Testing is event driven and is more concerned with the basic outcome of screens or fields. Integration tests demonstrate that although the components were individually satisfactory, as shown by successful unit testing, the combination of components is correct and consistent. Integration testing is specifically aimed at exposing the problems that arise from the combination of components.
Functional test
Functional tests provide systematic demonstrations that functions tested are available as
specified by the business and technical requirements, system documentation, and user manuals.
Functional testing is centered on the following items:
Valid Input: identified classes of valid input must be accepted.
Invalid Input: identified classes of invalid input must be rejected.
Functions: identified functions must be exercised.
Output: identified classes of application outputs must be exercised.
Systems/Procedures: interfacing systems or procedures must be invoked.
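The valid/invalid input classes above can be exercised mechanically. The following sketch assumes a hypothetical roll-number field (the validation rule and the sample values are illustrative, not taken from the implemented system):

```python
# Hypothetical functional check for a roll-number input field:
# each identified class of valid input must be accepted and each
# identified class of invalid input rejected.

def validate_roll_number(roll):
    """Illustrative rule: exactly 10 alphanumeric characters."""
    return isinstance(roll, str) and len(roll) == 10 and roll.isalnum()

# Identified classes of valid input.
valid_cases = ["21C51A0564", "21C51A0509"]
# Identified classes of invalid input: empty, too short, symbol, non-string.
invalid_cases = ["", "21C5", "21C51A05!4", None]

results = {
    "valid_accepted": all(validate_roll_number(r) for r in valid_cases),
    "invalid_rejected": not any(validate_roll_number(r) for r in invalid_cases),
}
print(results)
```

Enumerating the input classes as data keeps the functional test traceable back to the requirement that defined each class.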
System Test
System testing ensures that the entire integrated software system meets requirements. It tests a
configuration to ensure known and predictable results. An example of system testing is the
configuration-oriented system integration test. System testing is based on process descriptions
and flows, emphasizing pre-driven process links and integration points.
Unit testing is usually conducted as part of a combined code-and-unit-test phase of the
software lifecycle, although it is not uncommon for coding and unit testing to be conducted as
two distinct phases.
Test objectives
• All field entries must work properly.
• Pages must be activated from the identified link.
• The entry screen, messages and responses must not be delayed.
Features to be tested
• Verify that the entries are of the correct format
• No duplicate entries should be allowed
• All links should take the user to the correct page.
Test Results: All the test cases mentioned above passed successfully. No defects encountered.
TESTING METHODOLOGIES
The following are the Testing Methodologies:
o Unit Testing.
o Integration Testing.
o User Acceptance Testing.
o Output Testing.
o Validation Testing.
Unit Testing
Unit testing focuses verification effort on the smallest unit of software design: the module.
Unit testing exercises specific paths in a module's control structure to ensure complete
coverage and maximum error detection. This test focuses on each module individually, ensuring
that it functions properly as a unit; hence the name unit testing.
During this testing, each module is tested individually and the module interfaces are
verified for consistency with the design specification. All important processing paths are
tested for the expected results, and all error-handling paths are also tested.
Integration Testing
Integration testing addresses the issues associated with the dual problems of verification
and program construction. After the software has been integrated, a set of high-order tests is
conducted. The main objective in this testing process is to take unit-tested modules and build a
program structure that has been dictated by the design.
The following type of integration testing was used:
Bottom-up Integration
This method begins construction and testing with the modules at the lowest level in
the program structure. Since the modules are integrated from the bottom up, the processing
required for modules subordinate to a given level is always available, and the need for stubs is
eliminated.
The bottom-up integration strategy may be implemented with the following steps:
▪ The low-level modules are combined into clusters that perform a specific software
sub-function.
▪ A driver (i.e., a control program for testing) is written to coordinate test case input and
output.
▪ The cluster is tested.
▪ Drivers are removed and clusters are combined, moving upward in the program
structure.
The bottom-up approach tests each module individually, and then each module is
integrated with a main module and tested for functionality.
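The steps above can be sketched in miniature. In this illustration (the module names and data are hypothetical), two low-level modules form a cluster, a throwaway driver coordinates their test input and output, and the cluster is then exercised through the higher-level module that integrates them:

```python
# Bottom-up integration sketch: low-level modules first, then the
# integrated call through a higher-level module.

def clean_record(record):
    """Low-level module: normalise the keys of a raw student record."""
    return {k.strip().lower(): v for k, v in record.items()}

def average_score(scores):
    """Low-level module: mean of a list of marks."""
    return sum(scores) / len(scores) if scores else 0.0

def summarise_student(raw_record, scores):
    """Higher-level module integrating the two units above."""
    record = clean_record(raw_record)
    record["average"] = average_score(scores)
    return record

def driver():
    """Driver: coordinates test-case input and output for the cluster."""
    # Test each low-level module in isolation first.
    assert clean_record({" Name ": "Asha"}) == {"name": "Asha"}
    assert average_score([60, 70, 80]) == 70.0
    # Then exercise the integrated structure; the driver is later removed.
    return summarise_student({" Name ": "Asha"}, [60, 70, 80])

print(driver())
```

Because the subordinate modules are tested before integration, any failure in `summarise_student` can be attributed to the combination rather than to the units themselves.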
Output Testing
After performing validation testing, the next step is output testing of the proposed
system, since no system can be useful if it does not produce the required output in the
specified format. The outputs generated or displayed by the system under consideration are
tested by asking the users about the format they require. Hence the output format is considered
in two ways: one is on screen and the other is in printed format.
Validation Checking
Validation checks are performed on the following fields.
Text Field:
The text field can contain only a number of characters less than or equal to its size.
The text fields are alphanumeric in some tables and alphabetic in others. An incorrect entry
always flashes an error message.
Numeric Field:
The numeric field can contain only numbers from 0 to 9. An entry of any other character
flashes an error message. The individual modules are checked for accuracy against what they
have to perform. Each module is subjected to a test run along with sample data. The individually
tested modules are then integrated into a single system. Testing involves executing the program
with real data; the existence of any program defect is inferred from the output. The testing
should be planned so that all the requirements are individually tested.
A successful test is one that brings out the defects for inappropriate data and produces
output revealing the errors in the system.
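The text-field and numeric-field checks described above can be sketched as simple validators (the field sizes, rules, and error messages below are illustrative assumptions, not the system's actual ones):

```python
# Hypothetical field validators mirroring the validation checks above.

def check_text_field(value, size, alphabetic_only=False):
    """Text field: at most `size` characters; alphabetic or alphanumeric."""
    if len(value) > size:
        return "Error: field exceeds maximum size"
    if alphabetic_only and not value.isalpha():
        return "Error: only alphabetic characters allowed"
    if not value.isalnum():
        return "Error: only alphanumeric characters allowed"
    return "OK"

def check_numeric_field(value):
    """Numeric field: digits 0 to 9 only."""
    return "OK" if value.isdigit() else "Error: only digits 0-9 allowed"

print(check_text_field("Asha", 10, alphabetic_only=True))
print(check_numeric_field("2024"))
print(check_numeric_field("20a4"))
```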
Using Live Test Data:
It is difficult to obtain live data in sufficient amounts to conduct extensive testing. And,
although realistic data will show how the system performs for the typical processing
requirement, assuming that the live data entered are in fact typical, such data generally will
not test all the combinations or formats that can enter the system. This bias toward typical
values therefore does not provide a true systems test and in fact ignores the cases most likely
to cause system failure.
Using Artificial Test Data:
Artificial test data are created solely for test purposes, since they can be generated to test
all combinations of formats and values. In other words, the artificial data, which can quickly
be prepared by a data-generating utility program in the information systems department, make
possible the testing of all logic and control paths through the program.
The most effective test programs use artificial test data generated by persons other than those
who wrote the programs. Often, an independent team of testers formulates a testing plan, using
the system specifications.
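A minimal sketch of such a data-generating utility is shown below (the field names and value ranges are illustrative assumptions). Unlike live data, it can deliberately emit boundary values that typical records rarely contain:

```python
import random

# Sketch of an artificial test-data generator: it can cover boundary
# and invalid combinations that live data is biased against.

def generate_student_records(n, seed=42):
    """Produce n synthetic student records with boundary-value marks."""
    random.seed(seed)  # fixed seed keeps test runs reproducible
    records = []
    for i in range(n):
        records.append({
            "roll": f"21C51A{i:04d}",
            "attendance": random.randint(0, 100),
            # Deliberate boundary values for the marks field.
            "internal_marks": random.choice([0, 1, 49, 50, 99, 100]),
        })
    return records

data = generate_student_records(5)
print(data)
```

Seeding the generator makes failures reproducible, which is why generated data is often preferred over ad hoc keyed-in samples.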
The package “Optimal feature selection Algorithm for predicting student’s academic
performance” has satisfied all the requirements specified in the software requirement
specification and was accepted.
USER TRAINING
Whenever a new system is developed, user training is required to educate users about the
working of the system so that it can be put to efficient use by those for whom it has been
primarily designed. For this purpose, the normal working of the project was demonstrated to
the prospective users. Its working is easily understandable, and since the expected users are
people who have good knowledge of computers, the system is very easy to use.
MAINTENANCE
Maintenance covers a wide range of activities, including correcting code and design errors. To
reduce the need for maintenance in the long run, we have defined the user’s requirements more
accurately during the process of system development. Depending on the requirements, this system
has been developed to satisfy the needs to the largest possible extent. With developments in
technology, it may be possible to add many more features based on future requirements. The
coding and design are simple and easy to understand, which will make maintenance easier.
TESTING STRATEGY :
A strategy for system testing integrates system test cases and design techniques into a well-
planned series of steps that results in the successful construction of software. The testing
strategy must incorporate test planning, test case design, test execution, and the resultant
data collection and evaluation. A strategy for software testing must accommodate low-level tests
that are necessary to verify that a small source code segment has been correctly implemented,
as well as high-level tests that validate major system functions against user requirements.
Software testing is a critical element of software quality assurance and represents the ultimate
review of specification, design, and coding. Thus, a series of tests is performed on the
proposed system before it is ready for user acceptance testing.
SYSTEM TESTING:
Once validated, software must be combined with other system elements (e.g., hardware, people,
databases). System testing verifies that all the elements mesh properly and that overall system
function and performance are achieved. It also tests to find discrepancies between the system
and its original objective, current specifications, and system documentation.
UNIT TESTING:
In unit testing, different modules are tested against the specifications produced during the
design of the modules. Unit testing is essential for verification of the code produced during
the coding phase; hence the goal is to test the internal logic of the modules. Using the
detailed design description as a guide, important control paths are tested to uncover errors
within the boundary of the modules. This testing is carried out during the programming stage
itself. In this testing step, each module was found to be working satisfactorily with regard to
the expected output from the module.
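The idea of exercising both the normal and the error-handling control paths within a module's boundary can be sketched as follows (the module and its guard conditions are hypothetical):

```python
# Sketch: covering the normal path and each error-handling path of a
# hypothetical module boundary.

def attendance_ratio(present, total):
    """Hypothetical module: fraction of classes attended."""
    if total <= 0:
        raise ValueError("total classes must be positive")
    if present < 0 or present > total:
        raise ValueError("present must be between 0 and total")
    return present / total

# Normal path.
assert attendance_ratio(45, 50) == 0.9

# Error-handling paths: one probe per guard condition.
for bad_args in [(10, 0), (-1, 50), (60, 50)]:
    try:
        attendance_ratio(*bad_args)
        raise AssertionError("expected ValueError for %r" % (bad_args,))
    except ValueError:
        pass  # guard fired as designed

print("all control paths exercised")
```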
In due course, the latest technology advancements will be taken into consideration. As part
of the technical build-up, many components of the system will be generic in nature so
that future projects can either use or interact with them. The future holds a lot to offer to
the development and refinement of this project.
Screenshots: Admin login, Admin’s homepage, and result screens.
CONCLUSION
In this work, different FS algorithms are evaluated and analyzed with different classification
algorithms (Random Forest, JRip, J48, Decision Tree, and Linear Regression). The implementation
results of these FS algorithms do not show any significant change, with accuracies ranging from
67.9167% to 77.2917% using the WEKA toolkit. The CfsSubsetEval algorithm with the Random Forest
classifier gave the highest accuracy, 77.2917%, and the ReliefFAttributeEval algorithm with the
Decision Tree classifier gave the lowest accuracy, 67.9167%. From Figure 1, it is clear that
Random Forest, in combination with almost all feature selection algorithms, shows better
accuracy than the other algorithms. In future work, more feature selection algorithms will be
analyzed with different classification algorithms to obtain better efficiency. The same work can
also be done on different student academic datasets. Apart from this, we cannot overlook the
benefits of feature selection techniques in data mining.
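The experiment was run in WEKA, but the underlying idea of correlation-driven feature selection can be illustrated in a few lines of plain Python. The toy dataset and feature names below are invented for illustration; this is a loose sketch in the spirit of correlation-based evaluators like CfsSubsetEval, not a reimplementation of it:

```python
import math

# Sketch: rank features by the absolute Pearson correlation of each
# feature with the class label, then keep the top k.

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def select_top_k(features, target, k):
    """Return the k feature names most correlated with the target."""
    ranked = sorted(features,
                    key=lambda f: abs(pearson(features[f], target)),
                    reverse=True)
    return ranked[:k]

# Toy data: attendance tracks the pass/fail label; roll parity does not.
features = {
    "attendance": [90, 30, 85, 40, 95, 20],
    "roll_parity": [0, 1, 0, 1, 1, 0],
}
target = [1, 0, 1, 0, 1, 0]
print(select_top_k(features, target, 1))
```

The selected subset would then be fed to a classifier, which is the pattern the WEKA experiments above follow at full scale.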
REFERENCES: