
MUTHURANGAM GOVERNMENT

ARTS COLLEGE (AUTONOMOUS)
VELLORE-2
PG & Research Department of Computer Science
M.Sc., COMPUTER SCIENCE

PRACTICAL RECORD
2023-2024

REG.NO : ……………………………. NAME :…………………………………………

SUB.CODE : …………………………………………………………………………………

SUBJECT : ………………………………………………………………………………....

CLASS :………………………………………………………………………………….
MUTHURANGAM GOVERNMENT
ARTS COLLEGE (AUTONOMOUS)
VELLORE-2

PG & Research Department of Computer Science


M.Sc., COMPUTER SCIENCE

BONAFIDE CERTIFICATE

Certified to be a bonafide record of work done by ....................................................

(Reg.No……………………………….) in the Laboratory of this College, submitted

for...……………. Semester Practical Examination in ………………………………… during the

Academic Year …………………......

Staff In-charge Head of the Department

Submitted for the Practical Examination held on ………………………

Internal Examiner External Examiner


INDEX

 1. Preprocessing on dataset student.arff
 2. Preprocessing on dataset labor.arff
 3. Association rule process on dataset contactlenses.arff using apriori algorithm
 4. Association rule process on dataset test.arff using apriori algorithm
 5. Classification rule process on dataset student.arff using j48 algorithm
 6. Classification rule process on dataset employee.arff using j48 algorithm
 7. Classification rule process on dataset employee.arff using id3 algorithm
 8. Classification rule process on dataset employee.arff using naive bayes algorithm
 9. Clustering rule process on dataset iris.arff using simple k-means
10. Clustering rule process on dataset student.arff using simple k-means
11. Prediction analysis model on Student marks
12. Predictive analysis model for Opening bank account
13. Predictive analysis on Attendance percentage
14. Predictive analysis to check whether the Fruit is an Apple or Not
15. Prediction analysis model on Iris flower

Ex. No: 01
PREPROCESSING ON DATASET STUDENT.ARFF

AIM:
     This experiment illustrates some of the basic data pre-processing operations that can be
performed using the WEKA Explorer. The sample dataset used for this example is the student data,
available in ARFF format.

PROCEDURE:

1. Loading the data: we can load the dataset into WEKA by clicking the Open file button in the
   Preprocess tab and selecting the appropriate file.
2. Once the data is loaded, WEKA recognizes the attributes and, during the scan of the data,
   computes some basic statistics on each attribute. The left panel shows the list of recognized
   attributes, while the top panel indicates the name of the base relation (table) and the current
   working relation (which are the same initially).
3. Clicking on an attribute in the left panel shows its basic statistics. For categorical
   attributes, the frequency of each attribute value is shown; for continuous attributes we can
   obtain the minimum, maximum, mean and standard deviation.
4. The visualization in the bottom-right panel appears in the form of a cross-tabulation across
   two attributes.
5. Selecting or filtering attributes:
   Removing an attribute - when we need to remove an attribute, we can do this using the
   attribute filters in WEKA. In the Filter panel, click the Choose button; this shows a popup
   window with a list of available filters.
   Scroll down the list and select the "weka.filters.unsupervised.attribute.Remove" filter.
6. Next, click the text box immediately to the right of the Choose button. In the resulting
   dialog box, enter the index of the attribute to be filtered out.
7. Make sure that the invertSelection option is set to False, then click OK. The filter box now
   shows "Remove -R 7".
8. Click the Apply button to apply the filter to this data. This removes the attribute and
   creates a new working relation.
9. Save the new working relation as an ARFF file (student.arff) by clicking the Save button in
   the top panel.
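
The attribute-removal step above can also be scripted against the WEKA Java API. The following is
a minimal sketch (class name illustrative), assuming weka.jar is on the classpath and student.arff
is in the working directory; the attribute index is illustrative and should match the attribute
you want to drop.

    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;
    import weka.filters.Filter;
    import weka.filters.unsupervised.attribute.Remove;

    public class RemoveAttributeDemo {
        public static void main(String[] args) throws Exception {
            // Load the dataset, as done via the Open file button in the Preprocess tab.
            Instances data = DataSource.read("student.arff");

            // Configure the Remove filter; the index plays the role of the value
            // typed into the filter dialog (student.arff has five attributes).
            Remove remove = new Remove();
            remove.setAttributeIndices("2");
            remove.setInvertSelection(false);
            remove.setInputFormat(data);

            // Apply the filter to obtain the new working relation.
            Instances newData = Filter.useFilter(data, remove);
            System.out.println(newData.toSummaryString());
        }
    }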

DISCRETIZATION:

       Sometimes association rule mining can be performed only on categorical data. This requires
performing discretization on numeric or continuous attributes. In the following example, let us
discretize the age attribute.
 Let us divide the values of the age attribute into three bins (intervals).
 First load the dataset into WEKA (student.arff).
 Select the age attribute.
 Activate the filter dialog box and select "weka.filters.unsupervised.attribute.Discretize"
  from the list.
 To change the defaults for the filter, click on the box immediately to the right of the
  Choose button.
 Enter the index of the attribute to be discretized. In this case the attribute is age, so we
  enter '1', corresponding to the age attribute.
 Enter '3' as the number of bins. Leave the remaining field values as they are.
 Click the OK button.
 Click Apply in the filter panel. This results in a new working relation with the selected
  attribute partitioned into 3 bins.
 Save the new working relation in a file called student-data-discretized.arff.
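
As with Remove, the Discretize filter can be driven from the WEKA Java API. A minimal sketch
(class name illustrative), assuming a numeric age attribute at index 1; note that the student.arff
listed below already stores age as nominal ranges, so this applies to a numeric variant of the
data.

    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;
    import weka.filters.Filter;
    import weka.filters.unsupervised.attribute.Discretize;

    public class DiscretizeDemo {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("student.arff");

            // Equal-width binning of attribute 1 (age) into 3 bins,
            // mirroring the dialog settings described above.
            Discretize discretize = new Discretize();
            discretize.setAttributeIndices("1");
            discretize.setBins(3);
            discretize.setInputFormat(data);

            Instances newData = Filter.useFilter(data, discretize);
            System.out.println(newData);
        }
    }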

Dataset student.arff

@relation student

@attribute age {<30, 30-40, >40}
@attribute income {low, medium, high}
@attribute student {yes, no}
@attribute credit-rating {fair, excellent}
@attribute buyspc {yes, no}

@data
%
<30, high, no, fair, no
<30, high, no, excellent, no
30-40, high, no, fair, yes
>40, medium, no, fair, yes
>40, low, yes, fair, yes
>40, low, yes, excellent, no
30-40, low, yes, excellent, yes
<30, medium, no, fair, no
<30, low, yes, fair, no
>40, medium, yes, fair, yes
<30, medium, yes, excellent, yes
30-40, medium, no, excellent, yes
30-40, high, yes, fair, yes
>40, medium, no, excellent, no
%

The following screenshot shows the effect of discretization.

Ex. No: 02
PREPROCESSING ON DATASET LABOR.ARFF

AIM:
     This experiment illustrates some of the basic data pre-processing operations that can be
performed using the WEKA Explorer. The sample dataset used for this example is the labor data,
available in ARFF format.

PROCEDURE:

1. Loading the data: we can load the dataset into WEKA by clicking the Open file button in the
   Preprocess tab and selecting the appropriate file.
2. Once the data is loaded, WEKA recognizes the attributes and, during the scan of the data,
   computes some basic statistics on each attribute. The left panel shows the list of recognized
   attributes, while the top panel indicates the name of the base relation (table) and the current
   working relation (which are the same initially).
3. Clicking on an attribute in the left panel shows its basic statistics. For categorical
   attributes, the frequency of each attribute value is shown; for continuous attributes we can
   obtain the minimum, maximum, mean and standard deviation.
4. The visualization in the bottom-right panel appears in the form of a cross-tabulation across
   two attributes.
5. Selecting or filtering attributes:
   Removing an attribute - when we need to remove an attribute, we can do this using the
   attribute filters in WEKA. In the Filter panel, click the Choose button; this shows a popup
   window with a list of available filters.
   Scroll down the list and select the "weka.filters.unsupervised.attribute.Remove" filter.
6. Next, click the text box immediately to the right of the Choose button. In the resulting
   dialog box, enter the index of the attribute to be filtered out.
7. Make sure that the invertSelection option is set to False, then click OK. The filter box now
   shows "Remove -R 7".
8. Click the Apply button to apply the filter to this data. This removes the attribute and
   creates a new working relation.
9. Save the new working relation as an ARFF file (labor.arff) by clicking the Save button in the
   top panel.

DISCRETIZATION:

       Sometimes association rule mining can be performed only on categorical data. This requires
performing discretization on numeric or continuous attributes. In the following example, let us
discretize the age attribute.
 Let us divide the values of the age attribute into three bins (intervals).
 First load the dataset into WEKA (labor.arff).
 Select the age attribute.
 Activate the filter dialog box and select "weka.filters.unsupervised.attribute.Discretize"
  from the list.
 To change the defaults for the filter, click on the box immediately to the right of the
  Choose button.
 Enter the index of the attribute to be discretized. In this case the attribute is age, so we
  enter '1', corresponding to the age attribute.
 Enter '3' as the number of bins. Leave the remaining field values as they are.
 Click the OK button.
 Click Apply in the filter panel. This results in a new working relation with the selected
  attribute partitioned into 3 bins.
 Save the new working relation in a file called labor-data-discretized.arff.
Dataset labor.arff

The following screenshot shows the effect of discretization.

Ex. No: 03
ASSOCIATION RULE PROCESS ON DATASET
CONTACTLENSES.ARFF USING APRIORI ALGORITHM

AIM:
     This experiment illustrates some of the basic elements of association rule mining using
WEKA. The sample dataset used for this example is contactlenses.arff.

PROCEDURE:

1. Open the data file in the WEKA Explorer. It is presumed that the required data fields have
   been discretized; in this example it is the age attribute.
2. Clicking on the Associate tab brings up the interface for the association rule algorithms.
3. We will use the Apriori algorithm. This is the default algorithm.
4. In order to change the parameters for the run (e.g., support, confidence), we click on the
   text box immediately to the right of the Choose button.
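
For reference, the same run can be reproduced with the WEKA Java API. A minimal sketch (class name
illustrative), assuming the dataset file is named contactlenses.arff as above and the default
support and confidence settings are kept:

    import weka.associations.Apriori;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class AprioriDemo {
        public static void main(String[] args) throws Exception {
            // Load the (already categorical) dataset.
            Instances data = DataSource.read("contactlenses.arff");

            // Build association rules with the default Apriori parameters;
            // the options in the GUI dialog map to setters such as setNumRules.
            Apriori apriori = new Apriori();
            apriori.buildAssociations(data);
            System.out.println(apriori); // prints the best rules found
        }
    }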
Dataset contactlenses.arff

The following screenshot shows the association rules that were generated when the apriori
algorithm is applied on the given dataset.

Ex. No: 04
ASSOCIATION RULE PROCESS ON DATASET TEST.ARFF
USING APRIORI ALGORITHM

AIM:
     This experiment illustrates some of the basic elements of association rule mining using
WEKA. The sample dataset used for this example is test.arff.

PROCEDURE:

1. Open the data file in the WEKA Explorer. It is presumed that the required data fields have
   been discretized; in this example it is the age attribute.
2. Clicking on the Associate tab brings up the interface for the association rule algorithms.
3. We will use the Apriori algorithm. This is the default algorithm.
4. In order to change the parameters for the run (e.g., support, confidence), we click on the
   text box immediately to the right of the Choose button.

Dataset test.arff

@relation test

@attribute admissionyear {2005, 2006, 2007, 2008, 2009, 2010}
@attribute course {cse, mech, it, ece}

@data
2005, cse
2005, it
2005, cse
2006, mech
2006, it
2006, ece
2007, it
2007, cse
2008, it
2008, cse
2009, it
2009, ece

The following screenshot shows the association rules that were generated when the apriori
algorithm is applied on the given dataset.

Ex. No: 05
CLASSIFICATION RULE PROCESS ON DATASET
STUDENT.ARFF USING J48 ALGORITHM

AIM:
     This experiment illustrates the use of the J48 classifier in WEKA. The sample dataset used
in this experiment is the "student" data, available in ARFF format. This document assumes that
appropriate data pre-processing has been performed.

PROCEDURE:

1. We begin the experiment by loading the data (student.arff) into WEKA.
2. Next, we select the "Classify" tab and click the "Choose" button to select the "J48"
   classifier.
3. Now we specify the various parameters. These can be specified by clicking in the text box to
   the right of the Choose button. In this example, we accept the default values. The default
   version does perform some pruning but does not perform reduced-error pruning.
4. Under the "Test options" in the main panel, we select 10-fold cross-validation as our
   evaluation approach. Since we don't have a separate evaluation dataset, this is necessary to
   get a reasonable idea of the accuracy of the generated model.
5. We now click "Start" to generate the model. The ASCII version of the tree as well as the
   evaluation statistics will appear in the right panel when model construction is complete.
6. Note that the classification accuracy of the model is about 69%. This indicates that more
   work may be needed (either in pre-processing or in selecting different parameters for
   classification).
7. WEKA also lets us view a graphical version of the classification tree. This can be done by
   right-clicking the last result set and selecting "Visualize tree" from the pop-up menu.
8. We will use our model to classify new instances.
9. In the main panel, under "Test options", click the "Supplied test set" radio button and then
   click the "Set" button. This pops up a window which allows you to open the file containing
   the test instances.
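
The same J48 run with 10-fold cross-validation can be reproduced with the WEKA Java API. A minimal
sketch (class name illustrative), assuming weka.jar is on the classpath and the class attribute is
the last one (buyspc):

    import java.util.Random;

    import weka.classifiers.Evaluation;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class J48Demo {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("student.arff");
            data.setClassIndex(data.numAttributes() - 1); // buyspc is the class

            // Build the tree with default options (pruned, no reduced-error pruning).
            J48 tree = new J48();
            tree.buildClassifier(data);
            System.out.println(tree); // ASCII version of the tree

            // 10-fold cross-validation, as selected in the Test options panel.
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(tree, data, 10, new Random(1));
            System.out.println(eval.toSummaryString());
            System.out.println(eval.toMatrixString()); // confusion matrix
        }
    }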

Dataset student.arff

@relation student

@attribute age {<30, 30-40, >40}
@attribute income {low, medium, high}
@attribute student {yes, no}
@attribute credit-rating {fair, excellent}
@attribute buyspc {yes, no}

@data
<30, high, no, fair, no
<30, high, no, excellent, no
30-40, high, no, fair, yes
>40, medium, no, fair, yes
>40, low, yes, fair, yes
>40, low, yes, excellent, no
30-40, low, yes, excellent, yes
<30, medium, no, fair, no
<30, low, yes, fair, no
>40, medium, yes, fair, yes
<30, medium, yes, excellent, yes
30-40, medium, no, excellent, yes
30-40, high, yes, fair, yes
>40, medium, no, excellent, no

The following screenshot shows the classification rules that were generated when the j48
algorithm is applied on the given dataset.

Ex. No: 06
CLASSIFICATION RULE PROCESS ON DATASET
EMPLOYEE.ARFF USING J48 ALGORITHM

AIM:
     This experiment illustrates the use of the J48 classifier in WEKA. The sample dataset used
in this experiment is the "employee" data, available in ARFF format. This document assumes that
appropriate data pre-processing has been performed.

PROCEDURE:

1. We begin the experiment by loading the data (employee.arff) into WEKA.
2. Next, we select the "Classify" tab and click the "Choose" button to select the "J48"
   classifier.
3. Now we specify the various parameters. These can be specified by clicking in the text box to
   the right of the Choose button. In this example, we accept the default values. The default
   version does perform some pruning but does not perform reduced-error pruning.
4. Under the "Test options" in the main panel, we select 10-fold cross-validation as our
   evaluation approach. Since we don't have a separate evaluation dataset, this is necessary to
   get a reasonable idea of the accuracy of the generated model.
5. We now click "Start" to generate the model. The ASCII version of the tree as well as the
   evaluation statistics will appear in the right panel when model construction is complete.
6. Note that the classification accuracy of the model is about 69%. This indicates that more
   work may be needed (either in pre-processing or in selecting different parameters for
   classification).
7. WEKA also lets us view a graphical version of the classification tree. This can be done by
   right-clicking the last result set and selecting "Visualize tree" from the pop-up menu.
8. We will use our model to classify new instances.
9. In the main panel, under "Test options", click the "Supplied test set" radio button and then
   click the "Set" button. This pops up a window which allows you to open the file containing
   the test instances.

Data set employee.arff:


@relation employee
@attribute age {25, 27, 28, 29, 30, 35, 48}
@attribute salary {10k, 15k, 17k, 20k, 25k, 30k, 35k, 32k, 34k}
@attribute performance {good, avg, poor}
@data
%
25, 10k, poor
27, 15k, poor
27, 17k, poor
28, 17k, poor
29, 20k, avg
30, 25k, avg
29, 25k, avg
30, 20k, avg
35, 32k, good
48, 34k, good
48, 32k, good
%

The following screenshot shows the classification rules that were generated when the j48
algorithm is applied on the given dataset.

Ex. No: 07
CLASSIFICATION RULE PROCESS ON DATASET
EMPLOYEE.ARFF USING ID3 ALGORITHM

AIM:
     This experiment illustrates the use of the ID3 classifier in WEKA. The sample dataset used
in this experiment is the "employee" data, available in ARFF format. This document assumes that
appropriate data pre-processing has been performed.

PROCEDURE:

1. We begin the experiment by loading the data (employee.arff) into WEKA.
2. Next, we select the "Classify" tab and click the "Choose" button to select the "Id3"
   classifier.
3. Now we specify the various parameters. These can be specified by clicking in the text box to
   the right of the Choose button. In this example, we accept the default values (note that ID3
   builds an unpruned tree and handles only nominal attributes).
4. Under the "Test options" in the main panel, we select 10-fold cross-validation as our
   evaluation approach. Since we don't have a separate evaluation dataset, this is necessary to
   get a reasonable idea of the accuracy of the generated model.
5. We now click "Start" to generate the model. The ASCII version of the tree as well as the
   evaluation statistics will appear in the right panel when model construction is complete.
6. Note that the classification accuracy of the model is about 69%. This indicates that more
   work may be needed (either in pre-processing or in selecting different parameters for
   classification).
7. WEKA also lets us view a graphical version of the classification tree. This can be done by
   right-clicking the last result set and selecting "Visualize tree" from the pop-up menu.
8. We will use our model to classify new instances.
9. In the main panel, under "Test options", click the "Supplied test set" radio button and then
   click the "Set" button. This will show a pop-up window which allows you to open the file
   containing the test instances.
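
A corresponding sketch for ID3 with the WEKA Java API (class name illustrative; note that in
recent WEKA releases Id3 ships in the optional simpleEducationalLearningSchemes package rather
than the core distribution):

    import java.util.Random;

    import weka.classifiers.Evaluation;
    import weka.classifiers.trees.Id3;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class Id3Demo {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("employee.arff");
            data.setClassIndex(data.numAttributes() - 1); // performance is the class

            // ID3 builds an unpruned tree over nominal attributes.
            Id3 id3 = new Id3();
            id3.buildClassifier(data);
            System.out.println(id3);

            // 10-fold cross-validation, as in the Test options panel.
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(id3, data, 10, new Random(1));
            System.out.println(eval.toSummaryString());
        }
    }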

Data set employee.arff:


@relation employee
@attribute age {25, 27, 28, 29, 30, 35, 48}
@attribute salary {10k, 15k, 17k, 20k, 25k, 30k, 35k, 32k, 34k}
@attribute performance {good, avg, poor}
@data
%
25, 10k, poor
27, 15k, poor
27, 17k, poor
28, 17k, poor
29, 20k, avg
30, 25k, avg
29, 25k, avg
30, 20k, avg
35, 32k, good
48, 34k, good
48, 32k, good
%
The following screenshot shows the classification rules that were generated when the id3
algorithm is applied on the given dataset.

Ex. No: 08
CLASSIFICATION RULE PROCESS ON DATASET
EMPLOYEE.ARFF USING NAIVE BAYES ALGORITHM

AIM:
     This experiment illustrates the use of the Naive Bayes classifier in WEKA. The sample
dataset used in this experiment is the "employee" data, available in ARFF format. This document
assumes that appropriate data pre-processing has been performed.

PROCEDURE:

1. We begin the experiment by loading the data (employee.arff) into WEKA.
2. Next, we select the "Classify" tab and click the "Choose" button to select the "NaiveBayes"
   classifier.
3. Now we specify the various parameters. These can be specified by clicking in the text box to
   the right of the Choose button. In this example, we accept the default values.
4. Under the "Test options" in the main panel, we select 10-fold cross-validation as our
   evaluation approach. Since we don't have a separate evaluation dataset, this is necessary to
   get a reasonable idea of the accuracy of the generated model.
5. We now click "Start" to generate the model. The model description as well as the evaluation
   statistics will appear in the right panel when model construction is complete.
6. Note that the classification accuracy of the model is about 69%. This indicates that more
   work may be needed (either in pre-processing or in selecting different parameters for
   classification).
7. WEKA also lets us explore the results graphically. This can be done by right-clicking the
   last result set and choosing a visualization from the pop-up menu.
8. We will use our model to classify new instances.
9. In the main panel, under "Test options", click the "Supplied test set" radio button and then
   click the "Set" button. This will show a pop-up window which allows you to open the file
   containing the test instances.
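
A corresponding sketch for Naive Bayes with the WEKA Java API (class name illustrative), assuming
the same employee.arff and class attribute:

    import java.util.Random;

    import weka.classifiers.Evaluation;
    import weka.classifiers.bayes.NaiveBayes;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class NaiveBayesDemo {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("employee.arff");
            data.setClassIndex(data.numAttributes() - 1); // performance is the class

            NaiveBayes nb = new NaiveBayes();
            nb.buildClassifier(data);
            System.out.println(nb); // per-class conditional probability tables

            // 10-fold cross-validation, as in the Test options panel.
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(nb, data, 10, new Random(1));
            System.out.println(eval.toSummaryString());
        }
    }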

Data set employee.arff:


@relation employee
@attribute age {25, 27, 28, 29, 30, 35, 48}
@attribute salary {10k, 15k, 17k, 20k, 25k, 30k, 35k, 32k, 34k}
@attribute performance {good, avg, poor}
@data
%
25, 10k, poor
27, 15k, poor
27, 17k, poor
28, 17k, poor
29, 20k, avg
30, 25k, avg
29, 25k, avg
30, 20k, avg
35, 32k, good
48, 34k, good
48, 32k, good
%
The following screenshot shows the classification rules that were generated when the naive bayes
algorithm is applied on the given dataset.

Ex. No: 09
CLUSTERING RULE PROCESS ON DATASET IRIS.ARFF
USING SIMPLE K-MEANS

AIM:
     This experiment illustrates the use of simple k-means clustering with the WEKA Explorer.
The sample dataset used for this example is based on the iris data, available in ARFF format.
This document assumes that appropriate pre-processing has been performed. The iris dataset
includes 150 instances.

PROCEDURE:

1. Run the WEKA Explorer and load the data file iris.arff in the pre-processing interface.
2. In order to perform clustering, select the "Cluster" tab in the Explorer and click on the
   Choose button. This step results in a dropdown list of available clustering algorithms.
3. In this case we select "SimpleKMeans".
4. Next, click in the text box to the right of the Choose button to get the popup window shown
   in the screenshots. In this window we enter 6 as the number of clusters and we leave the seed
   value as it is. The seed value is used in generating random numbers, which are used for
   making the initial assignment of instances to clusters.
5. Once the options have been specified, we run the clustering algorithm. In the "Cluster mode"
   panel we make sure that the "Use training set" option is selected, and then we click the
   "Start" button. This process and the resulting window are shown in the following screenshots.
6. The result window shows the centroid of each cluster as well as statistics on the number and
   the percentage of instances assigned to the different clusters. Each cluster centroid is the
   mean vector of that cluster and can be used to characterize it. For example, the centroid of
   cluster 1 (class Iris-versicolor) shows a mean sepal length of 5.4706, sepal width of 2.4765,
   petal length of 3.7941 and petal width of 1.1294.
7. Another way of understanding the characteristics of each cluster is through visualization: we
   can do this by right-clicking the result set in the result list panel and selecting
   "Visualize cluster assignments".
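
The clustering run can likewise be scripted with the WEKA Java API. A minimal sketch (class name
illustrative), using 6 clusters as in the procedure; the class attribute is removed first, since
clustering is unsupervised:

    import weka.clusterers.ClusterEvaluation;
    import weka.clusterers.SimpleKMeans;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;
    import weka.filters.Filter;
    import weka.filters.unsupervised.attribute.Remove;

    public class KMeansDemo {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("iris.arff");

            // Drop the class attribute (last), since clustering is unsupervised.
            Remove remove = new Remove();
            remove.setAttributeIndices("last");
            remove.setInputFormat(data);
            Instances noClass = Filter.useFilter(data, remove);

            // 6 clusters with the default seed, as in the dialog settings above.
            SimpleKMeans kmeans = new SimpleKMeans();
            kmeans.setNumClusters(6);
            kmeans.buildClusterer(noClass);
            System.out.println(kmeans); // centroids and cluster sizes

            // Evaluate on the training data, matching "Use training set" mode.
            ClusterEvaluation eval = new ClusterEvaluation();
            eval.setClusterer(kmeans);
            eval.evaluateClusterer(noClass);
            System.out.println(eval.clusterResultsToString());
        }
    }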

The following screenshot shows the clustering results that were generated when the simple k-means
algorithm is applied on the given dataset.

Interpretation of the above visualization

From the above visualization, we can understand the distribution of sepal length and petal
length in each cluster. For instance, each cluster here is dominated by petal length. By
changing the color dimension to other attributes, we can see their distribution within each of
the clusters.

8. We can also obtain the resulting dataset, which includes each instance along with its
   assigned cluster. To do so, we click the Save button in the visualization window and save the
   result as iris k-means. The top portion of this file is shown in the following figure.

Ex. No: 10
CLUSTERING RULE PROCESS ON DATASET STUDENT.ARFF
USING SIMPLE K-MEANS

AIM:
     This experiment illustrates the use of simple k-means clustering with the WEKA Explorer.
The sample dataset used for this example is based on the student data, available in ARFF format.
This document assumes that appropriate pre-processing has been performed. The student dataset
includes 14 instances.

PROCEDURE:

1. Run the WEKA Explorer and load the data file student.arff in the pre-processing interface.
2. In order to perform clustering, select the "Cluster" tab in the Explorer and click on the
   Choose button. This step results in a dropdown list of available clustering algorithms.
3. In this case we select "SimpleKMeans".
4. Next, click in the text box to the right of the Choose button to get the popup window shown
   in the screenshots. In this window we enter 6 as the number of clusters and we leave the seed
   value as it is. The seed value is used in generating random numbers, which are used for
   making the initial assignment of instances to clusters.
5. Once the options have been specified, we run the clustering algorithm. In the "Cluster mode"
   panel we make sure that the "Use training set" option is selected, and then we click the
   "Start" button. This process and the resulting window are shown in the following screenshots.
6. The result window shows the centroid of each cluster as well as statistics on the number and
   the percentage of instances assigned to the different clusters. Each cluster centroid is the
   mean vector of that cluster and can be used to characterize it.
7. Another way of understanding the characteristics of each cluster is through visualization: we
   can do this by right-clicking the result set in the result list panel and selecting
   "Visualize cluster assignments".

Interpretation of the above visualization

From the above visualization, we can understand the distribution of age and instance number in
each cluster. For instance, each cluster here is dominated by age. By changing the color
dimension to other attributes, we can see their distribution within each of the clusters.

8. We can also obtain the resulting dataset, which includes each instance along with its
   assigned cluster. To do so, we click the Save button in the visualization window and save the
   result as student k-means. The top portion of this file is shown in the following figure.

Dataset student.arff

@relation student

@attribute age {<30, 30-40, >40}
@attribute income {low, medium, high}
@attribute student {yes, no}
@attribute credit-rating {fair, excellent}
@attribute buyspc {yes, no}

@data
<30, high, no, fair, no
<30, high, no, excellent, no
30-40, high, no, fair, yes
>40, medium, no, fair, yes
>40, low, yes, fair, yes
>40, low, yes, excellent, no
30-40, low, yes, excellent, yes
<30, medium, no, fair, no
<30, low, yes, fair, no
>40, medium, yes, fair, yes
<30, medium, yes, excellent, yes
30-40, medium, no, excellent, yes
30-40, high, yes, fair, yes
>40, medium, no, excellent, no

The following screenshot shows the clustering results that were generated when the simple k-means
algorithm is applied on the given dataset.

Ex. No: 11
PREDICTION ANALYSIS MODEL ON STUDENT MARKS

AIM:
     To perform predictive analysis on five-subject student marks, to check whether the given
marks result in Pass or Fail.

PROCEDURE:

1. Go to Start and open Excel 2019 to create a dataset for student marks.
2. Prepare an Excel sheet with the attributes Mark1, Mark2, Mark3, Mark4, Mark5 and Result.
3. Create two different Excel sheets, one for the training set and one for the test set.
4. The entire prediction analysis model is built from the records entered in the training set.
5. Unlike the training set, the test set does not carry values for the Result attribute; each
   value is replaced by a question mark.
6. After creating the sheets, save both in CSV format as "Student Training Set" and "Student
   Test Set".
7. Run the WEKA Explorer and load the data file "Student Training Set" in the pre-processing
   interface.
8. Next, we select the "Classify" tab and click the "Choose" button to select the "J48" tree
   classifier.
9. Then select the "Use training set" radio button in the Test options panel.
10. We now click "Start" to generate the model. The run information, evaluation statistics and
    the confusion matrix appear in the "Classifier output" panel on the right when model
    construction is complete.
11. Now we select the "Supplied test set" radio button in the Test options panel and click the
    "Set…" button. This pops up a window which allows you to open the file containing the test
    instances.
12. Then select the "Student Test Set" file as the test instances and click the "Close" button.
13. Again click "Start" to generate the model; after model construction is completed, right-click
    the entry in the "Result list" and select "Visualize classifier errors".
14. Now save the result as "Student Test Set Result.arff"; in this predicted dataset the question
    marks are replaced by the predicted values.
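
The train-then-label workflow above can also be reproduced with the WEKA Java API. A minimal
sketch (class name and file names illustrative): the GUI workflow loads the CSV exports, but for
simplicity this sketch assumes ARFF versions of the two sets, as listed below, whose headers
declare the Pass/Fail labels even though all test values are '?'.

    import java.io.File;

    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ArffSaver;
    import weka.core.converters.ConverterUtils.DataSource;

    public class StudentPredictionDemo {
        public static void main(String[] args) throws Exception {
            // Load the training and test sets (ARFF versions as listed in this record).
            Instances train = DataSource.read("Student Training Set.arff");
            train.setClassIndex(train.numAttributes() - 1); // Result

            Instances test = DataSource.read("Student Test Set.arff");
            test.setClassIndex(test.numAttributes() - 1);

            // Build the J48 model on the training set.
            J48 tree = new J48();
            tree.buildClassifier(train);

            // Replace each '?' class value with the predicted label.
            for (int i = 0; i < test.numInstances(); i++) {
                double label = tree.classifyInstance(test.instance(i));
                test.instance(i).setClassValue(label);
            }

            // Save the labelled test set, analogous to "Student Test Set Result.arff".
            ArffSaver saver = new ArffSaver();
            saver.setInstances(test);
            saver.setFile(new File("Student Test Set Result.arff"));
            saver.writeBatch();
        }
    }

The same pattern applies unchanged to the bank account, attendance, apple and iris experiments
that follow; only the file names and the class attribute differ.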

Dataset for Student Marks:

Student Training Set

@relation 'Student Training Set'


@attribute Mark1 numeric
@attribute Mark2 numeric
@attribute Mark3 numeric
@attribute Mark4 numeric
@attribute Mark5 numeric
@attribute Result {Pass, Fail}

@data
90,99,100,97,98,Pass
88,89,90,87,70,Pass
60,65,80,80,48,Pass
78,92,78,90,85,Pass
78,95,64,79,52,Pass
80,89,89,97,73,Pass
88,89,40,87,70,Pass
60,65,80,70,83,Pass
86,83,78,90,45,Pass
78,40,64,79,84,Pass
70,78,80,72,73,Pass
79,69,82,79,90,Pass
60,84,80,76,83,Pass
86,78,78,90,90,Pass
90,71,64,79,84,Pass
60,78,69,72,73,Pass
67,69,82,79,90,Pass
60,84,80,76,65,Pass
86,68,78,60,90,Pass
69,71,64,69,84,Pass

40,78,69,57,73,Pass
67,57,82,79,40,Pass
60,58,40,76,65,Pass
40,68,78,60,90,Pass
69,40,56,69,48,Pass
39,70,54,58,64,Fail
48,39,76,49,40,Fail
56,67,30,67,43,Fail
40,61,59,48,30,Fail
48,68,89,72,56,Fail
39,28,75,64,89,Fail
38,30,40,68,47,Fail
27,18,47,73,47,Fail
36,29,84,94,58,Fail
13,39,57,40,42,Fail
17,39,26,40,43,Fail
39,12,38,70,68,Fail
19,39,10,83,65,Fail
38,15,27,68,64,Fail
35,17,39,79,40,Fail
28,39,17,16,50,Fail
37,28,19,39,79,Fail
28,35,38,14,68,Fail
37,18,19,39,90,Fail
37,27,14,37,54,Fail
28,24,16,30,19,Fail
39,25,13,38,10,Fail
17,13,39,34,25,Fail
36,39,26,20,37,Fail
28,19,27,19,15,Fail

Student Test Set

@relation 'Student Test Set'

@attribute Mark1 numeric


@attribute Mark2 numeric
@attribute Mark3 numeric
@attribute Mark4 numeric
@attribute Mark5 numeric
@attribute Result {Pass, Fail}

@data
90,99,100,97,98,?
80,89,89,97,73,?
70,78,80,72,73,?
60,78,69,72,73,?
40,78,69,57,73,?
39,70,54,58,64,?
39,28,75,64,89,?
17,39,26,40,43,?
28,39,17,16,50,?
39,25,13,38,10,?

The following screenshot shows the classification rules that were generated when the j48
algorithm is applied on the given dataset.

Student Test Set Result

@relation 'Student Test Set_predicted'


@attribute Mark1 numeric
@attribute Mark2 numeric
@attribute Mark3 numeric
@attribute Mark4 numeric
@attribute Mark5 numeric
@attribute 'prediction margin' numeric
@attribute 'predicted Result' {Pass, Fail}
@attribute Result {Pass, Fail}
@data
90,99,100,97,98,1,Pass,?
80,89,89,97,73,1,Pass,?
70,78,80,72,73,1,Pass,?
60,78,69,72,73,1,Pass,?
40,78,69,57,73,1,Pass,?
39,70,54,58,64,-1,Fail,?
39,28,75,64,89,-1,Fail,?
17,39,26,40,43,-1,Fail,?
28,39,17,16,50,-1,Fail,?
39,25,13,38,10,-1,Fail,?

Ex. No: 12
PREDICTIVE ANALYSIS MODEL FOR OPENING BANK ACCOUNT

AIM:
     To perform a predictive analysis model for opening a bank account in a particular region.

PROCEDURE:

1. Go to Start and open Excel 2019 to create a dataset for bank accounts.
2. Prepare an Excel sheet with the attributes Job, Income, Age, Location and Approval.
3. Create two different Excel sheets, one for the training set and one for the test set.
4. The entire prediction analysis model is built from the records entered in the training set.
5. Unlike the training set, the test set does not carry values for the Approval attribute; each
   value is replaced by a question mark.
6. After creating the sheets, save both in CSV format as "Bank Training Set" and "Bank Test
   Set".
7. Run the WEKA Explorer and load the data file "Bank Training Set" in the pre-processing
   interface.
8. Next, we select the "Classify" tab and click the "Choose" button to select the "J48" tree
   classifier.
9. Then select the "Use training set" radio button in the Test options panel.
10. We now click "Start" to generate the model. The run information, evaluation statistics and
    the confusion matrix appear in the "Classifier output" panel on the right when model
    construction is complete.
11. Now we select the "Supplied test set" radio button in the Test options panel and click the
    "Set…" button. This pops up a window which allows you to open the file containing the test
    instances.
12. Then select the "Bank Test Set" file as the test instances and click the "Close" button.
13. Again click "Start" to generate the model; after model construction is completed, right-click
    the entry in the "Result list" and select "Visualize classifier errors".
14. Now save the result as "Bank Test Set Result.arff"; in this predicted dataset the question
    marks are replaced by the predicted values.

Dataset for Bank Account:

Bank Training Set

@relation 'Bank Training Set'

@attribute Job {Private,Governtment,'Daily Wage',Bussniess}


@attribute Income numeric
@attribute Age numeric
@attribute Location {Vellore}
@attribute Approval {Yes,No}

@data
Private,9000,18,Vellore,Yes
Private,15000,23,Vellore,Yes
Private,20000,20,Vellore,Yes
Private,30000,35,Vellore,Yes
Private,45000,40,Vellore,Yes
Private,10000,45,Vellore,Yes
Private,50000,46,Vellore,Yes
Private,5000,23,Vellore,No
Private,8000,18,Vellore,No
Private,7500,33,Vellore,No
Governtment,20000,36,Vellore,Yes
Governtment,35000,28,Vellore,Yes
Governtment,12000,18,Vellore,Yes
Governtment,8000,20,Vellore,No
Governtment,45000,30,Vellore,Yes
Governtment,8500,45,Vellore,Yes
Governtment,45000,46,Vellore,Yes
Governtment,7000,23,Vellore,No
Governtment,50000,33,Vellore,Yes
Governtment,75000,45,Vellore,Yes

'Daily Wage',13000,20,Vellore,Yes
'Daily Wage',5000,18,Vellore,No
'Daily Wage',15000,20,Vellore,Yes
'Daily Wage',20000,24,Vellore,Yes
'Daily Wage',7500,33,Vellore,No
'Daily Wage',8000,45,Vellore,No
'Daily Wage',18000,46,Vellore,Yes
'Daily Wage',9000,25,Vellore,Yes
'Daily Wage',20000,35,Vellore,Yes
'Daily Wage',12000,45,Vellore,Yes
Bussniess,8000,24,Vellore,No
Bussniess,20000,20,Vellore,Yes
Bussniess,30000,18,Vellore,Yes
Bussniess,25000,48,Vellore,Yes
Bussniess,75000,38,Vellore,Yes
Bussniess,25000,40,Vellore,Yes
Bussniess,15000,23,Vellore,Yes
Bussniess,80000,33,Vellore,Yes
Bussniess,90000,40,Vellore,Yes
Bussniess,100000,28,Vellore,Yes

Bank Test Set

@relation 'Bank Test Set'

@attribute Job {Private,Governtment,'Daily Wage',Bussniess}


@attribute Income numeric
@attribute Age numeric
@attribute Location {Vellore}
@attribute Approval {Yes,No}

@data
Private,9000,18,Vellore,?
Private,15000,23,Vellore,?
Private,20000,20,Vellore,?
Private,30000,35,Vellore,?
Private,45000,40,Vellore,?
Governtment,8500,45,Vellore,?
Governtment,45000,46,Vellore,?
Governtment,7000,23,Vellore,?
Governtment,50000,33,Vellore,?
Governtment,75000,45,Vellore,?
'Daily Wage',8000,45,Vellore,?
'Daily Wage',18000,46,Vellore,?
'Daily Wage',9000,25,Vellore,?
'Daily Wage',20000,35,Vellore,?
'Daily Wage',12000,45,Vellore,?
Bussniess,25000,40,Vellore,?
Bussniess,15000,23,Vellore,?
Bussniess,80000,33,Vellore,?
Bussniess,90000,40,Vellore,?
Bussniess,100000,28,Vellore,?

The following screenshot shows the classification rules that were generated when the j48
algorithm is applied on the given dataset.

Bank Test Set Result

@relation 'Bank Test Set_predicted'

@attribute Job {Private,Governtment,'Daily Wage',Bussniess}


@attribute Income numeric
@attribute Age numeric
@attribute Location {Vellore}
@attribute 'prediction margin' numeric
@attribute 'predicted Approval' {Yes,No}
@attribute Approval {Yes,No}

@data
Private,9000,18,Vellore,1,Yes,?
Private,15000,23,Vellore,1,Yes,?
Private,20000,20,Vellore,1,Yes,?
Private,30000,35,Vellore,1,Yes,?
Private,45000,40,Vellore,1,Yes,?
Governtment,8500,45,Vellore,1,Yes,?
Governtment,45000,46,Vellore,1,Yes,?
Governtment,7000,23,Vellore,-1,No,?
Governtment,50000,33,Vellore,1,Yes,?
Governtment,75000,45,Vellore,1,Yes,?
'Daily Wage',8000,45,Vellore,-1,No,?
'Daily Wage',18000,46,Vellore,1,Yes,?
'Daily Wage',9000,25,Vellore,1,Yes,?
'Daily Wage',20000,35,Vellore,1,Yes,?
'Daily Wage',12000,45,Vellore,1,Yes,?
Bussniess,25000,40,Vellore,1,Yes,?
Bussniess,15000,23,Vellore,1,Yes,?
Bussniess,80000,33,Vellore,1,Yes,?
Bussniess,90000,40,Vellore,1,Yes,?
Bussniess,100000,28,Vellore,1,Yes,?
Ex. No: 13
PREDICTIVE ANALYSIS ON ATTENDANCE PERCENTAGE

AIM:
     To perform predictive analysis on attendance percentage, to decide whether students are
allowed to appear for the semester exam.

PROCEDURE:

1. Go to Start and open Excel 2019 to create a dataset for attendance.
2. Prepare an Excel sheet with the attributes Attendance Percentage, Condonation Fees, Composite
   of Attendance, Redo Candidate, Married Girl and Eligibility.
3. Create two different Excel sheets, one for the training set and one for the test set.
4. The entire prediction analysis model is built from the records entered in the training set.
5. Unlike the training set, the test set does not carry values for the Eligibility attribute;
   each value is replaced by a question mark.
6. After creating the sheets, save both in CSV format as "Attendance Training Set" and
   "Attendance Test Set".
7. Run the WEKA Explorer and load the data file "Attendance Training Set" in the pre-processing
   interface.
8. Next, we select the "Classify" tab and click the "Choose" button to select the "J48" tree
   classifier.
9. Then select the "Use training set" radio button in the Test options panel.
10. We now click "Start" to generate the model. The run information, evaluation statistics and
    the confusion matrix appear in the "Classifier output" panel on the right when model
    construction is complete.
11. Now we select the "Supplied test set" radio button in the Test options panel and click the
    "Set…" button. This pops up a window which allows you to open the file containing the test
    instances.
12. Then select the "Attendance Test Set" file as the test instances and click the "Close"
    button.
13. Again click "Start" to generate the model; after model construction is completed, right-click
    the entry in the "Result list" and select "Visualize classifier errors".
14. Now save the result as "Attendance Test Set Result.arff"; in this predicted dataset the
    question marks are replaced by the predicted values.

Dataset for Attendance:

Attendance Training Set

@relation 'Attendance Training Set'

@attribute 'Attendance Percentage' numeric


@attribute 'Condonation Fees' {No,Yes}
@attribute 'Composite of Attendance' {No,Yes}
@attribute 'Redo Candidate' {No}
@attribute 'Married Girl' {Yes,No}
@attribute Eligibility {No,Yes}

@data
30,No,No,No,Yes,No
35,No,No,No,Yes,No
40,No,No,No,Yes,No
45,No,No,No,Yes,No
50,No,No,No,Yes,No
54,No,No,No,Yes,No
30,No,No,No,No,No
35,No,No,No,No,No
40,No,No,No,No,No
45,No,No,No,No,No
50,No,No,No,No,No
54,No,No,No,No,No
55,No,Yes,No,No,Yes
60,No,Yes,No,No,Yes
64,No,Yes,No,No,Yes
55,No,No,No,No,No
60,No,No,No,No,No
64,No,No,No,No,No
55,No,No,No,Yes,Yes
60,No,No,No,Yes,Yes

64,No,No,No,Yes,Yes
65,Yes,No,No,No,Yes
70,Yes,No,No,No,Yes
74,Yes,No,No,No,Yes
65,No,No,No,No,No
70,No,No,No,No,No
74,No,No,No,No,No
65,No,No,No,Yes,Yes
70,No,No,No,Yes,Yes
74,No,No,No,Yes,Yes
75,No,No,No,Yes,Yes
80,No,No,No,Yes,Yes
85,No,No,No,Yes,Yes
90,No,No,No,Yes,Yes
95,No,No,No,Yes,Yes
100,No,No,No,Yes,Yes
75,No,No,No,No,Yes
80,No,No,No,No,Yes
85,No,No,No,No,Yes
90,No,No,No,No,Yes
95,No,No,No,No,Yes
100,No,No,No,No,Yes

Attendance Test Set

@relation 'Attendance Test Set'

@attribute 'Attendance Percentage' numeric


@attribute 'Condonation Fees' {No,Yes}
@attribute 'Composite of Attendance' {No,Yes}
@attribute 'Redo Candidate' {No}
@attribute 'Married Girl' {Yes,No}
@attribute Eligibility {No,Yes}

@data
30,No,No,No,Yes,?
54,No,No,No,No,?
55,No,Yes,No,No,?
60,No,No,No,Yes,?
64,No,No,No,Yes,?
65,Yes,No,No,No,?
74,Yes,No,No,No,?
90,No,No,No,Yes,?
75,No,No,No,No,?

The following screenshot shows the classification rules that were generated when the j48
algorithm is applied on the given dataset.

Attendance Test Set Result

@relation 'Attendance Test Set_predicted'

@attribute 'Attendance Percentage' numeric


@attribute 'Condonation Fees' {No,Yes}
@attribute 'Composite of Attendance' {No,Yes}
@attribute 'Redo Candidate' {No}
@attribute 'Married Girl' {Yes,No}
@attribute 'prediction margin' numeric
@attribute 'predicted Eligibility' {No,Yes}
@attribute Eligibility {No,Yes}

@data
30,No,No,No,Yes,1,No,?
54,No,No,No,No,1,No,?
55,No,Yes,No,No,-1,Yes,?
60,No,No,No,Yes,-1,Yes,?
64,No,No,No,Yes,-1,Yes,?
65,Yes,No,No,No,-1,Yes,?
74,Yes,No,No,No,-1,Yes,?
90,No,No,No,Yes,-1,Yes,?
75,No,No,No,No,-1,Yes,?

Ex. No: 14
PREDICTIVE ANALYSIS TO CHECK WHETHER THE FRUIT
IS AN APPLE OR NOT

AIM:
     To prepare a predictive analysis model on the given attributes, to identify whether the
fruit is an apple or not.

PROCEDURE:

1. Go to Start and open Excel 2019 to create a dataset for apples.
2. Prepare an Excel sheet with the attributes Diameter, Mass, Volume, Surface Area, Color and
   Fruit.
3. Create two different Excel sheets, one for the training set and one for the test set.
4. The entire prediction analysis model is built from the records entered in the training set.
5. Unlike the training set, the test set does not carry values for the Fruit attribute; each
   value is replaced by a question mark.
6. After creating the sheets, save both in CSV format as "Apple Training Set" and "Apple Test
   Set".
7. Run the WEKA Explorer and load the data file "Apple Training Set" in the pre-processing
   interface.
8. Next, we select the "Classify" tab and click the "Choose" button to select the "J48" tree
   classifier.
9. Then select the "Use training set" radio button in the Test options panel.
10. We now click "Start" to generate the model. The run information, evaluation statistics and
    the confusion matrix appear in the "Classifier output" panel on the right when model
    construction is complete.
11. Now we select the "Supplied test set" radio button in the Test options panel and click the
    "Set…" button. This pops up a window which allows you to open the file containing the test
    instances.
12. Then select the "Apple Test Set" file as the test instances and click the "Close" button.
13. Again click "Start" to generate the model; after model construction is completed, right-click
    the entry in the "Result list" and select "Visualize classifier errors".
14. Now save the result as "Apple Test Set Result.arff"; in this predicted dataset the question
    marks are replaced by the predicted values.

Dataset for Apple:

Apple Training Set

@relation 'Apple Training Set'

@attribute Diameter numeric


@attribute Mass numeric
@attribute Volume numeric
@attribute 'Surface Area' numeric
@attribute Color {Red,Yellow,Green}
@attribute Fruit {Yes,No}

@data
34.5,150,180,3737.385,Red,Yes
35,155,181,3846.5,Yellow,Yes
35.5,160,182,3957.185,Green,Yes
36,165,183,4069.44,Red,Yes
36.5,170,184,4183.265,Yellow,Yes
37,175,185,4298.66,Green,Yes
37.5,180,186,4415.625,Red,Yes
38,185,187,4534.16,Yellow,Yes
38.5,190,188,4654.265,Green,Yes
39,195,189,4775.94,Red,Yes
39.5,200,190,4899.185,Yellow,Yes
40,205,180,5024,Green,Yes
40.5,210,181,5150.385,Red,Yes
41,215,182,5278.34,Yellow,Yes
41.5,220,183,5407.865,Green,Yes
42,225,184,5538.96,Red,Yes
42.5,230,185,5671.625,Yellow,Yes
43,235,186,5805.86,Green,Yes
43.5,240,187,5941.665,Red,Yes

44,245,188,6079.04,Yellow,Yes
44.5,250,189,6217.985,Green,Yes
45,150,190,6358.5,Red,Yes
45.5,155,180,6500.585,Yellow,Yes
46,160,181,6644.24,Green,Yes
46.5,165,182,6789.465,Red,Yes
47,170,183,6936.26,Yellow,Yes
47.5,175,184,7084.625,Green,Yes
48,180,185,7234.56,Red,Yes
48.5,185,186,7386.065,Yellow,Yes
49,190,187,7539.14,Green,Yes
49.5,195,188,7693.785,Red,Yes
50,200,189,7850,Yellow,Yes
50.5,205,190,8007.785,Green,Yes
51,210,180,8167.14,Red,Yes
51.5,215,181,8328.065,Yellow,Yes
52,220,182,8490.56,Green,Yes
52.5,225,183,8654.625,Red,Yes
53,230,184,8820.26,Yellow,Yes
53.5,235,185,8987.465,Green,Yes
54,240,186,9156.24,Red,Yes
54.5,245,187,9326.585,Yellow,Yes
55,250,188,9498.5,Green,Yes
55.5,150,189,9671.985,Red,Yes
56,155,190,9847.04,Yellow,Yes
56.5,160,180,10023.665,Green,Yes
57,165,181,10201.86,Red,Yes
57.5,170,182,10381.625,Yellow,Yes
58,175,183,10562.96,Green,Yes
58.5,180,184,10745.865,Red,Yes

59,185,185,10930.34,Yellow,Yes
59.5,190,186,11116.385,Green,Yes
60,195,187,11304,Red,Yes
60.5,200,188,11493.185,Yellow,Yes
61,205,189,11683.94,Green,Yes
61.5,210,190,11876.265,Red,Yes
62,215,180,12070.16,Yellow,Yes
62.5,220,181,12265.625,Green,Yes
63,225,182,12462.66,Red,Yes
63.5,230,183,12661.265,Yellow,Yes
64,235,184,12861.44,Green,Yes
64.5,240,185,13063.185,Red,Yes
65,245,186,13266.5,Yellow,Yes
65.5,250,187,13471.385,Green,Yes
66,150,188,13677.84,Red,Yes
66.5,155,189,13885.865,Yellow,Yes
67,160,190,14095.46,Green,Yes
67.5,165,180,14306.625,Red,Yes
68,170,181,14519.36,Yellow,Yes
68.5,175,182,14733.665,Green,Yes
69,180,183,14949.54,Red,Yes
69.5,185,184,15166.985,Yellow,Yes
70,251,191,15386,Red,No
70.5,255,192,15606.585,Yellow,No
71,300,193,15828.74,Green,No
71.5,345,194,16052.465,Red,No
72,390,195,16277.76,Yellow,No
72.5,435,196,16504.625,Green,No
73,480,197,16733.06,Red,No
73.5,525,198,16963.065,Yellow,No

74,570,199,17194.64,Green,No
74.5,615,200,17427.785,Red,No
75,660,201,17662.5,Yellow,No
75.5,705,202,17898.785,Green,No
76,750,203,18136.64,Red,No
76.5,795,204,18376.065,Yellow,No
77,840,205,18617.06,Green,No
77.5,885,206,18859.625,Red,No
78,930,207,19103.76,Yellow,No
78.5,975,208,19349.465,Green,No
79,1020,209,19596.74,Red,No
79.5,1065,210,19845.585,Yellow,No
80,1110,211,20096,Green,No
80.5,1155,212,20347.985,Red,No
81,1200,213,20601.54,Yellow,No
81.5,1245,214,20856.665,Green,No
82,1290,215,21113.36,Red,No
82.5,1335,216,21371.625,Yellow,No
83,1380,217,21631.46,Green,No
83.5,1425,218,21892.865,Red,No

Apple Test Set

@relation 'Apple Test Set'

@attribute Diameter numeric


@attribute Mass numeric
@attribute Volume numeric
@attribute 'Surface Area' numeric
@attribute Color {Red,Yellow,Green}
@attribute Fruit {Yes,No}

@data
34.5,150,180,3737.385,Red,?
35,155,181,3846.5,Yellow,?
35.5,160,182,3957.185,Green,?
36,165,183,4069.44,Red,?
67.5,165,180,14306.625,Red,?
68,170,181,14519.36,Yellow,?
68.5,175,182,14733.665,Green,?
69,180,183,14949.54,Red,?
69.5,185,184,15166.985,Yellow,?
70,251,191,15386,Red,?
70.5,255,192,15606.585,Yellow,?
71,300,193,15828.74,Green,?
71.5,345,194,16052.465,Red,?
72,390,195,16277.76,Yellow,?
81,1200,213,20601.54,Yellow,?
81.5,1245,214,20856.665,Green,?
82.5,1335,216,21371.625,Yellow,?
83,1380,217,21631.46,Green,?
83.5,1425,218,21892.865,Red,?

The following screenshot shows the classification rules that were generated when the j48
algorithm is applied on the given dataset.

Apple Test Set Result

@relation 'Apple Test Set_predicted'

@attribute Diameter numeric


@attribute Mass numeric
@attribute Volume numeric
@attribute 'Surface Area' numeric
@attribute Color {Red,Yellow,Green}
@attribute 'prediction margin' numeric
@attribute 'predicted Fruit' {Yes,No}
@attribute Fruit {Yes,No}

@data
34.5,150,180,3737.385,Red,1,Yes,?
35,155,181,3846.5,Yellow,1,Yes,?
35.5,160,182,3957.185,Green,1,Yes,?
36,165,183,4069.44,Red,1,Yes,?
67.5,165,180,14306.625,Red,1,Yes,?
68,170,181,14519.36,Yellow,1,Yes,?
68.5,175,182,14733.665,Green,1,Yes,?
69,180,183,14949.54,Red,1,Yes,?
69.5,185,184,15166.985,Yellow,1,Yes,?
70,251,191,15386,Red,-1,No,?
70.5,255,192,15606.585,Yellow,-1,No,?
71,300,193,15828.74,Green,-1,No,?
71.5,345,194,16052.465,Red,-1,No,?
72,390,195,16277.76,Yellow,-1,No,?
81,1200,213,20601.54,Yellow,-1,No,?
81.5,1245,214,20856.665,Green,-1,No,?
82.5,1335,216,21371.625,Yellow,-1,No,?
83,1380,217,21631.46,Green,-1,No,?
83.5,1425,218,21892.865,Red,-1,No,?

Ex. No: 15
PREDICTION ANALYSIS MODEL ON IRIS FLOWER

AIM:
     To perform predictive analysis on the given attributes to find the type of iris flower.

PROCEDURE:

1. Go to Start and open Excel 2019 to create a dataset for iris flowers.
2. Prepare an Excel sheet with the attributes Sepal Length, Sepal Width, Petal Length, Petal
   Width and Class.
3. Create two different Excel sheets, one for the training set and one for the test set.
4. The entire prediction analysis model is built from the records entered in the training set.
5. Unlike the training set, the test set does not carry values for the Class attribute; each
   value is replaced by a question mark.
6. After creating the sheets, save both in CSV format as "Iris Training Set" and "Iris Test
   Set".
7. Run the WEKA Explorer and load the data file "Iris Training Set" in the pre-processing
   interface.
8. Next, we select the "Classify" tab and click the "Choose" button to select the "J48" tree
   classifier.
9. Then select the "Use training set" radio button in the Test options panel.
10. We now click "Start" to generate the model. The run information, evaluation statistics and
    the confusion matrix appear in the "Classifier output" panel on the right when model
    construction is complete.
11. Now we select the "Supplied test set" radio button in the Test options panel and click the
    "Set…" button. This pops up a window which allows you to open the file containing the test
    instances.
12. Then select the "Iris Test Set" file as the test instances and click the "Close" button.
13. Again click "Start" to generate the model; after model construction is completed, right-click
    the entry in the "Result list" and select "Visualize classifier errors".
14. Now save the result as "Iris Test Set Result.arff"; in this predicted dataset the question
    marks are replaced by the predicted values.

Dataset for Iris:

Iris Training Set

@relation 'Iris Training Set'

@attribute 'Sepal Length' numeric


@attribute 'Sepal Width' numeric
@attribute 'Petal Length' numeric
@attribute 'Petal Width' numeric
@attribute Class {Iris-setosa,Iris-versicolor,Iris-virginica}

@data
4.3,2.3,1,0.1,Iris-setosa
4.4,2.4,1.1,0.2,Iris-setosa
4.5,2.5,1.2,0.3,Iris-setosa
4.6,2.6,1.3,0.4,Iris-setosa
4.7,2.7,1.4,0.5,Iris-setosa
4.8,2.8,1.5,0.6,Iris-setosa
4.9,2.9,1.6,0.1,Iris-setosa
5,3,1.7,0.2,Iris-setosa
5.1,3.1,1.8,0.3,Iris-setosa
5.2,3.2,1.9,0.4,Iris-setosa
5.3,3.3,1,0.5,Iris-setosa
5.4,3.4,1.1,0.6,Iris-setosa
5.5,3.5,1.2,0.1,Iris-setosa
5.6,3.6,1.3,0.2,Iris-setosa
5.7,3.7,1.4,0.3,Iris-setosa
4.3,3.8,1.5,0.4,Iris-setosa
4.4,3.9,1.6,0.5,Iris-setosa
4.5,4,1.7,0.6,Iris-setosa
4.6,4.1,1.8,0.4,Iris-setosa
4.7,4.2,1.9,0.5,Iris-setosa

4.9,2,3,1,Iris-versicolor
5,2.1,3.1,1.1,Iris-versicolor
5.1,2.2,3.2,1.2,Iris-versicolor
5.2,2.3,3.3,1.3,Iris-versicolor
5.3,2.4,3.4,1.4,Iris-versicolor
5.4,2.5,3.5,1.5,Iris-versicolor
5.5,2.6,3.6,1.6,Iris-versicolor
5.6,2.7,3.7,1.7,Iris-versicolor
5.7,2.8,3.8,1.8,Iris-versicolor
5.8,2.9,3.9,1,Iris-versicolor
5.9,3,4,1.1,Iris-versicolor
6,3.1,4.1,1.2,Iris-versicolor
6.1,3.2,4.2,1.3,Iris-versicolor
6.2,3.3,4.3,1.4,Iris-versicolor
6.3,3.4,4.4,1.5,Iris-versicolor
6.4,2,4.5,1.6,Iris-versicolor
6.5,2.1,4.6,1.7,Iris-versicolor
6.6,2.2,4.7,1.8,Iris-versicolor
6.7,2.3,4.8,1,Iris-versicolor
6.8,2.4,4.9,1.1,Iris-versicolor
6.9,2.5,5,1.2,Iris-versicolor
7,2.6,5.1,1.3,Iris-versicolor
4.9,2.5,4.5,1.4,Iris-virginica
5,2.6,4.6,1.5,Iris-virginica
5.1,2.7,4.7,1.6,Iris-virginica
5.2,2.8,4.8,1.7,Iris-virginica
5.3,2.9,4.9,1.8,Iris-virginica
5.4,3,5,1.9,Iris-virginica
5.5,3.1,5.1,2,Iris-virginica
5.6,3.2,5.2,2.1,Iris-virginica

5.7,3.3,5.3,2.2,Iris-virginica
5.8,2.5,5.4,2.3,Iris-virginica
5.9,2.6,5.5,2.4,Iris-virginica
6,2.7,5.6,2.5,Iris-virginica
6.1,2.8,5.7,1.4,Iris-virginica
6.2,2.9,5.8,1.5,Iris-virginica
6.3,3,5.9,1.6,Iris-virginica
6.4,3.1,6,1.7,Iris-virginica
6.5,3.2,6.1,1.8,Iris-virginica
6.6,3.3,6.2,1.9,Iris-virginica
6.7,2.5,6.3,2,Iris-virginica
6.8,2.6,6.4,2.1,Iris-virginica
6.9,2.7,6.5,2.2,Iris-virginica
7,2.8,6.6,2.3,Iris-virginica
7.1,2.9,6.7,2.4,Iris-virginica
7.2,3,6.8,2.5,Iris-virginica
7.3,3.1,6.9,1.4,Iris-virginica
7.4,3.2,4.5,1.5,Iris-virginica
7.5,3.3,4.6,1.6,Iris-virginica
7.6,2.5,4.7,1.7,Iris-virginica
7.7,2.6,4.8,1.8,Iris-virginica
7.8,2.7,4.9,1.9,Iris-virginica
7.9,2.8,5,2,Iris-virginica

Iris Test Set

@relation 'Iris Test Set'

@attribute 'Sepal Length' numeric


@attribute 'Sepal Width' numeric
@attribute 'Petal Length' numeric
@attribute 'Petal Width' numeric
@attribute Class {Iris-setosa,Iris-versicolor,Iris-virginica}

@data
4.3,2.3,1,0.1,?
4.4,2.4,1.1,0.2,?
4.5,2.5,1.2,0.3,?
4.6,4.1,1.8,0.4,?
4.7,4.2,1.9,0.5,?
4.9,2,3,1,?
5,2.1,3.1,1.1,?
5.1,2.2,3.2,1.2,?
6.9,2.5,5,1.2,?
7,2.6,5.1,1.3,?
4.9,2.5,4.5,1.4,?
5,2.6,4.6,1.5,?
7.7,2.6,4.8,1.8,?
7.8,2.7,4.9,1.9,?
7.9,2.8,5,2,?

The following screenshot shows the classification rules that were generated when the j48
algorithm is applied on the given dataset.

Iris Test Set Result

@relation 'Iris Test Set_predicted'

@attribute 'Sepal Length' numeric


@attribute 'Sepal Width' numeric
@attribute 'Petal Length' numeric
@attribute 'Petal Width' numeric
@attribute 'prediction margin' numeric
@attribute 'predicted Class' {Iris-setosa,Iris-versicolor,Iris-virginica}
@attribute Class {Iris-setosa,Iris-versicolor,Iris-virginica}

@data
4.3,2.3,1,0.1,1,Iris-setosa,?
4.4,2.4,1.1,0.2,1,Iris-setosa,?
4.5,2.5,1.2,0.3,1,Iris-setosa,?
4.6,4.1,1.8,0.4,1,Iris-setosa,?
4.7,4.2,1.9,0.5,1,Iris-setosa,?
4.9,2,3,1,-1,Iris-versicolor,?
5,2.1,3.1,1.1,-1,Iris-versicolor,?
5.1,2.2,3.2,1.2,-1,Iris-versicolor,?
6.9,2.5,5,1.2,-1,Iris-versicolor,?
7,2.6,5.1,1.3,-1,Iris-versicolor,?
4.9,2.5,4.5,1.4,-1,Iris-virginica,?
5,2.6,4.6,1.5,-1,Iris-virginica,?
7.7,2.6,4.8,1.8,-1,Iris-virginica,?
7.8,2.7,4.9,1.9,-1,Iris-virginica,?
7.9,2.8,5,2,-1,Iris-virginica,?
