DM Tools Sample-1
MUTHURANGAM GOVERNMENT ARTS COLLEGE (AUTONOMOUS)
VELLORE-2
PG & Research Department of Computer Science
M.Sc., COMPUTER SCIENCE
PRACTICAL RECORD
2023-2024
SUB.CODE : …………………………………………………………………………………
SUBJECT : ………………………………………………………………………………....
CLASS :………………………………………………………………………………….
MUTHURANGAM GOVERNMENT
ARTS COLLEGE (AUTONOMOUS)
VELLORE-2
BONAFIDE CERTIFICATE
SI. NO. | DATE | TITLE | PAGE NO. | SIGN
Ex. No: 01
PREPROCESSING ON DATASET STUDENT.ARFF
AIM:
This experiment illustrates some of the basic data pre-processing operations that can be
performed using WEKA-Explorer. The sample dataset used for this example is the student data
available in arff format.
PROCEDURE:
1. Loading the data. We can load the dataset into Weka by clicking the Open file button in the Preprocess interface and selecting the appropriate file.
2. Once the data is loaded, Weka recognizes the attributes and, during the scan of the data, computes some basic statistics on each attribute. The left panel in the figure shows the list of recognized attributes, while the top panel indicates the names of the base relation (table) and the current working relation (which are the same initially).
3. Clicking on an attribute in the left panel shows the basic statistics for that attribute: for categorical attributes the frequency of each attribute value is shown, while for continuous attributes we can obtain the minimum, maximum, mean, standard deviation, etc.
4. The visualization panel on the right shows the data in the form of a cross-tabulation across two attributes.
5. Selecting or filtering attributes:
Removing an attribute - When we need to remove an attribute, we can do this using the attribute filters in Weka. In the Filter panel, click the Choose button; this shows a popup window with a list of available filters.
Scroll down the list and select the weka.filters.unsupervised.attribute.Remove filter.
6. Next, click the textbox immediately to the right of the Choose button. In the resulting dialog box, enter the index of the attribute to be filtered out.
7. Make sure that the invertSelection option is set to False, then click OK. In the filter box you will now see "Remove -R 7".
8. Click the Apply button to apply the filter to this data. This removes the attribute and creates a new working relation.
9. Save the new working relation as an ARFF file by clicking the Save button on the top panel (student.arff). A programmatic sketch of these steps is given below.
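The same preprocessing can also be scripted against the Weka Java API instead of the Explorer GUI. The following is a minimal sketch, not part of the recorded GUI procedure, assuming weka.jar is on the classpath, student.arff is in the working directory, and attribute 7 is the one to be removed (matching the "Remove -R 7" setting above); the output file name is illustrative.

import java.io.File;
import weka.core.Instances;
import weka.core.converters.ArffSaver;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;

public class RemoveAttributeDemo {
    public static void main(String[] args) throws Exception {
        // Load the dataset (equivalent to "Open file" in the Preprocess tab)
        Instances data = DataSource.read("student.arff");

        // Configure the Remove filter to drop attribute 7 (1-based index, as in the GUI)
        Remove remove = new Remove();
        remove.setAttributeIndices("7");
        remove.setInputFormat(data);

        // Apply the filter to obtain the new working relation
        Instances newData = Filter.useFilter(data, remove);

        // Save the new working relation as an ARFF file
        ArffSaver saver = new ArffSaver();
        saver.setInstances(newData);
        saver.setFile(new File("student-removed.arff"));
        saver.writeBatch();
    }
}

Filter.useFilter() plays the role of the Apply button: it produces a new set of instances with the selected attribute removed.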
DISCRETIZATION:
Association rule mining can often be performed only on categorical data. This requires performing discretization on numeric or continuous attributes. In the following example let us discretize the age attribute.
Let us divide the values of the age attribute into three bins (intervals).
First load the dataset into Weka (student.arff).
Select the age attribute.
Activate the filter dialog box and select weka.filters.unsupervised.attribute.Discretize from the list.
To change the defaults for the filter, click on the box immediately to the right of the Choose button.
Enter the index of the attribute to be discretized. In this case the attribute is age, so we enter '1', corresponding to the age attribute.
Enter '3' as the number of bins. Leave the remaining field values as they are.
Click the OK button.
Click Apply in the filter panel. This results in a new working relation with the selected attribute partitioned into 3 bins.
Save the new working relation in a file called student-data-discretized.arff. A programmatic sketch of these steps is given below.
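The discretization steps can be scripted in the same way. A minimal sketch with the Weka Java API, assuming the attribute at index 1 is the numeric age attribute as stated above (the Discretize filter only affects numeric attributes):

import java.io.File;
import weka.core.Instances;
import weka.core.converters.ArffSaver;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Discretize;

public class DiscretizeDemo {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("student.arff");

        // Equal-width discretization of attribute 1 (age) into 3 bins,
        // mirroring weka.filters.unsupervised.attribute.Discretize in the GUI
        Discretize discretize = new Discretize();
        discretize.setAttributeIndices("1");
        discretize.setBins(3);
        discretize.setInputFormat(data);
        Instances discretized = Filter.useFilter(data, discretize);

        // Save the new working relation
        ArffSaver saver = new ArffSaver();
        saver.setInstances(discretized);
        saver.setFile(new File("student-data-discretized.arff"));
        saver.writeBatch();
    }
}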
Dataset student.arff
@relation student
@data
%
<30, high, no, fair, no
<30, high, no, excellent, no
30-40, high, no, fair, yes
>40, medium, no, fair, yes
>40, low, yes, fair, yes
>40, low, yes, excellent, no
30-40, low, yes, excellent, yes
<30, medium, no, fair, no
<30, low, yes, fair, no
>40, medium, yes, fair, yes
<30, medium, yes, excellent, yes
30-40, medium, no, excellent, yes
30-40, high, yes, fair, yes
>40, medium, no, excellent, no
%
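Note that the listing above contains only the @relation and @data sections. For the file to load in Weka it also needs @attribute declarations between them; a possible header is shown below, with the attribute names assumed here purely for illustration, since the listing does not give them.

% attribute names below are assumed, not taken from the original listing
@attribute age {<30, 30-40, >40}
@attribute income {low, medium, high}
@attribute student {yes, no}
@attribute credit_rating {fair, excellent}
@attribute buys_computer {yes, no}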
Ex. No: 02
PREPROCESSING ON DATASET LABOR.ARFF
AIM:
This experiment illustrates some of the basic data pre-processing operations that can be
performed using WEKA-Explorer. The sample dataset used for this example is the labor data available
in arff format.
PROCEDURE:
1. Loading the data. We can load the dataset into Weka by clicking the Open file button in the Preprocess interface and selecting the appropriate file.
2. Once the data is loaded, Weka recognizes the attributes and, during the scan of the data, computes some basic statistics on each attribute. The left panel in the figure shows the list of recognized attributes, while the top panel indicates the names of the base relation (table) and the current working relation (which are the same initially).
3. Clicking on an attribute in the left panel shows the basic statistics for that attribute: for categorical attributes the frequency of each attribute value is shown, while for continuous attributes we can obtain the minimum, maximum, mean, standard deviation, etc.
4. The visualization panel on the right shows the data in the form of a cross-tabulation across two attributes.
5. Selecting or filtering attributes:
Removing an attribute - When we need to remove an attribute, we can do this using the attribute filters in Weka. In the Filter panel, click the Choose button; this shows a popup window with a list of available filters.
Scroll down the list and select the weka.filters.unsupervised.attribute.Remove filter.
6. Next, click the textbox immediately to the right of the Choose button. In the resulting dialog box, enter the index of the attribute to be filtered out.
7. Make sure that the invertSelection option is set to False, then click OK. In the filter box you will now see "Remove -R 7".
8. Click the Apply button to apply the filter to this data. This removes the attribute and creates a new working relation.
9. Save the new working relation as an ARFF file by clicking the Save button on the top panel (labor.arff). A programmatic sketch for inspecting the loaded data is given below.
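The per-attribute statistics that Weka computes on loading (steps 2 and 3 above) can also be inspected programmatically. A minimal sketch with the Weka Java API, assuming labor.arff is available in the working directory (a copy ships in the data directory of the Weka distribution):

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class SummaryDemo {
    public static void main(String[] args) throws Exception {
        // Load the labor dataset
        Instances data = DataSource.read("labor.arff");

        // Print the relation name, number of instances and attributes, and
        // per-attribute statistics (type, missing values, distinct values, ...)
        System.out.println(data.toSummaryString());
    }
}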
DISCRETIZATION:
Association rule mining can often be performed only on categorical data. This requires performing discretization on numeric or continuous attributes. In the following example let us discretize the age attribute.
Let us divide the values of the age attribute into three bins (intervals).
First load the dataset into Weka (labor.arff).
Select the age attribute.
Activate the filter dialog box and select weka.filters.unsupervised.attribute.Discretize from the list.
To change the defaults for the filter, click on the box immediately to the right of the Choose button.
Enter the index of the attribute to be discretized. In this case the attribute is age, so we enter '1', corresponding to the age attribute.
Enter '3' as the number of bins. Leave the remaining field values as they are.
Click the OK button.
Click Apply in the filter panel. This results in a new working relation with the selected attribute partitioned into 3 bins.
Save the new working relation in a file called labor-data-discretized.arff.
Dataset labor.arff
The following screenshot shows the effect of discretization
Ex. No: 03
ASSOCIATION RULE PROCESS ON DATASET
CONTACTLENSES.ARFF USING APRIORI ALGORITHM
AIM:
This experiment illustrates some of the basic elements of association rule mining using WEKA.
The sample dataset used for this example is contactlenses.arff
PROCEDURE:
1. Open the data file in Weka Explorer. It is presumed that the required data fields have been
discretized. In this example it is age attribute.
2. Clicking on the associate tab will bring up the interface for association rule algorithm.
3. We will use Apriori algorithm. This is the default algorithm.
4. In-order to change the parameters for the run (example support, confidence etc) we click on the
text box immediately to the right of the choose button.
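For reference, the same association run can be reproduced with the Weka Java API. A minimal sketch, assuming the contact lenses file is available locally (Weka ships it as contact-lenses.arff) and using Apriori's default support and confidence settings; these defaults can be changed with setLowerBoundMinSupport() and setMinMetric(), which correspond to the parameters mentioned in step 4.

import weka.associations.Apriori;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class AprioriDemo {
    public static void main(String[] args) throws Exception {
        // Load the (already nominal) contact lenses dataset
        Instances data = DataSource.read("contact-lenses.arff");

        // Build association rules with the default Apriori parameters
        Apriori apriori = new Apriori();
        apriori.buildAssociations(data);

        // Print the best rules found, as shown in the Associator output panel
        System.out.println(apriori);
    }
}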
Dataset contactlenses.arff
The following screenshot shows the association rules that were generated when the Apriori algorithm is applied to the given dataset.
Ex. No: 04
ASSOCIATION RULE PROCESS ON DATASET TEST.ARFF
USING APRIORI ALGORITHM
AIM:
This experiment illustrates some of the basic elements of association rule mining using WEKA.
The sample dataset used for this example is test.arff
PROCEDURE:
1. Open the data file in Weka Explorer. It is presumed that the required data fields have been discretized; in this example it is the age attribute.
2. Clicking on the Associate tab brings up the interface for the association rule algorithms.
3. We will use the Apriori algorithm, which is the default algorithm.
4. In order to change the parameters for the run (e.g., support, confidence, etc.) we click on the text box immediately to the right of the Choose button.
Dataset test.arff
@relation test
@data
2005, cse
2005, it
2005, cse
2006, mech
2006, it
2006, ece
2007, it
2007, cse
2008, it
2008, cse
2009, it
2009, ece
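As with the earlier student listing, the @attribute declarations are omitted above. A possible header for test.arff is shown below; the attribute names are assumed here purely for illustration, since the listing does not give them (Apriori requires nominal attributes, so the year is declared nominal).

% attribute names below are assumed, not taken from the original listing
@attribute year {2005, 2006, 2007, 2008, 2009}
@attribute branch {cse, it, mech, ece}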
The following screenshot shows the association rules that were generated when the Apriori algorithm is applied to the given dataset.
Ex. No: 05
CLASSIFICATION RULE PROCESS ON DATASET
STUDENT.ARFF USING J48 ALGORITHM
AIM:
This experiment illustrates the use of the J48 classifier in Weka. The sample data set used in this experiment is the "student" data available in ARFF format. This document assumes that appropriate data pre-processing has been performed.
PROCEDURE:
1. Run the Weka Explorer and load the data file student.arff in the pre-processing interface.
2. Select the "Classify" tab and click the "Choose" button to select the "J48" tree classifier.
3. Select the "Use training set" radio button in the Test options panel.
4. Click "Start" to generate the model. The run information, evaluation statistics and confusion matrix appear in the "Classifier output" panel on the right when model construction is complete.
5. Right-click the entry in the "Result list" and select "Visualize tree" to view the generated decision tree. A programmatic sketch of these steps is given below.
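Equivalently, the classifier can be built through the Weka Java API. A minimal sketch, assuming student.arff is in the working directory and the last attribute is the class:

import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class J48Demo {
    public static void main(String[] args) throws Exception {
        // Load the dataset and mark the last attribute as the class
        Instances data = DataSource.read("student.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // Build the J48 decision tree (the GUI's "trees > J48" with default options)
        J48 tree = new J48();
        tree.buildClassifier(data);

        // Print the tree, i.e. the classification rules shown in the Classifier output panel
        System.out.println(tree);
    }
}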
Dataset student.arff
@relation student
@data
The following screenshot shows the classification rules that were generated when the J48 algorithm is applied to the given dataset.
Ex. No: 06
CLASSIFICATION RULE PROCESS ON DATASET
EMPLOYEE.ARFF USING J48 ALGORITHM
AIM:
This experiment illustrates the use of the J48 classifier in Weka. The sample data set used in this experiment is the "employee" data available in ARFF format. This document assumes that appropriate data pre-processing has been performed.
PROCEDURE:
Follow the same steps as in Ex. No: 05, loading the employee.arff dataset and choosing the J48 tree classifier.
The following screenshot shows the classification rules that were generated when the J48 algorithm is applied to the given dataset.
Ex. No: 07
CLASSIFICATION RULE PROCESS ON DATASET
EMPLOYEE.ARFF USING ID3 ALGORITHM
AIM:
This experiment illustrates the use of the ID3 classifier in Weka. The sample data set used in this experiment is the "employee" data available in ARFF format. This document assumes that appropriate data pre-processing has been performed.
PROCEDURE:
Follow the same steps as in Ex. No: 05, loading the employee.arff dataset and choosing the ID3 classifier. A programmatic sketch is given below.
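As a programmatic illustration, a minimal sketch with the Weka Java API is given below. It assumes an employee.arff file whose attributes are all nominal with no missing values (which ID3 requires) and whose last attribute is the class, and that the Id3 scheme is available (recent Weka releases ship it in the optional simpleEducationalLearningSchemes package).

import weka.classifiers.trees.Id3;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class Id3Demo {
    public static void main(String[] args) throws Exception {
        // Load the employee dataset; the last attribute is assumed to be the class
        Instances data = DataSource.read("employee.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // Build the ID3 decision tree (nominal attributes only, no missing values)
        Id3 id3 = new Id3();
        id3.buildClassifier(data);

        // Print the resulting tree
        System.out.println(id3);
    }
}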
Ex. No: 08
CLASSIFICATION RULE PROCESS ON DATASET
EMPLOYEE.ARFF USING NAIVE BAYES ALGORITHM
AIM:
This experiment illustrates the use of the Naive Bayes classifier in Weka. The sample data set used in this experiment is the "employee" data available in ARFF format. This document assumes that appropriate data pre-processing has been performed.
PROCEDURE:
Follow the same steps as in Ex. No: 05, loading the employee.arff dataset and choosing the Naive Bayes classifier (under bayes). A programmatic sketch is given below.
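For reference, the corresponding run with the Weka Java API. A minimal sketch, assuming employee.arff is in the working directory and the last attribute is the class:

import weka.classifiers.bayes.NaiveBayes;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class NaiveBayesDemo {
    public static void main(String[] args) throws Exception {
        // Load the employee dataset and mark the last attribute as the class
        Instances data = DataSource.read("employee.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // Build the Naive Bayes classifier
        NaiveBayes nb = new NaiveBayes();
        nb.buildClassifier(data);

        // Print the class priors and per-attribute conditional distributions
        System.out.println(nb);
    }
}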
Ex. No: 09
CLUSTERING RULE PROCESS ON DATASET IRIS.ARFF
USING SIMPLE K-MEANS
AIM:
This experiment illustrates the use of simple k-means clustering with the Weka Explorer. The sample data set used for this example is based on the iris data available in ARFF format. This document assumes that appropriate pre-processing has been performed. The iris dataset includes 150 instances.
PROCEDURE:
1. Run the Weka Explorer and load the data file iris.arff in the pre-processing interface.
2. In order to perform clustering, select the 'Cluster' tab in the Explorer and click on the Choose button. This step results in a dropdown list of available clustering algorithms.
3. In this case we select 'SimpleKMeans'.
4. Next, click the text box to the right of the Choose button to get the popup window shown in the screenshots. In this window we enter 6 as the number of clusters and we leave the value of the seed as it is. The seed value is used in generating random numbers, which are used for making the initial assignment of instances to clusters.
5. Once the options have been specified, we run the clustering algorithm. In the 'Cluster mode' panel we make sure that the 'Use training set' option is selected, and then we click the 'Start' button. This process and the resulting window are shown in the following screenshots.
6. The result window shows the centroid of each cluster as well as statistics on the number and the percentage of instances assigned to the different clusters. The cluster centroids are the mean vectors of each cluster, and they can be used to characterize the clusters. For example, the centroid of cluster 1 (class Iris-versicolor) has a mean sepal length of 5.4706, sepal width of 2.4765, petal width of 1.1294 and petal length of 3.7941.
7. Another way of understanding the characteristics of each cluster is through visualization: right-click the result set in the Result list panel and select 'Visualize cluster assignments'.
The following screenshot shows the clustering rules that were generated when the simple k-means algorithm is applied to the given dataset.
From the above visualization, we can understand the distribution of sepal length and petal length in each cluster. For instance, each cluster is dominated by petal length. By changing the color dimension to other attributes we can see their distribution within each of the clusters.
8. We can also save the resulting dataset, which includes each instance along with its assigned cluster. To do so, we click the Save button in the visualization window and save the result as iris-k-means. The top portion of this file is shown in the following figure. A programmatic sketch of the clustering run is given below.
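The clustering run itself can also be reproduced with the Weka Java API. A minimal sketch, assuming iris.arff is available locally; the class attribute (the last one) is dropped first so that only the four measurements are clustered (in the Explorer the same effect is obtained with the Ignore attributes button), and 6 clusters are used with the default seed to match the procedure above.

import weka.clusterers.ClusterEvaluation;
import weka.clusterers.SimpleKMeans;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;

public class KMeansIrisDemo {
    public static void main(String[] args) throws Exception {
        // Load iris.arff and drop the class attribute (the last one) before clustering
        Instances data = DataSource.read("iris.arff");
        Remove remove = new Remove();
        remove.setAttributeIndices("last");
        remove.setInputFormat(data);
        Instances input = Filter.useFilter(data, remove);

        // k-means with 6 clusters and the default seed, as in the procedure
        SimpleKMeans kMeans = new SimpleKMeans();
        kMeans.setNumClusters(6);
        kMeans.buildClusterer(input);

        // Print cluster centroids and the number of instances per cluster
        ClusterEvaluation eval = new ClusterEvaluation();
        eval.setClusterer(kMeans);
        eval.evaluateClusterer(input);
        System.out.println(eval.clusterResultsToString());
    }
}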
Ex. No: 10
CLUSTERING RULE PROCESS ON DATASET STUDENT.ARFF
USING SIMPLE K-MEANS
AIM:
This experiment illustrates the use of simple k-means clustering with the Weka Explorer. The sample data set used for this example is based on the student data available in ARFF format. This document assumes that appropriate pre-processing has been performed. The student dataset includes 14 instances.
PROCEDURE:
1. Run the Weka Explorer and load the data file student.arff in the pre-processing interface.
2. In order to perform clustering, select the 'Cluster' tab in the Explorer and click on the Choose button. This step results in a dropdown list of available clustering algorithms.
3. In this case we select 'SimpleKMeans'.
4. Next, click the text box to the right of the Choose button to get the popup window shown in the screenshots. In this window we enter 6 as the number of clusters and we leave the value of the seed as it is. The seed value is used in generating random numbers, which are used for making the initial assignment of instances to clusters.
5. Once the options have been specified, we run the clustering algorithm. In the 'Cluster mode' panel we make sure that the 'Use training set' option is selected, and then we click the 'Start' button. This process and the resulting window are shown in the following screenshots.
6. The result window shows the centroid of each cluster as well as statistics on the number and the percentage of instances assigned to the different clusters. The cluster centroids are the mean vectors of each cluster, and they can be used to characterize the clusters.
7. Another way of understanding the characteristics of each cluster is through visualization: right-click the result set in the Result list panel and select 'Visualize cluster assignments'.
Interpretation of the above visualization:
From the above visualization, we can understand the distribution of age and instance number in each cluster. For instance, each cluster is dominated by age. By changing the color dimension to other attributes we can see their distribution within each of the clusters.
8. We can also save the resulting dataset, which includes each instance along with its assigned cluster. To do so, we click the Save button in the visualization window and save the result as student-k-means. The top portion of this file is shown in the following figure. A programmatic sketch of this step is given below.
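Step 8 (saving each instance together with its assigned cluster) can be reproduced with the AddCluster filter of the Weka Java API. A minimal sketch, assuming student.arff is in the working directory; the output file name is illustrative.

import java.io.File;
import weka.clusterers.SimpleKMeans;
import weka.core.Instances;
import weka.core.converters.ArffSaver;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.AddCluster;

public class SaveClusterAssignmentsDemo {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("student.arff");

        // k-means with 6 clusters, as in the procedure above
        SimpleKMeans kMeans = new SimpleKMeans();
        kMeans.setNumClusters(6);

        // AddCluster builds the clusterer and appends a "cluster" attribute to every instance
        AddCluster addCluster = new AddCluster();
        addCluster.setClusterer(kMeans);
        addCluster.setInputFormat(data);
        Instances withClusters = Filter.useFilter(data, addCluster);

        // Save the result, mirroring the Save button in the visualization window
        ArffSaver saver = new ArffSaver();
        saver.setInstances(withClusters);
        saver.setFile(new File("student-k-means.arff"));
        saver.writeBatch();
    }
}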
Dataset student.arff
@relation student
@data
The following screenshot shows the clustering rules that were generated when the simple k-means algorithm is applied to the given dataset.
Ex. No: 11
PREDICTION ANALYSIS MODEL ON STUDENT MARKS
AIM:
To perform predictive analysis on five-subject student marks to check whether a given set of marks results in a Pass or a Fail.
PROCEDURE:
1. Go to Start and open Excel 2019 to create a dataset of student marks.
2. Prepare an Excel sheet with the attributes Mark1, Mark2, Mark3, Mark4, Mark5 and Result.
3. Create two different Excel sheets, one for the training set and one for the test set.
4. The entire prediction analysis model works based on the records entered in the training set.
5. Unlike the training set, the test data set does not have a value for the Result attribute; it is replaced by a question mark.
6. After creating the Excel sheets, save both sheets in CSV format as "Student Training Set" and "Student Test Set".
7. Run the Weka Explorer and load the data file "Student Training Set" in the pre-processing interface.
8. Next, we select the "Classify" tab and click the "Choose" button to select the "J48" tree classifier.
9. Then select the "Use training set" radio button in the Test options panel.
10. We now click "Start" to generate the model. The run information, evaluation statistics as well as the confusion matrix will appear in the right-side "Classifier output" panel when the model construction is complete.
11. Now we select the "Supplied test set" radio button in the Test options panel and click the "Set…" button. This will pop up a window which allows you to open the file containing the test instances.
12. Then select the "Student Test Set" file as the test instances and click on the "Close" button.
13. Again click "Start" to generate the model. After the model construction is completed, right-click the "Result list" and select "Visualize classifier errors".
14. Now save the result as "Student Test Set Result.arff". In this predicted dataset the question marks are replaced by the predicted values. A programmatic sketch of this workflow is given below.
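The same train-then-predict workflow can be scripted with the Weka Java API. A minimal sketch, assuming the two CSV files are named exactly as in step 6 (with a .csv extension), that the last column (Result) is the class, and that the test CSV has the same columns as the training CSV; everything else here is illustrative rather than part of the recorded procedure.

import java.io.File;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.CSVLoader;

public class StudentMarksPrediction {
    // Helper: load a CSV file and set the last attribute (Result) as the class
    static Instances loadCsv(String name) throws Exception {
        CSVLoader loader = new CSVLoader();
        loader.setSource(new File(name));
        Instances data = loader.getDataSet();
        data.setClassIndex(data.numAttributes() - 1);
        return data;
    }

    public static void main(String[] args) throws Exception {
        Instances train = loadCsv("Student Training Set.csv");
        Instances test = loadCsv("Student Test Set.csv");

        // Build the J48 model from the labelled training set
        J48 tree = new J48();
        tree.buildClassifier(train);

        // For every test row (Result = ?), print the predicted Pass/Fail label
        for (int i = 0; i < test.numInstances(); i++) {
            double pred = tree.classifyInstance(test.instance(i));
            System.out.println("Row " + (i + 1) + ": " + train.classAttribute().value((int) pred));
        }
    }
}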
Dataset for Student Marks (Student Training Set):
@data
90,99,100,97,98,Pass
88,89,90,87,70,Pass
60,65,80,80,48,Pass
78,92,78,90,85,Pass
78,95,64,79,52,Pass
80,89,89,97,73,Pass
88,89,40,87,70,Pass
60,65,80,70,83,Pass
86,83,78,90,45,Pass
78,40,64,79,84,Pass
70,78,80,72,73,Pass
79,69,82,79,90,Pass
60,84,80,76,83,Pass
86,78,78,90,90,Pass
90,71,64,79,84,Pass
60,78,69,72,73,Pass
67,69,82,79,90,Pass
60,84,80,76,65,Pass
86,68,78,60,90,Pass
69,71,64,69,84,Pass
40,78,69,57,73,Pass
67,57,82,79,40,Pass
60,58,40,76,65,Pass
40,68,78,60,90,Pass
69,40,56,69,48,Pass
39,70,54,58,64,Fail
48,39,76,49,40,Fail
56,67,30,67,43,Fail
40,61,59,48,30,Fail
48,68,89,72,56,Fail
39,28,75,64,89,Fail
38,30,40,68,47,Fail
27,18,47,73,47,Fail
36,29,84,94,58,Fail
13,39,57,40,42,Fail
17,39,26,40,43,Fail
39,12,38,70,68,Fail
19,39,10,83,65,Fail
38,15,27,68,64,Fail
35,17,39,79,40,Fail
28,39,17,16,50,Fail
37,28,19,39,79,Fail
28,35,38,14,68,Fail
37,18,19,39,90,Fail
37,27,14,37,54,Fail
28,24,16,30,19,Fail
39,25,13,38,10,Fail
17,13,39,34,25,Fail
36,39,26,20,37,Fail
28,19,27,19,15,Fail
Student Test Set
@data
90,99,100,97,98,?
80,89,89,97,73,?
70,78,80,72,73,?
60,78,69,72,73,?
40,78,69,57,73,?
39,70,54,58,64,?
39,28,75,64,89,?
17,39,26,40,43,?
28,39,17,16,50,?
39,25,13,38,10,?
The following screenshot shows the classification rules that were generated when the J48 algorithm is applied to the given dataset.
Student Test Set Result
Ex. No: 12
PREDICTIVE ANALYSIS MODEL FOR OPENING BANK ACCOUNT
AIM:
To perform the predictive analysis model for opening a bank account in a particular region.
PROCEDURE:
1. Go to Start and open Excel 2019 to create a dataset for bank accounts.
2. Prepare an Excel sheet with the attributes Job, Income, Age, Location and Approval.
3. Create two different Excel sheets, one for the training set and one for the test set.
4. The entire prediction analysis model works based on the records entered in the training set.
5. Unlike the training set, the test data set does not have a value for the Approval attribute; it is replaced by a question mark.
6. After creating the Excel sheets, save both sheets in CSV format as "Bank Training Set" and "Bank Test Set".
7. Run the Weka Explorer and load the data file "Bank Training Set" in the pre-processing interface.
8. Next, we select the "Classify" tab and click the "Choose" button to select the "J48" tree classifier.
9. Then select the "Use training set" radio button in the Test options panel.
10. We now click "Start" to generate the model. The run information, evaluation statistics as well as the confusion matrix will appear in the right-side "Classifier output" panel when the model construction is complete.
11. Now we select the "Supplied test set" radio button in the Test options panel and click the "Set…" button. This will pop up a window which allows you to open the file containing the test instances.
12. Then select the "Bank Test Set" file as the test instances and click on the "Close" button.
13. Again click "Start" to generate the model. After the model construction is completed, right-click the "Result list" and select "Visualize classifier errors".
14. Now save the result as "Bank Test Set Result.arff". In this predicted dataset the question marks are replaced by the predicted values. A programmatic sketch of the training-set evaluation is given below.
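The evaluation statistics and confusion matrix of step 10 can also be obtained programmatically. A minimal sketch, assuming the training CSV is named as in step 6 (with a .csv extension) and that Approval, the last column, is the class:

import java.io.File;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.CSVLoader;

public class BankApprovalEvaluation {
    public static void main(String[] args) throws Exception {
        // Load the labelled training data from CSV
        CSVLoader loader = new CSVLoader();
        loader.setSource(new File("Bank Training Set.csv"));
        Instances train = loader.getDataSet();
        train.setClassIndex(train.numAttributes() - 1);

        // Build J48 and evaluate it on the training set ("Use training set" option)
        J48 tree = new J48();
        tree.buildClassifier(train);
        Evaluation eval = new Evaluation(train);
        eval.evaluateModel(tree, train);

        // Print the evaluation summary and the confusion matrix
        System.out.println(eval.toSummaryString());
        System.out.println(eval.toMatrixString());
    }
}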
Dataset for Bank Account (Bank Training Set):
@data
Private,9000,18,Vellore,Yes
Private,15000,23,Vellore,Yes
Private,20000,20,Vellore,Yes
Private,30000,35,Vellore,Yes
Private,45000,40,Vellore,Yes
Private,10000,45,Vellore,Yes
Private,50000,46,Vellore,Yes
Private,5000,23,Vellore,No
Private,8000,18,Vellore,No
Private,7500,33,Vellore,No
Governtment,20000,36,Vellore,Yes
Governtment,35000,28,Vellore,Yes
Governtment,12000,18,Vellore,Yes
Governtment,8000,20,Vellore,No
Governtment,45000,30,Vellore,Yes
Governtment,8500,45,Vellore,Yes
Governtment,45000,46,Vellore,Yes
Governtment,7000,23,Vellore,No
Governtment,50000,33,Vellore,Yes
Governtment,75000,45,Vellore,Yes
'Daily Wage',13000,20,Vellore,Yes
'Daily Wage',5000,18,Vellore,No
'Daily Wage',15000,20,Vellore,Yes
'Daily Wage',20000,24,Vellore,Yes
'Daily Wage',7500,33,Vellore,No
'Daily Wage',8000,45,Vellore,No
'Daily Wage',18000,46,Vellore,Yes
'Daily Wage',9000,25,Vellore,Yes
'Daily Wage',20000,35,Vellore,Yes
'Daily Wage',12000,45,Vellore,Yes
Bussniess,8000,24,Vellore,No
Bussniess,20000,20,Vellore,Yes
Bussniess,30000,18,Vellore,Yes
Bussniess,25000,48,Vellore,Yes
Bussniess,75000,38,Vellore,Yes
Bussniess,25000,40,Vellore,Yes
Bussniess,15000,23,Vellore,Yes
Bussniess,80000,33,Vellore,Yes
Bussniess,90000,40,Vellore,Yes
Bussniess,100000,28,Vellore,Yes
Bank Test Set
@data
Private,9000,18,Vellore,?
Private,15000,23,Vellore,?
Private,20000,20,Vellore,?
Private,30000,35,Vellore,?
Private,45000,40,Vellore,?
Governtment,8500,45,Vellore,?
Governtment,45000,46,Vellore,?
Governtment,7000,23,Vellore,?
Governtment,50000,33,Vellore,?
Governtment,75000,45,Vellore,?
'Daily Wage',8000,45,Vellore,?
'Daily Wage',18000,46,Vellore,?
'Daily Wage',9000,25,Vellore,?
'Daily Wage',20000,35,Vellore,?
'Daily Wage',12000,45,Vellore,?
Bussniess,25000,40,Vellore,?
Bussniess,15000,23,Vellore,?
Bussniess,80000,33,Vellore,?
Bussniess,90000,40,Vellore,?
Bussniess,100000,28,Vellore,?
The following screenshot shows the classification rules that were generated when the J48 algorithm is applied to the given dataset.
Bank Test Set Result
@data
Private,9000,18,Vellore,1,Yes,?
Private,15000,23,Vellore,1,Yes,?
Private,20000,20,Vellore,1,Yes,?
Private,30000,35,Vellore,1,Yes,?
Private,45000,40,Vellore,1,Yes,?
Governtment,8500,45,Vellore,1,Yes,?
Governtment,45000,46,Vellore,1,Yes,?
Governtment,7000,23,Vellore,-1,No,?
Governtment,50000,33,Vellore,1,Yes,?
Governtment,75000,45,Vellore,1,Yes,?
'Daily Wage',8000,45,Vellore,-1,No,?
'Daily Wage',18000,46,Vellore,1,Yes,?
'Daily Wage',9000,25,Vellore,1,Yes,?
'Daily Wage',20000,35,Vellore,1,Yes,?
'Daily Wage',12000,45,Vellore,1,Yes,?
Bussniess,25000,40,Vellore,1,Yes,?
Bussniess,15000,23,Vellore,1,Yes,?
Bussniess,80000,33,Vellore,1,Yes,?
Bussniess,90000,40,Vellore,1,Yes,?
Bussniess,100000,28,Vellore,1,Yes,?
Ex. No: 13
PREDICTIVE ANALYSIS ON ATTENDANCE PERCENTAGE
AIM:
To perform predictive analysis on attendance percentage to decide whether students are allowed to appear for the semester exam.
PROCEDURE:
1. Go to Start and open Excel 2019 to create a dataset for attendance.
2. Prepare an Excel sheet with the attributes Attendance Percentage, Condonation Fees, Composite of Attendance, Redo Candidate, Married Girl and Eligibility.
3. Create two different Excel sheets, one for the training set and one for the test set.
4. The entire prediction analysis model works based on the records entered in the training set.
5. Unlike the training set, the test data set does not have a value for the Eligibility attribute; it is replaced by a question mark.
6. After creating the Excel sheets, save both sheets in CSV format as "Attendance Training Set" and "Attendance Test Set".
7. Run the Weka Explorer and load the data file "Attendance Training Set" in the pre-processing interface.
8. Next, we select the "Classify" tab and click the "Choose" button to select the "J48" tree classifier.
9. Then select the "Use training set" radio button in the Test options panel.
10. We now click "Start" to generate the model. The run information, evaluation statistics as well as the confusion matrix will appear in the right-side "Classifier output" panel when the model construction is complete.
11. Now we select the "Supplied test set" radio button in the Test options panel and click the "Set…" button. This will pop up a window which allows you to open the file containing the test instances.
12. Then select the "Attendance Test Set" file as the test instances and click on the "Close" button.
13. Again click "Start" to generate the model. After the model construction is completed, right-click the "Result list" and select "Visualize classifier errors".
14. Now save the result as "Attendance Test Set Result.arff". In this predicted dataset the question marks are replaced by the predicted values.
Dataset for Attendance (Attendance Training Set):
@data
30,No,No,No,Yes,No
35,No,No,No,Yes,No
40,No,No,No,Yes,No
45,No,No,No,Yes,No
50,No,No,No,Yes,No
54,No,No,No,Yes,No
30,No,No,No,No,No
35,No,No,No,No,No
40,No,No,No,No,No
45,No,No,No,No,No
50,No,No,No,No,No
54,No,No,No,No,No
55,No,Yes,No,No,Yes
60,No,Yes,No,No,Yes
64,No,Yes,No,No,Yes
55,No,No,No,No,No
60,No,No,No,No,No
64,No,No,No,No,No
55,No,No,No,Yes,Yes
60,No,No,No,Yes,Yes
64,No,No,No,Yes,Yes
65,Yes,No,No,No,Yes
70,Yes,No,No,No,Yes
74,Yes,No,No,No,Yes
65,No,No,No,No,No
70,No,No,No,No,No
74,No,No,No,No,No
65,No,No,No,Yes,Yes
70,No,No,No,Yes,Yes
74,No,No,No,Yes,Yes
75,No,No,No,Yes,Yes
80,No,No,No,Yes,Yes
85,No,No,No,Yes,Yes
90,No,No,No,Yes,Yes
95,No,No,No,Yes,Yes
100,No,No,No,Yes,Yes
75,No,No,No,No,Yes
80,No,No,No,No,Yes
85,No,No,No,No,Yes
90,No,No,No,No,Yes
95,No,No,No,No,Yes
100,No,No,No,No,Yes
Attendance Test Set
@data
30,No,No,No,Yes,?
54,No,No,No,No,?
55,No,Yes,No,No,?
60,No,No,No,Yes,?
64,No,No,No,Yes,?
65,Yes,No,No,No,?
74,Yes,No,No,No,?
90,No,No,No,Yes,?
75,No,No,No,No,?
The following screenshot shows the classification rules that were generated when the J48 algorithm is applied to the given dataset.
Attendance Test Set Result
@data
30,No,No,No,Yes,1,No,?
54,No,No,No,No,1,No,?
55,No,Yes,No,No,-1,Yes,?
60,No,No,No,Yes,-1,Yes,?
64,No,No,No,Yes,-1,Yes,?
65,Yes,No,No,No,-1,Yes,?
74,Yes,No,No,No,-1,Yes,?
90,No,No,No,Yes,-1,Yes,?
75,No,No,No,No,-1,Yes,?
Ex. No: 14
PREDICTIVE ANALYSIS TO CHECK WHETHER THE FRUIT
IS AN APPLE OR NOT
AIM:
To prepare a predictive analysis model on the given attributes to identify whether the fruit is an apple or not.
PROCEDURE:
1. Go to Start and open Excel 2019 to create a dataset for apples.
2. Prepare an Excel sheet with the attributes Diameter, Mass, Volume, Surface Area, Color and Fruit.
3. Create two different Excel sheets, one for the training set and one for the test set.
4. The entire prediction analysis model works based on the records entered in the training set.
5. Unlike the training set, the test data set does not have a value for the Fruit attribute; it is replaced by a question mark.
6. After creating the Excel sheets, save both sheets in CSV format as "Apple Training Set" and "Apple Test Set".
7. Run the Weka Explorer and load the data file "Apple Training Set" in the pre-processing interface.
8. Next, we select the "Classify" tab and click the "Choose" button to select the "J48" tree classifier.
9. Then select the "Use training set" radio button in the Test options panel.
10. We now click "Start" to generate the model. The run information, evaluation statistics as well as the confusion matrix will appear in the right-side "Classifier output" panel when the model construction is complete.
11. Now we select the "Supplied test set" radio button in the Test options panel and click the "Set…" button. This will pop up a window which allows you to open the file containing the test instances.
12. Then select the "Apple Test Set" file as the test instances and click on the "Close" button.
13. Again click "Start" to generate the model. After the model construction is completed, right-click the "Result list" and select "Visualize classifier errors".
14. Now save the result as "Apple Test Set Result.arff". In this predicted dataset the question marks are replaced by the predicted values.
Dataset for Apple (Apple Training Set):
@data
34.5,150,180,3737.385,Red,Yes
35,155,181,3846.5,Yellow,Yes
35.5,160,182,3957.185,Green,Yes
36,165,183,4069.44,Red,Yes
36.5,170,184,4183.265,Yellow,Yes
37,175,185,4298.66,Green,Yes
37.5,180,186,4415.625,Red,Yes
38,185,187,4534.16,Yellow,Yes
38.5,190,188,4654.265,Green,Yes
39,195,189,4775.94,Red,Yes
39.5,200,190,4899.185,Yellow,Yes
40,205,180,5024,Green,Yes
40.5,210,181,5150.385,Red,Yes
41,215,182,5278.34,Yellow,Yes
41.5,220,183,5407.865,Green,Yes
42,225,184,5538.96,Red,Yes
42.5,230,185,5671.625,Yellow,Yes
43,235,186,5805.86,Green,Yes
43.5,240,187,5941.665,Red,Yes
44,245,188,6079.04,Yellow,Yes
44.5,250,189,6217.985,Green,Yes
45,150,190,6358.5,Red,Yes
45.5,155,180,6500.585,Yellow,Yes
46,160,181,6644.24,Green,Yes
46.5,165,182,6789.465,Red,Yes
47,170,183,6936.26,Yellow,Yes
47.5,175,184,7084.625,Green,Yes
48,180,185,7234.56,Red,Yes
48.5,185,186,7386.065,Yellow,Yes
49,190,187,7539.14,Green,Yes
49.5,195,188,7693.785,Red,Yes
50,200,189,7850,Yellow,Yes
50.5,205,190,8007.785,Green,Yes
51,210,180,8167.14,Red,Yes
51.5,215,181,8328.065,Yellow,Yes
52,220,182,8490.56,Green,Yes
52.5,225,183,8654.625,Red,Yes
53,230,184,8820.26,Yellow,Yes
53.5,235,185,8987.465,Green,Yes
54,240,186,9156.24,Red,Yes
54.5,245,187,9326.585,Yellow,Yes
55,250,188,9498.5,Green,Yes
55.5,150,189,9671.985,Red,Yes
56,155,190,9847.04,Yellow,Yes
56.5,160,180,10023.665,Green,Yes
57,165,181,10201.86,Red,Yes
57.5,170,182,10381.625,Yellow,Yes
58,175,183,10562.96,Green,Yes
58.5,180,184,10745.865,Red,Yes
59,185,185,10930.34,Yellow,Yes
59.5,190,186,11116.385,Green,Yes
60,195,187,11304,Red,Yes
60.5,200,188,11493.185,Yellow,Yes
61,205,189,11683.94,Green,Yes
61.5,210,190,11876.265,Red,Yes
62,215,180,12070.16,Yellow,Yes
62.5,220,181,12265.625,Green,Yes
63,225,182,12462.66,Red,Yes
63.5,230,183,12661.265,Yellow,Yes
64,235,184,12861.44,Green,Yes
64.5,240,185,13063.185,Red,Yes
65,245,186,13266.5,Yellow,Yes
65.5,250,187,13471.385,Green,Yes
66,150,188,13677.84,Red,Yes
66.5,155,189,13885.865,Yellow,Yes
67,160,190,14095.46,Green,Yes
67.5,165,180,14306.625,Red,Yes
68,170,181,14519.36,Yellow,Yes
68.5,175,182,14733.665,Green,Yes
69,180,183,14949.54,Red,Yes
69.5,185,184,15166.985,Yellow,Yes
70,251,191,15386,Red,No
70.5,255,192,15606.585,Yellow,No
71,300,193,15828.74,Green,No
71.5,345,194,16052.465,Red,No
72,390,195,16277.76,Yellow,No
72.5,435,196,16504.625,Green,No
73,480,197,16733.06,Red,No
73.5,525,198,16963.065,Yellow,No
74,570,199,17194.64,Green,No
74.5,615,200,17427.785,Red,No
75,660,201,17662.5,Yellow,No
75.5,705,202,17898.785,Green,No
76,750,203,18136.64,Red,No
76.5,795,204,18376.065,Yellow,No
77,840,205,18617.06,Green,No
77.5,885,206,18859.625,Red,No
78,930,207,19103.76,Yellow,No
78.5,975,208,19349.465,Green,No
79,1020,209,19596.74,Red,No
79.5,1065,210,19845.585,Yellow,No
80,1110,211,20096,Green,No
80.5,1155,212,20347.985,Red,No
81,1200,213,20601.54,Yellow,No
81.5,1245,214,20856.665,Green,No
82,1290,215,21113.36,Red,No
82.5,1335,216,21371.625,Yellow,No
83,1380,217,21631.46,Green,No
83.5,1425,218,21892.865,Red,No
Apple Test Set
@data
34.5,150,180,3737.385,Red,?
35,155,181,3846.5,Yellow,?
35.5,160,182,3957.185,Green,?
36,165,183,4069.44,Red,?
67.5,165,180,14306.625,Red,?
68,170,181,14519.36,Yellow,?
68.5,175,182,14733.665,Green,?
69,180,183,14949.54,Red,?
69.5,185,184,15166.985,Yellow,?
70,251,191,15386,Red,?
70.5,255,192,15606.585,Yellow,?
71,300,193,15828.74,Green,?
71.5,345,194,16052.465,Red,?
72,390,195,16277.76,Yellow,?
81,1200,213,20601.54,Yellow,?
81.5,1245,214,20856.665,Green,?
82.5,1335,216,21371.625,Yellow,?
83,1380,217,21631.46,Green,?
83.5,1425,218,21892.865,Red,?
The following screenshot shows the classification rules that were generated when the J48 algorithm is applied to the given dataset.
Apple Test Set Result
@data
34.5,150,180,3737.385,Red,1,Yes,?
35,155,181,3846.5,Yellow,1,Yes,?
35.5,160,182,3957.185,Green,1,Yes,?
36,165,183,4069.44,Red,1,Yes,?
67.5,165,180,14306.625,Red,1,Yes,?
68,170,181,14519.36,Yellow,1,Yes,?
68.5,175,182,14733.665,Green,1,Yes,?
69,180,183,14949.54,Red,1,Yes,?
69.5,185,184,15166.985,Yellow,1,Yes,?
70,251,191,15386,Red,-1,No,?
70.5,255,192,15606.585,Yellow,-1,No,?
71,300,193,15828.74,Green,-1,No,?
71.5,345,194,16052.465,Red,-1,No,?
72,390,195,16277.76,Yellow,-1,No,?
81,1200,213,20601.54,Yellow,-1,No,?
81.5,1245,214,20856.665,Green,-1,No,?
82.5,1335,216,21371.625,Yellow,-1,No,?
83,1380,217,21631.46,Green,-1,No,?
83.5,1425,218,21892.865,Red,-1,No,?
Ex. No: 15
PREDICTION ANALYSIS MODEL ON IRIS FLOWER
AIM:
To perform predictive analysis on the given attributes to find the type of iris flower.
PROCEDURE:
1. Go to Start and open Excel 2019 to create a dataset for iris flowers.
2. Prepare an Excel sheet with the attributes Sepal Length, Sepal Width, Petal Length, Petal Width and Class.
3. Create two different Excel sheets, one for the training set and one for the test set.
4. The entire prediction analysis model works based on the records entered in the training set.
5. Unlike the training set, the test data set does not have a value for the Class attribute; it is replaced by a question mark.
6. After creating the Excel sheets, save both sheets in CSV format as "Iris Training Set" and "Iris Test Set".
7. Run the Weka Explorer and load the data file "Iris Training Set" in the pre-processing interface.
8. Next, we select the "Classify" tab and click the "Choose" button to select the "J48" tree classifier.
9. Then select the "Use training set" radio button in the Test options panel.
10. We now click "Start" to generate the model. The run information, evaluation statistics as well as the confusion matrix will appear in the right-side "Classifier output" panel when the model construction is complete.
11. Now we select the "Supplied test set" radio button in the Test options panel and click the "Set…" button. This will pop up a window which allows you to open the file containing the test instances.
12. Then select the "Iris Test Set" file as the test instances and click on the "Close" button.
13. Again click "Start" to generate the model. After the model construction is completed, right-click the "Result list" and select "Visualize classifier errors".
14. Now save the result as "Iris Test Set Result.arff". In this predicted dataset the question marks are replaced by the predicted values.
Dataset for Iris (Iris Training Set):
@data
4.3,2.3,1,0.1,Iris-setosa
4.4,2.4,1.1,0.2,Iris-setosa
4.5,2.5,1.2,0.3,Iris-setosa
4.6,2.6,1.3,0.4,Iris-setosa
4.7,2.7,1.4,0.5,Iris-setosa
4.8,2.8,1.5,0.6,Iris-setosa
4.9,2.9,1.6,0.1,Iris-setosa
5,3,1.7,0.2,Iris-setosa
5.1,3.1,1.8,0.3,Iris-setosa
5.2,3.2,1.9,0.4,Iris-setosa
5.3,3.3,1,0.5,Iris-setosa
5.4,3.4,1.1,0.6,Iris-setosa
5.5,3.5,1.2,0.1,Iris-setosa
5.6,3.6,1.3,0.2,Iris-setosa
5.7,3.7,1.4,0.3,Iris-setosa
4.3,3.8,1.5,0.4,Iris-setosa
4.4,3.9,1.6,0.5,Iris-setosa
4.5,4,1.7,0.6,Iris-setosa
4.6,4.1,1.8,0.4,Iris-setosa
4.7,4.2,1.9,0.5,Iris-setosa
4.9,2,3,1,Iris-versicolor
5,2.1,3.1,1.1,Iris-versicolor
5.1,2.2,3.2,1.2,Iris-versicolor
5.2,2.3,3.3,1.3,Iris-versicolor
5.3,2.4,3.4,1.4,Iris-versicolor
5.4,2.5,3.5,1.5,Iris-versicolor
5.5,2.6,3.6,1.6,Iris-versicolor
5.6,2.7,3.7,1.7,Iris-versicolor
5.7,2.8,3.8,1.8,Iris-versicolor
5.8,2.9,3.9,1,Iris-versicolor
5.9,3,4,1.1,Iris-versicolor
6,3.1,4.1,1.2,Iris-versicolor
6.1,3.2,4.2,1.3,Iris-versicolor
6.2,3.3,4.3,1.4,Iris-versicolor
6.3,3.4,4.4,1.5,Iris-versicolor
6.4,2,4.5,1.6,Iris-versicolor
6.5,2.1,4.6,1.7,Iris-versicolor
6.6,2.2,4.7,1.8,Iris-versicolor
6.7,2.3,4.8,1,Iris-versicolor
6.8,2.4,4.9,1.1,Iris-versicolor
6.9,2.5,5,1.2,Iris-versicolor
7,2.6,5.1,1.3,Iris-versicolor
4.9,2.5,4.5,1.4,Iris-virginica
5,2.6,4.6,1.5,Iris-virginica
5.1,2.7,4.7,1.6,Iris-virginica
5.2,2.8,4.8,1.7,Iris-virginica
5.3,2.9,4.9,1.8,Iris-virginica
5.4,3,5,1.9,Iris-virginica
5.5,3.1,5.1,2,Iris-virginica
5.6,3.2,5.2,2.1,Iris-virginica
5.7,3.3,5.3,2.2,Iris-virginica
5.8,2.5,5.4,2.3,Iris-virginica
5.9,2.6,5.5,2.4,Iris-virginica
6,2.7,5.6,2.5,Iris-virginica
6.1,2.8,5.7,1.4,Iris-virginica
6.2,2.9,5.8,1.5,Iris-virginica
6.3,3,5.9,1.6,Iris-virginica
6.4,3.1,6,1.7,Iris-virginica
6.5,3.2,6.1,1.8,Iris-virginica
6.6,3.3,6.2,1.9,Iris-virginica
6.7,2.5,6.3,2,Iris-virginica
6.8,2.6,6.4,2.1,Iris-virginica
6.9,2.7,6.5,2.2,Iris-virginica
7,2.8,6.6,2.3,Iris-virginica
7.1,2.9,6.7,2.4,Iris-virginica
7.2,3,6.8,2.5,Iris-virginica
7.3,3.1,6.9,1.4,Iris-virginica
7.4,3.2,4.5,1.5,Iris-virginica
7.5,3.3,4.6,1.6,Iris-virginica
7.6,2.5,4.7,1.7,Iris-virginica
7.7,2.6,4.8,1.8,Iris-virginica
7.8,2.7,4.9,1.9,Iris-virginica
7.9,2.8,5,2,Iris-virginica
Iris Test Set
@data
4.3,2.3,1,0.1,?
4.4,2.4,1.1,0.2,?
4.5,2.5,1.2,0.3,?
4.6,4.1,1.8,0.4,?
4.7,4.2,1.9,0.5,?
4.9,2,3,1,?
5,2.1,3.1,1.1,?
5.1,2.2,3.2,1.2,?
6.9,2.5,5,1.2,?
7,2.6,5.1,1.3,?
4.9,2.5,4.5,1.4,?
5,2.6,4.6,1.5,?
7.7,2.6,4.8,1.8,?
7.8,2.7,4.9,1.9,?
7.9,2.8,5,2,?
The following screenshot shows the classification rules that were generated when the J48 algorithm is applied to the given dataset.
Iris Test Set Result
@data
4.3,2.3,1,0.1,1,Iris-setosa,?
4.4,2.4,1.1,0.2,1,Iris-setosa,?
4.5,2.5,1.2,0.3,1,Iris-setosa,?
4.6,4.1,1.8,0.4,1,Iris-setosa,?
4.7,4.2,1.9,0.5,1,Iris-setosa,?
4.9,2,3,1,-1,Iris-versicolor,?
5,2.1,3.1,1.1,-1,Iris-versicolor,?
5.1,2.2,3.2,1.2,-1,Iris-versicolor,?
6.9,2.5,5,1.2,-1,Iris-versicolor,?
7,2.6,5.1,1.3,-1,Iris-versicolor,?
4.9,2.5,4.5,1.4,-1,Iris-virginica,?
5,2.6,4.6,1.5,-1,Iris-virginica,?
7.7,2.6,4.8,1.8,-1,Iris-virginica,?
7.8,2.7,4.9,1.9,-1,Iris-virginica,?
7.9,2.8,5,2,-1,Iris-virginica,?