DM & PA Lab Manual
DM & PA LAB DEPARTMENT OF CSE (DS)
LAB MANUAL
INDEX
1 Create a dataset using ARFF and CSV formats and load it into the WEKA Explorer.
2 Apply the following pre-processing filters to the “Weather” dataset:
i) Add ii) Remove iii) Discretize iv) Replace Missing Values v) Normalize
3 a) List all the categorical attributes and the real-valued attributes in the “German Credit” dataset.
b) Generate strong association rules using the Apriori algorithm on the “German Credit” dataset with min_sup = 60% and min_conf = 80%.
4 a) Implement classification using the Decision Tree algorithm on the ‘Weather’ dataset. Draw the confusion matrix and report the model with its accuracy.
b) Implement Bayesian classification and analyze the result on the ‘iris’ dataset.
c) Rank the performance of the J48, PART and OneR algorithms on the ‘Weather’ dataset using the Experimenter.
d) Perform an experiment using the ‘Knowledge Flow’ in the Weka 3.8.1 tool.
1. Download the software that matches your requirements from the link below.
http://www.cs.waikato.ac.nz/ml/weka/downloading.html
2. Java is required to install WEKA, so if Java is already installed on your machine, download only WEKA; otherwise download the package bundled with a JVM.
3. Open the file location and double-click the file.
4. Click Next.
5. Click I Agree.
6. Change the settings as required and click Next. Full and Associate files are the recommended settings.
8. If you want a shortcut, check the box and click Install.
9. The installation will start; wait a moment, it should finish within a minute.
11. That’s all! Click Finish, take a shovel and start mining. Best of luck.
This is the GUI you see when WEKA starts. It offers four options: Explorer, Experimenter, KnowledgeFlow and Simple CLI.
Understand the features of the WEKA toolkit: the Explorer, the Knowledge Flow interface, the Experimenter and the command-line interface.
Ans: WEKA
The Weka GUI Chooser (class weka.gui.GUIChooser) provides a starting point for launching Weka’s main GUI applications and supporting tools. If one prefers an MDI (“multiple document interface”) appearance, this is provided by an alternative launcher called “Main” (class weka.gui.Main). The GUI Chooser consists of four buttons, one for each of the four major Weka applications, and four menus.
• Explorer: An environment for exploring data with WEKA (the rest of this documentation deals with this application in more detail).
• Experimenter: An environment for performing experiments and conducting statistical tests between learning schemes.
• Knowledge Flow: This environment supports essentially the same functions as the Explorer but with a drag-and-drop interface. One advantage is that it supports incremental learning.
• Simple CLI: Provides a simple command-line interface that allows direct execution of WEKA commands.
1. Explorer
At the very top of the window, just below the title bar, is a row of tabs. When the Explorer
is first started only the first tab is active; the others are grayed out. This is because it is
necessary to open (and potentially pre-process) a data set before starting to explore the data.
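The tabs are as follows:
1. Preprocess. Choose and modify the data being acted on.
2. Classify. Train and test learning schemes that classify or perform regression.
3. Cluster. Learn clusters for the data.
4. Associate. Learn association rules for the data.
5. Select attributes. Select the most relevant attributes in the data.
6. Visualize. View an interactive 2D plot of the data.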
Once the tabs are active, clicking on them flicks between different screens, on which the
respective actions can be performed. The bottom area of the window (including the status box, the
log button, and the Weka bird) stays visible regardless of which section you are in. The Explorer
can be easily extended with custom tabs. The Wiki article “Adding tabs in the Explorer”
explains this in detail.
An ARFF (= Attribute-Relation File Format) file is an ASCII text file that describes a list of
instances sharing a set of attributes.
ARFF is not the only format that can be loaded: any file that can be converted with Weka’s “core converters” can be loaded as well. The following formats are currently supported:
• ARFF (+ compressed)
• C4.5
• CSV
• libsvm
• binary serialized instances
• XRFF (+ compressed)
Overview
ARFF files have two distinct sections. The first section is the Header information, which is followed by the Data information. The Header of the ARFF file contains the name of the relation, a list of the attributes (the columns in the data), and their types.
@RELATION iris
@ATTRIBUTE sepallength NUMERIC
@ATTRIBUTE sepalwidth NUMERIC
@ATTRIBUTE petallength NUMERIC
@ATTRIBUTE petalwidth NUMERIC
@ATTRIBUTE class {Iris-setosa, Iris-versicolor, Iris-virginica}
The Data of the ARFF file looks like the following:
@DATA
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa
5.4,3.9,1.7,0.4,Iris-setosa
4.6,3.4,1.4,0.3,Iris-setosa
5.0,3.4,1.5,0.2,Iris-setosa
4.4,2.9,1.4,0.2,Iris-setosa
4.9,3.1,1.5,0.1,Iris-setosa
The ARFF Header section of the file contains the relation declaration and attribute declarations.
The relation name is defined as the first line in the ARFF file. The format is:
@relation <relation-name>
where <relation-name> is a string. The string must be quoted if the name includes spaces.
Attribute declarations take the form of an ordered sequence of @attribute statements. Each attribute in the data set has its own @attribute statement, which uniquely defines the name of that attribute and its data type. The order in which the attributes are declared indicates the column position in the data section of the file. For example, if an attribute is the third one declared, then Weka expects all of that attribute’s values to be found in the third comma-delimited column.
The format for the @attribute statement is:
@attribute <attribute-name> <datatype>
where the <attribute-name> must start with an alphabetic character. If spaces are to be included in the name, then the entire name must be quoted.
The <datatype> can be any of the following:
• numeric
• integer is treated as numeric
• real is treated as numeric
• <nominal-specification>
• string
• date [<date-format>]
• relational for multi-instance data (for future use)
Numeric attributes
Nominal attributes
String attributes
String attributes allow us to create attributes containing arbitrary textual values. This is very useful in text-mining applications, as we can create datasets with string attributes and then write Weka filters to manipulate strings (like StringToWordVector). String attributes are declared as follows:
@attribute <name> string
Date attributes
Date attribute declarations take the form:
@attribute <name> date [<date-format>]
where <name> is the name for the attribute and <date-format> is an optional string specifying how date values should be parsed and printed (this is the same format used by Java’s SimpleDateFormat). The default format string accepts the ISO-8601 combined date and time format: yyyy-MM-dd'T'HH:mm:ss. Dates must be specified in the data section as the corresponding string representations of the date/time (see example below).
Relational attributes
The ARFF Data section of the file contains the data declaration line and the actual instance
lines.
@data
Attribute values for each instance are delimited by commas. They must appear in the order in which they were declared in the header section (i.e. the data corresponding to the nth @attribute declaration is always the nth field of the instance).
Values of string and nominal attributes are case sensitive, and any that contain spaces or the comment-delimiter character % must be quoted. (The code suggests that double quotes are acceptable and that a backslash will escape individual characters.)
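To make the header and data rules above concrete, here is a minimal Python sketch of an ARFF reader. It is an illustration under simplifying assumptions (dense data only, values unquoted or double-quoted, `?` read as a missing value, attribute names that contain spaces must be quoted) and is not how Weka itself reads ARFF; the function names `parse_arff` and `split_name` are hypothetical.

```python
import csv

NUMERIC_TYPES = {"numeric", "real", "integer"}   # integer and real are treated as numeric


def split_name(rest):
    """Split '<name> <type>' where <name> may be wrapped in quotes."""
    rest = rest.strip()
    if rest[0] in "'\"":
        end = rest.index(rest[0], 1)             # position of the closing quote
        return rest[1:end], rest[end + 1:].strip()
    name, type_part = rest.split(None, 1)
    return name, type_part.strip()


def parse_arff(text):
    """Return (relation, attributes, rows) for a small, dense ARFF text.

    attributes: list of (name, kind), where kind is "numeric", a list of
    nominal labels, or the raw type text (string, date ...).
    Missing values ('?') become None.
    """
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue                              # blank line or comment
        if in_data:
            values = next(csv.reader([line], skipinitialspace=True))
            row = []
            for (_, kind), value in zip(attributes, values):
                value = value.strip()
                if value == "?":
                    row.append(None)
                elif kind == "numeric":
                    row.append(float(value))
                else:
                    row.append(value)
            rows.append(row)
            continue
        parts = line.split(None, 1)
        keyword = parts[0].lower()
        rest = parts[1] if len(parts) > 1 else ""
        if keyword == "@relation":
            relation = rest.strip().strip("'\"")
        elif keyword == "@attribute":
            name, type_part = split_name(rest)
            if type_part.startswith("{"):
                labels = [v.strip() for v in type_part.strip("{}").split(",")]
                attributes.append((name, labels))
            elif type_part.lower() in NUMERIC_TYPES:
                attributes.append((name, "numeric"))
            else:
                attributes.append((name, type_part))
        elif keyword == "@data":
            in_data = True
    return relation, attributes, rows


SAMPLE = """% a cut-down weather file
@relation weather
@attribute outlook {sunny,rainy,overcast}
@attribute temperature numeric
@attribute play {yes,no}
@data
sunny,85,no
rainy,?,yes
"""

relation, attributes, rows = parse_arff(SAMPLE)
print(relation)     # weather
print(attributes)
print(rows)         # [['sunny', 85.0, 'no'], ['rainy', None, 'yes']]
```

The reader keeps the header and data phases separate with a single `in_data` flag, mirroring the two-section layout of the file.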
Dates must be specified in the data section using the string representation specified in the attribute declaration. For example:
@RELATION Timestamps
@ATTRIBUTE timestamp DATE "yyyy-MM-dd HH:mm:ss"
@DATA
"2001-04-03 12:12:12"
"2001-05-03 12:59:55"
Relational data must be enclosed within double quotes ("). For example, an instance of the MUSK1 dataset ("..." denotes an omission):
MUSK-188,"42,...,30",1
• contact-lens.arff
• cpu.arff
• cpu.with-vendor.arff
• diabetes.arff
• glass.arff
• ionosphere.arff
• ReutersCorn-test.arff
• ReutersGrain-train.arff
• ReutersGrain-test.arff
• segment-challenge.arff
• segment-test.arff
• soybean.arff
• supermarket.arff
• vote.arff
• weather.arff
• weather.nominal.arff
1. outlook
2. temperature
3. humidity
4. windy
5. play
1. sunny
2. overcast
3. rainy
Ans:
@relation weather.symbolic
@data
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
overcast,cool,normal,TRUE,yes
sunny,mild,high,FALSE,no
sunny,cool,normal,FALSE,yes
rainy,mild,normal,FALSE,yes
sunny,mild,normal,TRUE,yes
overcast,mild,high,TRUE,yes
overcast,hot,normal,FALSE,yes
rainy,mild,high,TRUE,no
Create a dataset using ARFF and CSV formats and load it into the WEKA Explorer.
Description:
We need to create a Weather table with a training data set that includes the attributes outlook, temperature, humidity, windy and play.
Procedure:
Steps:
@relation weather
@attribute outlook {sunny,rainy,overcast}
@attribute temperature numeric
@attribute humidity numeric
@attribute windy {true,false}
@attribute play {yes,no}
@data
sunny,85.0,85.0,false,no
overcast,80.0,90.0,true,no
sunny,83.0,86.0,false,yes
rainy,70.0,86.0,false,yes
rainy,68.0,80.0,false,yes
rainy,65.0,70.0,true,no
overcast,64.0,65.0,false,yes
sunny,72.0,95.0,true,no
sunny,69.0,70.0,false,yes
rainy,75.0,80.0,false,yes
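The task also asks for a CSV version. In CSV the same table is simply a header row of attribute names followed by the comma-separated rows, and the Explorer can open such a file directly because CSV is one of the supported formats. The Python sketch below holds that CSV text and converts it back into ARFF; the helper `csv_to_arff` is hypothetical (not a Weka tool) and assumes a column is numeric when every value parses as a number and nominal otherwise, listing nominal labels in order of first appearance, so the label order can differ from the hand-written header.

```python
import csv
import io

# The weather table from the ARFF file above, saved in CSV form (header row first).
WEATHER_CSV = """\
outlook,temperature,humidity,windy,play
sunny,85.0,85.0,false,no
overcast,80.0,90.0,true,no
sunny,83.0,86.0,false,yes
rainy,70.0,86.0,false,yes
rainy,68.0,80.0,false,yes
rainy,65.0,70.0,true,no
overcast,64.0,65.0,false,yes
sunny,72.0,95.0,true,no
sunny,69.0,70.0,false,yes
rainy,75.0,80.0,false,yes
"""


def is_number(text):
    """True if the text parses as a floating-point number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def csv_to_arff(csv_text, relation):
    """Build ARFF text from CSV text whose first row holds the attribute names."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    header, data = rows[0], rows[1:]
    lines = [f"@relation {relation}", ""]
    for i, name in enumerate(header):
        column = [row[i] for row in data]
        if all(is_number(v) for v in column):
            lines.append(f"@attribute {name} numeric")
        else:
            labels = list(dict.fromkeys(column))   # distinct values, first-seen order
            lines.append(f"@attribute {name} {{{','.join(labels)}}}")
    lines += ["", "@data"] + [",".join(row) for row in data]
    return "\n".join(lines) + "\n"


print(csv_to_arff(WEATHER_CSV, "weather"))
```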
Result:
Real-world databases are highly susceptible to noise, missing values and inconsistency because of their huge size, so the data can be pre-processed to improve data quality and handle missing values; pre-processing also improves the efficiency of mining.
1) Add
2) Remove
3) Normalization
4) Discretize
5) Replace Missing Values
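As a rough sketch of what three of these filters do to a single column, the Python functions below mimic their default behaviour in simplified form, assuming `None` marks a missing value. Normalize rescales numeric values to the range [0, 1]; Discretize splits the value range into equal-width bins (Weka's default is 10 bins, and this sketch returns bin indices rather than Weka's interval labels); Replace Missing Values fills gaps with the mean of a numeric column or the mode of a nominal one. The function names are hypothetical.

```python
import math
from collections import Counter


def normalize(values):
    """Min-max scaling to [0, 1]; None (missing) stays None.
    A constant column maps to 0.0 (a choice made for this sketch)."""
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    span = hi - lo
    return [None if v is None else ((v - lo) / span if span else 0.0) for v in values]


def discretize(values, bins=10):
    """Equal-width binning; returns a bin index 0..bins-1 per value.
    Each bin is treated as a right-closed interval (lower, upper]."""
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    width = (hi - lo) / bins
    result = []
    for v in values:
        if v is None:
            result.append(None)
        elif width == 0:
            result.append(0)
        else:
            index = math.ceil((v - lo) / width) - 1
            result.append(min(max(index, 0), bins - 1))   # clamp to a valid bin
    return result


def replace_missing(values):
    """Fill None with the mean (numeric column) or the mode (nominal column)."""
    present = [v for v in values if v is not None]
    if all(isinstance(v, (int, float)) for v in present):
        fill = sum(present) / len(present)
    else:
        fill = Counter(present).most_common(1)[0][0]
    return [fill if v is None else v for v in values]


temperature = [85.0, 80.0, 83.0, 70.0, 68.0, 65.0, 64.0, 72.0, 69.0, 75.0]
print(normalize(temperature))                    # 85.0 -> 1.0, 64.0 -> 0.0
print(discretize(temperature, bins=3))           # [2, 2, 2, 0, 0, 0, 0, 1, 0, 1]
print(replace_missing([1.0, None, 3.0]))         # [1.0, 2.0, 3.0]
print(replace_missing(["yes", None, "no", "yes"]))   # ['yes', 'yes', 'no', 'yes']
```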
Procedure:
@relation weather
@attribute outlook {sunny,rainy,overcast}
@attribute temperature numeric
@attribute humidity numeric
@attribute windy {true,false}
@attribute play {yes,no}
@data
sunny,85.0,85.0,false,no
overcast,80.0,90.0,true,no
sunny,83.0,86.0,false,yes
rainy,70.0,86.0,false,yes
rainy,68.0,80.0,false,yes
rainy,65.0,70.0,true,no
overcast,64.0,65.0,false,yes
sunny,72.0,95.0,true,no
sunny,69.0,70.0,false,yes
rainy,75.0,80.0,false,yes
Procedure:
8) Select the Add filter.
9) A new window opens.
10) In it, enter the attribute index, type, date format and nominal label values for the new attribute Climate.
11) Click OK.
12) Press the Apply button; a new attribute is added to the Weather table.
13) Save the file.
14) Click the Edit button; it shows the new Weather table in Weka.
Procedure:
3) Click Open file.
4) Select the Weather.arff file and click Open.
5) Click the Choose button and open the Filters option.
6) Under Filters there are Supervised and Unsupervised filters.
7) Click Unsupervised.
8) Select the Remove filter.
9) Select the attributes windy and play to remove.
10) Click the Remove button and then Save.
11) Click the Edit button; it shows the new Weather table in Weka.
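The Add and Remove steps are column operations on the table. Below is a minimal Python sketch with hypothetical helper names, holding the table as a header list plus row lists. Weka's Add filter fills the new attribute with missing values, modelled here as `None`, and takes a 1-based attribute index, whereas this sketch uses Python's 0-based positions.

```python
def add_attribute(header, rows, name, position):
    """Insert a new attribute at 0-based `position`, filled with None
    (standing in for Weka's '?' missing value)."""
    new_header = header[:position] + [name] + header[position:]
    new_rows = [row[:position] + [None] + row[position:] for row in rows]
    return new_header, new_rows


def remove_attributes(header, rows, names):
    """Drop every attribute whose name is in `names`."""
    keep = [i for i, h in enumerate(header) if h not in names]
    return [header[i] for i in keep], [[row[i] for i in keep] for row in rows]


header = ["outlook", "temperature", "humidity", "windy", "play"]
rows = [["sunny", 85.0, 85.0, "false", "no"],
        ["overcast", 80.0, 90.0, "true", "no"]]

# Add "climate" as the last attribute.
header2, rows2 = add_attribute(header, rows, "climate", len(header))
print(header2)    # ['outlook', 'temperature', 'humidity', 'windy', 'play', 'climate']

# Remove windy and play from the original table.
header3, rows3 = remove_attributes(header, rows, {"windy", "play"})
print(header3)    # ['outlook', 'temperature', 'humidity']
print(rows3)      # [['sunny', 85.0, 85.0], ['overcast', 80.0, 90.0]]
```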
Weather Table after removing attributes WINDY, PLAY:
Procedure:
Procedure:
OUTPUT:
Missing values
Result:
EXPERIMENT NO: 5
Aim:
a) List all the categorical attributes and real-valued attributes in the “German Credit” dataset.
Task 1: Credit Risk Assessment
Description: The business of banks is making loans. Assessing the creditworthiness of an applicant is of crucial importance. You have to develop a system to help a loan officer decide whether the credit of a customer is good or bad. A bank’s business rules regarding loans must consider two opposing factors. On the one hand, a bank wants to make as many loans as possible.
Interest on these loans is the bank’s profit source. On the other hand, a bank cannot afford to make too many bad loans. Too many bad loans could lead to the collapse of the bank. The bank’s loan policy must involve a compromise: not too strict and not too lenient.
To do the assignment, you first and foremost need some knowledge about the world of credit.
You can acquire such knowledge in a number of ways.
1. Knowledge engineering: Find a loan officer who is willing to talk. Interview her and try to represent her knowledge in the form of production rules.
2. Books: Find some training manuals for loan officers or perhaps a suitable textbook on finance. Translate this knowledge from text form to production rule form.
3. Common sense: Imagine yourself as a loan officer and make up reasonable rules which can
be used to judge the credit worthiness of a loan applicant.
4. Case histories: Find records of actual cases where competent loan officers correctly judged when, and when not, to approve a loan application.
Actual historical credit data is not always easy to come by because of confidentiality rules. Here is one such data set, consisting of 1000 actual cases collected in Germany. In spite of the fact that the data is German, you should probably make use of it for this assignment (unless you really can consult a real loan officer!).
There are 20 attributes used in judging a loan applicant (i.e., 7 numerical attributes and 13 categorical or nominal attributes). The goal is to classify the applicant into one of two categories: good or bad.
1. Checking_Status
2. Duration
3. Credit_history
4. Purpose
5. Credit_amount
6. Savings_status
7. Employment
8. Installment_Commitment
9. Personal_status
10. Other_parties
11. Residence_since
12. Property_Magnitude
13. Age
14. Other_payment_plans
15. Housing
16. Existing_credits
17. Job
18. Num_dependents
19. Own_telephone
20. Foreign_worker
21. Class
1. List all the categorical (or nominal) attributes and the real-valued attributes separately.
3. Click on invert.
4. Then all the categorical attributes are selected.
5. Click on remove
6. Click on visualize all.
Categorical (nominal) attributes:
1. Checking_Status
2. Credit_history
3. Purpose
4. Savings_status
5. Employment
6. Personal_status
7. Other_parties
8. Property_Magnitude
9. Other_payment_plans
10. Housing
11. Job
12. Own_telephone
13. Foreign_worker
Real-valued (numeric) attributes:
1. Duration
2. Credit_amount
3. Installment_Commitment
4. Residence_since
5. Age
6. Existing_credits
7. Num_dependents
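The same split can be read straight off an ARFF header: an attribute declared with a `{...}` label list is nominal (categorical), and one declared `numeric`, `real` or `integer` is numeric. The Python sketch below (hypothetical function name, attribute names without spaces assumed) applies that rule. It is shown on the Weather header for brevity; run on the header of credit-g.arff, the German Credit file distributed with WEKA, it should list the 13 categorical and 7 numeric attributes above, with the nominal class attribute appearing in the categorical list too.

```python
WEATHER_HEADER = """\
@relation weather
@attribute outlook {sunny, rainy, overcast}
@attribute temperature numeric
@attribute humidity numeric
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
"""


def split_attribute_types(arff_header):
    """Return (nominal, numeric) attribute names from the @attribute lines."""
    nominal, numeric = [], []
    for line in arff_header.splitlines():
        parts = line.strip().split(None, 2)       # keyword, name, type
        if len(parts) < 3 or parts[0].lower() != "@attribute":
            continue                               # not an attribute declaration
        name, kind = parts[1], parts[2].strip()
        if kind.startswith("{"):
            nominal.append(name)
        elif kind.lower() in ("numeric", "real", "integer"):
            numeric.append(name)
    return nominal, numeric


nominal, numeric = split_attribute_types(WEATHER_HEADER)
print("Categorical:", nominal)    # ['outlook', 'windy', 'play']
print("Real-valued:", numeric)    # ['temperature', 'humidity']
```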
Classification is the process of finding a model that describes and distinguishes data classes and concepts, for the purpose of prediction.
Decision Tree:
The root node represents an attribute, and internal nodes also represent attributes. External (leaf) nodes are the classes, and each branch represents a value of the attribute tested at its node.
A decision tree also encodes a set of rules for a given data set. Building and evaluating it uses two subsets of the data: a training data set and a testing data set. The training data set is previously classified data; the testing data set is newly generated data.
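To see how a tree learner chooses the attribute at the root, here is a short Python sketch that computes information gain (the entropy of the class minus the weighted entropy after splitting) on the 14-row nominal weather data listed earlier in this manual. Information gain is the ID3 criterion; J48, Weka's implementation of C4.5, refines it into the gain ratio, so treat this as an illustration of the idea rather than J48's exact computation. The function names are hypothetical.

```python
import math
from collections import Counter

# The 14-row nominal weather data listed earlier
# (outlook, temperature, humidity, windy, play); play is the class.
WEATHER = [line.split(",") for line in """\
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
overcast,cool,normal,TRUE,yes
sunny,mild,high,FALSE,no
sunny,cool,normal,FALSE,yes
rainy,mild,normal,FALSE,yes
sunny,mild,normal,TRUE,yes
overcast,mild,high,TRUE,yes
overcast,hot,normal,FALSE,yes
rainy,mild,high,TRUE,no""".splitlines()]
ATTRIBUTES = ["outlook", "temperature", "humidity", "windy"]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(n / total * math.log2(n / total) for n in Counter(labels).values())


def info_gain(rows, attr_index, class_index=-1):
    """Class entropy minus the weighted entropy after splitting on one attribute."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attr_index], []).append(row[class_index])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[class_index] for row in rows]) - remainder


for i, name in enumerate(ATTRIBUTES):
    print(f"{name:12s} {info_gain(WEATHER, i):.3f}")
# outlook 0.247, temperature 0.029, humidity 0.152, windy 0.048
# -> outlook has the largest gain, so it would be chosen for the root
```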
Procedure:
1) Open Start -> Programs -> Accessories -> Notepad
2) Type the following training data set in Notepad for the Weather table.
@relation weather
@attribute outlook {sunny, rainy, overcast}
@attribute temperature numeric
@attribute humidity numeric
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
rainy,68,80,FALSE,yes
rainy,65,70,TRUE,no
overcast,64,65,TRUE,yes
sunny,72,95,FALSE,no
sunny,69,70,FALSE,yes
rainy,75,80,FALSE,yes
sunny,75,70,TRUE,yes
overcast,72,90,TRUE,yes
overcast,81,75,FALSE,yes
rainy,71,91,TRUE,no
5) Click the Choose button and select the trees option.
6) Click J48.
7) Click the Start button; the output is displayed on the right side of the window.
8) Right-click the entry in the result list and select the Visualize Tree option.
9) The decision tree is displayed in a new window.
Output:
Decision Tree:
Scheme:       weka.classifiers.bayes.NaiveBayes
Relation:     iris
Instances:    150
Attributes:   5
              sepallength
              sepalwidth
              petallength
              petalwidth
              class
Test mode:    10-fold cross-validation

                 Class
Attribute        Iris-setosa  Iris-versicolor  Iris-virginica
                      (0.33)           (0.33)          (0.33)
===============================================================
sepallength
  mean                4.9913           5.9379          6.5795
  std. dev.            0.355           0.5042          0.6353
  weight sum              50               50              50
  precision           0.1059           0.1059          0.1059
sepalwidth
  mean                3.4015           2.7687          2.9629
  std. dev.           0.3925           0.3038          0.3088
  weight sum              50               50              50
  precision           0.1091           0.1091          0.1091
petallength
  mean                1.4694           4.2452          5.5516
  std. dev.           0.1782           0.4712          0.5529
  weight sum              50               50              50
  precision           0.1405           0.1405          0.1405
petalwidth
  mean                0.2743           1.3097          2.0343
  std. dev.           0.1096           0.1915          0.2646
  weight sum              50               50              50
  precision           0.1143           0.1143          0.1143
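The table above is the whole Naive Bayes model: a prior for each class and, for every numeric attribute, a normal distribution given by its mean and standard deviation. The Python sketch below plugs those printed values into the naive Bayes decision rule (choose the class with the largest log prior plus sum of log densities). It ignores the precision row, so its probabilities are not guaranteed to match Weka's exactly; the names `MODEL`, `log_normal_pdf` and `classify` are hypothetical.

```python
import math

# (mean, std. dev.) per attribute, copied from the NaiveBayes output above,
# in attribute order: sepallength, sepalwidth, petallength, petalwidth.
MODEL = {
    "Iris-setosa":     [(4.9913, 0.355), (3.4015, 0.3925), (1.4694, 0.1782), (0.2743, 0.1096)],
    "Iris-versicolor": [(5.9379, 0.5042), (2.7687, 0.3038), (4.2452, 0.4712), (1.3097, 0.1915)],
    "Iris-virginica":  [(6.5795, 0.6353), (2.9629, 0.3088), (5.5516, 0.5529), (2.0343, 0.2646)],
}
PRIOR = {label: 1 / 3 for label in MODEL}   # shown as (0.33) above


def log_normal_pdf(x, mean, std):
    """Log of the normal (Gaussian) density at x."""
    return -math.log(std * math.sqrt(2 * math.pi)) - (x - mean) ** 2 / (2 * std ** 2)


def classify(instance):
    """Naive Bayes: argmax over classes of log prior + sum of log densities."""
    scores = {
        label: math.log(PRIOR[label])
        + sum(log_normal_pdf(x, m, s) for x, (m, s) in zip(instance, params))
        for label, params in MODEL.items()
    }
    return max(scores, key=scores.get)


print(classify([5.1, 3.5, 1.4, 0.2]))   # first @DATA row of iris -> Iris-setosa
print(classify([6.7, 3.0, 5.2, 2.3]))   # a second measurement -> Iris-virginica
```

Working in log space avoids multiplying many small densities together, which can underflow to zero on larger attribute sets.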
  a  b  c   <-- classified as
 50  0  0 |  a = Iris-setosa
  0 48  2 |  b = Iris-versicolor
  0  4 46 |  c = Iris-virginica
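Accuracy and per-class figures follow directly from the confusion matrix, whose rows are the actual classes and whose columns are the predicted classes. For the matrix above, 50 + 48 + 46 = 144 of the 150 instances lie on the diagonal, which gives 96% accuracy under 10-fold cross-validation. A small Python helper (hypothetical name `summarize`) that computes this:

```python
def summarize(matrix, labels):
    """Accuracy plus per-class recall and precision.

    Rows of `matrix` are actual classes, columns are predicted classes
    (the "<-- classified as" layout Weka prints).
    """
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    per_class = {}
    for i, label in enumerate(labels):
        predicted = sum(row[i] for row in matrix)      # column total
        per_class[label] = {
            "recall": matrix[i][i] / sum(matrix[i]),
            "precision": matrix[i][i] / predicted if predicted else 0.0,
        }
    return correct / total, per_class


matrix = [[50, 0, 0],
          [0, 48, 2],
          [0, 4, 46]]
labels = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]
accuracy, per_class = summarize(matrix, labels)
print(f"accuracy = {accuracy:.2%}")     # accuracy = 96.00%
for label, scores in per_class.items():
    print(label, {k: round(v, 3) for k, v in scores.items()})
```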
c) Rank the performance of J48, PART and OneR Algorithms on “Weather” Dataset.
Aim: J48 Algorithm.
Description:
Cross-validation, sometimes called rotation estimation, is a technique for assessing how the
results of a statistical analysis will generalize to an independent data set. It is mainly used in settings where
the goal is prediction, and one wants to estimate how accurately a predictive model will perform in
practice. One round of cross-validation involves partitioning a sample of data into complementary subsets,
performing the analysis on one subset (called the training set), and validating the analysis on the other
subset (called the validation set or testing set).
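The partitioning step can be sketched in Python as k-fold cross-validation: each instance lands in exactly one test fold, and the other k - 1 folds form the training set for that round. Weka's default is 10 folds and it also stratifies the folds by class, which this minimal sketch (hypothetical function name, plain random shuffle) does not do.

```python
import random


def k_fold_indices(n, k=10, seed=1):
    """Yield (train, test) index lists for k-fold cross-validation.

    Indices are shuffled once, then dealt round-robin into k folds, so
    every instance appears in exactly one test fold.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


# The 14-instance weather data with 10 folds: each test fold holds 1 or 2 instances.
for train, test in k_fold_indices(14, k=10):
    print(len(train), len(test))
```

The accuracy reported for cross-validation is then the total number of correct predictions over all k test folds divided by the number of instances.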
Procedure:
Procedure:
1) Start -> Programs -> Weka 3.4
2) Open Knowledge Flow.
3) Select Data Source tab & choose Arff Loader.
4) Place Arff Loader component on the layout area by clicking on that component.
5) Specify an ARFF file to load: right-click the Arff Loader icon, and from the pop-up menu that appears select Configure and browse to the location of weather.arff.
6) Click the Evaluation tab, choose Class Assigner and place it on the layout.
7) Connect the Arff Loader to the Class Assigner: right-click the Arff Loader and select the Data Set option; a link is established.
8) Right-click the Class Assigner and choose Configure; in the window that appears, specify the class attribute of the data.
9) Select Evaluation tab & select Cross-Validation Fold Maker & place it on the layout.
10) Now connect the Class Assigner to the Cross-Validation Fold Maker.
11) Select Classifiers tab & select J48 component & place it on the layout.
12) Now connect Cross-Validation Fold Maker to J48 twice; first choose Training Data Set option and
then Test Data Set option.
13) Select Evaluation Tab & select Classifier Performance Evaluator component & place it on the
layout.
14) Connect J48 to Classifier Performance Evaluator component by right clicking on J48 & selecting
Batch Classifier.
15) Select Visualization tab & select Text Viewer component & place it on the layout.
16) Connect Text Viewer to Classifier Performance Evaluator by right clicking on Text
PART:
OneR
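OneR ("one rule") builds one rule per attribute, mapping each attribute value to its most frequent class, and keeps the attribute whose rule makes the fewest errors on the training data. A minimal Python sketch on the 14-row nominal weather data from earlier in this manual follows; it handles nominal attributes only (Weka's OneR also buckets numeric ones), breaks ties in favour of the attribute listed first, and the function name `one_r` is hypothetical.

```python
from collections import Counter, defaultdict

# The 14-row nominal weather data listed earlier; the last column is the class.
WEATHER = [line.split(",") for line in """\
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
overcast,cool,normal,TRUE,yes
sunny,mild,high,FALSE,no
sunny,cool,normal,FALSE,yes
rainy,mild,normal,FALSE,yes
sunny,mild,normal,TRUE,yes
overcast,mild,high,TRUE,yes
overcast,hot,normal,FALSE,yes
rainy,mild,high,TRUE,no""".splitlines()]
ATTRIBUTES = ["outlook", "temperature", "humidity", "windy"]


def one_r(rows, attributes, class_index=-1):
    """Return (attribute, rule, errors) for the best single-attribute rule."""
    best = None
    for a, name in enumerate(attributes):
        counts = defaultdict(Counter)             # attribute value -> class frequencies
        for row in rows:
            counts[row[a]][row[class_index]] += 1
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(sum(c.values()) - c[rule[value]] for value, c in counts.items())
        if best is None or errors < best[2]:      # strict '<': earlier attribute wins ties
            best = (name, rule, errors)
    return best


attribute, rule, errors = one_r(WEATHER, ATTRIBUTES)
print(attribute, rule, f"{errors}/{len(WEATHER)} errors")
# outlook {'sunny': 'no', 'overcast': 'yes', 'rainy': 'yes'} 4/14 errors
# (humidity also makes 4 errors; outlook wins because it is listed first)
```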
Aim:
Description:
The Knowledge Flow provides an alternative to the Explorer as a graphical front end to WEKA’s algorithms. The Knowledge Flow is a work in progress, so some of the functionality of the Explorer is not yet available. On the other hand, there are things that can be done in the Knowledge Flow but not in the Explorer. The Knowledge Flow presents a dataflow interface to WEKA: the user selects WEKA components from a toolbar, places them on a layout canvas and connects them together to form a knowledge flow for processing and analyzing data.
Procedure:
6) The Explorer shows many options. Click ‘Open file’ and select the ARFF file.
7) Click the Edit button, which shows the employee table in Weka.
Output:
9) Right-click Attribute Selection, select the Configure option and choose the best attributes for the Employee data.
10) Right-click Normalize, select the Data Set option and establish a link between Normalize and Arff Saver.
11) Right-click Arff Saver and select the Configure option; in the new window, set the path and enter .arff in the Look In dialog box to save the normalized data.
12) Right-click Arff Loader and click the Start Loading option; every component is then executed in turn.
13) Check whether the output file was created at the chosen path.
14) Rename the data file to a.arff.
15) Double-click a.arff; the output opens automatically in MS Excel.
Result:
Procedure:
LINEAR REGRESSION:
In statistics, linear regression is an approach for modeling the relationship between a scalar dependent variable Y and one or more explanatory variables denoted X. The case of a single explanatory variable is called simple linear regression.
The simple linear regression equation is: Y = aX + b, where a is the regression coefficient (slope) and b is the intercept.
PROBLEM:
Consider the dataset below, where x is the number of years of working experience of a college graduate and y is the corresponding salary of the graduate. Build a regression equation and predict the salary of a college graduate whose experience is 10 years.
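For Y = aX + b, the least-squares estimates are a = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and b = ȳ - a·x̄, where x̄ and ȳ are the means of x and y. Since the input table for this problem is not reproduced in this copy, the Python sketch below uses hypothetical (experience, salary) pairs purely for illustration; substitute the real data. The function name `fit_line` is also hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares estimates of a (slope) and b (intercept) in Y = aX + b."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    a = sxy / sxx
    b = y_mean - a * x_mean
    return a, b


# Hypothetical data: years of experience and salary (in thousands).
experience = [1, 3, 5, 7, 9]
salary = [30, 36, 44, 50, 60]

a, b = fit_line(experience, salary)
print(f"Y = {a:.2f}X + {b:.2f}")                          # Y = 3.70X + 25.50
print("Predicted salary for 10 years:", a * 10 + b)       # about 62.5
```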
Input:
Output:
Result: Thus the concept of regression was applied to the given training dataset and implemented.
Scheme:       weka.classifiers.lazy.IBk -K 1 -W 0 -A "weka.core.neighboursearch.LinearNNSearch -A \"weka.core.EuclideanDistance -R first-last\""
Relation:     iris
Instances:    150
Attributes:   5
              sepallength
              sepalwidth
              petallength
              petalwidth
              class
Test mode:    evaluate on training data

               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
                  1        0        1         1        1          1      Iris-setosa
                  1        0        1         1        1          1      Iris-versicolor
                  1        0        1         1        1          1      Iris-virginica
Weighted Avg.     1        0        1         1        1          1

  a  b  c   <-- classified as
 50  0  0 |  a = Iris-setosa
  0 50  0 |  b = Iris-versicolor
  0  0 50 |  c = Iris-virginica
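With -K 1, IBk predicts the class of the single nearest training instance. When the test mode is "evaluate on training data", every instance finds itself at distance 0, which is why all the figures above are perfect; this says little about performance on new data. A minimal Python sketch of 1-nearest-neighbour with plain Euclidean distance follows (Weka's EuclideanDistance additionally normalizes attributes by default, which this sketch does not); the data points and the function name are hypothetical.

```python
import math


def nearest_neighbour(train, query):
    """1-NN: class of the training instance with the smallest Euclidean distance."""
    best_label, best_dist = None, math.inf
    for features, label in train:
        dist = math.dist(features, query)
        if dist < best_dist:                    # ties keep the first instance found
            best_label, best_dist = label, dist
    return best_label


# Hypothetical 2-attribute training data.
train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((5.0, 5.0), "B"), ((5.5, 4.5), "B")]

# "Evaluate on training data": each instance is its own neighbour at distance 0.
correct = sum(nearest_neighbour(train, f) == label for f, label in train)
print(f"training accuracy: {correct}/{len(train)}")     # 4/4
print(nearest_neighbour(train, (4.0, 4.0)))            # B
```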
Decision Tree:
Procedure:
1) Open Start -> Programs -> Accessories -> Notepad
2) Type the following training data set in Notepad for the Customer table.
@relation customer
@attribute name {x,y,z,u,v,l,w,q,r,n}
@attribute age {youth,middle,senior}
@attribute income {high,medium,low}
@attribute class {A,B}
@data
x,youth,high,A
y,youth,low,B
z,middle,high,A
u,middle,low,B
v,senior,high,A
l,senior,low,B
w,youth,high,A
q,youth,low,B
r,middle,high,A
n,senior,high,A
Training Data Set -Customer Table
EXPERIMENT NO: 12
Implementation of K-Means Clustering and Hierarchical Clustering Algorithms.
Aim: Write a procedure for Clustering Customer data using Simple K Means Algorithm.
Description:
Cluster analysis or clustering is the task of assigning a set of objects into groups (called clusters) so that the objects in the same cluster are more similar (in some sense or another) to each other than to those in other clusters. Clustering is a main task of exploratory data mining, and a common technique for statistical data analysis used in many fields, including machine learning, pattern recognition, image analysis, information retrieval and bioinformatics.
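Simple k-means alternates two steps until no point changes cluster: assign every point to its nearest centroid, then move each centroid to the mean of its points. A Python sketch on hypothetical 2-D points is below. It takes the initial centroids as an argument so the result is deterministic, whereas SimpleKMeans chooses them using a random seed, and it handles numeric attributes only; the nominal Customer data would need the most frequent value in place of the mean, which this sketch does not do. The function name `k_means` is hypothetical.

```python
import math


def k_means(points, centroids, max_iter=100):
    """Lloyd's k-means: returns (centroids, cluster index per point)."""
    centroids = [tuple(c) for c in centroids]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        new_assignment = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_assignment == assignment:
            break                                   # converged: nothing moved
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, c in zip(points, assignment) if c == k]
            if members:                             # keep the old centroid if empty
                centroids[k] = tuple(sum(d) / len(members) for d in zip(*members))
    return centroids, assignment


# Hypothetical 2-D points forming two obvious groups.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]
centroids, assignment = k_means(points, [(1, 1), (8, 8)])
print(assignment)    # [0, 0, 0, 1, 1, 1]
print(centroids)
```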
Procedure:
@data
x,youth,high,A
y,youth,low,B
z,middle,high,A
u,middle,low,B
v,senior,high,A
l,senior,low,B
w,youth,high,A
q,youth,low,B
r,middle,high,A
n,senior,high,A
Procedure:
1) Click Start -> Programs -> Weka 3.4
2) Click on Explorer.
3) Click on open file & then select Customer.arff file.
4) Click the Cluster tab, which lists the different clustering algorithms.
5) Click on Choose button and then select SimpleKMeans algorithm.
6) Click on Start button and then output will be displayed on the screen.
Output:
Aim:
Description:
This program visualizes and compares the attributes of the data set, using the selected attributes and the chosen display settings. The visualization is shown as a 2-D representation of the data.
Procedure:
1) Open Start -> Programs -> Accessories -> Notepad
2) Type the following training data set in Notepad for the Weather table.
@relation weather
@attribute outlook {sunny, rainy, overcast}
@attribute temperature numeric
@attribute humidity numeric
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
rainy,68,80,FALSE,yes
rainy,65,70,TRUE,no
overcast,64,65,TRUE,yes
sunny,72,95,FALSE,no
sunny,69,70,FALSE,yes
rainy,75,80,FALSE,yes
sunny,75,70,TRUE,yes
overcast,72,90,TRUE,yes
overcast,81,75,FALSE,yes
rainy,71,91,TRUE,no
Plot Matrix:
Procedure:
1) Open Start -> Programs -> Weka-3-4 -> Weka-3-4
2) Open the Explorer and click Preprocess; a new window appears. In that window select the weather.arff file, and the data is displayed.
3) Click the Visualize tab at the top of the menu bar.
4) When the Visualize tab is selected, the plot matrix is displayed on the screen.
Output:
5) Click the Select Attribute button, select the Outlook attribute and click OK.
6) Click the Update button to display the output.
7) Click the Select Attribute button, select the Temperature attribute and click OK.
8) Increase the Plot Size and Point Size.
9) Click the Update button to display the output.
10) Click the Select Attribute button, select the Humidity attribute and click OK.
11) Click the Update button to display the output.
12) Click the Select Attribute button, select the Windy attribute and click OK.
13) Increase the Jitter.
14) Click the Update button to display the output.
15) Click the Select Attribute button, select the Play attribute and click OK.
Output:
Output:
Output: