Wa0002.
INTRODUCTION:
Invoke Weka from the Windows Start menu (on Linux or the Mac, double-click
weka.jar or weka.app, respectively). This starts up the Weka GUI Chooser. Click the
Explorer button to enter the Weka Explorer. The Preprocess panel opens when
the Explorer interface starts. Click the Open file button and perform the
respective operations, as shown in the figure below.
THE PANELS:
1. PREPROCESS.
2. CLASSIFY.
3. CLUSTER.
4. ASSOCIATE.
5. SELECT ATTRIBUTES.
6. VISUALIZE.
PREPROCESS PANEL
As the result shows, the weather data has 14 instances, and 5 attributes called
outlook, temperature, humidity, windy, and play. Click on the name of an attribute in the
left subpanel to see information about the selected attribute on the right, such as its
values and how many times an instance in the dataset has a particular value. This
information is also shown in the form of a histogram. All attributes in this dataset are
“nominal”— that is, they have a predefined finite set of values. The last attribute, play, is
the “class” attribute; its value can be yes or no.
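The per-value counts behind the Preprocess panel's histogram can be sketched in a few lines of Python; the rows below are a hand-typed fragment of the weather.nominal data, not loaded from the actual file:

```python
from collections import Counter

# Toy subset of the weather.nominal data: each row is one instance.
# Attribute order: outlook, temperature, humidity, windy, play.
instances = [
    ("sunny", "hot", "high", "false", "no"),
    ("sunny", "hot", "high", "true", "no"),
    ("overcast", "hot", "high", "false", "yes"),
    ("rainy", "mild", "high", "false", "yes"),
]
attributes = ["outlook", "temperature", "humidity", "windy", "play"]

def value_counts(rows, attr_index):
    """Count how often each nominal value occurs for one attribute,
    mirroring the histogram shown in the Preprocess panel."""
    return Counter(row[attr_index] for row in rows)

print(value_counts(instances, attributes.index("outlook")))
# Counter({'sunny': 2, 'overcast': 1, 'rainy': 1})
```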
As you know, Weka “filters” can be used to modify datasets in a systematic fashion;
that is, they are data preprocessing tools. Reload the weather.nominal dataset, and let’s
remove an attribute from it. The appropriate filter is the unsupervised attribute filter Remove.
Choosing attributes
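A minimal sketch of what removing an attribute amounts to, dropping one column from every instance; the rows here are made up for illustration, not read from weather.nominal.arff:

```python
attributes = ["outlook", "temperature", "humidity", "windy", "play"]
instances = [
    ["sunny", "hot", "high", "false", "no"],
    ["overcast", "mild", "normal", "true", "yes"],
]

def remove_attribute(attrs, rows, name):
    """Return a new attribute list and dataset without the named attribute."""
    i = attrs.index(name)
    new_attrs = attrs[:i] + attrs[i + 1:]
    new_rows = [row[:i] + row[i + 1:] for row in rows]
    return new_attrs, new_rows

attrs2, rows2 = remove_attribute(attributes, instances, "humidity")
print(attrs2)  # ['outlook', 'temperature', 'windy', 'play']
```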
3. Clicking on one of the crosses opens up an Instance Info window, which lists the
values of all attributes for the selected instance. Close the Instance Info window again.
The selection fields at the top of the window containing the scatter plot determine
which attributes are used for the x- and y-axes. Change the x-axis to petalwidth and the
y-axis to petallength. The field showing Color: class (Num) can be used to change the
color coding.
Each of the barlike plots to the right of the scatter plot window represents a single
attribute. In each bar, instances are placed at the appropriate horizontal position and
scattered randomly in the vertical direction. Clicking a bar uses that attribute for the
x-axis of the scatter plot. Right-clicking a bar does the same for the y-axis. Use these
bars to change the x- and y-axes back to sepallength and petalwidth.
The Jitter slider displaces the cross for each instance randomly from its true position,
and can reveal situations where instances lie on top of one another.
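The jitter idea itself is easy to sketch: add a small random offset to each point so that coincident instances separate visually. The points below are invented for illustration:

```python
import random

def jitter(points, amount, seed=0):
    """Displace each (x, y) point by a small random offset, as the
    Jitter slider does, so coincident instances become visible."""
    rng = random.Random(seed)
    return [(x + rng.uniform(-amount, amount),
             y + rng.uniform(-amount, amount)) for x, y in points]

# Two instances lying exactly on top of one another separate once jittered.
pts = [(5.1, 0.2), (5.1, 0.2)]
jittered = jitter(pts, 0.05)
```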
The Select Instance button and the Reset, Clear, and Save buttons let you modify
the dataset. Certain instances can be selected and the others removed. Try the
Rectangle option: Select an area by left-clicking and dragging the mouse. The Reset
button changes into a Submit button. Click it, and all instances outside the rectangle
are deleted. You could use Save to save the modified dataset to a file. Reset restores
the original dataset.
CLASSIFY PANEL:
Now we apply a classifier to the weather data. Load the weather data again. Go
to the Preprocess panel, click the Open file button, and select
“weather.nominal.arff” from the data directory. Then switch to the Classify panel by
clicking the Classify tab at the top of the window.
OUTPUT:
The outcome of training and testing appears in the Classifier Output box on the right.
Scroll through the text and examine it. First, look at the part that describes the decision
tree, reproduced in the image below.
This represents the decision tree that was built, including the number of instances that
fall under each leaf.
The textual representation is clumsy to interpret, but Weka can generate an equivalent
graphical version.
Here’s how to get the graphical tree. Each time the Start button is pressed,
a new classifier is built and evaluated, and a new entry appears in the Result List panel
in the lower left corner.
Click the Start button.
For example, in the case of tenfold cross-validation this involves running the learning
algorithm 10 times to build and evaluate 10 classifiers. A model built from the full training
set is then printed into the Classifier Output area: This may involve running the learning
algorithm one final time. The remainder of the output depends on the test protocol
chosen under Test options.
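The fold splitting behind tenfold cross-validation can be sketched as follows; note that Weka additionally stratifies folds by class, which this sketch omits:

```python
def cross_validation_folds(n_instances, k=10):
    """Split instance indices into k folds; each fold serves once as the
    test set while the remaining k-1 folds form the training set."""
    indices = list(range(n_instances))
    folds = [indices[i::k] for i in range(k)]  # round-robin assignment
    splits = []
    for i in range(k):
        test = folds[i]
        train = [j for j in indices if j not in set(test)]
        splits.append((train, test))
    return splits

# The weather data has 14 instances; tenfold CV yields 10 train/test splits.
splits = cross_validation_folds(14, 10)
```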
Clustering Data :
WEKA contains “clusterers” for finding groups of similar instances in a dataset. The
clustering schemes available in WEKA are,
✓ k-Means,
✓ EM,
✓ Cobweb,
✓ X-means,
✓ Farthest First.
Clusters can be visualized and compared to “true” clusters (if given). Evaluation is
based on log-likelihood if the clustering scheme produces a probability distribution.
For this exercise we will use customer data contained in the “customers.arff” file
and analyze it with the k-means clustering scheme.
Steps:
(i) Select the file from WEKA
In ‘Preprocess’ window click on ‘Open file…’ button and select “weather.arff” file.
Click ‘Cluster’ tab at the top of WEKA Explorer window.
(ii) Choose the Cluster Scheme.
1. In the ‘Clusterer’ box click the ‘Choose’ button. In the pull-down menu select WEKA
Clusterers, and select the cluster scheme ‘SimpleKMeans’. Some implementations of K-means
can only handle numerical attributes, but Weka’s SimpleKMeans also accepts nominal
values, so we do not need to use a filter.
3. Set the value in the ‘numClusters’ box to 5 (instead of the default 2) because there
are five clusters in your .arff file. Leave the value of ‘seed’ as is. The seed value is used in
generating a random number, which is used for making the initial assignment of
instances to clusters. Note that, in general, K-means is quite sensitive to how clusters
are initially assigned. Thus, it is often necessary to try different values and evaluate the
results.
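A bare-bones sketch of the k-means procedure, showing where the seed enters: it fixes the random initial choice of centroids. The data points are invented, and this is a simplification, not Weka's SimpleKMeans implementation:

```python
import math
import random

def kmeans(points, k, seed, iters=10):
    """Minimal k-means on numeric points. The seed determines the random
    initial centroids, which is why results can vary between seeds."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k random points
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[i].append(p)
        # Recompute each centroid as the mean of its cluster.
        centroids = [
            tuple(sum(d) / len(c) for d in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Two well-separated groups; k=2 recovers them.
data = [(1.0, 1.0), (1.2, 0.8), (8.0, 8.0), (8.3, 7.9)]
centroids, clusters = kmeans(data, k=2, seed=1)
```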
(iii) Setting the test options.
1. Before you run the clustering algorithm, you need to choose ‘Cluster mode’.
2. Click on ‘Classes to cluster evaluation’ radio-button in ‘Cluster mode’ box and
select ‘Play’ in the pull-down box below. It means that you will compare how
well the chosen clusters match up with a pre-assigned class (‘‘Play’’) in
the data.
3. Once the options have been specified, you can run the clustering algorithm.
Click on the ‘Start’
button to execute the algorithm.
4. When training is complete, the ‘Clusterer output’ area on the right panel of the
‘Cluster’ window is filled with text describing the results of training and testing. A new
entry appears in the ‘Result list’ box on the left. These entries behave just like their
classification counterparts.
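The idea behind ‘Classes to clusters evaluation’ can be sketched as mapping each cluster to its majority class and counting the instances that disagree; the cluster assignments and labels below are hypothetical:

```python
from collections import Counter

def classes_to_clusters_errors(cluster_ids, class_labels):
    """Assign each cluster its majority class, then count instances
    whose actual class disagrees with their cluster's majority class."""
    errors = 0
    for cid in set(cluster_ids):
        members = [lab for c, lab in zip(cluster_ids, class_labels) if c == cid]
        majority = Counter(members).most_common(1)[0][0]
        errors += sum(1 for lab in members if lab != majority)
    return errors

# Hypothetical assignment of six instances to two clusters,
# compared against their 'play' class labels.
clusters = [0, 0, 0, 1, 1, 1]
labels   = ["yes", "yes", "no", "no", "no", "yes"]
print(classes_to_clusters_errors(clusters, labels))  # 2 incorrectly clustered
```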
CLUSTER OUTPUT
(iv) Analysing Results.
The clustering model shows the centroid of each cluster and statistics on the
number and percentage of instances assigned to each cluster. Cluster centroids
are the mean vectors for each cluster; each dimension value in a centroid is the
mean value of that dimension over the instances in the cluster.
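The centroid-as-mean-vector statement can be checked with a tiny sketch:

```python
def centroid(cluster_points):
    """The centroid is the mean vector of a cluster: each component is
    the mean of that dimension over the cluster's instances."""
    n = len(cluster_points)
    return tuple(sum(dim) / n for dim in zip(*cluster_points))

print(centroid([(2.0, 4.0), (4.0, 8.0)]))  # (3.0, 6.0)
```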
5. Left-click on ‘3’ in the ‘Class colour’ box and select a lighter color from the color
palette.
COLOUR PALETTE
6. You may want to save the resulting dataset, which includes each instance along
with its assigned cluster. To do so, click the ‘Save’ button in the visualization window
and save the result as the file “weather_kmeans.arff”.
ASSOCIATION PANEL
(i) Opening the file
1. Click the ‘Associate’ tab at the top of the ‘WEKA Explorer’ window. This brings up
the interface for the Apriori algorithm.
2. The association rule scheme cannot handle numeric values; therefore, for this
exercise you will use grocery store data from the “weather.arff” file, where all values
are nominal. Open the “weather.arff” file.
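The support and confidence measures that Apriori ranks rules by can be sketched on a few hand-made nominal transactions; these stand in for the loaded dataset and are not the real file contents:

```python
# Hand-made nominal transactions (attribute=value pairs).
transactions = [
    {"outlook=sunny", "humidity=high", "play=no"},
    {"outlook=sunny", "humidity=high", "play=no"},
    {"outlook=overcast", "humidity=high", "play=yes"},
    {"outlook=rainy", "humidity=normal", "play=yes"},
]

def support(itemset):
    """Fraction of transactions containing every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """support(A and B) / support(A) for the rule A => B."""
    return support(antecedent | consequent) / support(antecedent)

print(support({"humidity=high"}))                   # 0.75
print(confidence({"outlook=sunny"}, {"play=no"}))   # 1.0
```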
(ii) Setting the test options