DWDM - Case Study On Weka - Ceb624

This document describes a case study using the WEKA data mining tool to perform preprocessing and visualization on a customer dataset. It discusses loading the dataset into ARFF format, exploring attributes and statistics, applying filters for attribute selection and data transformation, and visualizing the data using histograms and scatter plots. The key steps taken include loading and exploring the raw data, selecting relevant attributes, applying filters like numeric to nominal conversion and removing instances, and visualizing the preprocessed data.

SREEJIT GOPINATH NAIR SIGN:

CEB624 DATE:
TECOMP B GRADE:

VI SEM PCE DEPT OF COMPUTER ENGINEERING

Case study using WEKA

Aim:-
To demonstrate preprocessing on the dataset Customer.arff. This includes creating an ARFF
file, reading it into WEKA, and using the WEKA Explorer.

Creating an ARFF file :-


Attribute-Relation File Format (ARFF) is a file format recognized by WEKA. An ARFF
file typically has a .arff extension and contains two sections – a Header section and a Data
section.
An example ARFF file for the Customer dataset, with both sections, looks like this:

@Relation Customer

@Attribute age {youth,middleage,senior}
@Attribute income {high,medium,low}
@Attribute student {yes,no}
@Attribute creditrating {fair,excellent}
@Attribute buyscomputer {yes,no}

@Data
youth,high,no,fair,no
youth,high,no,excellent,no
middleage,high,no,fair,yes
senior,medium,no,fair,yes
senior,low,yes,fair,yes
senior,low,yes,excellent,no
middleage,low,yes,excellent,yes
youth,medium,no,fair,no
youth,low,yes,fair,yes
senior,medium,yes,fair,yes
youth,medium,yes,excellent,yes
middleage,medium,no,excellent,yes
middleage,high,yes,fair,yes
senior,medium,no,excellent,no

Lines that begin with a % are comments. The @RELATION, @ATTRIBUTE and @DATA
declarations are case insensitive.
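
The two-section layout described above can be illustrated with a small reader. This is only a sketch in plain Python, not WEKA's own loader: the function name parse_arff and the SAMPLE text are made up for illustration, and only nominal attributes (as in the Customer file) are handled.

```python
import re

def parse_arff(text):
    """Parse a small nominal-only ARFF string into (relation, attributes, rows).

    attributes is a list of (name, [allowed values]) pairs.
    This is an illustrative helper, not part of WEKA.
    """
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):      # skip blank lines and % comments
            continue
        if in_data:                               # data section: comma-separated values
            rows.append([v.strip() for v in line.split(",")])
            continue
        keyword = line.split(None, 1)[0].lower()  # declarations are case insensitive
        if keyword == "@relation":
            relation = line.split(None, 1)[1].strip()
        elif keyword == "@attribute":
            # attribute name followed by a brace-delimited list of nominal values
            m = re.match(r"@\w+\s+([^\s{]+)\s*\{(.*)\}\s*$", line)
            if m is None:
                raise ValueError("only nominal attributes are handled: " + line)
            values = [v.strip() for v in m.group(2).split(",")]
            attributes.append((m.group(1), values))
        elif keyword == "@data":
            in_data = True
    return relation, attributes, rows

# Hypothetical two-attribute excerpt, mixing upper and lower case declarations.
SAMPLE = """% a comment line
@Relation Customer
@Attribute age{youth,middleage,senior}
@ATTRIBUTE buyscomputer {yes,no}
@data
youth,no
senior,yes
"""

relation, attributes, rows = parse_arff(SAMPLE)
```

Here the header yields the relation name and the attribute list, and every line after @data becomes one instance.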
The WEKA Explorer
When the Explorer is opened, the following tabs are available:
1. Preprocess. Choose and modify the data being acted on.
2. Classify. Train and test learning schemes that classify or perform regression.
3. Cluster. Learn clusters for the data.
4. Associate. Learn association rules for the data.
5. Select attributes. Select the most relevant attributes in the data.
6. Visualize. View an interactive 2D plot of the data.

Preprocessing :
Step 1: Load the data by clicking the Open file button in the Preprocess interface and
selecting the appropriate file.

Step 2: Once the data is loaded, WEKA recognizes the attributes and, while scanning the
data, computes some basic statistics on each attribute. The left panel shows the list of
recognized attributes, while the top panel indicates the names of the base relation (or
table) and the current working relation.

Step 3: Clicking on an attribute in the left panel shows basic statistics for that
attribute. For categorical (nominal) attributes, the frequency of each attribute value is
shown; for continuous (numeric) attributes, the minimum, maximum, mean and standard
deviation are shown.

Step 4: The visualization in the bottom-right panel takes the form of a cross-tabulation
across attributes.
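
The per-attribute statistics described in Step 3 can be reproduced by hand. The following is a minimal Python sketch, not WEKA code: the function names are illustrative, the AGE column is copied from the Customer data above, and the numeric values are hypothetical because the Customer file has no numeric attribute. The sample standard deviation is used, which is an assumption about how the figure is computed.

```python
from collections import Counter
from statistics import mean, stdev

def nominal_summary(values):
    """Frequency of each distinct value of a nominal attribute."""
    return dict(Counter(values))

def numeric_summary(values):
    """Min, max, mean and (sample) standard deviation of a numeric attribute."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "stddev": stdev(values),  # assumption: sample (n - 1) standard deviation
    }

# The age column of the Customer dataset, in file order.
AGE = ["youth", "youth", "middleage", "senior", "senior", "senior", "middleage",
       "youth", "youth", "senior", "youth", "middleage", "middleage", "senior"]

# Hypothetical numeric column, used only to exercise numeric_summary.
NUMERIC = [1.0, 2.0, 3.0, 4.0]

age_counts = nominal_summary(AGE)
numeric_stats = numeric_summary(NUMERIC)
```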

Step5: Selecting or filtering attributes


The Filter box is used to set up the required filters. At the left of the Filter box is a
Choose button. Once a filter has been selected, its name and options are shown in the field
next to the Choose button. Clicking on this field brings up a GenericObjectEditor dialog box,
which lets you configure the filter. Once the settings have been chosen, click OK to return
to the main Explorer window.
Now apply the filter to the data by pressing the Apply button at the right end of the Filter
panel. The Preprocess panel will then show the transformed data. The change can be undone
using the Undo button. Use the Edit button to view the data, before or after the
transformation, in the dataset editor.

Step 6: Visualization
WEKA offers several ways to visualize the data. The main GUI shows a histogram of the
distribution of a single selected attribute at a time; by default this is the class
attribute. Individual colors indicate individual classes. Moving the mouse over the
histogram shows the ranges and how many samples fall in each range. The Visualize All
button brings up a screen showing all distributions at once.
There is also a tab called Visualize. Clicking it opens the scatter plots for all
attribute pairs.
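
The class-colored histogram described above amounts to counting, for each value of the selected attribute, how many instances fall in each class. The following Python sketch shows that counting on the age and buyscomputer columns of the Customer data. It is not how WEKA draws its plots, and the function names and the text rendering are illustrative assumptions.

```python
from collections import Counter, defaultdict

# (age, buyscomputer) pairs from the Customer dataset, in file order.
PAIRS = [
    ("youth", "no"), ("youth", "no"), ("middleage", "yes"), ("senior", "yes"),
    ("senior", "yes"), ("senior", "no"), ("middleage", "yes"), ("youth", "no"),
    ("youth", "yes"), ("senior", "yes"), ("youth", "yes"), ("middleage", "yes"),
    ("middleage", "yes"), ("senior", "no"),
]

def class_histogram(pairs):
    """Count instances per attribute value, split by class label."""
    counts = defaultdict(Counter)
    for value, label in pairs:
        counts[value][label] += 1
    return {value: dict(per_class) for value, per_class in counts.items()}

def render(hist):
    """Draw one text bar per attribute value; each class is shown by its first letter."""
    lines = []
    for value, per_class in hist.items():
        bar = "".join(label[0] * n for label, n in sorted(per_class.items()))
        lines.append(f"{value:<10} {bar}")
    return "\n".join(lines)

hist = class_histogram(PAIRS)
chart = render(hist)
```

Each bar plays the role of one histogram column, with the letters standing in for the class colors.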

Task-1
Describe the chosen dataset and a few of its attributes.
Example: Description of the Iris dataset
Title: Iris data
Number of Instances: 150
Number of Attributes: 5 (4 numeric, 1 nominal class)

Attribute description for Iris:

Attribute 1 :- sepallength
Attribute 2 :- sepalwidth
Attribute 3 :- petallength
Attribute 4 :- petalwidth
Attribute 5 :- class
Task-2
List all the categorical (or nominal) attributes
Attribute 5 : Class
(In the Iris dataset only the class attribute is nominal; sepallength, sepalwidth,
petallength and petalwidth are numeric.)
Task-3
Which attributes do you think might be crucial in making the assessment?
The following attributes are crucial: sepallength, sepalwidth, petallength, petalwidth.
Measures used for this analysis include information gain, Gini index and gain ratio.
Using the WEKA tool, attribute relevance analysis can be carried out easily.
Steps: WEKA Explorer -> Preprocess -> Open file (credit-g.arff) -> select all -> click on
Select attributes -> choose the attribute evaluator (InfoGain) and Ranker as the search
method -> click on "Start"
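
The information gain measure named above can be worked through by hand. The following Python sketch computes it for each attribute of the Customer dataset listed earlier and sorts the attributes by gain. It illustrates the measure only and is not WEKA's evaluator; the function names are illustrative.

```python
from collections import Counter
from math import log2

NAMES = ["age", "income", "student", "creditrating", "buyscomputer"]

# Data section of Customer.arff; the last column is the class.
ROWS = [line.split(",") for line in """
youth,high,no,fair,no
youth,high,no,excellent,no
middleage,high,no,fair,yes
senior,medium,no,fair,yes
senior,low,yes,fair,yes
senior,low,yes,excellent,no
middleage,low,yes,excellent,yes
youth,medium,no,fair,no
youth,low,yes,fair,yes
senior,medium,yes,fair,yes
youth,medium,yes,excellent,yes
middleage,medium,no,excellent,yes
middleage,high,yes,fair,yes
senior,medium,no,excellent,no
""".split()]

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def info_gain(rows, attr_index, class_index=-1):
    """Entropy of the class minus the weighted entropy after splitting on one attribute."""
    labels = [r[class_index] for r in rows]
    partitions = {}
    for r in rows:
        partitions.setdefault(r[attr_index], []).append(r[class_index])
    remainder = sum(len(p) / len(rows) * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder

def rank_attributes(rows, names):
    """Return (gain, name) pairs for every non-class attribute, highest gain first."""
    gains = [(info_gain(rows, i), names[i]) for i in range(len(names) - 1)]
    return sorted(gains, reverse=True)

RANKING = rank_attributes(ROWS, NAMES)
```

On this data, age has the highest gain (about 0.247 bits), followed by student, creditrating and income.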
Task-4
Apply any two filters to the Customer dataset. Capture snapshots of the dataset before
filtering, of the GenericObjectEditor window of each filter, and of the transformed
dataset after filtering. Describe each chosen filter in a few lines. A few recommended
filters are listed below: (Choose -> weka -> filters -> unsupervised -> attribute)

Before Filter [snapshot]
After Filter (AddCluster) [snapshot]
1) NumericToNominal
2) MathExpression
3) InterquartileRange

Try the following unsupervised instance filters.
(Choose -> weka -> filters -> unsupervised -> instance)

1) RemoveMisclassified
2) RemovePercentage
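
To make the effect of two of these filters concrete, the following Python sketch mimics the general idea of NumericToNominal (an attribute filter) and RemovePercentage (an instance filter). It is an approximation under stated assumptions, not WEKA's implementation: the function names, the option name invert, the choice to drop rows from the start of the dataset, and the rounding rule are all assumptions made for illustration.

```python
def numeric_to_nominal(column):
    """Turn numeric values into nominal labels.

    Returns (converted column, sorted list of distinct labels).
    Assumption: each distinct number simply becomes one nominal label.
    """
    labels = [str(v) for v in sorted(set(column))]
    return [str(v) for v in column], labels

def remove_percentage(rows, percentage, invert=False):
    """Drop the first `percentage` percent of the rows.

    With invert=True, keep only that first part instead.
    Assumption: Python's round() is used here, which may differ from the
    rounding a real tool applies when the cut falls exactly halfway.
    """
    cut = round(len(rows) * percentage / 100)
    return rows[:cut] if invert else rows[cut:]

# Hypothetical inputs, used only to show the behaviour.
converted, label_set = numeric_to_nominal([3, 1, 3, 2])
kept = remove_percentage(list(range(10)), 30)
removed_part = remove_percentage(list(range(10)), 30, invert=True)
```

The before and after snapshots requested in the task correspond to the input and output of functions like these: the attribute filter changes the type of a column, while the instance filter changes the number of rows.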
Task-5
Display (using snapshots) any visualization technique for the dataset chosen
(Histogram or Scatter plot)

Histogram
CONCLUSION:
The experiment was implemented successfully. It helped in analyzing the data using
preprocessing and in visualizing the results.
