DWDM - Case Study On Weka - Ceb624
CEB624 DATE:
TECOMP B GRADE:
Aim:-
Demonstration of preprocessing on the dataset Customer.arff. This includes creating an
ARFF file, reading it into WEKA, and using the WEKA Explorer.
@Relation Customer
@Attribute age {youth,middleage,senior}
@Attribute income {high,medium,low}
@Attribute student {yes,no}
@Attribute creditrating {fair,excellent}
@Attribute buyscomputer {yes,no}
@Data
youth,high,no,fair,no
youth,high,no,excellent,no
middleage,high,no,fair,yes
senior,medium,no,fair,yes
senior,low,yes,fair,yes
senior,low,yes,excellent,no
middleage,low,yes,excellent,yes
youth,medium,no,fair,no
youth,low,yes,fair,yes
senior,medium,yes,fair,yes
youth,medium,yes,excellent,yes
middleage,medium,no,excellent,yes
middleage,high,yes,fair,yes
senior,medium,no,excellent,no
Lines that begin with a % are comments. The @RELATION, @ATTRIBUTE and @DATA
declarations are case insensitive.
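The ARFF layout described above (comment lines, case-insensitive declarations, then data rows) can be illustrated with a small reader. This is only a sketch for nominal-only files such as Customer.arff, not WEKA's own parser; the function name parse_arff, the leading comment line, and the shortened three-row excerpt are assumptions made for the example.

```python
import re

# Excerpt of Customer.arff (first three data rows only), with one comment line
# added to show how '%' lines are skipped.
ARFF_TEXT = """\
% Customer data (comment line, ignored by the reader)
@Relation Customer
@Attribute age{youth,middleage,senior}
@Attribute income{high,medium,low}
@Attribute student{yes,no}
@Attribute creditrating{fair,excellent}
@Attribute buyscomputer{yes,no}
@Data
youth,high,no,fair,no
youth,high,no,excellent,no
middleage,high,no,fair,yes
"""


def parse_arff(text):
    """Return (relation, attributes, rows) for a nominal-only ARFF string."""
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # blank line or comment
        lower = line.lower()  # declarations are case insensitive
        if in_data:
            rows.append([value.strip() for value in line.split(",")])
        elif lower.startswith("@relation"):
            relation = line[len("@relation"):].strip()
        elif lower.startswith("@attribute"):
            match = re.match(r"\s*([^\s{]+)\s*\{(.*)\}\s*$", line[len("@attribute"):])
            if match is None:
                raise ValueError(f"unsupported attribute declaration: {line}")
            values = [value.strip() for value in match.group(2).split(",")]
            attributes.append((match.group(1), values))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


relation, attributes, rows = parse_arff(ARFF_TEXT)
print(relation, [name for name, _ in attributes], len(rows))
```

The reader only handles nominal attributes written as name{v1,v2,...}; numeric, string, or date attributes and quoted values would need extra handling.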
The WEKA Explorer
When the Explorer is opened, the following tabs are available:
1. Preprocess. Choose and modify the data being acted on.
2. Classify. Train and test learning schemes that classify or perform regression.
3. Cluster. Learn clusters for the data.
4. Associate. Learn association rules for the data.
5. Select attributes. Select the most relevant attributes in the data.
6. Visualize. View an interactive 2D plot of the data.
Preprocessing:
Step 1: Load the data by clicking the Open file button in the Preprocess interface and
selecting the appropriate file.
Step 2: Once the data is loaded, WEKA recognizes the attributes and, while scanning the
data, computes some basic statistics on each attribute. The left panel shows the list of
recognized attributes, while the top panel shows the names of the base relation (or
table) and the current working relation.
Step 3: Clicking on an attribute in the left panel shows basic statistics for that
attribute. For categorical (nominal) attributes, the frequency of each attribute value is
shown; for continuous (numeric) attributes, the minimum, maximum, mean, and standard
deviation are shown.
Step 4: The visualization in the bottom-right panel shows a cross-tabulation across
attributes.
Step 6: Visualization:
WEKA offers several ways to visualize the data. The main GUI shows a histogram of the
distribution of a single selected attribute at a time; by default, this is the class
attribute. Individual colors indicate individual classes. Moving the mouse over the
histogram shows the ranges and how many samples fall in each range. The Visualize All
button brings up a screen showing all distributions at once.
There is also a tab called Visualize. Clicking it opens the scatter plots for all
attribute pairs.
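The per-attribute statistics and class-colored histograms described in the steps above boil down to simple counts for a nominal dataset. The following sketch reproduces them for the Customer data in plain Python; the helper names value_counts and counts_by_class are hypothetical and are not part of WEKA.

```python
from collections import Counter

ATTRIBUTES = ["age", "income", "student", "creditrating", "buyscomputer"]

# The 14 data rows of Customer.arff.
DATA = """\
youth,high,no,fair,no
youth,high,no,excellent,no
middleage,high,no,fair,yes
senior,medium,no,fair,yes
senior,low,yes,fair,yes
senior,low,yes,excellent,no
middleage,low,yes,excellent,yes
youth,medium,no,fair,no
youth,low,yes,fair,yes
senior,medium,yes,fair,yes
youth,medium,yes,excellent,yes
middleage,medium,no,excellent,yes
middleage,high,yes,fair,yes
senior,medium,no,excellent,no
"""
ROWS = [line.split(",") for line in DATA.splitlines()]


def value_counts(rows, col):
    """Frequency of each value of one attribute (the Step 3 statistics)."""
    return Counter(row[col] for row in rows)


def counts_by_class(rows, col, class_col=-1):
    """Frequency of each (value, class) pair: the cross-tabulation that the
    colored histogram bars represent."""
    return Counter((row[col], row[class_col]) for row in rows)


for index, name in enumerate(ATTRIBUTES[:-1]):
    print(name, dict(value_counts(ROWS, index)))
    for (value, label), count in sorted(counts_by_class(ROWS, index).items()):
        print(f"    {value:10s} buyscomputer={label:3s} {count}")
```

Each printed pair corresponds to one colored segment of a histogram bar in the Preprocess panel.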
Task-1
Describe the chosen dataset and a few of its attributes.
Example: Description of the Iris dataset
Title: Iris data
Number of Instances: 150
Number of Attributes : 5 (4 numeric, 1 nominal class)
Attribute 1 :- sepallength
Attribute 2 :- sepalwidth
Attribute 3 :- petallength
Attribute 4 :- petalwidth
Attribute 5 :- class
Task-2
List all the categorical (or nominal) attributes
Attribute 5 : Class
Class is the only nominal attribute in the Iris dataset. Attributes 1 to 4 (Sepallength,
Sepalwidth, Petallength, Petalwidth) are numeric.
Task-3
What attributes do you think might be crucial in making the assessment?
The following attributes are crucial: sepallength, sepalwidth, petallength, petalwidth
Measures used to assess attribute relevance include information gain, Gini index, and
gain ratio.
The WEKA tool makes this attribute relevance analysis straightforward.
Steps: WEKA Explorer -> Preprocess -> Open file (credit-g.arff) -> select all attributes ->
click the Select attributes tab -> choose Info Gain as the attribute evaluator and Ranker
as the search method -> click "Start".
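To show what an information gain ranking computes, here is a minimal sketch that ranks the attributes of the Customer dataset by information gain with respect to buyscomputer. It is a plain-Python illustration and not WEKA's implementation; the Customer data is used here, rather than credit-g.arff or Iris, as an assumption, because it is small and fully nominal, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

ATTRIBUTES = ["age", "income", "student", "creditrating", "buyscomputer"]

# The 14 data rows of Customer.arff.
DATA = """\
youth,high,no,fair,no
youth,high,no,excellent,no
middleage,high,no,fair,yes
senior,medium,no,fair,yes
senior,low,yes,fair,yes
senior,low,yes,excellent,no
middleage,low,yes,excellent,yes
youth,medium,no,fair,no
youth,low,yes,fair,yes
senior,medium,yes,fair,yes
youth,medium,yes,excellent,yes
middleage,medium,no,excellent,yes
middleage,high,yes,fair,yes
senior,medium,no,excellent,no
"""
ROWS = [line.split(",") for line in DATA.splitlines()]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def info_gain(rows, col, class_col=-1):
    """Class entropy minus the weighted class entropy after splitting on col."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[col]].append(row[class_col])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[class_col] for row in rows]) - remainder


# Rank every non-class attribute, highest gain first (what a ranker reports).
ranking = sorted(
    ((info_gain(ROWS, i), name) for i, name in enumerate(ATTRIBUTES[:-1])),
    reverse=True,
)
for gain, name in ranking:
    print(f"{name:13s} {gain:.3f}")
```

On this data the ranking comes out as age, student, creditrating, income, with gains of roughly 0.247, 0.152, 0.048, and 0.029 bits.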
Task-4
Apply any two filters to the Customer dataset. Capture snapshots of the dataset before
filtering, the GenericObjectEditor window of each filter, and the transformed dataset
after filtering. Describe each chosen filter in a few lines. A few recommended filters
are listed below (Choose -> weka -> filters -> unsupervised -> attribute; RemovePercentage
is found under unsupervised -> instance):
Before Filter
After Filter (Add Cluster)
1) NumericToNominal
2) MathExpression
3) InterquartileRange
4) RemovePercentage
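As an illustration of what an instance filter such as RemovePercentage does to a dataset, the sketch below drops a given percentage of rows. It is a simplified stand-in and not WEKA's code: it assumes the filter removes the leading share of instances and that an invert option keeps that share instead. The function name remove_percentage and its parameters are hypothetical.

```python
def remove_percentage(rows, percentage, invert=False):
    """Drop the first `percentage` percent of rows; with invert=True, keep
    only that leading share instead (assumed behavior, see the lead-in)."""
    if not 0 <= percentage <= 100:
        raise ValueError("percentage must be between 0 and 100")
    # Round half up so that, for example, 3.5 rows becomes 4.
    cutoff = int(len(rows) * percentage / 100 + 0.5)
    return rows[:cutoff] if invert else rows[cutoff:]


# Stand-in for the 14 Customer instances, identified by row number.
instances = list(range(1, 15))

print(remove_percentage(instances, 50))               # rows kept after removing half
print(remove_percentage(instances, 50, invert=True))  # the rows that were removed
```

Comparing the number of instances before and after the call mirrors the before/after snapshots requested in this task.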
Task-5
Display (using snapshots) any visualization technique for the chosen dataset
(histogram or scatter plot).
Histogram
CONCLUSION:
Hence, the experiment was successfully carried out. It helped in analyzing data through
preprocessing and in visualizing the results.