
Rajkiya Engineering College

Kannauj

Practical File
DATAWAREHOUSING & DATA MINING
Lab (RCS-654)
Session: 2019-2020

Submitted by:
Sumit Sharma
1883910911
Computer Science and Engineering, IIIrd Year

Submitted to:
Mr. Abhishek Bajpai
Asst. Professor
Department of Computer Science & Engineering
INDEX

S.No  Experiment                                                                              Remark
1.    Implementation of OLAP operations.
2.    Implementation of Varying Arrays.
3.    Implementation of Nested Tables.
4.    Demonstration of any ETL tool.
5.    Write a program of Apriori algorithm using any programming language.
6.    Create data-set in .arff file format. Demonstration of preprocessing on WEKA data-set.
7.    Demonstration of Association rule process on data-set contact-lenses.arff using Apriori algorithm.
8.    Demonstration of classification rule process on WEKA data-set using J48 algorithm.
9.    Demonstration of classification rule process on WEKA data-set using Naive Bayes algorithm.
10.   Demonstration of clustering rule process on data-set iris.arff using simple k-means.
Practical: 01
Object: Implementation of OLAP operations.

Online Analytical Processing (OLAP) is based on the multidimensional data model. It allows
managers and analysts to gain insight into information through fast, consistent, and
interactive access to it. OLAP provides the following operations on multidimensional data:
a) Roll-up:
Roll-up performs aggregation on a data cube in any of the following ways:
 By climbing up a concept hierarchy for a dimension
 By dimension reduction
The following diagram illustrates how roll-up works.

-- Roll-up is performed by climbing up a concept hierarchy for the dimension location.


-- Initially the concept hierarchy was "street < city < province < country".
-- On rolling up, the data is aggregated by ascending the location hierarchy from the level
of city to the level of country.

-- The data is now grouped into countries rather than individual cities.


-- When roll-up is performed, one or more dimensions from the data cube are removed.
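As a rough SQL sketch, assuming a hypothetical SALES fact table with city, country, quarter and amount columns, rolling up along the location hierarchy corresponds to aggregating at a coarser GROUP BY level:

Code:
-- sales aggregated at the city level
SELECT city, quarter, SUM(amount) AS total_sales
FROM SALES
GROUP BY city, quarter;

-- rolled up from city to country
SELECT country, quarter, SUM(amount) AS total_sales
FROM SALES
GROUP BY country, quarter;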
b) Drill-down:
Drill-down is the reverse operation of roll-up. It is performed in either of the following ways:
 By stepping down a concept hierarchy for a dimension.
 By introducing a new dimension.
The following diagram illustrates how drill-down works:
-- Drill-down is performed by stepping down a concept hierarchy for the dimension time.

-- Initially the concept hierarchy was "day < month < quarter < year."

-- On drilling down, the time dimension is descended from the level of quarter to the level of
month.

-- When drill-down is performed, one or more dimensions from the data cube are added.

-- It navigates the data from less detailed data to highly detailed data.
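Continuing the same sketch, and assuming the hypothetical SALES table also carries a sales_month column, drilling down from quarter to month simply groups by the finer level:

Code:
-- drilled down from quarter to month
SELECT country, sales_month, SUM(amount) AS total_sales
FROM SALES
GROUP BY country, sales_month;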
c) Slice:
The slice operation selects one particular dimension from a given cube and provides a new sub-cube.
Consider the following diagram that shows how slice works.
Here Slice is performed for the dimension "time" using the criterion time = "Q1".
It will form a new sub-cube by selecting one or more dimensions.
d) Dice:
Dice selects two or more dimensions from a given cube and provides a new sub-cube. Consider
the following diagram that shows the dice operation.

The dice operation on the cube based on the following selection criteria involves three
dimensions.
 (location = "Toronto" or "Vancouver")
 (time = "Q1" or "Q2")
 (item =" Mobile" or "Modem")

e) Pivot:
The pivot operation is also known as rotation. It rotates the data axes in view in order to provide
an alternative presentation of data. Consider the following diagram that shows the pivot
operation.
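As a loose SQL analogy, Oracle 11g and later also provide a PIVOT clause that rotates the quarter values of the hypothetical SALES table from rows into columns:

Code:
SELECT *
FROM (SELECT location, quarter, amount FROM SALES)
PIVOT (SUM(amount) FOR quarter IN ('Q1' AS q1, 'Q2' AS q2, 'Q3' AS q3, 'Q4' AS q4));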
Practical: 02
Object: Implementation of Varying Arrays.

Oracle provides two collection types: nested tables and varying arrays or VARRAYS. A
collection is an ordered group of elements of the same type. Each element from the group can be
accessed using a unique subscript. The element types of a collection can be either built-in
datatypes, user-defined types or references (REFs) to object types.

Varrays:
Varrays are ordered groups of items of type VARRAY. Varrays can be used to associate
a single identifier with an entire collection. This allows manipulation of the collection as a whole
and easy reference of individual elements. The maximum size of a varray needs to be specified
in its type definition. The range of values for the index of a varray is from 1 to the maximum
specified in its type definition. If no elements are in the array, then the array is atomically null.
The main use of a varray is to group small or uniform-sized collections of objects. Elements of a
varray cannot be accessed individually through SQL, although they can be accessed in PL/SQL,
OCI, or Pro*C using the array style subscript. The type of the element of a VARRAY can be any
PL/SQL type except the following:


 BOOLEAN
 TABLE
 VARRAY
 Object types with TABLE or VARRAY attributes
 REF CURSOR
 NCHAR
 NCLOB
 NVARCHAR2

Varrays can be used to retrieve an entire collection as a value. Varray data is stored in-line, in
the same tablespace as the other data in its row. When a varray is declared, a constructor with
the same name as the varray is implicitly defined. The constructor creates a varray from the
elements passed to it. You can use a constructor wherever you can use a function call, including
the SELECT, VALUES, and SET clauses. A varray can be assigned to another varray, provided
the datatypes are the exact same type. For example, suppose you declared two PL/SQL types:

Code:
TYPE My_Varray1 IS VARRAY(10) OF My_Type;
TYPE My_Varray2 IS VARRAY(10) OF My_Type;
An object of type My_Varray1 can be assigned to another object of type My_Varray1 because
they are the exact same type. However, an object of type My_Varray2 cannot be assigned to an
object of type My_Varray1 because they are not the exact same type, even though they have the
same element type. Varrays can be atomically null, so the IS NULL comparison operator can be
used to see if a varray is null. Varrays cannot be compared for equality or inequality.
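A minimal PL/SQL sketch of these rules (assignment between identical varray types and the IS NULL test), assuming My_Type already exists as a schema-level type, might look like this:

Code:
DECLARE
TYPE My_Varray1 IS VARRAY(10) OF My_Type; -- assumes My_Type already exists in the schema
v1 My_Varray1 := My_Varray1(); -- initialised and empty, therefore NOT null
v2 My_Varray1;                 -- never initialised, therefore atomically null
BEGIN
IF v2 IS NULL THEN
DBMS_OUTPUT.PUT_LINE('v2 is atomically null');
END IF;
v2 := v1; -- legal: both variables are of the exact same type My_Varray1
END;
/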
Examples for Varrays
Example 5: The following shows how to create a simple VARRAY.
a) First, define an object type MEDICINES as follows:
Code:
SQL> CREATE TYPE MEDICINES AS OBJECT (
2> MED_ID NUMBER(6),
3> MED_NAME VARCHAR2(14),
4> MANF_DATE DATE);
5> /
b) Next, define a VARRAY type MEDICINE_ARR which stores MEDICINES objects:
Code:
SQL> CREATE TYPE MEDICINE_ARR AS VARRAY(40) OF MEDICINES;
2> /
c) Finally, create a relational table MED_STORE which has MEDICINE_ARR as a column type:
Code:
SQL> CREATE TABLE MED_STORE (
2> LOCATION VARCHAR2(15),
3> STORE_SIZE NUMBER(7),
4> EMPLOYEES NUMBER(6),
5> MED_ITEMS MEDICINE_ARR);
Example 6: The following example shows how to insert two rows into the MED_STORE table:
Code:
SQL> INSERT INTO MED_STORE
2>VALUES('BELMONT',1000,10,
3> MEDICINE_ARR(MEDICINES(11111,'STOPACHE',SYSDATE)));
SQL> INSERT INTO MED_STORE
2> VALUES ('REDWOOD CITY',700,5,
3> MEDICINE_ARR(MEDICINES(12345,'STRESS_BUST',SYSDATE)));
Example 7: The following example shows how to delete the second row we inserted in Example 6 above:
Code:
SQL> DELETE FROM MED_STORE
2> WHERE LOCATION = 'REDWOOD CITY';
Example 8: The following example shows how to update the MED_STORE table and add more medicines to the Belmont store:
Code:
SQL> UPDATE MED_STORE
2> SET MED_ITEMS = MEDICINE_ARR(
3> MEDICINES(12346,'BUGKILL',SYSDATE),
4> MEDICINES(12347,'INHALER',SYSDATE),
5> MEDICINES(12348,'PAINKILL',SYSDATE))
6> WHERE LOCATION = 'BELMONT';
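Although varray elements cannot be referenced individually in plain SQL, they can be accessed with array-style subscripts in PL/SQL, as noted above. A small illustrative sketch is:

Code:
DECLARE
items MEDICINE_ARR;
BEGIN
-- fetch the whole collection for the Belmont store into a PL/SQL variable
SELECT MED_ITEMS INTO items FROM MED_STORE WHERE LOCATION = 'BELMONT';
FOR i IN 1 .. items.COUNT LOOP
DBMS_OUTPUT.PUT_LINE(items(i).MED_ID || ' ' || items(i).MED_NAME);
END LOOP;
END;
/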
Practical: 03
Object: Implementation of Nested Tables.

Oracle provides two collection types: nested tables and varying arrays or VARRAYS. A
collection is an ordered group of elements of the same type. Each element from the group can be
accessed using a unique subscript. The element types of a collection can be either built-in
datatypes, user-defined types or references (REFs) to object types.

Nested Tables:
An ordered group of items of type TABLE are called nested tables. Nested tables can contain
multiple columns and can be used as variables, parameters, results, attributes, and columns.
They can be thought of as one column database tables. Rows of a nested table are not stored in
any particular order. The size of a nested table can increase dynamically, i.e., nested tables are
unbounded. Elements in a nested table initially have consecutive subscripts, but as elements are
deleted, they can have non-consecutive subscripts. Nested tables can be fully manipulated using
SQL, Pro*C, OCI, and PL/SQL. The range of values for nested table subscripts is
1..2147483647. To extend a nested table, the built-in procedure EXTEND must be used. To
delete elements, the built-in procedure DELETE must be used. An uninitialized nested table is
atomically null, so the IS NULL comparison operator can be used to see if a nested table is null.
Oracle8 provides new operators such as CAST, THE, and MULTISET for manipulating nested
tables.
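A brief PL/SQL sketch of EXTEND and DELETE, using the ELEMENTS and ELEMENTS_TAB types that are created in Example 1 below, might look like this:

Code:
DECLARE
t ELEMENTS_TAB := ELEMENTS_TAB(); -- initialised, empty nested table (not null)
BEGIN
t.EXTEND(2); -- grow the nested table by two elements
t(1) := ELEMENTS(175692, 120.12);
t(2) := ELEMENTS(167295, 130.45);
t.DELETE(1); -- delete element 1; the remaining subscripts become non-consecutive
DBMS_OUTPUT.PUT_LINE('Elements remaining: ' || t.COUNT);
END;
/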

Examples of Nested Tables


Example 1: The following example illustrates how a simple nested table is created.
a) First, define an object type ELEMENTS as follows:
Code:
SQL> CREATE TYPE ELEMENTS AS OBJECT (
2> ELEM_ID NUMBER(6),
3> PRICE NUMBER(7,2));
4> /
b) Next, create a table type ELEMENTS_TAB which stores ELEMENTS objects:
Code:
SQL> CREATE TYPE ELEMENTS_TAB AS TABLE OF ELEMENTS;
2> /
c) Finally, create a database table STORAGE having type ELEMENTS_TAB as one of its columns:
Code:
SQL> CREATE TABLE STORAGE (
2> SALESMAN NUMBER(4),
3> ELEM_ID NUMBER(6),
4> ORDERED DATE,
5> ITEMS ELEMENTS_TAB)
6> NESTED TABLE ITEMS STORE AS ITEMS_TAB;
Example 2:- This example demonstrates how to populate the STORAGE table with a single
row:
Code:
SQL> INSERT INTO STORAGE
2> VALUES (100,123456,SYSDATE,
3>ELEMENTS_TAB(ELEMENTS(175692,120.12),
4> ELEMENTS(167295,130.45),
5> ELEMENTS(127569,99.99)));
Example 3: The following example demonstrates how to use the operator THE, which is used in a SELECT statement to identify a nested table:
Code:
SQL> INSERT INTO THE
2> (SELECT ITEMS FROM STORAGE WHERE ELEM_ID = 123456)
3> VALUES (125762, 101.99);
Example 4: The following example shows how to update the STORAGE table row where the SALESMAN column has value 100:
Code:
SQL> UPDATE STORAGE
2> SET ITEMS = ELEMENTS_TAB(ELEMENTS(192512, 199.99))
3> WHERE SALESMAN = 100;
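In later Oracle releases, the TABLE() collection expression can be used in place of THE to flatten a nested table column in a query; a small illustrative sketch is:

Code:
SELECT s.SALESMAN, i.ELEM_ID, i.PRICE
FROM STORAGE s, TABLE(s.ITEMS) i
WHERE s.SALESMAN = 100;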
Differences Between Nested Tables and Varrays:
Nested tables are unbounded, whereas varrays have a maximum size. Individual elements
can be deleted from a nested table, but not from a varray. Therefore, nested tables can be
sparse, whereas varrays are always dense. Varrays are stored by Oracle in-line (in the same
tablespace), whereas nested table data is stored out-of-line in a store table, which is a
system-generated database table associated with the nested table. When stored in the
database, nested tables do not retain their ordering and subscripts, whereas varrays do.
Nested tables support indexes while varrays do not.
Practical: 04
Object: Demonstration of any ETL tool.

ETL comes from data warehousing and stands for Extract-Transform-Load. ETL covers the
process by which data are loaded from the source systems into the data warehouse. Extraction-
transformation-loading (ETL) tools are pieces of software responsible for the extraction of data
from several sources, its cleansing, customization, reformatting, integration, and insertion into a
data warehouse. Building the ETL process is potentially one of the biggest tasks of building a
warehouse; it is complex and time consuming, and it consumes most of a data warehouse
project's implementation effort, cost, and resources.
Building a data warehouse requires focusing closely on understanding three main areas:

 Source Area - the source area has standard models such as the entity-relationship diagram.
 Destination Area - the destination area has standard models such as the star schema.
 Mapping Area - the mapping area, however, still has no standard model.

Abbreviations
• ETL-extraction–transformation–loading
• DW-data warehouse
• DM- data mart
• OLAP- on-line analytical processing
• DS-data sources
• ODS- operational data store
• DSA- data staging area
• DBMS- database management system
• OLTP-on-line transaction processing
• CDC-change data capture
• SCD-slowly changing dimension
• FCME- first-class modeling elements
• EMD-entity mapping diagram
ETL Process:
(a) Extract: The Extract step covers the extraction of data from the source system and makes it
accessible for further processing. The main objective of the extract step is to retrieve all the
required data from the source system using as few resources as possible. The extract step
should be designed so that it does not negatively affect the source system in terms of
performance, response time, or any kind of locking. There are several ways to perform the
extract:
• Update notification - if the source system is able to provide a notification that a
record has been changed and describe the change, this is the easiest way to get the
data.
• Incremental extract - some systems may not be able to provide notification that an update
has occurred, but they are able to identify which records have been modified and provide an
extract of such records. During further ETL steps, the system needs to identify these changes
and propagate them down. Note that with a daily extract we may not be able to handle deleted
records properly.
• Full extract - some systems are not able to identify which data has been changed at all, so a
full extract is the only way one can get the data out of the system. The full extract requires
keeping a copy of the last extract in the same format in order to be able to identify changes.
Full extract handles deletions as well.
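For instance, an incremental extract from a source table that carries a last-update timestamp (such as the LEGACY.BMS_PARTIES table used later in this practical) could be sketched as:

Code:
-- :last_extract_time is an assumed bind variable holding the time of the previous run
SELECT *
FROM LEGACY.BMS_PARTIES
WHERE last_update_date > :last_extract_time;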
(b) Transform: The transform step applies a set of rules to transform the data from the source to
the target. This includes converting any measured data to the same dimension (i.e. conformed
dimension) using the same units so that they can later be joined. The transformation step also
requires joining data from several sources, generating aggregates, generating surrogate keys,
sorting, deriving new calculated values, and applying advanced validation rules.

(c) Load: During the load step, it is necessary to ensure that the load is performed correctly and
with as little resources as possible. The target of the Load process is often a database. In order to
make the load process efficient, it is helpful to disable any constraints and indexes before the
load and enable them back only after the load completes. Referential integrity needs to be
maintained by the ETL tool to ensure consistency.
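A rough sketch of this pattern on the NEW.TBMS_PARTY target table introduced below, using hypothetical constraint and index names, is:

Code:
-- TBMS_PARTY_FK1 and TBMS_PARTY_IX1 are placeholder names, not from the actual schema
ALTER TABLE NEW.TBMS_PARTY DISABLE CONSTRAINT TBMS_PARTY_FK1;
ALTER INDEX NEW.TBMS_PARTY_IX1 UNUSABLE;
-- ... perform the bulk load here ...
ALTER INDEX NEW.TBMS_PARTY_IX1 REBUILD;
ALTER TABLE NEW.TBMS_PARTY ENABLE CONSTRAINT TBMS_PARTY_FK1;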
ETL method – nothin’ but SQL
This approach implements ETL as scripts that can simply be run on the database. These scripts
must be re-runnable: they should be able to be run without modification to pick up any changes
in the legacy data, and automatically work out how to merge the changes into the new schema.
In order to meet these requirements, the scripts must:
1) INSERT rows in the new tables based on any data in the source that hasn’t already been
created in the destination.
2) UPDATE rows in the new tables based on any data in the source that has already been
inserted in the destination.
3) DELETE rows in the new tables where the source data has been deleted.
Instead of writing a large number of separate INSERT, UPDATE and DELETE statements,
MERGE turns out to be both faster and simpler. By writing all the transformations as MERGE
statements, all of the criteria above are satisfied, while the code remains very easy to modify,
update, fix and re-run. If we discover a bug or a change in requirements, we simply change the
way the column is transformed in the MERGE statement and re-run the statement; it then takes
care of working out whether to insert, update or delete each row.
The next step is to design the architecture for the custom ETL solution and take the
following design to the DBA:
1. create two new schemas on the new 11g database: LEGACY and MIGRATE
2. take a snapshot of all data in the legacy database, and load it as tables in the LEGACY
schema
3. grant read-only on all tables in LEGACY to MIGRATE
4. grant CRUD on all tables in the target schema to MIGRATE.
For example, in the legacy database we have a table:
LEGACY.BMS_PARTIES(
par_id NUMBER PRIMARY KEY,
par_domain VARCHAR2(10) NOT NULL,
par_first_name VARCHAR2(100) ,
par_last_name VARCHAR2(100),
par_dob DATE,
par_business_name VARCHAR2(250),
created_by VARCHAR2(30) NOT NULL,
creation_date DATE NOT NULL,
last_updated_by VARCHAR2(30),
last_update_date DATE)
In the new model, we have a new table that represents the same kind of information:
NEW.TBMS_PARTY(
party_id NUMBER(9) PRIMARY KEY,
party_type_code VARCHAR2(10) NOT NULL,
first_name VARCHAR2(50),
surname VARCHAR2(100),
date_of_birth DATE,
business_name VARCHAR2(300),
db_created_by VARCHAR2(50) NOT NULL,
db_created_on DATE DEFAULT SYSDATE
NOT NULL, db_modified_by VARCHAR2(50),
db_modified_on DATE,
version_id NUMBER(12) DEFAULT 1 NOT NULL)
This was the simplest transformation you could possibly think of – the mapping from one
to the other is 1:1, and the columns almost mean the same thing.
The solution scripts start by creating an intermediary table:
MIGRATE.TBMS_PARTY(
old_par_id NUMBER PRIMARY KEY,
party_id NUMBER(9) NOT NULL,
party_type_code VARCHAR2(10) NOT NULL,
first_name VARCHAR2(50),
surname VARCHAR2(100),
date_of_birth DATE,
business_name VARCHAR2(300),
db_created_by VARCHAR2(50),
db_created_on DATE,
db_modified_by VARCHAR2(50),
db_modified_on DATE,
deleted CHAR(1))
The second step is the E and T parts of “ETL”: query the legacy table, transform the data
right there in the query, and insert it into the intermediary table. So that this script can be
re-run as often as needed, it is written as a MERGE statement:
MERGE INTO MIGRATE.TBMS_PARTY dest
USING (
SELECT par_id AS old_par_id,
par_id AS party_id,
CASE par_domain
WHEN 'P' THEN 'PE' /*Person*/
WHEN 'O' THEN 'BU' /*Business*/
END AS party_type_code,
par_first_name AS first_name,
par_last_name AS surname,
par_dob AS date_of_birth,
par_business_name AS business_name,
created_by AS db_created_by,
creation_date AS db_created_on,
last_updated_by AS db_modified_by,
last_update_date AS db_modified_on
FROM LEGACY.BMS_PARTIES s
WHERE NOT EXISTS (
SELECT null
FROM MIGRATE.TBMS_PARTY d
WHERE d.old_par_id = s.par_id
AND (d.db_modified_on = s.last_update_date
OR (d.db_modified_on IS NULL
AND s.last_update_date IS NULL))
)
) src
ON (src.OLD_PAR_ID = dest.OLD_PAR_ID)
WHEN MATCHED THEN UPDATE SET
party_id = src.party_id ,
party_type_code = src.party_type_code ,
first_name= src.first_name ,
surname= src.surname ,
date_of_birth= src.date_of_birth ,
business_name= src.business_name ,
db_created_by = src.db_created_by ,
db_created_on = src.db_created_on ,
db_modified_by = src.db_modified_by,
db_modified_on = src.db_modified_on
WHEN NOT MATCHED THEN INSERT
(old_par_id, party_id, party_type_code, first_name, surname,
date_of_birth, business_name, db_created_by, db_created_on,
db_modified_by, db_modified_on)
VALUES
(src.old_par_id, src.party_id, src.party_type_code, src.first_name, src.surname,
src.date_of_birth, src.business_name, src.db_created_by, src.db_created_on,
src.db_modified_by, src.db_modified_on);
Practical: 05

Object: Write a program of Apriori algorithm using any programming language.

Software Requirement: C/C++ compiler, OS, etc.

Program:

#include<iostream>
using namespace std;
int main(){
int i,j,t1,k,l,m,f,f1,f2,f3;
int a[5][5];
for(i=0;i<5;i++){
cout<<"\n Enter items from purchase "<<i+1<<":";
for(j=0;j<5;j++){
cin>>a[i][j];
}
}
int min;
cout<<"\n Enter minimum acceptance level";
cin>>min;
cout<<"\nInitial Input:\n";
cout<<"\nTrasaction\tItems\n";
for(i=0;i<5;i++){
cout<<i+1<<":\t";
for(j=0;j<5;j++){
cout<<a[i][j]<<"\t";
}
cout<<"\n";
}
cout<<"\nAssume minimum support: "<<min;
int l1[5];
for(i=0;i<5;i++)
{
t1=0;
for(j=0;j<5;j++){
for(k=0;k<5;k++){
if(a[j][k]==i+1){
t1++;
}
}
}
l1[i]=t1;
}
cout<<"\n\nGenerating C1 from data\n";
for(i=0;i<5;i++){
cout<<i+1<<": "<<l1[i]<<"\n";
}
int p2pcount=0;
int p2items[5];
int p2pos=0;
for(i=0;i<5;i++){
if(l1[i]>=min){
p2pcount++;
p2items[p2pos]=i;
p2pos++;
}
}
cout<<"\nGenerating L1 From C1\n";
for(i=0;i<p2pos;i++){
cout<<p2items[i]+1<<"\t"<<l1[p2items[i]]<<"\n";
}
int l2[10][3]; //up to C(5,2)=10 candidate 2-item sets
int l2t1;
int l2t2;
int l2pos1=0;
int l2ocount=0;
int l2jcount=0;
for(i=0;i<p2pcount;i++){
for(j=i+1;j<p2pcount;j++){
l2t1=p2items[i]+1;
l2t2=p2items[j]+1;
if(l2t1==l2t2){
continue;
}
l2[l2pos1][0]=l2t1;
l2[l2pos1][1]=l2t2;
l2jcount++;
l2ocount=0; //reset counter
for(k=0;k<5;k++){
f1=f2=0;
for(l=0;l<5;l++){
if(l2t1==a[k][l]){
f1=1;
}
if(l2t2==a[k][l]){
f2=1;
}
}
if(f1==1&&f2==1){
l2ocount++;
}
}
l2[l2pos1][2]=l2ocount;
l2pos1++;
}
}
cout<<"\n\nGenerating L2\n";
for(i=0;i<l2jcount;i++){
for(j=0;j<3;j++){
cout<<l2[i][j]<<"\t";
}
cout<<"\n";
}
int p3pcount=0;
int p3items[5]={-1,-1,-1,-1,-1};
int p3pos=0;
for(i=0;i<l2jcount;i++){ //scan only the generated L2 candidate rows
if(l2[i][2]>=min){
f=0;
for(j=0;j<5;j++){
if(p3items[j]==l2[i][0]){
f=1;
}
}
if(f!=1){
p3items[p3pos]=l2[i][0];
p3pos++;
p3pcount++;
}
f=0;
for(j=0;j<5;j++){
if(p3items[j]==l2[i][1]){
f=1;
}
}
if(f!=1){
p3items[p3pos]=l2[i][1];
p3pos++;
p3pcount++;
}
}
}
int l3[10][4]; //up to C(5,3)=10 candidate 3-item sets
int l3ocount=0;
int l3jcount=0;
for(i=0;i<p3pcount;i++){
for(j=i+1;j<p3pcount;j++){
for(k=j+1;k<p3pcount;k++){
//store the candidate 3-item set in the next free row of l3
l3[l3jcount][0]=p3items[i];
l3[l3jcount][1]=p3items[j];
l3[l3jcount][2]=p3items[k];
l3ocount=0;
for(m=0;m<5;m++){ //scan each transaction (m, not k, so the item loop is not clobbered)
f1=f2=f3=0; //resetting flags
for(l=0;l<5;l++){
if(l3[l3jcount][0]==a[m][l]){
f1=1;
}
if(l3[l3jcount][1]==a[m][l]){
f2=1;
}
if(l3[l3jcount][2]==a[m][l]){
f3=1;
}
}
if(f1==1&&f2==1&&f3==1){
l3ocount++;
}
}
l3[l3jcount][3]=l3ocount;
l3jcount++;
}
}
}
cout<<"\n\nGenerating L3\n";
for(i=0;i<l3jcount;i++){
for(j=0;j<4;j++){
cout<<l3[i][j]<<"\t";
}
cout<<"\n";
}
return 0;
}

Output:
Practical: 06
Object: Create data-set in .arff file format. Demonstration of preprocessing on WEKA
data-set.

Software Requirement: Weka an open-source tool, OS, etc.

Dataset student.arff


@relation student
@attribute age {<30,30-40,>40}
@attribute income {low, medium, high}
@attribute student {yes, no}
@attribute credit-rating {fair, excellent}
@attribute buyspc {yes, no}
@data
%
<30, high, no, fair, no
<30, high, no, excellent, no
30-40, high, no, fair, yes
>40, medium, no, fair, yes
>40, low, yes, fair, yes
>40, low, yes, excellent, no
30-40, low, yes, excellent, yes
<30, medium, no, fair, no
<30, low, yes, fair, no
>40, medium, yes, fair, yes
<30, medium, yes, excellent, yes
30-40, medium, no, excellent, yes
30-40, high, yes, fair, yes
>40, medium, no, excellent, no

%
Output:
Practical:07
Object: Demonstration of Association rule process on data-set contact-lenses.arff using the
Apriori algorithm.

Software Requirement: Weka an open-source tool, OS, etc.

Dataset contact-lenses.arff

The following figure shows the association rules that were generated when the Apriori algorithm
is applied to the given dataset.
Practical: 08
Object: Demonstration of classification rule process on WEKA data-set using j48
algorithm.

Software Requirement: Weka an open-source tool, OS, etc.

Dataset student.arff

@relation student
@attribute age {<30,30-40,>40}
@attribute income {low, medium, high}
@attribute student {yes, no}
@attribute credit-rating {fair, excellent}
@attribute buyspc {yes, no}
@data
%
<30, high, no, fair, no
<30, high, no, excellent, no
30-40, high, no, fair, yes
>40, medium, no, fair, yes
>40, low, yes, fair, yes
>40, low, yes, excellent, no
30-40, low, yes, excellent, yes
<30, medium, no, fair, no
<30, low, yes, fair, no
>40, medium, yes, fair, yes
<30, medium, yes, excellent, yes
30-40, medium, no, excellent, yes
30-40, high, yes, fair, yes
>40, medium, no, excellent, no
%

The following figure shows the classification rules that were generated when the J48 algorithm
is applied to the given dataset.
Fig: Graphical version of the classification tree.
Practical: 09

Object: Demonstration of classification rule process on WEKA data-set using Naive Bayes
algorithm.

Software Requirement: Weka an open-source tool, OS, etc.

Data set abc.arff:

@relation abc

@attribute age {25, 27, 28, 29, 30, 35, 48}

@attribute salary {10k,15k,17k,20k,25k,30k,35k,32k,34k}

@attribute performance {good, avg, poor}

@data

25, 10k, poor

27, 15k, poor

27, 17k, poor

28, 17k, poor

29, 20k, avg

30, 25k, avg

29, 25k, avg

30, 20k, avg

35, 32k, good

48, 34k, good

48, 32k, good

%
The following figure shows the classification rules that were generated when the Naive Bayes
algorithm is applied to the given dataset:
Practical: 10
Object: Demonstration of clustering rule process on data-set iris.arff using simple k-means.

Software Requirement: Weka an open-source tool, OS, etc.

Dataset of iris.arff:

The following screenshot shows the clustering rules that were generated when the simple k-means
algorithm is applied to the given dataset:

Visualization Cluster assignments figure:
