CHAPTER-4
Classification Rule Mining
Description
Principle
Design
Algorithm
Rule evaluation
• What is Classification Rule Mining?
• Classification and prediction are two forms of data analysis that
can be used to extract models describing important data classes
or to predict future data trends.
• Classification predicts categorical (discrete, unordered) labels,
whereas prediction models continuous-valued functions.
• Following are the examples of cases where the data analysis
task is Classification −
• A bank loan officer wants to analyze the data in order to know
which customers (loan applicants) are risky and which are safe.
• A marketing manager at a company needs to predict whether a
customer with a given profile will buy a new computer.
• In both of the above examples, a model or classifier is
constructed to predict the categorical labels.
• These labels are risky or safe for loan application data and yes or
no for marketing data.
• A predictor is constructed that predicts a continuous-valued
function, or ordered value.
• Regression analysis is a statistical methodology that is mostly
used for numeric prediction.
• Many classification and prediction methods have been proposed
by researchers in machine learning, pattern recognition, and
statistics.
• Principle of Classification
• With the help of the bank loan application that we have discussed
above, let us understand the working of classification.
• The Data Classification process includes two steps −
• Building the Classifier or Model
• Using Classifier for Classification
• Building the Classifier or Model
• This step is the learning step or the learning phase.
• In this step the classification algorithms build the classifier.
• The classifier is built from the training set made up of database
tuples and their associated class labels.
• Each tuple that constitutes the training set is assumed to belong
to a predefined category or class, as determined by its class label.
• These tuples can also be referred to as samples, objects, or data
points.
• Using Classifier for Classification
• In this step, the classifier is used for classification. Here the test
data is used to estimate the accuracy of classification rules.
• The classification rules can be applied to the new data tuples if
the accuracy is considered acceptable.
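The two-step process above (building the classifier, then using it) can be sketched with a toy rule learner; the loan tuples and the "majority class per income level" learning rule are hypothetical stand-ins for a real classification algorithm:

```python
# A minimal sketch of the two-step classification process.
# Step 1 (learning): build a classifier from labeled training tuples.
# Step 2 (classification): apply it to test tuples and estimate accuracy.
from collections import Counter, defaultdict

def build_classifier(training_set):
    """Learning step: for each income level, remember the majority class
    seen in the training data (a stand-in for a real learning algorithm)."""
    counts = defaultdict(Counter)
    for income, label in training_set:
        counts[income][label] += 1
    return {income: c.most_common(1)[0][0] for income, c in counts.items()}

def classify(model, income):
    """Classification step: apply the learned model to a new tuple."""
    return model.get(income, "risky")  # default class for unseen values

# Hypothetical loan-application tuples: (income level, class label)
train = [("high", "safe"), ("high", "safe"),
         ("low", "risky"), ("low", "safe"), ("low", "risky")]
test = [("high", "safe"), ("low", "risky")]

model = build_classifier(train)
accuracy = sum(classify(model, x) == y for x, y in test) / len(test)
print(model)     # {'high': 'safe', 'low': 'risky'}
print(accuracy)  # 1.0
```

If the accuracy on the held-out test tuples is acceptable, the model would then be applied to new, unlabeled tuples.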
• Data Mining - Rule Based Classification
• IF-THEN Rules: A rule-based classifier makes use of a set of IF-THEN
rules for classification.
• We can express a rule in the following form −
• IF condition THEN conclusion
• Let us consider a rule R1,
• R1: IF age = youth AND student = yes
• THEN buy_computer = yes
• Points to remember −
• The IF part of the rule is called rule antecedent or precondition.
• The THEN part of the rule is called rule consequent.
• The antecedent part (the condition) consists of one or more
attribute tests, and these tests are logically ANDed.
• The consequent part consists of the class prediction.
• Note − We can also write rule R1 as follows −
• R1: (age = youth) ∧ (student = yes) ⇒ (buys_computer = yes)
• If the condition holds true for a given tuple, then the antecedent
is satisfied.
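Rule R1 from the text can be written directly as code: a check of the ANDed attribute tests in the antecedent, returning the consequent's class prediction when the rule covers the tuple.

```python
# Rule R1: IF age = youth AND student = yes THEN buys_computer = yes.
# A tuple is represented as a dict of attribute values.

def r1_antecedent(tuple_):
    """True when all ANDed attribute tests of R1's antecedent hold."""
    return tuple_["age"] == "youth" and tuple_["student"] == "yes"

def apply_r1(tuple_):
    """Return R1's class prediction if the antecedent is satisfied."""
    if r1_antecedent(tuple_):
        return "yes"   # buys_computer = yes
    return None        # R1 does not cover this tuple

print(apply_r1({"age": "youth", "student": "yes"}))   # yes
print(apply_r1({"age": "senior", "student": "yes"}))  # None
```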
• Design Issues Regarding Classification and Prediction:
• Preparing the Data for Classification and Prediction:
• The following preprocessing steps may be applied to the data to
help improve the accuracy, efficiency, and scalability of the
classification or prediction process.
• Data Cleaning −
• Data cleaning involves removing the noise and treatment of
missing values.
• The noise is removed by applying smoothing techniques, and the
problem of missing values is solved by replacing a missing value
with the most commonly occurring value for that attribute.
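The missing-value treatment described above (replace a missing value with the attribute's most common value) can be sketched as follows; the attribute values are hypothetical:

```python
# Mode imputation: fill missing entries of one attribute with the
# most commonly occurring value among the observed entries.
from collections import Counter

def fill_missing(values, missing=None):
    """Replace `missing` entries with the attribute's most frequent value."""
    mode = Counter(v for v in values if v != missing).most_common(1)[0][0]
    return [mode if v == missing else v for v in values]

print(fill_missing(["yes", None, "yes", "no", None]))
# ['yes', 'yes', 'yes', 'no', 'yes']
```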
• Relevance Analysis −
• Databases may also have irrelevant attributes.
• Correlation analysis is used to know whether any two given
attributes are related.
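For numeric attributes, one common choice for such correlation analysis is the Pearson coefficient; a minimal sketch, with hypothetical data (a value near 0 suggests the attributes are unrelated, near ±1 that they are strongly related):

```python
# Pearson correlation coefficient between two numeric attributes.
import math

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

print(round(pearson([1, 2, 3, 4], [2, 4, 6, 8]), 4))  # 1.0 (perfectly correlated)
```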
• Data Transformation and reduction −
• The data can be transformed by any of the following methods.
– Normalization − The data is transformed using normalization.
– Normalization is used when, in the learning step, neural
networks or methods involving distance measurements are used.
– Normalization involves scaling all values for a given attribute
so that they fall within a small specified range, such as -1 to +1
or 0 to 1.
– Generalization −
The data can also be transformed by generalizing it to the
higher-level concepts. For this purpose we can use the concept
hierarchies.
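Min-max normalization, as described above, linearly rescales an attribute's values into a small specified range such as [0, 1]; the attribute values below are hypothetical:

```python
# Min-max normalization: scale values so they fall within [new_min, new_max].

def min_max_normalize(values, new_min=0.0, new_max=1.0):
    old_min, old_max = min(values), max(values)
    span = old_max - old_min
    return [(v - old_min) / span * (new_max - new_min) + new_min
            for v in values]

print(min_max_normalize([20, 40, 60, 100]))  # [0.0, 0.25, 0.5, 1.0]
```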
• Comparison of Classification and Prediction
Methods
• Here are the criteria for comparing the methods of Classification and
Prediction −
• Accuracy − The accuracy of a classifier refers to its ability to
predict the class label correctly; the accuracy of a predictor
refers to how well it guesses the value of the predicted attribute
for new data.
• Speed − This refers to the computational cost in generating and
using the classifier or predictor.
• Robustness − It refers to the ability of classifier or predictor to
make correct predictions from given noisy data.
• Scalability − Scalability refers to the ability to construct the
classifier or predictor efficiently given large amounts of data.
• Interpretability − It refers to the level of understanding and
insight that is provided by the classifier or predictor.
• Bayes Classification Methods
• “What are Bayesian classifiers?” Bayesian classifiers are statistical
classifiers.
• They can predict class membership probabilities such as the
probability that a given tuple belongs to a particular class.
• Bayesian classification is based on Bayes’ theorem.
• Studies comparing classification algorithms have found a simple
Bayesian classifier known as the naïve Bayesian classifier to be
comparable in performance with decision tree and selected neural
network classifiers.
• Bayesian classifiers have also exhibited high accuracy and speed
when applied to large databases.
• Naïve Bayesian classifiers assume that the effect of an attribute
value on a given class is independent of the values of the other
attributes. This assumption is called class conditional independence.
• Bayes’ theorem is useful in that it provides a way of calculating
the posterior probability P(H|X) from P(H), P(X|H), and P(X).
• Bayes’ theorem is
• P(H|X) = P(X|H) P(H) / P(X)
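Bayes' theorem, combined with the class conditional independence assumption, lets a naïve Bayesian classifier factor P(X|H) into a product of per-attribute probabilities; the probability values below are hypothetical:

```python
# Posterior probability via Bayes' theorem: P(H|X) = P(X|H) * P(H) / P(X),
# where the naive assumption factors P(X|H) into per-attribute terms.

def posterior(prior, likelihoods, evidence):
    """prior = P(H); likelihoods = [P(x_k|H) for each attribute]; evidence = P(X)."""
    p_x_given_h = 1.0
    for p in likelihoods:
        p_x_given_h *= p  # class conditional independence
    return p_x_given_h * prior / evidence

# Hypothetical values: P(buys=yes) = 0.6,
# P(age=youth | yes) = 0.2, P(student=yes | yes) = 0.5, P(X) = 0.1
print(round(posterior(0.6, [0.2, 0.5], 0.1), 10))  # 0.6
```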
• Classification by Decision Tree Induction:
• Decision tree induction is the learning of decision trees from
class-labeled training tuples.
• A decision tree is a flowchart-like tree structure, where Each
internal node denotes a test on an attribute.
• Each branch represents an outcome of the test.
• Each leaf node holds a class label.
• The topmost node in a tree is the root node.
• The construction of decision tree classifiers does not require any
domain knowledge or parameter setting, and is therefore
appropriate for exploratory knowledge discovery.
• Decision trees can handle high dimensional data.
• Their representation of acquired knowledge in tree form is intuitive
and generally easy for humans to understand.
• The learning and classification steps of decision tree induction are
simple and fast.
• In general, decision tree classifiers have good accuracy.
• Decision tree induction algorithms have been used for
classification in many application areas, such as medicine,
manufacturing and production, financial analysis, and molecular
biology.
• Algorithm For Decision Tree Induction:
• The algorithm is called with three parameters:
• Data partition, D
• Attribute list
• Attribute selection method
• The parameter attribute list is a list of attributes describing the
tuples.
• Attribute selection method specifies a procedure for selecting the
attribute that best discriminates the given tuples according to
class.
• The tree starts as a single node, N, representing the training
tuples in D.
• If the tuples in D are all of the same class, then node N becomes a
leaf and is labeled with that class.
• All of the terminating conditions are explained at the end of the
algorithm.
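The steps above can be sketched as a recursive procedure. This is only a sketch: the attribute selection method here is a trivial placeholder (pick the first remaining attribute), whereas real algorithms use measures such as information gain or the Gini index; the training tuples are hypothetical.

```python
# Minimal sketch of decision tree induction over a data partition D,
# where D is a list of (attribute-dict, class-label) tuples.
from collections import Counter

def same_class(D):
    """Terminating condition: do all tuples in D share one class label?"""
    return len({label for _, label in D}) == 1

def majority_class(D):
    return Counter(label for _, label in D).most_common(1)[0][0]

def build_tree(D, attribute_list):
    if same_class(D):           # all tuples same class -> leaf labeled with it
        return D[0][1]
    if not attribute_list:      # no attributes left -> majority voting leaf
        return majority_class(D)
    attr = attribute_list[0]    # placeholder attribute selection method
    node = {"attr": attr, "branches": {}}
    for v in {tup[attr] for tup, _ in D}:
        Dv = [(tup, label) for tup, label in D if tup[attr] == v]
        node["branches"][v] = build_tree(Dv, attribute_list[1:])
    return node

# Hypothetical training tuples
D = [({"age": "youth", "student": "yes"}, "yes"),
     ({"age": "youth", "student": "no"}, "no"),
     ({"age": "senior", "student": "no"}, "yes")]
tree = build_tree(D, ["age", "student"])
```

The root node tests "age"; the "senior" branch is already pure and becomes a leaf, while the "youth" branch is split further on "student".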
• Genetic Algorithms
• The idea of genetic algorithm is derived from natural evolution.
• In genetic algorithm, first of all, the initial population is created.
• This initial population consists of randomly generated rules. We can
represent each rule by a string of bits.
• For example, in a given training set, the samples are described by
two Boolean attributes such as A1 and A2. And this given training
set contains two classes such as C1 and C2.
• We can encode the rule IF A1 AND NOT A2 THEN C2 into the bit
string 100. In this bit representation, the two leftmost bits
represent the attributes A1 and A2, respectively, and the rightmost
bit represents the class.
• Likewise, the rule IF NOT A1 AND NOT A2 THEN C1 can be encoded
as 001.
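The encoding in the two examples above can be written out directly: the first two bits record the attribute tests on A1 and A2, and the last bit records the class (0 for C2, 1 for C1, matching the examples in the text).

```python
# Bit-string encoding of rules over Boolean attributes A1, A2
# and classes C1, C2, following the text's two examples.

def encode_rule(a1_test, a2_test, cls):
    """a1_test/a2_test are 1 for the attribute, 0 for its negation."""
    class_bit = 1 if cls == "C1" else 0
    return f"{a1_test}{a2_test}{class_bit}"

print(encode_rule(1, 0, "C2"))  # 100  (IF A1 AND NOT A2 THEN C2)
print(encode_rule(0, 0, "C1"))  # 001  (IF NOT A1 AND NOT A2 THEN C1)
```

A genetic algorithm would then evolve a population of such bit strings via crossover and mutation, scoring each rule's fitness on the training samples.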
• Fuzzy Set Approaches:
• Fuzzy logic uses truth values between 0.0 and 1.0 to represent the
degree of membership that a certain value has in a given category.
Each category then represents a fuzzy set.
• Fuzzy logic systems typically provide graphical tools to assist users in
converting attribute values to fuzzy truth values.
• Fuzzy set theory is also known as possibility theory.
• It was proposed by Lotfi Zadeh in 1965 as an alternative to traditional
two-valued logic and probability theory.
• Most important, fuzzy set theory allows us to deal with vague or
inexact facts.
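A fuzzy set can be sketched as a membership function returning a degree of membership between 0.0 and 1.0. The category "medium income" and its triangular breakpoints (30,000 / 50,000 / 70,000) below are hypothetical.

```python
# Triangular membership function for the fuzzy set "medium income":
# membership rises from 0 at 30k to 1 at 50k, then falls back to 0 at 70k.

def medium_income(income):
    if income <= 30_000 or income >= 70_000:
        return 0.0
    if income <= 50_000:
        return (income - 30_000) / 20_000
    return (70_000 - income) / 20_000

print(medium_income(50_000))  # 1.0  (fully "medium")
print(medium_income(40_000))  # 0.5  (partially "medium")
print(medium_income(80_000))  # 0.0  (not "medium" at all)
```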
• Regression Analysis:
• Regression analysis can be used to model the relationship between
one or more independent or predictor variables and a dependent or
response variable which is continuous-valued.
• In general, the values of the predictor variables are known. The
response variable is what we want to predict.
• Linear Regression:
• Straight-line regression analysis involves a response variable, y, and a
single predictor variable x.
• It is the simplest form of regression, and models y as a linear function
of x.
• That is, y = b + wx, where the variance of y is assumed to be
constant, and b and w are regression coefficients specifying the
y-intercept and slope of the line.
• The regression coefficients can be estimated using the method of
least squares with the following equations:
• w = Σᵢ (xᵢ − x̄)(yᵢ − ȳ) / Σᵢ (xᵢ − x̄)², with the sums taken over
i = 1, …, |D|
• b = ȳ − w x̄
• where x̄ is the mean value of x1, x2, …, x|D|, and ȳ is the mean value
of y1, y2, …, y|D|.
• The coefficients b and w often provide good approximations to
otherwise complicated regression equations.
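The least-squares estimates for y = b + wx can be computed directly from those equations; the (x, y) data below are hypothetical and lie exactly on the line y = 1 + 2x, so the fit recovers it.

```python
# Method of least squares for straight-line regression y = b + w*x.

def fit_line(xs, ys):
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    # w = sum((x_i - x_bar)(y_i - y_bar)) / sum((x_i - x_bar)^2)
    w = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    b = y_bar - w * x_bar   # b = y_bar - w * x_bar
    return b, w

xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]           # exactly y = 1 + 2x
b, w = fit_line(xs, ys)
print(b, w)                 # 1.0 2.0
```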