CHAPTER-4
Classification Rule Mining
Description
Principle
Design
Algorithm
Rule evaluation
• What is Classification Rule Mining?
• Classification and prediction are two forms of data analysis that
can be used to extract models describing important data classes
or to predict future data trends.
• Classification predicts categorical (discrete, unordered) labels,
whereas prediction models continuous-valued functions.
• Following are the examples of cases where the data analysis
task is Classification −
• A bank loan officer wants to analyze the data in order to know
which customers (loan applicants) are risky and which are safe.
• A marketing manager at a company needs to predict whether a
customer with a given profile will buy a new computer.
• In both of the above examples, a model or classifier is
constructed to predict the categorical labels.
• These labels are risky or safe for loan application data and yes or
no for marketing data.
• A predictor is constructed that predicts a continuous-valued
function, or ordered value.
• Regression analysis is a statistical methodology that is mostly
used for numeric prediction.
• Many classification and prediction methods have been proposed
by researchers in machine learning, pattern recognition, and
statistics.
• Principle of Classification
• With the help of the bank loan application that we have discussed
above, let us understand the working of classification.
• The Data Classification process includes two steps −
• Building the Classifier or Model
• Using Classifier for Classification
• Building the Classifier or Model
• This step is the learning step or the learning phase.
• In this step the classification algorithms build the classifier.
• The classifier is built from the training set made up of database
tuples and their associated class labels.
• Each tuple that constitutes the training set is assumed to belong
to a predefined category or class, as determined by its class label.
• These tuples can also be referred to as samples, objects, or data
points.
• Using Classifier for Classification
• In this step, the classifier is used for classification. Here the test
data is used to estimate the accuracy of classification rules.
• The classification rules can be applied to the new data tuples if
the accuracy is considered acceptable.
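The two-step process above (building the classifier, then using it) can be sketched with a toy rule learner; the loan tuples and the "majority class per income level" learning rule are hypothetical stand-ins for a real classification algorithm:

```python
# A minimal sketch of the two-step classification process.
# Step 1 (learning): build a classifier from labeled training tuples.
# Step 2 (classification): apply it to test tuples and estimate accuracy.
from collections import Counter, defaultdict

def build_classifier(training_set):
    """Learning step: for each income level, remember the majority class
    seen in the training data (a stand-in for a real learning algorithm)."""
    counts = defaultdict(Counter)
    for income, label in training_set:
        counts[income][label] += 1
    return {income: c.most_common(1)[0][0] for income, c in counts.items()}

def classify(model, income):
    """Classification step: apply the learned model to a new tuple."""
    return model.get(income, "risky")  # default class for unseen values

# Hypothetical loan-application tuples: (income level, class label)
train = [("high", "safe"), ("high", "safe"),
         ("low", "risky"), ("low", "safe"), ("low", "risky")]
test = [("high", "safe"), ("low", "risky")]

model = build_classifier(train)
accuracy = sum(classify(model, x) == y for x, y in test) / len(test)
print(model)     # {'high': 'safe', 'low': 'risky'}
print(accuracy)  # 1.0
```

If the accuracy on the held-out test tuples is acceptable, the model would then be applied to new, unlabeled tuples.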
• Data Mining - Rule Based Classification
• IF-THEN Rules: A rule-based classifier makes use of a set of IF-THEN
rules for classification.
• We can express a rule in the following form −
• IF condition THEN conclusion
• Let us consider a rule R1,
• R1: IF age = youth AND student = yes
• THEN buy_computer = yes
• Points to remember −
• The IF part of the rule is called rule antecedent or precondition.
• The THEN part of the rule is called rule consequent.
• The antecedent part (the condition) consists of one or more
attribute tests, and these tests are logically ANDed.
• The consequent part consists of the class prediction.
• Note − We can also write rule R1 as follows −
• R1: (age = youth) ∧ (student = yes) ⇒ (buys_computer = yes)
• If the condition holds true for a given tuple, then the antecedent
is satisfied.
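Rule R1 from the text can be written directly as code: a check of the ANDed attribute tests in the antecedent, returning the consequent's class prediction when the rule covers the tuple.

```python
# Rule R1: IF age = youth AND student = yes THEN buys_computer = yes.
# A tuple is represented as a dict of attribute values.

def r1_antecedent(tuple_):
    """True when all ANDed attribute tests of R1's antecedent hold."""
    return tuple_["age"] == "youth" and tuple_["student"] == "yes"

def apply_r1(tuple_):
    """Return R1's class prediction if the antecedent is satisfied."""
    if r1_antecedent(tuple_):
        return "yes"   # buys_computer = yes
    return None        # R1 does not cover this tuple

print(apply_r1({"age": "youth", "student": "yes"}))   # yes
print(apply_r1({"age": "senior", "student": "yes"}))  # None
```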
• Design Issues Regarding Classification and Prediction:
• Preparing the Data for Classification and Prediction:
• The following preprocessing steps may be applied to the data to
help improve the accuracy, efficiency, and scalability of the
classification or prediction process.
• Data Cleaning −
• Data cleaning involves removing the noise and treatment of
missing values.
• The noise is removed by applying smoothing techniques, and the
problem of missing values is solved by replacing a missing value
with the most commonly occurring value for that attribute.
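The missing-value treatment described above (replace a missing value with the attribute's most common value) can be sketched as follows; the attribute values are hypothetical:

```python
# Mode imputation: fill missing entries of one attribute with the
# most commonly occurring value among the observed entries.
from collections import Counter

def fill_missing(values, missing=None):
    """Replace `missing` entries with the attribute's most frequent value."""
    mode = Counter(v for v in values if v != missing).most_common(1)[0][0]
    return [mode if v == missing else v for v in values]

print(fill_missing(["yes", None, "yes", "no", None]))
# ['yes', 'yes', 'yes', 'no', 'yes']
```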
• Relevance Analysis −
• Databases may also have irrelevant attributes.
• Correlation analysis is used to know whether any two given
attributes are related.
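For numeric attributes, one common choice for such correlation analysis is the Pearson coefficient; a minimal sketch, with hypothetical data (a value near 0 suggests the attributes are unrelated, near ±1 that they are strongly related):

```python
# Pearson correlation coefficient between two numeric attributes.
import math

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

print(round(pearson([1, 2, 3, 4], [2, 4, 6, 8]), 4))  # 1.0 (perfectly correlated)
```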
• Data Transformation and reduction −
• The data can be transformed by any of the following methods.
– Normalization − The data is transformed using normalization.
– Normalization is used when, in the learning step, neural
networks or methods involving distance measurements are used.
– Normalization involves scaling all values for a given attribute
so that they fall within a small specified range, such as -1 to +1
or 0 to 1.
– Generalization −
The data can also be transformed by generalizing it to the
higher-level concepts. For this purpose we can use the concept
hierarchies.
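Min-max normalization, as described above, linearly rescales an attribute's values into a small specified range such as [0, 1]; the attribute values below are hypothetical:

```python
# Min-max normalization: scale values so they fall within [new_min, new_max].

def min_max_normalize(values, new_min=0.0, new_max=1.0):
    old_min, old_max = min(values), max(values)
    span = old_max - old_min
    return [(v - old_min) / span * (new_max - new_min) + new_min
            for v in values]

print(min_max_normalize([20, 40, 60, 100]))  # [0.0, 0.25, 0.5, 1.0]
```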
• Comparison of Classification and Prediction
Methods
• Here are the criteria for comparing the methods of Classification and
Prediction −
• Accuracy − The accuracy of a classifier refers to its ability to
predict the class label correctly; the accuracy of a predictor
refers to how well it guesses the value of the predicted attribute
for new data.
• Speed − This refers to the computational cost in generating and
using the classifier or predictor.
• Robustness − It refers to the ability of classifier or predictor to
make correct predictions from given noisy data.
• Scalability − Scalability refers to the ability to construct the
classifier or predictor efficiently given large amounts of data.
• Interpretability − It refers to the level of understanding and
insight that is provided by the classifier or predictor.
• Bayes Classification Methods
• “What are Bayesian classifiers?” Bayesian classifiers are statistical
classifiers.
• They can predict class membership probabilities such as the
probability that a given tuple belongs to a particular class.
• Bayesian classification is based on Bayes’ theorem.
• Studies comparing classification algorithms have found a simple
Bayesian classifier known as the naïve Bayesian classifier to be
comparable in performance with decision tree and selected neural
network classifiers.
• Bayesian classifiers have also exhibited high accuracy and speed
when applied to large databases.
• Naïve Bayesian classifiers assume that the effect of an attribute
value on a given class is independent of the values of the other
attributes. This assumption is called class conditional independence.
• Bayes’ theorem is useful in that it provides a way of calculating
the posterior probability P(H|X) from P(H), P(X|H), and P(X).
• Bayes’ theorem is
• P(H|X) = P(X|H) P(H) / P(X)
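Bayes' theorem, combined with the class conditional independence assumption, lets a naïve Bayesian classifier factor P(X|H) into a product of per-attribute probabilities; the probability values below are hypothetical:

```python
# Posterior probability via Bayes' theorem: P(H|X) = P(X|H) * P(H) / P(X),
# where the naive assumption factors P(X|H) into per-attribute terms.

def posterior(prior, likelihoods, evidence):
    """prior = P(H); likelihoods = [P(x_k|H) for each attribute]; evidence = P(X)."""
    p_x_given_h = 1.0
    for p in likelihoods:
        p_x_given_h *= p  # class conditional independence
    return p_x_given_h * prior / evidence

# Hypothetical values: P(buys=yes) = 0.6,
# P(age=youth | yes) = 0.2, P(student=yes | yes) = 0.5, P(X) = 0.1
print(round(posterior(0.6, [0.2, 0.5], 0.1), 10))  # 0.6
```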
• Classification by Decision Tree Induction:
• Decision tree induction is the learning of decision trees from
class-labeled training tuples.
• A decision tree is a flowchart-like tree structure, where Each
internal node denotes a test on an attribute.
• Each branch represents an outcome of the test.
• Each leaf node holds a class label.
• The topmost node in a tree is the root node.
• The construction of decision tree classifiers does not require any
domain knowledge or parameter setting, and is therefore
appropriate for exploratory knowledge discovery.
• Decision trees can handle high dimensional data.
• Their representation of acquired knowledge in tree form is intuitive
and generally easy for humans to understand.
• The learning and classification steps of decision tree induction are
simple and fast.
• In general, decision tree classifiers have good accuracy.
• Decision tree induction algorithms have been used for
classification in many application areas, such as medicine,
manufacturing and production, financial analysis, and molecular
biology.
• Algorithm For Decision Tree Induction:
• The algorithm is called with three parameters:
• Data partition, D
• Attribute list
• Attribute selection method
• The parameter attribute list is a list of attributes describing the
tuples.
• Attribute selection method specifies a procedure for selecting the
attribute that best discriminates the given tuples according to
class.
• The tree starts as a single node, N, representing the training
tuples in D.
• If the tuples in D are all of the same class, then node N becomes a
leaf and is labeled with that class.
• All of the terminating conditions are explained at the end of the
algorithm.
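The steps above can be sketched as a recursive procedure. This is only a sketch: the attribute selection method here is a trivial placeholder (pick the first remaining attribute), whereas real algorithms use measures such as information gain or the Gini index; the training tuples are hypothetical.

```python
# Minimal sketch of decision tree induction over a data partition D,
# where D is a list of (attribute-dict, class-label) tuples.
from collections import Counter

def same_class(D):
    """Terminating condition: do all tuples in D share one class label?"""
    return len({label for _, label in D}) == 1

def majority_class(D):
    return Counter(label for _, label in D).most_common(1)[0][0]

def build_tree(D, attribute_list):
    if same_class(D):           # all tuples same class -> leaf labeled with it
        return D[0][1]
    if not attribute_list:      # no attributes left -> majority voting leaf
        return majority_class(D)
    attr = attribute_list[0]    # placeholder attribute selection method
    node = {"attr": attr, "branches": {}}
    for v in {tup[attr] for tup, _ in D}:
        Dv = [(tup, label) for tup, label in D if tup[attr] == v]
        node["branches"][v] = build_tree(Dv, attribute_list[1:])
    return node

# Hypothetical training tuples
D = [({"age": "youth", "student": "yes"}, "yes"),
     ({"age": "youth", "student": "no"}, "no"),
     ({"age": "senior", "student": "no"}, "yes")]
tree = build_tree(D, ["age", "student"])
```

The root node tests "age"; the "senior" branch is already pure and becomes a leaf, while the "youth" branch is split further on "student".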
• Genetic Algorithms
• The idea of genetic algorithm is derived from natural evolution.
• In genetic algorithm, first of all, the initial population is created.
• This initial population consists of randomly generated rules. We can
represent each rule by a string of bits.
• For example, in a given training set, the samples are described by
two Boolean attributes such as A1 and A2. And this given training
set contains two classes such as C1 and C2.
• We can encode the rule IF A1 AND NOT A2 THEN C2 into the bit
string 100. In this bit representation, the two leftmost bits
represent the attributes A1 and A2, respectively, and the rightmost
bit represents the class.
• Likewise, the rule IF NOT A1 AND NOT A2 THEN C1 can be encoded
as 001.
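The encoding in the two examples above can be written out directly: the first two bits record the attribute tests on A1 and A2, and the last bit records the class (0 for C2, 1 for C1, matching the examples in the text).

```python
# Bit-string encoding of rules over Boolean attributes A1, A2
# and classes C1, C2, following the text's two examples.

def encode_rule(a1_test, a2_test, cls):
    """a1_test/a2_test are 1 for the attribute, 0 for its negation."""
    class_bit = 1 if cls == "C1" else 0
    return f"{a1_test}{a2_test}{class_bit}"

print(encode_rule(1, 0, "C2"))  # 100  (IF A1 AND NOT A2 THEN C2)
print(encode_rule(0, 0, "C1"))  # 001  (IF NOT A1 AND NOT A2 THEN C1)
```

A genetic algorithm would then evolve a population of such bit strings via crossover and mutation, scoring each rule's fitness on the training samples.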
• Fuzzy Set Approaches:
• Fuzzy logic uses truth values between 0.0 and 1.0 to represent the
degree of membership that a certain value has in a given category.
Each category then represents a fuzzy set.
• Fuzzy logic systems typically provide graphical tools to assist users in
converting attribute values to fuzzy truth values.
• Fuzzy set theory is also known as possibility theory.
• It was proposed by Lotfi Zadeh in 1965 as an alternative to traditional
two-valued logic and probability theory.
• Most important, fuzzy set theory allows us to deal with vague or
inexact facts.
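A fuzzy set can be sketched as a membership function returning a degree of membership between 0.0 and 1.0. The category "medium income" and its triangular breakpoints (30,000 / 50,000 / 70,000) below are hypothetical.

```python
# Triangular membership function for the fuzzy set "medium income":
# membership rises from 0 at 30k to 1 at 50k, then falls back to 0 at 70k.

def medium_income(income):
    if income <= 30_000 or income >= 70_000:
        return 0.0
    if income <= 50_000:
        return (income - 30_000) / 20_000
    return (70_000 - income) / 20_000

print(medium_income(50_000))  # 1.0  (fully "medium")
print(medium_income(40_000))  # 0.5  (partially "medium")
print(medium_income(80_000))  # 0.0  (not "medium" at all)
```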
• Regression Analysis:
• Regression analysis can be used to model the relationship between
one or more independent or predictor variables and a dependent or
response variable which is continuous-valued.
• In general, the values of the predictor variables are known. The
response variable is what we want to predict.
• Linear Regression:
• Straight-line regression analysis involves a response variable, y, and a
single predictor variable x.
• It is the simplest form of regression, and models y as a linear function
of x.
• That is, y = b + wx, where the variance of y is assumed to be
constant, and b and w are regression coefficients specifying the
y-intercept and slope of the line.
• The regression coefficients can be estimated using the method of
least squares with the following equations:
• w = Σᵢ (xᵢ − x̄)(yᵢ − ȳ) / Σᵢ (xᵢ − x̄)², with the sums taken over
i = 1, …, |D|
• b = ȳ − w x̄
• where x̄ is the mean value of x1, x2, …, x|D|, and ȳ is the mean value
of y1, y2, …, y|D|.
• The coefficients b and w often provide good approximations to
otherwise complicated regression equations.
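The least-squares estimates for y = b + wx can be computed directly from those equations; the (x, y) data below are hypothetical and lie exactly on the line y = 1 + 2x, so the fit recovers it.

```python
# Method of least squares for straight-line regression y = b + w*x.

def fit_line(xs, ys):
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    # w = sum((x_i - x_bar)(y_i - y_bar)) / sum((x_i - x_bar)^2)
    w = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    b = y_bar - w * x_bar   # b = y_bar - w * x_bar
    return b, w

xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]           # exactly y = 1 + 2x
b, w = fit_line(xs, ys)
print(b, w)                 # 1.0 2.0
```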