
UNIT-III

Classification

 Classification:

 predicts categorical class labels

 constructs a model based on the training set and the values (class labels) of a
classifying attribute, and uses it to classify new data

 Prediction:

 models continuous-valued functions, i.e., predicts unknown or missing values

 Typical Applications

 credit approval

 target marketing

 medical diagnosis

 treatment effectiveness analysis

Classification—A Two-Step Process

 Model construction: describing a set of predetermined classes

 Each tuple/sample is assumed to belong to a predefined class, as determined by the


class label attribute

 The set of tuples used for model construction: training set

 The model is represented as classification rules, decision trees, or mathematical


formulae

 Model usage: for classifying future or unknown objects

 Estimate accuracy of the model

 The known label of test sample is compared with the classified result from the
model

 Accuracy rate is the percentage of test set samples that are correctly classified
by the model

 Test set is independent of training set, otherwise over-fitting will occur
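For concreteness, here is a minimal sketch of this two-step process, assuming scikit-learn and a synthetic labelled dataset (both are illustrative choices, not prescribed by these notes):

```python
# Minimal sketch of model construction and model usage (assumed tooling: scikit-learn).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# A synthetic labelled dataset standing in for the training data.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)

# Hold out an independent test set so the accuracy estimate is not
# inflated by over-fitting to the training data.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Step 1: model construction from the training set.
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Step 2: model usage -- compare known test labels with the classified result.
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
```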


Supervised vs. Unsupervised Learning

 Supervised learning (classification)

 Supervision: The training data (observations, measurements, etc.) are accompanied by


labels indicating the class of the observations

 New data is classified based on the training set

 Unsupervised learning (clustering)

 The class labels of training data are unknown

 Given a set of measurements, observations, etc., the aim is to establish the
existence of classes or clusters in the data


Issues regarding classification and prediction (1): Data Preparation

 Data cleaning

 Preprocess data in order to reduce noise and handle missing values

 Relevance analysis (feature selection)

 Remove the irrelevant or redundant attributes

 Data transformation

 Generalize and/or normalize data

Evaluating Classification Methods

 Predictive accuracy

 Speed and scalability

 time to construct the model

 time to use the model

 Robustness

 handling noise and missing values

 Scalability

 efficiency in disk-resident databases

 Interpretability:

 understanding and insight provided by the model

 Goodness of rules

 decision tree size

 compactness of classification rules

Classification by Decision Tree Induction

 Decision tree

 A flow-chart-like tree structure

 Internal node denotes a test on an attribute


 Branch represents an outcome of the test

 Leaf nodes represent class labels or class distribution

 Decision tree generation consists of two phases

 Tree construction

 At start, all the training examples are at the root

 Partition examples recursively based on selected attributes

 Tree pruning

 Identify and remove branches that reflect noise or outliers

 Use of decision tree: Classifying an unknown sample

 Test the attribute values of the sample against the decision tree

Training Dataset

Output: A Decision Tree for “buys_computer”


Algorithm for Decision Tree Induction

 Basic algorithm (a greedy algorithm)

 Tree is constructed in a top-down recursive divide-and-conquer manner

 At start, all the training examples are at the root

 Attributes are categorical (if continuous-valued, they are discretized in advance)

 Examples are partitioned recursively based on selected attributes

 Test attributes are selected on the basis of a heuristic or statistical measure (e.g.,
information gain)

 Conditions for stopping partitioning

 All samples for a given node belong to the same class

 There are no remaining attributes for further partitioning – majority voting is employed
for classifying the leaf

 There are no samples left
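A minimal, illustrative sketch of this greedy, top-down induction on categorical attributes follows (all function and variable names are assumptions, not part of these notes; real systems add pruning and many refinements):

```python
# ID3-style sketch: categorical attributes, information gain as the selection measure.
from collections import Counter
from math import log2

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    """Expected reduction in entropy from partitioning the rows on `attr`."""
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attr], []).append(label)
    remainder = sum(len(s) / len(labels) * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder

def build_tree(rows, labels, attrs):
    if len(set(labels)) == 1:                  # all samples belong to the same class
        return labels[0]
    if not attrs:                              # no remaining attributes:
        return Counter(labels).most_common(1)[0][0]   # majority voting for the leaf
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))
    tree = {best: {}}
    for value in {row[best] for row in rows}:  # only values actually present,
        idx = [i for i, r in enumerate(rows) if r[best] == value]  # so no empty partition
        tree[best][value] = build_tree([rows[i] for i in idx],
                                       [labels[i] for i in idx],
                                       [a for a in attrs if a != best])
    return tree
```

`build_tree` returns a nested dictionary keyed by attribute and attribute value; classifying a new sample amounts to walking that dictionary, which mirrors the "use of decision tree" step above.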

Attribute Selection Measure

 Information gain (ID3/C4.5)

 All attributes are assumed to be categorical

 Can be modified for continuous-valued attributes

 Gini index (IBM IntelligentMiner)

 All attributes are assumed continuous-valued

 Assume there exist several possible split values for each attribute

 May need other tools, such as clustering, to get the possible split values

 Can be modified for categorical attributes

Information Gain (ID3/C4.5)

 Select the attribute with the highest information gain

 Assume there are two classes, P and N

 Let the set of examples S contain p elements of class P and n elements of class N


 The amount of information needed to decide whether an arbitrary example in S belongs to P
or N is defined as

$$ I(p, n) = -\frac{p}{p+n}\log_2\frac{p}{p+n} - \frac{n}{p+n}\log_2\frac{n}{p+n} $$

Information Gain in Decision Tree Induction

 Assume that using attribute A a set S will be partitioned into sets {S1, S2 , …, Sv}

 If Si contains pi examples of P and ni examples of N, the entropy, or the expected
information needed to classify objects in all subtrees Si, is

$$ E(A) = \sum_{i=1}^{v} \frac{p_i + n_i}{p + n}\, I(p_i, n_i) $$

 The encoding information that would be gained by branching on A

$$ Gain(A) = I(p, n) - E(A) $$

Attribute Selection by Information Gain Computation

 Class P: buys_computer = “yes”

 Class N: buys_computer = “no”

 I(p, n) = I(9, 5) = 0.940

 Compute the entropy for age:

age       pi   ni   I(pi, ni)
<=30      2    3    0.971
30…40     4    0    0
>40       3    2    0.971

$$ E(age) = \frac{5}{14} I(2,3) + \frac{4}{14} I(4,0) + \frac{5}{14} I(3,2) = 0.69 $$

Hence

$$ Gain(age) = I(p, n) - E(age) $$

Similarly,

$$ Gain(income) = 0.029 \qquad Gain(student) = 0.151 \qquad Gain(credit\_rating) = 0.048 $$
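The numbers above can be re-derived with a few lines of standard-library Python (a sketch; the class counts are taken from the age table):

```python
# Re-computing the worked example: I(9,5), E(age) and Gain(age).
from math import log2

def I(p, n):
    total = p + n
    return sum(-x / total * log2(x / total) for x in (p, n) if x)

counts = {"<=30": (2, 3), "30...40": (4, 0), ">40": (3, 2)}
p, n = 9, 5

E_age = sum((pi + ni) / (p + n) * I(pi, ni) for pi, ni in counts.values())
print(round(I(p, n), 3))          # ~0.94
print(round(E_age, 2))            # ~0.69
print(round(I(p, n) - E_age, 3))  # Gain(age) ~ 0.25
```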

Gini Index (IBM IntelligentMiner)

 If a data set T contains examples from n classes, the gini index, gini(T), is defined as

$$ gini(T) = 1 - \sum_{j=1}^{n} p_j^2 $$

where pj is the relative frequency of class j in T.

 If a data set T is split into two subsets T1 and T2 with sizes N1 and N2 respectively, the gini index
of the split data, ginisplit(T), is defined as


$$ gini_{split}(T) = \frac{N_1}{N}\, gini(T_1) + \frac{N_2}{N}\, gini(T_2) $$

 The attribute that provides the smallest ginisplit(T) is chosen to split the node (all
possible splitting points for each attribute need to be enumerated).
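A small sketch of these two computations (function names and the example split are illustrative):

```python
# gini(T) and the weighted gini index of a binary split.
def gini(labels):
    """gini(T) = 1 - sum_j p_j^2, with p_j the relative frequency of class j in T."""
    total = len(labels)
    return 1.0 - sum((labels.count(c) / total) ** 2 for c in set(labels))

def gini_split(left, right):
    """Weighted gini index of splitting T into subsets T1 and T2."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# Example: one candidate binary split of ten class labels.
left, right = ["yes", "yes", "no"], ["yes", "no", "no", "no", "no", "yes", "no"]
print(gini_split(left, right))   # smaller values indicate purer splits
```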

Extracting Classification Rules from Trees

 Represent the knowledge in the form of IF-THEN rules

 One rule is created for each path from the root to a leaf

 Each attribute-value pair along a path forms a conjunction

 The leaf node holds the class prediction

 Rules are easier for humans to understand

 Example

IF age = “<=30” AND student = “no” THEN buys_computer = “no”

IF age = “<=30” AND student = “yes” THEN buys_computer = “yes”

IF age = “31…40” THEN buys_computer = “yes”

IF age = “>40” AND credit_rating = “excellent” THEN buys_computer = “yes”

IF age = “>40” AND credit_rating = “fair” THEN buys_computer = “no”
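Written out directly in code, these five rules become a simple, hypothetical classify() helper (the attribute values are the string labels used in the rules above):

```python
# The five IF-THEN rules above, encoded as one path test per branch of the tree.
def classify(age, student, credit_rating):
    if age == "<=30":
        return "yes" if student == "yes" else "no"
    if age == "31...40":
        return "yes"
    # age == ">40"
    return "yes" if credit_rating == "excellent" else "no"

print(classify("<=30", "yes", "fair"))   # yes
```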

Avoid Overfitting in Classification

 The generated tree may overfit the training data

 Too many branches, some may reflect anomalies due to noise or outliers

 The result is poor accuracy for unseen samples

 Two approaches to avoid overfitting

 Prepruning: Halt tree construction early—do not split a node if this would result in the
goodness measure falling below a threshold

 Difficult to choose an appropriate threshold

 Postpruning: Remove branches from a “fully grown” tree—get a sequence of


progressively pruned trees


 Use a set of data different from the training data to decide which is the “best
pruned tree”

Approaches to Determine the Final Tree Size

 Separate training (2/3) and testing (1/3) sets

 Use cross validation, e.g., 10-fold cross validation

 Use all the data for training

 but apply a statistical test (e.g., chi-square) to estimate whether expanding or pruning a
node may improve the entire distribution

 Use minimum description length (MDL) principle:

 halting growth of the tree when the encoding is minimized
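As an illustration of the cross-validation option, here is a hedged sketch using scikit-learn (an assumed tool choice), with `max_depth` standing in for the pruned "tree size" being compared:

```python
# 10-fold cross validation to compare candidate tree sizes (illustrative only).
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=10, random_state=0)

for depth in (2, 4, 8, None):   # None = fully grown tree
    scores = cross_val_score(DecisionTreeClassifier(max_depth=depth, random_state=0),
                             X, y, cv=10)
    print(depth, scores.mean())
```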

Enhancements to basic decision tree induction

 Allow for continuous-valued attributes

 Dynamically define new discrete-valued attributes that partition the continuous


attribute value into a discrete set of intervals

 Handle missing attribute values

 Assign the most common value of the attribute

 Assign probability to each of the possible values

 Attribute construction

 Create new attributes based on existing ones that are sparsely represented

 This reduces fragmentation, repetition, and replication

Classification in Large Databases

 Classification—a classical problem extensively studied by statisticians and machine learning


researchers

 Scalability: Classifying data sets with millions of examples and hundreds of attributes with
reasonable speed

 Why decision tree induction in data mining?

 relatively faster learning speed (than other classification methods)


 convertible to simple and easy to understand classification rules

 can use SQL queries for accessing databases

 comparable classification accuracy with other methods

Scalable Decision Tree Induction Methods in Data Mining Studies

 SLIQ (EDBT’96 — Mehta et al.)

 builds an index for each attribute and only class list and the current attribute list reside
in memory

 SPRINT (VLDB’96 — J. Shafer et al.)

 constructs an attribute list data structure

 PUBLIC (VLDB’98 — Rastogi & Shim)

 integrates tree splitting and tree pruning: stop growing the tree earlier

 RainForest (VLDB’98 — Gehrke, Ramakrishnan & Ganti)

 separates the scalability aspects from the criteria that determine the quality of the tree

 builds an AVC-list (attribute, value, class label)

Data Cube-Based Decision-Tree Induction

 Integration of generalization with decision-tree induction (Kamber et al’97).

 Classification at primitive concept levels

 E.g., precise temperature, humidity, outlook, etc.

 Low-level concepts, scattered classes, bushy classification-trees

 Semantic interpretation problems.

 Cube-based multi-level classification

 Relevance analysis at multi-levels.

 Information-gain analysis with dimension + level.

Bayesian Classification

 Probabilistic learning: Calculate explicit probabilities for hypothesis, among the most practical
approaches to certain types of learning problems


 Incremental: Each training example can incrementally increase/decrease the probability that a
hypothesis is correct. Prior knowledge can be combined with observed data.

 Probabilistic prediction: Predict multiple hypotheses, weighted by their probabilities

 Standard: Even when Bayesian methods are computationally intractable, they can provide a
standard of optimal decision making against which other methods can be measured

Bayesian Theorem

 Given training data D, the posterior probability of a hypothesis h, P(h|D), follows Bayes' theorem:

$$ P(h \mid D) = \frac{P(D \mid h)\, P(h)}{P(D)} $$
 MAP (maximum a posteriori) hypothesis

$$ h_{MAP} = \arg\max_{h \in H} P(h \mid D) = \arg\max_{h \in H} P(D \mid h)\, P(h) $$

 Practical difficulty: require initial knowledge of many probabilities, significant computational


cost

Naïve Bayes Classifier

 A simplified assumption: attributes are conditionally independent:


$$ P(C_j \mid V) = P(C_j) \prod_{i=1}^{n} P(v_i \mid C_j) $$

 Greatly reduces the computation cost: only the class distribution needs to be counted.

Given a training set, we can compute the probabilities

Outlook      P    N        Humidity   P    N
sunny        2/9  3/5      high       3/9  4/5
overcast     4/9  0        normal     6/9  1/5
rain         3/9  2/5

Temperature  P    N        Windy      P    N
hot          2/9  2/5      true       3/9  3/5
mild         4/9  2/5      false      6/9  2/5
cool         3/9  1/5


Bayesian classification

 The classification problem may be formalized using a-posteriori probabilities:

 P(C|X) = prob. that the sample tuple X=<x1,…,xk> is of class C.

 E.g. P(class=N | outlook=sunny,windy=true,…)

 Idea: assign to sample X the class label C such that P(C|X) is maximal

Estimating a-posteriori probabilities

 Bayes theorem:

P(C|X) = P(X|C)·P(C) / P(X)

 P(X) is constant for all classes

 P(C) = relative freq of class C samples

 C such that P(C|X) is maximum =


C such that P(X|C)·P(C) is maximum

 Problem: computing P(X|C) is infeasible!

Naïve Bayesian Classification

 Naïve assumption: attribute independence

P(x1,…,xk|C) = P(x1|C)·…·P(xk|C)

 If i-th attribute is categorical:


P(xi|C) is estimated as the relative freq of samples having value xi as i-th attribute in class C

 If i-th attribute is continuous:


P(xi|C) is estimated through a Gaussian density function

 Computationally easy in both cases
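A short sketch of both estimates (function names are illustrative; the continuous case fits a Gaussian density to the class-C values of the attribute):

```python
# Estimating P(xi | C): relative frequency for categorical attributes,
# a fitted Gaussian density for continuous attributes.
from collections import Counter
from math import exp, pi, sqrt

def p_categorical(value, column):
    """Relative frequency of `value` among the class-C samples in `column`."""
    return Counter(column)[value] / len(column)

def p_continuous(value, column):
    """Gaussian density with mean and variance estimated from the class-C samples."""
    mu = sum(column) / len(column)
    var = sum((x - mu) ** 2 for x in column) / len(column)
    return exp(-(value - mu) ** 2 / (2 * var)) / sqrt(2 * pi * var)

print(p_categorical("sunny", ["sunny", "rain", "sunny", "overcast"]))  # 0.5
print(p_continuous(72.0, [70.0, 68.0, 75.0, 80.0]))
```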

Play-tennis example: estimating P(xi|C)


Outlook Temperature Humidity Windy Class


sunny hot high false N
sunny hot high true N
overcast hot high false P
rain mild high false P
rain cool normal false P
rain cool normal true N
overcast cool normal true P
sunny mild high false N
sunny cool normal false P
rain mild normal false P
sunny mild normal true P
overcast mild high true P
overcast hot normal false P
rain mild high true N

Play-tennis example: classifying X

 An unseen sample X = <rain, hot, high, false>

 P(X|p)·P(p) =
P(rain|p)·P(hot|p)·P(high|p)·P(false|p)·P(p) = 3/9·2/9·3/9·6/9·9/14 = 0.010582

 P(X|n)·P(n) =
P(rain|n)·P(hot|n)·P(high|n)·P(false|n)·P(n) = 2/5·2/5·4/5·2/5·5/14 = 0.018286

 Sample X is classified in class n (don’t play)
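These two products can be checked directly from the probability tables above:

```python
# Verifying the play-tennis classification of X = <rain, hot, high, false>.
p_yes = (3/9) * (2/9) * (3/9) * (6/9) * (9/14)   # P(rain|p)*P(hot|p)*P(high|p)*P(false|p)*P(p)
p_no  = (2/5) * (2/5) * (4/5) * (2/5) * (5/14)   # P(rain|n)*P(hot|n)*P(high|n)*P(false|n)*P(n)
print(round(p_yes, 6), round(p_no, 6))            # ~0.010582  ~0.018286
print("n (don't play)" if p_no > p_yes else "p (play)")
```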

The independence hypothesis…

 … makes computation possible

 … yields optimal classifiers when satisfied

 … but is seldom satisfied in practice, as attributes (variables) are often correlated.

 Attempts to overcome this limitation:

 Bayesian networks, that combine Bayesian reasoning with causal relationships between
attributes

 Decision trees, that reason on one attribute at a time, considering the most important
attributes first


Bayesian Belief Networks (I)
