Classification Techniques

The document outlines various classification techniques used in data analysis, including Decision Trees, Naive Bayesian Classification, Neural Networks, K-Nearest Neighbors, Support Vector Machines, Logistic Regression, and Ensemble Learning methods like AdaBoost. Each technique is described with its basic principles, advantages, and examples of application, emphasizing the importance of supervised learning and the use of training data. Additionally, the document discusses how these methods can be used to predict categorical outcomes based on input attributes.


Classification Techniques

Classification
• Classification predicts categorical (discrete) labels
• Example: categorize bank loan applications as either safe or risky
• A tuple, X, is represented by an n-dimensional attribute vector, X = (x1, x2, …, xn), depicting n
measurements made on the tuple from n database attributes, respectively, A1, A2, …, An
• Each tuple, X, is assumed to belong to a predefined class as determined by another database
attribute called the class label attribute.
• The class label attribute is discrete-valued and unordered; it is categorical in that each value
serves as a category or class.
• Data classification is a two-step process
• In the first step, a classification algorithm builds the classifier by analyzing or “learning from” a training set
made up of database tuples and their associated class labels.
• In the second step, the model is used for classification
• Because the class label of each training tuple is provided, this step is also known as supervised
learning
Decision Tree
• Decision tree induction is the learning of decision
trees from class-labeled training tuples.
• A decision tree is a flowchart-like tree structure,
where each internal node (nonleaf node) denotes
a test on an attribute, each branch represents an
outcome of the test, and each leaf node (or
terminal node) holds a class label.
• The topmost node in a tree is the root node.
• Given a tuple, X, for which the associated class
label is unknown, the attribute values of the
tuple are tested against the decision tree.
• A path is traced from the root to a leaf node,
which holds the class for that tuple.
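The root-to-leaf traversal described above can be sketched with a small nested-dict tree. The attributes, values, and labels below are hypothetical, loosely modelled on the bank-loan example, not taken from the slides:

```python
# A minimal decision-tree traversal sketch. Internal nodes test an attribute;
# leaves hold a class label. The tree itself is illustrative.
tree = {
    "attr": "income",
    "branches": {
        "low":    {"label": "risky"},
        "medium": {"attr": "credit_rating",
                   "branches": {"fair":      {"label": "risky"},
                                "excellent": {"label": "safe"}}},
        "high":   {"label": "safe"},
    },
}

def classify(node, x):
    """Trace a path from the root to a leaf; the leaf holds the class."""
    while "label" not in node:                 # internal node: test an attribute
        node = node["branches"][x[node["attr"]]]
    return node["label"]

print(classify(tree, {"income": "medium", "credit_rating": "excellent"}))  # safe
```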
Decision Tree types
• An attribute selection measure is a heuristic for selecting the splitting criterion
that “best” separates a given data partition, D, of class-labeled
training tuples into individual classes.
• Attribute selection measure:
▪ Information gain – ID3
▪ Gain ratio – C4.5
▪ Gini index - CART
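As a rough sketch, these measures can be computed directly from class counts. The 9/5 class distribution and the candidate binary split below are illustrative, not from the slides:

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Expected information (in bits) needed to classify a tuple -- basis of ID3's gain."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini index (CART): 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def info_gain(labels, partitions):
    """Entropy of D minus the weighted entropy of the candidate partitions."""
    n = len(labels)
    return entropy(labels) - sum(len(p) / n * entropy(p) for p in partitions)

# Illustrative 9/5 class distribution and one candidate binary split of it
D = ["yes"] * 9 + ["no"] * 5
left = ["yes"] * 6 + ["no"] * 1
right = ["yes"] * 3 + ["no"] * 4

print(round(entropy(D), 3))                   # 0.94
print(round(info_gain(D, [left, right]), 3))
print(round(gini(D), 3))
```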
ID3 Decision Tree
Naive Bayesian Classification
Bayesian Classification
• Bayesian classifiers are statistical classifiers.
• They can predict class membership probabilities, such as the
probability that a given tuple belongs to a particular class.
• Bayes’ Theorem:

  P(H|X) = P(X|H) P(H) / P(X)
Naive Bayesian Classification
• Let X = (x1, x2, …, xn) represent a tuple
• Let (x1, x2, …, xn) be its attribute vector over attributes A1, A2, …, An
• Let C1, C2, …, Cm represent the classes
• X belongs to the class Ci if and only if

  P(Ci|X) > P(Cj|X) for 1 ≤ j ≤ m, j ≠ i

Where P(Ci|X) = P(X|Ci) P(Ci) / P(X), and under the naive assumption of
class-conditional independence, P(X|Ci) = P(x1|Ci) × P(x2|Ci) × … × P(xn|Ci)
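A minimal sketch of the rule above: pick the class Ci that maximizes P(Ci) · Π P(xk|Ci), estimating the probabilities by frequency counts. The weather-style tuples are made up for illustration, and no Laplace smoothing is applied:

```python
from collections import Counter, defaultdict

def train(rows, labels):
    """Count priors P(Ci) and conditionals P(xk|Ci), return a predict function."""
    priors = Counter(labels)
    cond = defaultdict(Counter)            # (class, attribute index) -> value counts
    for row, c in zip(rows, labels):
        for k, v in enumerate(row):
            cond[(c, k)][v] += 1
    n = len(labels)

    def predict(x):
        def score(c):                      # P(Ci) * product of P(xk|Ci)
            p = priors[c] / n
            for k, v in enumerate(x):
                p *= cond[(c, k)][v] / priors[c]
            return p
        return max(priors, key=score)
    return predict

# Illustrative training tuples: (outlook, temperature) -> play?
rows = [("sunny", "hot"), ("sunny", "mild"), ("rain", "mild"), ("rain", "cool")]
labels = ["no", "no", "yes", "yes"]
predict = train(rows, labels)
print(predict(("rain", "mild")))   # yes
```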
Neural Network
Feed-forward Neural Network
• The backpropagation algorithm performs learning on a multilayer
feed-forward neural network.
• It iteratively learns a set of weights for prediction of the class label of
tuples.
• A multilayer feed-forward neural network consists of an input layer,
one or more hidden layers, and an output layer.
Learning by the backpropagation algorithm
Sample calculations for learning by the
backpropagation algorithm.

Let the learning rate be 0.9.

The first training tuple is X = (1, 0, 1), whose class label is 1.


K-Nearest Neighbour (K-NN)
K- Nearest Neighbour Classification
• It is based on feature similarity
• A k-nearest-neighbor classifier searches for K training tuples that are
closest to the unknown tuple
• It is suitable for small, labelled, noise-free data
• Closeness is defined in terms of a distance metric, such as Euclidean
distance
K- Nearest Neighbour Classification
• A good value for k, the number of neighbors can be determined
experimentally.
• Starting with k = 1, use a test set to estimate the error rate of the
classifier.
• This process can be repeated each time by incrementing k to allow for
one more neighbor.
• The k value that gives the minimum error rate may be selected.
Example: classify the new tuple (Weight = 57, Height = 170) with k = 3
Distance to the first training tuple: √((170 − 167)² + (57 − 51)²) = 6.7
Weight Height Class Euclidean Distance
51 167 Underweight 6.7
62 182 Normal 13
69 176 Normal 13.4
64 173 Normal 7.6
65 172 Normal 8.2
56 174 Underweight 4.1
58 169 Normal 1.4
57 173 Normal 3
55 170 Normal 2
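The table above can be reproduced in a few lines: rank the training tuples by Euclidean distance to (57, 170) and take the majority class of the k = 3 nearest:

```python
from math import sqrt
from collections import Counter

# Training tuples from the table: (weight, height, class)
data = [
    (51, 167, "Underweight"), (62, 182, "Normal"), (69, 176, "Normal"),
    (64, 173, "Normal"), (65, 172, "Normal"), (56, 174, "Underweight"),
    (58, 169, "Normal"), (57, 173, "Normal"), (55, 170, "Normal"),
]

def knn(query, k=3):
    """Sort by Euclidean distance, then majority-vote among the k nearest."""
    dists = sorted((sqrt((w - query[0]) ** 2 + (h - query[1]) ** 2), c)
                   for w, h, c in data)
    votes = Counter(c for _, c in dists[:k])
    return votes.most_common(1)[0][0]

# The 3 nearest neighbours (distances 1.4, 2.0, 3.0) are all Normal
print(knn((57, 170)))   # Normal
```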
Support Vector Machine
(SVM)
SVM
• An SVM attempts to find a separating hyperplane that splits the dataset into
two groups
• This separating boundary could be a line (2-D), a plane (3-D), or a
hyperplane (4-D and above)
2-D training data
Linearly separable data Linearly inseparable data
Linearly separable training data

Attribute 1 Attribute 2 Class


3 1 C1
3 -1 C1
6 1 C1
6 -1 C1
1 0 C2
0 1 C2
0 -1 C2
-1 0 C2
Identify support vectors
S1 = (1, 0), S2 = (3, 1), S3 = (3, -1)
• Append 1 as initial bias input
S1 = (1, 0, 1); S2 = (3, 1, 1); S3 = (3, -1, 1)
• Calculate 3 variables for each SV
A1.S1.S1 + A2.S2.S1 + A3.S3.S1 = -1
A1.S1.S2 + A2.S2.S2 + A3.S3.S2 = 1
A1.S1.S3 + A2.S2.S3 + A3.S3.S3 = 1
• Substitute S1, S2, S3
A1.(1,0,1).(1,0,1) + A2.(3,1,1).(1,0,1) + A3.(3,-1,1).(1,0,1) = -1
A1.(1,0,1).(3,1,1) + A2.(3,1,1).(3,1,1) + A3.(3,-1,1).(3,1,1) = 1
A1.(1,0,1).(3,-1,1) + A2.(3,1,1).(3,-1,1) + A3.(3,-1,1).(3,-1,1) = 1
• Take dot products of the vectors
A1.(1+0+1) + A2.(3+0+1) + A3.(3+0+1) = -1
A1.(3+0+1) + A2.(9+1+1) + A3.(9-1+1) = 1
A1.(3+0+1) + A2.(9-1+1) + A3.(9+1+1) = 1
• Simplifying
2A1 + 4A2 + 4A3 = -1
4A1 + 11A2 + 9A3 = 1
4A1 + 9A2 + 11A3 = 1
• Solving
A1 = -3.5, A2 = 0.75, A3 = 0.75
W = Ʃ Ai.Si = -3.5 (1,0,1) + 0.75 (3,1,1) + 0.75 (3,-1,1)
  = (1, 0, -2)
Y = W.X + B
w1, w2 = 1, 0 and b = -2
x-intercept = -b / w1 and y-intercept = -b / w2
slope = - (b / w2) / (b / w1)
Substituting, x-intercept = 2, y-intercept = infinity and slope = infinity
The resulting hyperplane is a 2-D vertical line which meets the x-axis at 2.
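The worked example can be checked numerically: the multipliers A1 = -3.5, A2 = A3 = 0.75 satisfy all three equations, and summing Ai·Si recovers W = (1, 0, -2):

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Support vectors with the bias 1 appended, their multipliers, and targets
S = [(1, 0, 1), (3, 1, 1), (3, -1, 1)]     # S1, S2, S3
A = [-3.5, 0.75, 0.75]                     # A1, A2, A3 from the slide
targets = [-1, 1, 1]                       # class of each support vector

# Check each equation: sum_i Ai * (Si . Sj) must equal the target of Sj
for sj, t in zip(S, targets):
    assert abs(sum(a * dot(si, sj) for a, si in zip(A, S)) - t) < 1e-9

# Weight vector W = sum_i Ai * Si
W = tuple(sum(a * si[k] for a, si in zip(A, S)) for k in range(3))
print(W)   # (1.0, 0.0, -2.0), i.e. w1 = 1, w2 = 0, b = -2: the vertical line x = 2
```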
Logistic Regression
Logistic Regression
• Regression for Categorical data

• Supervised learning technique


Why Linear Regression fails for categorical data
AGE OUTCOMES AGE OUTCOMES AGE OUTCOMES

10 1 20 0 60 0 Y = -0.01404 * X + 0.8176
11 1 21 0 61 0
12 1 22 0 62 0
13 1 23 0 63 0
14 1 24 0 64 0
15 1 25 0 65 0
16 1 26 0 66 0
17 1 27 0 67 0
18 1 28 0 68 0
19 1 29 0 69 0
Why Linear Regression fails for categorical data
AGE OUTCOMES AGE OUTCOMES AGE OUTCOMES

10 0.6772 20 0.5368 60 -0.0248 Y = -0.01404 * X + 0.8176
11 0.66316 21 0.52276 61 -0.03884
12 0.64912 22 0.50872 62 -0.05288
13 0.63508 23 0.49468 63 -0.06692
14 0.62104 24 0.48064 64 -0.08096
15 0.607 25 0.4666 65 -0.095
16 0.59296 26 0.45256 66 -0.10904
17 0.57892 27 0.43852 67 -0.12308
18 0.56488 28 0.42448 68 -0.13712
19 0.55084 29 0.41044 69 -0.15116
Logistic Regression
log(p / (1 − p)) = β0 + β1 X

p = 1 / (1 + e^−(β0 + β1 X))
The dataset of pass/fail in an exam for 5 students is given in the table below.
If we use Logistic Regression as the classifier and assume the model
suggested by the optimizer will become the following for Odds of passing a
course:

log (Odds)=−64+2×hours

• Calculate the probability of passing for the student who studied 33 hours.
• At least how many hours should a student study to pass the course with a
probability of more than 95%?
HOURS STUDIES RESULT (1= PASS, 0=FAIL)

29 0

15 0

33 1

28 1

39 1
Probability of Pass for the student who studied 33 hours

P = 1/(1 + e^−z)
z = -64 + 2 * Hours
  = -64 + 66          (Hours = 33)
z = 2
P = 1/(1 + e^−2)
P = 0.88
A student who studies for 33 hours has 88% chance of passing the
course
At least how many hours should the student study to pass the course with a
probability of more than 95%?

P = 0.95
0.95 = 1/(1 + e^−z)
0.95 (1 + e^−z) = 1
0.95 + 0.95 e^−z = 1
0.95 e^−z = 1 − 0.95 = 0.05
e^−z = 0.0526
ln(e^−z) = ln(0.0526)
−z = −2.94
z = 2.94

log(odds) = -64 + 2 * hours
2.94 = -64 + 2 * hours
hours ≈ 33.5

Check: z = -64 + 2 * 33.5 = 3, so P = 1/(1 + e^−3) ≈ 0.952 > 0.95
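Both answers can be verified by evaluating the sigmoid directly:

```python
from math import exp, log

def p_pass(hours):
    """P(pass) from log(odds) = -64 + 2 * hours."""
    z = -64 + 2 * hours
    return 1 / (1 + exp(-z))

print(round(p_pass(33), 2))    # 0.88

# Minimum hours for P > 0.95: invert the sigmoid, z = ln(p / (1 - p))
z = log(0.95 / 0.05)           # ~ 2.94
hours = (z + 64) / 2
print(round(hours, 2))         # ~ 33.47, i.e. about 33.5 hours
```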
Ensemble Learning
(AdaBoost Algorithm)
Ensemble Learning
• Ensemble learning combines several base algorithms to form one
optimized predictive algorithm
• Example: Instead of one Decision Tree, Ensemble Methods take
several different trees and aggregate them into one final, strong
predictor
• Types
• Bagging
• Boosting
• Stacking
                      Bagging                          Boosting                     Stacking
Weak learners         Homogeneous                      Homogeneous                  Heterogeneous
Learning              Parallel                         Sequential                   Parallel
Combination process   Weak + deterministic averaging   Weak + deterministic         Weak + meta-model
                                                       strategy
Goal                  Decrease Variance                Decrease Bias                Improve Predictions
Boosting
• Boosting algorithm tries to build a strong learner (predictive model) from the mistakes of several
weaker models.
• It starts by creating a model from the training data.
• Then, it creates a second model from the previous one by trying to reduce the errors from the
previous model.
• Models are added sequentially, each correcting its predecessor, until the training data is predicted
perfectly or the maximum number of models have been added.
• Boosting basically tries to reduce the bias error which arises when models are not able to identify
relevant trends in the data.
• This happens by evaluating the difference between the predicted value and the actual value.
• Types
• AdaBoost (Adaptive Boosting)
• Gradient Tree Boosting
• XGBoost
AdaBoost (Adaptive Boosting)
• Initialize weights wi = 1/N for every i
• For t = 1 to T
❖ Generate a training dataset by sampling with the weights {wi}
❖ Fit some weak learner gt
❖ Compute the weighted error et = Σi=1..n (ei · wi) / Σi=1..n (wi)
❖ Set λt = ½ ln[(1 − et) / et]
❖ Update the weights
➢ wi ← wi · e^λt if wrongly classified by gt
➢ wi ← wi · e^−λt if correctly classified
❖ Normalize wi to sum to one
• The new model is ft = ft−1 + λt · gt
• fT(x) = sign[Σt=1..T λt · gt(x)]
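One round of the weight update can be sketched as follows, assuming (as in the round-1 worked example below) that the weak learner misclassifies three of the ten equally weighted instances:

```python
from math import log, exp

N = 10
w = [1 / N] * N                 # initialize weights wi = 1/N
wrong = {2, 5, 7}               # 0-based indices the weak learner gets wrong

# Weighted error and learner coefficient
e = sum(w[i] for i in wrong) / sum(w)
lam = 0.5 * log((1 - e) / e)

# Up-weight mistakes, down-weight correct instances, then normalize
w = [wi * exp(lam if i in wrong else -lam) for i, wi in enumerate(w)]
total = sum(w)
w = [wi / total for wi in w]

print(round(e, 2), round(lam, 2))        # 0.3 0.42
print(round(w[2], 3), round(w[0], 3))    # 0.167 0.071 -- matches the round-1 table
```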
Example

Training data (x1, x2, class): (2, 3, true), (2.1, 2, true), (4.5, 6, true),
(4, 3.5, false), (3.5, 1, false), (5, 7, true), (5, 3, false), (6, 5.5, true),
(8, 6, false), (8, 2, false). True is encoded as +1 and false as -1; the
tables below use rounded x values.

Round-1
Initialize weights wi = 1/N = 0.1, fit a weak learner g1, then update the
weights (wi ← wi·e^λt if wrongly classified, wi ← wi·e^−λt if correctly
classified) and normalize wi to sum to one.

x1 x2 actual weight prediction loss weight * loss w(i+1) norm(w(i+1))
2  3   1  0.1  1  0  0    0.065  0.071
2  2   1  0.1  1  0  0    0.065  0.071
4  6   1  0.1 -1  1  0.1  0.153  0.167
4  3  -1  0.1 -1  0  0    0.065  0.071
4  1  -1  0.1 -1  0  0    0.065  0.071
5  7   1  0.1 -1  1  0.1  0.153  0.167
5  3  -1  0.1 -1  0  0    0.065  0.071
6  5   1  0.1 -1  1  0.1  0.153  0.167
8  6  -1  0.1 -1  0  0    0.065  0.071
8  2  -1  0.1 -1  0  0    0.065  0.071

et = 0.3
λt = ½ ln[(1 − 0.3)/0.3] = 0.42
Round-2
et = 0.21, λt = 0.65

x1 x2 actual weight prediction loss weight * loss w(i+1) norm(w(i+1))
2  3   1  0.071 -1  1  0.071  0.137  0.167
2  2   1  0.071 -1  1  0.071  0.137  0.167
4  6   1  0.167  1  0  0.000  0.087  0.106
4  3  -1  0.071 -1  0  0.000  0.037  0.045
4  1  -1  0.071 -1  0  0.000  0.037  0.045
5  7   1  0.167  1  0  0.000  0.087  0.106
5  3  -1  0.071 -1  0  0.000  0.037  0.045
6  5   1  0.167  1  0  0.000  0.087  0.106
8  6  -1  0.071  1  1  0.071  0.137  0.167
8  2  -1  0.071 -1  0  0.000  0.037  0.045

Round-3
et = 0.31, λt = 0.38

x1 x2 actual weight prediction loss weight * loss w(i+1) norm(w(i+1))
2  3   1  0.167  1  0  0.000  0.114  0.122
2  2   1  0.167  1  0  0.000  0.114  0.122
4  6   1  0.106 -1  1  0.106  0.155  0.167
4  3  -1  0.045 -1  0  0.000  0.031  0.033
4  1  -1  0.045 -1  0  0.000  0.031  0.033
5  7   1  0.106 -1  1  0.106  0.155  0.167
5  3  -1  0.045 -1  0  0.000  0.031  0.033
6  5   1  0.106 -1  1  0.106  0.155  0.167
8  6  -1  0.167 -1  0  0.000  0.114  0.122
8  2  -1  0.045 -1  0  0.000  0.031  0.033

Round-4
et = 0.1, λt = 1.1

x1 x2 actual weight prediction loss weight * loss w(i+1) norm(w(i+1))
2  3   1  0.122  1  0  0.000  0.041  0.068
2  2   1  0.122  1  0  0.000  0.041  0.068
4  6   1  0.167  1  0  0.000  0.056  0.093
4  3  -1  0.033  1  1  0.033  0.100  0.167
4  1  -1  0.033  1  1  0.033  0.100  0.167
5  7   1  0.167  1  0  0.000  0.056  0.093
5  3  -1  0.033  1  1  0.033  0.100  0.167
6  5   1  0.167  1  0  0.000  0.056  0.093
8  6  -1  0.122 -1  0  0.000  0.041  0.068
8  2  -1  0.033 -1  0  0.000  0.011  0.019

Combining the rounds, with alphas 0.42 (round 1), 0.65 (round 2), 0.38
(round 3) and 1.1 (round 4), and the per-round predictions for each instance:

round 1  round 2  round 3  round 4
 1       -1        1        1
 1       -1        1        1
-1        1       -1        1
-1       -1       -1        1
-1       -1       -1        1
-1        1       -1        1
-1       -1       -1        1
-1        1       -1        1
-1        1       -1       -1
-1       -1       -1       -1

For example, the prediction of the 1st instance will be
0.42 x 1 + 0.65 x (-1) + 0.38 x 1 + 1.1 x 1 = 1.25
And we will apply the sign function:
sign(1.25) = +1, aka true, which is correctly classified.
Classification & Prediction
Accuracy and Error Measures
Classifier Accuracy Measures
The accuracy of a classifier is the percentage of tuples
that are correctly classified by the classifier:

accuracy = (t_pos + t_neg) / (pos + neg)

Where,

pos is the number of positive (“cancer”) tuples
neg is the number of negative (“not cancer”) tuples
t_pos is the number of true positives (“cancer” tuples that were correctly classified as such)
t_neg is the number of true negatives (“not cancer” tuples that were correctly classified as such)
Classifier Accuracy Measures
• Sensitivity is the true positive rate (the proportion of positive tuples that are
correctly identified): sensitivity = t_pos / pos
• Specificity is the true negative rate (the proportion of negative tuples that are
correctly identified): specificity = t_neg / neg
• Precision assesses the percentage of tuples labelled as “cancer” that actually
are “cancer” tuples: precision = t_pos / (t_pos + f_pos)

Where,
f_pos is the number of false positives (“not cancer”
tuples that were incorrectly labelled as “cancer”)
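These four measures can be computed directly from the confusion-matrix counts; the counts below are illustrative, not from the slides:

```python
# Illustrative confusion-matrix counts for a "cancer" / "not cancer" classifier
t_pos, f_neg = 90, 10       # of pos = 100 actual "cancer" tuples
t_neg, f_pos = 940, 60      # of neg = 1000 actual "not cancer" tuples

pos, neg = t_pos + f_neg, t_neg + f_pos
accuracy    = (t_pos + t_neg) / (pos + neg)
sensitivity = t_pos / pos                  # true positive rate
specificity = t_neg / neg                  # true negative rate
precision   = t_pos / (t_pos + f_pos)      # how many predicted "cancer" really are

print(round(accuracy, 3), sensitivity, specificity, precision)
# 0.936 0.9 0.94 0.6 -- high accuracy can coexist with mediocre precision
```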
Confusion matrix
Predictor Error Measures
• Loss functions measure the error between the actual value yi and the predicted value yi′:
  absolute error |yi − yi′| and squared error (yi − yi′)²
• The average loss over the test set gives the mean absolute error and the mean squared error,
  where the mean squared error exaggerates the presence of outliers
• Relative error measures express the error relative to what it would have been if we had just
  predicted the mean value of y from the training data, D.
Evaluating the Accuracy of a Classifier or
Predictor
• Holdout Method
• Two-thirds of the data are allocated to the training set, and the remaining
one-third is allocated to the test set.
• The training set is used to derive the model, whose accuracy is estimated with
the test set
• Random subsampling
• Holdout method is repeated k times
• The overall accuracy estimate is taken as the average of the accuracies
obtained from each iteration
• For prediction, the average of the predictor error rates is the overall error rate
Ensemble Methods - Bagging
Ensemble Methods - Boosting
ROC Curve
# Actual Predicted Prob. Y Prob. N
1   Y      N       0.35    0.65
2   N      N       0.23    0.77
3   N      Y       0.55    0.45
4   Y      N       0.32    0.68
5   Y      Y       0.54    0.46
6   N      N       0.47    0.53

Sorted by Prob. Y (descending):
Actual Prob. Y
N      0.55
Y      0.54
N      0.47
Y      0.35
Y      0.32
N      0.23

TP Rate = TP/(TP+FN)
FP Rate = FP/(FP+TN)

Cut-off 0.5: TP Rate = 1/(1+2) = 0.33, FP Rate = 1/(1+2) = 0.33
Cut-off 0.4: TP Rate = 1/(1+2) = 0.33, FP Rate = 2/(2+1) = 0.66
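The two cut-off computations can be reproduced from the sorted (actual, Prob. Y) pairs; note that 2/3 appears as 0.66 in the slide and rounds to 0.67 here:

```python
# The six scored instances: (actual class, Prob. Y), sorted descending
scored = [("N", 0.55), ("Y", 0.54), ("N", 0.47),
          ("Y", 0.35), ("Y", 0.32), ("N", 0.23)]

def rates(cutoff):
    """(TP rate, FP rate) when predicting Y for Prob. Y >= cutoff."""
    tp = sum(1 for a, p in scored if a == "Y" and p >= cutoff)
    fp = sum(1 for a, p in scored if a == "N" and p >= cutoff)
    P = sum(1 for a, _ in scored if a == "Y")
    N = len(scored) - P
    return round(tp / P, 2), round(fp / N, 2)

print(rates(0.5))   # (0.33, 0.33)
print(rates(0.4))   # (0.33, 0.67)
```

Sweeping the cut-off from 1 down to 0 and plotting (FP rate, TP rate) at each step traces out the ROC curve.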
PREDICTION & CLUSTERING
TECHNIQUES
PREDICTION TECHNIQUES
Linear Regression
Multiple Linear Regression
Regression Tree
LINEAR REGRESSION
Prediction
Prediction is the task of predicting continuous values for given input.
For example, we may wish to predict the salary of college graduates with
10 years of work experience.
By far, the most widely used approach for numeric prediction is regression,
a statistical methodology.
Regression analysis can be used to model the relationship between one or
more predictor variables and a response variable (which is continuous-valued).
Predictor variables are the attributes describing the tuple.
The values of the predictor variables are known.
The response variable is what we want to predict.
Linear Regression
Linear Regression develops a model Y as a linear function of X.
y = w0 + w1 x
Where w0 and w1 are Y-intercept and slope of the line respectively
These regression coefficients can be solved by the method of least squares:

w1 = Σi (xi − x̄)(yi − ȳ) / Σi (xi − x̄)²
w0 = ȳ − w1 x̄

Where (xi, yi) are the data points, |D| is the number of data points,
x̄ is the mean of X and ȳ is the mean of Y.
Using this equation, we can predict that the salary of a college graduate with, say, 10 years of experience is $58,600
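The least-squares formulas can be sketched directly in code. The (years of experience, salary in $1000s) pairs below are hypothetical stand-ins for the salary data, chosen so that the fit lands near the quoted prediction:

```python
# Illustrative (years of experience, salary in $1000s) pairs
data = [(3, 30), (8, 57), (9, 64), (13, 72), (3, 36),
        (6, 43), (11, 59), (21, 90), (1, 20), (16, 83)]

n = len(data)
x_bar = sum(x for x, _ in data) / n
y_bar = sum(y for _, y in data) / n

# Least-squares slope and intercept: w1 = S_xy / S_xx, w0 = y_bar - w1 * x_bar
w1 = sum((x - x_bar) * (y - y_bar) for x, y in data) / \
     sum((x - x_bar) ** 2 for x, _ in data)
w0 = y_bar - w1 * x_bar

print(round(w0, 1), round(w1, 1))    # 23.2 3.5
print(round(w0 + w1 * 10, 1))        # 58.6, i.e. about $58,600 at 10 years
```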
MULTIPLE LINEAR REGRESSION
Multiple linear regression formula
The formula for a multiple linear regression is

y = the predicted value

a = the y-intercept

b1= the regression coefficient of the first predictor variable x1

bn= the regression coefficient of the last predictor variable xn

a and b have to be chosen so as to minimize the sum of squared errors of prediction.
Bivariate Linear Regression
EXAMPLE

X1 X2 Y
2 6 7
4 5 7
5 8 9
1 3 5
3 4 6
2 2 4
1 4 5
Y  X1  X2  X1X2  X1Y  X2Y  X1²  X2²
7   2   6   12    14   42    4   36
7   4   5   20    28   35   16   25
9   5   8   40    45   72   25   64
5   1   3    3     5   15    1    9
6   3   4   12    18   24    9   16
4   2   2    4     8    8    4    4
5   1   4    4     5   20    1   16

Corrected sums of squares and cross-products:
Σx1²  = ΣX1²  − (ΣX1)²/N      = 13.7143
Σx2²  = ΣX2²  − (ΣX2)²/N      = 23.7143
Σx1x2 = ΣX1X2 − (ΣX1)(ΣX2)/N  = 12.7143
Σx1y  = ΣX1Y  − (ΣX1)(ΣY)/N   = 12.4286
Σx2y  = ΣX2Y  − (ΣX2)(ΣY)/N   = 19.4286

b1 = (Σx2²·Σx1y − Σx1x2·Σx2y) / (Σx1²·Σx2² − (Σx1x2)²)
b2 = (Σx1²·Σx2y − Σx1x2·Σx1y) / (Σx1²·Σx2² − (Σx1x2)²)
a  = Ȳ − b1·X̄1 − b2·X̄2

b1 = 47.71/163.57 = 0.2917

b2 = 108.43/163.57 = 0.66288

a = 6.14 - (0.29*2.57) - (0.66*4.57) = 2.36245

Y = 2.36245 + 0.2917 X1 + 0.66288 X2
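The coefficients can be verified by recomputing the sums of squares and cross-products from the raw data (the slide's 2.36245 uses rounded intermediate values):

```python
# Raw data from the worked example
X1 = [2, 4, 5, 1, 3, 2, 1]
X2 = [6, 5, 8, 3, 4, 2, 4]
Y  = [7, 7, 9, 5, 6, 4, 5]
n = len(Y)

def s(u, v):
    """Corrected sum of cross-products: sum(uv) - sum(u)*sum(v)/n."""
    return sum(a * b for a, b in zip(u, v)) - sum(u) * sum(v) / n

s11, s22, s12 = s(X1, X1), s(X2, X2), s(X1, X2)
s1y, s2y = s(X1, Y), s(X2, Y)

den = s11 * s22 - s12 ** 2                 # = 163.57 in the slide
b1 = (s22 * s1y - s12 * s2y) / den
b2 = (s11 * s2y - s12 * s1y) / den
a  = sum(Y) / n - b1 * sum(X1) / n - b2 * sum(X2) / n

print(round(b1, 4), round(b2, 4), round(a, 4))   # 0.2917 0.6629 2.3624
```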


REGRESSION TREE
Regression Tree
Decision trees that are built for a data set where the target column
is a real number are called regression trees
Decision rules are found based on standard deviations
Day Outlook  Temp. Humidity Wind   No. of Golf Players
1   Sunny    Hot   High     Weak   25
2   Sunny    Hot   High     Strong 30
3   Overcast Hot   High     Weak   46
4   Rain     Mild  High     Weak   45
5   Rain     Cool  Normal   Weak   52
6   Rain     Cool  Normal   Strong 23
7   Overcast Cool  Normal   Strong 43
8   Sunny    Mild  High     Weak   35
9   Sunny    Cool  Normal   Weak   38
10  Rain     Mild  Normal   Weak   46
11  Sunny    Mild  Normal   Strong 48
12  Overcast Mild  High     Strong 52
13  Overcast Hot   Normal   Weak   44
14  Rain     Mild  High     Strong 30

Standard deviation

Golf players
= {25, 30, 46, 45, 52, 23, 43, 35, 38, 46, 48, 52, 44, 30}

Average of golf players
= (25 + 30 + 46 + 45 + 52 + 23 + 43 + 35 + 38 + 46 + 48 + 52 + 44 + 30)/14
= 39.78

Standard deviation of golf players
= √[((25 − 39.78)² + (30 − 39.78)² + (46 − 39.78)² + … + (30 − 39.78)²)/14]
= 9.32


Standard Deviation of golf players for Sunny outlook

Day Outlook Temp. Humidity Wind   Golf Players
1   Sunny   Hot   High     Weak   25
2   Sunny   Hot   High     Strong 30
8   Sunny   Mild  High     Weak   35
9   Sunny   Cool  Normal   Weak   38
11  Sunny   Mild  Normal   Strong 48

Golf players for sunny outlook = {25, 30, 35, 38, 48}

Average of golf players for sunny outlook
= (25 + 30 + 35 + 38 + 48)/5
= 35.2

Standard deviation of golf players for sunny outlook
= √[((25 − 35.2)² + (30 − 35.2)² + … + (48 − 35.2)²)/5]
= 7.78
Standard Deviation of golf players for Overcast outlook

Day Outlook  Temp. Humidity Wind   Golf Players
3   Overcast Hot   High     Weak   46
7   Overcast Cool  Normal   Strong 43
12  Overcast Mild  High     Strong 52
13  Overcast Hot   Normal   Weak   44

Golf players for overcast outlook = {46, 43, 52, 44}

Average of golf players for overcast outlook
= (46 + 43 + 52 + 44)/4
= 46.25

Standard deviation of golf players for overcast outlook
= √[((46 − 46.25)² + (43 − 46.25)² + … + (44 − 46.25)²)/4]
= 3.49
Standard Deviation of golf players for Rainy outlook

Day Outlook Temp. Humidity Wind   Golf Players
4   Rain    Mild  High     Weak   45
5   Rain    Cool  Normal   Weak   52
6   Rain    Cool  Normal   Strong 23
10  Rain    Mild  Normal   Weak   46
14  Rain    Mild  High     Strong 30

Golf players for rainy outlook = {45, 52, 23, 46, 30}

Average of golf players for rainy outlook
= (45 + 52 + 23 + 46 + 30)/5
= 39.2

Standard deviation of golf players for rainy outlook
= √[((45 − 39.2)² + (52 − 39.2)² + … + (30 − 39.2)²)/5]
= 10.87
standard deviations for the outlook feature
Outlook Stdev of Golf Players Instances

Overcast 3.49 4

Rain 10.87 5

Sunny 7.78 5

Weighted standard deviation for outlook = (4/14)x3.49 + (5/14)x10.87 + (5/14)x7.78 = 7.66

Standard deviation reduction for outlook = 9.32 − 7.66 = 1.66
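The whole computation for the outlook feature can be reproduced in a few lines (population standard deviation, as the slides use):

```python
from math import sqrt

# Golf players grouped by outlook, as in the tables above
players = {
    "Sunny":    [25, 30, 35, 38, 48],
    "Overcast": [46, 43, 52, 44],
    "Rain":     [45, 52, 23, 46, 30],
}

def std(xs):
    """Population standard deviation (divide by n, not n - 1)."""
    m = sum(xs) / len(xs)
    return sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

all_players = [x for xs in players.values() for x in xs]
n = len(all_players)

# Weighted std of the partitions, and the reduction relative to the whole set
weighted = sum(len(xs) / n * std(xs) for xs in players.values())
sdr = std(all_players) - weighted

print(round(std(all_players), 2), round(weighted, 2), round(sdr, 2))
# 9.32 7.66 1.66 -- the feature with the largest reduction is chosen for the split
```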


Standard deviation of golf players for hot
temperature
Day Outlook Temp. Humidity Wind Golf Players

1 Sunny Hot High Weak 25

2 Sunny Hot High Strong 30

3 Overcast Hot High Weak 46

13 Overcast Hot Normal Weak 44

Golf players for hot temperature = {25, 30, 46, 44}

Standard deviation of golf players for hot temperature = 8.95


Standard deviation of golf players for cool
temperature
Day Outlook Temp. Humidity Wind Golf Players

5 Rain Cool Normal Weak 52

6 Rain Cool Normal Strong 23

7 Overcast Cool Normal Strong 43

9 Sunny Cool Normal Weak 38

Golf players for cool temperature = {52, 23, 43, 38}

Standard deviation of golf players for cool temperature = 10.51


Standard deviation of golf players for mild
temperature
Day Outlook Temp. Humidity Wind Golf Players

4 Rain Mild High Weak 45

8 Sunny Mild High Weak 35

10 Rain Mild Normal Weak 46

11 Sunny Mild Normal Strong 48

12 Overcast Mild High Strong 52

14 Rain Mild High Strong 30

Golf players for mild temperature = {45, 35, 46, 48, 52, 30}

Standard deviation of golf players for mild temperature = 7.65


standard deviations for temperature feature
Temperature Stdev of Golf Players Instances

Hot 8.95 4

Cool 10.51 4

Mild 7.65 6

Weighted standard deviation for temperature = (4/14)x8.95 + (4/14)x10.51 + (6/14)x7.65 = 8.84

Standard deviation reduction for temperature = 9.32 − 8.84 = 0.47


Standard deviation for golf players for high
humidity
Day Outlook Temp. Humidity Wind Golf Players

1 Sunny Hot High Weak 25

2 Sunny Hot High Strong 30

3 Overcast Hot High Weak 46

4 Rain Mild High Weak 45

8 Sunny Mild High Weak 35

12 Overcast Mild High Strong 52

14 Rain Mild High Strong 30

Golf players for high humidity = {25, 30, 46, 45, 35, 52, 30}

Standard deviation for golf players for high humidity = 9.36


Standard deviation for golf players for normal
humidity
Day Outlook Temp. Humidity Wind Golf Players

5 Rain Cool Normal Weak 52

6 Rain Cool Normal Strong 23

7 Overcast Cool Normal Strong 43

9 Sunny Cool Normal Weak 38

10 Rain Mild Normal Weak 46

11 Sunny Mild Normal Strong 48

13 Overcast Hot Normal Weak 44

Golf players for normal humidity = {52, 23, 43, 38, 46, 48, 44}

Standard deviation for golf players for normal humidity = 8.73


Summarizing standard deviations for humidity
feature
Humidity Stdev of Golf Player Instances

High 9.36 7

Normal 8.73 7

Weighted standard deviation for humidity = (7/14)x9.36 + (7/14)x8.73 = 9.04

Standard deviation reduction for humidity = 9.32 − 9.04 = 0.27


Standard deviation for golf players for strong
wind
Day Outlook Temp. Humidity Wind Golf Players

2 Sunny Hot High Strong 30

6 Rain Cool Normal Strong 23

7 Overcast Cool Normal Strong 43

11 Sunny Mild Normal Strong 48

12 Overcast Mild High Strong 52

14 Rain Mild High Strong 30

Golf players for strong wind= {30, 23, 43, 48, 52, 30}

Standard deviation for golf players for strong wind = 10.59


Standard deviation for golf players for weak wind

Day Outlook Temp. Humidity Wind Golf Players

1 Sunny Hot High Weak 25

3 Overcast Hot High Weak 46

4 Rain Mild High Weak 45

5 Rain Cool Normal Weak 52

8 Sunny Mild High Weak 35

9 Sunny Cool Normal Weak 38

10 Rain Mild Normal Weak 46

13 Overcast Hot Normal Weak 44

Golf players for weak wind = {25, 46, 45, 52, 35, 38, 46, 44}

Standard deviation for golf players for weak wind = 7.87


standard deviations for wind feature

Wind Stdev of Golf Player Instances

Strong 10.59 6

Weak 7.87 8

Weighted standard deviation for wind = (6/14)x10.59 + (8/14)x7.87 = 9.03

Standard deviation reduction for wind = 9.32 − 9.03 = 0.29


Feature Standard Deviation Reduction

Outlook 1.66

Temperature 0.47

Humidity 0.27

Wind 0.29
Standard deviation for sunny outlook
Day Outlook Temp. Humidity Wind Golf Players

1 Sunny Hot High Weak 25

2 Sunny Hot High Strong 30

8 Sunny Mild High Weak 35

9 Sunny Cool Normal Weak 38

11 Sunny Mild Normal Strong 48

Golf players for sunny outlook = {25, 30, 35, 38, 48}

Standard deviation for sunny outlook = 7.78


Standard deviation for sunny outlook and hot
temperature
Day Outlook Temp. Humidity Wind Golf Players

1 Sunny Hot High Weak 25

2 Sunny Hot High Strong 30

Standard deviation for sunny outlook and hot temperature = 2.5


Standard deviation for sunny outlook and cool
temperature

Day Outlook Temp. Humidity Wind Golf Players

9 Sunny Cool Normal Weak 38

Standard deviation for sunny outlook and cool temperature = 0


Standard deviation for sunny outlook and
mild temperature
Day Outlook Temp. Humidity Wind Golf Players

8 Sunny Mild High Weak 35

11 Sunny Mild Normal Strong 48

Standard deviation for sunny outlook and mild temperature = 6.5


standard deviations for temperature feature
when outlook is sunny
Temperature Stdev for Golf Players Instances

Hot 2.5 2

Cool 0 1

Mild 6.5 2

Weighted standard deviation for sunny outlook and temperature = (2/5)x2.5 + (1/5)x0 + (2/5)x6.5 = 3.6

Standard deviation reduction for sunny outlook and temperature = 7.78 − 3.6 = 4.18
Standard deviation for sunny outlook and
high humidity = 4.08

Day Outlook Temp. Humidity Wind Golf Players

1 Sunny Hot High Weak 25

2 Sunny Hot High Strong 30

8 Sunny Mild High Weak 35


Standard deviation for sunny outlook and
normal humidity = 5

Day Outlook Temp. Humidity Wind Golf Players

9 Sunny Cool Normal Weak 38

11 Sunny Mild Normal Strong 48


standard deviations for humidity feature when
outlook is sunny
Humidity Stdev for Golf Players Instances

High 4.08 3

Normal 5.00 2

Weighted standard deviations for sunny outlook and humidity = (3/5)x4.08 + (2/5)x5 = 4.45

Standard deviation reduction for sunny outlook and humidity = 7.78 − 4.45 = 3.33
Sunny outlook and Wind

Day Outlook Temp. Humidity Wind   Golf Players
2   Sunny   Hot   High     Strong 30
11  Sunny   Mild  Normal   Strong 48

Standard deviation for sunny outlook and strong wind = 9

Day Outlook Temp. Humidity Wind   Golf Players
1   Sunny   Hot   High     Weak   25
8   Sunny   Mild  High     Weak   35
9   Sunny   Cool  Normal   Weak   38

Standard deviation for sunny outlook and weak wind = 5.56

Wind   Stdev for Golf Players Instances
Strong 9                      2
Weak   5.56                   3

Weighted standard deviations for sunny outlook and wind = (2/5)x9 + (3/5)x5.56 = 6.93
Standard deviation reduction for sunny outlook and wind = 7.78 − 6.93 = 0.85
Feature Standard Deviation Reduction

Temperature 4.18

Humidity 3.33

Wind 0.85
Pruning
Cool branch has one instance in its sub data set.
We can say that if outlook is sunny and temperature is cool, then there would be 38 golf
players.
But what about hot branch? There are still 2 instances.
Should we add another branch for weak wind and strong wind? No, we should not.
Because this causes over-fitting.
We should terminate building branches
if there are fewer than five instances in the sub data set,
or if the standard deviation of the sub data set is less than 5% of that of the entire data set.
Here, terminate the branch if there are less than 5 instances in the current sub data set.
If this termination condition is satisfied, then calculate the average of the sub data set.
This operation is called pruning in decision trees.
Overcast outlook
Overcast outlook branch has already 4 instances in the sub data set.
We can terminate building branches for this leaf.
Final decision will be average of the following table for overcast
outlook.
If outlook is overcast, then there would be (46+43+52+44)/4 = 46.25
golf players
Day Outlook Temp. Humidity Wind Golf Players

3 Overcast Hot High Weak 46

7 Overcast Cool Normal Strong 43

12 Overcast Mild High Strong 52

13 Overcast Hot Normal Weak 44


Rainy Outlook
Day Outlook Temp. Humidity Wind Golf Players

4 Rain Mild High Weak 45

5 Rain Cool Normal Weak 52

6 Rain Cool Normal Strong 23

10 Rain Mild Normal Weak 46

14 Rain Mild High Strong 30

Standard deviation for rainy outlook = 10.87


Rainy outlook and temperature

Temperature Stdev for Golf Players Instances
Cool        14.50                  2
Mild        7.32                   3

Weighted standard deviation for rainy outlook and temperature = (2/5)x14.50 + (3/5)x7.32 = 10.19
Standard deviation reduction for rainy outlook and temperature = 10.87 − 10.19 = 0.67

Rainy outlook and humidity

Humidity Stdev for Golf Players Instances
High     7.50                   2
Normal   12.50                  3

Weighted standard deviation for rainy outlook and humidity = (2/5)x7.50 + (3/5)x12.50 = 10.50
Standard deviation reduction for rainy outlook and humidity = 10.87 − 10.50 = 0.37

Rainy outlook and wind

Wind   Stdev for Golf Players Instances
Weak   3.09                   3
Strong 3.5                    2

Weighted standard deviation for rainy outlook and wind = (3/5)x3.09 + (2/5)x3.5 = 3.25
Standard deviation reduction for rainy outlook and wind = 10.87 − 3.25 = 7.62
Feature Standard deviation reduction

Temperature 0.67

Humidity 0.37

Wind 7.62
Decision trees are a powerful way to solve classification problems
They can be adapted to regression problems
Regression trees tend to over-fit much more than classification trees
The termination rule should be tuned carefully to avoid over-fitting
