Classification Techniques
Classification
• Classification predicts categorical (discrete) labels
• Example: categorize bank loan applications as either safe or risky
• A tuple, X, is represented by an n-dimensional attribute vector, X = (x1, x2, …, xn), depicting n
measurements made on the tuple from n database attributes, respectively, A1, A2, …, An
• Each tuple, X, is assumed to belong to a predefined class as determined by another database
attribute called the class label attribute.
• The class label attribute is discrete-valued and unordered. It is categorical in that each value
serves as a category or class.
• Data classification is a two-step process
• In the first step, a classification algorithm builds the classifier by analyzing or “learning from” a training set
made up of database tuples and their associated class labels.
• In the second step, the model is used for classification
• Because the class label of each training tuple is provided, this step is also known as supervised
learning
Decision Tree
• Decision tree induction is the learning of decision
trees from class-labeled training tuples.
• A decision tree is a flowchart-like tree structure,
where each internal node (nonleaf node) denotes
a test on an attribute, each branch represents an
outcome of the test, and each leaf node (or
terminal node) holds a class label.
• The topmost node in a tree is the root node.
• Given a tuple, X, for which the associated class
label is unknown, the attribute values of the
tuple are tested against the decision tree.
• A path is traced from the root to a leaf node,
which holds the class for that tuple.
Decision Tree types
• An attribute selection measure is a heuristic for selecting the splitting criterion
that “best” separates a given data partition, D, of class-labeled
training tuples into individual classes.
• Attribute selection measures and the algorithms that use them:
▪ Information gain – ID3
▪ Gain ratio – C4.5
▪ Gini index - CART
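For concreteness, a minimal sketch of ID3's information-gain measure; the loan-style tuples and the attribute values below are invented for illustration:

```python
import math
from collections import Counter

def entropy(labels):
    """Info(D): expected bits to classify a tuple given the class mix."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    """Gain(A) = Info(D) - Info_A(D) for one categorical attribute index."""
    n = len(labels)
    parts = {}
    for row, lbl in zip(rows, labels):
        parts.setdefault(row[attr], []).append(lbl)
    info_a = sum(len(p) / n * entropy(p) for p in parts.values())
    return entropy(labels) - info_a

# Invented loan tuples: (income, has_job) -> safe / risky.
rows = [("high", "yes"), ("high", "no"), ("low", "yes"), ("low", "no")]
labels = ["safe", "safe", "risky", "risky"]
```

Here income splits the classes perfectly (gain 1 bit) while has_job carries no information (gain 0), so ID3 would split on income.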
ID3 Decision Tree
Naive Bayesian Classification
Bayesian Classification
• Bayesian classifiers are statistical classifiers.
• They can predict class membership probabilities, such as the
probability that a given tuple belongs to a particular class.
• Bayes’ Theorem: P(H|X) = P(X|H) P(H) / P(X), where H is a hypothesis (such as “X belongs to class C”) and X is the observed tuple
Naive Bayesian Classification
• Let X = (x1, x2, …, xn) represent a tuple with attribute vector over attributes A1, A2, …, An
• Let C1, C2, …, Cm represent the m classes
• X belongs to the class Ci if and only if P(Ci|X) > P(Cj|X) for 1 ≤ j ≤ m, j ≠ i
• Where P(Ci|X) = P(X|Ci) P(Ci) / P(X), and the naive assumption of class-conditional independence gives P(X|Ci) = ∏k=1..n P(xk|Ci)
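The classification rule above can be sketched for categorical attributes as follows; the loan-style tuples are invented, and no zero-probability smoothing is applied:

```python
from collections import Counter, defaultdict

def train_nb(rows, labels):
    """Count-based estimates of P(Ci) and P(xk | Ci); no smoothing."""
    priors = Counter(labels)
    cond = defaultdict(Counter)          # (class, attr index) -> value counts
    for row, c in zip(rows, labels):
        for k, v in enumerate(row):
            cond[(c, k)][v] += 1
    return priors, cond, len(labels)

def classify_nb(x, priors, cond, n):
    """Return the Ci maximizing P(Ci) * prod_k P(xk | Ci)."""
    best, best_score = None, -1.0
    for c, count in priors.items():
        score = count / n                        # P(Ci)
        for k, v in enumerate(x):
            score *= cond[(c, k)][v] / count     # P(xk | Ci)
        if score > best_score:
            best, best_score = c, score
    return best

# Invented loan tuples: (age, income) -> risky / safe.
rows = [("young", "low"), ("young", "high"), ("old", "low"), ("old", "high")]
labels = ["risky", "risky", "safe", "safe"]
priors, cond, n = train_nb(rows, labels)
```

In practice a Laplacian correction is added so that a single unseen attribute value does not zero out the whole product.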
Neural Network
Feed-forward Neural Network
• The backpropagation algorithm performs learning on a multilayer
feed-forward neural network.
• It iteratively learns a set of weights for prediction of the class label of
tuples.
• A multilayer feed-forward neural network consists of an input layer,
one or more hidden layers, and an output layer.
Learning by the backpropagation algorithm
Sample calculations for learning by the
backpropagation algorithm.
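The per-step arithmetic can be illustrated with a tiny sketch; the network size (2-2-1), initial weights, biases, learning rate, and input tuple are all assumed here, not taken from the slides:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Tiny 2-2-1 network; every number below is an assumed illustration.
x = [1.0, 0.0]                      # input tuple
target = 1.0                        # its known class label
w_h = [[0.2, -0.3], [0.4, 0.1]]    # hidden-unit weights
b_h = [-0.4, 0.2]                   # hidden-unit biases
w_o = [-0.3, -0.2]                  # output-unit weights
b_o = 0.1
lr = 0.9                            # learning rate

# Forward pass: each unit emits sigmoid(weighted input + bias).
h = [sigmoid(sum(w_h[j][i] * x[i] for i in range(2)) + b_h[j]) for j in range(2)]
o = sigmoid(sum(w_o[j] * h[j] for j in range(2)) + b_o)

# Backward pass: error terms first (using the pre-update weights) ...
err_o = o * (1 - o) * (target - o)
err_h = [h[j] * (1 - h[j]) * err_o * w_o[j] for j in range(2)]

# ... then gradient-descent updates of weights and biases.
for j in range(2):
    w_o[j] += lr * err_o * h[j]
    for i in range(2):
        w_h[j][i] += lr * err_h[j] * x[i]
b_o += lr * err_o
b_h = [b_h[j] + lr * err_h[j] for j in range(2)]

# One more forward pass: the output should have moved toward the target.
h2 = [sigmoid(sum(w_h[j][i] * x[i] for i in range(2)) + b_h[j]) for j in range(2)]
o2 = sigmoid(sum(w_o[j] * h2[j] for j in range(2)) + b_o)
```

Iterating this update over all training tuples, for many epochs, is exactly the loop the backpropagation algorithm performs.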
Why Linear Regression fails for categorical data

AGE  OUTCOME    AGE  OUTCOME    AGE  OUTCOME
10   1          20   0          60   0
11   1          21   0          61   0
12   1          22   0          62   0
13   1          23   0          63   0
14   1          24   0          64   0
15   1          25   0          65   0
16   1          26   0          66   0
17   1          27   0          67   0
18   1          28   0          68   0
19   1          29   0          69   0

Fitting a straight line to these 0/1 outcomes gives Y = -0.01404 * X + 0.8176
• The line is unbounded, so its predictions fall outside [0, 1] and cannot be read as class probabilities
The logistic function keeps the prediction between 0 and 1:

p = 1 / (1 + e^-(β0 + β1 ∗ X))
The dataset of pass/fail in an exam for 5 students is given in the table below.
If we use Logistic Regression as the classifier and assume the model
suggested by the optimizer for the odds of passing the course is:
log(Odds) = −64 + 2 × hours
• What is the probability of passing for the student who studied 33 hours?
• At least how many hours must a student study to pass the course with a
probability of more than 95%?
HOURS STUDIED RESULT (1 = PASS, 0 = FAIL)
29 0
15 0
33 1
28 1
39 1
Probability of Pass for the student who studied 33 hours
Z = -64 + 2 * Hours = -64 + 2 * 33 = 2
P = 1 / (1 + e^-Z) = 1 / (1 + e^-2) ≈ 0.88
A student who studies for 33 hours has an 88% chance of passing the
course.
At least how many hours must the student study to pass the course with a
probability of more than 95%?
Set P = 0.95 in P = 1 / (1 + e^-Z):
0.95 (1 + e^-Z) = 1
0.95 e^-Z = 1 − 0.95 = 0.05
e^-Z = 0.0526
-Z = ln(0.0526) = -2.94, so Z = 2.94
Then log(odds) = Z gives 2.94 = -64 + 2 * hours, so hours = 33.47 ≈ 33.5
Check: at 33.5 hours, Z = -64 + 2 * 33.5 = 3 and P = 1 / (1 + e^-3) ≈ 0.952
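The two calculations above can be checked with a short sketch; the coefficients −64 and 2 come from the example's model:

```python
import math

def pass_probability(hours, b0=-64.0, b1=2.0):
    """P(pass) = 1 / (1 + e^-z) with z = log(odds) = b0 + b1 * hours."""
    z = b0 + b1 * hours
    return 1.0 / (1.0 + math.exp(-z))

def hours_for_probability(p, b0=-64.0, b1=2.0):
    """Invert the model: hours = (ln(p / (1 - p)) - b0) / b1."""
    return (math.log(p / (1.0 - p)) - b0) / b1
```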
Ensemble Learning
(AdaBoost Algorithm)
Ensemble Learning
• Ensemble learning combines several base algorithms to form one
optimized predictive algorithm
• Example: Instead of one Decision Tree, Ensemble Methods take
several different trees and aggregate them into one final, strong
predictor
• Types
• Bagging
• Boosting
• Stacking
                       Bagging                   Boosting        Stacking
Weak learners          Homogeneous               Homogeneous     Heterogeneous
Learning               Parallel                  Sequential      Parallel
Combination strategy   Deterministic averaging   Deterministic   Meta-model
Goal                   Decrease variance         Decrease bias   Improve predictions
Boosting
• A boosting algorithm tries to build a strong learner (predictive model) from the mistakes of several
weaker models.
• It starts by creating a model from the training data.
• Then, it creates a second model from the previous one by trying to reduce the errors from the
previous model.
• Models are added sequentially, each correcting its predecessor, until the training data is predicted
perfectly or the maximum number of models has been added.
• Boosting basically tries to reduce the bias error which arises when models are not able to identify
relevant trends in the data.
• This happens by evaluating the difference between the predicted value and the actual value.
• Types
• AdaBoost (Adaptive Boosting)
• Gradient Tree Boosting
• XGBoost
AdaBoost (Adaptive Boosting)
• Initialize weights wi = 1/N for every i
• For t = 1 to T
❖ Generate a training dataset by sampling with the weights {wi}
❖ Fit a weak learner gt
❖ Compute the weighted error et = Σi=1..n (ei * wi) / Σi=1..n (wi)
❖ Set λt = ½ ln[(1 − et) / et]
❖ Update the weights
➢ wi ← wi e^λt if wrongly classified by gt
➢ wi ← wi e^-λt if correctly classified
❖ Normalize the wi to sum to one
• The new model is ft = ft-1 + λt gt
• fT(x) = sign[ Σt=1..T λt gt(x) ]
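A minimal sketch of the loop above, with one simplification: the weak learners are supplied as an assumed pool of candidate classifiers and the lowest-weighted-error one is chosen each round, rather than refitting on a weighted sample:

```python
import math

def adaboost(X, y, stumps, T):
    """Minimal AdaBoost sketch: y in {-1, +1}; stumps is an assumed pool
    of candidate weak classifiers (plain functions x -> {-1, +1})."""
    n = len(X)
    w = [1.0 / n] * n
    model = []                              # list of (lambda_t, g_t)
    for _ in range(T):
        # weighted error e_t of each candidate; keep the best stump
        g, e = min(((g, sum(wi for wi, xi, yi in zip(w, X, y) if g(xi) != yi))
                    for g in stumps), key=lambda pair: pair[1])
        if e == 0:                          # perfect learner: stop early
            model.append((1.0, g))
            break
        if e >= 0.5:                        # no better than chance: stop
            break
        lam = 0.5 * math.log((1 - e) / e)
        model.append((lam, g))
        # up-weight mistakes (e^lam), down-weight correct (e^-lam), normalize
        w = [wi * math.exp(lam if g(xi) != yi else -lam)
             for wi, xi, yi in zip(w, X, y)]
        s = sum(w)
        w = [wi / s for wi in w]
    return model

def predict(model, x):
    """f_T(x) = sign of the lambda-weighted vote of the weak learners."""
    return 1 if sum(lam * g(x) for lam, g in model) >= 0 else -1

# Toy usage on assumed 1-D data with two threshold stumps.
X, y = [1, 2, 3, 4], [1, 1, -1, -1]
stumps = [lambda v: 1 if v < 2.5 else -1,
          lambda v: 1 if v < 1.5 else -1]
model = adaboost(X, y, stumps, T=3)
```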
Example

Dataset:

x1    x2    Decision
2     3     true
2.1   2     true
4.5   6     true
4     3.5   false
3.5   1     false
5     7     true
5     3     false
6     5.5   true
8     6     false
8     2     false

Round 1 — initialize weights wi = 1/N = 0.1, fit a weak learner gt (x-values
rounded below), and record each instance's loss:

x1  x2  actual  weight  prediction  loss  weight*loss  w(i+1)  norm(w(i+1))
2   3   1       0.1     1           0     0            0.065   0.071
2   2   1       0.1     1           0     0            0.065   0.071
4   6   1       0.1     -1          1     0.1          0.153   0.167
4   3   -1      0.1     -1          0     0            0.065   0.071
4   1   -1      0.1     -1          0     0            0.065   0.071
5   7   1       0.1     -1          1     0.1          0.153   0.167
5   3   -1      0.1     -1          0     0            0.065   0.071
6   5   1       0.1     -1          1     0.1          0.153   0.167
8   6   -1      0.1     -1          0     0            0.065   0.071
8   2   -1      0.1     -1          0     0            0.065   0.071

et = 0.3
λt = ½ ln[(1 − et)/et] = ln[(1 − 0.3)/0.3] / 2 = 0.42

Update the weights — wi ← wi e^λt if wrongly classified by gt, wi ← wi e^-λt
if correctly classified — then normalize them to sum to one (the w(i+1) and
norm(w(i+1)) columns above).

A later round, where et = 0.1 and λt = ½ ln[(1 − 0.1)/0.1] = 1.1:

x1  x2  actual  weight  prediction  loss  weight*loss  w(i+1)  norm(w(i+1))
2   2   1       0.122   1           0     0.000        0.041   0.068
4   6   1       0.167   1           0     0.000        0.056   0.093
4   1   -1      0.033   1           1     0.033        0.100   0.167
5   7   1       0.167   1           0     0.000        0.056   0.093
5   3   -1      0.033   1           1     0.033        0.100   0.167
6   5   1       0.167   1           0     0.000        0.056   0.093
8   6   -1      0.122   -1          0     0.000        0.041   0.068
8   2   -1      0.033   -1          0     0.000        0.011   0.019

After four rounds the learner weights are λ1 = 0.42, λ2 = 0.65, λ3 = 0.38,
λ4 = 1.1, and each instance's final label is the sign of the λ-weighted sum
of its four round predictions. For example, the prediction of the 1st
instance (round predictions 1, -1, 1, 1) will be

0.42 x 1 + 0.65 x (-1) + 0.38 x 1 + 1.1 x 1 = 1.25

and we apply the sign function: sign(1.25) = +1, i.e. true, which is
correctly classified.
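The final combination step for the 1st instance, using the example's λ values and round predictions:

```python
# Lambda weights of the four rounds and the 1st instance's round predictions.
alphas = [0.42, 0.65, 0.38, 1.10]
preds = [1, -1, 1, 1]

score = sum(a * p for a, p in zip(alphas, preds))   # 0.42 - 0.65 + 0.38 + 1.10
label = 1 if score >= 0 else -1                     # sign function -> "true"
```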
Classification & Prediction
Accuracy and Error Measures
Classifier Accuracy Measures
The accuracy of a classifier on a given test set is the percentage of test
tuples that are correctly classified by the classifier:

accuracy = (t_pos + t_neg) / (pos + neg)

Class-specific measures:

sensitivity = t_pos / pos    specificity = t_neg / neg    precision = t_pos / (t_pos + f_pos)

Where t_pos is the number of true positives, pos the number of positive
(“cancer”) tuples, t_neg the number of true negatives, neg the number of
negative (“not cancer”) tuples, and f_pos is the number of false positives
(“not cancer” tuples that were incorrectly labelled as “cancer”)
Confusion matrix
Predictor Error Measures
• Loss functions measure the error between the actual value yi and the predicted value yi′: the absolute error |yi − yi′| or the squared error (yi − yi′)²
• The average loss over d test tuples is the mean absolute error Σ|yi − yi′| / d or the mean squared error Σ(yi − yi′)² / d, where the mean squared error exaggerates the presence of outliers
• The relative error scales the error to what it would have been if we had just predicted the mean value ȳ for y from the training data, D: Σ|yi − yi′| / Σ|yi − ȳ|
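These measures transcribe directly into code:

```python
def mae(y, yhat):
    """Mean absolute error: sum of |yi - yi'| over d test tuples, / d."""
    return sum(abs(a - p) for a, p in zip(y, yhat)) / len(y)

def mse(y, yhat):
    """Mean squared error: squaring exaggerates the effect of outliers."""
    return sum((a - p) ** 2 for a, p in zip(y, yhat)) / len(y)

def relative_absolute_error(y, yhat):
    """Total |error| relative to always predicting the mean of y."""
    mean = sum(y) / len(y)
    return (sum(abs(a - p) for a, p in zip(y, yhat))
            / sum(abs(a - mean) for a in y))
```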
Evaluating the Accuracy of a Classifier or
Predictor
• Holdout Method
• Two-thirds of the data are allocated to the training set, and the remaining
one-third is allocated to the test set.
• The training set is used to derive the model, whose accuracy is estimated with
the test set
• Random subsampling
• Holdout method is repeated k times
• The overall accuracy estimate is taken as the average of the accuracies
obtained from each iteration
• For prediction, the average of the predictor error rates is the overall error rate
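A sketch of the two procedures, using an assumed trivial majority-class learner as the model being evaluated:

```python
import random
from collections import Counter

def holdout_accuracy(data, train_fn, seed):
    """One holdout split: two-thirds train, one-third test."""
    rows = data[:]
    random.Random(seed).shuffle(rows)     # random partition of the tuples
    cut = (2 * len(rows)) // 3
    model = train_fn(rows[:cut])          # derive the model from the training set
    test = rows[cut:]
    return sum(1 for x, label in test if model(x) == label) / len(test)

def random_subsampling(data, train_fn, k=10):
    """Repeat the holdout k times and average the k accuracy estimates."""
    return sum(holdout_accuracy(data, train_fn, s) for s in range(k)) / k

def majority_trainer(train):
    """Toy stand-in for a classifier: always predict the majority label."""
    label = Counter(lbl for _, lbl in train).most_common(1)[0][0]
    return lambda x: label
```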
Ensemble Methods - Bagging
Ensemble Methods - Boosting
(Diagram: weighted models W1, W2, …, Wn are combined into a single prediction.)
ROC Curve
#  Actual  Predicted  Prob.Y  Prob.N
1  Y       N          0.35    0.65
2  N       N          0.23    0.77
3  N       Y          0.55    0.45
4  Y       N          0.32    0.68
5  Y       Y          0.54    0.46
6  N       N          0.47    0.53

Sorted by decreasing Prob.Y:

Actual  Prob.Y
N       0.55
Y       0.54
N       0.47
Y       0.35
Y       0.32
N       0.23
TP Rate = TP / (TP + FN)
FP Rate = FP / (FP + TN)
Cut-off 0.5:
TP Rate = 1/(1+2) = 0.33
FP Rate = 1/(1+2) = 0.33
Cut-off 0.4:
TP Rate = 1/(1+2) = 0.33
FP Rate = 2/(2+1) = 0.66
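The two cut-off computations can be reproduced from the example's (actual, Prob.Y) pairs:

```python
def roc_point(scored, cutoff):
    """TPR and FPR at one cutoff; scored holds (actual, Prob.Y) pairs."""
    tp = sum(1 for a, p in scored if a == "Y" and p >= cutoff)
    fn = sum(1 for a, p in scored if a == "Y" and p < cutoff)
    fp = sum(1 for a, p in scored if a == "N" and p >= cutoff)
    tn = sum(1 for a, p in scored if a == "N" and p < cutoff)
    return tp / (tp + fn), fp / (fp + tn)

# The example's six tuples, sorted by decreasing Prob.Y.
scored = [("N", 0.55), ("Y", 0.54), ("N", 0.47),
          ("Y", 0.35), ("Y", 0.32), ("N", 0.23)]
```

Sweeping the cutoff from 1 down to 0 and plotting (FPR, TPR) traces the ROC curve.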
PREDICTION & CLUSTERING
TECHNIQUES
PREDICTION TECHNIQUES
Linear Regression
Multiple Linear Regression
Regression Tree
LINEAR REGRESSION
Prediction
Prediction is the task of predicting continuous values for given input.
For example, we may wish to predict the salary of college graduates with
10 years of work experience.
By far, the most widely used approach for numeric prediction is regression,
a statistical methodology.
Regression analysis can be used to model the relationship between one or
more predictor variables and a response variable (which is continuous-
valued).
Predictor variables are the attributes describing the tuple.
The values of the predictor variables are known.
The response variable is what we want to predict.
Linear Regression
Linear Regression develops a model Y as a linear function of X.
y = w0 + w1 x
where w0 and w1 are the y-intercept and slope of the line, respectively.
These regression coefficients can be solved by the method of least
squares:

w1 = Σ (xi − x̄)(yi − ȳ) / Σ (xi − x̄)²        w0 = ȳ − w1 x̄

where (x1, y1), …, (x|D|, y|D|) are the data points, |D| is the number of data
points, and x̄ and ȳ are the means of x and y.
Using this equation, we can predict that the salary of a college graduate with, say, 10 years of experience is $58,600
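A sketch of the least-squares fit; the salary data from the example is not reproduced here, so the usage below runs on invented points that lie on an exact line:

```python
def fit_line(xs, ys):
    """Least-squares estimates of w0 (intercept) and w1 (slope)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    w1 = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
          / sum((x - xbar) ** 2 for x in xs))
    w0 = ybar - w1 * xbar
    return w0, w1
```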
MULTIPLE LINEAR REGRESSION
Multiple linear regression formula
The formula for a multiple linear regression with two predictor variables is
y = a + b1 x1 + b2 x2
where a = the y-intercept.
a, b1, and b2 have to be chosen so as to minimize the sum of squared errors of prediction; the fitted prediction equation is then ŷ = a + b1 x1 + b2 x2.
Bivariate Linear Regression
EXAMPLE
X1 X2 Y
2 6 7
4 5 7
5 8 9
1 3 5
3 4 6
2 2 4
1 4 5
Y   X1   X2   X1X2   X1Y   X2Y   X1²   X2²
7   2    6    12     14    42    4     36
7   4    5    20     28    35    16    25
9   5    8    40     45    72    25    64
5   1    3    3      5     15    1     9
6   3    4    12     18    24    9     16
4   2    2    4      8     8     4     4
5   1    4    4      5     20    1     16

ΣX1 = 18, ΣX2 = 32, ΣY = 43, ΣX1X2 = 95, ΣX1Y = 123, ΣX2Y = 216, ΣX1² = 60, ΣX2² = 170, N = 7

Corrected sums of squares and products, Suv = Σuv − (Σu)(Σv)/N:

S11 = 60 − 18²/7 = 13.7143      S22 = 170 − 32²/7 = 23.7143      S12 = 95 − (18)(32)/7 = 12.7143
S1y = 123 − (18)(43)/7 = 12.4286      S2y = 216 − (32)(43)/7 = 19.4286

b1 = (S22 S1y − S12 S2y) / (S11 S22 − S12²) = 47.71/163.57 = 0.2917
b2 = (S11 S2y − S12 S1y) / (S11 S22 − S12²) = 108.43/163.57 = 0.6629
a = Ȳ − b1 X̄1 − b2 X̄2 = 6.1429 − (0.2917)(2.5714) − (0.6629)(4.5714) = 2.3624
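The hand computation can be reproduced with the corrected-sums formulas, using the data from the example table:

```python
def fit_two_predictors(x1, x2, y):
    """Normal-equation solution for y = a + b1*x1 + b2*x2."""
    n = len(y)
    def s(u, v):
        # corrected sum of products: sum(uv) - sum(u)*sum(v)/N
        return sum(p * q for p, q in zip(u, v)) - sum(u) * sum(v) / n
    s11, s22, s12 = s(x1, x1), s(x2, x2), s(x1, x2)
    s1y, s2y = s(x1, y), s(x2, y)
    den = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / den
    b2 = (s11 * s2y - s12 * s1y) / den
    a = sum(y) / n - b1 * sum(x1) / n - b2 * sum(x2) / n
    return a, b1, b2

# The example's seven tuples.
x1 = [2, 4, 5, 1, 3, 2, 1]
x2 = [6, 5, 8, 3, 4, 2, 4]
y = [7, 7, 9, 5, 6, 4, 5]
a, b1, b2 = fit_two_predictors(x1, x2, y)
```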
Golf players

Golf players = {25, 30, 46, 45, 52, 23, 43, 35, 38, 46, 48, 52, 44, 30}

Standard deviation and number of instances for each attribute value:

Outlook     Std. dev.  Instances
Overcast    3.49       4
Rain        10.87      5
Sunny       7.78       5

Temperature  Std. dev.  Instances
Hot          8.95       4
Cool         10.51      4
Mild         7.65       6
(Golf players for mild temperature = {45, 35, 46, 48, 52, 30})

Humidity  Std. dev.  Instances
High      9.36       7
Normal    8.73       7
(Golf players for high humidity = {25, 30, 46, 45, 35, 52, 30};
 for normal humidity = {52, 23, 43, 38, 46, 48, 44})

Wind    Std. dev.  Instances
Strong  10.59      6
Weak    7.87       8
(Golf players for strong wind = {30, 23, 43, 48, 52, 30};
 for weak wind = {25, 46, 45, 52, 35, 38, 46, 44})

Standard deviation reduction per feature:

Outlook      1.66
Temperature  0.47
Humidity     0.27
Wind         0.29

Outlook gives the largest standard deviation reduction, so it is chosen as the root split.
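A sketch of the standard-deviation-reduction computation for the outlook attribute; the sunny partition comes from the example, and the rain partition is inferred as the five remaining values:

```python
import math

def sd(values):
    """Population standard deviation."""
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

def sd_reduction(groups):
    """SDR = sd(whole set) - weighted sd over an attribute's partitions."""
    allv = [v for g in groups for v in g]
    n = len(allv)
    return sd(allv) - sum(len(g) / n * sd(g) for g in groups)

# Outlook partition of the golf-players target values.
sunny = [25, 30, 35, 38, 48]
overcast = [46, 43, 52, 44]
rain = [45, 52, 23, 46, 30]
```

Running `sd_reduction([sunny, overcast, rain])` reproduces the 1.66 reduction in the table above.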
Standard deviation for sunny outlook

Golf players for sunny outlook = {25, 30, 35, 38, 48}; standard deviation = 7.78

Temperature  Std. dev.  Instances
Hot          2.5        2
Cool         0          1
Mild         6.5        2

Weighted standard deviation for sunny outlook and temperature = (2/5) x 2.5 + (1/5) x 0 + (2/5) x 6.5 = 3.6
Standard deviation reduction for sunny outlook and temperature = 7.78 − 3.6 = 4.18

Humidity  Std. dev.  Instances
High      4.08       3
Normal    5.00       2

Weighted standard deviation for sunny outlook and humidity = (3/5) x 4.08 + (2/5) x 5 = 4.45
Standard deviation reduction for sunny outlook and humidity = 7.78 − 4.45 = 3.33
Sunny outlook and Wind

Standard deviation for sunny outlook and strong wind = 9 (2 instances, e.g. Day 2: Sunny, Hot, High, Strong → 30 players)
Standard deviation for sunny outlook and weak wind = 5.56 (3 instances, e.g. Day 1: Sunny, Hot, High, Weak → 25; Day 8: Sunny, Mild, High, Weak → 35)
Weighted standard deviation for sunny outlook and wind = (2/5) x 9 + (3/5) x 5.56 = 6.94
Standard deviation reduction for sunny outlook and wind = 7.78 − 6.94 = 0.85

Standard deviation reductions within the sunny branch:

Temperature  4.18
Humidity     3.33
Wind         0.85

Temperature gives the largest reduction, so the sunny branch is split on temperature.
Pruning
The cool branch has one instance in its sub data set. We can say that if the outlook is sunny and the temperature is cool, then there would be 38 golf
players.
But what about the hot branch? There are still 2 instances.
Should we add another branch for weak wind and strong wind? No, we should not,
because this would cause over-fitting.
We should terminate building branches, for example, when there are fewer than five instances in the sub data set,
or when the standard deviation of the sub data set is less than 5% of that of the entire data set.
Here, we terminate the branch if there are fewer than 5 instances in the current sub data set.
If this termination condition is satisfied, we calculate the average of the sub data set.
This operation is called pruning in decision trees.
Overcast outlook
The overcast outlook branch already has 4 instances in its sub data set.
We can terminate building branches for this leaf.
Final decision will be average of the following table for overcast
outlook.
If outlook is overcast, then there would be (46+43+52+44)/4 = 46.25
golf players
Rainy outlook and Wind

Wind    Std. dev.  Instances
Weak    3.09       3
Strong  3.5        2

Weighted standard deviation for rainy outlook and wind = (3/5) x 3.09 + (2/5) x 3.5 = 3.25
Standard deviation reduction for rainy outlook and wind = 10.87 − 3.25 = 7.62

Feature      Standard deviation reduction
Temperature  0.67
Humidity     0.37
Wind         7.62

Wind gives the largest reduction, so the rainy branch is split on wind.
• Decision trees are a powerful way to solve classification problems
• They can be adapted to regression problems
• Regression trees tend to over-fit much more than classification trees
• The termination rule should be tuned carefully to avoid over-fitting