Data Mining: Classification & Prediction

The document provides an introduction to classification and prediction in data mining, detailing various methods such as decision tree induction, Bayesian classification, and backpropagation. It discusses the processes involved in model construction and usage, as well as the differences between supervised and unsupervised learning. Additionally, it covers issues related to data preparation, evaluation of classification methods, and the fundamentals of prediction through regression analysis.


Debre Tabor University

Gafat Institute of Technology


Department of Computer Science

Introduction to Data Mining & Warehousing


For 4th-year IT / Computer Science students
Instructor: Habtu Hailu (PhD)

November 2024
Chapter 04
Classification and Prediction

 What is classification? What is prediction?
 Issues regarding classification and prediction
 Classification by decision tree induction
 Bayesian classification
 Classification by backpropagation
 Prediction
 Classification accuracy
 Summary
Classification vs. Prediction
 Classification
  predicts categorical class labels (discrete or nominal)
  classifies data (constructs a model) based on the training set and the values (class labels) in a classifying attribute, and uses it in classifying new data
 Prediction
  models continuous-valued functions, i.e., predicts unknown or missing values
 Typical applications
  Credit approval
  Target marketing
  Medical diagnosis
  Fraud detection
Classification—A Two-Step Process
 Model construction: describing a set of predetermined classes
  Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute
  The set of tuples used for model construction is the training set
  The model is represented as classification rules, decision trees, or mathematical formulae
 Model usage: for classifying future or unknown objects
  Estimate the accuracy of the model
 The known label of each test sample is compared with the classified result from the model
 Accuracy rate is the percentage of test set samples that are correctly classified by the model
 The test set is independent of the training set; otherwise over-fitting will occur
  If the accuracy is acceptable, use the model to classify data tuples whose class labels are not known


Process (1): Model Construction

Training data is fed to a classification algorithm, which outputs the classifier (model).

NAME  RANK            YEARS  TENURED
Mike  Assistant Prof  3      no
Mary  Assistant Prof  7      yes
Bill  Professor       2      yes
Jim   Associate Prof  7      yes
Dave  Assistant Prof  6      no
Anne  Associate Prof  3      no

Learned model: IF rank = 'professor' OR years > 6 THEN tenured = 'yes'
Process (2): Using the Model in Prediction

The trained classifier is applied first to testing data, then to unseen data.

NAME     RANK            YEARS  TENURED
Tom      Assistant Prof  2      no
Merlisa  Associate Prof  7      no
George   Professor       5      yes
Joseph   Assistant Prof  7      yes

Unseen data: (Jeff, Professor, 4). Tenured?
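As a minimal sketch of the two-step process (not code from the slides), the rule learned in step 1 can be applied to the test set above to estimate accuracy, and then to the unseen tuple:

```python
def tenured_rule(rank, years):
    # model from step 1: IF rank = 'professor' OR years > 6 THEN tenured = 'yes'
    return "yes" if rank == "Professor" or years > 6 else "no"

# test set from the slide: (name, rank, years, actual tenured label)
test_set = [
    ("Tom", "Assistant Prof", 2, "no"),
    ("Merlisa", "Associate Prof", 7, "no"),
    ("George", "Professor", 5, "yes"),
    ("Joseph", "Assistant Prof", 7, "yes"),
]

correct = sum(tenured_rule(rank, years) == label
              for _, rank, years, label in test_set)
accuracy = correct / len(test_set)    # Merlisa (7 years, actual 'no') is misclassified

# the unseen tuple (Jeff, Professor, 4)
prediction = tenured_rule("Professor", 4)
```

The rule gets 3 of the 4 test tuples right, illustrating why accuracy is estimated on a test set that is independent of the training set.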
Supervised vs. Unsupervised Learning

 Supervised learning (classification)
  Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations
  New data is classified based on the training set
 Unsupervised learning (clustering)
  The class labels of the training data are unknown
  Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data
Issues: Data Preparation
 Data cleaning
 Preprocess data in order to reduce noise and
handle missing values
 Relevance analysis (feature selection)
 Remove the irrelevant or redundant attributes
 Data transformation
 Generalize and/or normalize data

Issues: Evaluating Classification Methods
 Accuracy
  classifier accuracy: predicting the class label
  predictor accuracy: estimating the value of the predicted attribute
 Speed
  time to construct the model (training time)
  time to use the model (classification/prediction time)
 Robustness: handling noise and missing values
 Scalability: efficiency in disk-resident databases
 Interpretability
  understanding and insight provided by the model
 Other measures, e.g., goodness of rules, such as decision tree size or compactness of classification rules
Classification by Decision Tree Induction
 Decision tree
  A flow-chart-like tree structure
  Each internal node denotes a test on an attribute
  Each branch represents an outcome of the test
  Leaf nodes represent class labels or class distributions
 Decision tree generation consists of two phases
  Tree construction
 At the start, all the training examples are at the root
 Partition the examples recursively based on selected attributes
  Tree pruning
 Identify and remove branches that reflect noise or outliers
 Use of a decision tree: classifying an unknown sample
  Test the attribute values of the sample against the decision tree
Decision Tree Induction: Training Dataset

age income student credit_rating buys_computer


<=30 high no fair no
<=30 high no excellent no
31…40 high no fair yes
>40 medium no fair yes
>40 low yes fair yes
>40 low yes excellent no
31…40 low yes excellent yes
<=30 medium no fair no
<=30 low yes fair yes
>40 medium yes fair yes
<=30 medium yes excellent yes
31…40 medium no excellent yes
31…40 high yes fair yes
>40 medium no excellent no

This follows the style of Quinlan's ID3 example (Playing Tennis)


Output: A Decision Tree for "buys_computer"

age?
 <=30   -> student?
            no  -> no
            yes -> yes
 31..40 -> yes
 >40    -> credit rating?
            excellent -> no
            fair      -> yes
Algorithm for Decision Tree Induction
 Basic algorithm (a greedy algorithm)
  The tree is constructed in a top-down recursive divide-and-conquer manner
  At the start, all the training examples are at the root
  Attributes are categorical (if continuous-valued, they are discretized in advance)
  Examples are partitioned recursively based on selected attributes
  Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain)
 Conditions for stopping partitioning
  All samples for a given node belong to the same class
  There are no remaining attributes for further partitioning (majority voting is employed for classifying the leaf)
  There are no samples left
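The recursion and its stopping conditions can be sketched as a skeleton (an illustrative sketch, not the slides' code; the names `build_tree` and `choose` are ours, and the attribute-selection heuristic is passed in as a function):

```python
from collections import Counter

def build_tree(rows, attributes, choose):
    # rows: list of (attribute_dict, class_label) pairs.
    # choose(rows, attributes) picks the test attribute; ID3 would use
    # information gain, but here the heuristic is supplied by the caller.
    labels = [label for _, label in rows]
    if len(set(labels)) == 1:
        return labels[0]                       # all samples belong to one class
    if not attributes:
        # no remaining attributes: majority voting labels the leaf
        return Counter(labels).most_common(1)[0][0]
    attr = choose(rows, attributes)
    remaining = [a for a in attributes if a != attr]
    branches = {}
    for value in {r[attr] for r, _ in rows}:   # partition on each observed value
        subset = [(r, lbl) for r, lbl in rows if r[attr] == value]
        branches[value] = build_tree(subset, remaining, choose)
    return (attr, branches)

# toy usage with a trivial chooser (first remaining attribute)
toy = [({"student": "yes"}, "buys"), ({"student": "no"}, "does_not")]
tree = build_tree(toy, ["student"], lambda rs, attrs: attrs[0])
```

Because branching happens only on attribute values that actually occur, the "no samples left" case cannot arise in this sketch; a full implementation also handles it via majority voting at the parent.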
Attribute Selection Measure: Information Gain (ID3/C4.5)
 Select the attribute with the highest information gain
 Let pi be the probability that an arbitrary tuple in D belongs to class Ci, estimated by |Ci,D| / |D|
 Expected information (entropy) needed to classify a tuple in D:

  Info(D) = - sum_{i=1..m} pi log2(pi)

 Information needed (after using A to split D into v partitions) to classify D:

  Info_A(D) = sum_{j=1..v} (|Dj| / |D|) * Info(Dj)

 Information gained by branching on attribute A:

  Gain(A) = Info(D) - Info_A(D)
Attribute Selection: Information Gain
 Class P: buys_computer = "yes" (9 tuples)
 Class N: buys_computer = "no" (5 tuples)

 Info(D) = I(9,5) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940

 Info_age(D) = (5/14) I(2,3) + (4/14) I(4,0) + (5/14) I(3,2) = 0.694

 Here (5/14) I(2,3) means "age <=30" has 5 out of 14 samples, with 2 yes's and 3 no's:

 age      pi  ni  I(pi, ni)
 <=30     2   3   0.971
 31...40  4   0   0
 >40      3   2   0.971

 Hence Gain(age) = Info(D) - Info_age(D) = 0.246

 Similarly,
 Gain(income) = 0.029
 Gain(student) = 0.151
 Gain(credit_rating) = 0.048
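The gains above can be reproduced directly from the definitions. A sketch (helper names `info` and `gain` are ours) over the training dataset from the earlier slide:

```python
from collections import Counter
from math import log2

# training tuples: (age, income, student, credit_rating, buys_computer)
data = [
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31...40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31...40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31...40", "medium", "no", "excellent", "yes"),
    ("31...40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]

def info(labels):
    # Info(D) = -sum p_i log2(p_i) over the class distribution
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain(rows, col):
    # Gain(A) = Info(D) - Info_A(D)
    n = len(rows)
    info_a = 0.0
    for value in {r[col] for r in rows}:
        part = [r[-1] for r in rows if r[col] == value]
        info_a += len(part) / n * info(part)
    return info([r[-1] for r in rows]) - info_a

gains = {name: gain(data, col) for col, name in
         enumerate(["age", "income", "student", "credit_rating"])}
```

The highest gain is for age, which is why age becomes the root test of the output tree. (The slide's 0.246 rounds intermediate values; the exact figure is 0.247.)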
Classification by Backpropagation
 Backpropagation: a neural network learning algorithm
 Started by psychologists and neurobiologists to develop and test computational analogues of neurons
 A neural network: a set of connected input/output units where each connection has a weight associated with it
 During the learning phase, the network learns by adjusting the weights so as to be able to predict the correct class label of the input tuples
Neural Network as a Classifier
 Weakness
  Long training time
  Require a number of parameters that are typically best determined empirically, e.g., the network topology or "structure"
  Poor interpretability: difficult to interpret the symbolic meaning behind the learned weights and of the "hidden units" in the network
 Strength
  High tolerance to noisy data
  Ability to classify untrained patterns
  Well-suited for continuous-valued inputs and outputs
  Successful on a wide array of real-world data
  Algorithms are inherently parallel
  Techniques have been developed for the extraction of rules from trained neural networks
A Neuron (= a Perceptron)

An n-dimensional input vector x (x0, ..., xn) with weight vector w (w0, ..., wn) is combined in a weighted sum, shifted by the bias -mu_k, and passed through an activation function f to produce the output:

 y = sign( sum_{i=0..n} wi xi - mu_k )

 The n-dimensional input vector x is mapped into the variable y by means of the scalar product and a nonlinear function mapping
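A direct transcription of the unit (a sketch; the numeric weights and bias below are assumed for illustration, not from the slides):

```python
def perceptron(x, w, mu):
    # y = sign( sum_i w_i * x_i - mu_k ): weighted sum of the input vector
    # against the weight vector, shifted by the bias mu_k, then passed
    # through the sign activation function
    s = sum(wi * xi for wi, xi in zip(w, x)) - mu
    return 1 if s >= 0 else -1

# illustrative values: with these weights the unit fires only when both
# inputs are active, so it behaves like a logical AND
w, mu = [0.6, 0.6], 1.0
outputs = [perceptron([a, b], w, mu) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
```

A single unit like this can only separate classes with a linear boundary, which is what motivates the multi-layer networks on the next slide.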
A Multi-Layer Feed-Forward Neural Network

The network consists of an input layer (input vector X), one or more hidden layers, and an output layer (output vector). For a unit j with bias theta_j and learning rate l:

 Net input:             I_j = sum_i w_ij O_i + theta_j
 Output (sigmoid):      O_j = 1 / (1 + e^(-I_j))
 Error (output layer):  Err_j = O_j (1 - O_j) (T_j - O_j)
 Error (hidden layer):  Err_j = O_j (1 - O_j) sum_k Err_k w_jk
 Weight update:         w_ij = w_ij + (l) Err_j O_i
 Bias update:           theta_j = theta_j + (l) Err_j
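One training step for a tiny 2-input, one-hidden-unit, one-output network makes the formulas concrete (a sketch; all numeric values below are assumed, not from the slides):

```python
import math

def sigmoid(net):
    # O_j = 1 / (1 + e^(-I_j))
    return 1.0 / (1.0 + math.exp(-net))

l = 0.9                   # learning rate
x = [1.0, 0.0]            # input vector X
w_ij = [0.2, -0.3]        # weights from the two inputs to hidden unit j
theta_j = -0.4            # bias of hidden unit j
w_jk = 0.3                # weight from hidden unit j to output unit k
theta_k = 0.1             # bias of output unit k
target = 1.0              # T_k, the known class label

# forward pass: I_j = sum_i w_ij O_i + theta_j, then the sigmoid
I_j = sum(w * o for w, o in zip(w_ij, x)) + theta_j
O_j = sigmoid(I_j)
I_k = w_jk * O_j + theta_k
O_k = sigmoid(I_k)

# backward pass: error at the output, then propagated back to the hidden unit
err_k = O_k * (1 - O_k) * (target - O_k)   # Err_k = O_k(1-O_k)(T_k-O_k)
err_j = O_j * (1 - O_j) * err_k * w_jk     # Err_j = O_j(1-O_j) sum_k Err_k w_jk

# updates: w_ij += (l) Err_j O_i and theta_j += (l) Err_j, likewise for k
w_jk = w_jk + l * err_k * O_j
theta_k = theta_k + l * err_k
w_ij = [w + l * err_j * o for w, o in zip(w_ij, x)]
theta_j = theta_j + l * err_j
```

Since the target exceeds the current output, the output error term is positive and the hidden-to-output weight grows, nudging the network toward the correct label.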
How Does a Multi-Layer Neural Network Work?
 The inputs to the network correspond to the attributes measured for each training tuple
 Inputs are fed simultaneously into the units making up the input layer
 They are then weighted and fed simultaneously to a hidden layer
 The number of hidden layers is arbitrary, although usually only one is used
 The weighted outputs of the last hidden layer are input to units making up the output layer, which emits the network's prediction
 The network is feed-forward in that none of the weights cycles back to an input unit or to an output unit of a previous layer
Defining a Network Topology
 First decide the network topology: the number of units in the input layer, the number of hidden layers (if > 1), the number of units in each hidden layer, and the number of units in the output layer
 Normalize the input values for each attribute measured in the training tuples to [0.0, 1.0]
 One input unit per domain value, each initialized to 0
 For classification with more than two classes, one output unit per class is used
 Once a network has been trained, if its accuracy is unacceptable, repeat the training process with a different network topology or a different set of initial weights
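A common way to do the normalization step above is min-max scaling (a sketch; the function name and sample values are ours):

```python
def minmax_scale(values):
    # scale one attribute's values into [0.0, 1.0] before feeding the network
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]   # constant attribute: map everything to 0.0
    return [(v - lo) / (hi - lo) for v in values]

ages = [25, 35, 45, 30]    # illustrative attribute values
scaled = minmax_scale(ages)
```

Scaling each attribute to the same range keeps no single input from dominating the weighted sums early in training.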
What Is Prediction?
 (Numerical) prediction is similar to classification
  construct a model
  use the model to predict a continuous or ordered value for a given input
 Prediction is different from classification
  Classification refers to predicting categorical class labels
  Prediction models continuous-valued functions
 Major method for prediction: regression
  model the relationship between one or more independent or predictor variables and a dependent or response variable
 Regression analysis
  Linear and multiple regression
  Non-linear regression
  Other regression methods: generalized linear model, Poisson regression, log-linear models, regression trees


Linear Regression
 Linear regression: involves a response variable y and a single predictor variable x

  y = w0 + w1 x

 where w0 (y-intercept) and w1 (slope) are the regression coefficients
 Multiple linear regression: involves more than one predictor variable
  Training data is of the form (X1, y1), (X2, y2), ..., (X|D|, y|D|)
  Ex. for 2-D data, we may have: y = w0 + w1 x1 + w2 x2
  Solvable by an extension of the least squares method or using software such as SAS or S-Plus
  Many nonlinear functions can be transformed into the above
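For the single-predictor case, the least squares estimates have a closed form; a sketch (the data points are illustrative):

```python
def fit_line(xs, ys):
    # least squares estimates:
    #   w1 = sum (x - mean_x)(y - mean_y) / sum (x - mean_x)^2
    #   w0 = mean_y - w1 * mean_x
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    w1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
          / sum((x - mx) ** 2 for x in xs))
    w0 = my - w1 * mx
    return w0, w1

# illustrative data lying exactly on y = 1 + 2x
w0, w1 = fit_line([0.0, 1.0, 2.0], [1.0, 3.0, 5.0])
```

With noisy data the same formulas return the line minimizing the sum of squared residuals rather than an exact fit.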
Nonlinear Regression
 Some nonlinear models can be modeled by a polynomial function
 A polynomial regression model can be transformed into a linear regression model. For example,

  y = w0 + w1 x + w2 x^2 + w3 x^3

 is convertible to linear form with the new variables x2 = x^2, x3 = x^3:

  y = w0 + w1 x + w2 x2 + w3 x3   (now linear in x, x2, x3)

 Other functions, such as the power function, can also be transformed to a linear model
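The transformation can be checked end to end. In this sketch (a degree-2 case for brevity; the `solve` helper and data are ours), introducing the new variable x2 = x^2 turns the polynomial fit into a linear system, here solved exactly on three points:

```python
def solve(A, b):
    # Gaussian elimination with partial pivoting, enough for a small system
    n = len(A)
    M = [row[:] + [bv] for row, bv in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

# illustrative points generated by y = 1 + 2x + 3x^2
xs, ys = [0.0, 1.0, 2.0], [1.0, 6.0, 17.0]
# design matrix with columns 1, x, and the new variable x2 = x^2,
# so the polynomial model becomes linear in its coefficients
A = [[1.0, x, x * x] for x in xs]
w0, w1, w2 = solve(A, ys)
```

With more points than coefficients, the same transformed design matrix would be fed to least squares instead of an exact solve.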
Classifier Accuracy Measures
 Given m classes, CM_i,j, an entry in a confusion matrix, indicates the number of tuples in class i that are labeled by the classifier as class j

             predicted C1     predicted C2
 actual C1   true positive    false negative
 actual C2   false positive   true negative

 Example (buys_computer):

 classes             buy_computer = yes  buy_computer = no  total  recognition (%)
 buy_computer = yes  6954                46                 7000   99.34
 buy_computer = no   412                 2588               3000   86.27
 total               7366                2634               10000  95.42

 Accuracy of a classifier M, acc(M): percentage of test set tuples that are correctly classified by the model M
 Error rate (misclassification rate) of M = 1 - acc(M)
 Alternative accuracy measures (e.g., for cancer diagnosis):
  sensitivity = t-pos / pos          /* true positive recognition rate */
  specificity = t-neg / neg          /* true negative recognition rate */
  precision = t-pos / (t-pos + f-pos)
  accuracy = sensitivity * pos / (pos + neg) + specificity * neg / (pos + neg)
 This model can also be used for cost-benefit analysis
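The measures can be checked against the buys_computer matrix above (a sketch; variable names are ours):

```python
# counts from the buys_computer confusion matrix
t_pos, f_neg = 6954, 46       # actual 'yes' tuples: 7000
f_pos, t_neg = 412, 2588      # actual 'no' tuples: 3000
pos, neg = t_pos + f_neg, f_pos + t_neg

sensitivity = t_pos / pos                    # true positive recognition rate
specificity = t_neg / neg                    # true negative recognition rate
precision = t_pos / (t_pos + f_pos)
accuracy = (sensitivity * pos / (pos + neg)
            + specificity * neg / (pos + neg))
```

The weighted combination of sensitivity and specificity reduces to (t-pos + t-neg) / (pos + neg), i.e., the overall fraction of correctly classified tuples.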
