Classification and
Naïve Bayes Classifier
Classification vs. Prediction
• Classification
• predicts categorical class labels (discrete or nominal)
• classifies data (constructs a model) based on the training
set and the values (class labels) in a classifying attribute
and uses it in classifying new data
• Prediction
• models continuous-valued functions, i.e., predicts
unknown or missing values
• Typical applications
• Credit approval
• Target marketing
• Medical diagnosis
• Fraud detection
Classification: Definition
• Given a collection of records (training set)
• Each record contains a set of attributes; one of the attributes is the class.
• Find a model for the class attribute as a function of the values of the other
attributes.
• Goal: previously unseen records should be assigned a class as
accurately as possible.
• A test set is used to determine the accuracy of the model. Usually, the given data set is
divided into training and test sets, with the training set used to build the model and the test
set used to validate it.
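The train/test division described above can be sketched in plain Python. A minimal sketch, assuming a 70/30 split and toy records (both are illustrative, not from the slides):

```python
import random

# Toy labeled records: (feature value, class label). Values are illustrative.
records = [(i, "yes" if i % 2 == 0 else "no") for i in range(20)]

random.seed(0)                      # reproducible shuffle
random.shuffle(records)

split = int(0.7 * len(records))     # hold out 30% of the data for testing
train_set, test_set = records[:split], records[split:]

print(len(train_set), len(test_set))  # 14 6
```

The model is then built only on `train_set`, and accuracy is measured only on `test_set`, which the model never saw during construction.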
Classification—A Two-Step Process
• Model construction: describing a set of
predetermined classes
• Each tuple/sample is assumed to belong to a
predefined class, as determined by the class
label attribute
• The set of tuples used for model
construction is the training set
• The model is represented as classification
rules, decision trees, or mathematical
formulae
Classification—A Two-Step Process
• Model usage: for classifying future or unknown
objects
• Estimate accuracy of the model
• The known label of a test sample is compared
with the classified result from the model
• Accuracy rate is the percentage of test set
samples that are correctly classified by the
model
• The test set is independent of the training set;
otherwise over-fitting will occur
• If the accuracy is acceptable, use the model to classify data
tuples whose class labels are not known
Classification Process (1): Model Construction
• Training data is fed to a classification algorithm, which produces a classifier (model).

Training Data:
NAME   RANK            YEARS  TENURED
Mike   Assistant Prof  3      no
Mary   Assistant Prof  7      yes
Bill   Professor       2      yes
Jim    Associate Prof  7      yes
Dave   Assistant Prof  6      no
Anne   Associate Prof  3      no

Learned classifier (model):
IF rank = 'professor' OR years > 6
THEN tenured = 'yes'
Classification Process (2): Use the Model in Prediction
• The classifier is first applied to the testing data to estimate accuracy, then to unseen data.

Testing Data:
NAME     RANK            YEARS  TENURED
Tom      Assistant Prof  2      no
Merlisa  Associate Prof  7      no
George   Professor       5      yes
Joseph   Assistant Prof  7      yes

Unseen Data: (Jeff, Professor, 4) → Tenured?
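The learned rule from step (1) — IF rank = 'professor' OR years > 6 THEN tenured = 'yes' — can be applied to the testing data above to estimate accuracy, and then to the unseen record. A minimal sketch:

```python
def predict_tenured(rank, years):
    # Rule learned in step (1): professors, or anyone with more than 6 years
    return "yes" if rank == "Professor" or years > 6 else "no"

# Testing data from the slide: (name, rank, years, known label)
test_set = [
    ("Tom", "Assistant Prof", 2, "no"),
    ("Merlisa", "Associate Prof", 7, "no"),
    ("George", "Professor", 5, "yes"),
    ("Joseph", "Assistant Prof", 7, "yes"),
]

correct = sum(predict_tenured(rank, years) == label
              for _, rank, years, label in test_set)
accuracy = correct / len(test_set)
print(accuracy)  # 0.75 (Merlisa is misclassified as 'yes')

# Apply the model to the unseen record (Jeff, Professor, 4)
print(predict_tenured("Professor", 4))  # yes
```

Note how the accuracy rate (3 of 4 test samples correct) is computed on the test set only, as the two-step process requires.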
The Learning Process in a Spam Mail Example
• Emails from the email server are used first for model learning and then for model testing.
• Example features:
● Number of recipients
● Size of message
● Number of attachments
● Number of "re's" in the subject line
Classification: An Example
• A fish-packing plant wants to automate the
process of sorting incoming fish according to
species
• As a pilot project, it is decided to try to separate
sea bass from salmon using optical sensing
Classification: An Example (continued)
• Features/attributes:
Length
Lightness
Width
Position of mouth
Classification: An Example (continued)
Preprocessing: Images of different fishes are isolated from one another and from the background;
Feature extraction: The information of a single fish is then sent to a feature extractor, which measures certain “features” or “properties”;
Classification: The values of these features are passed to a classifier that evaluates the evidence presented and builds a model to discriminate between the two species
Classification: An Example (continued)
Domain knowledge:
◦ A sea bass is generally longer than a salmon
Related feature (or attribute):
◦ Length
Training the classifier:
◦ Some examples are provided to the classifier in this
form: <fish_length, fish_name>
◦ These examples are called training examples
◦ The classifier learns from the training examples how
to distinguish salmon from sea bass based on the
fish_length
Classification: An Example (continued)
• Classification model (hypothesis):
◦ The classifier generates a model from the training data to
classify future examples (test examples)
◦ An example of the model is a rule like this:
◦ If Length >= l* then sea bass, otherwise salmon
◦ Here the value of l* is determined by the classifier
Testing the model:
◦ Once we get a model out of the classifier, we may use the
classifier to test future examples
◦ The test data is provided in the form <fish_length>
◦ The classifier outputs <fish_type> by checking fish_length
against the model
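One simple way a classifier might pick l* is to scan candidate thresholds and keep the one with the best training accuracy. A minimal sketch, using the <fish_length, fish_name> training examples from the later slide (the scanning strategy itself is an illustrative assumption):

```python
# Training examples in the form <fish_length, fish_name> (from the slides)
train = [(12, "salmon"), (15, "sea bass"), (8, "salmon"), (5, "sea bass")]

def accuracy(threshold, data):
    # Rule: if Length >= threshold then sea bass, otherwise salmon
    preds = ["sea bass" if length >= threshold else "salmon"
             for length, _ in data]
    return sum(p == name for p, (_, name) in zip(preds, data)) / len(data)

# Use the observed lengths as the candidate values for l*
l_star = max((length for length, _ in train),
             key=lambda t: accuracy(t, train))
print(l_star, accuracy(l_star, train))  # 15 0.75
```

Even the best single-length threshold misclassifies the short sea bass in this data, which is exactly what motivates the "Why error?" slide and the move to a second feature.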
Classification: An Example (continued)
• So the overall classification process goes like this:
◦ Training data → preprocessing and feature extraction → feature vectors → training → classification model
◦ Test/unlabeled data → preprocessing and feature extraction → feature vectors → testing against the model → prediction/evaluation
Classification: An Example (continued)
• Training (labeled data):
◦ Training data → preprocessing, feature extraction → feature vectors: 12, salmon; 15, sea bass; 8, salmon; 5, sea bass
◦ Learned model: if len > 12 then sea bass else salmon
• Testing (unlabeled data):
◦ Labeled test data: 15, salmon → classified sea bass (error!); 10, salmon → classified salmon (correct)
◦ Unlabeled test data: 18, ? → sea bass; 8, ? → salmon
Classification: An Example (continued)
• Why error?
Insufficient training data
Too few features
Too many/irrelevant features
Overfitting / specialization
Classification: An Example (continued)
• Training (labeled data), now with two features (len, ltns):
◦ Training data → preprocessing, feature extraction → feature vectors: 12, 4, salmon; 15, 8, sea bass; 8, 2, salmon; 5, 10, sea bass
◦ Learned model: if ltns > 6 or len*5 + ltns*2 > 100 then sea bass else salmon
• Testing:
◦ Labeled test data: 15, 2, salmon → salmon (correct); 10, 7, salmon → salmon (correct)
◦ Unlabeled test data: 18, 7, ? → sea bass; 8, 5, ? → salmon
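The two-feature model from this slide can be written directly; checking it against the four training vectors shows it separates all of them, unlike the length-only rule. A minimal sketch:

```python
def classify(length, lightness):
    # Model from the slide: if ltns > 6 or len*5 + ltns*2 > 100
    # then sea bass, else salmon
    if lightness > 6 or length * 5 + lightness * 2 > 100:
        return "sea bass"
    return "salmon"

# Training feature vectors from the slide: (len, ltns, label)
train = [(12, 4, "salmon"), (15, 8, "sea bass"),
         (8, 2, "salmon"), (5, 10, "sea bass")]

# The model now classifies every training example correctly
assert all(classify(l, lt) == name for l, lt, name in train)

# Unlabeled test vectors from the slide
print(classify(18, 7))  # sea bass
print(classify(8, 5))   # salmon
```

Adding the lightness feature is what lets a single rule capture the short-but-light sea bass that the length-only threshold got wrong.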
Linear, Non-linear, Multi-class and Multi-label Classification
Linear Classification
• A linear classifier makes a classification decision
based on the value of a linear combination of the
characteristics (features).
• A classification algorithm (classifier) that makes its
classification based on a linear predictor function
combining a set of weights with the feature vector
• Decision boundaries are flat
• Line, plane, …
• May involve non-linear operations
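Over the two features used in the figures (new recipients, email length), a linear classifier reduces to checking the sign of the linear combination w·x + b. A minimal sketch; the weight values, bias, and the spam/ham class names are illustrative assumptions, not learned values:

```python
# Hypothetical weights and bias for features (new_recipients, email_length)
w = (0.8, -0.5)   # illustrative weight vector
b = -1.0          # illustrative bias

def predict(x):
    # Linear predictor function: the sign of w.x + b decides the class
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return "spam" if score > 0 else "ham"

print(predict((5, 2)))   # score = 0.8*5 - 0.5*2 - 1.0 = 2.0  -> spam
print(predict((1, 4)))   # score = 0.8 - 2.0 - 1.0 = -2.2     -> ham
```

The decision boundary is the set of points where the score equals 0, which in two dimensions is a straight line; this is why the boundary is "flat".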
Linear Classifiers
• (Figure: emails plotted by New Recipients vs. Email Length, with several candidate linear decision boundaries.)
• Any of these would be fine…
• …but which is best?
No Linear Classifier Can Cover All Instances
• (Figure: data in the New Recipients vs. Email Length plane that no single straight line separates.)
• How would you classify this data?
• Ideally, the best decision boundary should be the one which provides optimal performance, such as in the following figure.
• (Figure: the same data separated by a non-linear decision boundary.)
What Is Multiclass?
• Output: one of K classes
• In some cases, the output space can be very large (i.e., K
is very large)
• Each input belongs to exactly one class
(c.f. in multilabel classification, an input belongs to many classes)
Multi-Class Classification
• Multi-class classification is simply
classifying objects into exactly one of multiple
categories, such as classifying an image as
either a dog or a cat.
• 1. There are more than two
categories into which the images can be
classified, and
• 2. An image does not belong to more than
one class.
• If both of the above conditions are
satisfied, it is referred to as a multi-class
image classification problem.
Multi-label Classification
• When we can classify an image into more than one class (as in the image beside), it is known as a multi-label image classification problem.
• Multi-label classification is a type of classification in which an object can be categorized into more than one class.
• For example, in an image dataset, we will classify a picture as the image of a dog or cat and also classify the same image based on the breed of the dog or cat.
• (Figure caption: These are all labels of the given images. Each image here belongs to more than one class, and hence it is a multi-label image classification problem.)
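The difference between the two settings shows up directly in how labels are represented: multi-class assigns exactly one label per image, while multi-label assigns a set of labels. A minimal sketch with hypothetical images and labels:

```python
# Multi-class: each image gets exactly one label
multiclass = {"img1": "dog", "img2": "cat"}

# Multi-label: each image gets a set of labels (e.g. species and breed)
multilabel = {"img1": {"dog", "labrador"}, "img2": {"cat", "persian"}}

# Every multi-class example carries a single label...
assert all(isinstance(label, str) for label in multiclass.values())
# ...while a multi-label example may carry several at once
assert all(len(labels) >= 1 for labels in multilabel.values())

print(sorted(multilabel["img1"]))  # ['dog', 'labrador']
```

In practice the label set is often encoded as a binary indicator vector (one bit per possible label), which is why multi-label classifiers can be trained as a collection of per-label binary decisions.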
Binary vs. Multi-class
Multi-class vs. Multi-label Classification
Naïve Bayes Classifier
• Bayes' theorem:
P(A|B) = P(B|A) · P(A) / P(B)
Where,
P(A|B) is the Posterior probability: the probability of hypothesis A given the
observed event B.
P(B|A) is the Likelihood: the probability of the evidence given that
hypothesis A is true.
P(A) is the Prior probability: the probability of the hypothesis before
observing the evidence.
P(B) is the Marginal probability: the probability of the evidence.
• To find the normalized probability, divide each class's score P(B|A)·P(A) by the sum of the scores over all classes (this is equivalent to dividing by P(B)).
• In the slide's worked example (table not reproduced here), the given new instance falls under the classified target "No".
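A minimal Naïve Bayes sketch on a tiny hypothetical weather dataset (the records below are illustrative, not the slide's missing worked table, and Laplace smoothing is omitted for brevity). Each class is scored as P(C)·∏ᵢ P(xᵢ|C), using the naïve assumption that features are conditionally independent given the class; on this toy data the instance (Sunny, Hot) is classified as "No":

```python
from collections import Counter, defaultdict

# Hypothetical training records: (outlook, temperature) -> play
data = [
    (("Sunny", "Hot"), "No"),
    (("Sunny", "Mild"), "No"),
    (("Overcast", "Hot"), "Yes"),
    (("Rain", "Mild"), "Yes"),
    (("Rain", "Cool"), "Yes"),
    (("Sunny", "Cool"), "Yes"),
]

priors = Counter(label for _, label in data)   # class counts
cond = defaultdict(Counter)                    # (class, feature index) -> value counts
for features, label in data:
    for i, value in enumerate(features):
        cond[(label, i)][value] += 1

def predict(features):
    # Argmax over classes of P(C) * prod_i P(x_i | C); no smoothing
    def score(label):
        s = priors[label] / len(data)
        for i, value in enumerate(features):
            s *= cond[(label, i)][value] / priors[label]
        return s
    return max(priors, key=score)

print(predict(("Sunny", "Hot")))  # No
```

Here P(No)·P(Sunny|No)·P(Hot|No) = (2/6)·(2/2)·(1/2) ≈ 0.167 beats P(Yes)·P(Sunny|Yes)·P(Hot|Yes) = (4/6)·(1/4)·(1/4) ≈ 0.042, so the instance is classified as "No"; normalizing both scores by their sum would turn them into the posterior probabilities.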