W8-Supervised Learning Methods

The document discusses artificial intelligence and predictive modeling techniques. It covers Bayesian inference, including the naïve Bayes classifier, a simple probabilistic classifier based on Bayes' theorem, and works through an example of classifying a new data point with a naïve Bayes classifier trained on a sample dataset. It also covers predictive regression techniques: linear regression, which finds the line of best fit to model the relationship between variables and predict continuous values, and logistic regression, which estimates the probability that the dependent variable will take a given value.

Artificial Intelligence

RSCI
Dr. Ayesha Kashif
• Bayesian Inference
– Naïve Bayes Classifier
• Predictive Regression
– Linear Regression
– Logistic Regression
Bayesian Classification: Why?
– A statistical classifier:
• performs probabilistic prediction, i.e., predicts class membership
probabilities
– Foundation:
• Based on Bayes’ Theorem.
– Performance:
• A simple Bayesian classifier, the naïve Bayesian classifier, has performance comparable to decision tree and selected neural network classifiers
– Incremental:
• Each training example can incrementally increase/decrease the
probability that a hypothesis is correct
• prior knowledge can be combined with observed data
Bayes’ Theorem: Basics
– Bayes’ Theorem:
    P(H | X) = P(X | H) P(H) / P(X)
• Let X be a data sample (“evidence”): class
label is unknown
• Let H be a hypothesis that X belongs to class C
• Classification is to determine P(H|X), (i.e.,
posteriori probability): the probability that
the hypothesis holds given the observed data
sample X
• P(H) (prior probability): the initial probability
– E.g., X will buy computer, regardless of age,
income, …
• P(X): probability that sample data is observed
• P(X|H) (likelihood): the probability of
observing the sample X, given that the
hypothesis holds
– E.g., Given that X will buy computer, the prob.
that X is 31..40, medium income
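As a minimal sketch of how these quantities combine, the Python snippet below evaluates Bayes' theorem with made-up numbers (the prior, likelihood, and evidence values are assumptions for illustration, not figures from the slides):

```python
# Minimal sketch of Bayes' theorem: P(H|X) = P(X|H) * P(H) / P(X)

def posterior(likelihood, prior, evidence):
    """Return P(H|X) given P(X|H), P(H), and P(X)."""
    return likelihood * prior / evidence

# Hypothetical example: H = "customer buys a computer",
# X = "customer is 31..40 with medium income"
p_h = 0.60          # prior P(H), assumed value
p_x_given_h = 0.30  # likelihood P(X|H), assumed value
p_x = 0.25          # evidence P(X), assumed value

print(posterior(p_x_given_h, p_h, p_x))  # posterior P(H|X) = 0.72
```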
Prediction Based on Bayes’ Theorem
• Given training data X, posteriori probability of a hypothesis H,
P(H|X), follows the Bayes’ theorem

    P(H | X) = P(X | H) P(H) / P(X)

    P(Ci | X) = P(X | Ci) P(Ci) / P(X)
• Informally, this can be viewed as
posteriori = likelihood x prior/evidence
• Predicts X belongs to Ci iff the probability P(Ci|X) is the highest
among all the P(Ck|X) for all the k classes
• Practical difficulty: It requires initial knowledge of many
probabilities, involving significant computational cost
Classification Is to Derive the
Maximum Posteriori
• Let D be a training set of tuples and their associated class labels, and
each tuple is represented by an n-D attribute vector X = (x1, x2, …,
xn)
• Suppose there are m classes C1, C2, …, Cm.
• Classification is to derive the maximum posteriori, i.e., the maximal
P(Ci|X)
• This can be derived from Bayes’ theorem
    P(Ci | X) = P(X | Ci) P(Ci) / P(X)

• Since P(X) is constant for all classes, only P(X | Ci) P(Ci) needs to be maximized
Naïve Bayes Classifier
• A simplified assumption: attributes are conditionally independent
(i.e., no dependence relation between attributes):
    P(X | Ci) = ∏(k = 1..n) P(xk | Ci) = P(x1 | Ci) × P(x2 | Ci) × … × P(xn | Ci)

• This greatly reduces the computation cost: only the class distribution needs to be counted
• If Ak is categorical, P(xk|Ci) is the # of tuples in Ci having value xk
for Ak divided by |Ci, D| (# of tuples of Ci in D)
• If Ak is continuous-valued, P(xk|Ci) is usually computed based on a Gaussian distribution with mean μ and standard deviation σ
    g(x, μ, σ) = (1 / (√(2π) · σ)) · e^(−(x − μ)² / (2σ²))

and P(xk|Ci) is computed as

    P(xk | Ci) = g(xk, μCi, σCi)
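A short sketch of this Gaussian likelihood in Python may help; the mean, standard deviation, and test value below are assumed for illustration and do not come from the slides:

```python
import math

def gaussian(x, mu, sigma):
    """Gaussian density g(x, mu, sigma), used as P(xk|Ci) for a continuous attribute."""
    return (1.0 / (math.sqrt(2 * math.pi) * sigma)) * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

# Hypothetical example: ages in class Ci have mean 38 and standard deviation 12;
# likelihood of observing age = 35 for that class:
print(gaussian(35, mu=38, sigma=12))
```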
Naïve Bayes Classifier: Training Dataset
Class:
  C1: buys_computer = ‘yes’
  C2: buys_computer = ‘no’

    P(Ci | X) ∝ P(X | Ci) P(Ci)

Data to be classified:
X = (age <= 30, income = medium, student = yes, credit_rating = fair)

age     income   student   credit_rating   buys_computer
<=30    high     no        fair            no
<=30    high     no        excellent       no
31…40   high     no        fair            yes
>40     medium   no        fair            yes
>40     low      yes       fair            yes
>40     low      yes       excellent       no
31…40   low      yes       excellent       yes
<=30    medium   no        fair            no
<=30    low      yes       fair            yes
>40     medium   yes       fair            yes
<=30    medium   yes       excellent       yes
31…40   medium   no        excellent       yes
31…40   high     yes       fair            yes
>40     medium   no        excellent       no
Naïve Bayes Classifier: An Example
• X = (age <= 30 , income = medium, student = yes, credit_rating = fair)
• P(Ci): P(buys_computer = “yes”) = 9/14 = 0.643
P(buys_computer = “no”) = 5/14= 0.357
• Compute P(X|Ci) for each class
P(age = “<=30” | buys_computer = “yes”) = 2/9 = 0.222
P(age = “<= 30” | buys_computer = “no”) = 3/5 = 0.6
P(income = “medium” | buys_computer = “yes”) = 4/9 = 0.444
P(income = “medium” | buys_computer = “no”) = 2/5 = 0.4
P(student = “yes” | buys_computer = “yes”) = 6/9 = 0.667
P(student = “yes” | buys_computer = “no”) = 1/5 = 0.2
P(credit_rating = “fair” | buys_computer = “yes”) = 6/9 = 0.667
P(credit_rating = “fair” | buys_computer = “no”) = 2/5 = 0.4
• P(X|Ci) : P(X|buys_computer = “yes”) = 0.222 x 0.444 x 0.667 x 0.667 = 0.044
P(X|buys_computer = “no”) = 0.6 x 0.4 x 0.2 x 0.4 = 0.019
P(X|Ci)*P(Ci) : P(X|buys_computer = “yes”) * P(buys_computer = “yes”) = 0.028
P(X|buys_computer = “no”) * P(buys_computer = “no”) = 0.007
Therefore, X belongs to class (“buys_computer = yes”)
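The same calculation can be reproduced with a short counting-based sketch in Python. The data set is the one from the training-data slide; the code itself is only an illustrative implementation of the counting scheme, not the library routine one would use in practice:

```python
from collections import Counter, defaultdict

# The 14 training tuples from the slide: (age, income, student, credit_rating, buys_computer)
data = [
    ("<=30", "high",   "no",  "fair",      "no"),
    ("<=30", "high",   "no",  "excellent", "no"),
    ("31…40","high",   "no",  "fair",      "yes"),
    (">40",  "medium", "no",  "fair",      "yes"),
    (">40",  "low",    "yes", "fair",      "yes"),
    (">40",  "low",    "yes", "excellent", "no"),
    ("31…40","low",    "yes", "excellent", "yes"),
    ("<=30", "medium", "no",  "fair",      "no"),
    ("<=30", "low",    "yes", "fair",      "yes"),
    (">40",  "medium", "yes", "fair",      "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31…40","medium", "no",  "excellent", "yes"),
    ("31…40","high",   "yes", "fair",      "yes"),
    (">40",  "medium", "no",  "excellent", "no"),
]

classes = Counter(row[-1] for row in data)   # class counts: {'yes': 9, 'no': 5}
cond = defaultdict(Counter)                  # (class, attribute index) -> value counts
for *attrs, label in data:
    for k, value in enumerate(attrs):
        cond[(label, k)][value] += 1

def classify(x):
    """Return P(X|Ci)*P(Ci) for each class and the most probable class."""
    scores = {}
    n = len(data)
    for label, count in classes.items():
        score = count / n                              # prior P(Ci)
        for k, value in enumerate(x):
            score *= cond[(label, k)][value] / count   # P(xk|Ci) by counting
        scores[label] = score
    return scores, max(scores, key=scores.get)

x = ("<=30", "medium", "yes", "fair")
scores, best = classify(x)
print(scores)   # roughly {'no': 0.007, 'yes': 0.028}, matching the slide
print(best)     # 'yes' -> buys_computer = yes
```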
Exercise

• Given the table above, predict the classification of the new sample X = {1, 2, 2, class = ?}.
Predictive Regression
Linear Regression
• The prediction of continuous values can be modeled by a statistical technique called regression.
• Regression analysis is the process of determining how a variable Y is related to one or more other variables x1, x2, ..., xn.
• The relationship that fits a set of data is characterized by a prediction model called a regression equation.

• Common reasons for performing regression analysis include:
1. the output is expensive to measure but the inputs are not, and so a
cheap prediction of the output is sought;
2. the values of the inputs are known before the output is known, and a
working prediction of the output is required;
3. by controlling the input values, we can predict the behavior of the corresponding outputs; and
4. there might be a causal link between some of the inputs and the
output, and we want to identify the links.
Regression And Model Building
• An engineer visits 25 randomly chosen retail outlets having vending machines, and the in-outlet delivery time (in minutes) and the volume of product delivered (in cases) are observed for each.
• Plotting these observations gives a scatter diagram; this display clearly suggests a relationship between delivery time and delivery volume.
Regression And Model Building
• Correlation coefficients measure the strength and sign of a
relationship, but not the slope.
• There are several ways to estimate the slope; the most
common is a linear least squares fit.
• A “linear fit” is a line intended to model the relationship
between variables.
• A “least squares” fit is one that minimizes the mean
squared error (MSE) between the line and the data.
Linear Regression
• Equation of a straight line:
    Y = b + mX
• where Y represents the dependent variable
• X represents the independent variable
• ‘b’ represents the Y-intercept (i.e., the value of Y when X equals zero)
• ‘m’ represents the slope of the line (i.e., the value of tan θ, where θ is the angle between the line and the horizontal axis)
Linear Regression
• Linear regression with one input variable is the
simplest form of regression. It models a random
variable Y (called a response variable) as a linear
function of another random variable X (called a
predictor variable).
• Given n samples or data points of the form (x1, y1),
(x2, y2),…,(xn, yn), where xi∈X and yi∈Y, linear
regression can be expressed as
    Y = α + βX + ε
• where intercept α and slope β are unknown constants or regression coefficients, and ε is a random error component.
Linear Regression
– Find the Least Square Error
• minimizes the error between the actual
data points and the estimated line
• LS minimizes the sum of the squared differences (errors), SSE:
    SSE = Σi (yi − yi’)²
• where yi is the real output value given in the data set, and yi’ is the response value obtained from the model.
• Squaring has the obvious feature of
treating positive and negative residuals
the same.
Linear Regression: Regression coefficients
Differentiating SSE with respect to α and β, setting the partial derivatives equal to zero (minimization of the total error), and rearranging the terms gives two equations that may be solved simultaneously to yield computing formulas for α and β. Using standard relations for the mean values, the regression coefficients for this simple case of optimization are

    Slope:     β = Σ (xi − mean(x)) · (yi − mean(y)) / Σ (xi − mean(x))²
    Intercept: α = mean(y) − β · mean(x)

Beta equals the covariance between x and y divided by the variance of x.
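A brief Python sketch of these formulas follows, applied to a made-up data set (the slide's own training data are not reproduced here, so the xs and ys values are assumptions for illustration):

```python
# Least-squares fit for one input variable:
# slope = Cov(x, y) / Var(x), intercept = mean(y) - slope * mean(x)

def linear_fit(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    beta = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
           / sum((x - mean_x) ** 2 for x in xs)   # slope
    alpha = mean_y - beta * mean_x                # intercept
    return alpha, beta

xs = [1, 2, 3, 4, 5]            # hypothetical inputs
ys = [2.1, 3.9, 6.2, 8.0, 9.8]  # hypothetical outputs
alpha, beta = linear_fit(xs, ys)
print(alpha, beta)              # approximately 0.15 and 1.95
```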
Linear Regression: Example
– Training Data
• where the α and β coefficients can be calculated based on the previous formulas (using meanA = 5 and meanB = 6)
• The optimal regression line follows from the computed α and β values.
Linear Regression: Goodness of fit
– Mean Square Error
• Suppose that you are trying to guess someone’s weight. If you didn’t know anything about them, your best strategy would be to guess the mean ȳ; in that case the MSE of your guesses would be Var(Y).
Linear Regression: Goodness of fit
• A number given by MSE is still hard to immediately intuit. Is this a good
prediction?

• To measure the predictive power of a model, we can compute the coefficient of determination, more commonly known as “R-squared”:
    R² = 1 − Var(ε) / Var(Y)
Linear Regression: Goodness of fit
• To measure the predictive power of a model, we can compute the coefficient of determination, more commonly known as “R-squared”:
    R² = 1 − Var(ε) / Var(Y)
• So the term Var(ε)/Var(Y) is the ratio of the mean squared error with and without the explanatory variable, which is the fraction of variability left unexplained by the model.
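A matching sketch of the R-squared computation on the same hypothetical data, using the α and β values produced by the earlier least-squares sketch:

```python
def r_squared(xs, ys, alpha, beta):
    """R^2 = 1 - Var(residuals) / Var(y) for a fitted line y' = alpha + beta * x."""
    n = len(ys)
    mean_y = sum(ys) / n
    residuals = [y - (alpha + beta * x) for x, y in zip(xs, ys)]
    var_e = sum(e ** 2 for e in residuals) / n      # MSE with the explanatory variable
    var_y = sum((y - mean_y) ** 2 for y in ys) / n  # MSE of always guessing the mean
    return 1 - var_e / var_y

# Hypothetical data and the line fitted to it in the earlier sketch:
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 8.0, 9.8]
print(r_squared(xs, ys, alpha=0.15, beta=1.95))   # close to 1 for nearly linear data
```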
Linear Regression
– Quality of the linear regression model
• One parameter, which shows the strength of the linear association between two variables by means of a single number, is called the correlation coefficient r:
    r = Cov(x, y) / (StdDev(x) · StdDev(y))
• A correlation coefficient of r = 0.85 indicates a good linear relationship between the two variables.
Logistic Regression
Logistic Regression
– Probability of dependent variable
• Rather than predicting the value of the dependent variable, the logistic regression method tries to estimate the probability that the dependent variable will have a given value.
– Customer Credit Rating example
• If the estimated probability is greater than 0.50 then the
prediction is closer to YES (a good credit rating),
• otherwise the output is closer to NO (a bad credit rating is
more probable).
Logistic Regression
– Odds Ratio
• Logistic regression uses the concept of odds ratios to
calculate the probability.
• For example, the probability of a sports team winning a certain match might be 0.75.
• The probability of that team losing would then be 1 – 0.75 = 0.25.
• The odds ratio for that team winning would be 0.75/0.25 = 3.
• In other words, the odds of the team winning are 3 to 1.
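The probability/odds arithmetic from this slide can be written out in a couple of lines; this is a purely illustrative sketch:

```python
# Probability <-> odds conversions used by logistic regression.

def odds(p):
    """Odds in favour of an event with probability p."""
    return p / (1 - p)

def prob_from_odds(o):
    """Probability corresponding to the given odds."""
    return o / (1 + o)

print(odds(0.75))            # 3.0  -> the team's odds of winning are 3 to 1
print(prob_from_odds(3.0))   # 0.75 -> back to the original probability
```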
Logistic Regression
– Linear Logistic Model
• Suppose that output Y has two possible categorical values coded as 0 and 1, and let pj denote the probability that the j-th output equals 1. Then
    log(pj / [1 − pj]) = α + β1·xj1 + β2·xj2 + … + βn·xjn
• This equation is known as the linear logistic model. The function log(pj / [1 − pj]) is often written as logit(p).
• The main reason for using the logit form of output is to prevent the predicted probabilities from becoming values outside the required range [0, 1].
Logistic Regression
– Example
• Suppose that the new sample for classification has input values {x1, x2, x3} = {1, 0, 1}.
• Using the linear logistic model, it is possible to estimate the probability of the output value 1, p(Y = 1), for this sample.
• First, calculate the corresponding logit(p), and then the probability of the output value 1 for the given inputs.
• Based on the final value for the probability p, we may conclude that the output value Y = 1 is more probable than the other categorical value Y = 0.
• Curves of this form are called sigmoidal because they are S-shaped and nonlinear.
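Since the fitted coefficients of the example are not reproduced on these slides, the sketch below uses assumed values for α and β1…β3 purely to illustrate the two steps, computing logit(p) from the linear model and then converting it back to a probability with the sigmoid function:

```python
import math

alpha = 1.5                  # assumed intercept (not from the slides)
betas = [0.6, -1.0, 0.8]     # assumed coefficients for x1, x2, x3 (not from the slides)

x = [1, 0, 1]                # the new sample from the slide

logit_p = alpha + sum(b * xi for b, xi in zip(betas, x))   # log(p / (1 - p))
p = 1 / (1 + math.exp(-logit_p))                           # sigmoid: p(Y = 1)

print(logit_p, p)   # 2.9 and about 0.948 -> Y = 1 is the more probable class
```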
References
• Allen B. Downey, Think Stats, O'Reilly Media, Inc. (2018)
• https://en.wikipedia.org/wiki/Numerical_methods_for_linear_least_squares
• https://towardsdatascience.com/linear-regression-derivation-d362ea3884c2
