
Machine Learning

(BITS F464)
Dr. Paresh Saxena
BITS Pilani, Hyderabad Campus
Dept. of Computer Science & Information Systems
Email: [email protected]
Course Handout Discussion

Lecturer:
Paresh Saxena – [email protected], https://psaxena86.github.io/

Announcements:
via CMS, in-class

Text Books:

T1. Christopher Bishop: Pattern Recognition and Machine Learning, Springer, 1st ed. 2006.
T2. Tom M. Mitchell: Machine Learning, The McGraw-Hill International Edition, 1997.



Course Evaluation

Component | Weightage | Date & Time | Mode
Mid-Term exam | 35% | As announced in the timetable | Closed Book
Course Project (with final presentation/viva) | 25% (5% evaluated before the mid-sem) | Details will be announced in September | Open Book
Comprehensive | 40% | As announced in the timetable | Closed Book

* Slides will not be shared.



Course Plan

Lecture No. (each lecture 1 hour) | Learning objectives | Topics to be covered | Chapter in the Text Book
1–3 | To introduce several relevant materials to understand ML algorithms | ML overview, Python and ML frameworks | Lecture Notes
3–8 | To understand linear models for regression | Gradient Descent, Bias-Variance, Bayesian Regression, Bayesian Model Comparison | T1 – Ch. 3
9–14 | To understand linear models for classification | Discriminant Functions, Probabilistic Generative and Discriminative Models, Bayesian Logistic Regression | T1 – Ch. 4
15–22 | To understand Neural networks | Feed-forward Network Functions, Network Training, Backpropagation, Regularization | T1 – Ch. 5
23–32 | To understand Kernel methods and Sparse Kernel Machines | Radial basis function networks, Gaussian processes, SVMs, Multiclass SVMs | T1 – Ch. 6 and 7
32–40 | To develop the understanding of Mixture models and combining models | K-means Clustering, Mixture Models, EM, Bagging, Boosting, Decision Trees | T1 – Ch. 9 and 14



Subfields of ML

• Supervised Learning

• Unsupervised Learning

• Reinforcement Learning (not covered in this course; join CS F317 in odd semesters)

• Other familiar terms: Online Learning, Query Learning, Semi-Supervised Learning, Anomaly Detection.



Supervised Learning

• 20,000 images
• Ages 0 to 116 years
• Labeled with age, gender, and ethnicity

Ref: https://susanqq.github.io/UTKFace/

Task: given a new image, identify the age, or, given a new image, identify the gender.
• It is hard to hand-design traditional rule-based approaches for this kind of prediction.
• Supervised learning can use the labeled dataset to build an accurate predictor.
Unsupervised Learning

Link: https://data.sfgov.org/Transportation/Air-Traffic-Passenger-Statistics/rkru-6vcg

Supermarket data (products × customers):

Products | #1 | #2 | #3 | #4 | …… | #100000
Coffee | 1 | 0 | 0 | 1 | …… | 0
Tea | 0 | 1 | 0 | 0 | …… | 1
Milk | 1 | 1 | 0 | 1 | …… | 1
Soap | 0 | 1 | 1 | 1 | …… | 0
Aspirin | 1 | 0 | 1 | 0 | …… | 0
… | … | … | … | … | …… | …
Perfume | 0 | 1 | 1 | 0 | …… | 1

Objective: find some common patterns in the data. Example: if someone buys milk, they are also likely to buy coffee.
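
As a minimal sketch of finding such a co-occurrence pattern, the snippet below builds a tiny binary product-by-customer matrix (hypothetical data, not the actual supermarket table) and estimates P(coffee | milk) with NumPy:

```python
import numpy as np

# Hypothetical binary purchase matrix: rows = products, columns = customers.
products = ["Coffee", "Tea", "Milk", "Soap", "Aspirin", "Perfume"]
X = np.array([
    [1, 0, 0, 1, 1, 0],   # Coffee
    [0, 1, 0, 0, 0, 1],   # Tea
    [1, 1, 0, 1, 1, 1],   # Milk
    [0, 1, 1, 1, 0, 0],   # Soap
    [1, 0, 1, 0, 0, 0],   # Aspirin
    [0, 1, 1, 0, 1, 1],   # Perfume
])

milk = X[products.index("Milk")]
coffee = X[products.index("Coffee")]

# Empirical estimate of P(buys coffee | buys milk): among customers who
# bought milk, what fraction also bought coffee?
p_coffee_given_milk = (milk & coffee).sum() / milk.sum()
print(f"P(coffee | milk) = {p_coffee_given_milk:.2f}")
```
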
Regression vs Classification

Linear Models for Regression:
Notations
• N observations {x_n}, n = 1, 2, …, N
• Target values {t_n} (labels)

Example: N = 5 observations, D = 3 features/attributes; the inputs x_n are the first three columns and the targets t_n are the house prices.

House Age (years) | Distance from the Center (km) | Number of Rooms | House Price (in lakhs)
4 | 2 | 3 | 68
11 | 2 | 3 | 87
5 | 4 | 2 | 45
10 | 4 | 2 | 23
20 | 8 | 4 | 35

[Diagram: Inputs → Regressor → predicted real number]

• Goal: predict t for a new value of x.
• Solution (linear models): find a function y(x) that will give a value of t for a new value of x.
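
A minimal sketch of fitting such a linear model to the five-row house table above with NumPy least squares (the query house at the end is purely illustrative):

```python
import numpy as np

# Inputs x_n: age (years), distance from the center (km), number of rooms.
X = np.array([
    [4, 2, 3],
    [11, 2, 3],
    [5, 4, 2],
    [10, 4, 2],
    [20, 8, 4],
], dtype=float)
t = np.array([68, 87, 45, 23, 35], dtype=float)  # house price (lakhs)

# Prepend a column of ones so w[0] acts as the bias term.
Phi = np.hstack([np.ones((X.shape[0], 1)), X])

# Least-squares fit: find w minimising sum_n (t_n - w^T phi(x_n))^2.
w, *_ = np.linalg.lstsq(Phi, t, rcond=None)

# Predict the price of a new house (8 years old, 3 km from the center, 3 rooms).
x_new = np.array([1.0, 8.0, 3.0, 3.0])
print("w =", w)
print("predicted price =", x_new @ w)
```
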
Linear Regression: History



Linear Models for Regression

• Given: N observations {x_n} with corresponding target values {t_n}.

• Linear regression:
  y(x, w) = w_0 + w_1 x_1 + … + w_D x_D

• Extension with basis functions:
  y(x, w) = w_0 + Σ_{j=1}^{M−1} w_j φ_j(x) = w^T φ(x)
  where w_0 is the bias, w = (w_0, …, w_{M−1})^T are the parameters, and the φ_j(x) are (non-linear) basis functions with φ_0(x) = 1.



Linear Models for Regression
(Polynomial Basis Functions)

For polynomial regression:

• single input variable x,
• φ_j(x) = x^j, and so
• y = w_0 x^0 + w_1 x^1 + w_2 x^2 + … + w_{M−1} x^{M−1}

Other basis functions:
• Gaussian basis functions
• Sigmoidal basis functions, Radial Basis Functions (RBF), wavelets, etc.
• Identity function: φ(x) = x (recovers plain linear regression)



Polynomial Curve Fitting

Determine the values of the coefficients w from the training data.

Error function (sum of squared errors between predictions and targets):

E(w) = Σ_{n=1}^{N} ( y(x_n, w) − t_n )^2

Minimize the error and find w.
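
A small sketch of this fit, assuming noisy samples of a sine curve as the training data (the data-generating function is only an illustrative assumption): build the design matrix with φ_j(x) = x^j and minimise E(w) by least squares.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy training data: noisy samples of sin(2*pi*x) (illustrative assumption).
N = 10
x = np.linspace(0.0, 1.0, N)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=N)

M = 4  # number of polynomial coefficients w_0 ... w_{M-1}

# Design matrix: Phi[n, j] = x_n ** j  (polynomial basis functions).
Phi = np.vander(x, M, increasing=True)

# Minimise E(w) = sum_n (t_n - w^T phi(x_n))^2 via least squares.
w, *_ = np.linalg.lstsq(Phi, t, rcond=None)

E = np.sum((t - Phi @ w) ** 2)
print("coefficients w:", w)
print("training error E(w):", E)
```
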


Model Selection (order of M)

• Compare the errors on the training data set and the test data set as the order M increases.
• The best fit turns into overfitting for large M, and the magnitudes of the coefficients grow very large as M increases.
• Overfitting can be resolved with more data (here shown for M = 9).
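
To see this effect numerically, the sketch below (same toy sine-data assumption as before) compares training and test RMS error, and the largest coefficient magnitude, as the number of coefficients M grows:

```python
import numpy as np

rng = np.random.default_rng(1)

def make_data(n):
    # Toy data: noisy samples of sin(2*pi*x) (illustrative assumption).
    x = rng.uniform(0.0, 1.0, n)
    t = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=n)
    return x, t

x_train, t_train = make_data(10)
x_test, t_test = make_data(100)

def rms_error(x, t, w):
    # RMS error of the polynomial with coefficients w on data (x, t).
    Phi = np.vander(x, len(w), increasing=True)
    return np.sqrt(np.mean((t - Phi @ w) ** 2))

for M in [1, 2, 4, 10]:  # number of coefficients (polynomial order M-1)
    Phi = np.vander(x_train, M, increasing=True)
    w, *_ = np.linalg.lstsq(Phi, t_train, rcond=None)
    print(f"M={M:2d}  train RMS={rms_error(x_train, t_train, w):.3f}  "
          f"test RMS={rms_error(x_test, t_test, w):.3f}  max|w|={np.abs(w).max():.1f}")
```

With only 10 training points, M = 10 fits the training data almost perfectly but the test error and the coefficient magnitudes blow up, which is the overfitting pattern described above.
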



Minimizing the Squared Error
(Maximum Likelihood)

• Assume we have N observations; t_n is the single output for the nth observation.
• The error is given by the sum of squared differences between the observed outputs and the predictions:

E(w) = Σ_{n=1}^{N} ( t_n − y(x_n, w) )^2 = Σ_{n=1}^{N} ( t_n − w^T φ(x_n) )^2

Differentiate with respect to w and equate it to zero:

w_ML = (Φ^T Φ)^{−1} Φ^T t

where Φ is the design matrix with elements Φ_{nj} = φ_j(x_n), and (Φ^T Φ)^{−1} Φ^T is the Moore-Penrose pseudo-inverse of Φ. Computing the inverse directly is computationally expensive.
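
A minimal sketch of this closed-form solution with NumPy, using np.linalg.pinv for the Moore-Penrose pseudo-inverse (the toy data set is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy regression problem (illustrative assumption): 20 points, cubic polynomial basis.
x = rng.uniform(-1.0, 1.0, 20)
t = 1.0 - 2.0 * x + 0.5 * x ** 3 + rng.normal(scale=0.1, size=20)

# Design matrix Phi[n, j] = phi_j(x_n) with polynomial basis phi_j(x) = x^j.
Phi = np.vander(x, 4, increasing=True)

# Maximum-likelihood solution w_ML = (Phi^T Phi)^{-1} Phi^T t (normal equations).
w_normal_eq = np.linalg.solve(Phi.T @ Phi, Phi.T @ t)

# Equivalent, and numerically more robust: apply the Moore-Penrose pseudo-inverse.
w_pinv = np.linalg.pinv(Phi) @ t

print("normal equations:", w_normal_eq)
print("pseudo-inverse:  ", w_pinv)
```
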
Ridge Regression

• To counter overfitting, use regularization.
• Recall from the previous lectures (overfitting): comparing errors on the training and test data sets shows the best fit turning into overfitting, with coefficient values growing very large as M increases; overfitting can also be resolved with more data (M = 9).
• Ridge regression adds a quadratic penalty on the weights to the sum-of-squares error, E(w) = Σ_{n=1}^{N} ( t_n − w^T φ(x_n) )^2 + λ w^T w, giving the solution w = (λI + Φ^T Φ)^{−1} Φ^T t.
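
A short sketch, under the same toy sine-data assumption as before, of the regularised solution w = (λI + Φ^T Φ)^{−1} Φ^T t and how the penalty shrinks the coefficients:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy data (illustrative assumption): 10 noisy samples of sin(2*pi*x),
# fitted with a 10-coefficient polynomial that badly overfits without regularisation.
x = np.linspace(0.0, 1.0, 10)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=10)
Phi = np.vander(x, 10, increasing=True)

def ridge_fit(Phi, t, lam):
    # Regularised least squares: w = (lam*I + Phi^T Phi)^{-1} Phi^T t.
    M = Phi.shape[1]
    return np.linalg.solve(lam * np.eye(M) + Phi.T @ Phi, Phi.T @ t)

for lam in [1e-6, 1e-2, 1.0]:
    w = ridge_fit(Phi, t, lam)
    print(f"lambda={lam:g}  max|w|={np.abs(w).max():.2f}")

# Smaller lambda leaves the coefficients large (close to the unregularised,
# overfitted solution); larger lambda shrinks them and smooths the fit.
```
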


Training, Testing, Validation
and Cross-Validation

• Overfitting motivates using a validation set in addition to the training set (candidate models are compared on the validation set).
• Many rounds of model selection with a limited data set can in turn overfit the validation set, so a separate test set is also required.
• With limited data and a small validation set, cross-validation is one solution: partition the data into S folds, train on S−1 folds, validate on the held-out fold, and average the results over the S runs.

Drawback: training complexity (the model has to be trained once per fold).
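
A minimal sketch of S-fold cross-validation used to choose the polynomial order (toy data assumed, no external libraries); note the drawback above: the model is refit S times for every candidate M.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy data set (illustrative assumption).
x = rng.uniform(0.0, 1.0, 30)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=30)

def cv_error(x, t, M, S=5):
    """Average held-out squared error of an M-coefficient polynomial over S folds."""
    idx = rng.permutation(len(x))
    folds = np.array_split(idx, S)
    errors = []
    for k in range(S):
        val = folds[k]
        train = np.concatenate([folds[j] for j in range(S) if j != k])
        Phi_tr = np.vander(x[train], M, increasing=True)
        w, *_ = np.linalg.lstsq(Phi_tr, t[train], rcond=None)
        Phi_val = np.vander(x[val], M, increasing=True)
        errors.append(np.mean((t[val] - Phi_val @ w) ** 2))
    return np.mean(errors)

for M in [2, 4, 6, 10]:
    print(f"M={M:2d}  cross-validation error={cv_error(x, t, M):.3f}")
```
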



Loss Function: Likelihood

• Data set: {φ(x_n), t_n}, where t_n ∈ {0, 1} and n = 1, 2, …, N.
• The likelihood function can be written as

p(t | w) = Π_{n=1}^{N} y_n^{t_n} (1 − y_n)^{1 − t_n},  where y_n = σ(w^T φ_n).

• The likelihood function is used as a loss function: it prefers parameters under which the correct class labels of the training examples are more likely.



Continue - 1
The likelihood function is:

p(t | w) = Π_{n=1}^{N} y_n^{t_n} (1 − y_n)^{1 − t_n}

Take the logarithm of both sides (mathematically convenient), with the aim of maximizing it. To obtain a corresponding loss function, take the negative logarithm (also known as the cross-entropy error function):

E(w) = −ln p(t | w) = −Σ_{n=1}^{N} { t_n ln y_n + (1 − t_n) ln(1 − y_n) }

where y_n = σ(a_n) and a_n = w^T φ_n. Here, σ is the sigmoid function, σ(x) = 1 / (1 + e^{−x}).

Minimize the error E(w): take the derivative of E(w) and set it to zero.
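
A tiny numeric check, with made-up labels and activations, that the negative log of the Bernoulli likelihood equals the cross-entropy sum above:

```python
import numpy as np

# Made-up example: three training points with targets t_n and activations a_n.
t = np.array([1, 0, 1])
a = np.array([2.0, -1.0, 0.5])            # a_n = w^T phi_n (values assumed for illustration)
y = 1.0 / (1.0 + np.exp(-a))              # y_n = sigma(a_n)

# Likelihood p(t | w) = prod_n y_n^{t_n} (1 - y_n)^{1 - t_n}
likelihood = np.prod(y ** t * (1 - y) ** (1 - t))

# Cross-entropy error E(w) = -sum_n { t_n ln y_n + (1 - t_n) ln(1 - y_n) }
cross_entropy = -np.sum(t * np.log(y) + (1 - t) * np.log(1 - y))

print(-np.log(likelihood), cross_entropy)  # the two values agree
```
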



Continue – 2: Gradient of the Error Function

Error function:

E(w) = −ln p(t | w) = −Σ_{n=1}^{N} { t_n ln y_n + (1 − t_n) ln(1 − y_n) },  where y_n = σ(w^T φ_n)

Using the derivative of the logistic sigmoid, dσ/da = σ(1 − σ), the gradient of the error function is:

∇E(w) = Σ_{n=1}^{N} ( y_n − t_n ) φ_n

The contribution to the gradient from data point n is the error between target t_n and prediction y_n = σ(w^T φ_n), times the basis vector φ_n: "error × feature vector".

Proof of the gradient expression (for a single data point n, writing φ for φ_n and t for t_n):

Let E_n = −z with z = z_1 + z_2, where z_1 = t ln σ(w^T φ) and z_2 = (1 − t) ln[1 − σ(w^T φ)].

Using dσ/da = σ(1 − σ) and d ln u / dw = (du/dw) / u:

dz_1/dw = t σ(w^T φ)[1 − σ(w^T φ)] φ / σ(w^T φ) = t [1 − σ(w^T φ)] φ
dz_2/dw = (1 − t) σ(w^T φ)[1 − σ(w^T φ)] (−φ) / [1 − σ(w^T φ)] = −(1 − t) σ(w^T φ) φ

Therefore dE_n/dw = −(dz_1/dw + dz_2/dw) = (σ(w^T φ) − t) φ = (y_n − t_n) φ_n, and summing over n gives ∇E(w).
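
A small numeric sanity check of the "error × feature vector" gradient: compare ∇E(w) = Σ_n (y_n − t_n) φ_n, computed as Φ^T (y − t), with a finite-difference approximation (random toy features and targets assumed):

```python
import numpy as np

rng = np.random.default_rng(5)

N, D = 8, 3
Phi = rng.normal(size=(N, D))             # toy basis vectors phi_n (assumed)
t = rng.integers(0, 2, size=N)            # toy binary targets
w = rng.normal(size=D)

sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

def E(w):
    # Cross-entropy error for the current weights.
    y = sigmoid(Phi @ w)
    return -np.sum(t * np.log(y) + (1 - t) * np.log(1 - y))

# Analytic gradient: sum_n (y_n - t_n) phi_n  ==  Phi^T (y - t).
y = sigmoid(Phi @ w)
grad_analytic = Phi.T @ (y - t)

# Finite-difference gradient for comparison.
eps = 1e-6
grad_numeric = np.array([
    (E(w + eps * np.eye(D)[j]) - E(w - eps * np.eye(D)[j])) / (2 * eps)
    for j in range(D)
])

print(grad_analytic)
print(grad_numeric)   # should agree to roughly six decimal places
```
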



Gradient Descent

• Find the optimal weights that minimize the error function.
• For logistic regression, the loss function is convex, hence it has just one minimum.

Gradient descent update (η is the learning rate):

w_{t+1} = w_t − η ∇E(w_t)
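
A minimal sketch of the update rule w_{t+1} = w_t − η ∇E(w_t) applied to a toy logistic-regression problem (the data, label noise, and learning rate are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(6)

# Toy binary classification data (illustrative assumption): 2 features plus a bias term,
# with 10% label noise so that the optimum stays finite.
N = 100
X = rng.normal(size=(N, 2))
t = (X[:, 0] + X[:, 1] > 0).astype(float)
flip = rng.random(N) < 0.1
t[flip] = 1.0 - t[flip]
Phi = np.hstack([np.ones((N, 1)), X])       # phi_0(x) = 1 handles the bias

sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

w = np.zeros(Phi.shape[1])
eta = 0.01                                   # learning rate (assumed value)

for step in range(500):
    y = sigmoid(Phi @ w)
    grad = Phi.T @ (y - t)                   # gradient of the cross-entropy error
    w = w - eta * grad                       # w_{t+1} = w_t - eta * grad E(w_t)

y = sigmoid(Phi @ w)
E = -np.sum(t * np.log(y) + (1 - t) * np.log(1 - y))
accuracy = np.mean((y > 0.5) == (t == 1.0))
print("weights:", w, " cross-entropy:", round(E, 2), " train accuracy:", accuracy)
```

Because the cross-entropy is convex in w, this batch update converges to the single minimum for any sufficiently small learning rate.
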

