L9 RBF+PM

The document covers various topics in linear algebra and machine learning, including matrix operations, least squares solutions, radial basis function networks, and clustering techniques like K-means. It also discusses classifier evaluation metrics such as precision, recall, and ROC curves, emphasizing the importance of these metrics in assessing model performance. Additionally, performance metrics for regression are outlined, including RMSE and MAE.


Matrix partial derivative

Matrix transpose properties


Linear Algebraic Equations
Under-determined systems
$Ax = b$, where $A$ is an $m \times n$ matrix, $x$ is an $n \times 1$ vector, and $b$ is an $m \times 1$ vector.
Minimum Norm Solution
Minimize $J = x_1^2 + x_2^2 + \cdots + x_n^2 = x^T x$ subject to $f = Ax - b = 0$.
Adjoining the constraints with Lagrange multipliers:
$J_a = J + (\lambda_1 f_1 + \lambda_2 f_2 + \cdots + \lambda_{m-1} f_{m-1} + \lambda_m f_m) = J + \lambda^T f$
$\frac{\partial J_a}{\partial x} = 0 = 2x + A^T \lambda, \qquad \frac{\partial J_a}{\partial \lambda} = 0 = Ax - b$
Solving these two conditions gives
$x = A^{\#} b$, where $A^{\#} = A^T (A A^T)^{-1}$ (the right pseudo-inverse).
Example: $2x_1 + 3x_2 = 8$ (one equation, two unknowns):
A = [2 3]; b = 8;
xa = A\b              % basic solution: xa = [0; 2.6667]
xb = lsqminnorm(A,b)  % minimum-norm solution: xb = [1.2308; 1.8462]
Least Squares Solutions (Minimum error solution)
Over-determined system
The least squares solution is the solution that minimizes the squared norm (size) of the error:
$J = e^T e = (Ax - b)^T (Ax - b)$
Premultiplying $Ax = b$ by $A^T$ gives the normal equations $A^T A x = A^T b$, so
$x = (A^T A)^{-1} A^T b$ (the left pseudo-inverse).
In MATLAB: X = lsqr(A,b)
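On a small made-up over-determined system (not from the slides), the backslash operator, the normal equations, and lsqr all arrive at the same least squares solution:
A = [1 1; 1 2; 1 3]; b = [1; 2; 2];   % 3 equations, 2 unknowns
x1 = A\b                  % QR-based least squares
x2 = (A'*A) \ (A'*b)      % normal equations
x3 = lsqr(A, b)           % iterative solver; agrees with x1 and x2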
Radial Basis Function (RBF) Networks

1. They are two-layer feed-forward networks.


2. The hidden nodes implement a set of radial basis
functions (e.g. Gaussian functions).
3. The output nodes implement linear summation
functions as in an MLP.
4. The network training is divided into two stages: first the
weights from the input to hidden layer are determined,
and then the weights from the hidden to output layer.
5. The training/learning is very fast.
6. The networks are very good at interpolation.
There is considerable evidence that neurons in the visual cortex are tuned to local regions of the retina. They are maximally sensitive to some specific stimulus, and their output falls off as the presented stimulus moves away from this "best" stimulus.
Gaussian basis functions
Implementing XOR
Each input $x$ is passed through two Gaussian basis functions, e.g. $z_1 = \exp(-\|x - t_1\|^2)$ with centre $t_1 = (1,1)$ and $z_2 = \exp(-\|x - t_2\|^2)$ with centre $t_2 = (0,0)$.
When mapped into the feature space $(z_1, z_2)$, the two classes become linearly separable.
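A quick sketch of this mapping (assuming the example centres and unit widths above):
X = [0 0; 0 1; 1 0; 1 1];         % the four XOR inputs
t1 = [1 1]; t2 = [0 0];
z1 = exp(-sum((X - t1).^2, 2));   % Gaussian response to centre t1
z2 = exp(-sum((X - t2).^2, 2));   % Gaussian response to centre t2
[z1 z2]
% (0,1) and (1,0) both map to (0.37, 0.37), while (0,0) and (1,1) map to
% (0.14, 1.00) and (1.00, 0.14): a straight line now separates the classes.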
Training RBF nets
Typically, the weights of the two layers are determined separately,
i.e. find RBF weights, and then find output layer weights

Hidden layer
– estimate parameters for each hidden unit k (whose
output depends on distance between input and a stored
prototype)
e.g. for a Gaussian activation function, estimate the parameters $\mu_k$, $\sigma_k^2$
– This stage involves an Unsupervised training process (no
targets available)
Output layer
– set the weights (including bias weights)
– the same as training a single layer perceptron: each unit’s
output depends on weighted sum of inputs,
– using for example, the gradient descent rule
– This stage involves a Supervised training process (a sketch of the full two-stage procedure follows)
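A minimal two-stage training sketch on made-up data; kmeans, pdist and pdist2 are from the Statistics and Machine Learning Toolbox, and the shared-width heuristic is one common choice rather than the slides' prescription:
% Stage 1 (unsupervised): place the centres with k-means
X = rand(100, 2);                 % made-up inputs, one row per sample
T = double(sum(X, 2) > 1);        % made-up binary targets
k = 10;
[~, C] = kmeans(X, k);            % C holds the k centres
sigma = mean(pdist(C));           % one shared width (a common heuristic)
% Hidden-layer activations, one row per sample, one column per centre
Phi = exp(-pdist2(X, C).^2 / (2*sigma^2));
% Stage 2 (supervised): linear output weights via the pseudo-inverse
W = pinv(Phi) * T;                % least squares output weights
y = Phi * W;                      % network outputs on the training set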
Clustering
K-Means Approach

1. Select k multidimensional points to be the


“seeds” or initial centroids for the k clusters to
be formed. Seeds usually selected at random
2. Assign each observation to the cluster with the
nearest seed.
3. Update cluster centroids once all observations
have been assigned.
4. Repeat steps 2 and 3 until the changes in the cluster centroids are small.
5. Repeat steps 1-4 with new starting seeds. Do this 3 to 5 times and keep the best clustering (see the sketch below).
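A compact sketch of this procedure on made-up data; the 'Replicates' option of kmeans performs the step-5 restarts and keeps the run with the smallest within-cluster scatter:
X = [randn(50,2); randn(50,2) + 4];              % two loose blobs
[idx, C, sumd] = kmeans(X, 2, 'Replicates', 5);  % 5 random restarts
% idx: cluster assignment per point; C: final centroids;
% sumd: within-cluster sums of point-to-centroid distances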
K-Means Illustration – two dimensions
Fine Tuning
Computing the Output Weights

We want $W$ (a weight matrix) such that
Target $T = WX$, thus $W = TX^{-1}$
If an inverse exists, then the error can be minimized
If no inverse exists, then use the pseudo-inverse to get the minimum error
('Minimum-norm solution to a linear system')
The pseudo-inverse is defined as
$W = TX^{+}$, where $X^{+} = (X^T X)^{-1} X^T$
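In MATLAB this is a one-liner; pinv computes the Moore-Penrose pseudo-inverse directly (here X would hold the hidden activations and T the targets):
W = T * pinv(X);   % minimum-error output weights, inverse or not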
XOR Problem
The relationship between the input and the output of the network can be given by the interpolation conditions
$\sum_{i} w_i \, \varphi(\|x_j - t_i\|) = d_j$
where $x_j$ is an input vector and $d_j$ is the associated value of the desired output.
Classifier Evaluation Metrics
Test data:
Sl. No.   x1     x2     Actual (t)   Predicted (y)
1         0.7    0.7    -            +
2         0.8    0.9    +            +
3         0.8    0.25   -            -
4         1.2    0.8    +            -
5         0.6    0.4    +            +
6         1.3    0.5    +            -

True Positives (TP): number of actual positive examples predicted as positive: 2
False Positives (FP): number of actual negative examples predicted as positive: 1 (false alarms)
True Negatives (TN): number of actual negative examples predicted as negative: 1
False Negatives (FN): number of actual positive examples predicted as negative: 2

Actual class \ Predicted class   Positive                  Negative
Positive                         True Positives (TP) 2     False Negatives (FN) 2
Negative                         False Positives (FP) 1    True Negatives (TN) 1
Be careful of “Accuracy”
The simplest measure of performance would be the fraction of items that are correctly classified, or the "accuracy":
$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$
But this measure is dominated by the larger set (of positives or negatives) and favors trivial classifiers.
e.g. if 5% of instances are actually positive, then a classifier that always says "negative" is 95% accurate.
Confusion Matrix:
Actual class \ Predicted class   Positive                  Negative
Positive                         True Positives (TP) 2     False Negatives (FN) 2
Negative                         False Positives (FP) 1    True Negatives (TN) 1

Precision: of all instances predicted as a given class X, how many actually belong to X?
Recall: of all instances whose actual class is X, how many were predicted as X?
Recall is also called hit rate, sensitivity, or true positive rate.
False positive rate = FP/(FP + TN)
Precision measures what fraction of our detections are actually positive: $P = TP/(TP + FP)$
Recall measures what fraction of the positives are detected: $R = TP/(TP + FN)$
For the test data above, $P = 2/3$ and $R = 2/4$ (computed in the sketch below).
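A small sketch computing these metrics from the confusion counts above:
TP = 2; FP = 1; TN = 1; FN = 2;
accuracy  = (TP + TN) / (TP + TN + FP + FN)       % 0.5000
precision = TP / (TP + FP)                        % 0.6667
recall    = TP / (TP + FN)                        % 0.5000
F1 = 2*precision*recall / (precision + recall)    % 0.5714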
For multi-class classification:
Actual class \ Predicted class   A                B                C
A                                True A (30)      A False B (50)   A False C (20)
B                                B False A (20)   True B           B False C
C                                C False A (10)   C False B        True C

$R_A = 30/(30 + 50 + 20) = 30/100$ (row sum), $P_A = 30/(30 + 20 + 10) = 30/60$ (column sum)
F measure (F1 or F-score): harmonic mean of precision and recall,
$F_1 = \frac{2 P R}{P + R}$
$TPR = TP/N^{+}$ (sensitivity, recall)    $FPR = FP/N^{-}$ (false alarm rate, type I error rate)
$FNR = FN/N^{+}$ (miss rate, type II error rate)    $TNR = TN/N^{-}$ (specificity)
where $N^{+}$ and $N^{-}$ are the numbers of actual positives and negatives.

Sensitivity: Probability of predicting disease given true state is disease


Specificity: Probability of predicting non-disease given true state is non-disease

ROC (Receiver Operating Characteristic) curves: for visual comparison of classification models
Originated from signal detection theory
Shows the trade-off between the true positive rate and the false positive rate
The area under the ROC curve (AUC) is a measure of the accuracy of the model (sketched below)
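A sketch of building an ROC curve in MATLAB; perfcurve is in the Statistics and Machine Learning Toolbox, and the labels and scores here are made up:
labels = [1 1 1 1 0 0];                  % actual classes
scores = [0.9 0.8 0.3 0.4 0.7 0.2];      % classifier scores for class 1
[fpr, tpr, ~, auc] = perfcurve(labels, scores, 1);
plot(fpr, tpr), xlabel('False Positive Rate'), ylabel('True Positive Rate')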
Specific Example

[Figure: two overlapping distributions of test results, one for people without the disease and one for people with the disease, with a decision threshold on the test-result axis. Patients below the threshold are called "negative"; patients above it are called "positive".]
Some definitions:
[Figure: the same plot; the area of the with-disease distribution above the threshold gives the True Positives.]
Moving the Threshold: left
[Figure: the same plot with the threshold moved toward the "-" side. Which line has the higher recall of "-"? Which line has the higher precision of "-"?]
[Figure: the same plot; the area of the without-disease distribution above the threshold gives the False Positives.]
ROC curve
[Figure: ROC curve plotting the True Positive Rate (recall), 0-100%, against the False Positive Rate (1 - specificity), 0-100%.]
Area under ROC curve (AUC)
[Figure: four ROC plots of True Positive Rate vs. False Positive Rate, with AUC = 100% (a perfect classifier), AUC = 50% (the chance diagonal), AUC = 90%, and AUC = 65%.]
Performance Metrics for Regression

Root Mean Squared Error (RMSE)


Mean Absolute Error (MAE)
Mean Squared Error (MSE)
R-squared (Coefficient of Determination)
Mean Absolute Percentage Error (MAPE) (all five are computed in the sketch below)
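A compact sketch computing all five metrics on made-up vectors (y: actual, yhat: predicted):
y    = [3.0 2.5 4.0 5.5];                       % actual values
yhat = [2.8 2.7 4.4 5.0];                       % predicted values
e    = y - yhat;
MSE  = mean(e.^2)
RMSE = sqrt(MSE)
MAE  = mean(abs(e))
R2   = 1 - sum(e.^2) / sum((y - mean(y)).^2)    % coefficient of determination
MAPE = mean(abs(e ./ y)) * 100                  % in percent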
