Statistical Modelling and Evaluation

Uploaded by Sehjad Padaniya
Statistical modelling and evaluation

Model and Model selection


Error Term

Target Function

Cost Function: measures the extent to which the model's predictions are wrong
Model building

[Figure: model building workflow. Pre-processed data is split into training data (2/3),
used for model development, and test data (1/3), used for model assessment (scoring)
to obtain the prediction accuracy.]
► Simple split:
The simple split (also called holdout or test sample estimation) divides the dataset into
two mutually exclusive sets: a training set and a test set. Typically, about 70% of the
dataset is used for training and the remaining 30% for testing.

For an artificial neural network, the dataset is divided into three mutually exclusive
sets: training (60%), testing (20%) and validation (20%).
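
The two splits described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `simple_split` and `three_way_split`, the fixed random seed and the toy dataset of 100 integers are hypothetical and do not come from the slides.

```python
import random


def simple_split(records, train_fraction=0.7, seed=0):
    """Shuffle a copy of the records and cut it into two disjoint parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


def three_way_split(records, fractions=(0.6, 0.2, 0.2), seed=0):
    """Split into training, testing and validation parts (neural-net case)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    first = int(round(n * fractions[0]))
    second = first + int(round(n * fractions[1]))
    # Whatever is left after the first two cuts becomes the third part.
    return shuffled[:first], shuffled[first:second], shuffled[second:]


# Hypothetical toy dataset: 100 records identified by the integers 0..99.
data = list(range(100))
train, test = simple_split(data)
train_nn, test_nn, valid_nn = three_way_split(data)
```

Shuffling before cutting keeps the two sets mutually exclusive while avoiding any ordering effect present in the original data.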
Resampling methods

► Cross-validation methods
    ► K-fold cross-validation
    ► Leave-one-out
► Bootstrapping
► Jackknifing
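
As a rough sketch of how some of these resampling methods choose their samples, the Python below generates index sets for k-fold cross-validation, leave-one-out (treated here as k-fold with k equal to the number of samples) and a single bootstrap resample. The function names are hypothetical, and the data is assumed to be shuffled already.

```python
import random


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def leave_one_out_indices(n_samples):
    """Leave-one-out is k-fold cross-validation with k = n_samples."""
    return k_fold_indices(n_samples, n_samples)


def bootstrap_indices(n_samples, seed=0):
    """Draw n_samples indices with replacement; unused ones are out-of-bag."""
    rng = random.Random(seed)
    sampled = [rng.randrange(n_samples) for _ in range(n_samples)]
    out_of_bag = sorted(set(range(n_samples)) - set(sampled))
    return sampled, out_of_bag


folds = list(k_fold_indices(10, 5))
loo = list(leave_one_out_indices(4))
sampled, out_of_bag = bootstrap_indices(10)
```

In each case the model would be trained on the training indices and scored on the held-out ones, and the scores averaged over all rounds.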
Bias-variance trade-off
Measuring accuracy

                     True class
Predicted class      Positive               Negative
Positive             True positive (TP)     False positive (FP)
Negative             False negative (FN)    True negative (TN)

Confusion matrix of two-class classification
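
A minimal sketch of how the four cells of the confusion matrix could be counted from true and predicted labels is given below. The function name `confusion_counts`, the convention that the label 1 marks the positive class and the example labels are all assumptions made for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, FN and TN for a two-class problem."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1  # predicted positive, truly positive
        elif predicted == positive:
            fp += 1  # predicted positive, truly negative
        elif actual == positive:
            fn += 1  # predicted negative, truly positive
        else:
            tn += 1  # predicted negative, truly negative
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


# Hypothetical labels for eight cases.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)
```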
Metrics for classification models


► The F-measure, also known as the F-score, is a measure of a classifier's accuracy on a
test. It takes both precision and recall into account: the F-score is the harmonic mean
of precision and recall. Its best value is 1 and its worst value is 0. The F-score can be
calculated using
F = 2 × (precision × recall) / (precision + recall)
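
As a minimal sketch, precision, recall and their harmonic mean can be computed from confusion-matrix counts as follows. The function name `precision_recall_f1` is hypothetical, and returning 0.0 when a ratio is undefined (zero denominator) is an assumption of this sketch, not something stated in the slides.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F-score) from confusion-matrix counts."""
    # Assumption: an undefined ratio (zero denominator) is reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Harmonic mean of precision and recall.
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


precision, recall, f_score = precision_recall_f1(tp=3, fp=1, fn=1)
```

Because the harmonic mean is pulled towards the smaller of the two values, a classifier cannot reach a high F-score by maximising precision alone or recall alone.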
Receiver Operating Characteristic (ROC)

► ROC is an important tool for checking the accuracy of a classifier. It was originally
used in signal detection theory to depict the trade-off between hit rates and false alarm
rates over a noisy channel. It is now widely used in machine learning as a technique for
visualizing classifier performance. The ROC curve is a plot of the true positive rate
(TPR, y-axis) against the false positive rate (FPR, x-axis) as the decision threshold varies.
ROC curve
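
The points of such a curve can be sketched by sweeping a decision threshold over the classifier's scores and recording the (FPR, TPR) pair at each step. The function name `roc_points`, the rule that a score at or above the threshold counts as a positive prediction and the example scores are assumptions made for illustration.

```python
def roc_points(y_true, scores, positive=1):
    """Return (FPR, TPR) points, starting at (0, 0) and ending at (1, 1)."""
    n_pos = sum(1 for y in y_true if y == positive)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    # Lower the threshold step by step; each distinct score is one threshold.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores)
                 if s >= threshold and y == positive)
        fp = sum(1 for y, s in zip(y_true, scores)
                 if s >= threshold and y != positive)
        points.append((fp / n_neg, tp / n_pos))
    return points


# Hypothetical true labels and classifier scores for six cases.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(y_true, scores)
```

A curve that rises quickly towards the top-left corner indicates a classifier that achieves a high hit rate at a low false alarm rate.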
Regression


Clustering

-1 to + 1
