
Cross-Validation in Machine Learning

Cross-validation is a technique for validating a model's effectiveness by training it on a subset of the input data and testing it on a previously unseen subset of that data. In other words, it is a technique for checking how well a statistical model generalizes to an independent dataset.

In machine learning, we always need to test the stability of a model, and we cannot judge that from the training dataset alone. For this purpose, we reserve a particular sample of the dataset that was not part of the training data, and we test the model on that sample before deployment. This complete process comes under cross-validation, and it goes beyond a single, general train-test split.

Hence, the basic steps of cross-validation are:

o Reserve a subset of the dataset as a validation set.

o Train the model using the training dataset.

o Evaluate model performance using the validation set. If the model performs
well on the validation set, proceed to the next step; otherwise, investigate the issues (see the sketch after this list).
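
Below is a minimal Python sketch of these hold-out steps using scikit-learn; the dataset, model, and the 80:20 split ratio are illustrative assumptions rather than part of the original text.

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Step 1: reserve a subset of the dataset as a validation set.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Step 2: train the model using only the training subset.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Step 3: evaluate performance on the reserved validation set.
val_accuracy = accuracy_score(y_val, model.predict(X_val))
print(f"Validation accuracy: {val_accuracy:.3f}")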

Methods used for Cross-Validation

1. Leave One Out Cross-Validation

In this method, instead of dividing the data into two fixed subsets, we select a single observation as the test data, label everything else as training data, and train the model. The second observation is then selected as the test data and the model is retrained on the remaining data.

This process continues n times, and the average over all these iterations is taken as the estimate of the test set error.
When it comes to test-error estimates, LOOCV gives unbiased estimates (low bias). But
bias is not the only matter of concern in estimation problems. We should also consider
variance.

However, LOOCV has extremely high variance, because we are averaging the outputs of n models fitted on almost identical sets of observations, so their outputs are highly positively correlated with each other.
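
As a rough illustration, the following sketch uses scikit-learn's LeaveOneOut splitter; the regression dataset and the plain linear model are assumptions chosen only to keep the example small.

import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = load_diabetes(return_X_y=True)

# One observation is held out per iteration, so the model is fitted n times.
loo = LeaveOneOut()
scores = cross_val_score(LinearRegression(), X, y,
                         cv=loo, scoring="neg_mean_squared_error")

# Averaging over all n iterations gives the LOOCV estimate of the test error.
print("LOOCV MSE estimate:", -np.mean(scores))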

2. K-Fold Cross-Validation

In this resampling technique, the whole dataset is divided into k sets of almost equal size. The first set is selected as the test set and the model is trained on the remaining k-1 sets; the test error rate is then calculated by evaluating the fitted model on that test set.

In the second iteration, the second set is selected as the test set, the remaining k-1 sets are used to train the model, and the error is calculated again. This process continues for all k sets.

The mean of errors from all the iterations is calculated as the CV test error estimate.

In K-Fold CV, the number of folds k is less than the number of observations in the data (k < n), and we average the outputs of k fitted models that are less correlated with each other, since the overlap between the training sets used for each model is smaller. This leads to lower variance than LOOCV.
The best part about this method is that each data point appears in the test set exactly once and is part of the training set k-1 times. This method gives intermediate bias: each training set contains fewer observations, (k-1)n/k, than in the Leave One Out method but more than in the Hold Out method, and this bias shrinks as the number of folds k increases.

Typically, K-fold Cross Validation is performed using k=5 or k=10 as these values have been
empirically shown to yield test error estimates that neither have high bias nor high
variance.

The major disadvantage of this method is that the model has to be trained from scratch k times, which is more computationally expensive than the Hold Out method, though still cheaper than the Leave One Out method.
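
A minimal 5-fold sketch in the same style (again, the dataset and model are illustrative assumptions):

import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_diabetes(return_X_y=True)

# k = 5 folds of roughly equal size; each fold serves as the test set once.
kfold = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(LinearRegression(), X, y,
                         cv=kfold, scoring="neg_mean_squared_error")

# The CV test error estimate is the mean of the k per-fold errors.
print("Per-fold MSE:", -scores)
print("5-fold CV MSE estimate:", -np.mean(scores))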

3. Stratified K-Fold Cross-Validation

This is a slight variation of K-Fold Cross-Validation that uses 'stratified sampling' instead of 'random sampling.'

Let's quickly understand what stratified sampling is and how it differs from random sampling.

Suppose your data contains reviews for a cosmetic product used by both male and female customers. When we use random sampling to split the data into train and test sets, there is a possibility that most of the data representing males does not appear in the training data and instead ends up in the test data. If we train the model on a training sample that does not correctly represent the actual population, the model will not predict the test data with good accuracy.

This is where Stratified Sampling comes to the rescue. Here the data is split in such a way
that it represents all the classes from the population.

Let's consider the above example, with cosmetic product reviews from 1000 customers, of which 60% are female and 40% are male. Say we want to split the data into train and test sets in an 80:20 proportion. The 800 training customers (80% of 1000) are chosen so that 480 reviews come from the female population and 320 from the male population. In the same fashion, the remaining 200 customers (20% of 1000) form the test data, with the same female-to-male representation.
This is exactly what stratified K-Fold CV does: it creates the k folds while preserving the percentage of samples for each class, as the sketch below illustrates. This solves the random-sampling problem associated with the Hold Out and K-Fold methods.
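
The sketch below mimics the review example with a synthetic 60:40 label split and scikit-learn's StratifiedKFold; the data itself is made up purely for illustration.

import numpy as np
from sklearn.model_selection import StratifiedKFold

# 1000 synthetic "customers": 600 female (label 0) and 400 male (label 1).
y = np.array([0] * 600 + [1] * 400)
X = np.random.rand(1000, 3)  # placeholder features

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, test_idx) in enumerate(skf.split(X, y), start=1):
    # Each fold preserves the 60:40 class proportion in both splits.
    train_ratio = np.bincount(y[train_idx]) / len(train_idx)
    test_ratio = np.bincount(y[test_idx]) / len(test_idx)
    print(f"Fold {fold}: train {train_ratio}, test {test_ratio}")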

4. Leave-P-Out Cross-Validation

In this approach, p data points are left out of the training data. This means that if there are n data points in total in the original input dataset, then n-p data points are used as the training set and the remaining p data points as the validation set. This process is repeated for every possible combination of p points, and the average error is calculated to judge the effectiveness of the model.

The disadvantage of this technique is that it can become computationally infeasible for large p, since the number of possible combinations grows very quickly.
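
To make that combinatorial growth concrete, here is a tiny sketch with scikit-learn's LeavePOut and p=2 on just five made-up points (any realistic dataset would produce far too many splits to print):

import numpy as np
from sklearn.model_selection import LeavePOut

X = np.arange(10).reshape(5, 2)  # only n = 5 points, so C(5, 2) = 10 splits
y = np.array([0, 1, 0, 1, 0])

lpo = LeavePOut(p=2)
print("Number of splits:", lpo.get_n_splits(X))  # 10
for train_idx, test_idx in lpo.split(X):
    # In every split, n - p = 3 points train the model and p = 2 validate it.
    print("train:", train_idx, "test:", test_idx)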

Limitations of Cross-Validation

There are some limitations of the cross-validation technique, which are given below:

o Under ideal conditions, it provides an optimal estimate. But with inconsistent
data it can produce drastically misleading results, which is one of the big disadvantages of cross-validation, as there is no certainty about the kind of data a machine learning model will face.

o In predictive modeling, the data evolves over time, which can create
differences between the training and validation sets. For example, if we create a model to predict stock market values and train it on the previous 5 years of stock prices, the actual values over the next 5 years may be drastically different, so it is difficult to expect correct output in such situations.

Applications of Cross-Validation

o This technique can be used to compare the performance of different predictive modeling methods.

o It has great scope in the medical research field.

o It can also be used for meta-analysis, and it is already being used by data
scientists in the field of medical statistics.
