
Intro To Aids Proficency Sunil


MADHAV INSTITUTE OF TECHNOLOGY AND SCIENCE
GWALIOR (M.P.)

INTRODUCTION TO ARTIFICIAL INTELLIGENCE AND DATA SCIENCE


(3270121)

Submitted To: Dr. Bhagat S. Raghuwanshi
Submitted By: Sunil Dhakad (0901AD231065)
Centre for AI
Topic
Training, Testing and Validation in Machine Learning Modeling
Types of datasets used in a machine learning model
 A dataset (or data set) is a collection of data.
 A dataset is defined as a collection of data that is treated as a single unit by a computer.
It contains many separate pieces of data, yet it can be used as a whole to train an
algorithm with the goal of finding predictable patterns in it.
 In particular, three data sets are commonly used in different stages of the creation of
the model:
 Training Dataset
 Validation Dataset
 Testing Dataset
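The three-way split listed above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function name split_dataset, the 70/15/15 proportions, and the fixed seed are all assumptions chosen for the example.

```python
import random


def split_dataset(samples, val_fraction=0.15, test_fraction=0.15, seed=0):
    """Shuffle samples and split them into training, validation and test lists.

    The fractions and the seed are illustrative defaults, not recommendations.
    """
    if val_fraction < 0 or test_fraction < 0 or val_fraction + test_fraction >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")

    shuffled = list(samples)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)

    n_total = len(shuffled)
    n_val = round(n_total * val_fraction)
    n_test = round(n_total * test_fraction)
    n_train = n_total - n_val - n_test

    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


train_set, val_set, test_set = split_dataset(range(100))
print(len(train_set), len(val_set), len(test_set))
```

Shuffling before slicing means each subset is drawn from the same distribution, and slicing a single shuffled list guarantees that no sample lands in more than one subset.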
Training dataset
 Training data (or a training dataset) is the initial data used to train a machine learning
model.
 Training datasets are fed to machine learning algorithms to teach them how to
make predictions or perform a task.
 It is the set of data that makes the model learn the hidden features/patterns in the
dataset.
 The model evaluates the data repeatedly to learn more about the data’s behavior
and then adjusts itself to serve its intended purpose.
 Training runs over several epochs: in each epoch the same training data is fed to the
neural network again, and the model continues to learn the features of the data.
 The training set should contain a diverse set of inputs so that the model is trained on
a wide range of scenarios and can make predictions for unseen data samples that may
appear in the future.
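To make the idea of epochs concrete, here is a small sketch in which a one-variable linear model stands in for the neural network. The function train_linear_model, the toy data, the learning rate, and the epoch count are hypothetical choices for illustration only.

```python
def train_linear_model(training_data, epochs=200, learning_rate=0.05):
    """Fit y = w * x + b by per-sample gradient descent on squared error.

    Every epoch walks over the same training data once and nudges the
    parameters after each sample.
    """
    w, b = 0.0, 0.0
    loss_per_epoch = []
    for _ in range(epochs):
        total_squared_error = 0.0
        for x, y in training_data:
            error = (w * x + b) - y
            # Gradient step for the loss 0.5 * error ** 2.
            w -= learning_rate * error * x
            b -= learning_rate * error
            total_squared_error += error ** 2
        loss_per_epoch.append(total_squared_error / len(training_data))
    return w, b, loss_per_epoch


# Hypothetical noiseless toy data generated from y = 2x + 1.
toy_training_data = [(x, 2.0 * x + 1.0) for x in (0.0, 1.0, 2.0, 3.0)]
w, b, losses = train_linear_model(toy_training_data)
print(w, b, losses[0], losses[-1])
```

The recorded per-epoch loss falls as the same four samples are revisited, which is the sense in which the model "continues to learn" from repeated passes.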
Validation dataset
 A validation dataset is a set of data, separate from the training set, that is used to
validate the model's performance during training.
 Validation data provides the first test against unseen data, allowing data scientists to
evaluate how well the model makes predictions based on the new data.
 This validation process gives information that helps us tune the model’s
hyperparameters and configurations accordingly. It is like a critic telling us whether
the training is moving in the right direction or not.
 The main reason for holding out a validation set is to detect and prevent
overfitting, i.e., the model becomes really good at classifying the samples in the
training set but cannot generalize and make accurate classifications on data it
has not seen before.
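As a rough sketch of how a validation set guides hyperparameter tuning, the example below picks the number of neighbours k for a tiny nearest-neighbour classifier. The helper names, the candidate values of k, and the toy data (including its two deliberately mislabeled training points) are all assumptions made for this illustration.

```python
def knn_predict(train, x, k):
    """Predict the majority label (0 or 1) among the k training points nearest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > len(nearest) else 0


def accuracy(train, data, k):
    """Fraction of (x, label) pairs in data that the k-NN model classifies correctly."""
    correct = sum(knn_predict(train, x, k) == label for x, label in data)
    return correct / len(data)


def choose_k(train, val, candidate_ks):
    """Score every candidate k on the validation set and return the best one."""
    scores = {k: accuracy(train, val, k) for k in candidate_ks}
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Hypothetical data: the true rule is label = 1 when x >= 5.
# The training points at x = 3 and x = 7 are mislabeled on purpose.
train = [(0, 0), (1, 0), (2, 0), (3, 1), (4, 0),
         (5, 1), (6, 1), (7, 0), (8, 1), (9, 1)]
val = [(2.9, 0), (3.2, 0), (6.8, 1), (7.1, 1), (0.5, 0), (9.5, 1)]

best_k, val_scores = choose_k(train, val, candidate_ks=(1, 3, 5))
print("training accuracy with k=1:", accuracy(train, train, 1))
print("validation accuracy per k:", val_scores)
print("chosen k:", best_k)
```

With k = 1 the model reproduces every training label, noise included, so its training accuracy looks perfect while its validation accuracy is poor. That gap is the overfitting signal the validation set exists to reveal.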
Testing dataset
 A test dataset is a dataset that is independent of the training dataset but
follows the same probability distribution as the training dataset.
 The test data set is used to test the model after completing the training.
 After the model is built, testing data once again validates that it can make
accurate predictions.
 It provides an unbiased estimate of the final model's performance in terms of metrics
such as accuracy and precision.
 Test data provides a final, real-world check on an unseen dataset to confirm that
the ML algorithm was trained effectively.
 Training and validation data include labels so that the model's performance metrics can
be monitored. For testing, the labels are withheld from the model while it predicts and
are used only afterwards to score those predictions.
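A final evaluation on the test set might look like the sketch below. It assumes the test labels are kept aside and used only for scoring. The threshold_model function is a hypothetical stand-in for an already trained model, and the test samples are invented for the example.

```python
def accuracy_and_precision(true_labels, predicted_labels, positive=1):
    """Return (accuracy, precision) for two equal-length label sequences."""
    if len(true_labels) != len(predicted_labels) or not true_labels:
        raise ValueError("label sequences must be non-empty and of equal length")

    pairs = list(zip(true_labels, predicted_labels))
    correct = sum(t == p for t, p in pairs)
    true_positives = sum(t == positive and p == positive for t, p in pairs)
    predicted_positives = sum(p == positive for _, p in pairs)

    accuracy = correct / len(pairs)
    # Precision is reported as 0.0 when the model predicts no positives at all.
    precision = true_positives / predicted_positives if predicted_positives else 0.0
    return accuracy, precision


def threshold_model(x):
    """Hypothetical stand-in for a trained model: predicts 1 when x >= 5."""
    return 1 if x >= 5 else 0


# Hypothetical held-out test samples as (feature, label) pairs.
test_set = [(1, 0), (4, 0), (6, 1), (8, 1), (5, 0), (3, 1)]

# The model sees only the features; the labels are used afterwards for scoring.
predictions = [threshold_model(x) for x, _ in test_set]
true_labels = [label for _, label in test_set]

test_accuracy, test_precision = accuracy_and_precision(true_labels, predictions)
print(test_accuracy, test_precision)
```

Because the test set played no part in training or tuning, these numbers are the closest available estimate of how the model would behave on new data.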
Thank you
