MADHAV INSTITUTE OF TECHNOLOGY AND SCIENCE, GWALIOR (M.P.)
INTRODUCTION TO ARTIFICIAL INTELLIGENCE AND DATA SCIENCE
(3270121)
Submitted To: Dr. Bhagat S. Raghuwanshi
Submitted By: Sunil Dhakad (0901AD231065)
Centre for AI

Topic: Training, Testing and Validation in Machine Learning Modeling

Types of datasets used in a machine learning model

A data set (or dataset) is a collection of data that a computer treats as a single unit. A dataset contains many separate pieces of data, but it can be used as a whole to train an algorithm with the goal of finding predictable patterns in it.

Three datasets are commonly used at different stages of building a model:
Training dataset
Validation dataset
Testing dataset

Training dataset

Training data (or a training dataset) is the initial data used to train a machine learning model. It is fed to a machine learning algorithm to teach it how to make predictions or perform a task, and it is the data from which the model learns the hidden features and patterns. The model processes the data repeatedly and adjusts its parameters to serve its intended purpose. In each epoch, the same training data is fed to the neural network again, and the model continues to learn the features of the data.

The training set should contain a diverse set of inputs so that the model is trained on as many scenarios as possible and can make predictions for unseen samples that may appear in the future.

Validation dataset

The validation dataset is a set of data, separate from the training set, that is used to check the model's performance during training. It provides the first test against unseen data, allowing data scientists to evaluate how well the model makes predictions on new data. This information helps us tune the model's hyperparameters and configuration. It is like a critic telling us whether training is moving in the right direction.
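The split into training, validation, and testing datasets described above can be sketched in a few lines of Python. This is only a minimal illustration, not a method from the slides: the function name split_dataset, the 70/15/15 proportions, the seed, and the toy data are all assumptions chosen for the example.

```python
import random


def split_dataset(samples, val_fraction=0.15, test_fraction=0.15, seed=0):
    """Shuffle samples and split them into training, validation, and test lists.

    The fractions and the seed are illustrative defaults, not recommended values.
    """
    if val_fraction < 0 or test_fraction < 0 or val_fraction + test_fraction >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")

    shuffled = list(samples)               # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle gives a reproducible split

    n_val = round(len(shuffled) * val_fraction)
    n_test = round(len(shuffled) * test_fraction)
    n_train = len(shuffled) - n_val - n_test

    train = shuffled[:n_train]                # used to fit the model
    val = shuffled[n_train:n_train + n_val]   # used to tune it during training
    test = shuffled[n_train + n_val:]         # held back for the final check
    return train, val, test


# Hypothetical toy data: 100 (feature, label) pairs.
data = [(x, x % 2) for x in range(100)]
train, val, test = split_dataset(data)
print(len(train), len(val), len(test))  # prints: 70 15 15
```

Shuffling before slicing helps each subset follow roughly the same distribution as the full dataset, and the three slices never overlap, so no sample used for training is reused for validation or testing.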
The main reason for holding out a validation set is to detect and limit overfitting, i.e., the situation where the model becomes very good at classifying the samples in the training set but cannot generalize and make accurate classifications on data it has not seen before.

Testing dataset

A test dataset is independent of the training dataset but follows the same probability distribution. It is used to test the model after training is complete. After the model is built, the testing data once again validates that it can make accurate predictions. It provides an unbiased final estimate of model performance in terms of metrics such as accuracy and precision. Test data provides a final, real-world check on unseen data to confirm that the ML algorithm was trained effectively. In supervised learning the test data still carries labels, but they are withheld from the model during prediction and used only afterwards to compute the performance metrics.

Thank you