Hyperparameter Tuning Mits

The document discusses machine learning models, focusing on the importance of model parameters and hyperparameters in training algorithms. It explains various techniques for hyperparameter optimization, including Grid Search, Random Search, and Bayesian Optimization, highlighting their benefits and drawbacks. Additionally, it provides an example of using GridSearchCV from sklearn to automate the hyperparameter tuning process.

Kapil Kumar Nagwanshi

PhD (CSE), Sr. Member IEEE, LMCSI, MIAENG

Associate Professor (CSE), ASET


Amity University Rajasthan, Jaipur
▪ Machine learning models are basically mathematical functions that represent the relationship between different
aspects of data. For instance, a linear regression model uses a line to represent the relationship between “features”
and “target.” The formula looks like this:
y = wᵀx
▪ where x is a vector that represents features of the data and y is a scalar variable that represents the target (some
numeric quantity that we wish to learn to predict).
▪ This model assumes that the relationship between x and y is linear.
▪ The variable w is a weight vector that represents the normal vector for the line; it specifies the slope of the line.
▪ This is what’s known as a model parameter, which is learned during the training phase.

▪ “Training a model” involves using an optimization procedure to determine the best model parameter that “fits” the data:

y = ∑ᵢ₌₀ⁿ wᵢxᵢ
▪ So a model parameter is a configuration variable that is internal to the model and whose value can be estimated
from the given data.
▪ They are required by the model when making predictions.
▪ Their values define the skill of the model on your problem.
▪ They are estimated or learned from data.
▪ They are often not set manually by the practitioner.
▪ They are often saved as part of the learned model.
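▪ As a minimal sketch (not from the slides; the synthetic data below is an illustrative assumption), the snippet shows that the weight vector w is learned by the training procedure and stored as part of the fitted model:

# The weight vector w is a model parameter: it is estimated from the data by fit().
# The synthetic dataset is an illustrative assumption.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                      # feature vectors x
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=100)   # target y ≈ wᵀx plus noise

model = LinearRegression()
model.fit(X, y)                                    # "training": optimization estimates w

print(model.coef_)       # learned weight vector w (a model parameter)
print(model.intercept_)  # learned bias term, also a model parameter, saved with the model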
▪ In statistics, a hyperparameter is a parameter of a prior distribution; it captures the prior belief before data is observed.
▪ In any machine learning algorithm, these parameters need to be initialized before training a model.
▪ These are values that must be specified outside of the training procedure.
▪ Vanilla linear regression doesn’t have any hyperparameters.
▪ But variants of linear regression do.
▪ Ridge regression and lasso both add a regularization term to linear regression; the weight for the
regularization term is called the regularization parameter.
▪ Decision trees have hyperparameters such as the desired depth and number of leaves in the tree.
▪ Support vector machines (SVMs) require setting a misclassification penalty term.
▪ Kernelized SVMs require setting kernel parameters like the width for radial basis function (RBF) kernels (see the sketch after this list).
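▪ The sketch below (not from the slides; the specific values are illustrative assumptions) shows how these hyperparameters are specified outside the training procedure, when the estimators are constructed:

# Hyperparameters are fixed at construction time, before training begins.
# The specific values below are illustrative assumptions.
from sklearn.linear_model import Ridge, Lasso
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

ridge = Ridge(alpha=1.0)     # alpha: regularization parameter for ridge regression
lasso = Lasso(alpha=0.1)     # alpha: regularization parameter for lasso
tree = DecisionTreeClassifier(max_depth=5,        # desired depth of the tree
                              max_leaf_nodes=20)  # number of leaves in the tree
svm = SVC(C=10.0,            # misclassification penalty term
          kernel='rbf',
          gamma=0.5)         # controls the width of the RBF kernel

# In contrast, the weights / support vectors found by .fit(X, y) are model parameters.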
▪ Model hyperparameters are the properties that govern the entire training process. Below are the variables usually configured before training a model (see the sketch after this list).
• Learning Rate
• Number of Epochs
• Hidden Layers
• Hidden Units
• Activation Functions
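▪ As a hedged sketch (not from the slides), sklearn's MLPClassifier exposes all five of these as constructor arguments; the values used here are illustrative assumptions:

# The five hyperparameters above, configured before training a small neural network.
# sklearn's MLPClassifier is used for illustration; the values are assumptions.
from sklearn.neural_network import MLPClassifier

mlp = MLPClassifier(
    learning_rate_init=0.001,     # learning rate
    max_iter=200,                 # number of epochs (passes over the training data)
    hidden_layer_sizes=(64, 32),  # two hidden layers with 64 and 32 hidden units
    activation='relu'             # activation function
)
# mlp.fit(X_train, y_train) would then learn the weights, i.e. the model parameters.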
▪ Hyperparameters are important because they directly control the behavior of the training algorithm and have a significant impact on the performance of the model being trained.
▪ “A good choice of hyperparameters can really make an algorithm shine.”
▪ Choosing appropriate hyperparameters plays a crucial role in the success of our neural network architecture, since it makes a huge impact on the learned model.
▪ For example,
▪ if the learning rate is too low, the model will miss the important patterns in the data.
▪ If it is too high, the model may overshoot and fail to converge.

▪ Choosing good hyperparameters, and automating the search for them, gives two benefits:
▪ Efficiently search the space of possible hyperparameters.
▪ Easily manage a large set of experiments for hyperparameter tuning.
▪ The process of finding the optimal hyperparameters in machine learning is called hyperparameter optimisation. Common algorithms include:
• Grid Search
• Random Search
• Bayesian Optimisation
▪ Grid search is a very traditional technique for tuning hyperparameters. It brute-forces all combinations. Grid search requires creating a set of candidate values for each hyperparameter, for example:
▪ Learning Rate
▪ Number of Layers

▪ Grid search trains the algorithm on every combination of the two sets of hyperparameters (learning rate and number of layers) and measures the performance using the “Cross Validation” technique.
▪ This validation technique gives assurance that our trained model captured most of the patterns in the dataset.
▪ One of the best validation methods is “K-Fold Cross Validation”, which helps to provide ample data for training the model and ample data for validation.
▪ The Grid search method is a simple algorithm to use, but it suffers from the curse of dimensionality when the hyperparameter space is high-dimensional.

https://jmlr.csail.mit.edu/papers/volume13/bergstra12a/bergstra12a.pdf
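▪ As a minimal sketch (not from the slides; the dataset and candidate values are illustrative assumptions), the loop below makes the brute-force idea concrete: every combination of two hyperparameter sets is trained and scored with 5-fold cross-validation:

# Brute-force grid search by hand: try every combination, cross-validate each one.
# The dataset and candidate values are illustrative assumptions.
from itertools import product
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

n_neighbors_values = [3, 5, 15]            # first set of hyperparameter values
weights_values = ['uniform', 'distance']   # second set of hyperparameter values

best_score, best_params = 0.0, None
for n, w in product(n_neighbors_values, weights_values):    # all combinations
    model = KNeighborsClassifier(n_neighbors=n, weights=w)
    score = cross_val_score(model, X, y, cv=5).mean()        # 5-fold cross-validation
    if score > best_score:
        best_score, best_params = score, {'n_neighbors': n, 'weights': w}

print(best_params, best_score)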
▪ It is difficult to manually change the hyperparameters and fit them on my training data every time. Here’s why:
• it is time-consuming
• it is hard to keep track of the hyperparameters we have already tried and those we still have to try

▪ So, I quickly asked Google if there was any solution to my problem and Google showed me
something called GridSearchCV from Sklearn. Let me share how I took advantage of this
GridSearchCV to solve my problem with a simple example.
▪ GridSearchCV is a library function that is a member of sklearn’s model_selection package. It
helps to loop through predefined hyperparameters and fit your estimator (model) on your
training set. So, in the end, you can select the best parameters from the listed
hyperparameters.
▪ In addition to that, you can specify the number of cross-validation folds to run for each set of hyperparameters.
▪ GridSearchCV takes the following arguments:
1. estimator: the estimator object you created
2. param_grid: the dictionary object that holds the hyperparameters you want to try
3. scoring: the evaluation metric you want to use; you can simply pass a valid string/object of an evaluation metric
4. cv: the number of cross-validation folds to run for each selected set of hyperparameters
5. verbose: set it to 1 to get a detailed printout while you fit the data to GridSearchCV
6. n_jobs: the number of processes you wish to run in parallel for this task; if it is -1, it will use all available processors

from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

kn = KNeighborsClassifier()

# hyperparameter grid to brute-force
params = {
    'n_neighbors': [5, 25],
    'weights': ['uniform', 'distance'],
    'algorithm': ['auto', 'ball_tree', 'kd_tree', 'brute']
}

grid_kn = GridSearchCV(estimator=kn,
                       param_grid=params,
                       scoring='accuracy',
                       cv=5,
                       verbose=1,
                       n_jobs=-1)

grid_kn.fit(X_train, y_train)

That is pretty much all you need to define. Then you have to fit your training data as you normally would. You will get the first line printed like this:
Fitting 5 folds for each of 16 candidates, totalling 80 fits
• Are you confused about what it means?
• Simple! Since we have to try two options for n_neighbors, two for weights and four for algorithm, altogether there are 16 different combinations we should try out.
• And for each combination, we have 5 CV fits, so 80 different fits will be tested by our GridSearchCV object.

The time for this fit depends on the number of hyperparameter combinations you are trying out. Once everything is finished, you will get an output like this:
[Parallel(n_jobs=1)]: Done 80 out of 80 | elapsed: 74.1min finished
Then, to find out the best parameters, you can simply print them:

# extract best estimator
print(grid_kn.best_estimator_)

Output:
KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                     metric_params=None, n_jobs=-1, n_neighbors=25, p=2,
                     weights='distance')

# to test the best fit
print(grid_kn.score(X_test, y_test))

Output:
0.9524753
▪ Random search randomly samples the search space and evaluates candidate sets drawn from a specified probability distribution.
▪ For example, instead of trying to check all 100,000 candidate combinations, we can check only 1,000 randomly sampled ones.
▪ Drawback
▪ It doesn’t use information from prior experiments to select the next set, and it is very difficult to predict the next set of experiments.
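▪ A minimal sketch (not from the slides) of random search using sklearn's RandomizedSearchCV with the same KNN estimator as the GridSearchCV example; the distributions and the n_iter value are illustrative assumptions:

# Random search: sample a fixed number of hyperparameter sets from distributions
# instead of brute-forcing the full grid. Values below are illustrative assumptions.
from scipy.stats import randint
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neighbors import KNeighborsClassifier

kn = KNeighborsClassifier()

param_distributions = {
    'n_neighbors': randint(1, 50),       # integers sampled uniformly from [1, 50)
    'weights': ['uniform', 'distance'],  # sampled uniformly from the list
    'algorithm': ['auto', 'ball_tree', 'kd_tree', 'brute']
}

rand_kn = RandomizedSearchCV(estimator=kn,
                             param_distributions=param_distributions,
                             n_iter=20,          # only 20 random combinations are tried
                             scoring='accuracy',
                             cv=5,
                             random_state=42,
                             n_jobs=-1)

rand_kn.fit(X_train, y_train)   # X_train, y_train as in the GridSearchCV example
print(rand_kn.best_params_, rand_kn.best_score_)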
▪ Hyperparameter setting maximizes the performance of the model on a validation set.
▪ Machine learning algorithms frequently require fine-tuning of model hyperparameters.
▪ Unfortunately, that tuning is often called a ‘black-box function’ because it cannot be written as a formula, since the derivatives of the function are unknown.
▪ A much more appealing way to optimize and fine-tune hyperparameters is an automated model tuning approach using the Bayesian optimization algorithm.
▪ The model used for approximating the objective function is called the surrogate model.
▪ A popular surrogate model for Bayesian optimization is the Gaussian process (GP).

▪ Bayesian optimization typically works by assuming the unknown function was sampled from a
Gaussian Process (GP) and maintains a posterior distribution for this function as observations
are made.
▪ There are two major choices that must be made when performing Bayesian optimization.
▪ Select a prior over functions that will express assumptions about the function being optimized. For this, we choose a Gaussian Process prior.
▪ Next, we must choose an acquisition function, which is used to construct a utility function from the model posterior, allowing us to determine the next point to evaluate.
▪ A Gaussian process defines the prior distribution over functions, which can be converted into a posterior over functions once we have seen some data.
▪ The Gaussian process uses a covariance matrix to ensure that inputs that are close together produce outputs that are close together.
▪ The covariance matrix, along with a mean function µ giving the expected value ƒ(x), defines a Gaussian process.
▪ 1. The Gaussian process is used as a prior for Bayesian inference.
▪ 2. The benefit of computing the posterior is that it can be used to make predictions for unseen test cases.
▪ Acquisition Function
▪ Choosing where to sample next in the search space is the job of the acquisition function: we maximize the acquisition function to determine the next sampling point. Popular acquisition functions are
• Maximum Probability of Improvement (MPI)
• Expected Improvement (EI)
• Upper Confidence Bound (UCB)
▪ The Expected Improvement (EI) function seems to be a popular one. It is defined as
EI(x) = 𝔼[max{0, ƒ(x) − ƒ(x̂)}]

▪ where x̂ is the current optimal set of hyperparameters and ƒ(x̂) its objective value. Maximising EI selects hyperparameters that are expected to improve upon ƒ(x̂).
1. EI is high when the posterior expected value µ(x) is higher than the current best value ƒ(x̂).
2. EI is high when the uncertainty σ(x) around the point x is high.
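▪ A minimal sketch (not from the slides) of one Bayesian-optimization step: fit a Gaussian-process surrogate to the observations made so far, compute Expected Improvement (the EI formula above) over a grid of candidate points, and evaluate next wherever EI is largest. The 1-D objective and the candidate grid are illustrative assumptions.

# One step of Bayesian optimization with a GP surrogate and the EI acquisition function.
# The black-box objective and the candidate grid are illustrative assumptions.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def objective(x):                                   # hypothetical black-box function (to be maximized)
    return np.sin(3 * x) + 0.1 * x ** 2

X_obs = np.array([[-2.0], [0.0], [1.5]])            # hyperparameter values tried so far
y_obs = objective(X_obs).ravel()

gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), normalize_y=True)
gp.fit(X_obs, y_obs)                                # GP posterior given the observations

X_cand = np.linspace(-3, 3, 200).reshape(-1, 1)     # candidate points to score
mu, sigma = gp.predict(X_cand, return_std=True)     # posterior mean µ(x) and uncertainty σ(x)

f_best = y_obs.max()                                # current best observed value ƒ(x̂)
imp = mu - f_best                                   # µ(x) − ƒ(x̂), as in the EI formula above
z = imp / np.maximum(sigma, 1e-9)
ei = imp * norm.cdf(z) + sigma * norm.pdf(z)        # analytic Expected Improvement EI(x)

x_next = X_cand[np.argmax(ei)]                      # next sampling point: maximize EI
print("Next point suggested by EI:", x_next)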
• “Random Search for Hyper-Parameter Optimization.” James Bergstra and Yoshua Bengio. Journal of
Machine Learning Research, 2012.
• “Algorithms for Hyper-Parameter Optimization.” James Bergstra, Rémi Bardenet, Yoshua Bengio, and Balázs Kégl. Neural Information Processing Systems, 2011. See also a SciPy 2013 talk by the authors.
• “Practical Bayesian Optimization of Machine Learning Algorithms.” Jasper Snoek, Hugo Larochelle,
and Ryan P. Adams. Neural Information Processing Systems, 2012.
• “Sequential Model-Based Optimization for General Algorithm Configuration.” Frank Hutter, Holger
H. Hoos, and Kevin Leyton-Brown. Learning and Intelligent Optimization, 2011.
• “Lazy Paired Hyper-Parameter Tuning.” Alice Zheng and Mikhail Bilenko. International Joint
Conference on Artificial Intelligence, 2013.
• Introduction to Derivative-Free Optimization (MPS-SIAM Series on Optimization). Andrew R. Conn, Katya Scheinberg, and Luís N. Vicente, 2009.
• Gradient-Based Hyperparameter Optimization Through Reversible Learning. Dougal Maclaurin,
David Duvenaud, and Ryan P. Adams. ArXiv, 2015.
