Hyperparameter Optimization Methods - ML
Last Updated: 12 Dec, 2023
In this article, we will discuss the various hyperparameter optimization techniques used in machine learning and their major drawbacks.
What are Hyperparameters?
Hyperparameters are the parameters that we set before training; they are not learned from the data. Hyperparameters have a major impact on the accuracy and efficiency of the trained model, so they need to be set carefully to get good, efficient results. Let's discuss some hyperparameter optimization methods.
Hyperparameter Optimization Techniques
Exhaustive Search Methods
Let's first discuss some exhaustive search methods for optimizing hyperparameters.
- Grid Search: In Grid Search, a set of possible values is defined for each hyperparameter. These sets are then combined using a Cartesian product to form a multidimensional grid. We try every point in the grid and select the hyperparameter setting with the best result.
- Random Search: This is a variant of Grid Search in which, instead of trying every point in the grid, we evaluate randomly sampled points. This solves a couple of problems with Grid Search, such as not having to expand the search space exponentially every time a new hyperparameter is added (see the sketch after this list).

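To make the contrast concrete, here is a minimal sketch of both searches using scikit-learn's GridSearchCV and RandomizedSearchCV; the estimator, dataset, and parameter ranges are illustrative choices, not part of the original description.

```python
# Minimal sketch: grid search vs. random search with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(random_state=0)

# Grid search: try every combination in the Cartesian product of these sets.
param_grid = {"n_estimators": [50, 100, 200], "max_depth": [3, 5, None]}
grid = GridSearchCV(model, param_grid, cv=3).fit(X, y)
print("Grid search best:", grid.best_params_, grid.best_score_)

# Random search: sample only a fixed number of points from the same space.
param_dist = {"n_estimators": [50, 100, 200], "max_depth": [3, 5, None]}
rand = RandomizedSearchCV(model, param_dist, n_iter=5, cv=3, random_state=0).fit(X, y)
print("Random search best:", rand.best_params_, rand.best_score_)
```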
Drawbacks:
Random Search and Grid Search are easy to implement and can run in parallel, but they have a few drawbacks:
- If the hyperparameter search space is large, it takes a lot of time and computational power to optimize the hyperparameters.
- There is no guarantee that these algorithms will find even a local optimum if the sampling is not done carefully.
Bayesian Optimization:
Instead of guessing at random, in Bayesian optimization we use the results of previous evaluations to choose the next hyperparameters. These results are used to build a probabilistic model that maps hyperparameters to the probability of a score on the objective function. This probability is written as:
P(\text{score } (y) \mid \text{hyperparameters } (x))
This model is also called a "surrogate" of the objective function, and it is much cheaper to evaluate and optimize than the objective function itself. Below are the steps for applying Bayesian optimization to hyperparameter optimization (a short sketch using a Gaussian-process surrogate follows the list):
- Build a surrogate probability model of the objective function
- Find the hyperparameters that perform best on the surrogate
- Apply these hyperparameters to the original objective function
- Update the surrogate model by using the new results
- Repeat steps 2–4 for a fixed number of iterations (or until the evaluation budget is exhausted)
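Here is a minimal sketch of this loop using scikit-optimize's gp_minimize, which relies on a Gaussian-process surrogate; the toy objective and search space below are illustrative stand-ins for a real "train the model and return the validation loss" routine, and assume scikit-optimize is installed.

```python
# Minimal sketch of Bayesian optimization with a Gaussian-process surrogate.
from skopt import gp_minimize
from skopt.space import Real, Integer

# Objective: the score to minimize as a function of the hyperparameters.
# A toy function stands in for "train model, return validation loss".
def objective(params):
    learning_rate, n_estimators = params
    return (learning_rate - 0.1) ** 2 + (n_estimators - 100) ** 2 / 1e4

search_space = [
    Real(1e-4, 1.0, prior="log-uniform", name="learning_rate"),
    Integer(10, 500, name="n_estimators"),
]

# gp_minimize repeats the fit-surrogate / pick-candidate / evaluate loop n_calls times.
result = gp_minimize(objective, search_space, n_calls=30, random_state=0)
print("Best hyperparameters:", result.x, "best score:", result.fun)
```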
Sequential Model-Based Optimization:
Sequential Model-Based Optimization (SMBO) is a method of applying Bayesian optimization. Here, sequential refers to running trials one after another, each time improving the hyperparameters by updating the Bayesian probability model (the surrogate).
There are 5 important components of SMBO:
- A domain of hyperparameters over which to search.
- An objective function which outputs a score that we want to optimize.
- A surrogate model of the objective function.
- A selection (acquisition) function to decide which hyperparameters to evaluate next. Generally, Expected Improvement is used.
- A data structure containing the history of (score, hyperparameter) pairs from previous iterations, which is used to update the surrogate.
There are many different versions of the SMBO hyperparameter optimization algorithm; the main difference between them is the choice of surrogate function. Common surrogates include Gaussian Processes, Random Forest regression, and the Tree-structured Parzen Estimator. In this post we discuss the Tree-structured Parzen Estimator below.
Tree-structured Parzen Estimators:
Tree-structured Parzen Estimators use a tree structure to optimize hyperparameters. Many hyperparameters can be optimized with this method, such as the number of layers, the choice of optimizer, and the number of neurons in each layer. In a Tree-structured Parzen Estimator, instead of modelling P(y \mid x) directly, we model P(x \mid y) and P(y), where y is an intermediate score that measures how good a set of hyperparameter values is (such as the validation loss) and x denotes the hyperparameters.
In the first step of the Tree-structured Parzen Estimator, we sample validation losses via random search in order to initialize the algorithm. Then we divide the observations into two groups: the best performing one (e.g. the upper quartile) and the rest, taking y* as the splitting value between the two groups.
Then we model the distribution of the hyperparameters in each of these groups: P(x \mid y) = f(x) if y < y*, and P(x \mid y) = g(x) if y \ge y*.
The two densities f and g are modelled using Parzen estimators (also known as kernel density estimators), which are simple averages of kernels centred on existing data points.
P(y) is calculated using the fact that p(y < y*) = \gamma, the quantile that defines the percentile split between the two groups.
Using Bayes' rule (i.e. p(x, y) = p(y)\,P(x \mid y)), it can be shown that the Expected Improvement criterion is equivalent to the ratio \frac{f(x)}{g(x)}. In the final step, we therefore pick the candidate hyperparameters x that maximize \frac{f(x)}{g(x)}.
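As an illustration, the Tree-structured Parzen Estimator is implemented by the hyperopt library; below is a minimal sketch of using it, where the objective function and search space are illustrative stand-ins rather than part of the original article.

```python
# Minimal sketch of TPE-based optimization with hyperopt (assumes hyperopt is installed).
from hyperopt import fmin, tpe, hp, Trials

# Objective: return the score to minimize (e.g. a validation loss).
def objective(params):
    lr = params["learning_rate"]
    n_layers = params["n_layers"]
    return (lr - 0.01) ** 2 + abs(n_layers - 3) * 0.1  # stand-in for real training

space = {
    "learning_rate": hp.loguniform("learning_rate", -7, 0),  # e^-7 .. e^0
    "n_layers": hp.choice("n_layers", [1, 2, 3, 4, 5]),
}

trials = Trials()
# tpe.suggest selects the next candidates by maximizing the f(x)/g(x) ratio.
best = fmin(objective, space, algo=tpe.suggest, max_evals=50, trials=trials)
# Note: for hp.choice, `best` reports the index of the chosen option.
print("Best hyperparameters found:", best)
```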
Drawback:
The biggest disadvantage of the Tree-structured Parzen Estimator is that it selects hyperparameters independently of each other, which hurts efficiency and increases the computation required, because in most neural networks there are relationships between different hyperparameters.
Other Hyperparameter Estimation Algorithms:
Hyperband:
The underlying principle of this algorithm is that if a hyperparameter configuration is destined to be the best after a large number of iterations, it is more likely to perform in the top half of configurations after a small number of iterations. Below is a step-by-step outline of Hyperband (a short sketch of its successive-halving core follows the list).
- Randomly sample n hyperparameter sets from the search space.
- After k iterations, evaluate the validation loss of these hyperparameter sets.
- Discard the lowest-performing half of the hyperparameter sets.
- Run the remaining ones for k more iterations, evaluate again, and discard the bottom half.
- Repeat until only one hyperparameter set is left.
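As an illustration of the loop described above, here is a minimal sketch of the successive-halving core that Hyperband builds on; sample_config and evaluate are hypothetical stand-ins for drawing a random hyperparameter set and training/evaluating it for a given budget.

```python
# Minimal sketch of the successive-halving loop underlying Hyperband.
import random

def sample_config():
    # Hypothetical stand-in for sampling a random hyperparameter set.
    return {"learning_rate": 10 ** random.uniform(-4, 0),
            "hidden_units": random.choice([32, 64, 128, 256])}

def evaluate(config, iterations):
    # Stand-in for "train for `iterations` steps and return the validation loss".
    return abs(config["learning_rate"] - 0.01) + random.random() / iterations

def successive_halving(n=16, k=1, rounds=4):
    configs = [sample_config() for _ in range(n)]
    for _ in range(rounds):
        scored = sorted(configs, key=lambda c: evaluate(c, k))
        configs = scored[: max(1, len(scored) // 2)]  # keep the best half
        k *= 2                                        # give survivors more budget
    return configs[0]

print("Best surviving configuration:", successive_halving())
```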
Drawbacks:
If the number of samples is large, some well-performing hyperparameter sets that need more time to converge may be discarded early in the optimization.
Population-Based Training (PBT):
Population-Based Training (PBT) starts out like random search by training many models in parallel. But rather than the networks training independently, PBT uses information from the rest of the population to refine the hyperparameters and to direct computational resources towards models that show promise. It takes its inspiration from genetic algorithms, where each member of the population, referred to as a worker, can exploit information from the rest of the population. For instance, a worker might copy the model parameters from a better-performing worker. It can also explore new hyperparameters by randomly perturbing the current values.
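As an illustration of the exploit/explore step described above, here is a minimal, hypothetical sketch; the population layout ('score', 'weights', 'hyperparams') and the perturbation factors are illustrative assumptions, not a reference implementation.

```python
# Minimal sketch of a PBT exploit/explore step (assumes a population of >= 4 workers).
import copy
import random

def exploit_and_explore(population):
    """population: list of dicts with 'score', 'weights', 'hyperparams'."""
    ranked = sorted(population, key=lambda w: w["score"], reverse=True)
    quarter = max(1, len(ranked) // 4)
    top, bottom = ranked[:quarter], ranked[-quarter:]
    for worker in bottom:
        donor = random.choice(top)
        worker["weights"] = copy.deepcopy(donor["weights"])         # exploit: copy a better worker
        worker["hyperparams"] = {k: v * random.choice([0.8, 1.2])   # explore: perturb its hyperparameters
                                 for k, v in donor["hyperparams"].items()}
    return population
```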
Bayesian Optimization and HyperBand (BOHB):
BOHB (Bayesian Optimization and HyperBand) is a combination of the Hyperband algorithm and Bayesian optimization. First, it uses Hyperband's ability to sample many configurations with a small budget to explore the hyperparameter search space quickly and efficiently and to find promising configurations early. Then it uses the Bayesian optimizer's predictive capability to propose sets of hyperparameters that are close to the optimum. This algorithm can also be run in parallel (like Hyperband), which overcomes a strong drawback of standard Bayesian optimization.