
Hyperparameter tuning

Last Updated: 11 Mar, 2025

A machine learning model is a mathematical model with a number of parameters that are learned from data: by training the model on existing data we fit these parameters. However, there is another kind of parameter, known as hyperparameters, that cannot be learned directly through the regular training process.

These hyperparameters are typically set before the actual training process begins and control aspects of the learning process itself. They influence the model’s performance, its complexity and how fast it learns. This article aims to explore various strategies for tuning hyperparameters to optimize machine learning models.

Understanding Hyperparameter Tuning

Hyperparameter tuning is the process of selecting the optimal values for a machine learning model’s hyperparameters. Hyperparameters are configuration settings that control the learning process of the model.

For example, the learning rate and the number of neurons in a neural network, or the kernel size in a support vector machine, can significantly impact how well the model trains and generalizes. The goal of hyperparameter tuning is to find the values that lead to the best performance on a given task.

In the context of machine learning, hyperparameters are configuration variables that are set before the training process begins. Unlike model parameters, which are learned directly from the data during training, hyperparameters are chosen beforehand and influence the way the model learns. These settings can affect both the speed and the quality of the model’s performance. For instance, a high learning rate can cause the model to converge too quickly, possibly skipping over the optimal solution, while a low learning rate might lead to a slower convergence, requiring more time and computational resources.

Note: Different models have different hyperparameters and they need to be tuned accordingly.
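
As a quick, hedged illustration of the difference, the snippet below (using a small synthetic dataset assumed only for this example) fixes the hyperparameter C before training, while the coefficients are learned from the data by fit():

Python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Small synthetic dataset, assumed only for illustration
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# C is a hyperparameter: chosen *before* training starts
model = LogisticRegression(C=0.5, max_iter=1000)

# coef_ and intercept_ are model parameters: learned *from the data* during fit
model.fit(X, y)
print("Hyperparameter C:", model.C)
print("Learned coefficients:", model.coef_)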

Techniques for Hyperparameter Tuning

Models can have many hyperparameters, and finding the best combination can be treated as a search problem. Three of the most widely used strategies for hyperparameter tuning are:

1. GridSearchCV 

GridSearchCV is often considered a “brute force” approach to hyperparameter optimization. It works by fitting the model using all possible combinations of predefined hyperparameter values. A grid of potential discrete values for each hyperparameter is created and the model is trained for each possible combination. The performance of each set is logged and the combination that produces the best results is selected.

While grid search is exhaustive and can find the ideal hyperparameter combination, it has some notable disadvantages. The main drawback is that it can be very slow and computationally expensive because it requires training the model with every potential combination of hyperparameters. This may not be feasible for large datasets or models with many hyperparameters.

For example, suppose we want to tune two hyperparameters, C and Alpha, of a logistic regression classifier using the following sets of values:
C = [0.1, 0.2, 0.3, 0.4, 0.5]
Alpha = [0.01, 0.1, 0.5, 1.0]


The grid search technique will construct multiple versions of the model with all possible combinations of C and Alpha, resulting in a total of 5 * 4 = 20 different models. The best-performing combination is then chosen.
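
As a sketch of how such a grid is enumerated, the snippet below builds the same 5 x 4 grid with scikit-learn's ParameterGrid helper, which GridSearchCV uses internally to expand the combinations; the dictionary here is only for counting combinations, since plain scikit-learn LogisticRegression accepts C but has no Alpha keyword:

Python
from sklearn.model_selection import ParameterGrid

# The two value sets from the example above
param_grid = {
    "C": [0.1, 0.2, 0.3, 0.4, 0.5],
    "Alpha": [0.01, 0.1, 0.5, 1.0],
}

# ParameterGrid expands every combination, just as GridSearchCV does
combinations = list(ParameterGrid(param_grid))
print(len(combinations))   # 20
print(combinations[0])     # e.g. {'Alpha': 0.01, 'C': 0.1}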

The following code illustrates how to use GridSearchCV:

Python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import make_classification
import numpy as np

# Create a synthetic binary classification dataset
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=10, n_classes=2, random_state=42)

# Candidate values for C on a logarithmic scale
c_space = np.logspace(-5, 8, 15)
param_grid = {'C': c_space}

logreg = LogisticRegression()

# Evaluate every candidate value with 5-fold cross-validation
logreg_cv = GridSearchCV(logreg, param_grid, cv=5)
logreg_cv.fit(X, y)

print("Tuned Logistic Regression Parameters: {}".format(logreg_cv.best_params_))
print("Best score is {}".format(logreg_cv.best_score_))

Output:

Tuned Logistic Regression Parameters: {'C': 0.006105402296585327}
Best score is 0.853

This represents the highest accuracy achieved by the model using the hyperparameter combination C = 0.0061. The best score of 0.853 means the model achieved 85.3% accuracy on the validation data during the grid search process.

Drawback: GridSearchCV will go through all the intermediate combinations of hyperparameters, which makes grid search computationally very expensive.

2. RandomizedSearchCV 

As the name suggests, the random search method selects values at random, as opposed to the predetermined set of values used by GridSearchCV. In each iteration, RandomizedSearchCV tries a different set of hyperparameters and logs the model’s performance. After several iterations it returns the combination that yielded the best result. This approach can reduce unnecessary computation by exploring a wider range of hyperparameters in fewer iterations.

RandomizedSearchCV addresses the drawbacks of GridSearchCV by going through only a fixed number of hyperparameter settings. It randomly samples from the hyperparameter space, potentially finding a good combination faster. While it may not exhaustively search the entire space, it is often more efficient, especially when the hyperparameter space is large.

The following code illustrates how to use RandomizedSearchCV:

Python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import RandomizedSearchCV

# Create a synthetic binary classification dataset
X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=10, n_classes=2, random_state=42)

# Lists and distributions to sample hyperparameter values from
param_dist = {
    "max_depth": [3, None],
    "max_features": randint(1, 9),
    "min_samples_leaf": randint(1, 9),
    "criterion": ["gini", "entropy"]
}

tree = DecisionTreeClassifier()

# By default RandomizedSearchCV tries n_iter=10 random combinations
tree_cv = RandomizedSearchCV(tree, param_dist, cv=5)
tree_cv.fit(X, y)

print("Tuned Decision Tree Parameters: {}".format(tree_cv.best_params_))
print("Best score is {}".format(tree_cv.best_score_))

Output:

Tuned Decision Tree Parameters: {'criterion': 'entropy', 'max_depth': None, 'max_features': 6, 'min_samples_leaf': 6}
Best score is 0.8

The best score of 0.8 means the model achieved an accuracy of 80% on the validation data with the hyperparameters shown above. Because the search is random, the exact combination and score may vary between runs.

Drawback: One potential drawback of RandomizedSearchCV is that, since the search is random, it may not find the absolute best combination of hyperparameters. However, in practice it often generates a near-optimal result in a fraction of the time compared to grid search.

3. Bayesian Optimization

Grid search and random search are often inefficient because they evaluate many unsuitable hyperparameter combinations without considering the results from previous iterations. Bayesian optimization takes a more intelligent approach by treating the search for optimal hyperparameters as an optimization problem.

Unlike random and grid search methods Bayesian optimization uses a probabilistic model that considers past evaluation results to select the next set of hyperparameters. This method uses a surrogate function to predict the performance of new hyperparameter combinations based on prior evaluations. It applies a probabilistic approach to estimate which combinations are most likely to yield good results.

The surrogate function is a probabilistic estimate of the objective function (e.g., root-mean-square error, RMSE), which is typically too expensive to compute directly. The goal is to maximize or minimize this objective function with as few evaluations as possible.

P(y | x)

Here the surrogate function models the relationship between the hyperparameters x and the score y. By updating this model iteratively with each new evaluation, Bayesian optimization makes more informed decisions, reducing the number of function evaluations required to find an optimal solution.

Common surrogate models used in Bayesian optimization include:

  • Gaussian Processes
  • Random Forest Regression
  • Tree-structured Parzen Estimators (TPE)

Drawback: Requires an understanding of the underlying probabilistic model.
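
Scikit-learn itself does not provide a Bayesian search, but the following sketch shows one way to apply the idea using the third-party scikit-optimize package (BayesSearchCV), assuming it is installed; the dataset, search space and n_iter value are illustrative assumptions rather than part of the original example:

Python
# Requires the third-party scikit-optimize package: pip install scikit-optimize
from skopt import BayesSearchCV
from skopt.space import Real, Categorical
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=10, n_classes=2, random_state=42)

# Search space: C is sampled on a log-uniform scale instead of a fixed grid
search_spaces = {
    "C": Real(1e-5, 1e+3, prior="log-uniform"),
    "solver": Categorical(["liblinear", "lbfgs"]),
}

opt = BayesSearchCV(
    LogisticRegression(max_iter=1000),
    search_spaces,
    n_iter=20,   # number of evaluations guided by the surrogate model
    cv=5,
    random_state=42,
)
opt.fit(X, y)

print("Tuned Logistic Regression Parameters: {}".format(opt.best_params_))
print("Best score is {}".format(opt.best_score_))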

Advantages of Hyperparameter Tuning

  • Improved Model Performance: Finding the optimal combination of hyperparameters can significantly boost model accuracy and robustness.
  • Reduced Overfitting and Underfitting: Tuning helps to prevent both overfitting and underfitting, resulting in a well-balanced model.
  • Enhanced Model Generalizability: By selecting hyperparameters that optimize performance on validation data, the model is more likely to generalize well to unseen data.
  • Optimized Resource Utilization: With careful tuning, resources such as computation time and memory can be used more efficiently, avoiding unnecessary work.
  • Improved Model Interpretability: Properly tuned hyperparameters can make the model simpler and easier to interpret.

Challenges in Hyperparameter Tuning

  • Dealing with High-Dimensional Hyperparameter Spaces: The larger the hyperparameter space, the more combinations need to be explored. This makes the search process computationally expensive and time-consuming, especially for complex models with many hyperparameters.
  • Handling Expensive Function Evaluations: Evaluating a model’s performance can be computationally expensive, particularly for models that require a lot of data or iterations.
  • Incorporating Domain Knowledge: Domain knowledge can help guide the hyperparameter search, narrowing down the search space and making the process more efficient. Using insights from the problem context can improve both the efficiency and effectiveness of tuning.
  • Developing Adaptive Hyperparameter Tuning Methods: Dynamic adjustment of hyperparameters during training, such as learning rate schedules or early stopping, can lead to better model performance (see the sketch after this list).
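
As a minimal sketch of this adaptive idea, scikit-learn's SGDClassifier can combine a decaying ("adaptive") learning-rate schedule with built-in early stopping; the dataset and the specific values below are arbitrary assumptions chosen only for illustration:

Python
from sklearn.linear_model import SGDClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=10, n_classes=2, random_state=42)

# 'adaptive' keeps eta0 until improvement stalls, then divides the rate by 5;
# early_stopping halts training when the validation score stops improving.
clf = SGDClassifier(
    loss="log_loss",
    learning_rate="adaptive",
    eta0=0.01,               # initial learning rate (arbitrary choice)
    early_stopping=True,
    validation_fraction=0.1,
    n_iter_no_change=5,
    random_state=42,
)
clf.fit(X, y)
print("Iterations actually run:", clf.n_iter_)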

