Difference Between StandardScaler and Normalizer in sklearn.preprocessing
Preprocessing is a step in the machine learning workflow that helps improve the performance of models. Two commonly used techniques in the sklearn.preprocessing module are StandardScaler and Normalizer. Although both transform features, they serve different purposes and apply different methods.
In this article, we will explore the differences between StandardScaler and Normalizer, and provide implementations to illustrate their usage.
StandardScaler
StandardScaler standardizes features by removing the mean and scaling to unit variance. It transforms the data to have a mean of 0 and a standard deviation of 1. This process is also known as z-score normalization.
The transformation applied by StandardScaler can be represented as: X_{scaled} = \frac{X - \mu}{\sigma}
Where:
- X is the original feature.
- \mu is the mean of the feature.
- \sigma is the standard deviation of the feature.
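As a quick sanity check, the z-score formula above can be reproduced directly with NumPy and compared against StandardScaler. The small array X below is an arbitrary example; note that StandardScaler uses the population standard deviation (ddof=0), so the manual version must do the same:
Python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 300.0]])

# Manual z-score: per-column mean and population std (ddof=0),
# the same statistics StandardScaler computes in fit()
manual = (X - X.mean(axis=0)) / X.std(axis=0)

scaled = StandardScaler().fit_transform(X)
print(np.allclose(manual, scaled))  # True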
When to Use StandardScaler?
- Scale-Sensitive Algorithms: Models such as linear and logistic regression, SVMs, k-means, and PCA assume or work best with features on comparable scales.
- Features with Different Units or Ranges: When one feature's variance would otherwise dominate the objective, standardization puts all features on an equal footing.
Normalizer
Normalizer scales individual samples to have unit norm. It transforms each sample (row) to a unit vector, which helps maintain the direction of the data while scaling.
For each sample x:
x' = \frac{x}{\|x\|}
Where \|x\| is the norm of the vector x. The default is the L2 norm (Euclidean distance), but L1 and max norms can be specified as well.
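Here is a minimal sketch of the three norm options applied to a single row vector; the values [3, 4] are chosen so each norm is easy to verify by hand:
Python
import numpy as np
from sklearn.preprocessing import Normalizer

x = np.array([[3.0, 4.0]])

# L2 (default): divide by sqrt(3^2 + 4^2) = 5
print(Normalizer(norm='l2').fit_transform(x))   # [[0.6 0.8]]
# L1: divide by |3| + |4| = 7
print(Normalizer(norm='l1').fit_transform(x))   # [[0.42857143 0.57142857]]
# max: divide by max(|3|, |4|) = 4
print(Normalizer(norm='max').fit_transform(x))  # [[0.75 1.  ]]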
When to Use Normalizer?
- High-Dimensional Sparse Data: Useful for text classification, image processing, or any situation where the focus is on the direction of the data points.
- Data with Varying Magnitudes: When the magnitude of the feature vectors matters less than their direction.
Key Differences Between StandardScaler and Normalizer
| Aspect | StandardScaler | Normalizer |
|---|---|---|
| Operation Basis | Feature-wise (across columns) | Sample-wise (across rows) |
| Purpose | Standardizes features to zero mean and unit variance | Scales samples to unit norm (L2 by default) |
| Impact on Data | Alters the mean and variance of each feature | Adjusts the magnitude of each sample vector |
| Common Use Cases | Regression, PCA, algorithms sensitive to variance | Text classification, k-NN, direction-focused tasks |
| Formula | \frac{X - \mu}{\sigma} | \frac{X}{\|X\|_2} |
Implementation: StandardScaler and Normalizer
Let’s illustrate the differences between StandardScaler and Normalizer using a sample dataset. We will create a synthetic dataset and apply both transformations.
Python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler, Normalizer
# Creating a synthetic dataset
data = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])
# Convert to DataFrame for better visualization
df = pd.DataFrame(data, columns=['Feature1', 'Feature2', 'Feature3'])
# Applying StandardScaler
scaler = StandardScaler()
standardized_data = scaler.fit_transform(df)
# Applying Normalizer (fit is a no-op here: Normalizer is stateless
# and rescales each row independently)
normalizer = Normalizer()
normalized_data = normalizer.fit_transform(df)
# Displaying the results
print("Original Data:\n", df)
print("\nStandardized Data (StandardScaler):\n", standardized_data)
print("\nNormalized Data (Normalizer):\n", normalized_data)
Output:
Original Data:
Feature1 Feature2 Feature3
0 1 2 3
1 4 5 6
2 7 8 9
Standardized Data (StandardScaler):
[[-1.22474487 -1.22474487 -1.22474487]
[ 0. 0. 0. ]
[ 1.22474487 1.22474487 1.22474487]]
Normalized Data (Normalizer):
[[0.26726124 0.53452248 0.80178373]
[0.45584231 0.56980288 0.68376346]
[0.50257071 0.57436653 0.64616234]]
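Continuing from the code above, a short check confirms the two invariants: StandardScaler yields columns with mean 0 and standard deviation 1, while Normalizer yields rows with unit L2 norm:
Python
# Column statistics after StandardScaler (approximately 0 and 1)
print(standardized_data.mean(axis=0))
print(standardized_data.std(axis=0))

# Row L2 norms after Normalizer (each approximately 1)
print(np.linalg.norm(normalized_data, axis=1))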
Conclusion
Both StandardScaler and Normalizer are essential tools in the preprocessing step of machine learning workflows, but they serve distinct purposes. StandardScaler is ideal for standardizing features to have a mean of 0 and a standard deviation of 1, making it suitable for algorithms sensitive to feature scales. In contrast, Normalizer scales individual samples to unit norms, focusing on the direction of the data points rather than their magnitude.