Data Normalization
Data normalization makes data easier to classify and understand. It scales the
values of an attribute so that they fall within a smaller, common range.
Why Is Normalization Needed?
• Normalization is generally required when a data set has multiple attributes whose
values lie on different scales; left unscaled, this can lead to poor models when
performing data mining operations.
• Without it, an equally important attribute on a smaller scale can have its effect
diluted by another attribute whose values lie on a larger scale.
• Heterogeneous data with different units usually needs to be normalized. If the data
share the same unit and the same order of magnitude, normalization may not be
necessary.
• Unless normalized at pre-processing, variables with disparate ranges or varying
precision acquire different weights (driving values) in the analysis.
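The dilution effect described above can be illustrated with a small sketch. The records below (age in years, income in currency units) are hypothetical and not taken from the slides, and `min_max_column` and `age_share` are illustrative helper names. The code measures how much of the squared Euclidean distance between two records comes from the age attribute, before and after rescaling each attribute to the range 0 to 1.

```python
def min_max_column(values):
    """Rescale one attribute (column) to the range 0..1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def age_share(p, q):
    """Fraction of the squared Euclidean distance contributed by the first attribute."""
    d_age = (p[0] - q[0]) ** 2
    d_income = (p[1] - q[1]) ** 2
    return d_age / (d_age + d_income)


# Hypothetical records: (age, income). Values are made up for illustration.
records = [(25, 50_000), (60, 50_500), (26, 51_000)]

ages = min_max_column([r[0] for r in records])
incomes = min_max_column([r[1] for r in records])
scaled = list(zip(ages, incomes))

raw_share = age_share(records[0], records[1])
scaled_share = age_share(scaled[0], scaled[1])

print(f"age share of distance, raw data:    {raw_share:.4f}")
print(f"age share of distance, scaled data: {scaled_share:.4f}")
```

On the raw data a 35-year age gap contributes less than 1% of the squared distance because income is on a much larger scale; after rescaling, both attributes contribute on comparable terms.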
2. Data Transformation: Data Normalization contd..
Example
[Figure: Chart for Raw Data]
[Figure: Chart for Normalized Data]
Methods of Data Normalization:
a. Decimal Scaling
b. Min-Max Normalization
c. z-Score Normalization (zero-mean Normalization)
Several other normalization approaches are used inside deep learning models:
Batch Normalization
Layer Normalization
Group Normalization
Instance Normalization
Weight Normalization
a. Decimal Scaling Normalization
- It normalizes by moving the decimal point of the data values.
- To normalize the data by this technique, we divide each data value by a power of 10
chosen from the maximum absolute value of the data set.
- The data value v_i is normalized to v'_i by using the formula
$v'_i = \frac{v_i}{10^j}$
[where j is the smallest integer such that max(|v'_i|) < 1.]
In this technique, values are rescaled purely by shifting the decimal point, that is, by
dividing (or multiplying) by pow(10, j).
Example:
- Input data: 15, 121, 201, 421, 561, 601, 850
- Step 1: The maximum absolute value in the data is 850, so the smallest power of 10
greater than it is 1000.
- Step 2: Divide each value by 1000 (i.e. j = 3), giving 0.015, 0.121, 0.201, 0.421,
0.561, 0.601, 0.850.
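The decimal scaling steps can be sketched in a few lines of Python. This is a minimal illustration, not a library routine: the function name `decimal_scale` is an assumption, and the input is assumed to be a non-empty list of numbers.

```python
def decimal_scale(values):
    """Divide every value by 10**j, where j is the smallest non-negative
    integer such that the largest absolute scaled value is below 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values], j


data = [15, 121, 201, 421, 561, 601, 850]
scaled, j = decimal_scale(data)
print(j)       # number of decimal places shifted
print(scaled)  # values after scaling
```

For the example data the loop stops at j = 3, so every value is divided by 1000.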
b. Min-Max Normalization (Linear Transformation)
- The minimum and maximum values of the data are found, and each value is
replaced according to the following formula.
$v' = \frac{v - \min(A)}{\max(A) - \min(A)} \left(\text{new\_max}(A) - \text{new\_min}(A)\right) + \text{new\_min}(A)$
Where - A is the attribute data (column)
- v and v' are the old and new values of each entry in the data
- min(A), max(A) are the minimum and maximum of A
- new_max(A), new_min(A) are the maximum and minimum of the
required range (i.e. the boundary values) respectively.
Example
Input:- 10, 15, 50, 60
Normalized to the range 0 to 1.
Here min = 10, max = 60, new_min = 0, new_max = 1
Output:- 0, 0.1, 0.8, 1
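A minimal Python sketch of min-max normalization follows. The function name `min_max_normalize` and its default range of 0 to 1 are assumptions made for illustration; the guard for a constant attribute (max equal to min) is an added safeguard that the slides do not discuss.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly map values from [min(values), max(values)] to [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: treat a constant attribute as an error rather than guess a value.
        raise ValueError("all values are equal; min-max scaling is undefined")
    scale = (new_max - new_min) / (hi - lo)
    return [(v - lo) * scale + new_min for v in values]


data = [10, 15, 50, 60]
normalized = min_max_normalize(data)
print(normalized)

# The same data mapped to a different target range, e.g. -1 to 1.
symmetric = min_max_normalize(data, new_min=-1.0, new_max=1.0)
print(symmetric)
```

The smallest input always maps to new_min and the largest to new_max; every other value keeps its relative position between them.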
c. z-Score Normalization (zero-mean Normalization)
- Values are normalized based on the mean and standard deviation of the data A.
- It is also called the Standard Deviation method.
- Unstructured data can be normalized using the z-score formula
$v' = \frac{v - \bar{x}}{S}$
where - $\bar{x}$ is the mean of the data
- S is the standard deviation
- v and v' are the old and new values of each entry in the data
Input:- 10, 15, 50, 60
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = 33.75$
$S = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} \approx 24.96$
Output:- -0.9515, -0.7512, 0.6510, 1.0517
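The z-score calculation can be sketched with Python's standard `statistics` module. The function name `z_score_normalize` is an assumption for illustration. `statistics.stdev` computes the sample standard deviation with n - 1 in the denominator, which matches the SD formula used here. Because the code keeps full precision, its results can differ in the last decimal place from values worked by hand with a rounded standard deviation.

```python
import statistics


def z_score_normalize(values):
    """Subtract the mean and divide by the sample standard deviation (n - 1)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD, denominator n - 1
    return [(v - mean) / sd for v in values]


data = [10, 15, 50, 60]
z = z_score_normalize(data)

print(statistics.mean(data))   # mean of the raw data
print(statistics.stdev(data))  # sample standard deviation of the raw data
print([round(v, 4) for v in z])
```

After the transformation the values have mean 0 and sample standard deviation 1, which is why the method is also called zero-mean normalization.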