2.10 Normalization
Normalization:
- Normalization is a data pre-processing technique that scales numerical feature data into a standard range. It helps all input features contribute on a comparable scale and often improves the convergence of machine learning algorithms.
- In simple terms, it scales numbers into a common range so that big values and small values come to a similar level.
- Scaling features to a standard range can improve model performance.
Need for Normalization
- Features may have different ranges
- Large-scale values can dominate small-scale values
- Some algorithms depend on distance calculations
Normalization helps:
- Improve model accuracy
- Speed up the training process
- Improve convergence in gradient-based algorithms
Example:
Imagine a dataset with two features:
- Age = 18 to 60
- Salary = 10,000 to 1,00,000
Salary values are much bigger than age values.
If we train a distance-based or gradient-based model on the raw data:
- The model may give more importance to salary
- Age may be largely ignored
So we normalize the data to bring all values into a similar range.
This helps:
- Improve accuracy
- Train the model faster
- Avoid bias due to large numbers
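To make the Age/Salary example concrete, here is a minimal Python sketch of how a raw Euclidean distance can be dominated by salary. The three people `a`, `b`, `c` and the helper `rescale` are hypothetical, chosen only for illustration; the ranges are the ones from the example (salary upper bound 1,00,000 = 100,000).

```python
import math

# Hypothetical people as (age, salary); values lie inside the example ranges.
a = (20, 50_000)
b = (60, 50_500)   # very different age, almost the same salary
c = (21, 60_000)   # almost the same age, noticeably different salary

AGE_RANGE = (18, 60)
SALARY_RANGE = (10_000, 100_000)

def rescale(person):
    """Rescale (age, salary) into [0, 1] using the known feature ranges."""
    age, salary = person
    return (
        (age - AGE_RANGE[0]) / (AGE_RANGE[1] - AGE_RANGE[0]),
        (salary - SALARY_RANGE[0]) / (SALARY_RANGE[1] - SALARY_RANGE[0]),
    )

# Distances on raw values: salary differences swamp age differences.
raw_ab = math.dist(a, b)
raw_ac = math.dist(a, c)

# Distances after rescaling: both features are on a comparable scale.
scaled_ab = math.dist(rescale(a), rescale(b))
scaled_ac = math.dist(rescale(a), rescale(c))

print("raw:    a-b =", round(raw_ab, 1), " a-c =", round(raw_ac, 1))
print("scaled: a-b =", round(scaled_ab, 3), " a-c =", round(scaled_ac, 3))
```

On the raw values, `b` looks closer to `a` than `c` does, even though `b` is 40 years older, because the 40-year age gap is tiny next to a salary gap of 10,000. After rescaling, the ordering flips and `c` becomes the nearer neighbour.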
Normalization rescales numerical columns to a standard scale without disturbing the relative differences between values.
It may:
- Rescale data to lie between 0 and 1
- Adjust data to have mean = 0 and standard deviation = 1
Types of Normalization
1) Min-Max Scaling (Rescaling)
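This method transforms values into a fixed range, usually 0 to 1, using the minimum and maximum of the feature:
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
Example:
Marks: 40, 60, 80
Minimum = 40
Maximum = 80
Now normalize 60:
$$x' = \frac{60 - 40}{80 - 40} = \frac{20}{40} = 0.5$$
So 60 becomes 0.5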
After normalization:
- 40 becomes 0
- 80 becomes 1
- All values lie between 0 and 1
Normalize All Values
For 40:
$$\frac{40 - 40}{80 - 40} = \frac{0}{40} = 0$$
For 80:
$$\frac{80 - 40}{80 - 40} = \frac{40}{40} = 1$$
All values now lie between 0 and 1.
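The min-max calculation above can be sketched in a few lines of Python. The function name `min_max_scale` is illustrative, and returning zeros for a constant column is an assumption of this sketch (the formula itself is undefined when maximum equals minimum).

```python
def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so the formula would divide by zero.
        # Returning zeros is one possible convention (an assumption here).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

marks = [40, 60, 80]
scaled = min_max_scale(marks)
print(scaled)  # [0.0, 0.5, 1.0]
```

In practice the minimum and maximum are usually computed on the training data only and then reused for new data, so values outside the training range can fall slightly below 0 or above 1.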
2) Standardization (Z-Score Normalization)
Standardization rescales data so that:
- Mean = 0
- Standard Deviation = 1
It transforms values based on how far they are from the mean, measured in standard deviations.
The result tells us how many standard deviations a value lies above or below the mean.
Example
Given Marks:
40, 60, 80
Step 1: Find Mean
$$\mu = \frac{40 + 60 + 80}{3} = \frac{180}{3} = 60$$
Step 2: Find Standard Deviation
First compute deviations from mean:
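- (40 − 60) = −20
- (60 − 60) = 0
- (80 − 60) = 20
Square them:
- (−20)² = 400
- 0² = 0
- 20² = 400
Average of squares (dividing by the number of values, 3):
$$\sigma^2 = \frac{400 + 0 + 400}{3} = \frac{800}{3} \approx 266.67$$
Standard deviation:
$$\sigma = \sqrt{266.67} \approx 16.33$$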
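Step 3: Apply Formula
$$z = \frac{x - \mu}{\sigma}$$
where μ is the mean (60) and σ is the standard deviation (about 16.33).
For 40:
$$z = \frac{40 - 60}{16.33} \approx -1.22$$
For 60:
$$z = \frac{60 - 60}{16.33} = 0$$
For 80:
$$z = \frac{80 - 60}{16.33} \approx 1.22$$
Final Standardized Values
- 40 becomes −1.22
- 60 becomes 0
- 80 becomes 1.22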
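The three standardization steps for the marks example can be sketched with Python's standard `statistics` module. The function name `standardize` is illustrative. The sketch assumes the population standard deviation (divide by n), which matches the "average of squares" step, and returning zeros for a constant column is a further assumption.

```python
import statistics

def standardize(values):
    """Return z-scores: (value - mean) / population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population std: divides by n, not n - 1
    if std == 0:
        # Constant column: z-scores are undefined; zeros are an assumed convention.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

marks = [40, 60, 80]
z_scores = standardize(marks)
print([round(z, 2) for z in z_scores])  # [-1.22, 0.0, 1.22]
```

Libraries differ on the divisor: NumPy's `numpy.std` defaults to the population form (`ddof=0`), while pandas' `Series.std` defaults to the sample form (`ddof=1`), which would give about ±1.0 for these three marks instead of ±1.22.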