2.10 Normalization

 Normalization:

  • Normalization is a data pre-processing technique used to scale numerical feature data into a standard range. It puts all input features on a comparable scale, so that no feature dominates just because its values are larger, and it often improves the convergence of machine learning algorithms.
  • In simple terms, normalization scales numbers into a common range, bringing big values and small values to a similar level.
  • Normalization scales features to a standard range to improve model performance.

Need for Normalization

  1. Features may have different ranges

  2. Large-scale values can dominate small-scale values

  3. Some algorithms depend on distance calculations

Normalization can help:

  • Improve model accuracy

  • Speed up training process

  • Improve convergence in gradient-based algorithms


Example:

Imagine a dataset:

  • Age = 18 to 60

  • Salary = 10,000 to 1,00,000

Salary values are much bigger than age values.

If we train a model on these raw values:

  • The model may give more importance to salary

  • Age may get ignored

So we normalize the data to bring all values into a similar range.

This helps:

  • Improve accuracy

  • Train model faster

  • Avoid bias due to large numbers 
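The distance problem described above can be sketched in plain Python. This is a minimal illustration, assuming Euclidean distance and three hypothetical people; the helper names `euclidean` and `scale` are made up for this example, and the feature ranges are the ones from the Age/Salary example.

```python
import math

# Feature ranges from the Age/Salary example (age 18 to 60, salary 10,000 to 1,00,000).
AGE_MIN, AGE_MAX = 18, 60
SALARY_MIN, SALARY_MAX = 10_000, 100_000


def euclidean(p, q):
    """Straight-line distance between two (age, salary) points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def scale(person):
    """Hypothetical helper: min-max scale each feature into [0, 1] using the fixed ranges above."""
    age, salary = person
    return ((age - AGE_MIN) / (AGE_MAX - AGE_MIN),
            (salary - SALARY_MIN) / (SALARY_MAX - SALARY_MIN))


# Hypothetical people as (age, salary) pairs.
a = (20, 50_000)
b = (60, 51_000)  # 40 years older than a, almost the same salary
c = (21, 60_000)  # almost the same age as a, salary 10,000 higher

print("raw:    a-b = %.2f, a-c = %.2f" % (euclidean(a, b), euclidean(a, c)))
print("scaled: a-b = %.2f, a-c = %.2f" % (euclidean(scale(a), scale(b)),
                                          euclidean(scale(a), scale(c))))
```

On the raw values, b looks far closer to a than c does, because the salary gap swamps a 40-year age gap. After scaling, the age difference counts again and c becomes the nearer neighbour.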


Normalization rescales numerical columns to a standard scale without disturbing the relative differences between values: the order of the values and their relative spacing stay the same.

Depending on the method, it may:

  • Rescale data between 0 and 1

  • Adjust data to have mean = 0 and standard deviation = 1


Types of Normalization

1) Min-Max Scaling (Rescaling)

This method transforms values into a fixed range, usually between 0 and 1.


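$$X' = \frac{X - X_{\min}}{X_{\max} - X_{\min}}$$

where X is the original value, X_min and X_max are the smallest and largest values of the feature, and X' is the scaled value.

This formula is called Min–Max Normalization. It is used in data pre-processing, especially in machine learning, to scale values into a fixed range, usually 0 to 1.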



Example:

Marks: 40, 60, 80

Minimum = 40
Maximum = 80

Now normalize 60:

$$\frac{60 - 40}{80 - 40} = \frac{20}{40} = 0.5$$

So 60 becomes 0.5

After normalization:

  • 40 becomes 0

  • 80 becomes 1

  • All values lie between 0 and 1


Normalize All Values

For 40:

$$\frac{40 - 40}{80 - 40} = \frac{0}{40} = 0$$

For 80:

$$\frac{80 - 40}{80 - 40} = \frac{40}{40} = 1$$

Final Normalized Values

  • 40 → 0

  • 60 → 0.5

  • 80 → 1

All values now lie between 0 and 1.
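The worked example above can be reproduced with a short plain-Python sketch of the min-max formula, (x − min) / (max − min). The function name `min_max_scale` is hypothetical, and the optional `new_min`/`new_max` parameters are an added assumption covering the "usually 0 to 1" remark: they stretch the result into any target range.

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal: the formula would divide by zero,
        # so this sketch maps every value to new_min.
        return [new_min for _ in values]
    span = new_max - new_min
    return [new_min + (x - lo) * span / (hi - lo) for x in values]


marks = [40, 60, 80]
print(min_max_scale(marks))             # [0.0, 0.5, 1.0]
print(min_max_scale(marks, -1.0, 1.0))  # [-1.0, 0.0, 1.0]
```

The guard for a constant column is one reasonable choice among several; without it, a feature whose values are all the same would raise a division-by-zero error.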


2) Standardization (Z-Score Normalization)

Standardization rescales data so that:

  • Mean = 0

  • Standard Deviation = 1

It transforms values based on how far they are from the mean, measured in standard deviations.


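$$z = \frac{x - \mu}{\sigma}$$

where x is the original value, μ is the mean and σ is the standard deviation of the feature.

The result tells how many standard deviations a value is away from the mean.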

Example

Given Marks:

40, 60, 80

Step 1: Find Mean

$$\text{Mean} = \frac{40 + 60 + 80}{3} = 60$$

Step 2: Find Standard Deviation

First, compute the deviations from the mean:

  • (40 − 60) = −20

  • (60 − 60) = 0

  • (80 − 60) = 20

Square them:

  • 400

  • 0

  • 400

Average of the squares (this is the variance):

$$\frac{400 + 0 + 400}{3} \approx 266.67$$

Standard deviation:

$$\sigma = \sqrt{266.67} \approx 16.33$$
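The steps above divide the sum of squared deviations by n, which gives the population standard deviation. A minimal Python sketch, using only the standard library, repeats the hand calculation and contrasts it with the sample standard deviation, which divides by n − 1 instead:

```python
import math
import statistics

marks = [40, 60, 80]

# Steps 1 and 2 by hand: mean, squared deviations, their average (the variance), square root.
mean = sum(marks) / len(marks)              # 60.0
squared = [(x - mean) ** 2 for x in marks]  # [400.0, 0.0, 400.0]
variance = sum(squared) / len(marks)        # divides by n: population variance
sigma = math.sqrt(variance)
print(round(sigma, 2))                      # 16.33

# The statistics module offers both conventions.
print(round(statistics.pstdev(marks), 2))   # 16.33 (population, divides by n)
print(round(statistics.stdev(marks), 2))    # 20.0  (sample, divides by n - 1)
```

Tools differ here: NumPy's `numpy.std` uses the population formula by default (`ddof=0`), while pandas' `Series.std` defaults to the sample formula (`ddof=1`), so the same column can give slightly different z-scores depending on which one is used.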

Step 3: Apply Formula

For 40:

$$\frac{40 - 60}{16.33} = \frac{-20}{16.33} \approx -1.22$$

For 60:

$$\frac{60 - 60}{16.33} = 0$$

For 80:

$$\frac{80 - 60}{16.33} = \frac{20}{16.33} \approx 1.22$$

Final Standardized Values

  • 40 → −1.22

  • 60 → 0

  • 80 → 1.22

After standardization:

  • Mean becomes 0

  • Values below the mean → negative

  • Values above the mean → positive

  • Standard deviation becomes 1
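The worked example above can be checked with a minimal plain-Python sketch of z-score standardization, z = (x − mean) / σ, assuming the population standard deviation used in the steps above. The function name `z_scores` is hypothetical, and returning zeros for a constant column is an assumed convention, since there is no spread to divide by.

```python
import statistics


def z_scores(values):
    """Standardize values: subtract the mean, then divide by the population standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        # All values are identical: no spread to divide by, so this sketch returns zeros.
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]


marks = [40, 60, 80]
z = z_scores(marks)
print([round(v, 2) for v in z])  # [-1.22, 0.0, 1.22]

# The standardized values have mean 0 and standard deviation 1, up to floating-point rounding.
print(statistics.fmean(z), statistics.pstdev(z))
```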































