4.2 Clustering in Unsupervised Learning
- Clustering is an unsupervised learning technique used to divide data into groups (clusters) so that similar data points end up in the same group.
- No labels are given; the algorithm finds the groups on its own.
- Groups are formed using similarity and distance measures such as Euclidean distance.
Example: Shopping
Online stores group customers like:
- Group 1 → People who buy electronics
- Group 2 → People who buy clothes
- Group 3 → People who buy groceries
This helps in giving recommendations.
- Same group → More similar
- Different groups → Less similar
Clustering helps to:
- Discover hidden patterns in data
- Organize large datasets
- Make better decisions (business, healthcare, etc.)
Common methods: K-means, Hierarchical, DBSCAN
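The K-means method mentioned above can be sketched in a few lines: repeatedly assign each point to its nearest centroid, then move each centroid to the mean of its assigned points. This is a minimal illustration using made-up 2-D data and a simple "first k points" initialization (real implementations usually pick starting centroids more carefully).

```python
import math

def kmeans(points, k, iters=20):
    """Minimal K-means sketch: alternate assignment and update steps."""
    # Simple deterministic initialization: first k points as centroids
    centroids = [points[i] for i in range(k)]
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid's cluster
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        for i, cluster in enumerate(clusters):
            if cluster:
                centroids[i] = tuple(sum(coord) / len(cluster)
                                     for coord in zip(*cluster))
    return centroids, clusters

# Two obvious groups of points (made-up data for illustration)
data = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centroids, clusters = kmeans(data, k=2)
```

On this data the algorithm separates the three low points from the three high points, with centroids near (1.33, 1.33) and (8.33, 8.33).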
Similarity in Clustering
- Similarity means how much two data points are alike.
- If two items are very similar, they will be placed in the same cluster.
Example: Fruits
Compare two fruits:
- Apple → Red, round
- Cherry → Red, round
These are very similar, so they go into the same group.
Another comparison:
- Banana → Yellow, long
- Apple → Red, round
These are less similar, so they go into different groups.
- More similarity → Same group
- Less similarity → Different groups
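To compare fruits numerically, the features have to be encoded as numbers first. This sketch uses a made-up 0–1 encoding (redness, roundness) for the fruits above, so "more similar" becomes "smaller distance":

```python
import math

# Made-up numeric encoding of each fruit as (redness, roundness),
# both on a 0-1 scale — illustrative values only
fruits = {
    "apple":  (0.90, 0.90),
    "cherry": (0.95, 0.95),
    "banana": (0.10, 0.20),
}

def distance(a, b):
    """Euclidean distance between two encoded fruits."""
    return math.dist(fruits[a], fruits[b])

print(distance("apple", "cherry"))  # small → same group
print(distance("apple", "banana"))  # large → different groups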
Distance Measures in Clustering
Distance measure tells us how far apart two data points are.
Instead of saying “similar” or “different”, we calculate a number (distance).
- Small distance → Very similar
- Large distance → Very different
Common Distance Measure:
- Euclidean Distance is the most commonly used method.
- It is the straight-line distance between two points.
Example: Students
Let’s take two students:
- Student A → (2 study hours, 50 marks)
- Student B → (3 study hours, 55 marks)
Distance between them is small, so they are similar.
Another pair:
- Student A → (2, 50)
- Student C → (10, 90)
Distance is large, so they are very different.
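The student comparison above can be computed directly with the Euclidean formula, √((x₁−x₂)² + (y₁−y₂)²):

```python
import math

# Students as (study hours, marks), from the example above
A = (2, 50)
B = (3, 55)
C = (10, 90)

dist_AB = math.dist(A, B)  # sqrt(1^2 + 5^2)   = sqrt(26)   ≈ 5.10
dist_AC = math.dist(A, C)  # sqrt(8^2 + 40^2)  = sqrt(1664) ≈ 40.79
```

A and B are close (≈5.10), while A and C are far apart (≈40.79). Note that in practice features on very different scales (hours 0–10 vs. marks 0–100) are usually rescaled first, so that one feature does not dominate the distance.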
- Clustering → Groups similar data points
- Similarity → How alike two data points are
- Distance → How far apart two data points are
Small distance = High similarity
Large distance = Low similarity
Advantages of Clustering
- No labeled data needed
- Finds hidden patterns
- Useful for large datasets
- Easy to understand (especially K-means)
Disadvantages of Clustering
- Choosing the correct number of clusters is difficult
- Sensitive to noise and outliers
- Results may vary based on the algorithm
Applications of Clustering
- Customer segmentation
- Recommendation systems (Netflix, Amazon)
- Social network analysis
- Medical diagnosis
- Image segmentation