4.3 Agglomerative Algorithm or Agglomerative Clustering
Agglomerative Clustering
Agglomerative clustering is a type of hierarchical clustering that follows a bottom-up approach.
- Start with each data point as its own cluster
- Gradually merge the closest clusters
- Continue until all points form one cluster (or desired groups are formed)
Agglomerative clustering is a hierarchical unsupervised learning algorithm that builds clusters by iteratively merging the closest data points or clusters.
How the Agglomerative Algorithm Works (Step by Step)
Step 1: Start
- Each data point is considered as a separate cluster.
Example:
Points → A, B, C, D
Clusters → {A}, {B}, {C}, {D}
Step 2: Calculate Distance
- Find the distance between every pair of clusters. (Euclidean distance is the usual choice.)
Step 3: Merge Closest Clusters
- Combine the two clusters that are closest.
Example: If A and B are closest → merge → {A, B}
Step 4: Update Distances
- Recalculate the distance between the newly merged cluster and each of the remaining clusters.
Step 5: Repeat
Keep merging until:
- Only one cluster remains, or
- Required number of clusters is reached
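The five steps above can be sketched in a few lines of Python. This is only a minimal illustration, not an optimized implementation: the function names `single_link` and `agglomerate`, the parameter `k`, and the sample coordinates are assumptions made for this sketch, and the distance between two clusters is taken as the distance between their closest points.

```python
import math
from itertools import combinations

def single_link(c1, c2):
    """Smallest Euclidean distance between a point in c1 and a point in c2."""
    return min(math.dist(p, q) for p in c1 for q in c2)

def agglomerate(points, k=1):
    """Merge the two closest clusters until only k clusters remain."""
    # Step 1: every point starts as its own cluster.
    clusters = [[p] for p in points]
    # Step 5: repeat until the required number of clusters is reached.
    while len(clusters) > k:
        # Steps 2 and 4: (re)compute the distance for every pair of clusters
        # and pick the pair with the smallest distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: single_link(clusters[ij[0]], clusters[ij[1]]),
        )
        # Step 3: merge the closest pair (i < j, so removing j keeps i valid).
        clusters[i] = clusters[i] + clusters.pop(j)
    return clusters

# Assumed sample coordinates, chosen only for illustration.
points = [(0, 0), (0, 1), (5, 5), (5, 7)]
print(agglomerate(points, k=2))
# [[(0, 0), (0, 1)], [(5, 5), (5, 7)]]
```

Recomputing every pairwise distance in each round keeps the sketch short, but it is wasteful; practical implementations store the distances and update only the entries affected by the merge.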
Example
Let’s take 4 data points: A, B, C and D.
Step 1: Clusters → {A}, {B}, {C}, {D}
Step 2: Find closest points
- A and B → distance = 1 (smallest)
Merge → {A, B}
Step 3:
Clusters → {A, B}, {C}, {D}
- C and D → distance = 2 (smallest remaining distance)
Merge → {C, D}
Step 4:
Clusters → {A, B}, {C, D}
Now only 2 clusters remain. If 2 is the required number of clusters, we stop here.
Final Clusters:
- Cluster 1 → {A, B}
- Cluster 2 → {C, D}
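The same example can be traced in Python from a table of pairwise distances. Only the distances A-B = 1 and C-D = 2 come from the example; the other four values are assumed here so that the trace can run, and the helper name `cluster_distance` is also just an illustrative choice.

```python
from itertools import combinations

# Pairwise distances between the four points. Only A-B = 1 and C-D = 2 come
# from the example; the other four values are assumed for illustration.
dist = {
    frozenset("AB"): 1, frozenset("CD"): 2,
    frozenset("AC"): 6, frozenset("AD"): 8,
    frozenset("BC"): 5, frozenset("BD"): 7,
}

def cluster_distance(c1, c2):
    # Distance between the closest pair of members (single linkage).
    return min(dist[frozenset((p, q))] for p in c1 for q in c2)

clusters = [("A",), ("B",), ("C",), ("D",)]
history = []  # (merged cluster, distance at which it was formed)

# Stop when the required number of clusters (2 in this example) is reached.
while len(clusters) > 2:
    c1, c2 = min(combinations(clusters, 2), key=lambda pair: cluster_distance(*pair))
    history.append((c1 + c2, cluster_distance(c1, c2)))
    clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    print("merge", c1, "+", c2, "->", clusters)

print("final clusters:", clusters)
# final clusters: [('A', 'B'), ('C', 'D')]
```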
Linkage Methods
- When merging clusters, we need to define how the distance between two clusters is calculated.
1. Single Linkage
Distance = minimum distance between two points, one from each cluster (closest points)
2. Complete Linkage
Distance = maximum distance between two points, one from each cluster (farthest points)
3. Average Linkage
Distance = average of the distances over all pairs of points, one from each cluster (overall average)
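The three linkage methods differ only in how they summarize the point-to-point distances between two clusters. A small sketch, assuming Euclidean distance and two made-up clusters (`left` and `right` are illustrative names and values):

```python
import math
from statistics import mean

def pair_distances(c1, c2):
    """All Euclidean distances between a point in c1 and a point in c2."""
    return [math.dist(p, q) for p in c1 for q in c2]

def single_linkage(c1, c2):
    return min(pair_distances(c1, c2))   # closest points

def complete_linkage(c1, c2):
    return max(pair_distances(c1, c2))   # farthest points

def average_linkage(c1, c2):
    return mean(pair_distances(c1, c2))  # average over all pairs

# Assumed sample clusters; the four pair distances are 3, 5, 2 and 4.
left = [(0, 0), (1, 0)]
right = [(3, 0), (5, 0)]

print(single_linkage(left, right))    # 2.0
print(complete_linkage(left, right))  # 5.0
print(average_linkage(left, right))   # 3.5
```

The same two clusters are 2, 5 or 3.5 units apart depending on the linkage, which is why the choice of linkage can change which clusters get merged first.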
Dendrogram (Tree Diagram)
Agglomerative clustering is often shown using a dendrogram.
- It is a tree-like diagram
- Shows how clusters are merged step by step, and at what distance each merge happens
- Cutting the tree at a chosen height gives the clusters that exist at that level
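A dendrogram is, in effect, the list of merges together with the distance (height) at which each merge happened. The sketch below records that list and then "cuts" it at a given height. It is an illustration only: the names `build_merges` and `cut`, the 1-D coordinates, and the use of single linkage are all assumptions made for this example, and no plotting is done.

```python
from itertools import combinations

def build_merges(points):
    """Return the merge history [(height, merged_labels), ...].

    `points` maps a label to a 1-D coordinate; single linkage is assumed.
    """
    def gap(c1, c2):
        return min(abs(points[a] - points[b]) for a in c1 for b in c2)

    clusters = [(label,) for label in points]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: gap(*pair))
        merges.append((gap(c1, c2), c1 + c2))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges

def cut(points, merges, height):
    """Clusters obtained by cutting the tree at `height`."""
    clusters = {label: {label} for label in points}
    for h, members in merges:
        if h > height:
            break  # single-linkage merge heights never decrease, so stop here
        merged = set(members)
        for label in merged:
            clusters[label] = merged
    return sorted({tuple(sorted(c)) for c in clusters.values()})

# Assumed 1-D coordinates, chosen only for illustration.
points = {"A": 0, "B": 1, "C": 5, "D": 7}
merges = build_merges(points)
for h, members in merges:
    print(f"height {h}: {members}")
print(cut(points, merges, height=3))
# [('A', 'B'), ('C', 'D')]
```

Cutting lower (for example at height 0.5) leaves every point in its own cluster, while cutting at height 4 or above returns a single cluster containing all four points.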
Advantages
- No need to choose the number of clusters in advance
- Easy to understand
- Works well for small datasets
Disadvantages
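- Slow for large datasets (every pair of clusters must be compared, so time and memory grow at least quadratically with the number of points)
- Cannot undo merging (once merged, always merged)
- Sensitive to noise and outliers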
Applications
- Document clustering
- Image segmentation
- Gene analysis
- Customer grouping