4.3 Agglomerative Algorithm or Agglomerative Clustering

Agglomerative Clustering:

Agglomerative clustering is a type of hierarchical clustering that follows a bottom-up approach.

  • Start with each data point as its own cluster

  • Gradually merge the closest clusters

  • Continue until all points form one cluster (or until the desired number of clusters is reached)

Agglomerative clustering is a hierarchical unsupervised learning algorithm that builds clusters by iteratively merging the closest data points or clusters.


How the Agglomerative Algorithm Works (Step-by-Step):

Step 1: Start

  • Each data point is treated as a separate cluster.

            Example:
                                Points → A, B, C, D
                                Clusters → {A}, {B}, {C}, {D}

Step 2: Calculate Distance

  • Compute the distance between every pair of clusters (Euclidean distance is the usual choice).
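A minimal Python sketch of Step 2, assuming four points A, B, C, D with made-up 2-D coordinates (chosen so that d(A, B) = 1 and d(C, D) = 2, matching the worked example). At the start every cluster holds one point, so this is just the Euclidean distance for every pair of points. `math.dist` needs Python 3.8 or later.

```python
import math
from itertools import combinations

# Hypothetical coordinates; only the labels A-D come from the text.
points = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (5.0, 0.0), "D": (7.0, 0.0)}

# Euclidean distance for every unordered pair of points (one cluster each).
distances = {
    (x, y): math.dist(points[x], points[y])
    for x, y in combinations(points, 2)
}

for (x, y), d in distances.items():
    print(f"d({x}, {y}) = {d}")
```

With n points there are n(n - 1)/2 pairs, here 6.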

Step 3: Merge Closest Clusters

  • Combine the two clusters that are closest.

            Example:  If A and B are closest → merge → {A, B}
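Step 3 can be sketched in Python as follows, again assuming hypothetical coordinates for A-D. Clusters are stored as `frozenset`s of labels so that merging two clusters is a set union. The cluster-to-cluster distance here uses single linkage (one of the linkage methods listed later); while every cluster still holds a single point, any linkage gives the same result.

```python
import math
from itertools import combinations

# Hypothetical coordinates for the points A-D.
points = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (5.0, 0.0), "D": (7.0, 0.0)}

# Step 1: every point is its own cluster (a frozenset of labels).
clusters = [frozenset([label]) for label in points]

def cluster_distance(c1, c2):
    """Single linkage: distance between the closest pair of points."""
    return min(math.dist(points[a], points[b]) for a in c1 for b in c2)

# Step 3: pick the closest pair of clusters and replace them with their union.
c1, c2 = min(combinations(clusters, 2), key=lambda cs: cluster_distance(*cs))
clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]

print(sorted(sorted(c) for c in clusters))  # [['A', 'B'], ['C'], ['D']]
```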

Step 4: Update Distances

  • Recalculate the distances between the newly merged cluster and every remaining cluster.
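A sketch of Step 4, assuming single linkage and hypothetical coordinates. Instead of recomputing every distance from the raw points, it keeps a table keyed by unordered pairs of clusters; after clusters a and b merge, the distance from the new cluster to any other cluster k is min(d(a, k), d(b, k)). The helper names `pair` and `merge_and_update` are illustrative, not from a library.

```python
import math
from itertools import combinations

# Hypothetical coordinates for the points A-D.
points = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (5.0, 0.0), "D": (7.0, 0.0)}

def pair(c1, c2):
    """Unordered key for the distance between two clusters."""
    return frozenset([c1, c2])

# Initial table: each point is its own cluster.
dist = {
    pair(frozenset([x]), frozenset([y])): math.dist(points[x], points[y])
    for x, y in combinations(points, 2)
}

def merge_and_update(dist, a, b):
    """Merge clusters a and b, then update distances with single linkage:
    d(a | b, k) = min(d(a, k), d(b, k)) for every other cluster k."""
    merged = a | b
    all_clusters = set().union(*dist)  # every cluster that appears in a key
    # Distances that involve neither a nor b are unchanged.
    new_dist = {key: d for key, d in dist.items() if a not in key and b not in key}
    for k in all_clusters - {a, b}:
        new_dist[pair(merged, k)] = min(dist[pair(a, k)], dist[pair(b, k)])
    return new_dist

dist = merge_and_update(dist, frozenset(["A"]), frozenset(["B"]))
for key, d in dist.items():
    print(sorted(sorted(c) for c in key), d)
```

Complete linkage would use `max` instead of `min` in the update line.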

Step 5: Repeat

Keep merging until:

  • Only one cluster remains, or

  • The required number of clusters is reached
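Putting Steps 1 to 5 together gives a naive Python sketch. The function name `agglomerative` and its parameters are illustrative (not a library API) and the coordinates are hypothetical. Distances are simply recomputed from the points at every iteration, which costs roughly O(n³) time overall and is only suitable for small examples.

```python
import math
from itertools import combinations

def agglomerative(points, n_clusters=1, linkage=min):
    """Naive bottom-up clustering (a teaching sketch, not an optimized version).

    points     -- dict mapping a label to a coordinate tuple
    n_clusters -- stop once this many clusters remain (1 = merge everything)
    linkage    -- reduces the list of point-to-point distances between two
                  clusters to one number: min = single, max = complete linkage
    """
    if not 1 <= n_clusters <= len(points):
        raise ValueError("n_clusters must be between 1 and the number of points")

    def cluster_distance(c1, c2):
        # Steps 2 and 4: distances are recomputed from the current clusters.
        return linkage([math.dist(points[a], points[b]) for a in c1 for b in c2])

    clusters = [frozenset([p]) for p in points]           # Step 1
    while len(clusters) > n_clusters:                     # Step 5
        # Step 3: merge the closest pair of clusters.
        c1, c2 = min(combinations(clusters, 2), key=lambda cs: cluster_distance(*cs))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return clusters

# Hypothetical coordinates for the points A-D.
points = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (5.0, 0.0), "D": (7.0, 0.0)}
print(sorted(sorted(c) for c in agglomerative(points, n_clusters=2)))
# [['A', 'B'], ['C', 'D']]
```

Passing `linkage=max` switches the sketch to complete linkage.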


Example

Let’s take 4 data points: A, B, C and D.

Step 1:
Clusters → {A}, {B}, {C}, {D}

Step 2: Find the closest pair

  • A and B → distance = 1 (smallest)

            Merge → {A, B}

Step 3:
Clusters → {A, B}, {C}, {D}

  • C and D → distance = 2 (smallest among the remaining clusters)

            Merge → {C, D}

Step 4:
Clusters → {A, B}, {C, D}

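Now only 2 clusters remain. If 2 clusters are required, we stop here; otherwise {A, B} and {C, D} are merged in one last step into a single cluster.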

Final Clusters:

  • Cluster 1 → {A, B}

  • Cluster 2 → {C, D}
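The worked example needs only pairwise distances, not coordinates. The Python sketch below replays it with single linkage: d(A, B) = 1 and d(C, D) = 2 are taken from the example, while the other four distances are hypothetical values chosen to be larger.

```python
from itertools import combinations

# d(A, B) and d(C, D) come from the example; the other values are hypothetical.
DIST = {
    frozenset(("A", "B")): 1.0, frozenset(("C", "D")): 2.0,
    frozenset(("B", "C")): 4.0, frozenset(("A", "C")): 5.0,
    frozenset(("B", "D")): 6.0, frozenset(("A", "D")): 7.0,
}

def single_linkage(c1, c2):
    """Distance between the closest pair of points across two clusters."""
    return min(DIST[frozenset((a, b))] for a in c1 for b in c2)

clusters = [frozenset([p]) for p in "ABCD"]
merges = []
while len(clusters) > 2:  # the example stops when 2 clusters remain
    c1, c2 = min(combinations(clusters, 2), key=lambda cs: single_linkage(*cs))
    merges.append((sorted(c1 | c2), single_linkage(c1, c2)))
    clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]

for members, height in merges:
    print("merge", members, "at distance", height)
print("final:", sorted(sorted(c) for c in clusters))
```

This prints the merge of A and B at distance 1.0, then C and D at distance 2.0, and the final clusters [['A', 'B'], ['C', 'D']].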


Linkage Methods 

  • When merging clusters, we need a rule that defines the distance between two clusters, since each cluster may contain many points.

1. Single Linkage

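Distance = minimum distance between a point in one cluster and a point in the other (the closest pair)

2. Complete Linkage

Distance = maximum distance between a point in one cluster and a point in the other (the farthest pair)

3. Average Linkage

Distance = average of the distances between all pairs of points, one from each cluster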
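The three linkage rules can be written as small Python functions. For illustration, clusters are assumed to be lists of coordinate tuples and the point-to-point distance is assumed to be Euclidean; the function names are hypothetical.

```python
import math
from statistics import mean

def pairwise(c1, c2):
    """All point-to-point Euclidean distances, one point from each cluster."""
    return [math.dist(p, q) for p in c1 for q in c2]

def single_linkage(c1, c2):
    return min(pairwise(c1, c2))    # closest pair

def complete_linkage(c1, c2):
    return max(pairwise(c1, c2))    # farthest pair

def average_linkage(c1, c2):
    return mean(pairwise(c1, c2))   # mean over all pairs

# Hypothetical 1-D clusters written as coordinate tuples.
left = [(0.0,), (1.0,)]
right = [(4.0,), (6.0,)]
print(single_linkage(left, right), complete_linkage(left, right), average_linkage(left, right))
```

Here the four pairwise distances are 4, 6, 3 and 5, so single linkage gives 3, complete linkage gives 6 and average linkage gives 4.5.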


Dendrogram (Tree Diagram)

Agglomerative clustering is often shown using a dendrogram.

  • It is a tree-like diagram

  • Shows how clusters are merged step by step; the height of each join is the distance at which that merge happened

  • Cutting the tree at a chosen height gives the clusters at that level
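A dendrogram is a drawing of the merge history: which clusters were joined and at what distance (height). The Python sketch below records that history for four points and then "cuts" it at a chosen height by replaying only the merges at or below that height. It assumes single linkage and a hypothetical distance table in which d(A, B) = 1 and d(C, D) = 2, as in the example; the function names are illustrative. For real data, SciPy's `scipy.cluster.hierarchy` module (`linkage`, `dendrogram`, `fcluster`) builds, plots and cuts this history.

```python
from itertools import combinations

# Hypothetical pairwise distances between the points A-D.
DIST = {
    frozenset(("A", "B")): 1.0, frozenset(("C", "D")): 2.0,
    frozenset(("B", "C")): 4.0, frozenset(("A", "C")): 5.0,
    frozenset(("B", "D")): 6.0, frozenset(("A", "D")): 7.0,
}
LABELS = ["A", "B", "C", "D"]

def linkage_distance(c1, c2):
    """Single linkage: closest pair of points across the two clusters."""
    return min(DIST[frozenset((a, b))] for a in c1 for b in c2)

def build_history(labels):
    """Merge all the way to one cluster, recording (c1, c2, height) per merge.
    This list is the information a dendrogram draws."""
    clusters = [frozenset([x]) for x in labels]
    history = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda cs: linkage_distance(*cs))
        history.append((c1, c2, linkage_distance(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return history

def cut(labels, history, height):
    """Cut the tree at `height`: replay only merges at or below that height."""
    clusters = {frozenset([x]) for x in labels}
    for c1, c2, h in history:
        if h > height:
            break  # with single linkage, merge heights never decrease
        clusters -= {c1, c2}
        clusters.add(c1 | c2)
    return sorted(sorted(c) for c in clusters)

history = build_history(LABELS)
for h in (0.5, 1.5, 3.0, 5.0):
    print(h, cut(LABELS, history, h))
```

Cutting at height 3.0 falls between the merges at 2.0 and 4.0, which gives the two clusters {A, B} and {C, D}.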


Advantages

  • No need to choose the number of clusters in advance (it can be chosen later by cutting the dendrogram)

  • Easy to understand

  • Works well for small datasets

Disadvantages

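  • Slow for large datasets (the basic algorithm stores an n × n distance matrix, so memory grows as O(n²), and the naive version takes O(n³) time)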

  • Cannot undo merging (once merged, always merged)

  • Sensitive to noise and outliers

Applications

  • Document clustering

  • Image segmentation

  • Gene analysis

  • Customer grouping