2.9 Feature Pruning

Feature Pruning: -

  • Feature pruning is a technique in machine learning used to simplify a model by removing unnecessary parts, such as branches in a decision tree or redundant weights in a neural network.
  • The main aim is to make the model smaller, faster, and more efficient while maintaining good accuracy.
  • When a model becomes too complex, it may learn noise from the training data. This leads to overfitting, where the model performs well on training data but poorly on new data. Pruning helps reduce this problem.
  • By removing unnecessary components, pruning reduces overfitting, increases interpretability, and improves computational efficiency, giving a smaller, faster model that is better suited to real-world applications.

Objectives of Feature Pruning

  1. Reduce Overfitting
    Pruning removes parts of the model that capture noise or irrelevant patterns in the training data. This improves performance on test data.

  2. Improve Model Interpretability
    A simpler model is easier to understand and explain. For example, a smaller decision tree is easier to analyze.

  3. Optimize Computational Efficiency
    A pruned model requires less memory and computation. This makes training and prediction faster.

Why Feature Pruning Is Needed

In real-world datasets:

  • There may be irrelevant or redundant features

  • Models may grow too large

  • Training may become slow

  • Prediction may require more memory

Pruning solves these problems by keeping only the important parts of the model.

Pruning can be applied in two main stages:

1. Pre-Pruning (Early Stopping)

Pre-pruning stops model growth early.

Example in Decision Tree:

  • Stop splitting if:

    • Tree depth reaches a limit

    • A node contains fewer than the minimum number of samples

    • Information gain is low

This prevents the tree from becoming too complex.

Advantage: Saves training time
Disadvantage: May stop too early and reduce accuracy
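As a sketch, the pre-pruning conditions above can be set in scikit-learn (a library choice assumed here for illustration) through the tree's hyperparameters; the Iris dataset is used only as a convenient example:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Unrestricted tree: keeps splitting until every leaf is pure
full_tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# Pre-pruned tree: growth is stopped early by the three conditions above
pruned_tree = DecisionTreeClassifier(
    max_depth=3,                 # stop when tree depth reaches a limit
    min_samples_leaf=5,          # stop when a node would get too few samples
    min_impurity_decrease=0.01,  # skip splits whose information gain is low
    random_state=0,
).fit(X, y)

print("full depth:", full_tree.get_depth())
print("pruned depth:", pruned_tree.get_depth())
```

The pruned tree trains on the same data but never grows past depth 3, which is exactly the "stop too early" trade-off the disadvantage describes.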

2. Post-Pruning (Backward Pruning)

Post-pruning removes branches after the full model is built.

Steps:

  1. Build full model

  2. Evaluate performance

  3. Remove weak branches

  4. Compare accuracy

  5. Keep pruned version if performance improves

This is generally more effective than pre-pruning.
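The five steps above map closely onto cost-complexity pruning in scikit-learn (an assumed library choice; the breast-cancer dataset is just an illustrative example): build the full tree, evaluate candidates at increasing pruning strengths, and keep the best one.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Step 1: build the full model
full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Steps 2-5: evaluate progressively stronger pruning levels (ccp_alpha),
# compare accuracy, and keep the pruned version when it does at least as well
path = full.cost_complexity_pruning_path(X_train, y_train)
best = full
for alpha in path.ccp_alphas:
    candidate = DecisionTreeClassifier(
        random_state=0, ccp_alpha=max(float(alpha), 0.0)  # guard tiny negatives
    ).fit(X_train, y_train)
    if candidate.score(X_test, y_test) >= best.score(X_test, y_test):
        best = candidate

print("full tree nodes:", full.tree_.node_count)
print("pruned tree nodes:", best.tree_.node_count)
```

Because candidates only replace the current best when their held-out accuracy is at least as good, the kept tree is never less accurate on the test split than the full tree.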


Types of Pruning: -

1) Structured Pruning:

Structured pruning removes entire components such as neurons, filters, or layers in neural networks. It changes the model architecture and reduces computational cost significantly.

  • Removes groups of parameters

  • Reduces memory usage

  • Maintains organized structure

Example:
Suppose a neural network has 5 hidden neurons. If 2 neurons contribute very little to output, structured pruning removes those 2 neurons completely. The network now has 3 neurons, making it smaller and faster.
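The 5-neuron example can be sketched with NumPy weight matrices (the scoring rule, ranking neurons by the size of their outgoing weights, is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy one-hidden-layer network: 4 inputs -> 5 hidden neurons -> 1 output
W1 = rng.normal(size=(4, 5))  # input-to-hidden weights
W2 = rng.normal(size=(5, 1))  # hidden-to-output weights

# Score each hidden neuron by its outgoing weight magnitude; neurons with
# tiny outgoing weights contribute very little to the output
scores = np.abs(W2).sum(axis=1)
keep = np.argsort(scores)[2:]  # drop the 2 weakest neurons, keep 3

# Structured pruning: delete whole rows/columns, shrinking the architecture
W1_pruned = W1[:, keep]
W2_pruned = W2[keep, :]

print(W1_pruned.shape, W2_pruned.shape)  # (4, 3) (3, 1)
```

Note that the pruned matrices are genuinely smaller, which is what lets structured pruning cut memory and compute, not just zero things out.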

2) Unstructured Pruning:

Unstructured pruning removes individual weights instead of entire neurons or layers. It creates sparsity but does not change the overall architecture.

  • Removes specific weights

  • Creates sparse connections

  • Architecture remains the same

Example:
If a neural network has 1000 weights and 200 of them are very small, unstructured pruning sets those 200 weights to zero. The structure remains the same, but fewer weights are active.
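The 1000-weight example corresponds to magnitude-based pruning, sketched here in NumPy (the random weight vector stands in for a real network's weights):

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=1000)  # stand-in for a trained weight vector

# Unstructured pruning: zero out the 200 smallest-magnitude weights.
# The 201st-smallest magnitude serves as the cutoff threshold.
threshold = np.sort(np.abs(weights))[200]
mask = np.abs(weights) >= threshold
pruned = weights * mask

print("total weights:", weights.size)
print("active weights:", int(np.count_nonzero(pruned)))
```

The array keeps its original shape; only the number of nonzero entries changes, which is why unstructured pruning needs sparse-matrix support to actually run faster.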

Example 1: Decision Tree

Consider a decision tree built to predict whether a student will pass based on:

  • Study hours

  • Attendance

  • Favorite color

If “favorite color” does not affect performance but appears in the tree due to noise, pruning removes that branch. The tree becomes simpler and focuses only on important features like study hours and attendance.

Example 2: Student Performance Prediction

Features:

  • Study hours

  • Attendance

  • Internet usage

  • Shoe size

If shoe size has no relation to performance, pruning removes that feature from the model.

The final model focuses only on meaningful features.
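A small sketch of this example, using synthetic student data where pass/fail depends only on the meaningful features (the data-generating rule is an assumption for illustration); a decision tree's feature importances then expose shoe size as prunable:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 500

study_hours = rng.uniform(0, 10, n)
attendance = rng.uniform(0, 100, n)
internet = rng.uniform(0, 8, n)
shoe_size = rng.uniform(5, 12, n)  # irrelevant by construction

# Passing depends only on study hours, attendance, and internet usage
passed = (0.5 * study_hours + 0.05 * attendance - 0.2 * internet > 4).astype(int)

X = np.column_stack([study_hours, attendance, internet, shoe_size])
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, passed)

names = ["study_hours", "attendance", "internet", "shoe_size"]
for name, importance in zip(names, tree.feature_importances_):
    print(f"{name}: {importance:.3f}")
```

Shoe size should receive a near-zero importance score, which is the signal for dropping it from the model.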

