ML 1.9 DECISION TREE MODEL OF LEARNING

Decision Tree Learning

  • A Decision Tree is a supervised learning model used for both classification and regression.
  • It learns by splitting the data into smaller, purer groups using simple rules until it can confidently make a prediction.
  • Each split divides the data into subsets based on feature values, forming a tree-like structure of rules.
  • In effect, the model asks a sequence of simple questions about the data, step by step like a flowchart, until it reaches a decision.
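
To make this concrete, here is a minimal sketch of fitting a decision tree with scikit-learn on a tiny made-up weather-style dataset (the library choice, the numeric encodings, and the data values are illustrative assumptions, not taken from this post):

from sklearn.tree import DecisionTreeClassifier, export_text

# Tiny made-up dataset: each row is [outlook, humidity], encoded as numbers
# outlook: 0 = Sunny, 1 = Overcast, 2 = Rainy; humidity: 0 = High, 1 = Normal
X = [[0, 0], [0, 1], [1, 0], [1, 1], [2, 0], [2, 1]]
y = ["No", "Yes", "Yes", "Yes", "No", "Yes"]   # whether to play

tree = DecisionTreeClassifier(criterion="entropy", random_state=0)
tree.fit(X, y)

# The learned model is a flowchart of simple questions on the features
print(export_text(tree, feature_names=["outlook", "humidity"]))
print(tree.predict([[0, 0]]))   # Sunny + High humidity -> "No" on this toy data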




Key Components of a Decision Tree:

·        Root Node: The topmost node, representing the entire dataset; it asks the first question about the data and is split into subsets.

·        Decision Nodes (Internal Nodes): Intermediate nodes that ask further questions based on specific feature values, leading to further splits.

·        Branches: Connections between nodes that represent the outcome (answer) of a decision, leading to the next node.

·        Leaf Nodes (Terminal Nodes): Nodes with no further splits that represent the final output or prediction.


At each step, the model chooses the question that best separates the data. For classification, this is often done using measures like Gini Index or Entropy (Information Gain).

Problem: Decide whether to play cricket based on weather conditions.

Features

  • Outlook: Sunny, Overcast, Rainy

  • Humidity: High, Normal

  • Wind: Strong, Weak

Target

  • Play Cricket: Yes or No

Learning process

  1. The tree checks Outlook first.

  2. If Outlook = Overcast → Play = Yes

  3. If Outlook = Sunny → check Humidity

    • High → No

    • Normal → Yes

  4. If Outlook = Rainy → check Wind

    • Strong → No

    • Weak → Yes

Prediction

  • Input: Outlook = Sunny, Humidity = High

  • Path followed: Sunny → High

  • Output: No
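
Because the learned rules above are explicit, they translate directly into nested conditionals. The following sketch hand-codes that tree in Python (the function name and the sample call are illustrative):

def play_cricket(outlook, humidity, wind):
    # Hand-coded version of the tree learned above
    if outlook == "Overcast":
        return "Yes"
    if outlook == "Sunny":
        # Sunny branch: the decision depends on humidity
        return "No" if humidity == "High" else "Yes"
    if outlook == "Rainy":
        # Rainy branch: the decision depends on wind
        return "No" if wind == "Strong" else "Yes"
    raise ValueError("unexpected outlook value")

# Same prediction as above; wind is not used on the Sunny -> High path
print(play_cricket("Sunny", "High", "Weak"))   # -> "No"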


Decision trees work as follows:

1) Splitting: The dataset is divided into subsets based on feature values. The feature to split on is typically selected using criteria such as Gini impurity or information gain.

 

    Splitting Criteria:

    a) Gini Index (Classification):

  • Measures the probability of misclassifying a randomly chosen sample.
  • The Gini index is a measure of impurity (or purity) used when building a decision tree in the CART (Classification and Regression Trees) algorithm.
  • An attribute with a lower Gini index is preferred over one with a higher Gini index.
  • The CART algorithm uses the Gini index and creates only binary splits.
  • The Gini index can be calculated using the formula below:

 Formula:

                     Gini(S) = 1 - Σ (Pj)²

                    Where,

                                    Pj = proportion of samples in S belonging to class j
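
As a quick sanity check of the formula, a short plain-Python sketch (the helper name is an illustrative choice) can compute the Gini index from class counts:

def gini_index(counts):
    # Gini(S) = 1 - sum of squared class proportions
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

print(gini_index([4, 3]))   # mixed node (4 Yes, 3 No) -> about 0.49
print(gini_index([5, 0]))   # pure node -> 0.0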


  b) Entropy / Information Gain (Classification): Entropy is a metric that measures the impurity (randomness) of a set of samples. It can be calculated as:

Formula:

                     Entropy(S) = - P(yes) log2 P(yes) - P(no) log2 P(no)

                    Where,

                                    S = the set of samples

                                    P(yes) = proportion of "yes" samples in S

                                    P(no) = proportion of "no" samples in S


  • Information gain measures the reduction in entropy after a dataset is split on an attribute.
  • It tells us how much information a feature provides about the class.
  • Based on the information gain values, we choose which attribute to split on and build the decision tree.
  • A decision tree algorithm always tries to maximize information gain: the node/attribute with the highest information gain is split first. It can be calculated using the formula below:

                Information Gain = Entropy(S) - [Weighted Avg × Entropy(each subset)]
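
A short plain-Python sketch (helper names are illustrative choices) makes the weighted-average term concrete by computing entropy and information gain from class counts:

import math

def entropy(counts):
    # Entropy(S) = -sum(p * log2(p)); a pure node (single class) has zero entropy
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    return 0.0 if len(probs) <= 1 else -sum(p * math.log2(p) for p in probs)

def information_gain(parent_counts, child_counts_list):
    # Entropy(S) minus the size-weighted average entropy of the child subsets
    total = sum(parent_counts)
    weighted = sum(sum(child) / total * entropy(child) for child in child_counts_list)
    return entropy(parent_counts) - weighted

# Example: a node with 4 Yes / 3 No, split into three pure subsets
print(round(information_gain([4, 3], [[0, 3], [2, 0], [2, 0]]), 3))   # 0.985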


Pruning:

  • Pre-pruning: stop splitting early (e.g., by limiting the maximum depth with max_depth).
  • Post-pruning: remove unnecessary branches after the tree has been grown (both styles are shown in the sketch after point 3 below).


2) Stopping Criteria: The splitting process continues until a predefined stopping criterion is met, such as reaching a maximum depth or having a minimum number of samples in a node.

3) Prediction: For a new input, the tree is traversed from the root to a leaf node by following the decisions based on feature values, resulting in the final prediction.
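
In practice, pre-pruning, stopping criteria, and post-pruning correspond to a few estimator parameters. The sketch below shows how this might look with scikit-learn (an assumed library choice; the Iris dataset and the parameter values are arbitrary examples):

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Pre-pruning / stopping criteria: limit depth and require a minimum leaf size
pre_pruned = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5, random_state=0)
pre_pruned.fit(X, y)

# Post-pruning: cost-complexity pruning trims branches after the tree is grown
post_pruned = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0)
post_pruned.fit(X, y)

# Prediction: a new input is routed from the root to a leaf
print(pre_pruned.predict(X[:1]))   # class of the first sample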


Advantages

  • Easy to understand and explain

  • Works with both numbers and categories

  • No need for feature scaling

Limitations

  • Can overfit if the tree grows too deep

  • Small data changes can change the structure

  • Usually less accurate than ensemble methods unless tuned


Step 1: Take a small dataset

                Problem: Decide whether to Play Tennis

Dataset (7 samples):

                Outlook      Play Tennis
                Sunny        No
                Sunny        No
                Sunny        No
                Overcast     Yes
                Overcast     Yes
                Rainy        Yes
                Rainy        Yes

Target column: Play Tennis (Yes / No)
Feature column: Outlook

Step 2: Calculate Entropy of the whole dataset

Formula:

                     Entropy(S) = - P(yes) log2 P(yes) - P(no) log2 P(no)

Count values:

  • Yes = 4

  • No = 3

  • Total = 7

                     Entropy(S) = - (4/7) log2(4/7) - (3/7) log2(3/7) ≈ 0.985

This tells us how impure the dataset is.

Step 3: Split the data by the feature (Outlook)

Outlook = Sunny


Yes = 0, No = 3
Entropy = 0 (pure node)

Outlook = Overcast



Yes = 2, No = 0
Entropy = 0

Outlook = Rainy



Yes = 2, No = 0
Entropy = 0

Step 4: Calculate Weighted Entropy after split

                     Weighted Entropy = (3/7) × 0 + (2/7) × 0 + (2/7) × 0 = 0

Step 5: Calculate Information Gain

                     Information Gain = Entropy(S) - Weighted Entropy = 0.985 - 0 = 0.985

This is the maximum possible gain, meaning Outlook is a perfect feature to split on.

Step 6: Choose the best feature

The feature with highest Information Gain becomes the root node.

So the tree starts as:

                Outlook?
                    Sunny    → No
                    Overcast → Yes
                    Rainy    → Yes

The tree stops here because all leaf nodes are pure.

Step 7: Making a prediction

New input:   Outlook = Sunny

Follow the path:    Outlook → Sunny → No

Prediction: Do not play tennis
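
The numbers in Steps 2-5 can be double-checked with a few lines of plain Python (an illustrative verification sketch, not part of the original worked example):

import math

def entropy(counts):
    # Entropy from class counts; a pure node (single class) has zero entropy
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    return 0.0 if len(probs) <= 1 else -sum(p * math.log2(p) for p in probs)

# Step 2: entropy of the whole dataset (4 Yes, 3 No)
print(round(entropy([4, 3]), 3))                  # 0.985

# Steps 3-4: the Outlook subsets are all pure, so the weighted entropy is 0
subsets = [[0, 3], [2, 0], [2, 0]]                # Sunny, Overcast, Rainy
weighted = sum(sum(s) / 7 * entropy(s) for s in subsets)
print(weighted)                                   # 0.0

# Step 5: Information Gain = 0.985 - 0 = 0.985
print(round(entropy([4, 3]) - weighted, 3))       # 0.985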
