ML 1.9 DECISION TREE MODEL OF LEARNING
Decision Tree Model of Learning
- A Decision Tree is a supervised learning model used for both classification and regression.
- It learns by splitting the data into smaller, purer groups using simple rules until it can confidently make a prediction.
- Each decision splits the data into subsets based on feature values, forming a tree-like structure of rules.
- In effect, the model asks a sequence of simple questions about the data and splits it step by step, like a flowchart, until it reaches a decision.
Key Components of a Decision Tree:
- Root Node: The topmost node, representing the entire dataset, which is then split into subsets.
- Decision Nodes: Intermediate nodes that represent decisions based on specific features, leading to further splits.
- Leaf Nodes (Terminal Nodes): Nodes that represent the final output or decision (the final prediction), containing no further splits.
- Branches: Connections between nodes that represent the outcome of a decision, leading to the next node.
Problem: Decide whether to play cricket based on weather conditions.
Features
- Outlook: Sunny, Overcast, Rainy
- Humidity: High, Normal
- Wind: Strong, Weak
Target
- Play Cricket: Yes or No
Learning process
- The tree checks Outlook first.
- If Outlook = Overcast → Play = Yes
- If Outlook = Sunny → check Humidity
  - High → No
  - Normal → Yes
- If Outlook = Rainy → check Wind
  - Strong → No
  - Weak → Yes
Prediction
- Input: Outlook = Sunny, Humidity = High
- Path followed: Sunny → High
- Output: No
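As an illustrative sketch, the snippet below trains a decision tree on a small, hypothetical cricket dataset built to follow the rules above. The specific rows and the use of scikit-learn's DecisionTreeClassifier are assumptions for illustration, not part of the original example.

# A minimal sketch: training a decision tree on a hypothetical "play cricket"
# dataset that follows the rules described above. The rows below are assumed
# for illustration; they are not from the original notes.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

data = pd.DataFrame({
    "Outlook":  ["Sunny", "Sunny", "Overcast", "Rainy", "Rainy", "Overcast", "Sunny", "Rainy"],
    "Humidity": ["High", "Normal", "High", "Normal", "High", "Normal", "High", "Normal"],
    "Wind":     ["Weak", "Weak", "Weak", "Strong", "Weak", "Strong", "Strong", "Weak"],
    "Play":     ["No", "Yes", "Yes", "No", "Yes", "Yes", "No", "Yes"],
})

# One-hot encode the categorical features so the tree can split on them.
X = pd.get_dummies(data[["Outlook", "Humidity", "Wind"]])
y = data["Play"]

tree = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))  # learned rules

# Predict for a new day: Outlook = Sunny, Humidity = High, Wind = Weak.
new_day = pd.DataFrame([{"Outlook": "Sunny", "Humidity": "High", "Wind": "Weak"}])
new_day = pd.get_dummies(new_day).reindex(columns=X.columns, fill_value=0)
print(tree.predict(new_day))  # expected: ["No"] under the rules above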
Decision Trees work as follows:
1. Splitting: The dataset is divided into subsets based on feature values. The selection of features for splitting is typically based on criteria like Gini impurity or information gain.
Splitting Criteria:
a) Gini Index (Classification):
- Measures the probability of misclassifying a sample.
- The Gini index is a measure of impurity or purity used while creating a decision tree in the CART (Classification and Regression Tree) algorithm.
- An attribute with a low Gini index should be preferred over one with a high Gini index.
- It only creates binary splits; the CART algorithm uses the Gini index to create these binary splits.
- The Gini index can be calculated using the formula below:
Formula:
Gini(S) = 1 - Σ (Pi)^2
Where,
S = the set of samples at the node
Pi = probability (proportion) of samples in S belonging to class i
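For a concrete feel for this formula, the sketch below computes the Gini index for a set of class labels; the helper function and the example counts are illustrative assumptions.

# A minimal sketch of the Gini index for a set of class labels.
from collections import Counter

def gini(labels):
    """Gini(S) = 1 - sum(p_i^2) over the classes present in `labels`."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

# Example: a node with 4 "Yes" and 3 "No" samples (illustrative counts).
labels = ["Yes"] * 4 + ["No"] * 3
print(round(gini(labels), 3))  # 1 - (4/7)^2 - (3/7)^2 ≈ 0.490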
b) Entropy / Information Gain (Classification):
- Entropy is a metric that measures the impurity of a given attribute; it specifies the randomness in the data. Entropy can be calculated as:
Formula:
Entropy(S) = - P(yes) log2 P(yes) - P(no) log2 P(no)
Where,
S = the set of samples
P(yes) = probability of yes
P(no) = probability of no
- Information gain is the measurement of the change in entropy after the dataset is segmented on an attribute.
- It calculates how much information a feature provides about the class.
- According to the value of information gain, we split the node and build the decision tree.
- A decision tree algorithm always tries to maximize information gain, and the node/attribute with the highest information gain is split first. It can be calculated using the formula below:
Information Gain = Entropy(S) - [ (Weighted Avg) * Entropy(each feature) ]
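To make these formulas concrete, the sketch below computes entropy and information gain for a small set of labels; the helper names and the example split are made up for illustration.

# A minimal sketch of entropy and information gain (assumed helper names).
import math
from collections import Counter

def entropy(labels):
    """Entropy(S) = -sum(p_i * log2(p_i)) over the classes in `labels`."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, groups):
    """IG = Entropy(S) - weighted average entropy of the subsets in `groups`."""
    n = len(labels)
    weighted = sum(len(g) / n * entropy(g) for g in groups)
    return entropy(labels) - weighted

# Example: 7 samples (4 Yes, 3 No) split by a feature into two subsets.
labels = ["Yes"] * 4 + ["No"] * 3
groups = [["Yes", "Yes", "Yes"], ["Yes", "No", "No", "No"]]
print(round(entropy(labels), 3))                  # ≈ 0.985
print(round(information_gain(labels, groups), 3)) # ≈ 0.522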
Pruning:
- Pre-pruning: Stop splitting early (e.g., max_depth).
- Post-pruning: Remove unnecessary branches after the tree has grown.
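In scikit-learn, for example, pre-pruning corresponds to growth-limiting parameters and post-pruning to cost-complexity pruning (ccp_alpha). The sketch below is one possible illustration; the iris dataset and the parameter values are arbitrary choices, not part of these notes.

# A minimal sketch of pre- and post-pruning with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Pre-pruning: limit growth while the tree is being built.
pre_pruned = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5).fit(X, y)

# Post-pruning: grow fully, then prune back with cost-complexity pruning.
post_pruned = DecisionTreeClassifier(ccp_alpha=0.02).fit(X, y)

print(pre_pruned.get_depth(), post_pruned.get_depth())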
2. Stopping Criteria: The splitting process continues until a predefined stopping criterion is met, such as reaching a maximum depth or having a minimum number of samples in a node.
3. Prediction: For a new input, the tree is traversed from the root to a leaf node by following the decisions based on feature values, resulting in the final prediction.
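As a hedged sketch of how stopping criteria and prediction look in practice, the snippet below limits tree growth and then traces the root-to-leaf path a new sample follows (again using the iris data and arbitrary parameter values purely for illustration).

# A sketch of stopping criteria and prediction with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Stopping criteria: maximum depth and minimum samples per node.
clf = DecisionTreeClassifier(max_depth=4, min_samples_split=4, min_samples_leaf=2)
clf.fit(X, y)

# Prediction: each new sample is routed from the root to one leaf.
sample = X[:1]
print("Predicted class:", clf.predict(sample))
print("Leaf node reached:", clf.apply(sample))              # index of the leaf
print("Nodes visited:", clf.decision_path(sample).indices)  # root-to-leaf path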
Advantages
- Easy to understand and explain
- Works with both numbers and categories
- No need for feature scaling
Limitations
- Can overfit if the tree grows too deep
- Small changes in the data can change the structure
- Usually less accurate than ensemble methods unless tuned
Step 1: Take a small dataset
Problem: Decide whether to Play Tennis
Target column: Play Tennis (Yes / No)
Feature column: Outlook
Step 2: Calculate Entropy of the whole dataset
Formula:
Entropy(S) = - P(yes) log2 P(yes) - P(no) log2 P(no)
Count values:
- Yes = 4
- No = 3
- Total = 7
Entropy(S) = -(4/7) log2(4/7) - (3/7) log2(3/7) ≈ 0.985
This tells us how impure the dataset is.
Step 3: Split the data by the feature (Outlook)
Step 4: Calculate the Weighted Entropy after the split
Step 5: Calculate Information Gain = Entropy(S) - Weighted Entropy
Step 6: Choose the best feature
The feature with the highest Information Gain becomes the root node.
So the tree starts with Outlook as the root node, with one branch for each of its values (Sunny, Overcast, Rainy).
Step 7: Making a prediction
New input: Outlook = Sunny
Follow the path: Outlook → Sunny → No
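The sketch below reproduces Steps 2 through 6 numerically for a hypothetical Play Tennis table consistent with the counts above; the per-row Outlook values are assumed, since the notes only give the totals (Yes = 4, No = 3).

# A minimal sketch of Steps 2-6 in code. The Outlook values per row are
# assumed for illustration; the helpers follow the formulas above.
import math
from collections import Counter, defaultdict

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

# Hypothetical Play Tennis data: (Outlook, Play Tennis)
rows = [
    ("Sunny", "No"), ("Sunny", "No"),
    ("Overcast", "Yes"), ("Overcast", "Yes"),
    ("Rainy", "Yes"), ("Rainy", "Yes"), ("Rainy", "No"),
]
labels = [play for _, play in rows]

# Step 2: entropy of the whole dataset.
total_entropy = entropy(labels)  # ≈ 0.985

# Step 3: split the data by Outlook.
groups = defaultdict(list)
for outlook, play in rows:
    groups[outlook].append(play)

# Step 4: weighted entropy after the split.
weighted = sum(len(g) / len(rows) * entropy(g) for g in groups.values())

# Step 5: information gain for Outlook.
gain = total_entropy - weighted
print(round(total_entropy, 3), round(weighted, 3), round(gain, 3))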