ML 1.9 DECISION TREE MODEL OF LEARNING
Decision Tree Model of Learning
- A Decision Tree is a supervised learning model used for both classification and regression.
- It learns by splitting the data into smaller, purer groups using simple rules until it can confidently make a prediction.
- Each decision splits the data into subsets based on feature values, forming a tree-like structure of rules.
- In effect, the model asks a sequence of simple questions about the data and splits it step by step, like a flowchart, until it reaches a decision.
Key Components of a Decision Tree:
- Root Node: The topmost node, representing the entire dataset, which is then split into subsets.
- Decision Nodes: Intermediate nodes that represent decisions based on specific features, leading to further splits.
- Leaf Nodes (Terminal Nodes): Nodes that represent the final output or decision (the final prediction), containing no further splits.
- Branches: Connections between nodes that represent the outcome of a decision, leading to the next node.
Problem: Decide whether to play cricket based on weather conditions.
Features
- Outlook: Sunny, Overcast, Rainy
- Humidity: High, Normal
- Wind: Strong, Weak
Target
- Play Cricket: Yes or No
Learning process
- The tree checks Outlook first.
- If Outlook = Overcast → Play = Yes
- If Outlook = Sunny → check Humidity
  - High → No
  - Normal → Yes
- If Outlook = Rainy → check Wind
  - Strong → No
  - Weak → Yes
Prediction
- Input: Outlook = Sunny, Humidity = High
- Path followed: Sunny → High
- Output: No
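As an illustrative sketch, the snippet below trains a decision tree on a small, hypothetical cricket dataset built to follow the rules above. The specific rows and the use of scikit-learn's DecisionTreeClassifier are assumptions for illustration, not part of the original example.

# A minimal sketch: training a decision tree on a hypothetical "play cricket"
# dataset that follows the rules described above. The rows below are assumed
# for illustration; they are not from the original notes.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

data = pd.DataFrame({
    "Outlook":  ["Sunny", "Sunny", "Overcast", "Rainy", "Rainy", "Overcast", "Sunny", "Rainy"],
    "Humidity": ["High", "Normal", "High", "Normal", "High", "Normal", "High", "Normal"],
    "Wind":     ["Weak", "Weak", "Weak", "Strong", "Weak", "Strong", "Strong", "Weak"],
    "Play":     ["No", "Yes", "Yes", "No", "Yes", "Yes", "No", "Yes"],
})

# One-hot encode the categorical features so the tree can split on them.
X = pd.get_dummies(data[["Outlook", "Humidity", "Wind"]])
y = data["Play"]

tree = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))  # learned rules

# Predict for a new day: Outlook = Sunny, Humidity = High, Wind = Weak.
new_day = pd.DataFrame([{"Outlook": "Sunny", "Humidity": "High", "Wind": "Weak"}])
new_day = pd.get_dummies(new_day).reindex(columns=X.columns, fill_value=0)
print(tree.predict(new_day))  # expected: ["No"] under the rules above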
Decision Trees work as follows:
1. Splitting: The dataset is divided into subsets based on feature values. The selection of features for splitting is typically based on criteria like Gini impurity or information gain.
Splitting Criteria:
a) Gini Index (Classification):
- Measures the probability of misclassifying a sample.
- The Gini index is a measure of impurity or purity used while creating a decision tree in the CART (Classification and Regression Tree) algorithm.
- An attribute with a low Gini index should be preferred over one with a high Gini index.
- It only creates binary splits; the CART algorithm uses the Gini index to create these binary splits.
- The Gini index can be calculated using the formula below:
Formula:
Gini(S) = 1 - Σ (Pi)^2
Where,
S = the set of samples at the node
Pi = probability (proportion) of samples in S belonging to class i
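For a concrete feel for this formula, the sketch below computes the Gini index for a set of class labels; the helper function and the example counts are illustrative assumptions.

# A minimal sketch of the Gini index for a set of class labels.
from collections import Counter

def gini(labels):
    """Gini(S) = 1 - sum(p_i^2) over the classes present in `labels`."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

# Example: a node with 4 "Yes" and 3 "No" samples (illustrative counts).
labels = ["Yes"] * 4 + ["No"] * 3
print(round(gini(labels), 3))  # 1 - (4/7)^2 - (3/7)^2 ≈ 0.490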
b) Entropy / Information Gain (Classification):
- Entropy is a metric that measures the impurity of a given attribute; it specifies the randomness in the data. Entropy can be calculated as:
Formula:
Entropy(S) = - P(yes) log2 P(yes) - P(no) log2 P(no)
Where,
S = the set of samples
P(yes) = probability of yes
P(no) = probability of no
- Information gain is the measurement of the change in entropy after the dataset is segmented on an attribute.
- It calculates how much information a feature provides about the class.
- According to the value of information gain, we split the node and build the decision tree.
- A decision tree algorithm always tries to maximize information gain, and the node/attribute with the highest information gain is split first. It can be calculated using the formula below:
Information Gain = Entropy(S) - [ (Weighted Avg) * Entropy(each feature) ]
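To make these formulas concrete, the sketch below computes entropy and information gain for a small set of labels; the helper names and the example split are made up for illustration.

# A minimal sketch of entropy and information gain (assumed helper names).
import math
from collections import Counter

def entropy(labels):
    """Entropy(S) = -sum(p_i * log2(p_i)) over the classes in `labels`."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, groups):
    """IG = Entropy(S) - weighted average entropy of the subsets in `groups`."""
    n = len(labels)
    weighted = sum(len(g) / n * entropy(g) for g in groups)
    return entropy(labels) - weighted

# Example: 7 samples (4 Yes, 3 No) split by a feature into two subsets.
labels = ["Yes"] * 4 + ["No"] * 3
groups = [["Yes", "Yes", "Yes"], ["Yes", "No", "No", "No"]]
print(round(entropy(labels), 3))                  # ≈ 0.985
print(round(information_gain(labels, groups), 3)) # ≈ 0.522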
Pruning:
- Pre-pruning: Stop splitting early (e.g., max_depth).
- Post-pruning: Remove unnecessary branches after the tree has grown.
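In scikit-learn, for example, pre-pruning corresponds to growth-limiting parameters and post-pruning to cost-complexity pruning (ccp_alpha). The sketch below is one possible illustration; the iris dataset and the parameter values are arbitrary choices, not part of these notes.

# A minimal sketch of pre- and post-pruning with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Pre-pruning: limit growth while the tree is being built.
pre_pruned = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5).fit(X, y)

# Post-pruning: grow fully, then prune back with cost-complexity pruning.
post_pruned = DecisionTreeClassifier(ccp_alpha=0.02).fit(X, y)

print(pre_pruned.get_depth(), post_pruned.get_depth())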
2. Stopping Criteria: The splitting process continues until a predefined stopping criterion is met, such as reaching a maximum depth or having a minimum number of samples in a node.
3. Prediction: For a new input, the tree is traversed from the root to a leaf node by following the decisions based on feature values, resulting in the final prediction.
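As a hedged sketch of how stopping criteria and prediction look in practice, the snippet below limits tree growth and then traces the root-to-leaf path a new sample follows (again using the iris data and arbitrary parameter values purely for illustration).

# A sketch of stopping criteria and prediction with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Stopping criteria: maximum depth and minimum samples per node.
clf = DecisionTreeClassifier(max_depth=4, min_samples_split=4, min_samples_leaf=2)
clf.fit(X, y)

# Prediction: each new sample is routed from the root to one leaf.
sample = X[:1]
print("Predicted class:", clf.predict(sample))
print("Leaf node reached:", clf.apply(sample))              # index of the leaf
print("Nodes visited:", clf.decision_path(sample).indices)  # root-to-leaf path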
Advantages
- Easy to understand and explain
- Works with both numbers and categories
- No need for feature scaling
Limitations
- Can overfit if the tree grows too deep
- Small changes in the data can change the structure
- Usually less accurate than ensemble methods unless tuned
Step 1: Take a small dataset
Problem: Decide whether to Play Tennis
Target column: Play Tennis (Yes / No)
Feature column: Outlook
Step 2: Calculate Entropy of the whole dataset
Formula:
Entropy(S) = - P(yes) log2 P(yes) - P(no) log2 P(no)
Count values:
- Yes = 4
- No = 3
- Total = 7
Entropy(S) = -(4/7) log2(4/7) - (3/7) log2(3/7) ≈ 0.985
This tells us how impure the dataset is.
Step 3: Split the data by the feature (Outlook)
Step 4: Calculate the Weighted Entropy after the split
Step 5: Calculate Information Gain = Entropy(S) - Weighted Entropy
Step 6: Choose the best feature
The feature with the highest Information Gain becomes the root node.
So the tree starts with Outlook as the root node, with one branch for each of its values (Sunny, Overcast, Rainy).
Step 7: Making a prediction
New input: Outlook = Sunny
Follow the path: Outlook → Sunny → No
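The sketch below reproduces Steps 2 through 6 numerically for a hypothetical Play Tennis table consistent with the counts above; the per-row Outlook values are assumed, since the notes only give the totals (Yes = 4, No = 3).

# A minimal sketch of Steps 2-6 in code. The Outlook values per row are
# assumed for illustration; the helpers follow the formulas above.
import math
from collections import Counter, defaultdict

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

# Hypothetical Play Tennis data: (Outlook, Play Tennis)
rows = [
    ("Sunny", "No"), ("Sunny", "No"),
    ("Overcast", "Yes"), ("Overcast", "Yes"),
    ("Rainy", "Yes"), ("Rainy", "Yes"), ("Rainy", "No"),
]
labels = [play for _, play in rows]

# Step 2: entropy of the whole dataset.
total_entropy = entropy(labels)  # ≈ 0.985

# Step 3: split the data by Outlook.
groups = defaultdict(list)
for outlook, play in rows:
    groups[outlook].append(play)

# Step 4: weighted entropy after the split.
weighted = sum(len(g) / len(rows) * entropy(g) for g in groups.values())

# Step 5: information gain for Outlook.
gain = total_entropy - weighted
print(round(total_entropy, 3), round(weighted, 3), round(gain, 3))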