ML 1.10 ID3 ALGORITHM
ID3 Algorithm:
· ID3, or Iterative Dichotomiser 3, is an algorithm used in machine learning for building decision trees from a given dataset.
· It was developed in 1986 by Ross Quinlan.
· It is a greedy algorithm that builds a decision tree by recursively partitioning the dataset into smaller and smaller subsets until all data points in each subset belong to the same class.
· It employs a top-down approach, recursively selecting features to split the dataset based on information gain.
· It is used for classification tasks; unlike some later decision-tree algorithms, it does not handle regression directly.
· ID3 deals primarily with categorical attributes, which means it can efficiently handle features with a discrete set of values. This makes it well suited to problems where the input features are categorical rather than continuous.
· The primary purpose of the ID3 algorithm is to construct a decision tree for classification tasks. It does this by:
1. Evaluating each attribute in the dataset to determine its potential to reduce uncertainty (measured using entropy).
2. Selecting the attribute with the highest information gain to create splits that maximize classification accuracy.
3. Repeating the process recursively on smaller subsets until the tree fully classifies the data.
The ID3 algorithm constructs a decision tree by recursively splitting the dataset based on the attribute that provides the highest information gain. Here's a step-by-step breakdown:
Step 1: Calculate the Entropy of the Dataset
· Entropy measures the impurity, randomness, or uncertainty in the dataset.
· The formula for entropy (for a binary yes/no classification) is:
Entropy(S) = -P(yes) log₂ P(yes) - P(no) log₂ P(no)
where:
o S = the set of training samples
o P(yes) = the proportion of "yes" samples in S
o P(no) = the proportion of "no" samples in S
More generally, for any number of classes, Entropy(S) = -Σ pᵢ log₂ pᵢ, where pᵢ is the proportion of instances belonging to class i.
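As a quick illustration (this snippet is not part of the original notes; the function name entropy and the use of collections.Counter are my own choices), the entropy formula can be computed in Python as follows:

from collections import Counter
from math import log2

def entropy(labels):
    # Entropy of a list of class labels, e.g. ['Yes', 'Yes', 'No']
    total = len(labels)
    return -sum((count / total) * log2(count / total)
                for count in Counter(labels).values())

# Example: 6 "Yes" and 4 "No" samples, as in the golf problem later in these notes
print(round(entropy(['Yes'] * 6 + ['No'] * 4), 4))  # prints 0.971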
Step 2: Compute Information Gain for Each Attribute
· Information gain is the reduction in entropy after splitting the dataset based on an attribute.
· The formula for information gain is:
Gain(S, A) = Entropy(S) - Σᵥ (|Sᵥ| / |S|) × Entropy(Sᵥ)
Here, Sᵥ is the subset of S for which attribute A has value v, and the sum runs over all values v of attribute A.
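Again as an illustrative sketch (not from the original notes; the name information_gain and the list-of-dictionaries data layout are assumptions), the gain formula can be written directly from the definition above:

from collections import Counter
from math import log2

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    # Gain(S, A) = Entropy(S) - sum over values v of (|Sv|/|S|) * Entropy(Sv)
    base = entropy([row[target] for row in rows])
    remainder = 0.0
    for value in set(row[attribute] for row in rows):
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += (len(subset) / len(rows)) * entropy(subset)
    return base - remainder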
Step 3: Select the Attribute with the Highest Information Gain
· Choose the attribute that most effectively reduces uncertainty and use it as the decision node.
Step 4: Split the Dataset
· Partition the dataset into subsets based on the selected attribute's values.
· Assign a branch for each possible value of the attribute.
Step 5: Recursively Apply the Process
· Repeat steps 1 to 4 for each subset, excluding the previously used attribute.
· Continue until one of the following termination conditions is met:
a. All instances in a subset belong to the same class.
b. There are no remaining attributes to split on.
c. The dataset is empty.
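Putting steps 1 to 5 together, the following is a minimal Python sketch of the ID3 recursion, assuming the dataset is a list of dictionaries and the tree is returned as nested dictionaries; the helper names are my own, and this is an illustration rather than a reference implementation:

from collections import Counter
from math import log2

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    base = entropy([row[target] for row in rows])
    remainder = 0.0
    for value in set(row[attribute] for row in rows):
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += (len(subset) / len(rows)) * entropy(subset)
    return base - remainder

def id3(rows, attributes, target):
    if not rows:                        # termination (c): empty dataset
        return None
    labels = [row[target] for row in rows]
    if len(set(labels)) == 1:           # termination (a): all one class
        return labels[0]
    if not attributes:                  # termination (b): no attributes left -> majority class
        return Counter(labels).most_common(1)[0][0]
    # Steps 2-3: pick the attribute with the highest information gain
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    tree = {best: {}}
    remaining = [a for a in attributes if a != best]
    # Steps 4-5: split on each value of the chosen attribute and recurse
    for value in set(row[best] for row in rows):
        subset = [row for row in rows if row[best] == value]
        tree[best][value] = id3(subset, remaining, target)
    return tree

Calling id3(rows, ['Outlook', 'Temperature', 'Humidity', 'Windy'], 'Play') on a table like the golf problem below would return a nested dictionary whose top-level key is the chosen root attribute.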
Problem: Should we play golf based on weather conditions?
Step 1: Calculate Root Entropy
· Total samples: 10 (6 Yes, 4 No)
Entropy(S) = -[ (6/10)log₂(6/10) + (4/10)log₂(4/10) ] ≈ 0.9710
Step 2: Calculate Information Gain for Each Attribute
a) Outlook:
· Sunny (3 No, 1 Yes)
· Overcast (0 No, 2 Yes)
· Rain (1 No, 3 Yes)
Entropy(Sunny) = -[ (3/4)log₂(3/4) + (1/4)log₂(1/4) ] ≈ 0.8113
Entropy(Overcast) = 0 (pure)
Entropy(Rain) = -[ (1/4)log₂(1/4) + (3/4)log₂(3/4) ] ≈ 0.8113
Gain(Outlook) = 0.9710 - [ (4/10) × 0.8113 + (2/10) × 0 + (4/10) × 0.8113 ] ≈ 0.3219
b) Humidity:
· High (3 No, 2 Yes)
· Normal (1 No, 4 Yes)
Entropy(High) = -[ (3/5)log₂(3/5) + (2/5)log₂(2/5) ] ≈ 0.9710
Entropy(Normal) = -[ (1/5)log₂(1/5) + (4/5)log₂(4/5) ] ≈ 0.7219
Gain(Humidity) = 0.9710 - [ (5/10) × 0.9710 + (5/10) × 0.7219 ] ≈ 0.1245
c) Windy:
· False (2 No, 5 Yes)
· True (2 No, 1 Yes)
Entropy(False) = -[ (2/7)log₂(2/7) + (5/7)log₂(5/7) ] ≈ 0.8631
Entropy(True) = -[ (2/3)log₂(2/3) + (1/3)log₂(1/3) ] ≈ 0.9183
Gain(Windy) = 0.9710 - [ (7/10) × 0.8631 + (3/10) × 0.9183 ] ≈ 0.0913
d) Temperature:
· Hot (2 No, 1 Yes)
· Mild (1 No, 2 Yes)
· Cool (1 No, 3 Yes)
Entropy(Hot) = -[ (2/3)log₂(2/3) + (1/3)log₂(1/3) ] ≈ 0.9183
Entropy(Mild) = -[ (1/3)log₂(1/3) + (2/3)log₂(2/3) ] ≈ 0.9183
Entropy(Cool) = -[ (1/4)log₂(1/4) + (3/4)log₂(3/4) ] ≈ 0.8113
Gain(Temperature) = 0.9710 - [ (3/10) × 0.9183 + (3/10) × 0.9183 + (4/10) × 0.8113 ] ≈ 0.0955
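The hand calculations above can be double-checked with a short snippet that works directly from the (No, Yes) counts quoted in the notes; H is simply a helper name chosen here for count-based entropy:

from math import log2

def H(counts):
    # Entropy from a list of class counts, e.g. [6, 4]
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

root = H([6, 4])                                                     # ~0.9710
gain_outlook  = root - (4/10)*H([3, 1]) - (2/10)*H([0, 2]) - (4/10)*H([1, 3])
gain_humidity = root - (5/10)*H([3, 2]) - (5/10)*H([1, 4])
gain_windy    = root - (7/10)*H([2, 5]) - (3/10)*H([2, 1])
gain_temp     = root - (3/10)*H([2, 1]) - (3/10)*H([1, 2]) - (4/10)*H([1, 3])
print(round(gain_outlook, 4), round(gain_humidity, 4),
      round(gain_windy, 4), round(gain_temp, 4))
# prints approximately 0.3219 0.1245 0.0913 0.0955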
Step 3: Select Root Node
· Highest gain: Outlook (≈ 0.3219), so Outlook becomes the root node.
Step 4: Build Sub-Trees
a) Overcast Branch (All Yes)
· Direct classification: "Yes"
b) Sunny Branch (3 No, 1 Yes)
· Recalculate gains for the remaining attributes on this subset. Humidity splits the Sunny samples into pure subsets, so:
Gain(Humidity | Sunny) ≈ 0.8113 - [ (3/4) × 0 + (1/4) × 0 ] = 0.8113
· This equals the full entropy of the Sunny branch, so Humidity becomes its decision node and both of its children are pure leaves.
To use Python for the ID3 decision tree algorithm, we need the following libraries:
· pandas: for data analysis and manipulation
· scikit-learn: for machine learning algorithms
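A hedged sketch of that approach is shown below. Note that scikit-learn's DecisionTreeClassifier implements a CART-style tree rather than textbook ID3; setting criterion='entropy' makes it split on information gain, and categorical columns must be one-hot encoded because scikit-learn trees expect numeric inputs. The tiny DataFrame here is illustrative only, not the exact table behind the worked example (which the notes report only as counts):

import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Illustrative toy weather data (placeholder values, labeled as such)
data = pd.DataFrame({
    'Outlook':  ['Sunny', 'Sunny', 'Overcast', 'Rain', 'Rain', 'Overcast'],
    'Humidity': ['High', 'High', 'Normal', 'Normal', 'High', 'High'],
    'Windy':    [False, True, False, False, True, True],
    'Play':     ['No', 'No', 'Yes', 'Yes', 'No', 'Yes'],
})

# One-hot encode the categorical features; keep 'Play' as the target
X = pd.get_dummies(data.drop(columns='Play'))
y = data['Play']

# criterion='entropy' selects splits by information gain, in the spirit of ID3
clf = DecisionTreeClassifier(criterion='entropy')
clf.fit(X, y)

# Print a text rendering of the learned tree
print(export_text(clf, feature_names=list(X.columns)))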
Applications of ID3
1. Fraud detection: ID3 can be used to develop models that can detect fraudulent transactions or activities.
2. Medical diagnosis: ID3 can be used to develop models that can diagnose diseases or medical conditions.
3. Customer segmentation: ID3 can be used to segment customers into different groups based on their demographics, purchase history, or other factors.
4. Risk assessment: ID3 can be used to assess risk in a variety of areas, such as insurance, finance, and healthcare.
5. Recommendation systems: ID3 can be used to develop recommendation systems that suggest products, services, or content to users based on their past behavior or preferences.
Typical real-world settings where decision-tree models of this kind are applied include:
· Amazon: recommending products to customers.
· Netflix: recommending movies and TV shows to users.
· Spotify: recommending songs and playlists to users.
· Lending Club: assessing the risk of approving loans to borrowers.
· Healthcare organizations: diagnosing diseases, predicting patient outcomes, and developing personalized treatment plans.