ML 1.10 ID3 ALGORITHM

 

ID3 Algorithm:

Ø ID3, or the Iterative Dichotomiser 3 algorithm, is used in machine learning for building decision trees from a given dataset.

Ø It was developed in 1986 by Ross Quinlan.

Ø It is a greedy algorithm that builds a decision tree by recursively partitioning the data set into smaller and smaller subsets until all data points in each subset belong to the same class.

Ø It employs a top-down approach, recursively selecting features to split the dataset based on information gain.

Ø It is designed for classification tasks; later decision-tree algorithms such as CART extend the idea to regression.

Ø ID3 deals primarily with categorical attributes, meaning it handles features that take a discrete set of values. This makes it well suited to problems where the input features are categorical rather than continuous.

 

Ø The primary purpose of the ID3 algorithm is to construct a decision tree for classification tasks. It does this by:

1.    Evaluating each attribute in the dataset to determine its potential to reduce uncertainty (measured using entropy).

2.    Selecting the attribute with the highest information gain to create splits that maximize classification accuracy.

3.    Repeating the process recursively on smaller subsets until the tree fully classifies the data.

 

Ø The ID3 algorithm constructs a decision tree by recursively splitting the dataset based on the attribute that provides the highest information gain. Here’s a step-by-step breakdown:

 

Step 1: Calculate the Entropy of the Dataset

·        Entropy measures the impurity (i.e., the randomness or uncertainty) in the dataset.

·        The formula for entropy is:



Entropy(S) = − Σᵢ pᵢ log₂ pᵢ

Where,

o   S = the set of training samples

o   pᵢ = the proportion of samples in S that belong to class i

For a two-class (yes/no) problem this reduces to:

Entropy(S) = − P(yes) log₂ P(yes) − P(no) log₂ P(no)

where P(yes) and P(no) are the proportions of "yes" and "no" samples in S.
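As a quick illustration, here is a minimal Python sketch of this entropy calculation (the function name and the use of collections.Counter are illustrative choices, not part of the original algorithm description):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy of a list of class labels, e.g. ['yes', 'no', 'yes', ...]."""
    total = len(labels)
    counts = Counter(labels)
    # Sum -p_i * log2(p_i) over every class present in the data.
    return -sum((c / total) * log2(c / total) for c in counts.values())

# A 6 yes / 4 no split, as in the worked example below:
print(entropy(["yes"] * 6 + ["no"] * 4))  # ≈ 0.971
```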

 

Step 2: Compute Information Gain for Each Attribute

·        Information Gain is the reduction in entropy after splitting the dataset based on an attribute.

·        The formula for information gain is:

Gain(S, A) = Entropy(S) − Σᵥ ( |Sᵥ| / |S| ) × Entropy(Sᵥ)

Here, Sᵥ is the subset of S for which attribute A has value v, and the sum runs over all distinct values v of attribute A.
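Continuing the sketch above, information gain for a candidate attribute could be computed like this (again only an illustrative helper; it reuses the entropy() function sketched in Step 1 and assumes the data is held as parallel Python lists):

```python
def information_gain(attribute_values, labels):
    """Gain(S, A) = Entropy(S) - sum over v of (|S_v|/|S|) * Entropy(S_v)."""
    total = len(labels)
    gain = entropy(labels)  # entropy() as sketched in Step 1
    for v in set(attribute_values):
        # Labels of the subset S_v where the attribute equals v.
        subset = [lab for a, lab in zip(attribute_values, labels) if a == v]
        gain -= (len(subset) / total) * entropy(subset)
    return gain

# Example: the Outlook attribute from the golf dataset used below.
outlook = ["sunny"] * 4 + ["overcast"] * 2 + ["rain"] * 4
play    = ["no", "no", "no", "yes", "yes", "yes", "no", "yes", "yes", "yes"]
print(information_gain(outlook, play))  # ≈ 0.322
```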

 

Step 3: Select the Attribute with the Highest Information Gain

·        Choose the attribute that most effectively reduces uncertainty and use it as the decision node.

 

Step 4: Split the Dataset

·        Partition the dataset into subsets based on the selected attribute’s values.

·        Assign branches for each possible outcome of the attribute.

 

Step 5: Recursively Apply the Process

·        Repeat steps 1 to 4 for each subset, excluding the previously used attribute.

·        Continue until one of the following termination conditions is met:

a.      All instances in a subset belong to the same class.

b.      There are no remaining attributes to split on (the node is labelled with the majority class of the subset).

c.      The subset is empty (the node is labelled with the most common class of the parent node).
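Putting the five steps together, a compact recursive sketch of ID3 in Python might look like the following. It is only a minimal illustration using plain dictionaries; the function names and data layout are our own choices, and it reuses the entropy/information_gain helpers sketched above:

```python
def id3(rows, labels, attributes):
    """Build a decision tree as nested dicts: {attribute: {value: subtree_or_label}}.

    rows       -- list of dicts mapping attribute name -> categorical value
    labels     -- list of class labels ('yes'/'no'), parallel to rows
    attributes -- attribute names still available for splitting
    """
    # Termination: all samples share one class.
    if len(set(labels)) == 1:
        return labels[0]
    # Termination: no attributes left -> majority class.
    if not attributes:
        return max(set(labels), key=labels.count)

    # Pick the attribute with the highest information gain.
    best = max(attributes,
               key=lambda a: information_gain([r[a] for r in rows], labels))

    tree = {best: {}}
    remaining = [a for a in attributes if a != best]
    for v in set(r[best] for r in rows):
        # Build the subset S_v and recurse on it.
        sub_rows = [r for r in rows if r[best] == v]
        sub_labels = [lab for r, lab in zip(rows, labels) if r[best] == v]
        tree[best][v] = id3(sub_rows, sub_labels, remaining)
    return tree
```

Applied to the golf data in the worked example below, this sketch should select Outlook at the root, matching the hand calculation that follows.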

 

Problem: Should we play golf based on weather conditions?


Step 1: Calculate Root Entropy

·        Total samples: 10 (6 Yes, 4 No)

 

Entropy(S) = -[ (6/10)log₂(6/10) + (4/10)log₂(4/10) ] ≈ 0.9710





Step 2: Calculate Information Gain for Each Attribute

a) Outlook:

·        Sunny (3 No, 1 Yes)

·        Overcast (0 No, 2 Yes)

·        Rain (1 No, 3 Yes)

 

 

Entropy(Sunny) = -[ (3/4)log₂(3/4) + (1/4)log₂(1/4) ] ≈ 0.8113

Entropy(Overcast) = 0 (pure)

Entropy(Rain) = -[ (1/4)log₂(1/4) + (3/4)log₂(3/4) ] ≈ 0.8113

 

Gain(Outlook) = 0.9710 - [ (4/10) × 0.8113 + (2/10) × 0 + (4/10) × 0.8113 ] ≈ 0.3219

 

b) Humidity:

·        High (3 No, 2 Yes)

·        Normal (1 No, 4 Yes)

 

 

Entropy(High) = -[ (3/5)log₂(3/5) + (2/5)log₂(2/5) ] ≈ 0.9710

Entropy(Normal) = -[ (1/5)log₂(1/5) + (4/5)log₂(4/5) ] ≈ 0.7219

 

Gain(Humidity) = 0.9710 - [ (5/10) × 0.9710 + (5/10) × 0.7219 ] ≈ 0.1245

 

c) Windy:

·        False (2 No, 5 Yes)

·        True (2 No, 1 Yes)

 

Entropy(False) = -[ (2/7)log₂(2/7) + (5/7)log₂(5/7) ] ≈ 0.8630

Entropy(True) = -[ (2/3)log₂(2/3) + (1/3)log₂(1/3) ] ≈ 0.9183

 

Gain(Windy) = 0.9710 - [ (7/10) × 0.8630 + (3/10) × 0.9183 ] ≈ 0.0913

 

d) Temperature:

·        Hot (2 No, 1 Yes)

·        Mild (1 No, 2 Yes)

·        Cool (1 No, 3 Yes)

 

Entropy(Hot) = -[ (2/3)log₂(2/3) + (1/3)log₂(1/3) ] ≈ 0.9183

Entropy(Mild) = -[ (1/3)log₂(1/3) + (2/3)log₂(2/3) ] ≈ 0.9183

Entropy(Cool) = -[ (1/4)log₂(1/4) + (3/4)log₂(3/4) ] ≈ 0.8113

 

 

Gain(Temperature) = 0.9710 - [ (3/10) × 0.9183 + (3/10) × 0.9183 + (4/10) × 0.8113 ] ≈ 0.0955
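For readers who want to double-check the arithmetic, the short Python snippet below (hand-encoding the class counts tallied above; the dictionary layout is just an illustrative choice) reproduces these four gains:

```python
from math import log2

# Class counts (yes, no) per attribute value, taken from the tallies above.
counts = {
    "Outlook":     {"Sunny": (1, 3), "Overcast": (2, 0), "Rain": (3, 1)},
    "Humidity":    {"High": (2, 3), "Normal": (4, 1)},
    "Windy":       {"False": (5, 2), "True": (1, 2)},
    "Temperature": {"Hot": (1, 2), "Mild": (2, 1), "Cool": (3, 1)},
}

def entropy_from_counts(yes, no):
    total = yes + no
    return -sum((c / total) * log2(c / total) for c in (yes, no) if c)

root_entropy = entropy_from_counts(6, 4)  # ≈ 0.9710
for attr, splits in counts.items():
    weighted = sum(((y + n) / 10) * entropy_from_counts(y, n)
                   for y, n in splits.values())
    print(attr, round(root_entropy - weighted, 4))
# Outlook ≈ 0.3219, Humidity ≈ 0.1245, Windy ≈ 0.0913, Temperature ≈ 0.0955
```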

 

Step 3: Select Root Node

·        Highest gain: Outlook (≈ 0.3219)

 

Step 4: Build Sub-Trees

 

a)     Overcast Branch (All Yes)

·        Direct classification: "Yes"

 

b)    Sunny Branch (3 No, 1 Yes)

·        Recalculate gains for remaining attributes:

 

Gain(Humidity|Sunny) = 0.8113 - [ (3/4) × 0 + (1/4) × 0 ] = 0.8113

Humidity splits the Sunny subset into pure groups, so it becomes the decision node for this branch.

 

 

Ø To use Python for the ID3 decision tree algorithm, we need to import the following libraries:

·      pandas: For data analysis and manipulation

·      scikit-learn: For machine learning algorithms
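A minimal sketch of that workflow is shown below. Note that scikit-learn's DecisionTreeClassifier actually implements the CART algorithm; setting criterion="entropy" makes it split on information gain, which gives ID3-like behaviour on categorical data once the features are one-hot encoded. The tiny golf-style DataFrame here is purely illustrative, not the dataset from the worked example.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# A toy golf-style dataset (illustrative values only).
df = pd.DataFrame({
    "Outlook":  ["sunny", "sunny", "overcast", "rain", "rain", "overcast"],
    "Humidity": ["high", "high", "normal", "normal", "high", "high"],
    "Windy":    [False, True, False, False, True, True],
    "Play":     ["no", "no", "yes", "yes", "no", "yes"],
})

# One-hot encode the categorical features; scikit-learn trees need numeric input.
X = pd.get_dummies(df.drop(columns="Play"))
y = df["Play"]

# criterion="entropy" selects splits by information gain, as ID3 does.
clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
clf.fit(X, y)

print(export_text(clf, feature_names=list(X.columns)))
```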


Applications of ID3

1.      Fraud detection: ID3 can be used to develop models that can detect fraudulent transactions or activities.

2.      Medical diagnosis: ID3 can be used to develop models that can diagnose diseases or medical conditions.

3.      Customer segmentation: ID3 can be used to segment customers into different groups based on their demographics, purchase history, or other factors.

4.      Risk assessment: ID3 can be used to assess risk in a variety of different areas, such as insurance, finance, and healthcare.

5.      Recommendation systems: ID3 can be used to develop recommendation systems that can recommend products, services, or content to users based on their past behavior or preferences.

·        E-commerce platforms such as Amazon use decision-tree-based models to recommend products to their customers.

·        Streaming services such as Netflix and Spotify use decision-tree-based models to recommend movies, TV shows, songs, and playlists to their users.

·        Lending platforms such as Lending Club use decision-tree-based models to assess the risk of approving loans to borrowers.

·        Healthcare organizations use decision trees to help diagnose diseases, predict patient outcomes, and develop personalized treatment plans.


