2.11 Evaluating Model Performance (Machine Learning)

  • Evaluating model performance is an essential step in machine learning, carried out during model development and testing.
  • Evaluation metrics are used to measure how well a model performs on data.
  • Model evaluation helps answer questions such as:

                    1. Did the model learn meaningful patterns from the data?

                    2. Will the model perform well on new unseen data?

                    3. Is the model overfitting or underfitting?

  • Machine learning models must be tested carefully because a model that works well on training data may fail on new data.    

Proper evaluation of a machine learning model helps in:

  • Measuring accuracy and reliability of predictions

  • Avoiding overfitting (model memorizes training data)

  • Avoiding underfitting (model fails to learn patterns)

  • Comparing multiple models to choose the best one

  • Tuning hyperparameters to improve performance

 

Methods for Evaluating Model Performance

To evaluate machine learning models and reduce overfitting, we commonly use two methods:

  1. Hold-Out Method

  2. Cross-Validation

1. Hold-Out Method

  • The Hold-Out method is the simplest technique used to evaluate machine learning models.
  • In this method, the dataset is split into two parts:

  • Training dataset – used to train the model

  • Testing dataset – used to evaluate the model

Usually, a larger portion is used for training and a smaller portion for testing.

Example

Suppose we have 1000 data records.

  • Training data = 800 records

  • Testing data = 200 records

Steps:

  1. Train the model using the 800 records

  2. Test the model using the 200 records

  3. Measure the model performance using evaluation metrics
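The 800/200 split above can be sketched in plain Python (`hold_out_split` is a hypothetical helper written for illustration; libraries such as scikit-learn provide `train_test_split` for the same purpose):

```python
import random

def hold_out_split(records, train_fraction=0.8, seed=42):
    """Shuffle the records and split them into a train set and a test set."""
    rng = random.Random(seed)       # fixed seed so the split is reproducible
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

data = list(range(1000))            # stand-in for 1000 data records
train, test = hold_out_split(data)
print(len(train), len(test))        # 800 200
```

Shuffling before splitting matters: if the records are ordered (e.g., by class), an unshuffled split can give the test set a very different distribution from the training set.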

Advantages

  • Simple to implement

  • Fast evaluation

Disadvantages

  • Performance depends on how the data is split

  • Results may vary if the split changes


2. Cross-Validation

  • Cross-Validation is a more reliable evaluation method where the dataset is split multiple times to test the model.
  • The most common type is K-Fold Cross-Validation.

K-Fold Cross-Validation: 

In this method:

  1. The dataset is divided into K equal parts (folds).

  2. The model is trained K times.

  3. Each time:

    • One fold is used for testing

    • Remaining K-1 folds are used for training

Finally, the average performance of all runs is calculated.

Example (5-Fold Cross-Validation)

The dataset is divided into 5 folds. The model is trained and tested 5 times: in each run, one fold is held out for testing and the remaining 4 folds are used for training, so every fold is used for testing exactly once.

The final accuracy = average of all 5 results.
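The fold construction can be sketched in plain Python (`k_fold_indices` is a hypothetical helper, assuming the dataset size divides evenly by K; scikit-learn's `KFold` handles the general case):

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) index pairs, one pair per fold."""
    fold_size = n_samples // k
    indices = list(range(n_samples))
    for i in range(k):
        start, stop = i * fold_size, (i + 1) * fold_size
        test_idx = indices[start:stop]                  # the held-out fold
        train_idx = indices[:start] + indices[stop:]    # the other K-1 folds
        yield train_idx, test_idx

for train_idx, test_idx in k_fold_indices(1000, 5):
    print(len(train_idx), len(test_idx))                # 800 200, five times
```

Each index appears in the test set exactly once across the 5 runs, which is why cross-validation uses the whole dataset for both training and testing.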

Advantages

  • Uses the entire dataset efficiently

  • Gives more reliable performance results

  • Reduces bias


Classification Model Evaluation Methods

Classification is used to categorize data into predefined classes or labels.

Examples:

  • Email → Spam / Not Spam

  • Image → Cat / Dog

  • Loan → Approved / Rejected

To evaluate classification models, we use several metrics:

  1. Accuracy

  2. Precision

  3. Recall

  4. F1 Score

  5. Confusion Matrix


1. Accuracy

  • Accuracy measures the percentage of correct predictions made by the model.

Formula

                                                        Accuracy = (TP + TN) / (TP + TN + FP + FN)

Where:

  • TP = True Positive

  • TN = True Negative

  • FP = False Positive

  • FN = False Negative

Example

Suppose a model tested 100 emails:

  • Correct spam predictions = 40

  • Correct non-spam predictions = 50

Total correct predictions = 90

Accuracy = 90 / 100 = 90%

Limitation

Accuracy does not work well with imbalanced datasets.

Example:

If 95% of the emails are non-spam, a model that always predicts non-spam still achieves 95% accuracy, yet it never catches a single spam email — the model is useless.
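This pitfall is easy to demonstrate with a minimal sketch (labels and the always-non-spam classifier below are made up to match the 95/5 example):

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

labels = [0] * 95 + [1] * 5        # 95 non-spam (0), 5 spam (1)
always_non_spam = [0] * 100        # classifier that never predicts spam
print(accuracy(always_non_spam, labels))  # 0.95, yet no spam is caught
```

This is why precision, recall, and F1 score are preferred on imbalanced datasets.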


2. Precision

  • Precision measures how many predicted positive cases are actually positive.

Formula

                                            Precision = TP / (TP + FP)

Example

Suppose a model predicts 50 emails as spam.

Out of those:

  • 40 are actually spam (TP)

  • 10 are not spam (FP)

Precision = 40 / (40 + 10)
Precision = 0.80 (80%)

Interpretation

Out of all predicted spam emails, 80% are correct.

Limitation

Precision does not consider False Negatives.
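As a minimal sketch, precision follows directly from the formula (values taken from the example above):

```python
def precision(tp, fp):
    """Of all predicted positives, the fraction that are truly positive."""
    return tp / (tp + fp)

print(precision(40, 10))  # 0.8
```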


3. Recall

  • Recall measures how many actual positive cases the model correctly identifies.

Formula

                                            Recall = TP / (TP + FN)

Example

Suppose there are 60 actual spam emails.

The model detects 40 of them.

Recall = 40 / (40 + 20)
Recall = 0.67 (67%)

Interpretation

The model detects 67% of all spam emails.

Limitation

High recall may produce more false positives.
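Recall can be sketched the same way (using the 60-actual-spam example above, where 20 spam emails were missed):

```python
def recall(tp, fn):
    """Of all actual positives, the fraction the model found."""
    return tp / (tp + fn)

print(round(recall(40, 20), 2))  # 0.67
```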


4. F1 Score

  • F1 Score is the harmonic mean of Precision and Recall.
  • It balances both metrics.

Formula

                                    F1 Score = 2 × (Precision × Recall) / (Precision + Recall)

Example

Precision = 0.80
Recall = 0.67

F1 Score =
2 × (0.80 × 0.67) / (0.80 + 0.67)

F1 Score ≈ 0.73

It is useful when:

  • Dataset is imbalanced

  • Both precision and recall are important
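The calculation above can be sketched directly from the formula (using the exact recall 40/60 rather than the rounded 0.67):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

print(round(f1_score(0.80, 40 / 60), 2))  # 0.73
```

The harmonic mean punishes imbalance: if either precision or recall is near zero, the F1 score is also near zero, unlike a simple average.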


5. Confusion Matrix

  • A Confusion Matrix is a table used to evaluate classification models.
  • It shows the number of correct and incorrect predictions, broken down by class.
  • For binary classification, it is a 2 × 2 matrix:

                        Predicted Yes    Predicted No
        Actual Yes           TP               FN
        Actual No            FP               TN


True Positive (TP)

Model predicts Yes and the actual value is Yes.

Example:
Model predicts Dog and the image is actually Dog.


True Negative (TN)

Model predicts No and the actual value is No.

Example:
Model predicts Not Dog and the image is actually Not Dog.


False Positive (FP)

Model predicts Yes, but the actual value is No.

Also called Type I Error.

Example:
Model predicts Dog, but the image is Not Dog.


False Negative (FN)

Model predicts No, but the actual value is Yes.

Also called Type II Error.

Example:
Model predicts Not Dog, but the image is actually Dog.


Example: Dog Image Classification

Example counts:

  • TP = 30 (predicted Dog, actually Dog)

  • TN = 50 (predicted Not Dog, actually Not Dog)

  • FP = 10 (predicted Dog, actually Not Dog)

  • FN = 10 (predicted Not Dog, actually Dog)

These values are used to calculate accuracy, precision, recall, and F1 score.
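Putting the section together, all four metrics follow from the confusion-matrix counts in one short sketch (the counts are the example values above):

```python
def metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

print(metrics(30, 50, 10, 10))  # (0.8, 0.75, 0.75, 0.75)
```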