2.14 Hypothesis Testing (Statistics & Machine Learning)

  • A hypothesis is an assumption or statement about a population or data.
  • In machine learning, a hypothesis represents the relationship between input variables (features) and output variables (target values) that a model learns from data.
  • The goal of machine learning is to find the best hypothesis that can accurately predict results on new or unseen data.
  • Hypothesis testing is a statistical method used to evaluate assumptions about a population using sample data.
  • It helps us decide whether the sample provides enough evidence to reject a statement about the data.

Hypothesis testing involves two main hypotheses:

  1. Null Hypothesis (H₀)

  2. Alternative Hypothesis (H₁ or Ha)


1. Null Hypothesis (H₀)


The Null Hypothesis is the initial assumption that there is no significant difference or relationship between variables.

It represents the default or status quo statement.

Example

A company claims that its average daily production is 50 units.

Null Hypothesis:

H₀: μ = 50

This means the average production is equal to 50 units per day.

Here: μ = population mean


2. Alternative Hypothesis (H₁)


The Alternative Hypothesis is the statement that contradicts the null hypothesis.

It suggests that there is a significant difference or relationship.

Example

If the company's average daily production is not equal to 50 units, then:

H₁: μ ≠ 50

This means the average production is different from 50 units.


Steps in Hypothesis Testing

The general process of hypothesis testing includes:

  1. State the Null Hypothesis (H₀) and Alternative Hypothesis (H₁)

  2. Choose the significance level (α), usually 0.05; α is the probability of rejecting H₀ when it is actually true

  3. Select a suitable statistical test (z-test, t-test, etc.)

  4. Calculate the test statistic

  5. Compare the test statistic with the critical value, or the p-value with α

  6. Decide whether to reject or fail to reject the null hypothesis
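
The six steps above can be sketched in a few lines of Python for the production example. This is only an illustrative sketch: the sample values are made up, and the population standard deviation `sigma` is assumed to be known so that a z-test applies. It uses only the standard library.

```python
from math import sqrt
from statistics import NormalDist, mean

# Hypothetical daily production counts (illustrative data, not real measurements).
sample = [52, 48, 51, 53, 49, 50, 54, 47, 52, 51]

mu_0 = 50      # Step 1: H0: mu = 50, H1: mu != 50
alpha = 0.05   # Step 2: significance level
sigma = 2.0    # Assumption: known population standard deviation (required for a z-test)

# Steps 3 and 4: one-sample z-test statistic.
n = len(sample)
z = (mean(sample) - mu_0) / (sigma / sqrt(n))

# Step 5: two-tailed p-value from the standard normal distribution.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# Step 6: reject H0 only when the p-value is below alpha.
reject_h0 = p_value < alpha
print(f"z = {z:.3f}, p = {p_value:.3f}, reject H0: {reject_h0}")
```

With this sample the p-value is well above 0.05, so the data give no reason to doubt the claim of 50 units per day.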


Applications of Hypothesis Testing in Machine Learning

Hypothesis testing is useful in several machine learning tasks.

1. Model Evaluation

  • Used to check whether a new model performs better than an existing model.
  • Example: Compare the accuracy of two models using a paired t-test.
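
A paired t-test on model accuracies might look like the sketch below. The fold accuracies are hypothetical, and the critical value 2.776 is the standard two-tailed t-table entry for α = 0.05 with 4 degrees of freedom. Scores from cross-validation folds are not fully independent, so the result should be read as approximate.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical accuracies of two models on the same 5 cross-validation folds.
acc_old = [0.81, 0.79, 0.83, 0.80, 0.82]
acc_new = [0.84, 0.82, 0.84, 0.83, 0.86]

# A paired test works on the per-fold differences.
# H0: mean difference = 0, H1: mean difference != 0
diffs = [new - old for new, old in zip(acc_new, acc_old)]
n = len(diffs)
t_stat = mean(diffs) / (stdev(diffs) / sqrt(n))

# Two-tailed critical value for alpha = 0.05 and n - 1 = 4 degrees of freedom,
# taken from a standard t-table.
T_CRIT = 2.776

reject_h0 = abs(t_stat) > T_CRIT
print(f"t = {t_stat:.3f}, reject H0: {reject_h0}")
```

If SciPy is available, `scipy.stats.ttest_rel(acc_new, acc_old)` computes the same statistic together with a p-value.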

2. Feature Selection

  • Used to determine whether adding a new feature improves model performance.
  • Example: Check whether adding age as a feature improves a prediction model.

3. Assumption Verification

  • Some algorithms require certain assumptions about data.
  • Hypothesis testing helps verify these assumptions.
  • Example: Checking whether data follows a normal distribution.
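
One way to check normality is sketched below. The text above does not name a specific test; this sketch uses the Jarque-Bera test because it needs only the standard library. Both data sets are synthetic (built from distribution quantiles), and the p-value relies on a large-sample approximation, so the test is not reliable for small samples.

```python
from math import exp, log
from statistics import NormalDist, mean

def jarque_bera(data):
    """Return the Jarque-Bera statistic and its asymptotic p-value.

    H0: the data come from a normal distribution.
    """
    n = len(data)
    m = mean(data)
    # Central moments (population form, dividing by n).
    m2 = sum((x - m) ** 2 for x in data) / n
    m3 = sum((x - m) ** 3 for x in data) / n
    m4 = sum((x - m) ** 4 for x in data) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)
    # Under H0, JB is approximately chi-square with 2 degrees of freedom,
    # whose survival function is exp(-x / 2).
    return jb, exp(-jb / 2)

# Synthetic samples: evenly spaced quantiles of a normal and an exponential distribution.
n = 200
probs = [(i + 0.5) / n for i in range(n)]
bell_shaped = [NormalDist(50, 5).inv_cdf(p) for p in probs]
right_skewed = [-log(1 - p) for p in probs]

for name, data in [("bell-shaped", bell_shaped), ("right-skewed", right_skewed)]:
    jb, p_value = jarque_bera(data)
    verdict = "reject normality" if p_value < 0.05 else "no evidence against normality"
    print(f"{name}: JB = {jb:.2f}, {verdict}")
```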


Types of Hypothesis Testing

Hypothesis tests are mainly classified into:

  1. One-Tailed Test

  2. Two-Tailed Test

1. One-Tailed Test


  • A one-tailed test checks for a difference in only one direction.
  • It is used when we expect the result to be either greater than or less than a certain value, but not both.
  • Example: Testing whether a new algorithm increases accuracy.


Types of One-Tailed Tests

a) Left-Tailed Test

  • Used when the alternative hypothesis states that the value is less than the null hypothesis value.

Example:

H₀: μ ≥ 50
H₁: μ < 50

Interpretation:

  • Rejecting H₀ supports the claim that the average production is less than 50 units.

b) Right-Tailed Test

  • Used when the alternative hypothesis states that the value is greater than the null hypothesis value.

Example:

H₀: μ ≤ 50
H₁: μ > 50

Interpretation:

  • Rejecting H₀ supports the claim that the average production is greater than 50 units.


2. Two-Tailed Test


  • A two-tailed test checks for differences in both directions.
  • It checks whether the value differs from a specific value, whether higher or lower.
  • Used when we do not know the direction of change.

Example

H₀: μ = 50
H₁: μ ≠ 50

This means the average production may be greater or less than 50.
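
The difference between left-tailed, right-tailed, and two-tailed tests shows up in how the p-value is computed from the same test statistic. The sketch below assumes a z statistic and uses a hypothetical value of 1.8; the helper name `p_value` is illustrative, not from any library.

```python
from statistics import NormalDist

def p_value(z, tail):
    """p-value of a z statistic for a 'left', 'right', or 'two' tailed test."""
    cdf = NormalDist().cdf(z)
    if tail == "left":     # H1: mu < mu_0
        return cdf
    if tail == "right":    # H1: mu > mu_0
        return 1 - cdf
    if tail == "two":      # H1: mu != mu_0
        return 2 * min(cdf, 1 - cdf)
    raise ValueError(f"unknown tail: {tail!r}")

z = 1.8  # hypothetical test statistic
for tail in ("left", "right", "two"):
    print(f"{tail}-tailed p-value: {p_value(z, tail):.4f}")
```

For z = 1.8 the right-tailed p-value is below 0.05 while the two-tailed p-value is not, so the choice of test must be made before looking at the data.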


Example of Hypothesis Testing

Suppose a teacher claims: "The average mark of the students is 70."

Null Hypothesis: H₀: μ = 70

Alternative Hypothesis: H₁: μ ≠ 70

Steps:

  1. Collect a sample of student marks.

  2. Calculate the sample mean and standard deviation of the marks.

  3. Perform a statistical test, such as a one-sample t-test.

  4. Decide whether the sample gives enough evidence to reject the teacher’s claim.
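
The teacher example can be worked through as a one-sample t-test. In this sketch the marks are hypothetical, and the critical value 2.262 is the standard two-tailed t-table entry for α = 0.05 with 9 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

# Step 1: hypothetical sample of 10 student marks.
marks = [72, 65, 80, 68, 74, 71, 66, 77, 69, 73]
mu_0 = 70  # H0: mu = 70, H1: mu != 70

# Step 2: sample mean and sample standard deviation.
n = len(marks)
sample_mean = mean(marks)
sample_sd = stdev(marks)

# Step 3: one-sample t statistic.
t_stat = (sample_mean - mu_0) / (sample_sd / sqrt(n))

# Step 4: compare with the two-tailed critical value for alpha = 0.05
# and n - 1 = 9 degrees of freedom (from a standard t-table).
T_CRIT = 2.262
reject_h0 = abs(t_stat) > T_CRIT
print(f"mean = {sample_mean}, t = {t_stat:.3f}, reject H0: {reject_h0}")
```

Here |t| is below the critical value, so the sample does not contradict the claim that the average mark is 70. That is not proof that the claim is true, only that this sample gives no evidence against it.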
























