2.14 Hypothesis Testing (Statistics & Machine Learning)
- A hypothesis is an assumption or statement about a population or data.
- In machine learning, a hypothesis represents the relationship between input variables (features) and output variables (target values) that a model learns from data.
- The goal of machine learning is to find the best hypothesis that can accurately predict results on new or unseen data.
- Hypothesis testing is a statistical method used to evaluate assumptions about a population using sample data.
- It helps us decide whether the sample provides enough evidence to reject a statement about the population.
Hypothesis testing involves two main hypotheses:
- Null Hypothesis (H₀)
- Alternative Hypothesis (H₁ or Hₐ)
1. Null Hypothesis (H₀)
The Null Hypothesis is the initial assumption that there is no significant difference or relationship between variables.
It represents the default or status quo statement.
Example
A company claims that its average daily production is 50 units.
Null Hypothesis:
H₀: μ = 50
This means the average production is equal to 50 units per day.
Here: μ = population mean
2. Alternative Hypothesis (H₁)
The Alternative Hypothesis is the opposite of the null hypothesis.
It suggests that there is a significant difference or relationship.
Example
If the company's average production is not equal to 50 units, then:
H₁: μ ≠ 50
This means the average production is different from 50 units.
Steps in Hypothesis Testing
The general process of hypothesis testing includes:
- State the Null Hypothesis (H₀) and Alternative Hypothesis (H₁).
- Choose the significance level (α), usually 0.05.
- Select a suitable statistical test (z-test, t-test, etc.).
- Calculate the test statistic.
- Compare the test statistic with the critical value, or the p-value with α.
- Decide whether to reject or fail to reject the null hypothesis. (A test never proves H₀, so we do not "accept" it.)
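The steps above can be sketched for the production example as a one-sample z-test. This is a minimal illustration, not a definitive procedure: the sample values are made up, and the population standard deviation `sigma` is assumed to be known, which a z-test requires. The names `sample`, `mu_0`, and `sigma` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean

# Hypothetical sample of daily production counts (illustrative data only).
sample = [52, 48, 51, 53, 49, 50, 54, 52, 51, 50]

mu_0 = 50      # Step 1: H0: mu = 50, H1: mu != 50
alpha = 0.05   # Step 2: significance level
sigma = 2.0    # Assumption: known population standard deviation

# Steps 3 and 4: one-sample z-test statistic
n = len(sample)
z = (mean(sample) - mu_0) / (sigma / sqrt(n))

# Step 5: two-tailed critical value and p-value from the standard normal
std_normal = NormalDist()
z_critical = std_normal.inv_cdf(1 - alpha / 2)
p_value = 2 * (1 - std_normal.cdf(abs(z)))

# Step 6: decision
reject_h0 = p_value < alpha
print(f"z = {z:.3f}, critical value = {z_critical:.3f}, p = {p_value:.4f}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

With this sample the mean is 51, so z ≈ 1.58 and p ≈ 0.11. Since p is larger than α = 0.05, the null hypothesis is not rejected.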
Applications of Hypothesis Testing in Machine Learning
Hypothesis testing is useful in several machine learning tasks.
1. Model Evaluation
- Used to check whether a new model performs better than an existing model.
- Example: Compare the accuracy of two models using a paired t-test.
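A paired t-test for the model comparison above could look roughly like the following sketch. The per-fold accuracies are invented for illustration, and the critical value is read from a standard t-table instead of being computed. A paired test is used because both models are scored on the same folds.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical accuracies of two models on the same 5 cross-validation folds.
acc_old = [0.80, 0.82, 0.79, 0.81, 0.83]
acc_new = [0.83, 0.84, 0.80, 0.85, 0.84]

# H0: the mean per-fold difference is 0; H1: it is not 0.
diffs = [new - old for new, old in zip(acc_new, acc_old)]
n = len(diffs)

# Paired t statistic: mean difference divided by its standard error.
t_stat = mean(diffs) / (stdev(diffs) / sqrt(n))

# Two-tailed critical value for alpha = 0.05 with n - 1 = 4 degrees of
# freedom, taken from a standard t-table.
t_critical = 2.776

reject_h0 = abs(t_stat) > t_critical
print(f"t = {t_stat:.3f}, critical value = {t_critical}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

Here t ≈ 3.77 exceeds 2.776, so the difference between the two models is significant at the 5% level for this made-up data. In practice a library routine such as `scipy.stats.ttest_rel` reports the same statistic together with an exact p-value.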
2. Feature Selection
- Used to determine whether adding a new feature improves the model performance.
- Example: Check whether adding age improves a prediction model.
3. Assumption Verification
- Some algorithms require certain assumptions about data.
- Hypothesis testing helps verify these assumptions.
- Example: Checking whether the data follow a normal distribution.
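One way to test the normality assumption is the Jarque-Bera test, sketched below with the standard library only. It is just one of several normality tests (Shapiro-Wilk is another common choice), and its p-value is an asymptotic approximation that works best for larger samples. The function name `jarque_bera` and both data sets are hypothetical.

```python
import random
from math import exp

def jarque_bera(data):
    """Return the Jarque-Bera statistic and its asymptotic p-value.

    H0: the data come from a normal distribution.
    """
    n = len(data)
    m = sum(data) / n
    m2 = sum((x - m) ** 2 for x in data) / n
    m3 = sum((x - m) ** 3 for x in data) / n
    m4 = sum((x - m) ** 4 for x in data) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)
    # Under H0, JB is approximately chi-square with 2 degrees of freedom,
    # whose survival function is exactly exp(-x / 2).
    return jb, exp(-jb / 2)

random.seed(0)
normal_like = [random.gauss(0, 1) for _ in range(500)]   # drawn from a normal
skewed = [random.expovariate(1.0) for _ in range(500)]   # clearly not normal

for name, data in [("normal_like", normal_like), ("skewed", skewed)]:
    jb, p = jarque_bera(data)
    verdict = "reject normality" if p < 0.05 else "no evidence against normality"
    print(f"{name}: JB = {jb:.2f}, {verdict}")
```

A small p-value means the sample's skewness and kurtosis are too far from those of a normal distribution to be explained by chance alone.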
Types of Hypothesis Testing
Hypothesis tests are mainly classified into:
- One-Tailed Test
- Two-Tailed Test
1. One-Tailed Test
- A one-tailed test checks for a difference in only one direction.
- It is used when we expect the result to be either greater than or less than a certain value, but not both.
- Example: Testing whether a new algorithm increases accuracy.
Types of One-Tailed Tests
a) Left-Tailed Test
- Used when the alternative hypothesis states that the value is less than the null hypothesis value.
Example:
H₀: μ ≥ 50
H₁: μ < 50
Interpretation:
- Rejecting H₀ supports the claim that the average production is less than 50 units.
b) Right-Tailed Test
- Used when the alternative hypothesis states that the value is greater than the null hypothesis value.
Example:
H₀: μ ≤ 50
H₁: μ > 50
Interpretation:
- Rejecting H₀ supports the claim that the average production is greater than 50 units.
2. Two-Tailed Test
- A two-tailed test checks for differences in both directions.
- It determines whether the value differs from a specific value, in either direction.
- Used when we do not know the direction of change.
Example
H₀: μ = 50
H₁: μ ≠ 50
This means the average production may be greater or less than 50.
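The left-tailed, right-tailed, and two-tailed tests differ only in which tail of the distribution counts as evidence against H₀. The sketch below computes all three p-values for the same z statistic. The helper `p_value` and the value z = 1.8 are hypothetical, and a standard normal test statistic is assumed.

```python
from statistics import NormalDist

def p_value(z, tail):
    """p-value of a z statistic for a 'left', 'right', or 'two' tailed test."""
    cdf = NormalDist().cdf(z)
    if tail == "left":    # H1: mu < mu_0
        return cdf
    if tail == "right":   # H1: mu > mu_0
        return 1 - cdf
    if tail == "two":     # H1: mu != mu_0
        return 2 * min(cdf, 1 - cdf)
    raise ValueError(f"unknown tail: {tail!r}")

z = 1.8  # hypothetical test statistic
for tail in ("left", "right", "two"):
    print(f"{tail:>5}-tailed p-value: {p_value(z, tail):.4f}")
```

For z = 1.8 the right-tailed p-value is about 0.036 and the two-tailed p-value is about 0.072. At α = 0.05 the same data therefore reject H₀ in a right-tailed test but not in a two-tailed test, which is why the direction must be chosen before looking at the data.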
Example of Hypothesis Testing
Suppose a teacher claims: "The average marks of students is 70."
Null Hypothesis: H₀: μ = 70
Alternative Hypothesis: H₁: μ ≠ 70
Steps:
- Collect a sample of student marks.
- Calculate the sample mean.
- Perform a statistical test (for example, a one-sample t-test).
- Decide whether the sample provides enough evidence to reject the teacher’s claim.
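The teacher example can be worked through as a one-sample t-test, which suits a small sample whose population standard deviation is unknown. The marks below are invented for illustration, and the critical value is read from a standard t-table.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical marks of 10 sampled students (illustrative data only).
marks = [72, 68, 75, 71, 69, 74, 70, 73, 66, 72]

mu_0 = 70  # H0: mu = 70, H1: mu != 70

# One-sample t statistic using the sample standard deviation.
n = len(marks)
sample_mean = mean(marks)
t_stat = (sample_mean - mu_0) / (stdev(marks) / sqrt(n))

# Two-tailed critical value for alpha = 0.05 with n - 1 = 9 degrees of
# freedom, taken from a standard t-table.
t_critical = 2.262

reject_h0 = abs(t_stat) > t_critical
print(f"sample mean = {sample_mean}, t = {t_stat:.3f}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

The sample mean is 71 and t ≈ 1.13, which is below 2.262. The sample does not give enough evidence to reject the claim that the average is 70, although this does not prove the claim is true.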