3.3 Statistical Estimation in Machine Learning
- Statistical estimation is a method used in machine learning to estimate unknown values or parameters from sample data.
- In simple terms: we use sample data to estimate properties of the whole population.
- Instead of checking all data, we take a small sample and estimate the result.
- In machine learning, statistical estimation helps models learn patterns, estimate probabilities, and make predictions from training data.
- Examples include estimating average marks, probability of spam emails, and relationships between variables.
Example: Suppose a college has 1000 students.
Instead of collecting marks from all 1000 students, we select a sample of 100 students and compute their average.
This calculated value is called an estimate of the average for the whole college.
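The sampling idea above can be tried out in a short Python sketch. The marks here are simulated (random integers between 40 and 100, an assumption made only for illustration), and the variable names are hypothetical. The point is only that the average of 100 sampled students tends to land close to the average of all 1000.

```python
import random
from statistics import mean

# Hypothetical data: these marks are simulated, not real student records.
rng = random.Random(0)  # fixed seed so the run is repeatable
population = [rng.randint(40, 100) for _ in range(1000)]  # 1000 students

# Select 100 students at random, without replacement.
sample = rng.sample(population, 100)

population_mean = mean(population)  # usually unknown in practice
sample_mean = mean(sample)          # our estimate of the population mean

print(f"Population mean: {population_mean:.2f}")
print(f"Sample estimate: {sample_mean:.2f}")
```

In a real setting the population mean would not be available; it is computed here only so the estimate can be compared against it.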
Example – Average Marks
Suppose we collect the marks of 5 students: 60, 70, 80, 75, 65
We estimate the average marks using the mean formula:
x̄ = Σx / n
Where:
- x̄ (x bar) = estimated mean or average
- Σx = sum of all values
- n = number of observations
Calculation:
Mean = (60 + 70 + 80 + 75 + 65) / 5
Mean = 350 / 5
Mean = 70
So the estimated average mark is 70.
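The same calculation can be written as a minimal Python sketch. The list name `marks` is chosen here for illustration; the values are the five marks used in the calculation above.

```python
from statistics import mean

# Marks of the 5 students from the worked example.
marks = [60, 70, 80, 75, 65]

# Mean formula: sum of all values divided by the number of observations.
manual_mean = sum(marks) / len(marks)

# The standard library computes the same quantity.
library_mean = mean(marks)

print(manual_mean)  # 70.0
```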
A machine learning model uses similar estimates to understand data patterns.
Machine learning models learn by estimating parameters from data.
Examples:
- Estimating average height of people
- Estimating probability of spam emails
- Estimating relationship between study hours and marks
Without estimation, the model cannot learn from data.
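As a rough illustration of the third example, the sketch below estimates a straight-line relationship between study hours and marks using ordinary least squares. The five data points are invented purely for this illustration, and the names `study_hours`, `slope`, and `intercept` are hypothetical.

```python
from statistics import mean

# Hypothetical data, invented only for this illustration.
study_hours = [1, 2, 3, 4, 5]
marks = [52, 58, 61, 67, 72]

mean_x = mean(study_hours)
mean_y = mean(marks)

# Least-squares estimates of the slope and intercept of the line
# marks = intercept + slope * hours.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(study_hours, marks))
sxx = sum((x - mean_x) ** 2 for x in study_hours)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

print(f"Estimated line: marks = {intercept:.1f} + {slope:.1f} * hours")
```

Here the slope and intercept are the parameters the model estimates from the data.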
Types of Statistical Estimation
1. Point Estimation
Point estimation gives one single value as the estimate.
Example:
Average marks = 70
The value 70 is the point estimate.
2. Interval Estimation
Interval estimation gives a range of values within which the true value is likely to lie.
Example:
The average marks may lie between 68 and 72.
When such a range is reported with a confidence level (for example, 95%), it is called a confidence interval.
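One common way to build such an interval for a mean is sketched below, assuming a 95% confidence level and Student's t distribution. It uses the five marks from the average-marks example (60, 70, 80, 75, 65). The critical value 2.776 is taken from a standard t-table for 4 degrees of freedom and is hard-coded as an assumption. The range 68 to 72 in the text is illustrative; with only five observations the computed interval is much wider, roughly 60 to 80.

```python
import math
from statistics import mean, stdev

# The five marks from the average-marks example.
marks = [60, 70, 80, 75, 65]
n = len(marks)

point_estimate = mean(marks)                   # sample mean
standard_error = stdev(marks) / math.sqrt(n)   # sample std. dev. / sqrt(n)

# Assumed critical value: Student's t, 95% confidence, n - 1 = 4 degrees
# of freedom, read from a standard t-table.
T_CRITICAL_95_DF4 = 2.776

margin = T_CRITICAL_95_DF4 * standard_error
lower = point_estimate - margin
upper = point_estimate + margin

print(f"Point estimate: {point_estimate}")
print(f"95% confidence interval: {lower:.2f} to {upper:.2f}")
```

A larger sample shrinks the standard error, which is why intervals become narrower as more data is collected.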
Example – Probability Estimation
Suppose that in a set of 100 emails:
- 30 are spam
- 70 are not spam
The estimated probability of spam is:
P(spam) = number of spam emails / total number of emails = 30 / 100 = 0.30
So the machine learning model estimates that 30% of emails are spam.
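The spam estimate can be reproduced with a small Python sketch. The `labels` list is a hypothetical stand-in for a labelled training set containing the 30 spam and 70 non-spam emails from the example.

```python
# Hypothetical labelled training set matching the counts in the example.
labels = ["spam"] * 30 + ["not spam"] * 70

# Estimated probability = number of spam emails / total number of emails.
spam_count = labels.count("spam")
p_spam = spam_count / len(labels)

print(f"Estimated P(spam) = {p_spam:.2f}")  # Estimated P(spam) = 0.30
```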
Statistical estimation is used in many machine learning algorithms such as:
- Linear Regression
- Naive Bayes
- Logistic Regression
- Gaussian Models
- Bayesian Models
These algorithms estimate parameters like mean, variance, and probability from data.
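As a small sketch of what estimating such parameters looks like, the code below computes the mean and variance of the five marks from the average-marks example (60, 70, 80, 75, 65), as one might when fitting a simple Gaussian model. Two variance estimates are shown: one divides by n and the other by n - 1. Which one a given algorithm uses depends on the method, so treat this as an illustration rather than a fixed recipe.

```python
from statistics import mean, pvariance, variance

# The five marks from the average-marks example.
marks = [60, 70, 80, 75, 65]

mu = mean(marks)                # estimated mean
var_mle = pvariance(marks)      # divides by n (maximum likelihood estimate)
var_unbiased = variance(marks)  # divides by n - 1 (unbiased sample variance)

print(f"mean = {mu}, variance (n) = {var_mle}, variance (n - 1) = {var_unbiased}")
```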