ML 1.2 Supervised Learning in Machine Learning
- Supervised learning is a machine learning approach where a model is trained using labeled data.
- Each training example includes input data and the correct output.
- The model learns the relationship between inputs and outputs and then uses that learning to make predictions on new, unseen data.
- In simple terms, the system learns under guidance, similar to how a student learns from a teacher who provides correct answers.
How Supervised Learning Works
- Collect labeled data (input + correct output)
- Split the data into training and testing sets
- Train the model using the training data
- Test the model on unseen data
- Use the model for real-time predictions
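The steps above can be sketched end to end in a few lines of Python. This is a minimal illustration with a hand-rolled least-squares fit; the hours-studied/marks data are invented for the example (a real workflow would use a library such as scikit-learn):

```python
# Minimal supervised-learning workflow: collect, split, train, test, predict.
# The tiny dataset below is invented for illustration.

# 1. Collect labeled data: hours studied (input) -> exam marks (output)
data = [(1, 20), (2, 30), (3, 40), (4, 50), (5, 60), (6, 70)]

# 2. Split into training and testing sets
train, test = data[:4], data[4:]

# 3. Train: fit y = m*x + b by ordinary least squares
n = len(train)
mean_x = sum(x for x, _ in train) / n
mean_y = sum(y for _, y in train) / n
m = sum((x - mean_x) * (y - mean_y) for x, y in train) / \
    sum((x - mean_x) ** 2 for x, _ in train)
b = mean_y - m * mean_x

# 4. Test on unseen data: compare true marks with predictions
for x, y_true in test:
    print(x, y_true, m * x + b)

# 5. Predict for a new input
print(m * 7 + b)  # predicted marks for 7 hours of study (80.0 with this data)
```

Because the toy data lie exactly on a line, the test predictions match the true values; with real data there would be some prediction error.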
Real-Time Examples of Supervised Learning
1. Email Spam Detection (Classification)
- Input: Email content, sender, keywords
- Output: Spam or Not Spam
- Real-time use: Gmail filters spam emails automatically based on past labeled emails.
2. Student Performance Prediction (Regression)
- Input: Attendance, internal marks, assignment scores
- Output: Final exam marks
- Real-time use: Colleges predict student results and identify students who need academic support.
3. Face Recognition (Classification)
- Input: Image pixels
- Output: Person name or Unknown
- Real-time use: Mobile phone face unlock and attendance systems.
4. House Price Prediction (Regression)
- Input: Location, area size, number of rooms
- Output: House price
- Real-time use: Real estate websites estimate property prices instantly.
5. Medical Diagnosis (Classification)
- Input: Symptoms, test results
- Output: Disease type or No disease
- Real-time use: Decision-support systems assist doctors in diagnosis.
6. Credit Approval Systems (Classification)
- Input: Income, credit score, employment history
- Output: Loan Approved or Rejected
- Real-time use: Banks evaluate loan applications instantly.
Common Algorithms Used in Supervised Learning
- Linear Regression
- Logistic Regression
- Decision Tree
- Random Forest
- Support Vector Machine (SVM)
- k-Nearest Neighbors (KNN)
- Naive Bayes
Types of Supervised Learning
1. Classification
Used when the output is a category or class.
Examples of output:
- Yes / No
- Spam / Not Spam
- Pass / Fail
Classification in Machine Learning
- Classification is a supervised learning technique used to assign input data to predefined categories or classes based on its input features.
- The model learns from labeled examples and then predicts the class label for new data.
- In short, classification answers the question: "Which category does this data belong to?"
- Classification teaches a machine to sort things into categories; the output labels are discrete values.
- Classification can be binary, where the output is one of two possible classes, or multiclass, where the output is one of several classes.
- Common classification algorithms include Logistic Regression, Naive Bayes, Decision Tree, Support Vector Machine (SVM), and k-Nearest Neighbors (KNN).
How Classification Works
- Collect labeled data (input + class label)
- Train the model to learn patterns
- Provide new input data
- The model predicts the most suitable class
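The loop above can be sketched with a hand-rolled one-nearest-neighbour classifier; the two-feature points and the "spam"/"not spam" clusters are invented for illustration (in practice a library classifier such as scikit-learn's KNeighborsClassifier would be used):

```python
# Tiny 1-nearest-neighbour classifier: predict the class of the closest
# training point. Features and labels are invented for illustration.

# Labeled data: (feature1, feature2) -> class label
train = [((1.0, 1.0), "spam"), ((1.2, 0.8), "spam"),
         ((4.0, 4.2), "not spam"), ((3.8, 4.5), "not spam")]

def predict(point):
    """Return the label of the nearest training example (squared distance)."""
    def sq_dist(p, q):
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    nearest = min(train, key=lambda ex: sq_dist(ex[0], point))
    return nearest[1]

print(predict((1.1, 0.9)))  # close to the "spam" cluster
print(predict((4.1, 4.0)))  # close to the "not spam" cluster
```

New inputs near the first cluster are classified "spam" and inputs near the second "not spam", which is exactly the "model predicts the most suitable class" step.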
Types of Classification
1. Binary Classification
Only two possible classes.
Examples:
- Yes / No
- True / False
- Pass / Fail
Real-time examples:
- Email: Spam or Not Spam
- Loan: Approved or Rejected
- Medical test: Disease or No Disease
2. Multiclass Classification
More than two possible classes.
Examples:
- Grades: A, B, C, D
- Traffic signals: Red, Yellow, Green
Real-time examples:
- Handwritten digit recognition (0 to 9)
- Student grade prediction
- Language detection (English, Hindi, Telugu)
3. Multi-label Classification
One input can belong to multiple classes at the same time.
- It is used when there are two or more classes and the data we want to classify may belong to none of the classes, some of them, or all of them at the same time, e.g. identifying which traffic signs appear in an image.
- Multi-label classification allows data points to belong to multiple classes.
Real-time examples:
- A movie tagged as Action, Drama, and Thriller
- A news article labeled as Politics and Economy
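Multi-label output is usually represented as one independent yes/no decision per label (the one-vs-rest idea). A toy keyword-based movie tagger, with rules invented purely for illustration, shows how one input can receive several labels or none:

```python
# Multi-label classification: each input can receive zero or more labels.
# Each label has its own independent yes/no rule -- a toy stand-in for
# training one binary classifier per label (one-vs-rest).

LABEL_KEYWORDS = {
    "Action":   {"fight", "chase", "explosion"},
    "Drama":    {"family", "loss", "relationship"},
    "Thriller": {"suspense", "chase", "mystery"},
}

def tag(description):
    words = set(description.lower().split())
    # One independent decision per label; labels are not mutually exclusive.
    return {label for label, kws in LABEL_KEYWORDS.items() if words & kws}

print(tag("a chase full of suspense and one explosion"))  # Action and Thriller
print(tag("a quiet documentary"))  # matches no label: empty set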
Common Classification Algorithms
- Logistic Regression
- Decision Tree
- Random Forest
- Support Vector Machine (SVM)
- Naive Bayes
- k-Nearest Neighbors (KNN)
Real-Time Classification Examples Explained
1. Email Spam Detection
- Input: Email text, sender details
- Classes: Spam, Not Spam
- Type: Binary Classification
2. Student Result Prediction
- Input: Attendance, marks, assignments
- Classes: Pass, Fail
- Type: Binary Classification
3. Face Recognition
- Input: Image pixels
- Classes: Person A, Person B, Unknown
- Type: Multiclass Classification
4. Disease Diagnosis
- Input: Symptoms, test reports
- Classes: Diabetes, Heart Disease, Normal
- Type: Multiclass Classification
5. Customer Feedback Analysis
- Input: Review text
- Classes: Positive, Neutral, Negative
- Type: Multiclass Classification
2. Regression
Used when the output is a numerical value.
Regression is a supervised learning technique used to predict a continuous numerical value based on input features.
Unlike classification, which predicts categories, regression predicts quantities.
In simple terms, regression answers the question: "How much?" or "What value?"
Examples of output:
- Price
- Temperature
- Marks
The different regression algorithms in machine learning are: Linear Regression, Polynomial Regression, Ridge Regression, Decision Tree Regression, Random Forest Regression, Support Vector Regression, etc.
There are two types of variables in regression:
- Dependent variable (target): the variable we are trying to predict, e.g. house price.
- Independent variables (features): the input variables that influence the prediction, e.g. locality, number of rooms.
The linear regression model provides a sloped straight line representing the relationship between the variables.
How Regression Works
- Collect labeled data where the output is a number
- The model learns the relationship between the inputs and the numerical output
- For new input data, the model predicts a value
Simple Example to Understand Regression
House Price Prediction
- Input features: area (square feet), number of bedrooms, location
- Output: house price (in rupees)
If a model is trained on past house sales data, it can predict the price of a new house based on its features.
This is regression because the output is a number, not a category.
Real-Time Regression Examples
1. Student Marks Prediction
- Input: Attendance percentage, internal marks, assignment scores
- Output: Final exam marks
- Use case: Identifying students who may need academic support
2. Weather Forecasting
- Input: Temperature, humidity, wind speed
- Output: Predicted temperature or rainfall amount
3. Sales Forecasting
- Input: Past sales data, season, promotions
- Output: Expected sales value for next month
4. Salary Prediction
- Input: Experience, education level, skills
- Output: Estimated salary
5. Stock Price Estimation
- Input: Historical prices, volume, trends
- Output: Predicted stock price
Types of Regression
1. Linear Regression
Simple linear regression describes how one variable, the dependent variable, changes with respect to a single independent variable.
The relationship between the dependent and independent variables is represented by the simple linear equation:
y = mx + b
where m is the slope and b is the intercept.
- The output changes linearly with the input
- Example: Salary increases with experience
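For reference, the values of m and b that minimize the sum of squared errors have a standard closed form (a well-known result, not stated in the original notes), where x̄ and ȳ are the means of the inputs and outputs:

```latex
m = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2},
\qquad
b = \bar{y} - m\,\bar{x}
```

These are the formulas a library's "fit" step evaluates (or approximates) when training a simple linear regression model.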
2. Multiple Linear Regression
- It extends simple linear regression by using multiple independent variables to predict the target variable.
- Multiple linear regression models the relationship between two or more features and a response by fitting a linear equation to the observed data.
- We can use it to find out which factor has the highest impact on the predicted output and how the different variables relate to each other.
- Example: predicting the price of a house from multiple features such as size, location, and number of rooms.
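A minimal sketch of this fit, using made-up house data with two features (area and rooms). The normal equations are solved with Cramer's rule purely so the example is self-contained; in practice a library routine (e.g. numpy.linalg.lstsq or scikit-learn's LinearRegression) does this step:

```python
# Multiple linear regression with two features, fit by solving the
# normal equations (X^T X) beta = X^T y with Cramer's rule.
# Data are invented and exactly planar: price = 20 + 50*area + 10*rooms.

rows = [  # (area in 100 m^2, number of rooms, price in lakhs)
    (1, 2, 90), (2, 3, 150), (3, 3, 200), (2, 4, 160), (4, 5, 270),
]

# Design matrix with an intercept column: price = b0 + b1*area + b2*rooms
X = [(1.0, a, r) for a, r, _ in rows]
y = [p for _, _, p in rows]
A = [[sum(xi[i] * xi[j] for xi in X) for j in range(3)] for i in range(3)]
v = [sum(xi[i] * yi for xi, yi in zip(X, y)) for i in range(3)]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

d = det3(A)
beta = []
for col in range(3):
    Ai = [row[:] for row in A]      # replace one column with v (Cramer)
    for i in range(3):
        Ai[i][col] = v[i]
    beta.append(det3(Ai) / d)

print(beta)  # recovers [20.0, 50.0, 10.0]: intercept, area and rooms effects
```

Because the invented prices lie exactly on a plane, the fit recovers the generating coefficients; real data would give a best-fit approximation instead.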
3. Polynomial Regression
- Polynomial regression is a form of linear regression in which the relationship between the independent variable x and the dependent variable y is modelled as an nth-degree polynomial. It fits a nonlinear relationship between the value of x and the corresponding conditional mean of y, denoted E(y | x).
- While simple linear regression models the relationship as a straight line, polynomial regression allows for more flexibility by fitting a polynomial curve to the data.
- The general form of a polynomial regression equation of degree n is:
  y = β0 + β1x + β2x^2 + … + βnx^n + e
  where:
  - y is the dependent variable
  - x is the independent variable
  - β0, β1, …, βn are the coefficients of the polynomial terms
  - n is the degree of the polynomial
  - e represents the error term
- Models non-linear relationships
- Example: Growth rate that accelerates over time
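A small sketch of the degree-2 case. The three points are invented to lie exactly on the parabola y = 1 + x + x^2, so fitting reduces to solving one equation per point; Gaussian elimination is hand-rolled here only to keep the example dependency-free:

```python
# Polynomial regression sketch: fit y = b0 + b1*x + b2*x^2 to data that
# lie exactly on a parabola (points invented: y = 1 + x + x^2).

points = [(0, 1), (1, 3), (2, 7)]

# One equation b0 + b1*x + b2*x^2 = y per point (augmented matrix).
aug = [[1.0, x, x * x, float(y)] for x, y in points]

# Solve the 3x3 system by Gauss-Jordan elimination.
for i in range(3):
    pivot = aug[i][i]
    aug[i] = [a / pivot for a in aug[i]]       # scale pivot row to 1
    for j in range(3):
        if j != i:
            factor = aug[j][i]                 # clear column i elsewhere
            aug[j] = [a - factor * b for a, b in zip(aug[j], aug[i])]

b0, b1, b2 = (row[3] for row in aug)
print(b0, b1, b2)            # coefficients of the fitted parabola: 1, 1, 1
print(b0 + b1 * 3 + b2 * 9)  # prediction at x = 3 -> 13.0
```

The curve it recovers is exactly the generating polynomial; with noisy real data and more points one would solve the least-squares version of this system instead.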
4. Ridge & Lasso Regression
- Ridge and lasso regression are regularized versions of linear regression that help avoid overfitting by penalizing large coefficients.
- These algorithms are used when there is a risk of overfitting due to too many features.
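The shrinking effect is easiest to see in one dimension. For a single feature with no intercept, ridge regression has the closed form m = Σxy / (Σx² + λ) (lasso has no such simple closed form, so only ridge is sketched); the data below are invented, roughly y = 3x with a little noise:

```python
# Ridge regression in one dimension (no intercept): closed form is
#   m = sum(x*y) / (sum(x^2) + lam)
# The penalty lam shrinks the coefficient toward zero; lam = 0 recovers
# ordinary least squares. Data invented: y ~ 3*x with small noise.

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 5.9, 9.2, 11.8]

def ridge_slope(lam):
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

print(ridge_slope(0.0))   # ordinary least-squares slope, close to 3
print(ridge_slope(10.0))  # shrunk toward zero by the penalty
```

Increasing the penalty always moves the coefficient toward zero, which is exactly how regularization trades a little bias for less overfitting.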
5. Support Vector Regression (SVR)
- SVR is a regression algorithm based on the Support Vector Machine (SVM) algorithm.
- SVM is mainly used for classification tasks, but it can also be adapted for regression.
- SVR fits a function that keeps predictions within a margin of tolerance (epsilon) around the actual values, penalizing only the points that fall outside that margin.
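The distinguishing piece of SVR is its epsilon-insensitive loss: errors inside the tolerance band cost nothing, and only points outside it are penalized. A tiny sketch with invented values:

```python
# The key idea of SVR: the epsilon-insensitive loss. Errors inside a
# tolerance band of width epsilon cost nothing; only points outside the
# band are penalized. Values below are invented for illustration.

def eps_insensitive_loss(y_true, y_pred, eps=0.5):
    return max(0.0, abs(y_true - y_pred) - eps)

print(eps_insensitive_loss(10.0, 10.3))  # inside the band -> 0.0
print(eps_insensitive_loss(10.0, 11.2))  # outside -> penalty of about 0.7
```

Contrast this with ordinary least squares, where every residual, however small, contributes to the loss.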
6. Decision Tree Regression
- A decision tree uses a tree-like structure to make decisions, where each branch of the tree represents a decision and the leaves represent outcomes.
- For example, predicting customer behaviour based on features like age and income.
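A decision tree reduced to a single split (a "stump") shows the core mechanic: pick the threshold that minimizes squared error, then predict the mean of each side. The age/income pairs are invented, with values that jump around age 30:

```python
# Decision tree regression, reduced to a single split (a "stump"):
# choose the threshold that minimizes squared error, then predict the
# mean of each side. Data invented: income jumps at around age 30.

data = [(22, 20), (25, 22), (28, 21), (35, 50), (40, 52), (45, 51)]

def fit_stump(pairs):
    best = None
    xs = sorted(x for x, _ in pairs)
    for t in xs[1:]:  # candidate thresholds between data points
        left = [y for x, y in pairs if x < t]
        right = [y for x, y in pairs if x >= t]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((y - ml) ** 2 for y in left)
               + sum((y - mr) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    _, t, ml, mr = best
    return t, ml, mr

t, left_mean, right_mean = fit_stump(data)
print(t, left_mean, right_mean)  # split at 35; leaves predict 21.0 and 51.0
```

A full regression tree simply applies this split search recursively inside each leaf until a stopping rule is met.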
7. Random Forest Regression
- Random forest is an ensemble method that builds multiple decision trees, where each tree is trained on a different subset of the training data. The final prediction is made by averaging the predictions of all the trees.
- For example, forecasting customer sales from historical data.
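The aggregation step is just an average. The per-tree predictions below are invented stand-ins for real trees trained on different bootstrap samples:

```python
# The core of random forest regression: average the predictions of many
# trees. The per-tree values below are invented stand-ins for real trees
# trained on different bootstrap samples of the data.

tree_predictions = [248.0, 262.0, 255.0, 251.0]  # one prediction per tree

forest_prediction = sum(tree_predictions) / len(tree_predictions)
print(forest_prediction)  # 254.0
```

Averaging many trees that each overfit differently cancels much of their individual error, which is why the forest is usually more accurate than any single tree.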
Common Regression Algorithms
- Linear Regression
- Multiple Linear Regression
- Polynomial Regression
- Decision Tree Regression
- Random Forest Regression
- Support Vector Regression (SVR)
Regression vs Classification (Quick View)
| Aspect | Regression | Classification |
|---|---|---|
| Output | Numerical value | Category |
| Example | Price, marks | Spam/Not spam |
| Question answered | How much? | Which class? |