2.6 Perceptron Convergence and Linear Separability
The perceptron algorithm converges only when the dataset is linearly separable. If the data cannot be separated by a straight line or hyperplane, the algorithm will not converge.
The perceptron convergence theorem states that, for any dataset that is linearly separable, the perceptron learning rule (algorithm) will converge to a solution, i.e. find a separating set of weights, in a finite number of steps.
The perceptron learning algorithm updates its weight vector w using the rule

w ← w + η (d − y) x

where η is the learning rate, d is the desired output, y is the actual output, and x is the input vector. The update is performed only when the perceptron misclassifies a data point.
The theorem guarantees that:
- If the data is linearly separable, the perceptron algorithm will converge in a finite number of steps.
- If the data is not linearly separable, the perceptron will continue updating indefinitely.
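To make the rule concrete, here is a minimal sketch of a single update on a misclassified point (NumPy, a 0/1 step activation, and the specific numbers are my own choices, not from the text):

```python
import numpy as np

# Hypothetical starting state of a 2-input perceptron
w = np.zeros(2)            # weight vector
b = -0.5                   # bias
eta = 1.0                  # learning rate

x = np.array([1.0, 0.0])   # input point
d = 1                      # desired output

y = 1 if np.dot(w, x) + b >= 0 else 0    # actual output: here 0, a mistake
if y != d:                               # update only on a misclassification
    w = w + eta * (d - y) * x
    b = b + eta * (d - y)

print(w, b)  # after one update: w = [1. 0.], b = 0.5
```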
Linear Separability
A dataset is linearly separable if we can separate the two classes using:
- A straight line in 2D
- A plane in 3D
- A hyperplane in higher dimensions
Mathematically, there exists a weight vector w and bias b such that

w · x + b > 0 for every sample x in class 1, and
w · x + b < 0 for every sample x in class 0,

for all training samples. This means every point is correctly classified by the same linear boundary.
Example 1: Linearly Separable Data (OR Gate)

x1 x2 | target
 0  0 |   0
 0  1 |   1
 1  0 |   1
 1  1 |   1

Geometrically, this data can be separated by the line x1 + x2 = 0.5.
A single straight line separates class 0 and class 1. So the perceptron will converge.
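As an illustration, a short sketch (NumPy, 0/1 step activation, learning rate 0.1, and an epoch cap of 100 are all assumed choices) that trains a perceptron on the OR data and stops once a full pass produces no errors:

```python
import numpy as np

# OR gate data: class 0 -> (0,0); class 1 -> (0,1), (1,0), (1,1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
D = np.array([0, 1, 1, 1])

w, b, eta = np.zeros(2), 0.0, 0.1

for epoch in range(100):                  # cap on passes over the data
    errors = 0
    for x, d in zip(X, D):
        y = 1 if np.dot(w, x) + b >= 0 else 0
        if y != d:                        # update only on a mistake
            w += eta * (d - y) * x
            b += eta * (d - y)
            errors += 1
    if errors == 0:                       # an error-free pass: converged
        print(f"converged after {epoch + 1} epochs: w = {w}, b = {b}")
        break
```

Because the OR data is linearly separable, the error count reaches zero after a handful of epochs.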
Example 2: Non-Linearly Separable Data (XOR)

x1 x2 | target
 0  0 |   0
 0  1 |   1
 1  0 |   1
 1  1 |   0

This is the XOR gate; geometrically, this data cannot be separated by any single line.
Check the points:
- Class 0 → (0,0) and (1,1)
- Class 1 → (0,1) and (1,0)
If you plot these points:
- The two 1’s are diagonally opposite
- The two 0’s are also diagonally opposite
No single straight line can separate them.
There is no linear equation of the form w1·x1 + w2·x2 + b = 0 that can separate these two classes.
So, XOR is not linearly separable.
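Running the same kind of loop on the XOR data (same assumptions as in the OR sketch) never produces an error-free pass; the for/else below simply reports that the iteration cap was hit:

```python
import numpy as np

# XOR gate data: class 0 -> (0,0), (1,1); class 1 -> (0,1), (1,0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
D = np.array([0, 1, 1, 0])

w, b, eta = np.zeros(2), 0.0, 0.1

for epoch in range(1000):
    errors = 0
    for x, d in zip(X, D):
        y = 1 if np.dot(w, x) + b >= 0 else 0
        if y != d:
            w += eta * (d - y) * x
            b += eta * (d - y)
            errors += 1
    if errors == 0:
        break
else:
    # runs only if the loop never broke out, i.e. no error-free pass occurred
    print(f"no convergence after 1000 epochs; {errors} errors in the last pass")
```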
Perceptron Convergence
- Convergence means that, after a finite number of updates, the perceptron finds weights that correctly classify all training samples.
Perceptron Convergence Theorem
If:
- the dataset is linearly separable, and
- the learning rate is positive,
then the perceptron algorithm is guaranteed to converge in a finite number of steps. If the dataset is not linearly separable, convergence is not guaranteed.
Why Convergence Happens (Geometric Intuition)
Each time the perceptron misclassifies a point, it updates the weights. Geometrically:
- The weight vector rotates toward the direction that classifies the misclassified point correctly.
- The decision boundary shifts gradually.
- With separable data, eventually all points lie on the correct side.
- The weight vector keeps moving closer to an optimal separating hyperplane, as the bound sketched below makes precise.
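This intuition can be made precise by the classical mistake bound. A sketch, under assumptions not stated above (labels taken as ±1, learning rate 1, bias absorbed into the weights, a unit separator w* with margin γ, and every input inside radius R):

```latex
% Sketch of the perceptron mistake bound (assumed setup: +/-1 labels,
% eta = 1, bias folded into w, unit separator w*, margin gamma, radius R).
\[
  \gamma = \min_i \, t_i \,(w^{*} \cdot x_i), \qquad \|w^{*}\| = 1, \qquad \|x_i\| \le R .
\]
% Each mistaken update w <- w + t_i x_i raises w . w* by at least gamma
% and raises ||w||^2 by at most R^2, so after k mistakes:
\[
  k\gamma \;\le\; w \cdot w^{*} \;\le\; \|w\| \;\le\; R\sqrt{k}
  \quad\Longrightarrow\quad
  k \;\le\; \left(\frac{R}{\gamma}\right)^{2}.
\]
```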
Perceptron Learning Algorithm: Convergence Steps
Step 1: Initialization
- Initialize all weights to zero (or small random values): w = 0.
- Initialize the bias b = 0.
- Choose a positive learning rate η.
- Let the iteration (time) index be t = 1.
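A minimal sketch of Step 1 (two inputs and a learning rate of 0.1 are arbitrary, assumed values):

```python
import numpy as np

n_inputs = 2                 # assumed number of inputs
w = np.zeros(n_inputs)       # Step 1: weights start at zero
b = 0.0                      # Step 1: bias starts at zero
eta = 0.1                    # Step 1: positive learning rate
t = 1                        # Step 1: iteration (time) index
```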
Step 2: Activation (Apply Input)
For each training example at time step t:
- Input vector: x(t)
- Desired output: d(t)
Step 3: Compute Actual Output
First compute the cumulative input (weighted sum):

net(t) = w(t) · x(t) + b = Σ_i w_i x_i + b

Then apply the activation function. For bipolar output (−1, 0, 1):

y(t) = +1 if net(t) > 0, 0 if net(t) = 0, −1 if net(t) < 0

In most practical perceptron models, we use the binary step function:

y(t) = 1 if net(t) ≥ 0, otherwise y(t) = 0
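A sketch of Step 3 for a single sample (the weight, bias, and input values are hypothetical):

```python
import numpy as np

w, b = np.array([0.2, -0.1]), 0.05   # hypothetical current weights and bias
x = np.array([1.0, 1.0])             # one input vector

net = np.dot(w, x) + b               # cumulative (weighted) input

# bipolar-style output taking the values -1, 0, or +1
y_bipolar = 1 if net > 0 else (-1 if net < 0 else 0)

# step activation used in the examples above (outputs 0 or 1)
y_step = 1 if net >= 0 else 0
```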
Step 4: Weight Update Rule (Adaptation)
Update the weights only if there is an error:

w_i(t+1) = w_i(t) + η (d(t) − y(t)) x_i(t)

Update the bias:

b(t+1) = b(t) + η (d(t) − y(t))

Where:
- η → learning rate
- d(t) → expected (desired) output
- y(t) → actual output
- x(t) → input vector
If d(t) = y(t), then no update is made.
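Step 4 as a sketch (the state and the sample are hypothetical; d is the expected output, y the actual output):

```python
import numpy as np

w, b, eta = np.array([0.2, -0.1]), 0.05, 0.1   # hypothetical current state
x = np.array([1.0, 1.0])                       # input vector
d, y = 0, 1                                    # expected vs. actual output: an error

if y != d:                                     # adapt only when there is an error
    w = w + eta * (d - y) * x                  # w(t+1) = w(t) + eta * (d - y) * x
    b = b + eta * (d - y)                      # b(t+1) = b(t) + eta * (d - y)
# if d == y, weights and bias are left unchanged
```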
Step 5: Continuation
- Increase the time step: t = t + 1.
- Repeat Steps 2 to 4 for all training samples.
- Stop when:
  - all samples are correctly classified, or
  - the maximum number of iterations is reached.
When y(t) = d(t):
Then there is no change in the weights. Once this holds for all training samples, the algorithm has converged.
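Putting Steps 1 to 5 together, here is a compact sketch of the whole procedure (the function name, the 0/1 step activation, and the default learning rate and epoch cap are assumptions of mine):

```python
import numpy as np

def train_perceptron(X, D, eta=0.1, max_epochs=100):
    """Train a perceptron following Steps 1-5 above; stops when every
    sample is classified correctly or the epoch limit is reached."""
    w = np.zeros(X.shape[1])                 # Step 1: zero weights
    b = 0.0                                  # Step 1: zero bias
    for epoch in range(max_epochs):          # Step 5: continuation
        errors = 0
        for x, d in zip(X, D):               # Step 2: input and desired output
            net = np.dot(w, x) + b           # Step 3: cumulative input
            y = 1 if net >= 0 else 0         # Step 3: step activation
            if y != d:                       # Step 4: update only on an error
                w += eta * (d - y) * x
                b += eta * (d - y)
                errors += 1
        if errors == 0:                      # stop: all samples correct
            return w, b, epoch + 1
    return w, b, max_epochs                  # stop: iteration limit reached

# Example usage on the OR data of Example 1
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
D = np.array([0, 1, 1, 1])
print(train_perceptron(X, D))
```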