2.6 Perceptron Convergence and Linear Separability


The perceptron algorithm converges only when the dataset is linearly separable. If the data cannot be separated by a straight line or hyperplane, the algorithm will not converge.

Algorithm: 

The perceptron convergence theorem states that, for any dataset that is linearly separable, the perceptron learning rule (algorithm) will converge to a solution in a finite number of iterations.

The perceptron learning algorithm updates its weight vector w using the following rule:

w = w + η y x

where η is the learning rate, x is the input vector, and y is its true label (+1 or −1).
The update is performed only when the perceptron misclassifies a data point.

The theorem guarantees that: 

  • If the data is linearly separable, the perceptron algorithm will converge in a finite number of steps.
  • If the data is not linearly separable, the perceptron will continue updating indefinitely.


Linear Separability

A dataset is linearly separable if we can separate the two classes using:

  • A straight line in 2D

  • A plane in 3D

  • A hyperplane in higher dimensions

Mathematically, there exists a weight vector w and bias b such that:

yᵢ (w · xᵢ + b) > 0

for all training samples. This means every point is correctly classified by the same linear boundary.
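This condition can be checked directly in code. Below is a minimal Python sketch (the helper name is_separated_by and the tiny example dataset are my own, not part of the original post): it returns True only when the given w and b put every sample on its correct side.

import numpy as np

def is_separated_by(w, b, X, y):
    # X: array of shape (n_samples, n_features)
    # y: labels encoded as +1 / -1
    # True if y_i * (w . x_i + b) > 0 for every sample i
    margins = y * (X @ w + b)
    return bool(np.all(margins > 0))

# Tiny illustrative dataset: two points on opposite sides of the vertical axis
X = np.array([[1.0, 0.0], [-1.0, 0.0]])
y = np.array([1, -1])

print(is_separated_by(np.array([1.0, 0.0]), 0.1, X, y))   # True
print(is_separated_by(np.array([0.0, 1.0]), 0.0, X, y))   # False: both margins are 0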

Example 1: Linearly Separable Data (OR Gate)

For the OR gate, the inputs and targets are:

  • (0,0) → 0

  • (0,1) → 1

  • (1,0) → 1

  • (1,1) → 1

Geometrically, this data can be separated by a line such as:

x₁ + x₂ − 0.5 = 0
A single straight line separates class 0 and class 1. So the perceptron will converge.
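As a quick check of this (illustrative) choice of line, take w = (1, 1) and b = −0.5, and encode class 0 as −1 and class 1 as +1:

  • (0,0): w · x + b = −0.5 < 0 → class 0 (correct)

  • (0,1): w · x + b = 0.5 > 0 → class 1 (correct)

  • (1,0): w · x + b = 0.5 > 0 → class 1 (correct)

  • (1,1): w · x + b = 1.5 > 0 → class 1 (correct)

Every point lies on the correct side, so yᵢ (w · xᵢ + b) > 0 holds for all samples.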

Example 2: Non-Linearly Separable Data (XOR)

This is the XOR gate. Geometrically, this data cannot be separated by any single straight line.

Check the points:

Class 0 → (0,0) and (1,1)
Class 1 → (0,1) and (1,0)

If you plot these points:

  • The two 1’s are diagonally opposite

  • The two 0’s are also diagonally opposite

No single straight line can separate them.

There is no linear equation of the form

w₁x₁ + w₂x₂ + b = 0

that can separate these two classes.

So, XOR is not linearly separable.
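A minimal training-loop sketch in Python (my own illustration, not code from the original post; labels are encoded as +1/−1) makes the contrast concrete: on the OR data the loop reaches an error-free pass and stops, while on the XOR data it keeps finding misclassified points until the epoch limit is reached.

import numpy as np

def train_perceptron(X, y, lr=1.0, max_epochs=100):
    # Perceptron with labels encoded as +1 / -1. Returns (w, b, converged).
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(max_epochs):
        errors = 0
        for xi, yi in zip(X, y):
            if yi * (np.dot(w, xi) + b) <= 0:   # misclassified (or exactly on the boundary)
                w = w + lr * yi * xi            # w = w + eta * y * x
                b = b + lr * yi
                errors += 1
        if errors == 0:
            return w, b, True                   # a full pass with no mistakes: converged
    return w, b, False                          # epoch limit reached without converging

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

y_or  = np.array([-1, 1, 1, 1])    # OR gate:  linearly separable
y_xor = np.array([-1, 1, 1, -1])   # XOR gate: not linearly separable

print(train_perceptron(X, y_or))   # converged flag is True
print(train_perceptron(X, y_xor))  # converged flag is False

The update inside the loop is exactly the rule w = w + η y x, applied whenever y (w · x + b) ≤ 0.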



Perceptron Convergence

  • Convergence means that, after a finite number of updates, the perceptron finds weights that correctly classify all training samples.

Perceptron Convergence Theorem

If:

  • The dataset is linearly separable

  • The learning rate is positive

then the perceptron algorithm is guaranteed to converge in a finite number of steps. If the dataset is not linearly separable, convergence is not guaranteed.


Why Convergence Happens (Geometric Intuition)

Each time the perceptron misclassifies a point, the weights are updated:

w = w + η y x

Geometrically:

  • The weight vector rotates toward the direction that correctly classifies the point.

  • The decision boundary shifts gradually.

  • With separable data, eventually all points lie on correct sides.

The weight vector keeps moving closer to an optimal separating hyperplane.
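A tiny numeric illustration (values chosen only for this example): suppose w = (1, −1), η = 1, and the point x = (1, 1) with true label y = +1 is currently misclassified, since w · x = 1 − 1 = 0. After the update,

w = w + η y x = (1, −1) + (1, 1) = (2, 0)

and now w · x = 2 > 0, so the same point is classified correctly: the weight vector has rotated toward x and the decision boundary has shifted.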


Perceptron Learning Algorithm: Convergence Steps

Step 1: Initialization

  • Initialize all weights to zero (or small random values).

  • Initialize bias b = 0.

  • Choose learning rate η > 0.

  • Let the iteration index be n = 1, 2, 3, …

Step 2: Activation (Apply Input)

For each training example at time step n:

  • Input vector: x(n)

  • Desired output: d(n)

Step 3: Compute Actual Output

First compute the net input (weighted sum):

y_in(n) = b + Σ wᵢ(n) xᵢ(n)     (sum over i = 1 to m)

Then apply the activation function.

For a bipolar output with threshold θ (values 1, 0, −1):

y(n) = 1     if y_in(n) > θ
y(n) = 0     if −θ ≤ y_in(n) ≤ θ
y(n) = −1    if y_in(n) < −θ

In most practical perceptron models, we use:

y(n) = 1     if y_in(n) ≥ 0
y(n) = 0     if y_in(n) < 0
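A small Python sketch of Step 3 (the function names are my own): compute the net input, then apply either of the two activation rules above.

def net_input(x, w, b):
    # y_in = b + sum_i w_i * x_i
    return b + sum(wi * xi for wi, xi in zip(w, x))

def bipolar_activation(y_in, theta):
    # Three-valued output: 1 above +theta, -1 below -theta, 0 in between
    if y_in > theta:
        return 1
    if y_in < -theta:
        return -1
    return 0

def step_activation(y_in):
    # Binary step used in most practical perceptron models
    return 1 if y_in >= 0 else 0

# Example: x = (1, 0), w = (0.4, -0.2), b = 0.1  ->  y_in = 0.5
y_in = net_input([1, 0], [0.4, -0.2], 0.1)
print(bipolar_activation(y_in, theta=0.2))   # 1, since 0.5 > 0.2
print(step_activation(y_in))                 # 1, since 0.5 >= 0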

Step 4: Weight Update Rule (Adaptation)

Update weights only if there is an error.

w(n+1) = w(n) + η [d(n) − y(n)] x(n)

Update bias:

b(n+1) = b(n) + η [d(n) − y(n)]

Where:

  • w(n) → old weight

  • η → learning rate

  • d(n) → expected output

  • y(n) → actual output

  • x(n) → input vector

If d(n) − y(n) = 0, then no update is made.
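A single update with numbers chosen purely for illustration: suppose η = 0.1, the current weights are w(n) = (−0.2, 0.1) with b(n) = 0, the input is x(n) = (1, 0), the desired output is d(n) = 1, and the actual output is y(n) = 0. Then d(n) − y(n) = 1, so

w(n+1) = (−0.2, 0.1) + 0.1 · 1 · (1, 0) = (−0.1, 0.1)

b(n+1) = 0 + 0.1 · 1 = 0.1

If instead d(n) = y(n), both the weights and the bias stay unchanged.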

Step 5: Continuation

  • Increase time step.

  • Repeat Steps 2 to 4 for all training samples.

  • Stop when:

    • All samples are correctly classified

    • Or maximum iterations reached

When:

d(n) − y(n) = 0

Then:

w(n+1)=w(n)

No change in weights.
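Putting Steps 1 to 5 together, here is a minimal Python sketch of the whole procedure (0/1 targets with the step activation; the function name perceptron_train and the default parameters are illustrative choices, not from the original post).

import numpy as np

def perceptron_train(X, d, eta=0.1, max_epochs=1000):
    # Step 1: initialization
    w = np.zeros(X.shape[1])
    b = 0.0

    for epoch in range(max_epochs):
        updates = 0
        # Step 2: present each training example
        for x_n, d_n in zip(X, d):
            # Step 3: net input and step activation
            y_in = b + np.dot(w, x_n)
            y_n = 1 if y_in >= 0 else 0
            # Step 4: update weights and bias only if there is an error
            error = d_n - y_n
            if error != 0:
                w = w + eta * error * x_n
                b = b + eta * error
                updates += 1
        # Step 5: stop once a full pass makes no updates
        if updates == 0:
            return w, b, epoch + 1
    return w, b, max_epochs   # iteration limit reached (e.g. non-separable data)

# Example: OR-gate data with 0/1 targets (linearly separable, so training stops early)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
d = np.array([0, 1, 1, 1])
print(perceptron_train(X, d))

Because the OR data is linearly separable, the loop reaches an update-free pass after a few epochs; on the XOR data from Example 2 it would run until max_epochs is exhausted.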
