2.6 Perceptron Convergence and Linear Separability


The perceptron algorithm converges only when the dataset is linearly separable. If the data cannot be separated by a straight line or hyperplane, the algorithm will not converge.

Algorithm: 

The perceptron convergence theorem states that, for any dataset that is linearly separable, the perceptron learning rule (algorithm) will converge to a solution in a finite number of iterations.

The perceptron learning algorithm updates its weight vector w using the following rule:

w = w + η y x

where η is the learning rate, x is the input vector, and y is its true label (+1 or −1).
The update is performed only when the perceptron misclassifies a data point.

The theorem guarantees that: 

  • If the data is linearly separable, the perceptron algorithm will converge in a finite number of steps.
  • If the data is not linearly separable, the perceptron will continue updating indefinitely.


Linear Separability

A dataset is linearly separable if we can separate the two classes using:

  • A straight line in 2D

  • A plane in 3D

  • A hyperplane in higher dimensions

Mathematically, there exists a weight vector w and bias b such that:

yᵢ (w · xᵢ + b) > 0

for all training samples. This means every point is correctly classified by the same linear boundary.
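This condition can be checked directly in code. Below is a minimal Python sketch (the helper name is_separated_by and the tiny example dataset are my own, not part of the original post): it returns True only when the given w and b put every sample on its correct side.

import numpy as np

def is_separated_by(w, b, X, y):
    # X: array of shape (n_samples, n_features)
    # y: labels encoded as +1 / -1
    # True if y_i * (w . x_i + b) > 0 for every sample i
    margins = y * (X @ w + b)
    return bool(np.all(margins > 0))

# Tiny illustrative dataset: two points on opposite sides of the vertical axis
X = np.array([[1.0, 0.0], [-1.0, 0.0]])
y = np.array([1, -1])

print(is_separated_by(np.array([1.0, 0.0]), 0.1, X, y))   # True
print(is_separated_by(np.array([0.0, 1.0]), 0.0, X, y))   # False: both margins are 0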

Example 1: Linearly Separable Data (OR Gate)

For the OR gate, the inputs and targets are:

  • (0,0) → 0

  • (0,1) → 1

  • (1,0) → 1

  • (1,1) → 1

Geometrically, this data can be separated by a line such as:

x₁ + x₂ − 0.5 = 0
A single straight line separates class 0 and class 1. So the perceptron will converge.
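As a quick check of this (illustrative) choice of line, take w = (1, 1) and b = −0.5, and encode class 0 as −1 and class 1 as +1:

  • (0,0): w · x + b = −0.5 < 0 → class 0 (correct)

  • (0,1): w · x + b = 0.5 > 0 → class 1 (correct)

  • (1,0): w · x + b = 0.5 > 0 → class 1 (correct)

  • (1,1): w · x + b = 1.5 > 0 → class 1 (correct)

Every point lies on the correct side, so yᵢ (w · xᵢ + b) > 0 holds for all samples.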

Example 2: Non-Linearly Separable Data (XOR)

This is the XOR gate. Geometrically, this data cannot be separated by any single straight line.

Check the points:

Class 0 → (0,0) and (1,1)
Class 1 → (0,1) and (1,0)

If you plot these points:

  • The two 1’s are diagonally opposite

  • The two 0’s are also diagonally opposite

No single straight line can separate them.

There is no linear equation of the form

w₁x₁ + w₂x₂ + b = 0

that can separate these two classes.

So, XOR is not linearly separable.
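A minimal training-loop sketch in Python (my own illustration, not code from the original post; labels are encoded as +1/−1) makes the contrast concrete: on the OR data the loop reaches an error-free pass and stops, while on the XOR data it keeps finding misclassified points until the epoch limit is reached.

import numpy as np

def train_perceptron(X, y, lr=1.0, max_epochs=100):
    # Perceptron with labels encoded as +1 / -1. Returns (w, b, converged).
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(max_epochs):
        errors = 0
        for xi, yi in zip(X, y):
            if yi * (np.dot(w, xi) + b) <= 0:   # misclassified (or exactly on the boundary)
                w = w + lr * yi * xi            # w = w + eta * y * x
                b = b + lr * yi
                errors += 1
        if errors == 0:
            return w, b, True                   # a full pass with no mistakes: converged
    return w, b, False                          # epoch limit reached without converging

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

y_or  = np.array([-1, 1, 1, 1])    # OR gate:  linearly separable
y_xor = np.array([-1, 1, 1, -1])   # XOR gate: not linearly separable

print(train_perceptron(X, y_or))   # converged flag is True
print(train_perceptron(X, y_xor))  # converged flag is False

The update inside the loop is exactly the rule w = w + η y x, applied whenever y (w · x + b) ≤ 0.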



Perceptron Convergence

  • Convergence means that, after a finite number of updates, the perceptron finds weights that correctly classify all training samples.

Perceptron Convergence Theorem

If:

  • The dataset is linearly separable

  • The learning rate is positive

then the perceptron algorithm is guaranteed to converge in a finite number of steps. If the dataset is not linearly separable, convergence is not guaranteed.


Why Convergence Happens (Geometric Intuition)

Each time the perceptron misclassifies a point, the weights are updated:

w = w + η y x

Geometrically:

  • The weight vector rotates toward the direction that correctly classifies the point.

  • The decision boundary shifts gradually.

  • With separable data, eventually all points lie on correct sides.

The weight vector keeps moving closer to an optimal separating hyperplane.
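A tiny numeric illustration (values chosen only for this example): suppose w = (1, −1), η = 1, and the point x = (1, 1) with true label y = +1 is currently misclassified, since w · x = 1 − 1 = 0. After the update,

w = w + η y x = (1, −1) + (1, 1) = (2, 0)

and now w · x = 2 > 0, so the same point is classified correctly: the weight vector has rotated toward x and the decision boundary has shifted.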


Perceptron Learning Algorithm: Convergence Steps

Step 1: Initialization

  • Initialize all weights to zero (or small random values).

  • Initialize bias b = 0.

  • Choose learning rate η > 0.

  • Let the iteration index be n = 1, 2, 3, …

Step 2: Activation (Apply Input)

For each training example at time step n:

  • Input vector: x(n)

  • Desired output: d(n)

Step 3: Compute Actual Output

First compute the net input (weighted sum):

y_in(n) = b + Σ wᵢ(n) xᵢ(n)     (sum over i = 1 to m)

Then apply the activation function.

For a bipolar output with threshold θ (values 1, 0, −1):

y(n) = 1     if y_in(n) > θ
y(n) = 0     if −θ ≤ y_in(n) ≤ θ
y(n) = −1    if y_in(n) < −θ

In most practical perceptron models, we use:

y(n) = 1     if y_in(n) ≥ 0
y(n) = 0     if y_in(n) < 0
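A small Python sketch of Step 3 (the function names are my own): compute the net input, then apply either of the two activation rules above.

def net_input(x, w, b):
    # y_in = b + sum_i w_i * x_i
    return b + sum(wi * xi for wi, xi in zip(w, x))

def bipolar_activation(y_in, theta):
    # Three-valued output: 1 above +theta, -1 below -theta, 0 in between
    if y_in > theta:
        return 1
    if y_in < -theta:
        return -1
    return 0

def step_activation(y_in):
    # Binary step used in most practical perceptron models
    return 1 if y_in >= 0 else 0

# Example: x = (1, 0), w = (0.4, -0.2), b = 0.1  ->  y_in = 0.5
y_in = net_input([1, 0], [0.4, -0.2], 0.1)
print(bipolar_activation(y_in, theta=0.2))   # 1, since 0.5 > 0.2
print(step_activation(y_in))                 # 1, since 0.5 >= 0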

Step 4: Weight Update Rule (Adaptation)

Update weights only if there is an error.

w(n+1) = w(n) + η [d(n) − y(n)] x(n)

Update bias:

b(n+1) = b(n) + η [d(n) − y(n)]

Where:

  • w(n) → old weight

  • η → learning rate

  • d(n) → expected output

  • y(n) → actual output

  • x(n) → input vector

If d(n) − y(n) = 0, then no update is made.
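A single update with numbers chosen purely for illustration: suppose η = 0.1, the current weights are w(n) = (−0.2, 0.1) with b(n) = 0, the input is x(n) = (1, 0), the desired output is d(n) = 1, and the actual output is y(n) = 0. Then d(n) − y(n) = 1, so

w(n+1) = (−0.2, 0.1) + 0.1 · 1 · (1, 0) = (−0.1, 0.1)

b(n+1) = 0 + 0.1 · 1 = 0.1

If instead d(n) = y(n), both the weights and the bias stay unchanged.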

Step 5: Continuation

  • Increase time step.

  • Repeat Steps 2 to 4 for all training samples.

  • Stop when:

    • All samples are correctly classified

    • Or maximum iterations reached

When:

d(n) − y(n) = 0

Then:

w(n+1)=w(n)

No change in weights.
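Putting Steps 1 to 5 together, here is a minimal Python sketch of the whole procedure (0/1 targets with the step activation; the function name perceptron_train and the default parameters are illustrative choices, not from the original post).

import numpy as np

def perceptron_train(X, d, eta=0.1, max_epochs=1000):
    # Step 1: initialization
    w = np.zeros(X.shape[1])
    b = 0.0

    for epoch in range(max_epochs):
        updates = 0
        # Step 2: present each training example
        for x_n, d_n in zip(X, d):
            # Step 3: net input and step activation
            y_in = b + np.dot(w, x_n)
            y_n = 1 if y_in >= 0 else 0
            # Step 4: update weights and bias only if there is an error
            error = d_n - y_n
            if error != 0:
                w = w + eta * error * x_n
                b = b + eta * error
                updates += 1
        # Step 5: stop once a full pass makes no updates
        if updates == 0:
            return w, b, epoch + 1
    return w, b, max_epochs   # iteration limit reached (e.g. non-separable data)

# Example: OR-gate data with 0/1 targets (linearly separable, so training stops early)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
d = np.array([0, 1, 1, 1])
print(perceptron_train(X, d))

Because the OR data is linearly separable, the loop reaches an update-free pass after a few epochs; on the XOR data from Example 2 it would run until max_epochs is exhausted.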
