2.16 Support Vector Machine (SVM)


  • Support Vector Machine (SVM) is a supervised machine learning algorithm used for both classification and regression problems.
  • However, SVM is mostly used for classification tasks.
  • The goal of SVM is to find the best boundary that separates data into different classes. This boundary is called a Hyperplane.
  • The hyperplane divides the dataset so that new data points can be classified correctly.
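The idea above can be sketched in a few lines with scikit-learn. This is a minimal illustration with made-up toy data, not a complete workflow:

```python
# Minimal sketch: training an SVM classifier with scikit-learn.
# The data points below are invented for illustration.
from sklearn.svm import SVC

# Two features per sample, two classes (0 and 1)
X = [[1, 2], [2, 3], [3, 3], [6, 5], [7, 8], [8, 8]]
y = [0, 0, 0, 1, 1, 1]

clf = SVC(kernel="linear")  # hyperplane is a straight line
clf.fit(X, y)

# New points are classified by which side of the hyperplane they fall on
print(clf.predict([[2, 2], [7, 7]]))  # → [0 1]
```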


Hyperplane

A hyperplane is a decision boundary that separates different classes. The dimension of the hyperplane depends on the number of features.

Example:

If we have two features:

  • Height

  • Weight

The hyperplane will be a straight line separating two classes.


Support Vectors

  • Support vectors are data points that are closest to the hyperplane.
  • These points are important because they determine the position of the hyperplane.
  • If support vectors move, the hyperplane also moves.


Margin

The margin is the distance between:

  • The hyperplane

  • The nearest data points from each class

SVM tries to maximize this margin.

A larger margin means:

  • Better separation of classes

  • Better prediction on new data


Optimal Hyperplane

  • The optimal hyperplane is the boundary that maximizes the margin between classes.
  • This gives the best classification result.


Hard Margin and Soft Margin

Hard Margin

Hard margin means:

  • Data must be perfectly separated

  • No misclassification allowed

Works only when the data is perfectly linearly separable.


Soft Margin

Soft margin allows:

  • Some misclassification

  • Better generalization for real-world data

Most real datasets use soft margin SVM.
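In scikit-learn the margin softness is controlled by the `C` parameter: a small `C` gives a softer margin (more misclassification tolerated), while a large `C` approaches a hard margin. A sketch on overlapping synthetic data:

```python
# Sketch: the C parameter controls margin softness.
# Overlapping clusters generated with make_blobs (not real data).
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, cluster_std=3.0, random_state=0)

soft = SVC(kernel="linear", C=0.01).fit(X, y)   # soft margin
hard = SVC(kernel="linear", C=1000).fit(X, y)   # near-hard margin

# A softer margin typically keeps more points as support vectors
print(len(soft.support_vectors_), len(hard.support_vectors_))
```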


Consider the following example:


  • In the given dataset, we have two classes, coloured blue and green. We can draw a line between the two classes to separate the data.

  • Suppose we draw two lines at random, both of which separate the classes. Of these, we must select the one that classifies more accurately; this line is the hyperplane of the SVM.

  • The dimension of the hyperplane depends on the number of features in the dataset: with two features the hyperplane is a straight line; with three features it is a two-dimensional plane.


  • We always choose the hyperplane with the maximum margin, i.e. the maximum distance from the hyperplane to the nearest data points (support vectors) on each side.

  • The data points (vectors) closest to the hyperplane, which determine its position, are known as support vectors.

  • When the data is perfectly separable and no points are allowed inside the margin, the maximum-margin hyperplane is called a hard-margin solution: it maximizes the distance between the hyperplane and the nearest data points of both classes.

  • A soft margin allows some misclassification in order to find a hyperplane that generalizes better.


Applications of SVM

Support Vector Machines are used in many fields such as:

  • Text classification

  • Email spam detection

  • Image classification

  • Face detection

  • Handwriting recognition

  • Bioinformatics

  • Anomaly detection


Types of SVM

There are mainly two types of SVM:

  1. Linear SVM

  2. Non-Linear SVM

1. Linear SVM

Linear SVM is used when the data can be separated using a straight line. This type of data is called linearly separable data.

Example:

A straight line can separate these two classes.

Working of Linear SVM

  1. Multiple lines can separate the data.

  2. SVM selects the line with the maximum margin.

  3. The nearest points to the line become support vectors.

  4. That line becomes the optimal hyperplane.
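The four steps above can be sketched as follows; after fitting, scikit-learn exposes the support vectors it kept and the weights of the optimal hyperplane (toy data invented for illustration):

```python
# Sketch of the working of a linear SVM: fit, then inspect the
# support vectors and the hyperplane w . x + b = 0.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 2], [2, 1], [2, 3], [6, 5], [7, 7], [8, 6]])
y = [0, 0, 0, 1, 1, 1]

clf = SVC(kernel="linear").fit(X, y)

# The nearest points the optimiser kept as support vectors
print(clf.support_vectors_)

# Hyperplane parameters: decision value is w . x + b
w, b = clf.coef_[0], clf.intercept_[0]
# Sign of the decision value gives the class side (negative = class 0)
print(np.sign(X @ w + b))
```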


  • In the above example, the dataset has two classes, one drawn as stars and the other as circles. The data can be separated with a straight line, so this is a linear SVM problem.

  • When we draw multiple straight lines that separate this data, each such line is a candidate hyperplane.

  • If we take line D as the hyperplane, the nearest data points become the support vectors. We draw lines through the support vectors parallel to the hyperplane, and the distance to the hyperplane can then be calculated on both sides.

  • Comparing the two margins, the first is smaller than the second, so the second line divides the dataset better, and B is taken as the hyperplane. This may not always be the case, however: we must repeat the comparison for all candidate lines and keep the one that gives the maximum margin.

  • The SVM algorithm finds the best line or decision boundary; this boundary is called the hyperplane. The algorithm finds the closest data points to the line from both classes; these points are called support vectors. The distance between these vectors and the hyperplane is called the margin, and the goal of SVM is to maximize this margin. The hyperplane with maximum margin is called the optimal hyperplane.


Example

Suppose we want to classify students as pass or fail based on:

  • Study hours

  • Attendance

A straight line can separate these groups.

This is a Linear SVM problem.
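This pass/fail example can be sketched directly; the study-hours and attendance numbers below are made up for illustration:

```python
# Sketch of the pass/fail example: two features, two classes,
# separable by a straight line. All numbers are invented.
from sklearn.svm import SVC

# [study_hours, attendance_percent]
X = [[1, 40], [2, 50], [2, 35], [6, 80], [7, 90], [8, 85]]
y = ["fail", "fail", "fail", "pass", "pass", "pass"]

clf = SVC(kernel="linear").fit(X, y)

# A diligent student and a struggling one
print(clf.predict([[7, 85], [1, 45]]))  # → ['pass' 'fail']
```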


2. Non-Linear SVM

  • Non-Linear SVM is used when data cannot be separated by a straight line. This type of data is called non-linearly separable data.

Example:

Imagine data points forming a circular pattern. A straight line cannot separate them.



SVM uses kernel functions to solve nonlinear problems.

Kernel functions transform the data into a higher dimensional space where it becomes linearly separable.

Common kernel functions include:

  • Linear Kernel

  • Polynomial Kernel

  • Radial Basis Function (RBF)

  • Sigmoid Kernel
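The effect of the kernel choice can be seen on a circular dataset, where a linear kernel cannot separate the classes but an RBF kernel can. A sketch using scikit-learn's synthetic `make_circles` data:

```python
# Sketch: comparing kernel functions on circular (non-linear) data.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: not linearly separable in 2-D
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

for kernel in ["linear", "poly", "rbf", "sigmoid"]:
    acc = SVC(kernel=kernel).fit(X, y).score(X, y)
    print(f"{kernel}: {acc:.2f}")
```

The RBF kernel reaches near-perfect accuracy here, while the linear kernel stays close to chance.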


Example of Non-Linear SVM

Suppose we have two features:

  • X

  • Y

We can create a new dimension:

Z = X² + Y²

Now the data moves into 3D space, where it can be separated using a plane.

This is called feature transformation.
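The Z = X² + Y² transformation can be carried out by hand to show the effect: circular 2-D data that a linear SVM cannot split becomes linearly separable once the new dimension is added. A sketch on synthetic ring data:

```python
# Sketch of feature transformation: adding Z = X^2 + Y^2 lifts
# circular 2-D data into 3-D, where a plane can separate it.
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# Linear SVM on the original 2-D data: poor fit
flat = SVC(kernel="linear").fit(X, y).score(X, y)

# Add the new dimension Z = X^2 + Y^2 (squared distance from origin)
Z = (X ** 2).sum(axis=1, keepdims=True)
X3 = np.hstack([X, Z])
lifted = SVC(kernel="linear").fit(X3, y).score(X3, y)

print(f"2-D accuracy: {flat:.2f}, 3-D accuracy: {lifted:.2f}")
```

This is exactly what kernel functions do implicitly, without computing the new coordinates explicitly.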





Advantages of SVM

  • Works well with high-dimensional data

  • Effective for small datasets

  • Memory efficient

  • Strong theoretical foundation


Disadvantages of SVM

  • Training can be slow for large datasets

  • Selecting the correct kernel function is difficult

  • Performs poorly when the data is very noisy (i.e. when classes overlap heavily)