Support Vector Machine
This line is the decision boundary. We define the margin as the perpendicular distance between the line and the closest data point, and we want to find the line that maximizes the margin.
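In the usual notation, with training points $(x_i, y_i)$, labels $y_i \in \{-1, +1\}$, and the line $w^\top x + b = 0$, the problem can be sketched as:

$$
\max_{w, b} \; \min_i \frac{y_i (w^\top x_i + b)}{\lVert w \rVert}
$$

The inner minimum is the margin.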
This problem can be transformed into an equivalent one, because a constant c only changes the scale of w and b without changing the decision boundary; we can therefore fix the functional margin of the closest point to 1.
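With the functional margin fixed to 1, the standard canonical form is:

$$
\min_{w, b} \; \frac{1}{2} \lVert w \rVert^2 \quad \text{s.t.} \quad y_i (w^\top x_i + b) \ge 1 \;\; \text{for all } i
$$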
This is a constrained convex quadratic program.
This primal problem can be transformed into a dual problem.
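The standard Lagrangian, with multipliers $\alpha_i \ge 0$ for the constraints:

$$
L(w, b, \alpha) = \frac{1}{2} \lVert w \rVert^2 - \sum_i \alpha_i \left[ y_i (w^\top x_i + b) - 1 \right]
$$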
Taking the derivatives with respect to $w$ and $b$ and setting them to zero,
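we get the usual stationarity conditions:

$$
\frac{\partial L}{\partial w} = 0 \;\Rightarrow\; w = \sum_i \alpha_i y_i x_i, \qquad \frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_i \alpha_i y_i = 0
$$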
Substituting these equations back, the problem becomes simpler.
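In the usual form, the dual is:

$$
\max_{\alpha} \; \sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j \, x_i^\top x_j \quad \text{s.t.} \quad \alpha_i \ge 0, \;\; \sum_i \alpha_i y_i = 0
$$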
This is a constrained quadratic program in $\alpha$.
By complementary slackness, for each training point either $\alpha_i$ is zero or the corresponding constraint is active. When $\alpha_i$ is nonzero, the training point must lie exactly on the margin; such points are called support vectors.
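The condition behind this statement is the standard complementary slackness:

$$
\alpha_i \left[ y_i (w^\top x_i + b) - 1 \right] = 0 \quad \text{for all } i
$$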
For a new test point z, we need to compute the following.
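Using $w = \sum_i \alpha_i y_i x_i$, the decision function takes the standard form:

$$
f(z) = \operatorname{sign}\left( w^\top z + b \right) = \operatorname{sign}\left( \sum_i \alpha_i y_i \, x_i^\top z + b \right)
$$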
The data may not be linearly separable. In this case we use the soft-margin formulation.
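The standard soft-margin primal introduces slack variables $\xi_i$ with a penalty parameter $C$:

$$
\min_{w, b, \xi} \; \frac{1}{2} \lVert w \rVert^2 + C \sum_i \xi_i \quad \text{s.t.} \quad y_i (w^\top x_i + b) \ge 1 - \xi_i, \;\; \xi_i \ge 0
$$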
This is also a constrained quadratic program.
Eliminating the slack variables gives an unconstrained, strongly convex problem, but it is not necessarily differentiable. Because it is not differentiable over the whole domain, we use the subgradient method instead of the ordinary gradient method.
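The standard hinge-loss form in question:

$$
\min_{w, b} \; \frac{1}{2} \lVert w \rVert^2 + C \sum_i \max\left(0, \; 1 - y_i (w^\top x_i + b)\right)
$$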
This subgradient method is then used to find the optimal w.
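A minimal sketch of subgradient descent on the hinge-loss objective; the function name, step size, and iteration count here are illustrative assumptions rather than the original code:

```python
import numpy as np

def svm_subgradient(X, y, C=1.0, lr=0.01, n_iters=1000):
    """Subgradient descent on (1/2)||w||^2 + C * sum of hinge losses.

    X: (n, d) data matrix, y: labels in {-1, +1}.
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(n_iters):
        margins = y * (X @ w + b)
        # Points violating the margin contribute a hinge subgradient;
        # at the kink (margin == 1) we pick the zero subgradient.
        violated = margins < 1
        grad_w = w - C * (y[violated, None] * X[violated]).sum(axis=0)
        grad_b = -C * y[violated].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```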
The problem above can also be solved via another optimization problem, its dual.
Setting the derivatives of the Lagrangian equal to zero,
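with multipliers $\alpha_i$ for the margin constraints and $\mu_i$ for $\xi_i \ge 0$, the standard stationarity conditions are:

$$
w = \sum_i \alpha_i y_i x_i, \qquad \sum_i \alpha_i y_i = 0, \qquad C - \alpha_i - \mu_i = 0
$$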
Using the KKT conditions,
the resulting form is summarized as follows.
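The standard soft-margin dual; compared with the separable case, the only change is the box constraint on $\alpha$:

$$
\max_{\alpha} \; \sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j \, x_i^\top x_j \quad \text{s.t.} \quad 0 \le \alpha_i \le C, \;\; \sum_i \alpha_i y_i = 0
$$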
Code From Scratch
Among the KKT conditions is the complementary slackness condition

$$
\alpha_i \left[ y_i (w^\top x_i + b) - 1 + \xi_i \right] = 0, \qquad \mu_i \xi_i = 0.
$$

For a support vector with $0 < \alpha_i < C$, we have $\mu_i = C - \alpha_i > 0$, so $\xi_i = 0$ and $y_i (w^\top x_i + b) = 1$. Therefore b can be derived from any such support vector; b is solved by this equation:

$$
b = y_i - w^\top x_i = y_i - \sum_j \alpha_j y_j \, x_j^\top x_i
$$
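A minimal from-scratch sketch of the dual approach using projected gradient ascent; the solver choice, step size, and tolerances are assumptions for illustration (a practical implementation would typically use SMO or a QP solver):

```python
import numpy as np

def svm_dual(X, y, C=1.0, lr=0.001, n_iters=5000):
    """Projected gradient ascent on the soft-margin dual."""
    n = X.shape[0]
    K = X @ X.T                          # Gram matrix of inner products
    Q = (y[:, None] * y[None, :]) * K
    alpha = np.zeros(n)
    for _ in range(n_iters):
        grad = 1.0 - Q @ alpha           # gradient of the dual objective
        alpha += lr * grad
        alpha = np.clip(alpha, 0.0, C)   # project onto the box [0, C]
        # project onto the hyperplane sum(alpha * y) = 0, then re-clip;
        # alternating the two projections is only approximately feasible
        alpha -= y * (alpha @ y) / n
        alpha = np.clip(alpha, 0.0, C)
    w = (alpha * y) @ X
    # recover b from margin support vectors with 0 < alpha_i < C
    sv = (alpha > 1e-6) & (alpha < C - 1e-6)
    b = np.mean(y[sv] - X[sv] @ w) if sv.any() else 0.0
    return w, b, alpha
```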
In the regression problem, we want to permit errors up to a size $\epsilon$. Based on the regression line, a point within the range from $f(x) - \epsilon$ to $f(x) + \epsilon$ is regarded as a correct point.
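In standard notation, this corresponds to the $\epsilon$-insensitive loss:

$$
\min_{w, b} \; \frac{1}{2} \lVert w \rVert^2 + C \sum_i \max\left(0, \; \lvert y_i - (w^\top x_i + b) \rvert - \epsilon \right)
$$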
SVMs solve binary classification problems by formulating them as convex optimization problems (Vapnik 1998). The optimization problem entails finding the maximum-margin separating hyperplane while correctly classifying as many training points as possible. SVMs represent this optimal hyperplane with support vectors.
From the support-vector viewpoint, this problem is changed into the following form (letting $w$ be equal to $\sum_i \alpha_i y_i x_i$).
The $\alpha_i$ are the optimization parameters of this quadratic problem. (It is just a form that replaces some elements in the Lagrangian expression.)
By replacing the inner product $x_i^\top x_j$ with another inner-product (kernel) function, we can generalize this method to richer spaces. $C$ plays the role of determining how many errors we tolerate.
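As a common example (the RBF kernel here is an illustration, not taken from the original), the kernelized decision function is:

$$
f(z) = \operatorname{sign}\left( \sum_i \alpha_i y_i \, K(x_i, z) + b \right), \qquad K(x, z) = \exp\left( -\frac{\lVert x - z \rVert^2}{2\sigma^2} \right)
$$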