Quant GT
Browse all lessons
Section 19 · Lesson 19.5

Support Vector Machines

Maximum-margin classifiers and the kernel trick.

An SVM finds the linear decision boundary that maximizes the margin between two classes. For linearly separable data:

minw,b12w2s.t.yi(wxi+b)1\min_{w, b} \frac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y_i(w^\top x_i + b) \ge 1

The dual formulation depends only on inner products xi,xj\langle x_i, x_j \rangle — the kernel trick replaces them with a kernel function K(xi,xj)K(x_i, x_j) to fit non-linear boundaries (RBF, polynomial, sigmoid).

For non-separable data, slack variables allow some misclassification at a cost (soft-margin SVM). The trade-off is governed by a hyperparameter CC.

SVMs were dominant in the 2000s before being largely supplanted by tree ensembles and deep learning on most problems. They're still used in some structured settings — text classification, anomaly detection, and high-dimensional small-data problems where kernels can capture useful geometry.