Supervised vs Unsupervised

Learning from labels versus learning structure from data alone.

Supervised learning fits a function $f: X \to Y$ from labeled training data $(x_i, y_i)$. The labels $y$ might be discrete (classification: spam vs not, default vs not) or continuous (regression: predicted return, predicted volatility).
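As a minimal sketch of the regression case, the snippet below fits a one-feature linear $f$ by ordinary least squares. The data values and the helper name `fit_line` are hypothetical, chosen only for illustration; real problems typically have many features and noisy labels.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares with one feature: returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Hypothetical labeled pairs (x_i, y_i), generated from y = 2x + 1 without noise.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

slope, intercept = fit_line(xs, ys)


def f(x):
    """The fitted function f: X -> Y."""
    return slope * x + intercept


prediction = f(4.0)  # prediction for an input not seen during training
```

The labels `ys` are what make this supervised: the fit is driven by the gap between each $y_i$ and the model's output at $x_i$.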

Unsupervised learning has no labels, only inputs $x_i$. The goal is to find structure: clusters of similar points, low-dimensional manifolds, anomalous outliers, or the distribution that generated the data.
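Clustering is one concrete form of "finding structure". The sketch below runs k-means (Lloyd's algorithm) on one-dimensional points. The data, the helper names, and the choice to seed the centers at the minimum and maximum are all assumptions made for illustration, not a recommended initialization.

```python
def nearest(point, centers):
    """Index of the center closest to the point."""
    return min(range(len(centers)), key=lambda k: abs(point - centers[k]))


def kmeans_1d(points, centers, n_iter=10):
    """Lloyd's algorithm in one dimension: returns (centers, labels)."""
    centers = list(centers)
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest center.
        labels = [nearest(p, centers) for p in points]
        # Update step: move each center to the mean of its assigned points.
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centers[k] = sum(members) / len(members)
    # Final assignment against the updated centers.
    return centers, [nearest(p, centers) for p in points]


# Hypothetical unlabeled inputs x_i: two visibly separated groups.
points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centers, labels = kmeans_1d(points, centers=[min(points), max(points)])
```

No $y_i$ appears anywhere: the cluster indices in `labels` are produced by the algorithm, not supplied with the data.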

Semi-supervised learning mixes the two: a few labeled points and many unlabeled ones. Self-supervised learning, dominant in modern NLP, creates labels from the data itself, for example by predicting the next word from the previous ones.
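To illustrate how self-supervision manufactures labels, the sketch below turns raw text into (context, next word) training pairs. The whitespace tokenization, the fixed two-word context, and the function name `next_word_pairs` are simplifying assumptions; production language models use learned tokenizers and much longer contexts.

```python
def next_word_pairs(text, context_size=2):
    """Build (context, target) pairs where the target is the next word."""
    tokens = text.split()  # naive whitespace tokenization (an assumption)
    return [
        (tuple(tokens[i - context_size:i]), tokens[i])
        for i in range(context_size, len(tokens))
    ]


# Unlabeled text in, labeled examples out.
pairs = next_word_pairs("the market opened higher today")
for context, target in pairs:
    print(context, "->", target)
```

Once the pairs exist, training proceeds like any supervised problem; the only difference is that no human annotated the targets.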

In quant work, supervised methods predict returns, default risk, and execution slippage. Unsupervised methods find regime clusters, factor structures, and trade-pattern anomalies.