Quant GT
Section 16 · Lesson 16.2

Kullback–Leibler Divergence

How different are two distributions, in nats?

Kullback–Leibler (KL) divergence measures how different a distribution $P$ is from a reference distribution $Q$:

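$$D_{\mathrm{KL}}(P \| Q) = \sum_x P(x) \log \frac{P(x)}{Q(x)}$$

With the natural logarithm the result is measured in nats (base 2 gives bits). By convention, terms with $P(x) = 0$ contribute 0, and the divergence is $+\infty$ if $Q(x) = 0$ for some $x$ with $P(x) > 0$.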
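A minimal sketch of the discrete formula, assuming $P$ and $Q$ are given as equal-length lists of probabilities over the same outcomes. The helper name `kl_divergence` is hypothetical, not from any library. It uses the natural log, so the result is in nats, and applies the usual conventions: outcomes with $P(x) = 0$ are skipped, and mass in $P$ where $Q(x) = 0$ gives $+\infty$.

```python
import math


def kl_divergence(p, q):
    """Discrete D_KL(P || Q) in nats.

    Hypothetical helper: assumes p and q are equal-length lists of
    probabilities over the same outcomes (values are not validated
    beyond the length check).
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for p_x, q_x in zip(p, q):
        if p_x == 0:
            continue  # convention: 0 * log(0 / q) = 0
        if q_x == 0:
            return math.inf  # P puts mass where Q has none
        total += p_x * math.log(p_x / q_x)  # natural log, so nats
    return total


print(kl_divergence([0.5, 0.5], [0.9, 0.1]))  # ≈ 0.5108, i.e. ln(5/3)
```

For this pair the sum is $0.5 \ln(0.5/0.9) + 0.5 \ln(0.5/0.1) = 0.5 \ln(25/9) = \ln(5/3)$.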

KL is always non-negative and zero if and only if $P = Q$. It is not symmetric: $D_{\mathrm{KL}}(P \| Q) \ne D_{\mathrm{KL}}(Q \| P)$ in general, so it's not a true metric.
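A quick numerical check of these properties, using a compact version of the discrete formula. The helper `kl` is hypothetical and assumes $Q(x) > 0$ wherever $P(x) > 0$. Swapping the arguments changes the value, the divergence of a distribution from itself is zero, and random pairs never come out negative.

```python
import math
import random


def kl(p, q):
    # Discrete D_KL(P || Q) in nats; skips P(x) = 0 terms and
    # assumes q_x > 0 wherever p_x > 0 (otherwise ZeroDivisionError)
    return sum(p_x * math.log(p_x / q_x) for p_x, q_x in zip(p, q) if p_x > 0)


p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl(p, q))  # ≈ 0.5108
print(kl(q, p))  # ≈ 0.3681: swapping the arguments changes the value
print(kl(p, p))  # 0.0 when the two distributions coincide


def random_dist(k):
    # Random strictly positive probability vector of length k
    w = [random.uniform(0.01, 1.0) for _ in range(k)]
    s = sum(w)
    return [x / s for x in w]


random.seed(0)
# Non-negativity on random pairs (small tolerance for floating-point rounding)
print(all(kl(random_dist(4), random_dist(4)) >= -1e-12 for _ in range(1000)))  # True
```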

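KL is the workhorse of probabilistic ML. Variational inference picks an approximate posterior $Q$ to minimize $D_{\mathrm{KL}}(Q \| P)$, where $P$ is the true posterior. Cross-entropy loss in classification equals the KL divergence between the empirical label distribution $P$ and the predicted distribution $Q$ plus the entropy $H(P)$, which does not depend on the model: $H(P, Q) = H(P) + D_{\mathrm{KL}}(P \| Q)$, so minimizing one minimizes the other. In risk management, KL appears in entropy-based risk measures such as entropic value-at-risk (EVaR) and in model-validation tests.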
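The cross-entropy connection rests on the identity $H(P, Q) = H(P) + D_{\mathrm{KL}}(P \| Q)$, where $H(P, Q) = -\sum_x P(x) \log Q(x)$ and $H(P) = -\sum_x P(x) \log P(x)$. Because $H(P)$ depends only on the data, minimizing cross-entropy over the model's $Q$ is the same as minimizing the KL term. The sketch below checks the identity numerically; the label distributions and helper names are made up for illustration. With a one-hot label (a single training example) $H(P) = 0$, so cross-entropy and KL coincide.

```python
import math


def entropy(p):
    # H(P) = -sum p log p in nats; terms with p_x = 0 contribute 0
    return -sum(p_x * math.log(p_x) for p_x in p if p_x > 0)


def cross_entropy(p, q):
    # H(P, Q) = -sum p log q; assumes q_x > 0 wherever p_x > 0
    return -sum(p_x * math.log(q_x) for p_x, q_x in zip(p, q) if p_x > 0)


def kl(p, q):
    # D_KL(P || Q) in nats, same assumption on q as above
    return sum(p_x * math.log(p_x / q_x) for p_x, q_x in zip(p, q) if p_x > 0)


# Hypothetical empirical label distribution P and model prediction Q over 3 classes
p_emp = [0.7, 0.2, 0.1]
q_model = [0.6, 0.3, 0.1]
print(cross_entropy(p_emp, q_model), entropy(p_emp) + kl(p_emp, q_model))  # equal up to rounding

# One-hot label: the entropy term vanishes, so cross-entropy equals KL
one_hot = [0.0, 1.0, 0.0]
print(cross_entropy(one_hot, q_model), kl(one_hot, q_model))  # both ≈ 1.2040 (= -ln 0.3)
```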