Quant GT
Section 16 · Lesson 16.3

Mutual Information

Quantifying the information one variable carries about another.

Mutual information measures how much knowing one variable reduces uncertainty about another:

$$I(X; Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X)$$
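
As a sketch of why the two entropy differences agree, assume $X$ and $Y$ are discrete with joint pmf $p(x, y)$ and marginals $p(x)$, $p(y)$ (notation introduced here only for illustration). Expanding the definitions of entropy and conditional entropy gives:

```latex
\begin{aligned}
I(X; Y) &= H(X) - H(X \mid Y) \\
        &= -\sum_{x} p(x) \log p(x) + \sum_{x, y} p(x, y) \log p(x \mid y) \\
        &= -\sum_{x, y} p(x, y) \log p(x) + \sum_{x, y} p(x, y) \log \frac{p(x, y)}{p(y)} \\
        &= \sum_{x, y} p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)}
\end{aligned}
```

The last line is unchanged when $x$ and $y$ swap roles, so starting from $H(Y) - H(Y \mid X)$ leads to the same value.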

Equivalently, $I(X; Y) = D_{\mathrm{KL}}(P(X, Y) \| P(X) P(Y))$, the KL divergence between the joint distribution and the product of the marginals. $I(X; Y) = 0$ if and only if $X$ and $Y$ are independent.
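
The two definitions can be checked numerically. The following is a minimal sketch, assuming discrete variables described by a joint probability table; the function names and the two example tables are hypothetical and chosen only for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def mutual_information(joint):
    """I(X; Y) in bits as the KL divergence between joint and product of marginals.

    joint[i][j] is assumed to be P(X = i, Y = j), with entries summing to 1.
    """
    px = [sum(row) for row in joint]          # marginal of X (row sums)
    py = [sum(col) for col in zip(*joint)]    # marginal of Y (column sums)
    mi = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0:                       # 0 * log 0 is taken as 0
                mi += pxy * math.log2(pxy / (px[i] * py[j]))
    return mi


def conditional_entropy_x_given_y(joint):
    """H(X | Y) computed as H(X, Y) - H(Y)."""
    py = [sum(col) for col in zip(*joint)]
    h_xy = entropy([p for row in joint for p in row])
    return h_xy - entropy(py)


# Hypothetical joint tables: one with dependence, one exactly independent.
dependent = [[0.4, 0.1],
             [0.1, 0.4]]
independent = [[0.25, 0.25],
               [0.25, 0.25]]

mi_kl = mutual_information(dependent)
px = [sum(row) for row in dependent]
mi_entropy = entropy(px) - conditional_entropy_x_given_y(dependent)
mi_indep = mutual_information(independent)

print(f"KL form:      {mi_kl:.4f} bits")
print(f"entropy form: {mi_entropy:.4f} bits")
print(f"independent:  {mi_indep:.4f} bits")
```

For the dependent table both forms give roughly 0.278 bits, and the independent table gives zero, consistent with the independence condition stated above.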

Unlike correlation, mutual information captures non-linear dependencies. Two variables can have $\rho = 0$ but high $I$, for example $Y = X^2$ with $X$ distributed symmetrically about zero.
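
The $Y = X^2$ case can be worked through exactly on a small discrete distribution. This sketch assumes $X$ is uniform on the five points $-2, \dots, 2$, a choice made here purely for illustration; any distribution symmetric about zero would make the same point.

```python
import math
from collections import Counter

# Assumed for illustration: X uniform on a symmetric support, Y = X^2.
xs = [-2, -1, 0, 1, 2]
pairs = [(x, x * x) for x in xs]
n = len(pairs)

# Pearson correlation computed exactly from the distribution.
mean_x = sum(x for x, _ in pairs) / n
mean_y = sum(y for _, y in pairs) / n
cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs) / n
var_x = sum((x - mean_x) ** 2 for x, _ in pairs) / n
var_y = sum((y - mean_y) ** 2 for _, y in pairs) / n
rho = cov / math.sqrt(var_x * var_y)

# Mutual information in bits from the joint and marginal probabilities.
count_xy = Counter(pairs)
count_x = Counter(x for x, _ in pairs)
count_y = Counter(y for _, y in pairs)
mi = sum(
    (c / n) * math.log2((c / n) / ((count_x[x] / n) * (count_y[y] / n)))
    for (x, y), c in count_xy.items()
)

print(f"rho = {rho:.4f}")
print(f"I(X; Y) = {mi:.4f} bits")
```

Here the correlation is exactly zero because the positive and negative halves of $X$ cancel in the covariance, while the mutual information is about 1.52 bits. Since $Y$ is a deterministic function of $X$, that value equals $H(Y)$.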

Applications: feature selection (drop features with low $I$ relative to the target), causal-graph learning, signal-processing decoding, and any setting where you suspect non-linear dependence.
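
The feature-selection use can be sketched as ranking features by their estimated mutual information with the target. The example below assumes discrete features and uses a simple plug-in (count-based) estimate; the toy data, feature names, and threshold are all hypothetical. Plug-in estimates tend to be biased upward on small samples, so a real pipeline would need more data or a corrected estimator.

```python
import math
from collections import Counter


def empirical_mi(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from paired discrete samples."""
    n = len(xs)
    count_xy = Counter(zip(xs, ys))
    count_x = Counter(xs)
    count_y = Counter(ys)
    # (c/n) * log2( (c/n) / ((cx/n) * (cy/n)) ) simplifies to the form below.
    return sum(
        (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
        for (x, y), c in count_xy.items()
    )


# Hypothetical toy data: three discrete features and a binary target.
target = [0, 0, 1, 1, 0, 0, 1, 1]
features = {
    "copy_of_target": [0, 0, 1, 1, 0, 0, 1, 1],
    "noisy_copy":     [0, 0, 1, 1, 0, 1, 1, 0],
    "unrelated":      [0, 1, 0, 1, 0, 1, 0, 1],
}

scores = {name: empirical_mi(col, target) for name, col in features.items()}

threshold = 0.1  # arbitrary cut-off chosen for this sketch
ranked = sorted(scores.items(), key=lambda kv: -kv[1])
kept = [name for name, score in ranked if score >= threshold]

for name, score in ranked:
    print(f"{name:15s} {score:.4f} bits")
print("kept:", kept)
```

On this toy data the exact copy scores 1 bit, the noisy copy scores lower but stays above the cut-off, and the unrelated feature scores zero and is dropped.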