Shannon Entropy

Measuring the uncertainty inside a distribution.

Shannon entropy quantifies the average uncertainty of a discrete random variable:

H(X) = -\sum_x p(x)\, \log p(x)

Units depend on the log base — bits for $\log_2$ , nats for $\ln$ . A fair coin has $H = 1$ bit; a biased coin has less.

Properties: $H(X) \ge 0$ , with equality iff $X$ is constant. For a uniform distribution on $n$ outcomes, $H = \log n$ — entropy is maximized at maximum uncertainty.

Entropy is the lower bound on the average bits needed to encode samples from $X$ (Shannon's source-coding theorem) — the foundation of compression. It also pops up in machine learning loss functions (cross-entropy), in feature-selection scores, and in statistical mechanics.