Section 8 · Lesson 8.5

Power and Effect Size

Designing tests that can actually detect what you care about.

Power, sample size, effect size, and significance level are linked by a single relationship: pick any three and the fourth is determined.

For comparing two means with $n$ per group, the approximate power is

$$\text{power} \approx \Phi\!\left(\frac{\delta \sqrt{n/2}}{\sigma} - z_{1 - \alpha/2}\right)$$

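where $\delta$ is the true difference in means, $\sigma$ is the within-group standard deviation, $\Phi$ is the standard normal CDF, and $z_{1 - \alpha/2}$ is the critical value for a two-sided test at significance level $\alpha$. Bigger effects and bigger samples both raise power; bigger noise lowers it.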
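To make the approximation concrete, here is a minimal Python sketch that evaluates it directly. The function name `approx_power` and the example values are illustrative assumptions, not part of the lesson; the only library used is the standard-library `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(delta, sigma, n_per_group, alpha=0.05):
    """Approximate power for comparing two means with n per group.

    Implements Phi(delta * sqrt(n / 2) / sigma - z_{1 - alpha/2}).
    The name and signature are hypothetical, chosen for this sketch.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    signal = delta * sqrt(n_per_group / 2) / sigma
    return std_normal.cdf(signal - z_crit)


if __name__ == "__main__":
    # Illustrative values: a standardised effect of 0.5 at three sample sizes.
    for n in (20, 64, 200):
        print(n, round(approx_power(delta=0.5, sigma=1.0, n_per_group=n), 3))
```

With these illustrative inputs, power rises with $n$ and reaches roughly 0.8 at 64 per group. Doubling $\sigma$ has the same effect on power as halving $\delta$, since only the ratio $\delta/\sigma$ enters the formula.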

Underpowered studies are a well-known problem. They miss real effects and, when they do find significance, tend to overstate the effect size, because only estimates that noise has pushed upward clear the significance threshold. This phenomenon is known as the winner's curse.
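The winner's curse is easy to see in a small simulation. The sketch below assumes a known $\sigma$ and draws each study's estimated difference directly from its sampling distribution; the function name, the default parameter values, and the seed are all illustrative assumptions.

```python
import random
from math import sqrt
from statistics import NormalDist, mean


def simulate_winners_curse(delta=0.2, sigma=1.0, n_per_group=25,
                           alpha=0.05, n_studies=20000, seed=0):
    """Simulate many two-group studies and summarise the significant ones.

    Hypothetical helper: every default here is an illustrative choice.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Standard error of the difference in sample means (sigma assumed known).
    se = sigma * sqrt(2 / n_per_group)

    significant = []
    for _ in range(n_studies):
        # Draw one study's estimated difference from its sampling distribution.
        estimate = rng.gauss(delta, se)
        if abs(estimate / se) > z_crit:  # two-sided z-test
            significant.append(estimate)

    return {
        "true_effect": delta,
        "power": len(significant) / n_studies,
        "mean_significant_estimate": mean(significant),
    }


if __name__ == "__main__":
    print(simulate_winners_curse())
```

In this low-power setup only a small fraction of studies reach significance, and the average estimate among them is several times the true effect. Raising `n_per_group` shrinks that gap.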

In trading, this matters viscerally: if you're A/B testing two strategies that each have daily Sharpe of $1$ and you want to detect a true Sharpe difference of $0.5$ at $80\%$ power, you typically need a couple of years of data. Most "I just compared two backtests" exercises are dramatically underpowered.
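The power approximation can also be inverted to get a sample size: setting the power to a target $1 - \beta$ and solving for $n$ gives $n \approx 2\left((z_{1-\alpha/2} + z_{1-\beta})\,\sigma/\delta\right)^2$ per group. A minimal sketch follows, where the function name `required_n_per_group` and the choice to take the standardised effect $\delta/\sigma$ as input are assumptions made for illustration.

```python
from math import ceil
from statistics import NormalDist


def required_n_per_group(effect_size, power=0.80, alpha=0.05):
    """Smallest n per group at which the approximate power reaches the target.

    effect_size is delta / sigma, the standardised difference in means.
    Hypothetical helper that inverts the normal approximation for power.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_power = std_normal.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


if __name__ == "__main__":
    # Halving the standardised effect roughly quadruples the required sample.
    for d in (0.5, 0.25, 0.1):
        print(d, required_n_per_group(d))
```

Turning a Sharpe difference into $\delta/\sigma$ depends on the return frequency and on how correlated the two strategies are, so the sketch leaves that conversion to the caller. The key scaling is that required $n$ grows with $1/(\delta/\sigma)^2$, which is why small per-period edges need long samples.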