Quant GT
Section 19 · Lesson 19.3

Cross-Validation

Estimating out-of-sample performance honestly.

Cross-validation estimates how a model will perform on unseen data by splitting the available data into training and validation parts. The simplest version is $k$-fold: divide the data into $k$ groups, train on $k-1$ of them and validate on the held-out group, rotate so that each group is held out exactly once, and average the $k$ validation scores.
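The rotation can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the helper names `k_fold_indices` and `cross_val_mse` are hypothetical, the "model" is simply the mean of the training targets, and the folds are contiguous and unshuffled (for independent observations one would normally shuffle first).

```python
from statistics import mean

def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    if not 2 <= k <= n_samples:
        raise ValueError("need 2 <= k <= n_samples")
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # first `extra` folds get one more
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_val_mse(y, k):
    """Average validation MSE of a predict-the-training-mean model over k folds."""
    scores = []
    for held_out in k_fold_indices(len(y), k):
        held = set(held_out)
        train = [y[i] for i in range(len(y)) if i not in held]
        prediction = mean(train)  # "fit" on the other k-1 folds
        scores.append(mean((y[i] - prediction) ** 2 for i in held_out))
    return mean(scores)  # average over the k rotations

y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(k_fold_indices(len(y), 3))  # [[0, 1], [2, 3], [4, 5]]
print(cross_val_mse(y, 3))        # 6.25
```

Each observation is used for validation exactly once and for training $k-1$ times, which is what makes the averaged score use all of the data.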

Why bother? In-sample error typically underestimates out-of-sample error because the model has been tuned to the very data it is evaluated on. Cross-validation counters this by scoring each fold only on data the model was not fit to.
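A toy example makes the gap concrete. The sketch below assumes a 1-nearest-neighbour predictor, which memorises its training set, and a small hand-written noisy dataset; the data, the noise values, and the function names are all hypothetical and chosen only for illustration.

```python
def nearest_neighbor_predict(train_x, train_y, x):
    """Predict with the target of the closest training point (1-NN)."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

def mse(train_x, train_y, eval_x, eval_y):
    """Mean squared error of the 1-NN model on (eval_x, eval_y)."""
    errors = [(nearest_neighbor_predict(train_x, train_y, x) - y) ** 2
              for x, y in zip(eval_x, eval_y)]
    return sum(errors) / len(errors)

# Hypothetical data: y = x plus a fixed, made-up noise term.
noise = [0.5, -0.3, 0.1, -0.6, 0.4, -0.2, 0.3, -0.5, 0.2, -0.1]
xs = list(range(10))
ys = [x + e for x, e in zip(xs, noise)]

train_x, train_y = xs[::2], ys[::2]    # fit on every other point
valid_x, valid_y = xs[1::2], ys[1::2]  # evaluate on the rest

in_sample = mse(train_x, train_y, train_x, train_y)
out_of_sample = mse(train_x, train_y, valid_x, valid_y)
print(in_sample, out_of_sample)
```

The in-sample error is exactly zero because every training point is its own nearest neighbour, while the held-out error is strictly positive. The in-sample number says nothing about how the model generalises.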

Beware of look-ahead bias when shuffling time series. For temporal data, use forward-chaining cross-validation (train on the past, validate on the future) rather than random splits, which can leak future information into training.
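One way to build forward-chaining splits is an expanding training window, sketched below. The function name `forward_chaining_splits` and its parameters `n_splits` and `min_train` are hypothetical; the sketch assumes observations are already sorted by time, so a smaller index always means an earlier observation.

```python
def forward_chaining_splits(n_samples, n_splits, min_train):
    """Yield (train_idx, valid_idx) pairs with an expanding training window.

    Every validation index is strictly later than every training index,
    so no future observation is available when the model is fit.
    """
    remaining = n_samples - min_train
    if n_splits < 1 or min_train < 1 or remaining < n_splits:
        raise ValueError("not enough samples for the requested splits")
    base, extra = divmod(remaining, n_splits)
    train_end = min_train
    for i in range(n_splits):
        size = base + (1 if i < extra else 0)
        valid_end = train_end + size
        yield list(range(train_end)), list(range(train_end, valid_end))
        train_end = valid_end  # next round trains on everything seen so far

for train_idx, valid_idx in forward_chaining_splits(10, n_splits=3, min_train=4):
    print(train_idx, valid_idx)
# [0, 1, 2, 3] [4, 5]
# [0, 1, 2, 3, 4, 5] [6, 7]
# [0, 1, 2, 3, 4, 5, 6, 7] [8, 9]
```

A rolling window of fixed length is a common alternative when older data is believed to be less relevant. The property that matters in either case is that training indices always precede validation indices.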

In quant trading, cross-validation results are necessary but not sufficient: even rigorous CV can produce overconfident estimates due to dataset re-use across many model trials. Walk-forward backtests and out-of-sample lockboxes provide harder-to-fool reality checks.
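The effect of re-using data across many trials can be illustrated with a small simulation. The sketch below assumes 200 hypothetical strategies whose daily returns are pure noise with zero true edge, picks the one with the best mean return on a selection period, and then looks at the same strategy on a lockbox period that played no part in the choice. All names, sizes, and the noise level are made up for illustration.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the run is repeatable
n_trials, n_select, n_lockbox = 200, 250, 250

def simulate_trial():
    """Daily returns of one zero-edge strategy: Gaussian noise around 0."""
    return [rng.gauss(0.0, 0.01) for _ in range(n_select + n_lockbox)]

def selection_score(returns):
    """Mean daily return over the selection period only."""
    return mean(returns[:n_select])

trials = [simulate_trial() for _ in range(n_trials)]

# Choose the "winner" using only the selection period...
best = max(trials, key=selection_score)
best_selection_mean = selection_score(best)

# ...then evaluate it once on data that never influenced the choice.
best_lockbox_mean = mean(best[n_select:])

print(f"selection-period mean: {best_selection_mean:.5f}")
print(f"lockbox-period mean:   {best_lockbox_mean:.5f}")
```

Because the winner is the maximum over many noisy scores, its selection-period mean is biased upward even though no strategy has any real edge. The lockbox figure is an unbiased estimate of the true mean, which is zero here by construction, as long as the lockbox is consulted only once.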