Quant GT
Section 20 · Lesson 20.6

Loss Functions

MSE, cross-entropy, Huber, and when to pick each.

The loss function defines what the network is being trained to do. Common choices:

- Mean Squared Error (MSE): $(y - \hat{y})^2$. Standard for regression. Sensitive to outliers.
- Mean Absolute Error (MAE) / Huber: $|y - \hat{y}|$ or a smooth blend. More robust to outliers, harder to optimize than MSE.
- Cross-entropy: for classification, $-\sum_c y_c \log \hat{p}_c$. Softmax + cross-entropy is the standard for multi-class.
- Binary cross-entropy: $-y \log \hat{p} - (1-y)\log(1-\hat{p})$.
- Custom losses: Sharpe-aware loss, drawdown-penalized loss, asymmetric loss for asymmetric costs of over- vs. under-prediction.
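The formulas above can be sketched in plain Python to make the outlier behavior concrete. This is a minimal illustration, not a production implementation: the function names, the Huber threshold `delta=1.0`, the clipping constant `eps`, and the toy data are all assumptions chosen for the example. In practice a deep learning framework's built-in losses would normally be used.

```python
import math

def mse(y_true, y_pred):
    """Mean of squared errors (y - y_hat)^2."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean of absolute errors |y - y_hat|."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)

def huber(y_true, y_pred, delta=1.0):
    """Quadratic for |error| <= delta, linear beyond it (delta is an assumed default)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        r = abs(y - p)
        total += 0.5 * r * r if r <= delta else delta * (r - 0.5 * delta)
    return total / len(y_true)

def softmax(logits):
    """Convert raw scores to probabilities; subtract the max for numerical stability."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(y_onehot, probs, eps=1e-12):
    """-sum_c y_c log p_c for one sample; eps guards against log(0)."""
    return -sum(y * math.log(max(p, eps)) for y, p in zip(y_onehot, probs))

def binary_cross_entropy(y, p, eps=1e-12):
    """-y log p - (1 - y) log(1 - p) for one sample; p is clipped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    return -y * math.log(p) - (1 - y) * math.log(1 - p)

# Toy regression data: the last target is an outlier the model misses by 6.0.
y_true = [1.0, 2.0, 3.0, 10.0]
y_pred = [1.1, 1.9, 3.2, 4.0]

print("MSE  :", mse(y_true, y_pred))
print("MAE  :", mae(y_true, y_pred))
print("Huber:", huber(y_true, y_pred))

# Toy 3-class example: softmax turns logits into probabilities for cross-entropy.
probs = softmax([2.0, 1.0, 0.1])
print("CE   :", cross_entropy([1, 0, 0], probs))
```

On this toy data the single outlier dominates the MSE, because its error is squared, while MAE and Huber grow only linearly in it. That is the sense in which they are more robust.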

Pick the loss to match the cost structure of mistakes. In trading, predicting the wrong direction usually costs far more than misjudging the size of a move, so direction-aware losses (rank loss, sign prediction) often outperform pure MSE on returns.
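One way to encode that cost structure is to weight the squared error more heavily whenever the prediction has the wrong sign. The sketch below is a hypothetical illustration, not a standard named loss: `direction_penalized_mse` and its `wrong_sign_weight` parameter are invented for this example, and the returns are made-up numbers.

```python
def direction_penalized_mse(returns, preds, wrong_sign_weight=5.0):
    """Squared error, scaled up when prediction and realized return disagree in sign.

    wrong_sign_weight is a hypothetical knob: 1.0 recovers plain MSE.
    """
    total = 0.0
    for r, p in zip(returns, preds):
        sq_err = (r - p) ** 2
        # r * p is negative exactly when the two values have opposite signs.
        weight = wrong_sign_weight if r * p < 0 else 1.0
        total += weight * sq_err
    return total / len(returns)

returns = [0.01, -0.01]

# Both forecasts miss every return by 0.02, so plain MSE cannot tell them apart.
right_direction = [0.03, -0.03]   # correct sign, overstated size
wrong_direction = [-0.01, 0.01]   # wrong sign

print("right direction:", direction_penalized_mse(returns, right_direction))
print("wrong direction:", direction_penalized_mse(returns, wrong_direction))
```

The hard sign test keeps the example readable, but it makes the loss discontinuous where the prediction crosses zero. Gradient-based training would likely need a smooth surrogate for the sign penalty.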