Classification Metrics

Precision, recall, ROC, and PR curves — and which to trust when.

Accuracy alone is misleading on imbalanced data: a model that always predicts "no fraud" reaches $99\%$ accuracy when fraud is $1\%$ of cases, yet it catches no fraud at all. Better metrics:
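To make the accuracy trap concrete, here is a minimal sketch on made-up data. The 1000 transactions, the 1% fraud rate, and the variable names are illustrative assumptions, not real results.

```python
# Hypothetical toy data: 1000 transactions, 10 of them fraud (label 1).
y_true = [1] * 10 + [0] * 990

# A "classifier" that always predicts "no fraud" (label 0).
y_pred = [0] * len(y_true)

# Accuracy: fraction of predictions that match the label.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall: fraction of the real fraud cases that were flagged.
caught = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall = caught / sum(y_true)

print(f"accuracy={accuracy:.2f} recall={recall:.2f}")
```

The accuracy looks excellent while recall is zero, which is the mismatch the metrics in this section are meant to expose.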

- Precision: $\text{TP} / (\text{TP} + \text{FP})$. "Of what I flagged, how much was real?"
- Recall (sensitivity): $\text{TP} / (\text{TP} + \text{FN})$. "Of the real positives, how many did I catch?"
- F1: harmonic mean of precision and recall.
- ROC curve: TPR vs FPR over all thresholds; AUC summarizes.
- PR curve: precision vs recall; better than ROC when classes are heavily imbalanced.
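The definitions above can be sketched in plain Python. This is an illustrative implementation, not a library API: the function names, the toy labels and scores, and the convention of returning 0.0 when a denominator is zero are all assumptions made for the example. In practice a library such as scikit-learn provides tested versions of these metrics.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    return tp, fp, fn, tn


def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 (the harmonic mean of the two)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    # Assumption: report 0.0 when a ratio is undefined (empty denominator).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def sweep_thresholds(y_true, scores):
    """For each distinct score used as a threshold (predict 1 if score >= threshold),
    return (threshold, TPR, FPR, precision). (FPR, TPR) pairs trace the ROC curve;
    (TPR, precision) pairs trace the PR curve, since TPR is recall."""
    rows = []
    for thr in sorted(set(scores), reverse=True):
        y_pred = [int(s >= thr) for s in scores]
        tp, fp, fn, tn = confusion_counts(y_true, y_pred)
        tpr = tp / (tp + fn) if tp + fn else 0.0
        fpr = fp / (fp + tn) if fp + tn else 0.0
        prec = tp / (tp + fp) if tp + fp else 0.0
        rows.append((thr, tpr, fpr, prec))
    return rows


def roc_auc(y_true, scores):
    """Area under the ROC curve by the trapezoidal rule, starting from (0, 0)."""
    points = [(0.0, 0.0)] + [(fpr, tpr) for _, tpr, fpr, _ in sweep_thresholds(y_true, scores)]
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical toy data: true labels and model scores (higher means "more likely positive").
y_true = [1, 1, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]

# Metrics at a single threshold of 0.5.
y_pred = [int(s >= 0.5) for s in scores]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f} auc={auc:.3f}")
```

Precision, recall, and F1 describe one threshold, while the sweep and the AUC describe the classifier across all thresholds.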

Pick the metric to match the cost of each kind of error. In a fraud system, false alarms are relatively cheap and missed fraud is expensive, so recall matters most. In cancer screening, false alarms trigger painful follow-ups, so precision matters too.
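One way to act on this is to attach a cost to each error type and pick the decision threshold that minimizes total cost. The sketch below assumes hypothetical per-error costs and toy scores; `pick_threshold` is an illustrative helper, not a standard function, and it only considers thresholds equal to observed scores.

```python
def pick_threshold(y_true, scores, cost_fp, cost_fn):
    """Return (threshold, total_cost) minimizing cost_fp * FP + cost_fn * FN,
    where an example is predicted positive if score >= threshold."""
    best_thr, best_cost = None, float("inf")
    for thr in sorted(set(scores)):
        pairs = list(zip(y_true, scores))
        fp = sum(t == 0 and s >= thr for t, s in pairs)  # false alarms
        fn = sum(t == 1 and s < thr for t, s in pairs)   # missed positives
        cost = cost_fp * fp + cost_fn * fn
        if cost < best_cost:  # ties keep the lower threshold
            best_thr, best_cost = thr, cost
    return best_thr, best_cost


# Hypothetical toy data: true labels and model scores.
y_true = [1, 1, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]

# Assumed costs: a miss is 50x worse than a false alarm (recall-oriented).
recall_oriented = pick_threshold(y_true, scores, cost_fp=1, cost_fn=50)

# Assumed costs: a false alarm is 10x worse than a miss (precision-oriented).
precision_oriented = pick_threshold(y_true, scores, cost_fp=10, cost_fn=1)

print(recall_oriented, precision_oriented)
```

With expensive misses the chosen threshold drops so that every positive is caught; with expensive false alarms it rises and accepts a missed positive to avoid flagging a negative.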