ROC, AUC & PR, visualized
A trained classifier outputs probabilities; a threshold τ turns those into class labels. The ROC curve plots TPR against FPR as τ sweeps from 0 to 1; the PR curve plots precision against recall over the same sweep. Drag the threshold slider below to move the magenta point on both curves. For training internals see Logistic Regression →.
Theory & exercises · math derivation, watch-outs, and ideas to try
The math, derived
1. The four counts.
Pick a threshold $\tau$. For each example with probability $\hat{p}_i$ and label $t_i$, the predicted class is $\hat{y}_i = \mathbf{1}[\hat{p}_i \ge \tau]$. Tally:
$$ \text{TP, FP, FN, TN} \;=\; \text{counts of } (\hat{y},\, t) \in \{(1,1),(1,0),(0,1),(0,0)\} $$
2. The two rates.
$$ \text{TPR (recall)} \;=\; \frac{TP}{TP + FN} \qquad \text{FPR} \;=\; \frac{FP}{FP + TN} \qquad \text{Precision} \;=\; \frac{TP}{TP + FP} $$
As $\tau$ falls, more examples are predicted positive — TPR rises (good!), FPR rises (bad), precision usually falls. The whole game is in the trade-off.
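The counts and rates above can be sketched in a few lines of plain Python. This is a minimal illustration, not the page's own implementation: the function names and the eight-example toy dataset are made up for the example.

```python
def confusion_counts(probs, labels, tau):
    """Return (TP, FP, FN, TN) for predictions y_hat = 1[p >= tau]."""
    tp = fp = fn = tn = 0
    for p, t in zip(probs, labels):
        pred = 1 if p >= tau else 0
        if pred == 1 and t == 1:
            tp += 1
        elif pred == 1 and t == 0:
            fp += 1
        elif pred == 0 and t == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def rates(tp, fp, fn, tn):
    """Return (TPR, FPR, precision); a ratio with a zero denominator is None."""
    tpr = tp / (tp + fn) if (tp + fn) else None
    fpr = fp / (fp + tn) if (fp + tn) else None
    precision = tp / (tp + fp) if (tp + fp) else None
    return tpr, fpr, precision


# Toy scores and labels, invented for illustration only.
probs = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

# Lowering tau moves examples from predicted-negative to predicted-positive.
for tau in (0.75, 0.5, 0.25):
    counts = confusion_counts(probs, labels, tau)
    print(tau, counts, rates(*counts))
```

On this toy data, dropping τ from 0.75 to 0.25 raises TPR from 0.5 to 1.0 and FPR from 0.0 to 0.5, while precision falls from 1.0 to about 0.67.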
3. ROC AUC — sweep all thresholds.
Plot $(\text{FPR}(\tau),\, \text{TPR}(\tau))$ for every $\tau \in [0, 1]$. The area under the curve is
$$ \text{AUC} \;=\; \int_0^1 \text{TPR}(\text{FPR}^{-1}(u))\, du \;=\; \Pr\big[\,\hat{p}_{+} > \hat{p}_{-}\,\big] $$
The last equality is the key intuition: AUC is the probability that a randomly chosen positive example is ranked higher than a randomly chosen negative one (ties counted as one half). Because it depends only on the ordering of the scores, AUC is independent of any single threshold.
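To check that the two sides of the AUC identity agree, here is a small sketch that computes the area with the trapezoid rule and, separately, the pairwise ranking probability. The function names and toy data are assumptions for illustration; a real project would more likely call a library routine.

```python
def auc_pairwise(probs, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [p for p, t in zip(probs, labels) if t == 1]
    neg = [p for p, t in zip(probs, labels) if t == 0]
    wins = 0.0
    for pp in pos:
        for pn in neg:
            if pp > pn:
                wins += 1.0
            elif pp == pn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_trapezoid(probs, labels):
    """Area under the ROC curve, sweeping tau over the observed scores."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # tau above every score: nothing predicted positive
    for tau in sorted(set(probs), reverse=True):
        tp = sum(1 for p, t in zip(probs, labels) if p >= tau and t == 1)
        fp = sum(1 for p, t in zip(probs, labels) if p >= tau and t == 0)
        points.append((fp / n_neg, tp / n_pos))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy scores and labels, invented for illustration only.
probs = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

print(auc_pairwise(probs, labels), auc_trapezoid(probs, labels))
```

Both calls give 0.8125 here: 13 of the 16 positive-negative pairs are ordered correctly. The pairwise loop is quadratic in the number of examples, so it is meant for intuition rather than large datasets.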
4. PR AUC — the imbalance-aware sibling.
Plot precision vs recall as $\tau$ sweeps. PR AUC (a.k.a. average precision):
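$$ \text{AP} \;=\; \sum_{i} \big(\text{recall}_i - \text{recall}_{i-1}\big)\,\text{precision}_i $$
Here $i$ indexes the thresholds from high to low, so recall never decreases from one term to the next. On a 99/1 imbalanced dataset, ROC AUC can stay high even when precision on the minority class is poor: FPR divides by the large negative count, so a lot of false positives barely move it. PR AUC stays sensitive because true negatives don’t enter precision or recall at all.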
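The imbalance effect can be reproduced on toy data. The sketch below implements the AP sum above and a pairwise ROC AUC, then builds an imbalanced copy of the dataset by repeating every negative 25 times (a 100:4 ratio, chosen only to keep the example small; it is not the 99/1 split mentioned above). All names and data are illustrative assumptions.

```python
def roc_auc(probs, labels):
    """Pairwise form of ROC AUC; ties count one half."""
    pos = [p for p, t in zip(probs, labels) if t == 1]
    neg = [p for p, t in zip(probs, labels) if t == 0]
    wins = sum(1.0 if pp > pn else 0.5 if pp == pn else 0.0
               for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))


def average_precision(probs, labels):
    """AP = sum_i (recall_i - recall_{i-1}) * precision_i, thresholds descending."""
    n_pos = sum(labels)
    ap, prev_recall = 0.0, 0.0
    for tau in sorted(set(probs), reverse=True):
        tp = sum(1 for p, t in zip(probs, labels) if p >= tau and t == 1)
        fp = sum(1 for p, t in zip(probs, labels) if p >= tau and t == 0)
        recall = tp / n_pos
        precision = tp / (tp + fp)  # tp + fp >= 1 since tau is an observed score
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


# Toy scores and labels, invented for illustration only.
probs = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

# Imbalanced copy: every negative repeated 25 times, positives unchanged.
imb = [(p, t) for p, t in zip(probs, labels) for _ in range(25 if t == 0 else 1)]
imb_probs = [p for p, _ in imb]
imb_labels = [t for _, t in imb]

print("balanced  ", roc_auc(probs, labels), average_precision(probs, labels))
print("imbalanced", roc_auc(imb_probs, imb_labels),
      average_precision(imb_probs, imb_labels))
```

Repeating the negatives leaves the ranking, and therefore ROC AUC, unchanged, while every threshold that admits a negative now admits 25 of them, so precision and AP fall.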
5. F1 — the harmonic mean.
If you have to pick one $\tau$, F1 combines precision and recall into a single score (penalizing extreme imbalances between the two):
$$ F_1 \;=\; \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} $$
Pick $\tau$ to maximize F1 when you don’t have a domain-specific cost matrix for FP vs FN.
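One simple way to find an F1-maximizing threshold is to try every observed score as a candidate τ. The sketch below does that on toy data; the helper names are hypothetical, and it uses the equivalent form $F_1 = 2\,TP / (2\,TP + FP + FN)$ to avoid dividing by an undefined precision.

```python
def f1_at(probs, labels, tau):
    """F1 for predictions 1[p >= tau], written as 2TP / (2TP + FP + FN)."""
    tp = sum(1 for p, t in zip(probs, labels) if p >= tau and t == 1)
    fp = sum(1 for p, t in zip(probs, labels) if p >= tau and t == 0)
    fn = sum(1 for p, t in zip(probs, labels) if p < tau and t == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_f1_threshold(probs, labels):
    """Return (tau, F1) for the observed score that maximizes F1."""
    best_tau = max(sorted(set(probs)), key=lambda tau: f1_at(probs, labels, tau))
    return best_tau, f1_at(probs, labels, best_tau)


# Toy scores and labels, invented for illustration only.
probs = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

tau_star, f1_star = best_f1_threshold(probs, labels)
print(tau_star, f1_star, f1_at(probs, labels, 0.5))
```

On this data F1 peaks at τ = 0.3 (F1 = 0.8), above the 0.75 obtained at τ = 0.5. A threshold tuned this way should be chosen on held-out data rather than on the training set.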
Try this
Operating-point sweep
Slide τ from 0.05 to 0.95. Watch the magenta dot travel along the ROC curve. As τ → 1, FPR and TPR both go to 0 (nothing is predicted positive). As τ → 0, both go to 1.
Imbalanced data → use PR
Hit imbalanced. Auto-train a bit. Notice ROC AUC stays high (~0.9+) while PR AUC dips. The PR view tells the truth about minority-class performance.
Find the F1-optimal threshold
On the overlap dataset after training, slide τ until F1 peaks. It’s usually not at 0.5: the peak depends on class balance and on how much the two classes’ scores overlap.
Perfect separability
Hit near-perfect and Auto-train. ROC curve hugs the top-left, AUC ≈ 1.0. Confusion matrix shows zero (or near-zero) off-diagonals at τ = 0.5.
An "anti-classifier"
Hit blobs, then manually set w₀ = -1, w₁ = -1, b = 0 (without training). ROC AUC drops below 0.5 — the model is anti-correlated with the truth. Inverting predictions would beat it.
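Why an anti-classifier lands below 0.5 can be seen in a short sketch. It assumes a two-feature logistic model $\hat{p} = \sigma(w_0 x_0 + w_1 x_1 + b)$ and a made-up set of points in which positives tend to have larger $x_0 + x_1$; neither the points nor the helper names come from the demo itself.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(points, w0, w1, b):
    """Logistic-model probabilities for 2-D points."""
    return [sigmoid(w0 * x0 + w1 * x1 + b) for x0, x1 in points]


def roc_auc(probs, labels):
    """Pairwise form of ROC AUC; ties count one half."""
    pos = [p for p, t in zip(probs, labels) if t == 1]
    neg = [p for p, t in zip(probs, labels) if t == 0]
    wins = sum(1.0 if pp > pn else 0.5 if pp == pn else 0.0
               for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))


# Toy points, invented for illustration: positives sit mostly up and to the right.
points = [(2.0, 1.5), (1.0, 2.0), (1.5, 0.5), (-0.5, 0.2),
          (-1.0, -2.0), (-2.0, -0.5), (0.3, -0.1), (-1.5, -1.0)]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

auc_good = roc_auc(predict(points, 1.0, 1.0, 0.0), labels)
anti_probs = predict(points, -1.0, -1.0, 0.0)  # weights negated, no training
auc_anti = roc_auc(anti_probs, labels)
auc_flipped = roc_auc([1.0 - p for p in anti_probs], labels)

print(auc_good, auc_anti, auc_flipped)
```

Negating the weights reverses the ranking of every pair, so the AUC becomes one minus its original value (0.9375 versus 0.0625 here), and flipping the anti-classifier's outputs recovers the original AUC.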
Confusion-matrix sweep
At each τ, the four cells trade off. Lower τ: TP and FP up, TN and FN down. The four metrics in the KPI row each respond differently — F1 most stably, accuracy least.