Logistic Regression, visualized
Logistic regression learns a straight line that separates two classes, then squashes a score proportional to each point's signed distance from that line through a sigmoid to produce a probability. Click the chart to drop points, hit Train, and watch the boundary rotate while the loss curve falls on the right. For threshold/AUC/precision-recall exploration, head to ROC & AUC.
The math, derived
1. The model.
Combine the inputs linearly, then squash through a sigmoid to get a probability:
$$ z \;=\; w_0\,x + w_1\,y + b \qquad \hat{p} \;=\; \sigma(z) \;=\; \frac{1}{1 + e^{-z}} $$
$\hat{p} \in (0, 1)$ is interpreted as the probability of class 1. Predict class 1 when $\hat{p} > \tau$, where $\tau$ is the decision threshold (commonly 0.5).
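A minimal Python sketch of step 1, using only the standard library. The names `sigmoid`, `predict_proba`, and `predict` are hypothetical (not taken from the page's own code); the parameters mirror the symbols above, with `tau` standing in for $\tau$. The sigmoid is split into two branches so that very large or very small $z$ never overflows `math.exp`.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function 1 / (1 + e^(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, e^(-z) can overflow; use the equivalent e^z / (1 + e^z).
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(w0: float, w1: float, b: float, x: float, y: float) -> float:
    """p-hat = sigma(w0*x + w1*y + b): the model's probability of class 1."""
    return sigmoid(w0 * x + w1 * y + b)

def predict(w0: float, w1: float, b: float, x: float, y: float, tau: float = 0.5) -> int:
    """Hard label: 1 when p-hat > tau, otherwise 0."""
    return 1 if predict_proba(w0, w1, b, x, y) > tau else 0

print(predict_proba(1.0, -1.0, 0.0, 3.0, 1.0))     # z = 2, p-hat ≈ 0.881
print(predict(1.0, -1.0, 0.0, 3.0, 1.0))           # 1
print(predict(1.0, -1.0, 0.0, 3.0, 1.0, tau=0.9))  # 0: 0.881 is below tau
```

The same point can flip class purely by moving $\tau$, without touching the weights.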
2. The loss — binary cross-entropy.
For each example $(x_i, y_i)$ with label $t_i \in \{0, 1\}$:
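$$ L \;=\; -\frac{1}{N} \sum_{i=1}^{N} \Big[\, t_i \log \hat{p}_i + (1 - t_i)\log(1 - \hat{p}_i) \,\Big] $$
Cross-entropy is convex in $(w_0, w_1, b)$, so gradient descent with a small enough learning rate approaches the global optimum given enough steps. One caveat: when the data are perfectly separable, the unregularized loss has no finite minimizer; it keeps shrinking as the weights grow, so the weights never settle.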
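The loss translates almost symbol for symbol into code. This sketch (the name `bce_loss` is hypothetical) clips $\hat{p}$ into $[\varepsilon, 1-\varepsilon]$, an added safeguard so that a fully confident wrong prediction yields a large finite loss instead of calling `log(0)`.

```python
import math

def bce_loss(probs, targets, eps=1e-12):
    """Mean binary cross-entropy L over N examples.

    probs:   predicted probabilities p-hat_i
    targets: labels t_i in {0, 1}
    eps:     clipping margin (an assumed safeguard) so log() never receives 0
    """
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)
        total += t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return -total / len(probs)

print(bce_loss([0.5, 0.5], [0, 1]))  # ln 2 ≈ 0.693: a coin-flip model
print(bce_loss([0.9, 0.1], [1, 0]))  # -ln 0.9 ≈ 0.105: confident and right
print(bce_loss([0.1], [1]))          # -ln 0.1 ≈ 2.303: confident and wrong
```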
3. The gradient.
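The chain rule makes the gradient surprisingly clean. The per-example loss $\ell_i = -\big[t_i \log \hat{p}_i + (1 - t_i)\log(1 - \hat{p}_i)\big]$ has $\partial \ell_i / \partial \hat{p}_i = (\hat{p}_i - t_i) / \big(\hat{p}_i(1 - \hat{p}_i)\big)$, and the sigmoid's derivative $\sigma'(z) = \sigma(z)\,(1 - \sigma(z))$ cancels that denominator, leaving $\partial \ell_i / \partial z_i = \hat{p}_i - t_i$: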
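$$ \frac{\partial L}{\partial w_0} \;=\; \frac{1}{N}\sum_i (\hat{p}_i - t_i)\,x_i \qquad \frac{\partial L}{\partial w_1} \;=\; \frac{1}{N}\sum_i (\hat{p}_i - t_i)\,y_i \qquad \frac{\partial L}{\partial b} \;=\; \frac{1}{N}\sum_i (\hat{p}_i - t_i) $$
This has the same shape as the squared-error gradient in linear regression (up to a constant factor), with $\hat{p}_i$ in place of the linear prediction. For L2 regularization, add the penalty $\lambda(w_0^2 + w_1^2)$ to $L$; this adds $+\,2\lambda w_0$ to the $w_0$ gradient and $+\,2\lambda w_1$ to the $w_1$ gradient, while $b$ is typically left unpenalized.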
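One way to trust these formulas is to compare them with central finite differences. The sketch below assumes the L2 penalty is $\lambda(w_0^2 + w_1^2)$ (the penalty whose gradient gives the $+\,2\lambda w_0$ term) and leaves $b$ unpenalized; the data points, parameter values, and function names are made up for illustration.

```python
import math

def sigmoid(z):
    # Overflow-safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def loss(params, data, lam=0.0):
    """Mean BCE plus lam * (w0^2 + w1^2); data rows are (x, y, t)."""
    w0, w1, b = params
    total = 0.0
    for x, y, t in data:
        p = sigmoid(w0 * x + w1 * y + b)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(data) + lam * (w0 ** 2 + w1 ** 2)

def gradient(params, data, lam=0.0):
    """Analytic gradient: mean of (p-hat_i - t_i) * (x_i, y_i, 1), plus 2*lam*w."""
    w0, w1, b = params
    g0 = g1 = gb = 0.0
    for x, y, t in data:
        err = sigmoid(w0 * x + w1 * y + b) - t  # p-hat_i - t_i
        g0 += err * x
        g1 += err * y
        gb += err
    n = len(data)
    return (g0 / n + 2 * lam * w0, g1 / n + 2 * lam * w1, gb / n)

def numerical_gradient(params, data, lam=0.0, h=1e-6):
    """Central differences (L(theta + h e_k) - L(theta - h e_k)) / 2h, one parameter at a time."""
    grads = []
    for k in range(len(params)):
        up = list(params)
        down = list(params)
        up[k] += h
        down[k] -= h
        grads.append((loss(up, data, lam) - loss(down, data, lam)) / (2 * h))
    return tuple(grads)

data = [(1.0, 2.0, 1), (-1.0, 0.5, 0), (0.5, -1.5, 1), (-2.0, -1.0, 0)]
params = (0.3, -0.2, 0.1)
print(gradient(params, data, lam=0.05))
print(numerical_gradient(params, data, lam=0.05))  # agrees with the line above to many decimals
```

If the two printed tuples ever disagree beyond rounding noise, the analytic formula (or its code) has a bug.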
4. The update.
Step opposite the gradient, scaled by the learning rate:
$$ w_0 \leftarrow w_0 - \eta\,\frac{\partial L}{\partial w_0} \qquad w_1 \leftarrow w_1 - \eta\,\frac{\partial L}{\partial w_1} \qquad b \leftarrow b - \eta\,\frac{\partial L}{\partial b} $$
Repeat until the loss curve flattens. If the loss overshoots and bounces, lower $\eta$; if it crawls, raise $\eta$ (carefully).
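Putting steps 1 to 4 together gives full-batch gradient descent. This is a minimal sketch, assuming zero initialization, a fixed number of steps instead of a convergence test, and made-up toy data; `train` and `toy` are hypothetical names, not the page's code.

```python
import math

def sigmoid(z):
    # Overflow-safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def train(data, eta=0.5, lam=0.0, steps=200, eps=1e-12):
    """Full-batch gradient descent from w0 = w1 = b = 0.

    data rows are (x, y, t). Returns the final (w0, w1, b) and the loss
    measured before each update (the values a loss curve would plot).
    """
    w0 = w1 = b = 0.0
    n = len(data)
    history = []
    for _ in range(steps):
        g0 = g1 = gb = 0.0
        total = 0.0
        for x, y, t in data:
            p = sigmoid(w0 * x + w1 * y + b)
            pc = min(max(p, eps), 1.0 - eps)  # clip only for the logged loss
            total -= t * math.log(pc) + (1 - t) * math.log(1.0 - pc)
            err = p - t
            g0 += err * x
            g1 += err * y
            gb += err
        history.append(total / n + lam * (w0 ** 2 + w1 ** 2))
        # Step opposite the gradient, scaled by the learning rate eta.
        w0 -= eta * (g0 / n + 2 * lam * w0)
        w1 -= eta * (g1 / n + 2 * lam * w1)
        b -= eta * gb / n
    return (w0, w1, b), history

# Hypothetical, linearly separable toy data: class 1 sits up and to the right.
toy = [(-2.0, -1.0, 0), (-1.0, -2.0, 0), (-1.5, -0.5, 0),
       (2.0, 1.0, 1), (1.0, 2.0, 1), (1.5, 0.5, 1)]
(w0, w1, b), history = train(toy, eta=0.5, steps=200)
print(history[0], history[-1])  # starts at ln 2 ≈ 0.693 and ends far lower
```

With this small `eta` the recorded loss falls at every step; pushing `eta` high enough may break that monotone decrease.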
Try this
The XOR wall
Hit XOR and Auto-train. Watch the boundary thrash: no single line can put the two diagonal pairs of corners on opposite sides. This limitation of single-layer (one-line) classifiers is what motivated multi-layer perceptrons in the 1980s.
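Why can't training escape? On the four idealized corners below (an assumption; the page's XOR preset may use noisy clusters instead), the gradient at $w_0 = w_1 = b = 0$ is exactly zero. Because the cross-entropy loss is convex, that stationary point is the global minimum: the best a single line can do is $\hat{p} = 0.5$ everywhere, at a loss of $\ln 2$. The helper name `loss_and_grad` is hypothetical.

```python
import math

def sigmoid(z):
    # Overflow-safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def loss_and_grad(w0, w1, b, data):
    """Mean BCE and its gradient (dL/dw0, dL/dw1, dL/db) for rows (x, y, t)."""
    n = len(data)
    loss = g0 = g1 = gb = 0.0
    for x, y, t in data:
        p = sigmoid(w0 * x + w1 * y + b)
        loss -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
        g0 += (p - t) * x
        g1 += (p - t) * y
        gb += p - t
    return loss / n, (g0 / n, g1 / n, gb / n)

# Idealized XOR: label 1 exactly when one coordinate is 1 (an assumed layout).
XOR = [(0.0, 0.0, 0), (1.0, 1.0, 0), (0.0, 1.0, 1), (1.0, 0.0, 1)]

loss, grad = loss_and_grad(0.0, 0.0, 0.0, XOR)
print(loss, grad)  # ln 2 ≈ 0.693 and a gradient of exactly zero
```

Trying any other $(w_0, w_1, b)$ on these four points gives a loss of at least $\ln 2$, as convexity predicts.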
Learning rate explosion
On blobs, crank η to 0.8 or higher and train. The loss spikes and the weights swing wildly; the boundary may flip back and forth across runs. Halve η and retry.
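Why a large η misbehaves can be made precise with a standard smoothness argument (a sufficient condition, not a necessary one). Write $\theta = (w_0, w_1, b)$ and $\tilde{x}_i = (x_i,\ y_i,\ 1)$, and let $\tilde{X}$ stack the rows $\tilde{x}_i^{\top}$. Since $\hat{p}(1-\hat{p}) \le \tfrac14$, the curvature of the cross-entropy loss is bounded, which yields a step size below which every update lowers the loss:

$$ \nabla^2 L(\theta) \;=\; \frac{1}{N}\sum_{i=1}^{N} \hat{p}_i(1-\hat{p}_i)\,\tilde{x}_i\tilde{x}_i^{\top} \;\preceq\; \frac{1}{4N}\,\tilde{X}^{\top}\tilde{X} \qquad\Rightarrow\qquad \beta \;=\; \frac{\|\tilde{X}\|_2^2}{4N} $$

$$ L(\theta - \eta\,\nabla L) \;\le\; L(\theta) - \eta\Big(1 - \frac{\beta\eta}{2}\Big)\,\|\nabla L(\theta)\|^2 \qquad\text{so the loss decreases whenever } 0 < \eta < \frac{2}{\beta} $$

Here $\|\tilde{X}\|_2$ is the largest singular value of $\tilde{X}$. Inputs with large coordinates make $\beta$ large and shrink the safe range for $\eta$, which is one reason to standardize features. With the L2 penalty $\lambda(w_0^2 + w_1^2)$, the bound on $\beta$ grows by $2\lambda$.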
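Regularization softens the line
On overlap, train without L2, then with λ = 0.05. The boundary is still a straight line, but the regularized weights are smaller, so the predicted probability changes more gradually across it and stays closer to 0.5 near the line: the model is humbler.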
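A quick experiment shows what L2 does to the weights. This sketch assumes the penalty $\lambda(w_0^2 + w_1^2)$ with $b$ unpenalized and uses eight made-up points that no line separates (not the page's overlap preset); `train` and `mixed` are hypothetical names. Both runs end with a straight boundary, but the regularized weight vector is shorter. Since $z / \|w\|$ is a point's signed distance to the line, a smaller $\|w\| = \sqrt{w_0^2 + w_1^2}$ means $\hat{p}$ moves away from 0.5 more slowly as you step off the line.

```python
import math

def sigmoid(z):
    # Overflow-safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def train(data, eta=0.5, lam=0.0, steps=5000):
    """Full-batch gradient descent on mean BCE + lam * (w0^2 + w1^2), starting at zero."""
    w0 = w1 = b = 0.0
    n = len(data)
    for _ in range(steps):
        g0 = g1 = gb = 0.0
        for x, y, t in data:
            err = sigmoid(w0 * x + w1 * y + b) - t
            g0 += err * x
            g1 += err * y
            gb += err
        w0 -= eta * (g0 / n + 2 * lam * w0)  # penalty gradient is 2*lam*w0
        w1 -= eta * (g1 / n + 2 * lam * w1)
        b -= eta * gb / n                    # bias is not penalized
    return w0, w1, b

# Made-up data. Along y = -x the points (-1, 1), (-0.5, 0.5), (0.5, -0.5), (1, -1)
# alternate labels 0, 1, 0, 1, so no line separates the classes.
mixed = [(-2.0, 0.0, 0), (-1.0, 1.0, 0), (0.5, -0.5, 0), (-1.5, -1.0, 0),
         (2.0, 0.0, 1), (1.0, -1.0, 1), (-0.5, 0.5, 1), (1.5, 1.0, 1)]

plain = train(mixed, lam=0.0)
ridge = train(mixed, lam=0.05)
print("||w|| without L2:   ", math.hypot(plain[0], plain[1]))
print("||w|| with lam=0.05:", math.hypot(ridge[0], ridge[1]))
```

Non-separable data is used on purpose: on separable data the unregularized weights would keep growing, so there would be no finite norm to compare against.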
Threshold-induced trade-off
Slide τ from 0.3 to 0.7. The confusion matrix shifts: lower τ catches more positives (TP up, FN down) but also more false alarms (FP up). For threshold sweeping with ROC/AUC, see the metrics page.
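Counting the four confusion-matrix cells at several thresholds makes the trade-off concrete. The probabilities and labels below are invented for illustration, and `confusion` is a hypothetical helper; the rule is the one from step 1, predict class 1 when $\hat{p} > \tau$.

```python
def confusion(probs, labels, tau):
    """Return (TP, FP, FN, TN) for the rule: predict 1 when p-hat > tau."""
    tp = fp = fn = tn = 0
    for p, t in zip(probs, labels):
        predicted_positive = p > tau
        if predicted_positive and t == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif t == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

probs  = [0.1, 0.35, 0.4, 0.6, 0.65, 0.8, 0.9, 0.2]  # invented p-hat values
labels = [0,   0,    1,   0,   1,    1,   1,   0]    # invented true labels

for tau in (0.3, 0.5, 0.7):
    print(tau, confusion(probs, labels, tau))
# 0.3 (4, 2, 0, 2)
# 0.5 (3, 1, 1, 3)
# 0.7 (2, 0, 2, 4)
```

Lowering τ from 0.7 to 0.3 raises TP from 2 to 4 and drops FN from 2 to 0, at the cost of FP rising from 0 to 2.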
Imbalanced reality
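The imbalanced dataset has 20 negatives and 100 positives. Notice the 95% accuracy from the start, then compare it with the trivial baseline: always predicting the majority class already scores 100/120 ≈ 83% without learning anything. Accuracy lies on imbalanced data; use F1 on the minority class, or AUC, instead.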
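The arithmetic behind that warning, for a trivial model that always predicts "positive" on 20 negatives and 100 positives. The helpers are hypothetical sketches; `auc` uses the rank (Mann-Whitney) interpretation of AUC, the probability that a random positive outscores a random negative, with ties counted as one half.

```python
def accuracy(preds, labels):
    """Fraction of predictions that match the labels."""
    return sum(1 for p, t in zip(preds, labels) if p == t) / len(labels)

def f1(preds, labels, positive):
    """F1 for the class `positive`: 2TP / (2TP + FP + FN), defined as 0 when TP = 0."""
    tp = sum(1 for p, t in zip(preds, labels) if p == positive and t == positive)
    fp = sum(1 for p, t in zip(preds, labels) if p == positive and t != positive)
    fn = sum(1 for p, t in zip(preds, labels) if p != positive and t == positive)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)

def auc(scores, labels):
    """P(random positive outscores random negative), ties counted as 1/2."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

labels = [0] * 20 + [1] * 100  # 20 negatives, 100 positives
preds = [1] * len(labels)      # trivial model: always "positive"
scores = [0.9] * len(labels)   # ...giving every point the same score

print(accuracy(preds, labels))        # 100/120 ≈ 0.833
print(f1(preds, labels, positive=0))  # 0.0: the minority class is never found
print(auc(scores, labels))            # 0.5: no better than chance
```

F1 computed for the majority (positive) class would look fine here, 200/220 = 10/11 ≈ 0.91, which is why the minority class is the one to score.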
Hand-set the boundary
Without training, slide w₀, w₁, b to draw a line that separates the data by eye. Compare your accuracy to gradient descent’s — humans can match it on easy data.
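To set the sliders on purpose, solve for where the prediction flips. Because $\sigma$ is strictly increasing, $\hat{p} > \tau$ holds exactly when $z > \ln\frac{\tau}{1-\tau}$, so the decision boundary is the line below (solved for $y$ when $w_1 \neq 0$):

$$ w_0\,x + w_1\,y + b \;=\; \ln\frac{\tau}{1-\tau} \qquad\Longleftrightarrow\qquad y \;=\; -\frac{w_0}{w_1}\,x \;+\; \frac{\ln\frac{\tau}{1-\tau} - b}{w_1} $$

At $\tau = 0.5$ the logarithm is 0, so the ratio $w_0 / w_1$ sets the slope and $b$ slides the line. Multiplying $w_0$, $w_1$, and $b$ by the same positive constant leaves that $\tau = 0.5$ line in place but makes $\hat{p}$ change more sharply across it.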
Frequently asked
overlap dataset, try $\lambda = 0.05$: the weights shrink, so the predicted probability changes more smoothly across the boundary and accuracy is more stable.