Loss Function Explorer

Develop an intuitive, 3D understanding of how loss functions measure prediction error and how gradient-based optimization steps toward lower loss.

Interactive Controls

Current Loss: 2.2500
Gradient ∂L/∂ŷ: 3.0000
Prediction ŷ: 2.000 (range -3 to 3)
Target y: 0.500 (range -3 to 3)
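
The loss and gradient readout above can be reproduced by hand. The sketch below assumes the selected loss is the single-sample squared error and that the controls are set to ŷ = 2.0 and y = 0.5, which is the only setting consistent with a loss of 2.2500 and a gradient of 3.0000. The function names are illustrative and not taken from the page.

```python
def squared_error(y: float, y_hat: float) -> float:
    """Single-sample squared error L = (y - y_hat)^2."""
    return (y - y_hat) ** 2


def squared_error_grad(y: float, y_hat: float) -> float:
    """Derivative of L with respect to the prediction: 2 * (y_hat - y)."""
    return 2.0 * (y_hat - y)


# Assumed control settings, inferred from the readout.
y, y_hat = 0.5, 2.0

print(f"Loss:     {squared_error(y, y_hat):.4f}")       # Loss:     2.2500
print(f"Gradient: {squared_error_grad(y, y_hat):.4f}")  # Gradient: 3.0000
```

The positive gradient says that increasing ŷ would increase the loss, so a descent step moves ŷ down toward y.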

Gradient Descent Animation

This panel reports the current step and progress, the loss, gradient, and prediction at that step, the cumulative loss reduction, a log of recent steps, and a convergence message giving the final ŷ and loss.

Optimization Insights

Deep-dive explanations of gradient descent theory.

The Core Update Rule
θ_new = θ_old − α × ∇L(θ)
where α = learning rate, ∇L = gradient of loss
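
As a minimal sketch of the update rule, the loop below applies it to a single prediction ŷ under squared error, so θ is ŷ itself and ∇L = 2(ŷ − y). The function name `gradient_descent` and the values used (start at ŷ = 2.0, target y = 0.5, α = 0.05, 40 steps) are illustrative assumptions, not settings confirmed by the page.

```python
def gradient_descent(y: float, y_hat0: float, lr: float, steps: int):
    """Run plain gradient descent on L = (y - y_hat)^2.

    Returns a list of (step, y_hat, loss) tuples, including step 0.
    """
    y_hat = y_hat0
    history = [(0, y_hat, (y - y_hat) ** 2)]
    for step in range(1, steps + 1):
        grad = 2.0 * (y_hat - y)   # gradient of the loss w.r.t. y_hat
        y_hat = y_hat - lr * grad  # theta_new = theta_old - alpha * grad
        history.append((step, y_hat, (y - y_hat) ** 2))
    return history


# Illustrative (assumed) settings.
history = gradient_descent(y=0.5, y_hat0=2.0, lr=0.05, steps=40)
for step, y_hat, loss in history[:3]:
    print(f"step {step}: y_hat = {y_hat:.4f}, loss = {loss:.4f}")
```

For this particular loss, each step multiplies the remaining error ŷ − y by (1 − 2α), which is why the loss shrinks geometrically when α is small and positive.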


Loss Curve (ŷ vs Loss, y = 0.50)

Mean Squared Error — Mathematical Analysis

Loss Formula
L = (y - \hat{y})^2

What It Measures

MSE measures the average squared difference between predictions and actual values; for a single prediction it reduces to the squared error (y − ŷ)². Squaring penalizes large errors far more than small ones.

When To Use

Regression problems where outliers should be heavily penalized — house price prediction, stock forecasting, temperature estimation.

Advantages

  • Smooth, differentiable everywhere
  • Convex in the prediction ŷ (and in the weights of a linear model), so there are no spurious local minima
  • Large errors are penalized disproportionately (good for outlier-sensitive tasks)
  • Mathematically clean for linear regression (closed-form solution exists)
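
To illustrate the closed-form point, here is a small sketch for the one-feature case, where minimizing MSE gives slope = cov(x, y) / var(x) and intercept = mean(y) − slope × mean(x). The helper `fit_line` and the toy data are hypothetical and chosen so the answer is easy to check.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared and cross deviations from the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data generated from y = 2x + 1 with no noise.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # 2.0 1.0
```

No iteration is needed here; gradient descent becomes useful when the model is not linear in its parameters or the data set is too large for a direct solve.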

Limitations

  • Very sensitive to outliers — one bad point can dominate the loss
  • Values are in squared units, making interpretation harder
  • Can cause very large gradients when errors are large
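
The outlier sensitivity listed above is easy to see numerically. The sketch below uses made-up targets and predictions (assumptions for illustration only) and compares MSE with mean absolute error when a single target is replaced by an extreme value.

```python
def mean_squared_error(targets, preds):
    """Average of squared differences."""
    return sum((t - p) ** 2 for t, p in zip(targets, preds)) / len(targets)


def mean_absolute_error(targets, preds):
    """Average of absolute differences."""
    return sum(abs(t - p) for t, p in zip(targets, preds)) / len(targets)


preds = [1.5, 2.5, 3.5, 4.5]
clean = [1.0, 2.0, 3.0, 4.0]     # every error is 0.5
outlier = [1.0, 2.0, 3.0, 24.0]  # last error is 19.5

print(mean_squared_error(clean, preds), mean_squared_error(outlier, preds))
# 0.25 95.25
print(mean_absolute_error(clean, preds), mean_absolute_error(outlier, preds))
# 0.5 5.25
```

One bad point multiplies the MSE by 381 but the MAE by only 10.5, and the gradient on that point under squared error (2 × 19.5 = 39) is far larger than on the others (2 × 0.5 = 1).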

Intuition

MSE behaves like a stretched rubber band. The farther your prediction is from reality, the harder it snaps back: the restoring force (the gradient) grows linearly with the distance, while the stored energy (the loss) grows with its square.
🎯 Step 01

Pick a Loss

Select MSE, MAE, BCE, or Hinge to explore its unique shape.
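
For reference, the four losses can be written in a few lines each. This is a sketch of the standard single-sample definitions, not the page's own implementation; it assumes BCE takes a label y in {0, 1} with a predicted probability p, and hinge takes a label y in {−1, +1} with a raw score.

```python
import math


def mse(y: float, y_hat: float) -> float:
    """Squared error: (y - y_hat)^2."""
    return (y - y_hat) ** 2


def mae(y: float, y_hat: float) -> float:
    """Absolute error: |y - y_hat|."""
    return abs(y - y_hat)


def bce(y: int, p: float, eps: float = 1e-12) -> float:
    """Binary cross-entropy for label y in {0, 1} and probability p."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def hinge(y: int, score: float) -> float:
    """Hinge loss for label y in {-1, +1} and a raw score."""
    return max(0.0, 1.0 - y * score)


print(mse(0.5, 2.0), mae(0.5, 2.0))    # 2.25 1.5
print(hinge(1, 2.0), hinge(1, 0.5))    # 0.0 0.5
```

The shapes differ mainly in how fast they grow away from the minimum: quadratic for MSE, linear for MAE and hinge, and logarithmically unbounded for BCE as p moves toward the wrong extreme.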

🎛️ Step 02

Adjust Controls

Move the sliders to see how prediction error changes the loss in real time.

🌄 Step 03

Watch the Surface

Observe how the 3D landscape changes. Low regions = good predictions.

Step 04

Run Optimization

Hit "Start Optimization" and watch gradient descent roll downhill step-by-step.

Key Takeaways

Loss functions quantify how wrong a prediction is.
Gradient descent moves in the direction that reduces loss the fastest.
MSE squares errors — great for regression, sensitive to outliers.
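MAE penalizes errors in proportion to their size, so it is more robust to extreme values.
Cross-Entropy punishes confident wrong predictions severely: the loss −log(p) grows without bound as the probability p assigned to the true class approaches 0.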
Hinge loss enforces a safety margin — the basis of SVMs.
Learning rate controls step size — too large → diverge, too small → slow.
Most real loss landscapes are non-convex with many local minima.
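
The learning-rate takeaway can be checked on the simplest case. The sketch below assumes the one-dimensional squared error L = (y − ŷ)², where each step scales the remaining error by (1 − 2α); the specific rates, step count, and starting point are illustrative assumptions, and the thresholds shown apply only to this loss.

```python
def final_error(lr: float, steps: int = 40,
                y: float = 0.5, y_hat: float = 2.0) -> float:
    """Distance |y_hat - y| after `steps` gradient descent updates."""
    for _ in range(steps):
        y_hat -= lr * 2.0 * (y_hat - y)  # gradient of (y - y_hat)^2
    return abs(y_hat - y)


for lr in (0.001, 0.05, 0.5, 1.1):
    print(f"lr = {lr}: remaining error = {final_error(lr):.4g}")
```

With α = 0.001 the error barely moves in 40 steps, α = 0.05 shrinks it steadily, α = 0.5 lands on the minimum in a single step, and α = 1.1 makes each step overshoot by more than it corrects, so the error grows.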