Loss Function Explorer
Develop an intuitive, 3D understanding of how loss functions measure prediction error and how gradient-based optimization steps toward lower loss.
Interactive Controls
Current Loss: 2.2500
Gradient ∂L/∂ŷ: 3.0000
Slider values: 2.000 (range -3 to 3), 0.500 (range -3 to 3), 0.050 (range 0.001 to 0.5), 40 (range 5 to 100)
Optimization Insights
Deep-dive explanations of gradient descent theory.
The Core Update Rule
θ_new = θ_old − α × ∇L(θ)
where α = learning rate, ∇L = gradient of loss
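The update rule can be sketched in a few lines of Python. This is a minimal illustration, not the explorer's own implementation: it assumes a single-sample squared-error loss L = (ŷ − y)², and the function name `gradient_descent` and the starting values (ŷ = 2.0, y = 0.5, α = 0.05, 40 steps) are chosen here for illustration only.

```python
def gradient_descent(y_hat, y, lr, steps):
    """Minimize the squared error (y_hat - y)**2 by repeated gradient steps.

    Returns a list of (prediction, loss) pairs, starting with the initial point.
    """
    history = [(y_hat, (y_hat - y) ** 2)]
    for _ in range(steps):
        grad = 2.0 * (y_hat - y)      # dL/dy_hat for L = (y_hat - y)^2
        y_hat = y_hat - lr * grad     # theta_new = theta_old - alpha * grad
        history.append((y_hat, (y_hat - y) ** 2))
    return history


# Assumed illustrative settings: start at 2.0, target 0.5, alpha = 0.05.
history = gradient_descent(y_hat=2.0, y=0.5, lr=0.05, steps=40)
initial_loss = history[0][1]
final_y_hat, final_loss = history[-1]
print(f"initial loss = {initial_loss:.4f}")
print(f"final prediction = {final_y_hat:.4f}, final loss = {final_loss:.6f}")
```

For this particular loss, each step multiplies the error (ŷ − y) by (1 − 2α), so the loss shrinks on every step when 0 < α < 1 and grows when α > 1. Other losses have different stability limits.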
Loss Curve (ŷ vs. Loss, y = 0.50)
Mean Squared Error — Mathematical Analysis
Loss Formula
MSE = (1/n) × Σ (yᵢ − ŷᵢ)²
where yᵢ = actual value, ŷᵢ = prediction, n = number of samples
What It Measures
MSE measures the average squared difference between predictions and actual values. Squaring penalizes large errors far more than small ones.
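As a rough sketch of that description, the following Python computes the mean of the squared differences and its derivative with respect to each prediction. The helper names `mse` and `mse_grad` are hypothetical, and the sample values are chosen for illustration.

```python
def mse(y_true, y_pred):
    """Mean squared error: the average of the squared differences."""
    n = len(y_true)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n


def mse_grad(y_true, y_pred):
    """Partial derivative of the MSE with respect to each prediction."""
    n = len(y_true)
    return [2.0 * (p - t) / n for t, p in zip(y_true, y_pred)]


# Single sample: target 0.5, prediction 2.0, so the error is 1.5.
print(mse([0.5], [2.0]))        # 2.25
print(mse_grad([0.5], [2.0]))   # [3.0]

# Doubling the error quadruples the loss.
print(mse([0.0], [1.0]), mse([0.0], [2.0]))   # 1.0 4.0
```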
When To Use
Regression problems where outliers should be heavily penalized — house price prediction, stock forecasting, temperature estimation.
Advantages
- Smooth, differentiable everywhere
- Convex in the prediction ŷ, with a single global minimum at ŷ = y (the loss over the parameters of a nonlinear model can still be non-convex)
- Large errors are penalized disproportionately (good for outlier-sensitive tasks)
- Mathematically clean for linear regression (closed-form solution exists)
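To illustrate the closed-form point, here is a minimal sketch for the one-feature case y = w·x + b, where minimizing the MSE gives w = cov(x, y) / var(x) and b = mean(y) − w·mean(x). The function name `fit_line` and the data are assumptions made for this example.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = w * x + b using the closed-form solution."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of centered products; the 1/n factors cancel in the ratio below.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return w, b


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]   # points that lie exactly on y = 2x + 1
w, b = fit_line(xs, ys)
print(f"w = {w:.3f}, b = {b:.3f}")
```

No iteration or learning rate is needed here: the minimizer is computed directly, which is possible because the MSE of a linear model is a quadratic function of its parameters.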
Limitations
- Very sensitive to outliers — one bad point can dominate the loss
- Values are in squared units, making interpretation harder
- Can cause very large gradients when errors are large
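The outlier sensitivity can be made concrete with a small comparison. The sketch below uses made-up residuals (four small errors and one outlier) and hypothetical helper names to show what fraction of the total loss comes from the single outlier under squared versus absolute error.

```python
def mean_squared(errors):
    """Average of squared residuals."""
    return sum(e ** 2 for e in errors) / len(errors)


def mean_absolute(errors):
    """Average of absolute residuals."""
    return sum(abs(e) for e in errors) / len(errors)


# Illustrative residuals: four small errors and one outlier of 10.
errors = [1.0, -1.0, 1.0, -1.0, 10.0]
outlier = errors[-1]

share_squared = outlier ** 2 / sum(e ** 2 for e in errors)
share_absolute = abs(outlier) / sum(abs(e) for e in errors)

print(f"MSE = {mean_squared(errors):.2f}, outlier share = {share_squared:.1%}")
print(f"MAE = {mean_absolute(errors):.2f}, outlier share = {share_absolute:.1%}")
```

With these numbers the outlier accounts for 100 of the 104 units of summed squared error, but only 10 of the 14 units of summed absolute error.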
Intuition
“MSE behaves like a stretched rubber band. The farther your prediction is from reality, the harder it snaps back: the pull (the gradient) grows in proportion to the distance, and the penalty (the loss) grows with its square.”
🎯Step 01
Pick a Loss
Select MSE, MAE, BCE, or Hinge to explore its unique shape.
🎛️Step 02
Adjust Controls
Move the sliders to see how prediction error changes the loss in real time.
🌄Step 03
Watch the Surface
Observe how the 3D landscape changes. Low regions = good predictions.
⚡Step 04
Run Optimization
Hit "Start Optimization" and watch gradient descent roll downhill step-by-step.
Key Takeaways
Loss functions quantify how wrong a prediction is.
Gradient descent moves in the direction that reduces loss the fastest.
MSE squares errors — great for regression, sensitive to outliers.
MAE penalizes errors in proportion to their size, so it is more robust to extreme values.
Cross-Entropy punishes confident wrong predictions heavily: the loss grows without bound as the probability assigned to the true class approaches 0.
Hinge loss enforces a safety margin — the basis of SVMs.
Learning rate controls step size — too large → diverge, too small → slow.
Most real loss landscapes are non-convex with many local minima.
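The four losses named in these takeaways can be written out for a single sample as follows. This is a sketch using common textbook definitions, not the explorer's code: the function names, the label conventions (0/1 for cross-entropy, -1/+1 for hinge), and the `eps` clipping value are assumptions.

```python
import math


def squared_error(y, y_hat):
    """MSE term for one sample."""
    return (y - y_hat) ** 2


def absolute_error(y, y_hat):
    """MAE term for one sample."""
    return abs(y - y_hat)


def binary_cross_entropy(y, p, eps=1e-12):
    """y is 0 or 1; p is the predicted probability of class 1."""
    p = min(max(p, eps), 1.0 - eps)   # clip to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def hinge(y, score):
    """y is -1 or +1; score is the raw classifier output."""
    return max(0.0, 1.0 - y * score)


# Cross-entropy for a true label of 1 as the prediction gets more wrong.
for p in (0.9, 0.5, 0.1, 0.01):
    print(f"p = {p:<4}  BCE = {binary_cross_entropy(1, p):.3f}")

# Hinge: zero beyond the margin, positive inside it or on the wrong side.
print(hinge(1, 2.0), hinge(1, 0.5), hinge(1, -1.0))   # 0.0 0.5 2.0
```

A prediction with score 0.5 is on the correct side of the boundary yet still incurs hinge loss, which is the safety margin at work.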