The algorithm that enables neural networks to learn by computing gradients efficiently
Backpropagation is the algorithm that makes deep learning possible. It efficiently computes gradients of the loss with respect to every parameter in a neural network, enabling gradient-based optimization.
The Core Problem
To train a neural network, we need the gradient of the loss with respect to every parameter: ∂L/∂θ for each weight and bias θ.
Naively computing each gradient separately (e.g., one finite-difference forward pass per parameter) would be prohibitively expensive. Backpropagation computes all of them in a single backward pass.
The Chain Rule
Backpropagation is just the chain rule applied systematically:
If L = f(y) and y = g(x), the chain rule gives ∂L/∂x = (∂L/∂y)(∂y/∂x): we can compute ∂L/∂x from the upstream gradient ∂L/∂y and the local gradient ∂y/∂x.
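This can be checked numerically. A minimal sketch with the illustrative choices f(y) = y² and g(x) = sin(x):

```python
import math

# L = f(g(x)) with f(y) = y**2 and g(x) = sin(x)  (illustrative choices)
x = 0.5
y = math.sin(x)            # forward: y = g(x)
L = y ** 2                 # forward: L = f(y)

dL_dy = 2 * y              # local gradient of f
dy_dx = math.cos(x)        # local gradient of g
dL_dx = dL_dy * dy_dx      # chain rule: dL/dx = (dL/dy)(dy/dx)

# Compare with a centered finite-difference approximation
eps = 1e-6
dL_dx_numeric = (math.sin(x + eps) ** 2 - math.sin(x - eps) ** 2) / (2 * eps)
```

The analytic and numeric gradients agree to within the finite-difference error.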
Forward and Backward Pass
Forward Pass
Compute outputs layer by layer: z^l = W^l a^(l-1) + b^l, then a^l = σ(z^l).
Store activations for the backward pass.
Backward Pass
Propagate gradients from output to input: δ^l = ((W^(l+1))^T δ^(l+1)) ⊙ σ'(z^l)
Then compute weight gradients: ∂L/∂W^l = δ^l (a^(l-1))^T and ∂L/∂b^l = δ^l.
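As a concrete sketch, here are the two passes for a tiny two-layer network with sigmoid activations and squared-error loss (all sizes and names here are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Tiny 2-layer network; sizes are arbitrary
W1, b1 = rng.normal(size=(4, 3)), np.zeros((4, 1))
W2, b2 = rng.normal(size=(2, 4)), np.zeros((2, 1))
x = rng.normal(size=(3, 1))
t = np.array([[1.0], [0.0]])                   # target

# Forward pass: store z and a for the backward pass
z1 = W1 @ x + b1;  a1 = sigmoid(z1)
z2 = W2 @ a1 + b2; a2 = sigmoid(z2)
loss = 0.5 * np.sum((a2 - t) ** 2)

# Backward pass: delta = (W^T delta) ⊙ σ'(z), with σ'(z) = a(1 - a)
delta2 = (a2 - t) * a2 * (1 - a2)              # output-layer error
delta1 = (W2.T @ delta2) * a1 * (1 - a1)       # propagate one layer back

# Weight gradients: ∂L/∂W = δ aᵀ, ∂L/∂b = δ
dW2, db2 = delta2 @ a1.T, delta2
dW1, db1 = delta1 @ x.T, delta1
```

Note that each δ reuses the activations stored during the forward pass, which is exactly why they must be kept around.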
Backpropagation Flow
- Forward: compute activations layer by layer: a = σ(Wa + b)
- Backward: propagate gradients: δ = (W^T δ) ⊙ σ'(z)
Computational Graph View
Modern frameworks represent computation as a directed acyclic graph:
- Forward: Traverse graph, compute outputs
- Backward: Traverse in reverse, accumulate gradients
Each node stores:
- Forward function: compute the node's output from its inputs
- Backward function: compute gradients with respect to its inputs, given the gradient with respect to its output
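To make this concrete, here is a minimal scalar reverse-mode autodiff sketch in the spirit of what frameworks do, but greatly simplified (all names here are illustrative, not a real library API):

```python
class Value:
    """Scalar node in a computational graph (simplified sketch)."""
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def backward():                        # d(x+y)/dx = d(x+y)/dy = 1
            self.grad += out.grad
            other.grad += out.grad
        out._backward = backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def backward():                        # d(xy)/dx = y, d(xy)/dy = x
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = backward
        return out

    def backward(self):
        # Topological sort, then reverse traversal accumulating gradients
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward()

# L = x*y + x  →  dL/dx = y + 1, dL/dy = x
x, y = Value(3.0), Value(2.0)
L = x * y + x
L.backward()
```

After `L.backward()`, `x.grad` is 3.0 (= y + 1) and `y.grad` is 3.0 (= x), matching the chain rule applied node by node.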
Common Layer Gradients
| Layer | Forward | Backward |
|---|---|---|
| Linear | y = Wx + b | ∂L/∂x = W^T δ, ∂L/∂W = δ x^T, ∂L/∂b = δ |
| ReLU | y = max(0, x) | ∂L/∂x = δ ⊙ 1[x > 0] |
| Softmax+CE | p = softmax(z), L = −log p_target | ∂L/∂z = p − y (y one-hot) |
| BatchNorm | x̂ = (x − μ)/√(σ² + ε) | (complex, involves batch statistics) |
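The Softmax+CE row, for example, can be verified with a finite-difference check (a quick sketch; the target class index 0 is an arbitrary choice):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())          # shifted for numerical stability
    return e / e.sum()

def cross_entropy(z, target):
    return -np.log(softmax(z)[target])

z = np.array([2.0, 1.0, 0.1])
y = np.array([1.0, 0.0, 0.0])        # one-hot target (class 0)

analytic = softmax(z) - y            # claimed gradient: p − y

# Centered finite differences, one logit at a time
eps = 1e-6
numeric = np.zeros_like(z)
for i in range(len(z)):
    zp, zm = z.copy(), z.copy()
    zp[i] += eps
    zm[i] -= eps
    numeric[i] = (cross_entropy(zp, 0) - cross_entropy(zm, 0)) / (2 * eps)
```

The two gradients agree, which is why frameworks fuse softmax and cross-entropy into one op with this simple backward rule.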
Vanishing/Exploding Gradients
Deep networks can suffer from unstable gradient magnitudes: by the chain rule, the gradient at an early layer is a product of many per-layer factors (weight matrices and activation derivatives).
If those factors are mostly < 1, gradients shrink exponentially: vanishing gradients (early layers don't learn). If they are mostly > 1, gradients grow exponentially: exploding gradients (unstable training).
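A toy calculation shows how quickly these effects compound (the per-layer factors here are made up for illustration):

```python
# Gradient scale at the first layer is roughly a product of per-layer factors
depth = 50
for factor in (0.5, 1.0, 2.0):
    grad_scale = factor ** depth
    print(f"factor {factor}: gradient scale after {depth} layers = {grad_scale:.3g}")
```

With factor 0.5 the gradient is down near machine epsilon after 50 layers; with factor 2.0 it has blown up by ~15 orders of magnitude.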
Solutions: ReLU, residual connections, careful initialization, normalization.
Automatic Differentiation
Modern frameworks (PyTorch, JAX) implement backprop automatically:
```python
import torch
import torch.nn as nn

# Toy setup so the snippet runs end to end
model = nn.Linear(10, 2)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x, target = torch.randn(4, 10), torch.tensor([0, 1, 1, 0])

y = model(x)                  # Forward
loss = criterion(y, target)
optimizer.zero_grad()
loss.backward()               # Backward (computes all gradients)
optimizer.step()              # Update
```
Why Backprop Matters
Backpropagation is:
- Efficient: computes all parameter gradients at roughly the cost of one forward pass, instead of one pass per parameter
- General: Works for any differentiable computation graph
- Foundational: Enables all modern deep learning