Expanding receptive fields exponentially without losing resolution or adding parameters
Dilated convolutions (also called atrous convolutions) expand the receptive field without increasing the number of parameters or reducing spatial resolution; stacked with increasing dilation rates, they grow the receptive field exponentially with depth. This paper introduced them as a building block for dense prediction tasks like semantic segmentation.
The Problem
Standard convolutions face a tradeoff:
- Pooling increases receptive field but loses spatial resolution
- Larger kernels increase receptive field but add parameters quadratically
For dense prediction (assigning a label to every pixel), we need both large context and high resolution.
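A rough back-of-the-envelope comparison in plain Python (illustrative single-channel numbers, biases ignored) shows why neither option scales: covering a $k \times k$ receptive field with a dense kernel costs $k^2$ weights, while a single 3×3 dilated kernel covers the same field with 9:

```python
# Parameters needed to cover a given receptive field with a single layer.
for rf in [3, 7, 15, 31]:
    dense_params = rf * rf          # one rf x rf kernel covering the field directly
    dilated_params = 3 * 3          # one 3x3 kernel with dilation (rf - 1) // 2
    print(f"{rf}x{rf} field: dense kernel {dense_params} weights, "
          f"dilated 3x3 kernel {dilated_params} weights")
```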
Dilated Convolution
A dilated convolution with dilation factor $l$ samples the input at intervals of $l$:

$$(F *_l k)(\mathbf{p}) = \sum_{\mathbf{s} + l\,\mathbf{t} = \mathbf{p}} F(\mathbf{s})\, k(\mathbf{t})$$

where $F$ is the input feature map and $k$ is the filter; $l = 1$ recovers ordinary convolution. A 3×3 kernel with dilation $l$ has a receptive field of $(2l + 1) \times (2l + 1)$ while using only 9 parameters.
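A minimal NumPy sketch of this definition for a single-channel input (stride 1, no padding, "valid" output; the function name `dilated_conv2d` is just for illustration):

```python
import numpy as np

def dilated_conv2d(x, kernel, dilation=1):
    """Single-channel 2D dilated (atrous) convolution, stride 1, 'valid' output."""
    kh, kw = kernel.shape
    # Effective extent of the dilated kernel on the input grid.
    eff_h = (kh - 1) * dilation + 1
    eff_w = (kw - 1) * dilation + 1
    out = np.zeros((x.shape[0] - eff_h + 1, x.shape[1] - eff_w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Sample the input at intervals of `dilation` under the kernel.
            patch = x[i : i + eff_h : dilation, j : j + eff_w : dilation]
            out[i, j] = np.sum(patch * kernel)
    return out

x = np.arange(49, dtype=float).reshape(7, 7)
k = np.ones((3, 3))
print(dilated_conv2d(x, k, dilation=1).shape)  # (5, 5)
print(dilated_conv2d(x, k, dilation=2).shape)  # (3, 3)
```

(As in most deep-learning code, this computes cross-correlation rather than flipped convolution; in practice, framework layers pad by `dilation` so the output keeps the input's resolution.)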
Exponential Expansion
By stacking layers with dilations $1, 2, 4, \dots, 2^{n-1}$, the receptive field grows exponentially with depth:
| Layer | Dilation | Receptive Field |
|---|---|---|
| 1 | 1 | 3×3 |
| 2 | 2 | 7×7 |
| 3 | 4 | 15×15 |
| 4 | 8 | 31×31 |
After $n$ such layers, the receptive field is $(2^{n+1} - 1) \times (2^{n+1} - 1)$.
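This is easy to verify with the standard receptive-field recurrence: each stride-1 3×3 layer with dilation $d$ adds $2d$ to the field in each dimension. A quick sanity check of the table in plain Python:

```python
# Receptive field of stacked 3x3 dilated convolutions (stride 1).
# Each layer with dilation d enlarges the field by 2 * d in each dimension.
rf = 1
for layer, dilation in enumerate([1, 2, 4, 8], start=1):
    rf += 2 * dilation
    print(f"layer {layer}: dilation {dilation:>2} -> receptive field {rf}x{rf}")
# layer 1: dilation  1 -> receptive field 3x3
# layer 2: dilation  2 -> receptive field 7x7
# layer 3: dilation  4 -> receptive field 15x15
# layer 4: dilation  8 -> receptive field 31x31
```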
Context Module
The paper proposes a multi-scale context aggregation module: a cascade of 3×3 dilated convolution layers whose dilation rates increase exponentially (1, 1, 2, 4, 8, 16, 1 in the basic module), capped by a 1×1 layer that produces the final per-pixel predictions. Each layer sees the full-resolution feature map at a progressively larger scale, so context is aggregated without any pooling or downsampling, and the module can be appended to an existing segmentation frontend.
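A rough PyTorch sketch of such a basic context module, under illustrative assumptions (the channel count and the class name `BasicContextModule` are placeholders; the paper's identity initialization and larger variant are omitted):

```python
import torch
import torch.nn as nn

class BasicContextModule(nn.Module):
    """Cascade of dilated 3x3 convolutions that aggregates multi-scale context."""

    def __init__(self, channels):
        super().__init__()
        dilations = [1, 1, 2, 4, 8, 16, 1]
        layers = []
        for d in dilations:
            # padding = dilation keeps the spatial resolution unchanged.
            layers += [
                nn.Conv2d(channels, channels, kernel_size=3, padding=d, dilation=d),
                nn.ReLU(inplace=True),
            ]
        layers.append(nn.Conv2d(channels, channels, kernel_size=1))  # final 1x1 layer
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

module = BasicContextModule(channels=21)   # e.g. 21 classes of per-pixel scores
scores = torch.randn(1, 21, 64, 64)        # frontend output (illustrative shape)
print(module(scores).shape)                # torch.Size([1, 21, 64, 64])
```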
Key Properties
- No resolution loss: Unlike pooling, dilated convolutions maintain spatial dimensions
- Parameter efficient: Same number of weights as standard convolution
- Flexible: the dilation rate is chosen per layer, so receptive-field growth can be tuned to the task (a quick check of these properties is sketched below)
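A small sketch that verifies the first two properties with PyTorch's built-in `dilation` argument of `nn.Conv2d` (shapes and channel counts are arbitrary):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 16, 32, 32)
for d in [1, 2, 4, 8]:
    # With padding = dilation, a 3x3 dilated convolution preserves spatial size.
    conv = nn.Conv2d(16, 16, kernel_size=3, padding=d, dilation=d)
    n_params = sum(p.numel() for p in conv.parameters())
    print(f"dilation {d}: output {tuple(conv(x).shape)}, {n_params} parameters")
# Every dilation gives the same (1, 16, 32, 32) output and the same 2320 parameters
# (16 * 16 * 3 * 3 weights + 16 biases); only the receptive field changes.
```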
Applications
Dilated convolutions became foundational for:
- Semantic segmentation (DeepLab, PSPNet)
- Audio generation (WaveNet)
- Time series (TCN - Temporal Convolutional Networks); a 1-D causal variant is sketched below
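In the 1-D setting used by WaveNet and TCNs, the convolution is also made causal: left-padding by `dilation * (kernel_size - 1)` ensures each output depends only on current and past samples. A minimal PyTorch sketch (the class name `CausalDilatedConv1d` is illustrative):

```python
import torch
import torch.nn as nn

class CausalDilatedConv1d(nn.Module):
    """1-D dilated convolution that only looks at current and past timesteps."""

    def __init__(self, channels, kernel_size=2, dilation=1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation      # pad on the left only
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)

    def forward(self, x):                            # x: (batch, channels, time)
        x = nn.functional.pad(x, (self.pad, 0))      # no leakage from the future
        return self.conv(x)

# Dilations 1, 2, 4, 8 with kernel size 2 give a receptive field of 16 past
# timesteps, mirroring the exponential growth in the 2-D table above.
net = nn.Sequential(*[CausalDilatedConv1d(8, dilation=d) for d in [1, 2, 4, 8]])
x = torch.randn(1, 8, 100)
print(net(x).shape)   # torch.Size([1, 8, 100])
```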
Key Paper
- Multi-Scale Context Aggregation by Dilated Convolutions — Yu & Koltun (2015), https://arxiv.org/abs/1511.07122