Connecting VAEs to lossy compression and the bits-back coding argument
The Variational Lossy Autoencoder (VLAE) reframes VAEs through the lens of lossy compression, providing insight into the rate-distortion trade-off and applying the bits-back coding argument.
VAEs as Compression
A VAE can be viewed as a lossy compressor:
- Encoder q(z|x): compress the input x into a latent code z
- Decoder p(x|z): decompress z back into a reconstruction
- Rate: bits needed to transmit z
- Distortion: reconstruction error, -E_q[log p(x|z)]
The Rate-Distortion Trade-off
The VAE objective balances rate and distortion:

-ELBO = E_{q(z|x)}[-log p(x|z)] + KL(q(z|x) || p(z)) = Distortion + Rate

The KL term is exactly the rate: the bits (nats) needed to encode the latent under the prior.
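This decomposition can be sketched in a few lines, assuming a diagonal-Gaussian posterior and a fixed-scale Gaussian decoder (both illustrative choices; the function names are hypothetical):

```python
import numpy as np

def gaussian_kl(mu, logvar):
    """Rate term: KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over dims (nats)."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1)

def negative_elbo(x, x_recon, mu, logvar, sigma_x=1.0):
    """Distortion + Rate for a Gaussian decoder with fixed scale sigma_x (constants dropped)."""
    distortion = np.sum((x - x_recon) ** 2, axis=-1) / (2.0 * sigma_x**2)
    rate = gaussian_kl(mu, logvar)
    return distortion + rate
```

Note that a posterior matching the prior (mu = 0, logvar = 0) contributes exactly zero rate, which foreshadows the posterior-collapse discussion below.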
Bits-Back Coding
A key insight: encoding with a learned posterior is more efficient than it appears.

Naive view: transmitting a sample z ~ q(z|x) with a code built on the prior costs E_q[-log p(z)] nats.

Bits-back view: the randomness used to draw z can itself carry auxiliary information "for free." H(q) of those nats are recoverable, so the net cost is only KL(q(z|x) || p(z)).
Why Bits-Back Works
When transmitting with bits-back coding:
- Use auxiliary bits you also want to send to "sample" z ~ q(z|x), then transmit z under the prior
- The receiver, after recovering x, re-runs the encoder and decodes those auxiliary bits back out: the sampling randomness carries extra information!
- The net rate is exactly the KL term, lower than the naive cross-entropy cost
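The accounting above can be checked in closed form for one-dimensional Gaussians (the posterior parameters here are illustrative values, not from the paper):

```python
import numpy as np

# Illustrative 1-D posterior q = N(mu, var) against a standard-normal prior p = N(0, 1)
mu, var = 0.7, 0.5

cross_entropy = 0.5 * np.log(2 * np.pi) + 0.5 * (var + mu**2)  # E_q[-log p(z)]: naive cost
entropy_q     = 0.5 * np.log(2 * np.pi * np.e * var)           # H(q): the "bits back"
kl            = 0.5 * (var + mu**2 - 1.0 - np.log(var))        # KL(q || p): net rate

# Bits-back identity: naive cost minus recovered randomness equals the KL rate
assert np.isclose(cross_entropy - entropy_q, kl)
```

The identity holds for any (mu, var), since H(q, p) - H(q) = KL(q || p) by definition.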
This explains why VAEs with powerful decoders don’t always use the latent.
The Posterior Collapse Problem
If the decoder p(x|z) is too powerful (e.g., autoregressive):

q(z|x) ≈ p(z), so KL(q(z|x) || p(z)) → 0

The model ignores z entirely! The VLAE analysis explains this as the model reaching a valid zero-rate solution.
Fixing Posterior Collapse
Solutions motivated by rate-distortion:
| Approach | Effect |
|---|---|
| KL annealing | Gradually increase rate penalty |
| Free bits | Minimum rate budget per dimension |
| δ-VAE | Explicit rate constraint |
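A free-bits penalty, for example, can be sketched in a few lines (the floor value and function name are illustrative, not from the original papers):

```python
import numpy as np

def free_bits_kl(kl_per_dim, floor=0.5):
    """Free bits: dimensions whose KL is below the floor are billed the floor
    instead, so the optimizer gains nothing by collapsing them toward zero rate."""
    return np.sum(np.maximum(kl_per_dim, floor), axis=-1)

# A collapsed dimension (KL = 0.1) is billed at the 0.5-nat floor
print(free_bits_kl(np.array([0.1, 1.0])))  # → 1.5
```

Because the gradient of the floored term is zero, the encoder stops being pushed toward the prior on under-used dimensions.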
Connection to β-VAE
β-VAE makes the same rate-distortion trade-off explicit with a tunable rate weight:

L = Distortion + β · Rate

- β < 1: allow higher rate, better reconstruction
- β > 1: force lower rate, more compression/disentanglement
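In training code the whole family reduces to one weighted sum; a minimal sketch, where `beta` is the hypothetical knob:

```python
def beta_vae_loss(distortion, rate, beta=1.0):
    """beta = 1 recovers the standard negative ELBO; beta > 1 buys a lower
    rate (more compression) at the cost of reconstruction quality."""
    return distortion + beta * rate

# Same model, two operating points on the rate-distortion curve
low_rate  = beta_vae_loss(distortion=10.0, rate=2.0, beta=4.0)  # 18.0
high_rate = beta_vae_loss(distortion=10.0, rate=2.0, beta=0.5)  # 11.0
```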
Information-Theoretic View
The aggregate KL term upper-bounds the mutual information I(x; z). Minimizing it limits how much z "knows" about x.
Key Insights
- VAEs are compressors: ELBO = rate + distortion
- Bits-back is real: The randomness in sampling the latent carries recoverable extra bits
- Posterior collapse explained: Zero-rate solution is valid
- β controls trade-off: Disentanglement = low rate
Key Paper
- Variational Lossy Autoencoder — Chen et al. (2016)
https://arxiv.org/abs/1611.02731