Variational Lossy Autoencoder

Connecting VAEs to lossy compression and the bits-back coding argument

Variational Lossy Autoencoder (VLAE) reframes VAEs through the lens of lossy compression, providing insight into the rate-distortion trade-off and applying the bits-back coding argument to explain when VAEs use their latent code.

VAEs as Compression

A VAE can be viewed as a lossy compressor:

  • Encoder q(z|x): compresses the input to a latent code
  • Decoder p(x|z): decompresses the latent back into a reconstruction
  • Rate: bits needed to transmit z
  • Distortion: reconstruction error

The Rate-Distortion Trade-off

The VAE objective balances rate and distortion:

\mathcal{L} = \underbrace{-\mathbb{E}_{q(z|x)}[\log p(x|z)]}_{\text{Distortion}} + \underbrace{\beta \cdot D_{KL}(q(z|x) \| p(z))}_{\text{Rate}}

The KL term is exactly the rate: the number of bits needed to encode the latent.
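This decomposition can be written out explicitly for the common diagonal-Gaussian case. A minimal sketch, assuming a 1-D Gaussian posterior N(mu, sigma²), a standard normal prior, and a fixed-variance Gaussian decoder (the function names are illustrative, not from the paper):

```python
import math

def gaussian_kl(mu, sigma):
    """KL( N(mu, sigma^2) || N(0, 1) ) in nats -- the 'rate' term."""
    return 0.5 * (mu**2 + sigma**2 - 1.0 - 2.0 * math.log(sigma))

def gaussian_nll(x, x_hat, noise_sigma=1.0):
    """-log p(x|z) for a Gaussian decoder -- the 'distortion' term."""
    return 0.5 * ((x - x_hat) / noise_sigma) ** 2 + math.log(
        noise_sigma * math.sqrt(2.0 * math.pi)
    )

def elbo_loss(x, x_hat, mu, sigma, beta=1.0):
    """Negative ELBO = distortion + beta * rate (per data point, 1-D)."""
    return gaussian_nll(x, x_hat) + beta * gaussian_kl(mu, sigma)

# A posterior equal to the prior costs zero rate:
print(gaussian_kl(0.0, 1.0))  # -> 0.0
```

Note that the rate is zero exactly when the posterior matches the prior, which is the posterior-collapse solution discussed below.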

Interactive Demo

Explore the rate-distortion trade-off and bits-back coding:

[Interactive widget: a β slider trades reconstruction fidelity against compression, showing the input x, latent code z, reconstruction x̂, and the bits-back gain under the objective L = E[log p(x|z)] - β · DKL(q(z|x) || p(z)).]

Bits-Back Coding

A key insight: encoding with a learned posterior is more efficient than it appears.

Naive view: transmitting a sample z ~ q(z|x) costs the full cross-entropy with the prior:

\text{Naive Rate} = \mathbb{E}_{q(z|x)}[-\log p(z)]

Bits-back view: the auxiliary randomness used to draw z can itself carry H[q(z|x)] bits of other information, which the receiver recovers "for free" by re-deriving q(z|x) after decoding. The net cost is therefore

\text{Effective Rate} = \mathbb{E}_{q(z|x)}[-\log p(z)] - H[q(z|x)] = D_{KL}(q(z|x) \| p(z))

Why Bits-Back Works

When transmitting with a latent-variable code:

  1. Use auxiliary bits to draw a sample z \sim q(z|x), and encode z under the prior p(z)
  2. Encode x under the decoder model p(x|z)
  3. The receiver decodes x, re-runs the encoder to obtain q(z|x), and recovers the auxiliary bits, so the net rate for z is only the KL term
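The bits-back ledger can be checked numerically for Gaussians: the naive cost of coding z under the prior, minus the entropy of q(z|x) that the receiver gets back, equals the KL rate. A sketch under the same 1-D Gaussian assumptions as before (names illustrative):

```python
import math
import random

def bits_back_accounting(mu, sigma, n=200_000, seed=0):
    """Monte-Carlo check of the bits-back ledger for q(z|x) = N(mu, sigma^2)
    and prior p(z) = N(0, 1). Returns (naive_rate, bits_back, net_rate) in nats."""
    rng = random.Random(seed)
    naive = 0.0
    for _ in range(n):
        z = rng.gauss(mu, sigma)
        # Cost of coding this sample under the prior: -log p(z)
        naive += 0.5 * z * z + 0.5 * math.log(2.0 * math.pi)
    naive /= n
    # Entropy of q(z|x): the auxiliary randomness the receiver recovers
    bits_back = 0.5 * math.log(2.0 * math.pi * math.e * sigma**2)
    return naive, bits_back, naive - bits_back

naive, back, net = bits_back_accounting(mu=1.0, sigma=0.5)
kl = 0.5 * (1.0**2 + 0.5**2 - 1.0 - 2.0 * math.log(0.5))
print(f"net rate {net:.3f} vs KL {kl:.3f}")  # the two agree up to MC noise
```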

Since any information routed through z costs KL bits, while a sufficiently powerful (e.g. autoregressive) decoder can model local structure at no extra rate cost, the optimal code stores in z only what the decoder cannot capture on its own. This explains why VAEs with powerful decoders don't always use the latent.

The Posterior Collapse Problem

If the decoder is too powerful:

p(x|z) \approx p(x) \quad \Rightarrow \quad q(z|x) \approx p(z)

The model ignores z entirely! The VLAE analysis explains this as a valid zero-rate solution.

Fixing Posterior Collapse

Solutions motivated by rate-distortion:

| Approach | Effect |
|---|---|
| KL annealing | Gradually increase the rate penalty during training |
| Free bits | Minimum rate budget per latent dimension |
| δ-VAE | Constrain the posterior so the rate cannot fall below a target |
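Of these, free bits is the simplest to sketch. A minimal illustration, assuming the per-dimension KL values are already computed (the budget `lam` is a hyperparameter, not a value from the paper):

```python
def free_bits_kl(kl_per_dim, lam=0.25):
    """Free bits: each latent dimension's KL contributes at least `lam`
    nats, so the optimizer gains nothing by pushing it below the budget."""
    return sum(max(kl, lam) for kl in kl_per_dim)

# Collapsed dimensions (KL ~ 0) are clamped to the budget, removing the
# incentive to shrink them further; active dimensions pass through unchanged.
print(free_bits_kl([0.0, 0.01, 1.5]))  # -> 2.0  (0.25 + 0.25 + 1.5)
```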

Connection to β-VAE

β-VAE is exactly this rate-distortion objective with an explicit, controllable rate weight:

\mathcal{L}_{\beta} = \text{Distortion} + \beta \cdot \text{Rate}

  • β < 1: allow a higher rate, better reconstruction
  • β > 1: force a lower rate, more compression/disentanglement
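The effect of β admits a closed form in a toy model. Assuming a 1-D linear-Gaussian VAE (decoder x̂ = z with unit observation noise, standard normal prior), minimizing distortion + β · rate over the posterior mean gives μ* = x / (1 + β); a sketch of this toy result, not the general case:

```python
def optimal_posterior_mean(x, beta):
    """Toy 1-D linear-Gaussian VAE: distortion = 0.5*(x - mu)^2 + const,
    rate contributes 0.5*beta*mu^2 in mu. Setting the derivative
    -(x - mu) + beta*mu = 0 gives the closed form mu* = x / (1 + beta)."""
    return x / (1.0 + beta)

for beta in (0.1, 1.0, 10.0):
    print(beta, optimal_posterior_mean(2.0, beta))
# Small beta -> mu* tracks x (high rate, faithful reconstruction);
# large beta -> mu* shrinks toward the prior mean 0 (low rate, collapse).
```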

Information-Theoretic View

I(X; Z) \leq \mathbb{E}_{x}\left[D_{KL}(q(z|x) \| p(z))\right]

The expected KL term upper-bounds the mutual information. Minimizing it limits how much Z "knows" about X.
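This bound can be verified in a linear-Gaussian example where both sides are available in closed form (the setup is an illustrative assumption, not from the paper):

```python
import math

def rate_vs_mi(a, s):
    """Linear-Gaussian check: x ~ N(0,1), q(z|x) = N(a*x, s^2), p(z) = N(0,1).
    Returns (E_x[KL(q||p)], I(X;Z)) in nats; the KL term upper-bounds the MI."""
    # E_x[KL] = 0.5 * (E[a^2 x^2] + s^2 - 1 - log s^2), with E[x^2] = 1
    avg_kl = 0.5 * (a * a + s * s - 1.0 - math.log(s * s))
    # The marginal q(z) is N(0, a^2 + s^2), so I(X;Z) = 0.5 * log((a^2+s^2)/s^2)
    mi = 0.5 * math.log((a * a + s * s) / (s * s))
    return avg_kl, mi

avg_kl, mi = rate_vs_mi(a=1.0, s=0.5)
print(avg_kl >= mi)  # True: the slack is KL(q(z) || p(z)) >= 0
```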

Key Insights

  1. VAEs are compressors: ELBO = rate + distortion
  2. Bits-back is real: the encoder's sampling randomness is returned to the receiver, so the net rate is the KL term
  3. Posterior collapse explained: zero rate is a valid optimum when the decoder can model x on its own
  4. β controls the trade-off: disentanglement corresponds to low rate

Key Paper

Chen et al., "Variational Lossy Autoencoder", ICLR 2017 (arXiv:1611.02731).
