Variational Lossy Autoencoder

Understanding VAEs as compression systems with a rate-distortion trade-off

Variational Lossy Autoencoder (VLAE) is a way of looking at VAEs through the lens of compression. Instead of asking only “can the model reconstruct the input?”, it asks “how many bits should the latent code use, and what reconstruction quality do we get in return?”

Read VAE first if you are new to the topic. This page is best understood as a second pass that explains what the VAE objective means.

A Compression Analogy

Think about compressing an image before sending it over a network.

  • If you keep many details, the file is large but reconstruction is accurate
  • If you compress aggressively, the file is small but details are lost

VLAE says a VAE is solving that same trade-off in learned form.

VAEs as Compressors

A VAE has two main parts:

  • Encoder q(z|x): turns the input into a latent code
  • Decoder p(x|z): reconstructs the input from that code

This suggests two competing goals:

  • Rate: how many bits are needed to describe the latent
  • Distortion: how much reconstruction error we allow
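To make the two roles concrete, here is a minimal sketch of the encode → sample → decode pipeline. The dimensions, random linear maps, and fixed unit variance are illustrative assumptions, not a trained model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions and random linear maps (not a trained model).
x_dim, z_dim = 8, 2
W_enc = rng.normal(size=(z_dim, x_dim)) * 0.1
W_dec = rng.normal(size=(x_dim, z_dim)) * 0.1

def encode(x):
    """q(z|x): map the input to the mean and log-variance of a Gaussian code."""
    mu = W_enc @ x
    log_var = np.zeros(z_dim)          # fixed unit variance, for simplicity
    return mu, log_var

def sample_z(mu, log_var):
    """Reparameterized sample z = mu + sigma * eps."""
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def decode(z):
    """p(x|z): reconstruct the input from the latent code."""
    return W_dec @ z

x = rng.normal(size=x_dim)
mu, log_var = encode(x)
z = sample_z(mu, log_var)
x_hat = decode(z)
```

The latent z is much smaller than x, which is exactly where the rate-distortion tension comes from.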

The Rate-Distortion Objective

The VAE objective can be written as:

L = −E_{q(z|x)}[log p(x|z)]  +  β · D_KL(q(z|x) || p(z))
        (Distortion)                   (Rate)

The first term rewards accurate reconstruction. The second term discourages the model from storing too much information in the latent code.
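Both terms have simple closed forms in the common Gaussian setting. A sketch, assuming a diagonal-Gaussian posterior, a standard-normal prior, and squared error standing in for the negative log-likelihood up to constants:

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    # Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over dimensions.
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

def beta_vae_loss(x, x_hat, mu, log_var, beta=1.0):
    distortion = np.sum((x - x_hat) ** 2)      # stands in for -log p(x|z)
    rate = kl_to_standard_normal(mu, log_var)  # information cost of the latent
    return distortion + beta * rate

# When q(z|x) equals the prior and reconstruction is perfect, both terms vanish.
x = np.ones(4)
loss = beta_vae_loss(x, x_hat=x, mu=np.zeros(2), log_var=np.zeros(2))
print(loss)  # 0.0
```

Raising beta makes any nonzero KL more expensive, pushing the model toward smaller codes at the cost of reconstruction quality.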

Interactive Demo

Explore the rate-distortion trade-off and bits-back coding:

[Interactive demo: a slider for β moves between high fidelity and high compression, visualizing the pipeline input x → latent z → reconstruction x̂ along with reconstruction, compression, and bits-back readouts. Higher β means more compression and less fidelity.]

Why This View Is Useful

Once you think of a VAE as a compressor, several behaviors make more sense:

  • a stronger decoder can sometimes ignore the latent
  • changing β changes how much information the latent is allowed to carry
  • “posterior collapse” becomes easier to interpret

Bits-Back Coding

The famous bits-back argument says that a naive encoding of the latent overstates its true cost: the randomness used to sample z can be recovered by the receiver "for free," so the effective cost is exactly the KL term:

Rate = D_KL(q(z|x) || p(z))

That is the technical idea behind why VAEs connect so naturally to information theory and compression.

For a first pass, the most important takeaway is simpler: the latent cost is not just a regularizer; it is part of a coding trade-off.
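A tiny numeric example of that rate, assuming a diagonal-Gaussian posterior and a standard-normal prior: a posterior shifted one standard deviation from the prior in each of two dimensions costs exactly one nat, or about 1.44 bits:

```python
import numpy as np

def gaussian_kl(mu, log_var):
    # KL( N(mu, diag(sigma^2)) || N(0, I) ) in nats.
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

rate_nats = gaussian_kl(np.array([1.0, -1.0]), np.zeros(2))
rate_bits = rate_nats / np.log(2.0)  # 1 nat = 1/ln(2) ≈ 1.44 bits
print(rate_nats, rate_bits)
```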

Posterior Collapse

If the decoder becomes so strong that it can reconstruct well without using the latent, then the model may push:

q(z|x) ≈ p(z)

In that case, the latent carries very little information. This is called posterior collapse.
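Collapse is easy to spot by monitoring the per-dimension KL over a batch: a dimension whose KL stays near zero is not carrying information. A minimal sketch, with a hypothetical batch and an illustrative threshold:

```python
import numpy as np

def per_dim_kl(mu, log_var):
    # Average KL contribution of each latent dimension over a batch.
    return 0.5 * np.mean(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=0)

# Hypothetical batch of posterior means: dimension 0 varies with the
# input, dimension 1 has matched the prior exactly (collapsed).
mu = np.array([[1.2, 0.0], [-0.8, 0.0], [0.5, 0.0]])
log_var = np.zeros_like(mu)

kl = per_dim_kl(mu, log_var)
collapsed = kl < 1e-2  # illustrative threshold
print(collapsed)       # dimension 1 flags as collapsed
```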

Common Fixes

  • KL annealing: let reconstruction stabilize before strongly penalizing rate
  • Free bits: reserve a minimum amount of information in the latent
  • β-VAE: directly control the rate-distortion trade-off
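The first two fixes are simple enough to sketch directly; the floor value and warmup length below are illustrative choices, not recommended settings:

```python
import numpy as np

def free_bits_rate(per_dim_kl, lam=0.5):
    # Free bits: each latent dimension is billed at least `lam` nats, so the
    # optimizer gains nothing by driving its KL below that floor.
    return float(np.sum(np.maximum(per_dim_kl, lam)))

def annealed_beta(step, warmup_steps=1000):
    # KL annealing: ramp beta linearly from 0 to 1 over a warmup period.
    return min(1.0, step / warmup_steps)

print(free_bits_rate(np.array([0.1, 2.0])))  # 2.5 (0.1 is clamped up to 0.5)
print(annealed_beta(500))                    # 0.5
```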

What To Remember

  • VLAE interprets VAEs as lossy compression systems
  • The KL term is about information budget, not just regularization
  • Posterior collapse means the model has stopped using the latent effectively

Key Paper

Chen, X., Kingma, D. P., Salimans, T., Duan, Y., Dhariwal, P., Schulman, J., Sutskever, I., and Abbeel, P. "Variational Lossy Autoencoder." ICLR 2017.