Variational Lossy Autoencoder

Connecting VAEs to lossy compression and the bits-back coding argument

Variational Lossy Autoencoder (VLAE) reframes VAEs through the lens of lossy compression, providing insight into the rate-distortion trade-off and applying the bits-back coding argument to explain when VAEs use their latent code.

VAEs as Compression

A VAE can be viewed as a lossy compressor:

  • Encoder q(z|x): compresses the input to a latent code
  • Decoder p(x|z): decompresses the latent back into a reconstruction
  • Rate: bits needed to transmit z
  • Distortion: reconstruction error

The Rate-Distortion Trade-off

The VAE objective balances rate and distortion:

\mathcal{L} = \underbrace{-\mathbb{E}_{q(z|x)}[\log p(x|z)]}_{\text{Distortion}} + \underbrace{\beta \cdot D_{KL}(q(z|x) \| p(z))}_{\text{Rate}}

The KL term is exactly the rate: the number of bits needed to encode the latent.
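This decomposition can be written out explicitly for the common diagonal-Gaussian case. A minimal sketch, assuming a 1-D Gaussian posterior N(mu, sigma²), a standard normal prior, and a fixed-variance Gaussian decoder (the function names are illustrative, not from the paper):

```python
import math

def gaussian_kl(mu, sigma):
    """KL( N(mu, sigma^2) || N(0, 1) ) in nats -- the 'rate' term."""
    return 0.5 * (mu**2 + sigma**2 - 1.0 - 2.0 * math.log(sigma))

def gaussian_nll(x, x_hat, noise_sigma=1.0):
    """-log p(x|z) for a Gaussian decoder -- the 'distortion' term."""
    return 0.5 * ((x - x_hat) / noise_sigma) ** 2 + math.log(
        noise_sigma * math.sqrt(2.0 * math.pi)
    )

def elbo_loss(x, x_hat, mu, sigma, beta=1.0):
    """Negative ELBO = distortion + beta * rate (per data point, 1-D)."""
    return gaussian_nll(x, x_hat) + beta * gaussian_kl(mu, sigma)

# A posterior equal to the prior costs zero rate:
print(gaussian_kl(0.0, 1.0))  # -> 0.0
```

Note that the rate is zero exactly when the posterior matches the prior, which is the posterior-collapse solution discussed below.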

Interactive Demo

Explore the rate-distortion trade-off and bits-back coding:

[Interactive widget: a β slider trades reconstruction fidelity against compression, showing the input x, latent code z, reconstruction x̂, and the bits-back gain under the objective L = E[log p(x|z)] - β · DKL(q(z|x) || p(z)).]

Bits-Back Coding

A key insight: encoding with a learned posterior is more efficient than it appears.

Naive view: transmitting a sample z ~ q(z|x) costs the full cross-entropy with the prior:

\text{Naive Rate} = \mathbb{E}_{q(z|x)}[-\log p(z)]

Bits-back view: the auxiliary randomness used to draw z can itself carry H[q(z|x)] bits of other information, which the receiver recovers "for free" by re-deriving q(z|x) after decoding. The net cost is therefore

\text{Effective Rate} = \mathbb{E}_{q(z|x)}[-\log p(z)] - H[q(z|x)] = D_{KL}(q(z|x) \| p(z))

Why Bits-Back Works

When transmitting with a latent-variable code:

  1. Use auxiliary bits to draw a sample z \sim q(z|x), and encode z under the prior p(z)
  2. Encode x under the decoder model p(x|z)
  3. The receiver decodes x, re-runs the encoder to obtain q(z|x), and recovers the auxiliary bits, so the net rate for z is only the KL term
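The bits-back ledger can be checked numerically for Gaussians: the naive cost of coding z under the prior, minus the entropy of q(z|x) that the receiver gets back, equals the KL rate. A sketch under the same 1-D Gaussian assumptions as before (names illustrative):

```python
import math
import random

def bits_back_accounting(mu, sigma, n=200_000, seed=0):
    """Monte-Carlo check of the bits-back ledger for q(z|x) = N(mu, sigma^2)
    and prior p(z) = N(0, 1). Returns (naive_rate, bits_back, net_rate) in nats."""
    rng = random.Random(seed)
    naive = 0.0
    for _ in range(n):
        z = rng.gauss(mu, sigma)
        # Cost of coding this sample under the prior: -log p(z)
        naive += 0.5 * z * z + 0.5 * math.log(2.0 * math.pi)
    naive /= n
    # Entropy of q(z|x): the auxiliary randomness the receiver recovers
    bits_back = 0.5 * math.log(2.0 * math.pi * math.e * sigma**2)
    return naive, bits_back, naive - bits_back

naive, back, net = bits_back_accounting(mu=1.0, sigma=0.5)
kl = 0.5 * (1.0**2 + 0.5**2 - 1.0 - 2.0 * math.log(0.5))
print(f"net rate {net:.3f} vs KL {kl:.3f}")  # the two agree up to MC noise
```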

Since any information routed through z costs KL bits, while a sufficiently powerful (e.g. autoregressive) decoder can model local structure at no extra rate cost, the optimal code stores in z only what the decoder cannot capture on its own. This explains why VAEs with powerful decoders don't always use the latent.

The Posterior Collapse Problem

If the decoder is too powerful:

p(x|z) \approx p(x) \quad \Rightarrow \quad q(z|x) \approx p(z)

The model ignores z entirely! The VLAE analysis explains this as a valid zero-rate solution.

Fixing Posterior Collapse

Solutions motivated by rate-distortion:

| Approach | Effect |
|---|---|
| KL annealing | Gradually increase the rate penalty during training |
| Free bits | Minimum rate budget per latent dimension |
| δ-VAE | Constrain the posterior so the rate cannot fall below a target |
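Of these, free bits is the simplest to sketch. A minimal illustration, assuming the per-dimension KL values are already computed (the budget `lam` is a hyperparameter, not a value from the paper):

```python
def free_bits_kl(kl_per_dim, lam=0.25):
    """Free bits: each latent dimension's KL contributes at least `lam`
    nats, so the optimizer gains nothing by pushing it below the budget."""
    return sum(max(kl, lam) for kl in kl_per_dim)

# Collapsed dimensions (KL ~ 0) are clamped to the budget, removing the
# incentive to shrink them further; active dimensions pass through unchanged.
print(free_bits_kl([0.0, 0.01, 1.5]))  # -> 2.0  (0.25 + 0.25 + 1.5)
```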

Connection to β-VAE

β-VAE is exactly this rate-distortion objective with an explicit, controllable rate weight:

\mathcal{L}_{\beta} = \text{Distortion} + \beta \cdot \text{Rate}

  • β < 1: allow a higher rate, better reconstruction
  • β > 1: force a lower rate, more compression/disentanglement
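The effect of β admits a closed form in a toy model. Assuming a 1-D linear-Gaussian VAE (decoder x̂ = z with unit observation noise, standard normal prior), minimizing distortion + β · rate over the posterior mean gives μ* = x / (1 + β); a sketch of this toy result, not the general case:

```python
def optimal_posterior_mean(x, beta):
    """Toy 1-D linear-Gaussian VAE: distortion = 0.5*(x - mu)^2 + const,
    rate contributes 0.5*beta*mu^2 in mu. Setting the derivative
    -(x - mu) + beta*mu = 0 gives the closed form mu* = x / (1 + beta)."""
    return x / (1.0 + beta)

for beta in (0.1, 1.0, 10.0):
    print(beta, optimal_posterior_mean(2.0, beta))
# Small beta -> mu* tracks x (high rate, faithful reconstruction);
# large beta -> mu* shrinks toward the prior mean 0 (low rate, collapse).
```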

Information-Theoretic View

I(X; Z) \leq \mathbb{E}_{x}\left[D_{KL}(q(z|x) \| p(z))\right]

The expected KL term upper-bounds the mutual information. Minimizing it limits how much Z "knows" about X.
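This bound can be verified in a linear-Gaussian example where both sides are available in closed form (the setup is an illustrative assumption, not from the paper):

```python
import math

def rate_vs_mi(a, s):
    """Linear-Gaussian check: x ~ N(0,1), q(z|x) = N(a*x, s^2), p(z) = N(0,1).
    Returns (E_x[KL(q||p)], I(X;Z)) in nats; the KL term upper-bounds the MI."""
    # E_x[KL] = 0.5 * (E[a^2 x^2] + s^2 - 1 - log s^2), with E[x^2] = 1
    avg_kl = 0.5 * (a * a + s * s - 1.0 - math.log(s * s))
    # The marginal q(z) is N(0, a^2 + s^2), so I(X;Z) = 0.5 * log((a^2+s^2)/s^2)
    mi = 0.5 * math.log((a * a + s * s) / (s * s))
    return avg_kl, mi

avg_kl, mi = rate_vs_mi(a=1.0, s=0.5)
print(avg_kl >= mi)  # True: the slack is KL(q(z) || p(z)) >= 0
```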

Key Insights

  1. VAEs are compressors: ELBO = rate + distortion
  2. Bits-back is real: the encoder's sampling randomness is returned to the receiver, so the net rate is the KL term
  3. Posterior collapse explained: zero rate is a valid optimum when the decoder can model x on its own
  4. β controls the trade-off: disentanglement corresponds to low rate

Key Paper

Chen et al., "Variational Lossy Autoencoder", ICLR 2017 (arXiv:1611.02731).
