Variational Autoencoder (VAE)

Probabilistic generative model with structured latent space

A Variational Autoencoder (VAE) is a generative model that learns a distribution over latent variables, enabling smooth interpolation, sampling of new data, and principled uncertainty estimates.

Core Idea

Instead of encoding an input x to a single latent vector, a VAE learns an approximate posterior distribution over the latent variable z:

q_\phi(z|x) = \mathcal{N}(z; \mu(x), \sigma^2(x))

Sampling from this distribution allows the model to generate diverse yet coherent outputs.
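As a concrete illustration, here is a minimal PyTorch sketch of such an encoder, assuming a flattened 784-dimensional input and a 2-dimensional latent space; the module name, layer sizes, and the choice to output log σ² rather than σ are illustrative assumptions, not details from this page.

```python
import torch.nn as nn

class VAEEncoder(nn.Module):
    """Maps an input x to the parameters (mu, log sigma^2) of q(z|x)."""
    def __init__(self, input_dim=784, hidden_dim=256, latent_dim=2):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        self.mu_head = nn.Linear(hidden_dim, latent_dim)      # mean of q(z|x)
        self.logvar_head = nn.Linear(hidden_dim, latent_dim)  # log-variance (unconstrained)

    def forward(self, x):
        h = self.backbone(x)
        return self.mu_head(h), self.logvar_head(h)
```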

Evidence Lower Bound (ELBO)

VAEs are trained by maximizing the ELBO:

\mathcal{L}(x) = \mathbb{E}_{q(z|x)}[\log p_\theta(x|z)] - D_{KL}(q(z|x) \| p(z))
  • Reconstruction term: encourages the decoder to reconstruct x accurately from latents sampled from q(z|x)
  • KL divergence: regularizes the posterior toward the prior p(z) = N(0,I), shaping a smooth latent space

This tradeoff balances fidelity and generalization.
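In code, the negative ELBO serves as the training loss. The sketch below assumes a Bernoulli decoder (so the reconstruction term is binary cross-entropy) and the diagonal-Gaussian posterior above, for which the KL divergence to N(0, I) has a closed form; the function name and the sum reduction are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def negative_elbo(x, x_hat, mu, logvar):
    """Loss = -ELBO = reconstruction term + KL(q(z|x) || N(0, I))."""
    # Reconstruction: -E_q[log p(x|z)] under a Bernoulli decoder
    # (x_hat is assumed to be sigmoid outputs in [0, 1]).
    recon = F.binary_cross_entropy(x_hat, x, reduction="sum")
    # Closed-form KL between N(mu, diag(sigma^2)) and N(0, I).
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```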

Reparameterization Trick

Sampling z directly from q(z|x) is not differentiable, so gradients cannot flow back to the encoder parameters. The solution is to rewrite the sampling step as:

z = \mu + \sigma \cdot \epsilon, \quad \epsilon \sim \mathcal{N}(0, 1)

Randomness is isolated in ε, allowing gradients to flow through μ and σ.
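A typical implementation, assuming the encoder outputs log σ² as in the sketch above (an assumption, not something this page prescribes):

```python
import torch

def reparameterize(mu, logvar):
    """Sample z ~ N(mu, sigma^2) as a differentiable function of mu and sigma."""
    std = torch.exp(0.5 * logvar)   # sigma = exp(log sigma^2 / 2)
    eps = torch.randn_like(std)     # all randomness lives in eps
    return mu + std * eps           # gradients flow through mu and std
```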

Interactive Visualization

Explore how VAEs encode distributions, sample latents, interpolate, and decode:

[Interactive demo: Encoder (x → μ, σ) → Reparameterize (z = μ + σ·ε) → Decoder (z → x̂). Panels show the latent space with KL regularization and the decoded output x̂(z); gray points mark the prior N(0, I), red marks the sampled or interpolated latent.]
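The interpolation step the demo performs can be sketched roughly as below, using stand-in linear encoder/decoder layers and made-up input sizes purely so the snippet runs; a real model would use the trained encoder and decoder.

```python
import torch
import torch.nn as nn

# Stand-in modules (hypothetical sizes: 784-dim inputs, 2-dim latent).
enc = nn.Linear(784, 2 * 2)   # outputs [mu, logvar] concatenated
dec = nn.Linear(2, 784)

def encode_mu(x):
    mu, _logvar = enc(x).chunk(2, dim=-1)
    return mu

x_a, x_b = torch.rand(1, 784), torch.rand(1, 784)   # two inputs to interpolate between
mu_a, mu_b = encode_mu(x_a), encode_mu(x_b)

# Walk a straight line between the two posterior means and decode each point.
for t in torch.linspace(0, 1, steps=5):
    z = torch.lerp(mu_a, mu_b, t)   # (1 - t) * mu_a + t * mu_b
    x_hat = torch.sigmoid(dec(z))   # decoded frame along the interpolation path
```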

VAE vs. Autoencoder

| Autoencoder | Variational Autoencoder |
| --- | --- |
| Deterministic latent | Probabilistic latent |
| No prior on z | Explicit prior p(z) |
| Poor sampling | Smooth generation |
| Optimizes reconstruction | Optimizes ELBO |

Extensions

  • β-VAE – stronger disentanglement via an increased KL weight (see the sketch after this list)
  • VQ-VAE – discrete latent codes via vector quantization
  • Conditional VAE – generation conditioned on labels
  • Hierarchical VAE – multi-level latent variables
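For β-VAE, the change to the training objective is a single scalar weight on the KL term. Below is a sketch assuming the same Bernoulli-decoder loss as earlier, with β = 4.0 as an arbitrary illustrative value.

```python
import torch
import torch.nn.functional as F

def beta_vae_loss(x, x_hat, mu, logvar, beta=4.0):
    """Negative beta-ELBO: reconstruction + beta * KL."""
    recon = F.binary_cross_entropy(x_hat, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl   # beta > 1 pushes q(z|x) harder toward the prior
```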


VAEs form the foundation of many modern generative models, including the latent spaces of latent diffusion models and learned priors.
