Generative Adversarial Networks

Two neural networks compete to generate realistic data

GANs (Generative Adversarial Networks), introduced by Goodfellow et al. in 2014, revolutionized generative modeling through a simple but powerful idea: pit two neural networks against each other in a game.

The Core Idea

Two networks compete:

  • Generator (G): Creates fake samples from noise
  • Discriminator (D): Distinguishes real from fake

The generator tries to fool the discriminator; the discriminator tries not to be fooled.

\min_G \max_D \; \mathbb{E}_{x \sim p_{data}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]
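To make the objective concrete, here is a minimal numeric sketch (using NumPy; the `value_fn` helper is hypothetical) that estimates the value function from a discriminator's outputs on real and fake batches. A maximally confused discriminator that outputs 0.5 everywhere yields exactly the equilibrium value $-2\log 2 \approx -1.386$:

```python
import numpy as np

def value_fn(d_real, d_fake):
    """Monte Carlo estimate of V(D, G) =
    E_x[log D(x)] + E_z[log(1 - D(G(z)))],
    given discriminator outputs on real and fake batches."""
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

# A maximally confused discriminator outputs 0.5 on everything:
# V = log(0.5) + log(0.5) = -2 log 2 ≈ -1.386, the equilibrium value.
print(value_fn(np.full(1000, 0.5), np.full(1000, 0.5)))

# A sharper discriminator (high on real, low on fake) pushes V upward,
# which is what the max over D is doing.
print(value_fn(np.full(1000, 0.9), np.full(1000, 0.1)))
```

The generator's min and the discriminator's max pull this single scalar in opposite directions, which is the whole game.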

The Minimax Game

Think of it as a counterfeiter vs. detective:

  • Counterfeiter (G): Gets better at making fake money
  • Detective (D): Gets better at spotting fakes

As training progresses, both improve until the fakes become indistinguishable from real.

Interactive Demo

Watch the generator and discriminator compete:

[Interactive demo: noise z → Generator → fake image → Discriminator → real/fake verdict, with live plots of the real vs. generated sample distribution, generator quality, and discriminator accuracy over 50 training epochs.]

Architecture

Generator

Takes random noise $z$ and transforms it into a sample:

G: z \in \mathbb{R}^{d_z} \rightarrow x \in \mathbb{R}^{d_x}

Typically uses transposed convolutions to upsample noise into images.
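As a sketch of that upsampling path, here is a small DCGAN-style generator in PyTorch for 28×28 grayscale images. The layer sizes and the 28×28 target are illustrative assumptions, not from the text; the point is the shape progression from a noise vector to an image:

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Noise vector (d_z,) -> grayscale image (1, 28, 28).
    Sizes are illustrative, DCGAN-style."""
    def __init__(self, d_z=100):
        super().__init__()
        self.fc = nn.Linear(d_z, 128 * 7 * 7)  # project noise to a 7x7 feature map
        self.net = nn.Sequential(
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),  # 7x7 -> 14x14
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.ConvTranspose2d(64, 1, 4, stride=2, padding=1),    # 14x14 -> 28x28
            nn.Tanh(),  # pixel values in [-1, 1]
        )

    def forward(self, z):
        h = self.fc(z).view(-1, 128, 7, 7)
        return self.net(h)

G = Generator()
z = torch.randn(16, 100)   # a batch of 16 noise vectors
x_fake = G(z)
print(x_fake.shape)        # torch.Size([16, 1, 28, 28])
```

Each `ConvTranspose2d` with stride 2 doubles the spatial resolution, which is the "upsample noise into images" step described above.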

Discriminator

Takes a sample and outputs the probability that it is real:

D: x \in \mathbb{R}^{d_x} \rightarrow [0, 1]

Uses standard convolutions to classify images.
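A matching discriminator sketch, mirroring the generator above with strided convolutions that halve the resolution at each step (again, the sizes are illustrative assumptions):

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Grayscale image (1, 28, 28) -> probability in (0, 1) that it is real."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 64, 4, stride=2, padding=1),    # 28x28 -> 14x14
            nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1),  # 14x14 -> 7x7
            nn.LeakyReLU(0.2),
            nn.Flatten(),
            nn.Linear(128 * 7 * 7, 1),
            nn.Sigmoid(),  # D(x): probability the input is real
        )

    def forward(self, x):
        return self.net(x)

D = Discriminator()
x = torch.randn(16, 1, 28, 28)
p_real = D(x)
print(p_real.shape)   # torch.Size([16, 1]), each value in (0, 1)
```

The strided convolutions are the reverse of the generator's transposed convolutions, ending in a single sigmoid unit for the real/fake verdict.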

Training Algorithm

for epoch in range(num_epochs):
    # --- Train Discriminator ---
    real = sample_data(batch_size)
    fake = G(sample_noise(batch_size)).detach()  # don't backprop into G here

    D_loss = -mean(log(D(real)) + log(1 - D(fake)))
    update(D, D_loss)

    # --- Train Generator ---
    fake = G(sample_noise(batch_size))
    G_loss = -mean(log(D(fake)))  # "non-saturating" loss; original: mean(log(1 - D(fake)))
    update(G, G_loss)
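The pseudocode above can be made runnable on a toy problem. This is a minimal sketch, assuming 1-D "real" data drawn from $\mathcal{N}(4, 1)$ and tiny MLPs for both networks; the architectures and hyperparameters are illustrative, not prescribed by the text:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy setup: real data ~ N(4, 1); G maps 8-D noise to a 1-D sample.
G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())
opt_G = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_D = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()
batch = 64

for step in range(500):
    # --- Train D: push D(real) toward 1 and D(fake) toward 0 ---
    real = 4 + torch.randn(batch, 1)
    fake = G(torch.randn(batch, 8)).detach()  # detach: don't update G here
    d_loss = (bce(D(real), torch.ones(batch, 1))
              + bce(D(fake), torch.zeros(batch, 1)))
    opt_D.zero_grad(); d_loss.backward(); opt_D.step()

    # --- Train G: make D label fakes as real (non-saturating loss) ---
    fake = G(torch.randn(batch, 8))
    g_loss = bce(D(fake), torch.ones(batch, 1))
    opt_G.zero_grad(); g_loss.backward(); opt_G.step()

# The generated mean should drift toward the real mean (~4) as training proceeds.
print(float(G(torch.randn(1000, 8)).mean()))
```

Note the `.detach()` in the discriminator step: without it, the D update would also push gradients into G, entangling the two updates.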

Challenges

Mode Collapse

Generator produces limited variety, ignoring modes of the data distribution.

Training Instability

Delicate balance required—if D is too good, G gets no gradient; if D is too weak, G doesn’t improve.

Evaluation

GANs define no explicit likelihood, so standard log-likelihood evaluation doesn't apply. Proxy metrics such as FID (Fréchet Inception Distance) and IS (Inception Score) were developed instead.
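As a sketch of what FID measures: it is the Fréchet distance between two Gaussians fit to feature statistics of real and generated images. In the real metric those features come from an Inception-v3 network; here, for illustration, we apply the formula directly to given means and covariances:

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(mu1, sigma1, mu2, sigma2):
    """Frechet distance between N(mu1, sigma1) and N(mu2, sigma2):
    ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 (sigma1 sigma2)^{1/2}).
    In real FID, (mu, sigma) are fit to Inception-v3 features of
    real and generated images; here we use the stats directly."""
    diff = mu1 - mu2
    covmean = sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):  # numerical noise can add tiny imaginary parts
        covmean = covmean.real
    return diff @ diff + np.trace(sigma1 + sigma2 - 2 * covmean)

I = np.eye(2)
print(fid(np.zeros(2), I, np.zeros(2), I))           # ≈ 0: identical distributions
print(fid(np.zeros(2), I, np.array([3.0, 0.0]), I))  # ≈ 9: mean shifted by 3
```

Lower FID is better; zero means the two feature distributions (as Gaussians) match exactly.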

GAN Variants

| Variant | Innovation |
| --- | --- |
| DCGAN | Convolutional architecture, stable training |
| WGAN | Wasserstein distance, improved stability |
| StyleGAN | Style-based generator, unprecedented quality |
| CycleGAN | Unpaired image-to-image translation |
| Pix2Pix | Paired image-to-image translation |
| BigGAN | Large-scale, class-conditional generation |
| ProGAN | Progressive growing for high resolution |

The Nash Equilibrium

At convergence, the optimal discriminator is:

D^*(x) = \frac{p_{data}(x)}{p_{data}(x) + p_g(x)}

When $p_g = p_{data}$, $D^*(x) = 0.5$ everywhere: the discriminator can't tell real from fake.
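A quick numeric check of the formula, using two 1-D Gaussians for $p_{data}$ and $p_g$: when the generator is off-target, $D^*$ is near 1 where real data dominates and near 0 where fakes dominate; when $p_g = p_{data}$, it is exactly 0.5 everywhere.

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

x = np.linspace(-5, 5, 201)
p_data = gauss_pdf(x, 0.0, 1.0)

# Imperfect generator centered at 2: D* prefers real-dominated regions.
p_g = gauss_pdf(x, 2.0, 1.0)
d_star = p_data / (p_data + p_g)
print(d_star[0], d_star[-1])  # near 1 on the left, near 0 on the right

# Perfect generator: p_g = p_data, so D*(x) = 0.5 everywhere.
d_star_eq = p_data / (p_data + p_data)
print(d_star_eq.min(), d_star_eq.max())  # 0.5 0.5
```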

Why GANs Work

  1. Implicit density: No explicit likelihood computation needed
  2. Sharp samples: Adversarial loss produces crisp outputs (unlike blurry VAE reconstructions)
  3. Flexible architecture: Works with any differentiable generator/discriminator

Theoretical Connection

The GAN objective minimizes the Jensen-Shannon divergence:

\min_G JS(p_{data} \,\|\, p_g)

WGAN instead minimizes the Wasserstein (Earth Mover’s) distance, which provides smoother gradients.
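To see why that matters, here is a small discrete computation of the JS divergence (via its definition from KL against the mixture $m = \tfrac{1}{2}(p+q)$). JS is 0 when $p = q$, but saturates at $\log 2$ for distributions with disjoint supports, so it gives the generator no useful gradient in that regime; this is the failure mode Wasserstein distance avoids:

```python
import numpy as np

def kl(p, q):
    """KL(p || q) for discrete distributions, with 0 log 0 := 0."""
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

def js(p, q):
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = np.array([0.1, 0.4, 0.5])
q = np.array([0.1, 0.4, 0.5])
print(js(p, q))  # 0: identical distributions

# Disjoint supports: JS saturates at its maximum, log 2 ≈ 0.6931,
# no matter how "far apart" the supports are.
p2 = np.array([1.0, 0.0, 0.0])
q2 = np.array([0.0, 0.0, 1.0])
print(js(p2, q2), np.log(2))
```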

Historical Impact

GANs enabled:

  • Photorealistic face generation (ThisPersonDoesNotExist)
  • Image-to-image translation (edges→photos, day→night)
  • Super-resolution (enhance low-res images)
  • Art and design (AI-generated art, fashion)
  • Data augmentation (synthetic training data)

Though diffusion models now surpass GANs for image generation, the adversarial training concept remains influential.

Key Papers
