Two neural networks compete to generate realistic data
GANs (Generative Adversarial Networks), introduced by Goodfellow et al. in 2014, revolutionized generative modeling through a simple but powerful idea: pit two neural networks against each other in a game.
The Core Idea
Two networks compete:
- Generator (G): Creates fake samples from noise
- Discriminator (D): Distinguishes real from fake
The generator tries to fool the discriminator; the discriminator tries not to be fooled.
The Minimax Game
Think of it as a counterfeiter vs. detective:
- Counterfeiter (G): Gets better at making fake money
- Detective (D): Gets better at spotting fakes
As training progresses, both improve until the fakes become indistinguishable from real.
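Formally, this game is the minimax objective from Goodfellow et al. (2014): D maximizes, and G minimizes, the value function

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)]
  + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]
```

The first term rewards D for recognizing real data; the second rewards D for rejecting fakes, and penalizes G when its fakes are rejected.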
Architecture
Generator
Takes random noise z (typically sampled from a standard Gaussian) and transforms it into a sample x = G(z).
Typically uses transposed convolutions to upsample noise into images.
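As a shape-level illustration, here is a minimal numpy sketch of a fully connected generator. All names and layer sizes are illustrative; image GANs such as DCGAN replace the dense layers with transposed convolutions.

```python
import numpy as np

rng = np.random.default_rng(0)

def generator(z, params):
    """Map noise z of shape (batch, noise_dim) to samples in [-1, 1]."""
    W1, b1, W2, b2 = params
    h = np.maximum(0.0, z @ W1 + b1)   # ReLU hidden layer
    return np.tanh(h @ W2 + b2)        # tanh keeps outputs in [-1, 1]

noise_dim, hidden, out_dim = 8, 32, 4  # illustrative sizes
params = (rng.normal(0, 0.1, (noise_dim, hidden)), np.zeros(hidden),
          rng.normal(0, 0.1, (hidden, out_dim)), np.zeros(out_dim))

z = rng.normal(size=(16, noise_dim))   # a batch of Gaussian noise
x_fake = generator(z, params)          # shape (16, 4)
```

The tanh output range matches the common convention of scaling image pixels to [-1, 1] during training.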
Discriminator
Takes a sample x and outputs D(x), the probability that x is real.
Uses standard convolutions to classify images.
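A matching numpy sketch of a fully connected discriminator (again, names and sizes are illustrative; image discriminators use strided convolutions):

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def discriminator(x, params):
    """Map samples x of shape (batch, dim) to P(real) in (0, 1)."""
    W1, b1, w2, b2 = params
    h = np.maximum(0.0, x @ W1 + b1)   # ReLU hidden layer
    return sigmoid(h @ w2 + b2)        # one probability per sample

rng = np.random.default_rng(1)
dim, hidden = 4, 32                    # illustrative sizes
params = (rng.normal(0, 0.1, (dim, hidden)), np.zeros(hidden),
          rng.normal(0, 0.1, hidden), 0.0)

p_real = discriminator(rng.normal(size=(16, dim)), params)  # shape (16,)
```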
Training Algorithm
```python
for epoch in range(num_epochs):
    # Train Discriminator: push D(real) toward 1, D(fake) toward 0
    real_samples = sample_data(batch_size)
    fake_samples = G(sample_noise(batch_size))
    D_loss = -mean(log(D(real_samples)) + log(1 - D(fake_samples)))
    update(D, D_loss)

    # Train Generator: push D(fake) toward 1
    fake_samples = G(sample_noise(batch_size))
    G_loss = -mean(log(D(fake_samples)))  # non-saturating loss; the original minimax form is mean(log(1 - D(fake_samples)))
    update(G, G_loss)
```
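The loop above can be made concrete on a toy problem. The sketch below (hyperparameters are illustrative) trains an affine generator G(z) = a·z + b against a logistic discriminator D(x) = sigmoid(w·x + c) to match a 1D Gaussian, with hand-derived gradients for the discriminator loss and the non-saturating generator loss:

```python
import numpy as np

rng = np.random.default_rng(0)
real_mean, real_std = 3.0, 1.0   # target distribution: N(3, 1)
a, b = 1.0, 0.0                  # generator params: G(z) = a*z + b
w, c = 0.0, 0.0                  # discriminator params: D(x) = sigmoid(w*x + c)
lr, batch = 0.05, 64

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-np.clip(t, -60.0, 60.0)))

for step in range(2000):
    # --- Train discriminator: minimize -log D(real) - log(1 - D(fake)) ---
    real = rng.normal(real_mean, real_std, batch)
    fake = a * rng.normal(size=batch) + b
    d_real, d_fake = sigmoid(w * real + c), sigmoid(w * fake + c)
    grad_w = np.mean(-(1 - d_real) * real + d_fake * fake)  # dL_D/dw
    grad_c = np.mean(-(1 - d_real) + d_fake)                # dL_D/dc
    w -= lr * grad_w
    c -= lr * grad_c

    # --- Train generator: minimize -log D(fake) (non-saturating loss) ---
    z = rng.normal(size=batch)
    fake = a * z + b
    d_fake = sigmoid(w * fake + c)
    grad_a = np.mean(-(1 - d_fake) * w * z)  # chain rule through G
    grad_b = np.mean(-(1 - d_fake) * w)
    a -= lr * grad_a
    b -= lr * grad_b

samples = a * rng.normal(size=1000) + b   # generated samples, mean near 3
```

Each discriminator step uses fresh real and fake batches, and each generator step re-samples noise, mirroring the alternating updates in the pseudocode above.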
Challenges
Mode Collapse
Generator produces limited variety, ignoring modes of the data distribution.
Training Instability
Delicate balance required—if D is too good, G gets no gradient; if D is too weak, G doesn’t improve.
Evaluation
No explicit likelihood. Metrics like FID (Fréchet Inception Distance) and IS (Inception Score) were developed.
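FID fits a Gaussian to feature statistics of real and generated images and measures the Fréchet distance between the two fits. A sketch under a simplifying assumption (diagonal covariances, so no matrix square root is needed; real implementations use Inception-v3 features and full covariance matrices):

```python
import numpy as np

def fid_diagonal(feats_real, feats_fake):
    """Fréchet distance between two Gaussians with diagonal covariance.

    Full FID = ||mu1 - mu2||^2 + Tr(C1 + C2 - 2 (C1 C2)^(1/2));
    for diagonal C the trace term reduces to sum(v1 + v2 - 2*sqrt(v1*v2)).
    """
    mu1, mu2 = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    v1, v2 = feats_real.var(axis=0), feats_fake.var(axis=0)
    return float(np.sum((mu1 - mu2) ** 2) + np.sum(v1 + v2 - 2 * np.sqrt(v1 * v2)))

rng = np.random.default_rng(0)
# Identically distributed "features" score near zero; shifted ones score high.
same = fid_diagonal(rng.normal(0, 1, (5000, 16)), rng.normal(0, 1, (5000, 16)))
shifted = fid_diagonal(rng.normal(0, 1, (5000, 16)), rng.normal(2, 1, (5000, 16)))
```

Lower FID means the generated feature statistics are closer to the real ones.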
GAN Variants
| Variant | Innovation |
|---|---|
| DCGAN | Convolutional architecture, stable training |
| WGAN | Wasserstein distance, improved stability |
| StyleGAN | Style-based generator, unprecedented quality |
| CycleGAN | Unpaired image-to-image translation |
| Pix2Pix | Paired image-to-image translation |
| BigGAN | Large-scale, class-conditional generation |
| ProGAN | Progressive growing for high resolution |
The Nash Equilibrium
For a fixed generator, the optimal discriminator is:

D*(x) = p_data(x) / (p_data(x) + p_g(x))

When p_g = p_data, D*(x) = 1/2 everywhere, so the discriminator can't tell real from fake.
Why GANs Work
- Implicit density: No explicit likelihood computation needed
- Sharp samples: Adversarial loss produces crisp outputs (unlike blurry VAE reconstructions)
- Flexible architecture: Works with any differentiable generator/discriminator
Theoretical Connection
The GAN objective, evaluated at the optimal discriminator, minimizes the Jensen-Shannon divergence between the data and generator distributions:

C(G) = 2 · JSD(p_data ‖ p_g) − log 4

which reaches its minimum exactly when p_g = p_data.
WGAN instead minimizes the Wasserstein (Earth Mover’s) distance, which provides smoother gradients.
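The difference is easy to see with two non-overlapping point masses: JSD saturates at log 2 no matter how far apart they are, while the Wasserstein-1 distance grows with the separation and therefore still distinguishes "close" from "far". A small sketch on a discrete 1D grid:

```python
import numpy as np

def jsd(p, q):
    """Jensen-Shannon divergence between discrete distributions (natural log)."""
    m = 0.5 * (p + q)
    def kl(a, b):
        mask = a > 0
        return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def w1(p, q, dx=1.0):
    """Wasserstein-1 distance on a 1D grid via the CDF formula."""
    return float(np.sum(np.abs(np.cumsum(p) - np.cumsum(q))) * dx)

grid = 100
p = np.zeros(grid); p[10] = 1.0           # point mass at x = 10
q_near = np.zeros(grid); q_near[11] = 1.0  # point mass at x = 11
q_far = np.zeros(grid); q_far[90] = 1.0    # point mass at x = 90

# jsd(p, q_near) == jsd(p, q_far) == log(2): no gradient signal.
# w1(p, q_near) == 1, w1(p, q_far) == 80: distance-aware signal.
```

This is why a WGAN critic can give the generator useful gradients even when real and fake distributions barely overlap, which is typical early in training.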
Historical Impact
GANs enabled:
- Photorealistic face generation (ThisPersonDoesNotExist)
- Image-to-image translation (edges→photos, day→night)
- Super-resolution (enhance low-res images)
- Art and design (AI-generated art, fashion)
- Data augmentation (synthetic training data)
Though diffusion models now surpass GANs for image generation, the adversarial training concept remains influential.
Key Papers
- Generative Adversarial Nets – Goodfellow et al., 2014. https://arxiv.org/abs/1406.2661
- Unsupervised Representation Learning with Deep Convolutional GANs (DCGAN) – Radford et al., 2015. https://arxiv.org/abs/1511.06434
- Wasserstein GAN – Arjovsky et al., 2017. https://arxiv.org/abs/1701.07875
- A Style-Based Generator Architecture for GANs (StyleGAN) – Karras et al., 2018. https://arxiv.org/abs/1812.04948