AlexNet
The deep CNN that won the ImageNet 2012 challenge (ILSVRC) and sparked the deep learning revolution
CLIP: Contrastive Language-Image Pre-training
Learning visual concepts from natural language supervision
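At its core, CLIP trains image and text encoders so that matching image/caption pairs have high cosine similarity. A minimal NumPy sketch of the symmetric contrastive objective, assuming batch-aligned image/text embeddings (the `temperature` value and helper names are illustrative, not CLIP's actual API):

```python
import numpy as np

def clip_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric contrastive loss: matching pairs sit on the diagonal
    of the cosine-similarity matrix and are pulled together."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature          # (N, N) pairwise similarities
    labels = np.arange(len(logits))             # i-th image matches i-th text

    def xent(l):                                # row-wise cross-entropy
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # average the image-to-text and text-to-image directions
    return (xent(logits) + xent(logits.T)) / 2

# perfectly aligned embeddings give near-zero loss
aligned = clip_loss(np.eye(4), np.eye(4))
```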
CS231n: CNNs for Visual Recognition
Stanford's foundational course on deep learning for computer vision
Diffusion Models
Generative models that learn to denoise, enabling high-quality image and video synthesis
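The forward (noising) half of a diffusion model has a closed form, which is what makes training tractable: any timestep can be sampled directly from the clean data. A minimal sketch of the DDPM forward process, assuming a linear beta schedule (a learned network would then be trained to reverse this):

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)       # linear noise schedule
alphas_bar = np.cumprod(1.0 - betas)     # cumulative signal retention

def q_sample(x0, t, rng=np.random.default_rng(0)):
    """Sample x_t ~ q(x_t | x_0) in closed form: scaled data plus Gaussian noise."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps

x0 = np.ones((4, 4))             # toy "image"
x_noisy = q_sample(x0, t=T - 1)  # near-pure Gaussian noise at the final step
```

By the last timestep almost no signal remains, so generation can start from pure noise and denoise step by step.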
Multi-Scale Context Aggregation by Dilated Convolutions
Expanding receptive fields exponentially without losing resolution or adding parameters
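The exponential growth is simple receptive-field arithmetic: each 3x3 layer with dilation d widens the receptive field by 2d, so doubling the dilation per layer (the schedule used in the paper) compounds geometrically while parameters per layer stay fixed. A small sketch of that arithmetic:

```python
def receptive_field(dilations, kernel=3):
    """Receptive field of stacked stride-1 convolutions with the given dilations."""
    rf = 1
    for d in dilations:
        rf += (kernel - 1) * d   # each layer adds (k-1)*dilation
    return rf

print(receptive_field([1]))               # 3
print(receptive_field([1, 2, 4]))         # 15
print(receptive_field([1, 2, 4, 8, 16]))  # 63
```

Five layers reach a 63-pixel receptive field; the same depth with undilated 3x3 convolutions would reach only 11.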
Generative Adversarial Networks
Two neural networks compete to generate realistic data
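The competition is a minimax game: the discriminator D maximizes log D(x) + log(1 - D(G(z))), while the generator minimizes it (in practice via the non-saturating variant, -log D(G(z)), from the original paper). A NumPy sketch of just the two losses, with the networks abstracted away as probability outputs:

```python
import numpy as np

def d_loss(d_real, d_fake):
    """Binary cross-entropy the discriminator minimizes:
    push D(x) toward 1 on real data and D(G(z)) toward 0 on fakes."""
    return -(np.log(d_real) + np.log(1.0 - d_fake)).mean()

def g_loss(d_fake):
    """Non-saturating generator loss: push D(G(z)) toward 1."""
    return -np.log(d_fake).mean()

# a confident, correct discriminator drives d_loss toward 0...
confident_d = d_loss(np.array([0.99]), np.array([0.01]))
# ...while a generator that fools it drives g_loss toward 0
fooled = g_loss(np.array([0.99]))
```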
Latent Diffusion Models
High-resolution image generation by diffusing in learned latent spaces
Identity Mappings in Deep Residual Networks
Pre-activation ResNet design that enables training of 1000+ layer networks
ResNet
Deep residual learning with skip connections that enabled training of 152+ layer networks
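A residual block computes y = F(x) + x rather than y = F(x): the identity shortcut gives gradients an unimpeded path through the network, which is what makes very deep stacks trainable. A toy NumPy sketch with a two-layer F (the shapes and ReLU placement here follow the original post-activation design, simplified):

```python
import numpy as np

def relu(v):
    return np.maximum(v, 0.0)

def residual_block(x, w1, w2):
    """y = relu(F(x) + x), where F is a toy two-layer transform.
    The '+ x' identity shortcut is the key ResNet ingredient."""
    out = relu(x @ w1)
    out = out @ w2
    return relu(out + x)   # skip connection

x = np.ones(8)
w1 = np.zeros((8, 8))
w2 = np.zeros((8, 8))
y = residual_block(x, w1, w2)  # with F == 0 the block reduces to the identity
```

With the residual branch zeroed out the block passes x through unchanged, so a deep stack can start near the identity and learn only the needed corrections.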
Vision Transformer (ViT)
Applying Transformers directly to image patches for visual recognition
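ViT's first step is to cut the image into non-overlapping patches and flatten each one into a token. A sketch of that patchify step, assuming an image whose sides are divisible by the patch size (16, as in the base ViT config):

```python
import numpy as np

def patchify(img, patch=16):
    """Split an (H, W, C) image into non-overlapping patch tokens,
    each flattened to a vector of length patch * patch * C."""
    H, W, C = img.shape
    gh, gw = H // patch, W // patch
    patches = img.reshape(gh, patch, gw, patch, C)
    patches = patches.transpose(0, 2, 1, 3, 4)   # group by patch grid position
    return patches.reshape(gh * gw, patch * patch * C)

img = np.zeros((224, 224, 3))
tokens = patchify(img)   # 14*14 = 196 tokens, each 16*16*3 = 768-dimensional
print(tokens.shape)      # (196, 768)
```

These tokens (plus a learned class token and position embeddings) are then fed to a standard Transformer encoder.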