AI and machine learning concepts explained with interactive visuals

Start Here

New to the field? Follow this path from fundamentals to modern models.

Recent Topics

Attention Is All You Need
Feb 26, 2026
The 2017 paper that introduced the Transformer architecture
Maximum Likelihood Reinforcement Learning (MaxRL)
Feb 13, 2026
A recent idea for training models on pass-fail tasks when sampling matters
Adam Optimizer
Feb 1, 2026
Adaptive learning rates with momentum for deep learning
Backpropagation
Feb 1, 2026
The algorithm that enables neural networks to learn by computing gradients efficiently
BERT: Bidirectional Transformers
Feb 1, 2026
Pre-training deep bidirectional representations for NLP
Batch Normalization
Feb 1, 2026
Normalizing layer inputs to accelerate deep network training
Chain-of-Thought Prompting
Feb 1, 2026
Eliciting step-by-step reasoning in language models for complex problem solving
CLIP: Contrastive Language-Image Pre-training
Feb 1, 2026
Learning visual concepts from natural language supervision
Diffusion Models
Feb 1, 2026
Generative models that learn to denoise, enabling high-quality image and video synthesis
Deep Q-Networks (DQN)
Feb 1, 2026
Combining Q-learning with deep neural networks for Atari-level game playing
Dropout: Regularization for Neural Networks
Feb 1, 2026
Randomly dropping units during training to prevent overfitting
Generative Adversarial Networks
Feb 1, 2026
A generator and a discriminator network compete to produce realistic data
GPT: Generative Pre-Training
Feb 1, 2026
Autoregressive language models that learn to predict the next token
In-Context Learning
Feb 1, 2026
How large language models learn from examples in the prompt without weight updates
Layer Normalization
Feb 1, 2026
Normalizing each example across its features
Latent Diffusion Models
Feb 1, 2026
High-resolution image generation by diffusing in learned latent spaces
Mamba: State Space Models
Feb 1, 2026
A sequence model that keeps a running state instead of attending to every token pair
Policy Gradient Methods
Feb 1, 2026
Directly optimizing policies through gradient ascent on expected returns
Proximal Policy Optimization (PPO)
Feb 1, 2026
A stable, sample-efficient policy gradient algorithm for reinforcement learning
Recursive Language Models
Feb 1, 2026
A paradigm where LLMs treat context as an environment and recursively call themselves on sub-problems
Reinforcement Learning
Feb 1, 2026
Learning by trial and error through rewards
RLHF: Reinforcement Learning from Human Feedback
Feb 1, 2026
Teaching language models to prefer responses that people rank higher
Sequence to Sequence Learning
Feb 1, 2026
Encoder-decoder architecture for mapping sequences to sequences
Vision Transformer (ViT)
Feb 1, 2026
Applying Transformers directly to image patches for visual recognition
Word2Vec: Word Embeddings
Feb 1, 2026
Learning dense vector representations of words from text
AlexNet
Jan 13, 2026
The deep CNN that won ImageNet 2012 and sparked the deep learning revolution
The Annotated Transformer
Jan 13, 2026
Line-by-line PyTorch implementation of the Transformer architecture
Neural Machine Translation by Jointly Learning to Align and Translate
Jan 13, 2026
The paper that introduced the attention mechanism for sequence-to-sequence models
Quantifying the Rise and Fall of Complexity in Closed Systems
Jan 13, 2026
The Coffee Automaton paper formalizing how complexity peaks then declines
The First Law of Complexodynamics
Jan 13, 2026
Why complexity rises then falls while entropy only increases
CS231n: CNNs for Visual Recognition
Jan 13, 2026
Stanford's foundational course on deep learning for computer vision
Deep Speech 2: End-to-End Speech Recognition
Jan 13, 2026
Scaling up end-to-end speech recognition with RNNs and CTC
Multi-Scale Context Aggregation by Dilated Convolutions
Jan 13, 2026
Expanding receptive fields exponentially without losing resolution or adding parameters
GPipe: Easy Scaling with Micro-Batch Pipeline Parallelism
Jan 13, 2026
Training giant neural networks by pipelining micro-batches across devices
Kolmogorov Complexity and Algorithmic Randomness
Jan 13, 2026
Measuring how short the best description of an object can be
Machine Super Intelligence
Jan 13, 2026
Shane Legg's PhD thesis formalizing universal intelligence and analyzing the AIXI agent
A Tutorial Introduction to the Minimum Description Length Principle
Jan 13, 2026
Grünwald's comprehensive guide to MDL for model selection and learning
Keeping Neural Networks Simple by Minimizing the Description Length of the Weights
Jan 13, 2026
Hinton's MDL approach to neural network regularization through noisy weights
Neural Message Passing for Quantum Chemistry
Jan 13, 2026
A unified framework for graph neural networks applied to molecular property prediction
Neural Turing Machines
Jan 13, 2026
Neural networks augmented with external memory and attention-based read/write heads
Order Matters: Sequence to Sequence for Sets
Jan 13, 2026
How input and output ordering affects seq2seq learning on set-structured data
Pointer Networks
Jan 13, 2026
Neural architecture that outputs pointers to input positions, enabling variable-size outputs
A Simple Neural Network Module for Relational Reasoning
Jan 13, 2026
Relation Networks for learning to reason about object relationships
Identity Mappings in Deep Residual Networks
Jan 13, 2026
Pre-activation ResNet design that enables training of 1000+ layer networks
Relational Recurrent Neural Networks
Jan 13, 2026
RNNs with relational memory that enables reasoning across time
ResNet
Jan 13, 2026
Deep residual learning with skip connections that enabled training of 152+ layer networks
The Unreasonable Effectiveness of Recurrent Neural Networks
Jan 13, 2026
Andrej Karpathy's influential blog post demonstrating RNN capabilities through character-level generation
Recurrent Neural Network Regularization
Jan 13, 2026
How to apply dropout to LSTMs without disrupting memory dynamics
Scaling Laws for Neural Language Models
Jan 13, 2026
Why bigger models, more data, and more compute lead to predictable gains
Transformer
Jan 13, 2026
Self-attention models that process sequences in parallel
Understanding LSTM Networks
Jan 13, 2026
Christopher Olah's visual guide to Long Short-Term Memory networks
Variational Autoencoder (VAE)
Jan 13, 2026
Probabilistic generative model with structured latent space
Variational Lossy Autoencoder
Jan 13, 2026
Understanding VAEs as compression systems with a rate-distortion trade-off
Gradient Boosted Decision Trees
Jan 10, 2026
Sequential tree ensembles trained by gradient descent in function space
NODE (Neural Oblivious Decision Ensembles)
Jan 10, 2026
Differentiable ensembles of oblivious decision trees for tabular learning
Pre-training
Jan 10, 2026
The stage where a model learns broad patterns from a very large dataset
Stable Marriage Problem
Jan 10, 2026
Finding a stable matching with the Gale-Shapley deferred acceptance algorithm
World of Bits
Jan 10, 2026
Open-domain platform for web-based reinforcement learning agents