AI and machine learning concepts explained with interactive visuals

Start Here

New to the field? Follow this path from fundamentals to modern models.

Recent Topics

Attention Is All You Need
Feb 26, 2026
The 2017 paper that introduced the Transformer architecture
Maximum Likelihood Reinforcement Learning (MaxRL)
Feb 13, 2026
A recent idea for training models on pass-fail tasks when sampling matters
Adam Optimizer
Feb 1, 2026
Adaptive learning rates with momentum for deep learning
Backpropagation
Feb 1, 2026
The algorithm that enables neural networks to learn by computing gradients efficiently
BERT: Bidirectional Transformers
Feb 1, 2026
Pre-training deep bidirectional representations for NLP
Batch Normalization
Feb 1, 2026
Normalizing layer inputs to accelerate deep network training
Chain-of-Thought Prompting
Feb 1, 2026
Eliciting step-by-step reasoning in language models for complex problem solving
CLIP: Contrastive Language-Image Pre-training
Feb 1, 2026
Learning visual concepts from natural language supervision
Diffusion Models
Feb 1, 2026
Generative models that learn to denoise, enabling high-quality image and video synthesis
Deep Q-Networks (DQN)
Feb 1, 2026
Combining Q-learning with deep neural networks for Atari-level game playing
Dropout: Regularization for Neural Networks
Feb 1, 2026
Randomly dropping units during training to prevent overfitting
Generative Adversarial Networks
Feb 1, 2026
A generator and a discriminator network compete to produce realistic data
GPT: Generative Pre-Training
Feb 1, 2026
Autoregressive language models that learn to predict the next token
In-Context Learning
Feb 1, 2026
How large language models learn from examples in the prompt without weight updates
Layer Normalization
Feb 1, 2026
Normalizing each example across its features
Latent Diffusion Models
Feb 1, 2026
High-resolution image generation by diffusing in learned latent spaces
Mamba: State Space Models
Feb 1, 2026
A sequence model that keeps a running state instead of attending to every token pair
Policy Gradient Methods
Feb 1, 2026
Directly optimizing policies through gradient ascent on expected returns
Proximal Policy Optimization (PPO)
Feb 1, 2026
A stable, sample-efficient policy gradient algorithm for reinforcement learning
Recursive Language Models
Feb 1, 2026
A paradigm where LLMs treat context as an environment and recursively call themselves on sub-problems
Reinforcement Learning
Feb 1, 2026
Learning by trial and error through rewards
RLHF: Reinforcement Learning from Human Feedback
Feb 1, 2026
Teaching language models to prefer responses that people rank higher
Sequence to Sequence Learning
Feb 1, 2026
Encoder-decoder architecture for mapping sequences to sequences
Vision Transformer (ViT)
Feb 1, 2026
Applying Transformers directly to image patches for visual recognition
Word2Vec: Word Embeddings
Feb 1, 2026
Learning dense vector representations of words from text
AlexNet
Jan 13, 2026
The deep CNN that won ImageNet 2012 and sparked the deep learning revolution
The Annotated Transformer
Jan 13, 2026
Line-by-line PyTorch implementation of the Transformer architecture
Neural Machine Translation by Jointly Learning to Align and Translate
Jan 13, 2026
The paper that introduced the attention mechanism for sequence-to-sequence models
Quantifying the Rise and Fall of Complexity in Closed Systems
Jan 13, 2026
The Coffee Automaton paper formalizing how complexity peaks then declines
The First Law of Complexodynamics
Jan 13, 2026
Why complexity rises then falls while entropy only increases
CS231n: CNNs for Visual Recognition
Jan 13, 2026
Stanford's foundational course on deep learning for computer vision
Deep Speech 2: End-to-End Speech Recognition
Jan 13, 2026
Scaling up end-to-end speech recognition with RNNs and CTC
Multi-Scale Context Aggregation by Dilated Convolutions
Jan 13, 2026
Expanding receptive fields exponentially without losing resolution or adding parameters
GPipe: Easy Scaling with Micro-Batch Pipeline Parallelism
Jan 13, 2026
Training giant neural networks by pipelining micro-batches across devices
Kolmogorov Complexity and Algorithmic Randomness
Jan 13, 2026
Measuring how short the best description of an object can be
Machine Super Intelligence
Jan 13, 2026
Shane Legg's PhD thesis formalizing universal intelligence and analyzing the AIXI agent
A Tutorial Introduction to the Minimum Description Length Principle
Jan 13, 2026
Grünwald's comprehensive guide to MDL for model selection and learning
Keeping Neural Networks Simple by Minimizing the Description Length of the Weights
Jan 13, 2026
Hinton's MDL approach to neural network regularization through noisy weights
Neural Message Passing for Quantum Chemistry
Jan 13, 2026
A unified framework for graph neural networks applied to molecular property prediction
Neural Turing Machines
Jan 13, 2026
Neural networks augmented with external memory and attention-based read/write heads
Order Matters: Sequence to Sequence for Sets
Jan 13, 2026
How input and output ordering affects seq2seq learning on set-structured data
Pointer Networks
Jan 13, 2026
Neural architecture that outputs pointers to input positions, enabling variable-size outputs
A Simple Neural Network Module for Relational Reasoning
Jan 13, 2026
Relation Networks for learning to reason about object relationships
Identity Mappings in Deep Residual Networks
Jan 13, 2026
Pre-activation ResNet design that enables training of 1000+ layer networks
Relational Recurrent Neural Networks
Jan 13, 2026
RNNs with relational memory that enables reasoning across time
ResNet
Jan 13, 2026
Deep residual learning with skip connections that enabled training of 152+ layer networks
The Unreasonable Effectiveness of Recurrent Neural Networks
Jan 13, 2026
Andrej Karpathy's influential blog post demonstrating RNN capabilities through character-level generation
Recurrent Neural Network Regularization
Jan 13, 2026
How to apply dropout to LSTMs without disrupting memory dynamics
Scaling Laws for Neural Language Models
Jan 13, 2026
Why bigger models, more data, and more compute lead to predictable gains
Transformer
Jan 13, 2026
Self-attention models that process sequences in parallel
Understanding LSTM Networks
Jan 13, 2026
Christopher Olah's visual guide to Long Short-Term Memory networks
Variational Autoencoder (VAE)
Jan 13, 2026
Probabilistic generative model with structured latent space
Variational Lossy Autoencoder
Jan 13, 2026
Understanding VAEs as compression systems with a rate-distortion trade-off
Gradient Boosted Decision Trees
Jan 10, 2026
Sequential tree ensembles trained by gradient descent in function space
NODE (Neural Oblivious Decision Ensembles)
Jan 10, 2026
Differentiable ensembles of oblivious decision trees for tabular learning
Pre-training
Jan 10, 2026
The stage where a model learns broad patterns from a very large dataset
Stable Marriage Problem
Jan 10, 2026
Finding a stable matching with the Gale-Shapley deferred acceptance algorithm
World of Bits
Jan 10, 2026
Open-domain platform for web-based reinforcement learning agents