Topics

58 articles across 85 tags

All Articles

Adam Optimizer
Adaptive learning rates with momentum for deep learning
AlexNet
The deep CNN that won ImageNet 2012 and sparked the deep learning revolution
The Annotated Transformer
Line-by-line PyTorch implementation of the Transformer architecture
Attention Is All You Need
The 2017 paper that introduced the Transformer architecture, replacing recurrence with self-attention
Backpropagation
The algorithm that enables neural networks to learn by computing gradients efficiently
Neural Machine Translation by Jointly Learning to Align and Translate
The paper that introduced the attention mechanism for sequence-to-sequence models
Batch Normalization
Normalizing layer inputs to accelerate deep network training
BERT: Bidirectional Transformers
Pre-training deep bidirectional representations for NLP
Chain-of-Thought Prompting
Eliciting step-by-step reasoning in language models for complex problem solving
CLIP: Contrastive Language-Image Pre-training
Learning visual concepts from natural language supervision
Quantifying the Rise and Fall of Complexity in Closed Systems
The Coffee Automaton paper formalizing how complexity peaks then declines
The First Law of Complexodynamics
Why complexity rises then falls while entropy only increases
CS231n: CNNs for Visual Recognition
Stanford's foundational course on deep learning for computer vision
Diffusion Models
Generative models that learn to denoise, enabling high-quality image and video synthesis
Deep Speech 2: End-to-End Speech Recognition
Scaling up end-to-end speech recognition with RNNs and CTC
Multi-Scale Context Aggregation by Dilated Convolutions
Expanding receptive fields exponentially without losing resolution or adding parameters
Deep Q-Networks (DQN)
Combining Q-learning with deep neural networks for Atari-level game playing
Dropout: Regularization for Neural Networks
Randomly dropping units during training to prevent overfitting
Generative Adversarial Networks
Two neural networks compete to generate realistic data
Gradient Boosted Decision Trees
Sequential tree ensembles optimized via gradient descent
GPipe: Easy Scaling with Micro-Batch Pipeline Parallelism
Training giant neural networks by pipelining micro-batches across devices
GPT: Generative Pre-Training
Autoregressive language models that learn to predict the next token
In-Context Learning
How large language models learn from examples in the prompt without weight updates
Kolmogorov Complexity and Algorithmic Randomness
The mathematical foundation for measuring information content and randomness
Latent Diffusion Models
High-resolution image generation by diffusing in learned latent spaces
Layer Normalization
Normalizing across features for sequence models and Transformers
Machine Super Intelligence
Shane Legg's PhD thesis formalizing universal intelligence and the AIXI agent
Mamba: State Space Models
Linear-time sequence modeling as an efficient alternative to Transformers
Maximum Likelihood Reinforcement Learning (MaxRL)
A framework that bridges reinforcement learning and maximum likelihood estimation for sampling-based tasks with binary feedback
A Tutorial Introduction to the Minimum Description Length Principle
Grünwald's comprehensive guide to MDL for model selection and learning
Keeping Neural Networks Simple by Minimizing the Description Length of the Weights
Hinton's MDL approach to neural network regularization through noisy weights
Neural Turing Machines
Neural networks augmented with external memory and attention-based read/write heads
Neural Message Passing for Quantum Chemistry
A unified framework for graph neural networks applied to molecular property prediction
NODE (Neural Oblivious Decision Ensembles)
Differentiable decision trees and oblivious ensembles for tabular learning
Pointer Networks
Neural architecture that outputs pointers to input positions, enabling variable-size outputs
Policy Gradient Methods
Directly optimizing policies through gradient ascent on expected returns
Order Matters: Sequence to Sequence for Sets
How input and output ordering affects seq2seq learning on set-structured data
Proximal Policy Optimization (PPO)
A stable, sample-efficient policy gradient algorithm for reinforcement learning
Pre-training
The initial phase of training foundation models on vast amounts of data
Recursive Language Models
A paradigm where LLMs treat context as an environment and recursively call themselves on sub-problems
Reinforcement Learning
Learning optimal behavior through interaction with an environment
A Simple Neural Network Module for Relational Reasoning
Relation Networks for learning to reason about object relationships
Relational Recurrent Neural Networks
RNNs with relational memory that enables reasoning across time
Identity Mappings in Deep Residual Networks
Pre-activation ResNet design that enables training of 1000+ layer networks
The Unreasonable Effectiveness of Recurrent Neural Networks
Andrej Karpathy's influential blog post demonstrating RNN capabilities through character-level generation
RLHF: Reinforcement Learning from Human Feedback
Aligning language models with human preferences through reward modeling
Recurrent Neural Network Regularization
How to apply dropout to LSTMs without disrupting memory dynamics
Scaling Laws for Neural Language Models
Empirical laws governing how language model performance scales with compute, data, and parameters
ResNet
Deep residual learning with skip connections that enabled training of 152+ layer networks
Sequence to Sequence Learning
Encoder-decoder architecture for mapping sequences to sequences
Stable Marriage Problem
Finding stable matchings via the Gale–Shapley algorithm
Understanding LSTM Networks
Christopher Olah's visual guide to Long Short-Term Memory networks
Variational Autoencoder (VAE)
Probabilistic generative model with structured latent space
Variational Lossy Autoencoder
Connecting VAEs to lossy compression and the bits-back coding argument
Vision Transformer (ViT)
Applying Transformers directly to image patches for visual recognition
Word2Vec: Word Embeddings
Learning dense vector representations of words from text
World of Bits
Open-domain platform for web-based reinforcement learning agents