Topics

58 articles across 84 tags

Tags

agents agi aixi algorithm alignment architecture attention autoencoder automata benchmark boosting chemistry classic cnn combinatorial complexity compression computation computer-vision course ctc decision-making deep-learning differentiable diffusion distributed efficiency embeddings emergent-abilities encoder-decoder ensemble entropy foundation-models fundamentals game-theory games generative gnn graphs imagenet inference information-theory intelligence language-models latent-space long-context lstm machine-learning matching mdl memory model-selection multimodal neural-networks nlp optimization paper parallelism physics policy-gradient pre-training prompting reasoning regularization reinforcement-learning representation-learning residual rnn scaling segmentation seq2seq sequence-modeling sets speech tabular-data theory training transformer transformers translation trees vae visual-qa web-agents

All Articles

Adaptive learning rates with momentum for deep learning

The deep CNN that won ImageNet 2012 and sparked the deep learning revolution

The Annotated Transformer

Line-by-line PyTorch implementation of the Transformer architecture

Attention Is All You Need

The 2017 paper that introduced the Transformer architecture

Backpropagation

The algorithm that enables neural networks to learn by computing gradients efficiently

Neural Machine Translation by Jointly Learning to Align and Translate

The paper that introduced the attention mechanism for sequence-to-sequence models

BERT: Bidirectional Transformers

Pre-training deep bidirectional representations for NLP

Batch Normalization

Normalizing layer inputs to accelerate deep network training

Chain-of-Thought Prompting

Eliciting step-by-step reasoning in language models for complex problem solving

CLIP: Contrastive Language-Image Pre-training

Learning visual concepts from natural language supervision

Quantifying the Rise and Fall of Complexity in Closed Systems

The Coffee Automaton paper formalizing how complexity peaks then declines

The First Law of Complexodynamics

Why complexity rises then falls while entropy only increases

CS231n: CNNs for Visual Recognition

Stanford's foundational course on deep learning for computer vision

Deep Speech 2: End-to-End Speech Recognition

Scaling up end-to-end speech recognition with RNNs and CTC

Diffusion Models

Generative models that learn to denoise, enabling high-quality image and video synthesis

Multi-Scale Context Aggregation by Dilated Convolutions

Expanding receptive fields exponentially without losing resolution or adding parameters

Deep Q-Networks (DQN)

Combining Q-learning with deep neural networks for Atari-level game playing

Dropout: Regularization for Neural Networks

Randomly dropping units during training to prevent overfitting

Generative Adversarial Networks

Two neural networks compete to generate realistic data

Gradient Boosted Decision Trees

Sequential tree ensembles optimized via gradient descent

GPipe: Easy Scaling with Micro-Batch Pipeline Parallelism

Training giant neural networks by pipelining micro-batches across devices

GPT: Generative Pre-Training

Autoregressive language models that learn to predict the next token

In-Context Learning

How large language models learn from examples in the prompt without weight updates

Kolmogorov Complexity and Algorithmic Randomness

Measuring how short the best description of an object can be

Layer Normalization

Normalizing each example across its features

Latent Diffusion Models

High-resolution image generation by diffusing in learned latent spaces

Machine Super Intelligence

Shane Legg's PhD thesis formalizing universal intelligence and the AIXI agent

Mamba: State Space Models

A sequence model that keeps a running state instead of attending to every token pair

Maximum Likelihood Reinforcement Learning (MaxRL)

A recent idea for training models on pass-fail tasks when sampling matters

A Tutorial Introduction to the Minimum Description Length Principle

Grünwald's comprehensive guide to MDL for model selection and learning

Keeping Neural Networks Simple by Minimizing the Description Length of the Weights

Hinton's MDL approach to neural network regularization through noisy weights

Neural Message Passing for Quantum Chemistry

A unified framework for graph neural networks applied to molecular property prediction

Neural Turing Machines

Neural networks augmented with external memory and attention-based read/write heads

NODE (Neural Oblivious Decision Ensembles)

Differentiable decision trees and oblivious ensembles for tabular learning

Order Matters: Sequence to Sequence for Sets

How input and output ordering affects seq2seq learning on set-structured data

Pointer Networks

Neural architecture that outputs pointers to input positions, enabling variable-size outputs

Policy Gradient Methods

Directly optimizing policies through gradient ascent on expected returns

Proximal Policy Optimization (PPO)

A stable, sample-efficient policy gradient algorithm for reinforcement learning

The stage where a model learns broad patterns from a very large dataset

Recursive Language Models

A paradigm where LLMs treat context as an environment and recursively call themselves on sub-problems

Reinforcement Learning

Learning by trial and error through rewards

A Simple Neural Network Module for Relational Reasoning

Relation Networks for learning to reason about object relationships

Identity Mappings in Deep Residual Networks

Pre-activation ResNet design that enables training of 1000+ layer networks

Relational Recurrent Neural Networks

RNNs with relational memory that enables reasoning across time

Deep residual learning with skip connections that enabled training of 152+ layer networks

RLHF: Reinforcement Learning from Human Feedback

Teaching language models to prefer responses that people rank higher

The Unreasonable Effectiveness of Recurrent Neural Networks

Andrej Karpathy's influential blog post demonstrating RNN capabilities through character-level generation

Recurrent Neural Network Regularization

How to apply dropout to LSTMs without disrupting memory dynamics

Sequence to Sequence Learning

Encoder-decoder architecture for mapping sequences to sequences

Scaling Laws for Neural Language Models

Why bigger models, more data, and more compute lead to predictable gains

Stable Marriage Problem

Finding a stable matching with the Gale-Shapley deferred acceptance algorithm

Self-attention models that process sequences in parallel

Understanding LSTM Networks

Christopher Olah's visual guide to Long Short-Term Memory networks

Variational Autoencoder (VAE)

Probabilistic generative model with structured latent space

Vision Transformer (ViT)

Applying Transformers directly to image patches for visual recognition

Variational Lossy Autoencoder

Understanding VAEs as compression systems with a rate-distortion trade-off

Word2Vec: Word Embeddings

Learning dense vector representations of words from text

Open-domain platform for web-based reinforcement learning agents