The Annotated Transformer
A line-by-line PyTorch implementation of the Transformer architecture
Neural Machine Translation by Jointly Learning to Align and Translate
The paper that introduced the attention mechanism for sequence-to-sequence models
The Unreasonable Effectiveness of Recurrent Neural Networks
Andrej Karpathy's influential blog post demonstrating RNN capabilities through character-level generation
Scaling Laws for Neural Language Models
Empirical laws governing how language model performance scales with compute, data, and parameters
Attention Is All You Need
The paper that introduced the Transformer architecture, replacing recurrence entirely with self-attention (sketched after this list)
Understanding LSTM Networks
Christopher Olah's visual guide to Long Short-Term Memory networks
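Several of these entries revolve around the same core operation: scaled dot-product attention from "Attention Is All You Need". As a quick point of reference, here is a minimal PyTorch sketch of that operation; the function name, shapes, and masking convention are illustrative and not taken from any of the linked implementations.

```python
import math
import torch

def scaled_dot_product_attention(query, key, value, mask=None):
    """Compute softmax(Q K^T / sqrt(d_k)) V for batched inputs."""
    d_k = query.size(-1)
    # Similarity scores between every query position and every key position.
    scores = query @ key.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        # Block attention to masked positions (e.g. padding or future tokens).
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    # Each output vector is a weighted average of the value vectors.
    return weights @ value, weights

# Example usage with self-attention: query, key, and value share one sequence.
q = k = v = torch.randn(2, 10, 64)  # (batch, sequence length, d_k)
out, attn = scaled_dot_product_attention(q, k, v)
```

The Annotated Transformer walks through a full implementation built around this operation, including multi-head projections and masking; the sketch above covers only the single-head core.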