#nlp

15 pages

The Annotated Transformer

Line-by-line PyTorch implementation of the Transformer architecture

Attention Is All You Need

The 2017 paper that introduced the Transformer architecture

Neural Machine Translation by Jointly Learning to Align and Translate

The paper that introduced the attention mechanism for sequence-to-sequence models

BERT: Bidirectional Transformers

Pre-training deep bidirectional representations for NLP

Chain-of-Thought Prompting

Eliciting step-by-step reasoning in language models for complex problem solving

CLIP: Contrastive Language-Image Pre-training

Learning visual concepts from natural language supervision

GPT: Generative Pre-Training

Autoregressive language models that learn to predict the next token

In-Context Learning

How large language models learn from examples in the prompt without weight updates

RLHF: Reinforcement Learning from Human Feedback

Fine-tuning language models to prefer responses that humans rank higher

The Unreasonable Effectiveness of Recurrent Neural Networks

Andrej Karpathy's influential blog post demonstrating RNN capabilities through character-level generation

Sequence to Sequence Learning

Encoder-decoder architecture for mapping sequences to sequences

Scaling Laws for Neural Language Models

Why bigger models, more data, and more compute lead to predictable gains

Transformer

Self-attention models that process sequences in parallel

Understanding LSTM Networks

Christopher Olah's visual guide to Long Short-Term Memory networks

Word2Vec: Word Embeddings

Learning dense vector representations of words from text