The Annotated Transformer
Line-by-line PyTorch implementation of the Transformer architecture
Attention Is All You Need
The 2017 paper that introduced the Transformer architecture
BERT: Bidirectional Transformers
Pre-training deep bidirectional representations for NLP
GPT: Generative Pre-Training
Autoregressive language models that learn to predict the next token
Transformer
A self-attention architecture that processes all positions of a sequence in parallel, without recurrence
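The operation shared by all of the models above is scaled dot-product attention, softmax(QKᵀ/√d_k)V. A minimal sketch in plain Python (lists instead of PyTorch tensors, purely for illustration; the function and variable names are my own, not from any of the listed works):

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.

    Q, K, V are lists of row vectors (lists of floats).
    """
    d_k = len(K[0])
    out = []
    for q in Q:
        # Dot each query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in K]
        weights = softmax(scores)
        # Each output row is a convex combination of the value rows.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out
```

Because the rows of `Q` are processed independently, every position attends to the whole sequence at once, which is what the "in parallel" claim refers to (a real implementation batches this as matrix multiplies on the GPU).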