The Annotated Transformer
Line-by-line PyTorch implementation of the Transformer architecture
Neural Machine Translation by Jointly Learning to Align and Translate
The paper that introduced the attention mechanism for sequence-to-sequence models
Neural Turing Machines
Neural networks augmented with external memory and attention-based read/write heads
Order Matters: Sequence to Sequence for Sets
How input and output ordering affects seq2seq learning on set-structured data
Pointer Networks
Neural architecture that outputs pointers to input positions, enabling variable-size outputs (a sketch of the pointing mechanism follows the list)
A Simple Neural Network Module for Relational Reasoning
Relation Networks for learning to reason about object relationships
Relational Recurrent Neural Networks
RNNs with relational memory that enables reasoning across time
Attention Is All You Need
The Transformer architecture that replaced recurrence with self-attention (sketched below)
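
Since the list centers on attention, here is a minimal sketch of the scaled dot-product attention at the core of "Attention Is All You Need", written in PyTorch to match the Annotated Transformer's setting. The function name, tensor shapes, and mask convention are illustrative assumptions, not the Annotated Transformer's exact code.

```python
# Minimal sketch of scaled dot-product attention (assumed shapes and mask convention).
import math
import torch

def scaled_dot_product_attention(query, key, value, mask=None):
    """query, key, value: (batch, seq_len, d_k); mask broadcastable to (batch, seq_len, seq_len)."""
    d_k = query.size(-1)
    # Similarity of every query position to every key position, scaled by sqrt(d_k).
    scores = query @ key.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        # Positions where mask == 0 are excluded from the softmax.
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)  # attention distribution over key positions
    return weights @ value, weights

if __name__ == "__main__":
    q = k = v = torch.randn(2, 5, 16)  # self-attention: queries, keys, values share a source
    out, attn = scaled_dot_product_attention(q, k, v)
    print(out.shape, attn.shape)  # torch.Size([2, 5, 16]) torch.Size([2, 5, 5])
```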
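
The pointing mechanism from "Pointer Networks" can be sketched in the same style: additive (Bahdanau-style) attention scores over the encoder positions are softmaxed and used directly as the output distribution, so the set of possible outputs grows with the input length. The module name, layer names, and sizes below are illustrative assumptions, not the paper's reference implementation.

```python
# Minimal sketch of a pointer-attention head (assumed names and dimensions).
import torch
import torch.nn as nn

class PointerAttention(nn.Module):
    def __init__(self, hidden_dim):
        super().__init__()
        self.w_enc = nn.Linear(hidden_dim, hidden_dim, bias=False)  # projects encoder states
        self.w_dec = nn.Linear(hidden_dim, hidden_dim, bias=False)  # projects decoder state
        self.v = nn.Linear(hidden_dim, 1, bias=False)               # scores each input position

    def forward(self, encoder_states, decoder_state):
        # encoder_states: (batch, src_len, hidden); decoder_state: (batch, hidden)
        scores = self.v(torch.tanh(self.w_enc(encoder_states) +
                                   self.w_dec(decoder_state).unsqueeze(1))).squeeze(-1)
        # Softmax over source positions: the result is a "pointer" into the input sequence.
        return torch.softmax(scores, dim=-1)

if __name__ == "__main__":
    enc = torch.randn(2, 7, 32)        # 7 input positions
    dec = torch.randn(2, 32)
    ptr = PointerAttention(32)(enc, dec)
    print(ptr.shape)                    # torch.Size([2, 7]): distribution over input positions
```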