The Annotated Transformer
Line-by-line PyTorch implementation of the Transformer architecture
Neural Machine Translation by Jointly Learning to Align and Translate
The paper that introduced the attention mechanism for sequence-to-sequence models
Neural Turing Machines
Neural networks augmented with external memory and attention-based read/write heads
Order Matters: Sequence to Sequence for Sets
How input and output ordering affects seq2seq learning on set-structured data
Pointer Networks
Neural architecture that outputs pointers to input positions, enabling variable-size outputs (a sketch of the pointing mechanism follows the list)
A Simple Neural Network Module for Relational Reasoning
Relation Networks for learning to reason about object relationships
Relational Recurrent Neural Networks
RNNs with relational memory that enables reasoning across time
Attention Is All You Need
The Transformer architecture that replaced recurrence with self-attention (sketched below)
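
Since the list centers on attention, here is a minimal sketch of the scaled dot-product attention at the core of "Attention Is All You Need", written in PyTorch to match the Annotated Transformer's setting. The function name, tensor shapes, and mask convention are illustrative assumptions, not the Annotated Transformer's exact code.

```python
# Minimal sketch of scaled dot-product attention (assumed shapes and mask convention).
import math
import torch

def scaled_dot_product_attention(query, key, value, mask=None):
    """query, key, value: (batch, seq_len, d_k); mask broadcastable to (batch, seq_len, seq_len)."""
    d_k = query.size(-1)
    # Similarity of every query position to every key position, scaled by sqrt(d_k).
    scores = query @ key.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        # Positions where mask == 0 are excluded from the softmax.
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)  # attention distribution over key positions
    return weights @ value, weights

if __name__ == "__main__":
    q = k = v = torch.randn(2, 5, 16)  # self-attention: queries, keys, values share a source
    out, attn = scaled_dot_product_attention(q, k, v)
    print(out.shape, attn.shape)  # torch.Size([2, 5, 16]) torch.Size([2, 5, 5])
```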
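
The pointing mechanism from "Pointer Networks" can be sketched in the same style: additive (Bahdanau-style) attention scores over the encoder positions are softmaxed and used directly as the output distribution, so the set of possible outputs grows with the input length. The module name, layer names, and sizes below are illustrative assumptions, not the paper's reference implementation.

```python
# Minimal sketch of a pointer-attention head (assumed names and dimensions).
import torch
import torch.nn as nn

class PointerAttention(nn.Module):
    def __init__(self, hidden_dim):
        super().__init__()
        self.w_enc = nn.Linear(hidden_dim, hidden_dim, bias=False)  # projects encoder states
        self.w_dec = nn.Linear(hidden_dim, hidden_dim, bias=False)  # projects decoder state
        self.v = nn.Linear(hidden_dim, 1, bias=False)               # scores each input position

    def forward(self, encoder_states, decoder_state):
        # encoder_states: (batch, src_len, hidden); decoder_state: (batch, hidden)
        scores = self.v(torch.tanh(self.w_enc(encoder_states) +
                                   self.w_dec(decoder_state).unsqueeze(1))).squeeze(-1)
        # Softmax over source positions: the result is a "pointer" into the input sequence.
        return torch.softmax(scores, dim=-1)

if __name__ == "__main__":
    enc = torch.randn(2, 7, 32)        # 7 input positions
    dec = torch.randn(2, 32)
    ptr = PointerAttention(32)(enc, dec)
    print(ptr.shape)                    # torch.Size([2, 7]): distribution over input positions
```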