RNNs with relational memory that enables reasoning across time
Relational Recurrent Neural Networks combine the temporal processing of RNNs with the relational reasoning of attention mechanisms. The result is a memory system in which stored memories can interact with one another.
Motivation
Standard LSTMs pack all of their state into a single memory cell, with no mechanism for the pieces of that state to interact. Complex reasoning, however, requires:
- Multiple pieces of information stored simultaneously
- Interactions between stored memories
- Dynamic retrieval based on relationships
Relational Memory Core (RMC)
The RMC maintains a matrix of memory slots $M$ (one slot per row) that interact via attention:

$$\tilde{M} = \mathrm{MHDPA}(M,\ [M; x])$$

where MHDPA is Multi-Head Dot Product Attention, $x$ is the current input, and $[M; x]$ is the memory matrix with the input appended as an extra row.
Key Innovation
Memories attend to each other, not just to inputs. Queries are computed from the memory slots alone, while keys and values are computed from the memories concatenated with the input:

$$Q = M W^q, \qquad K = [M; x] W^k, \qquad V = [M; x] W^v, \qquad \tilde{M} = \mathrm{softmax}\!\left(\tfrac{Q K^\top}{\sqrt{d_k}}\right) V$$

This allows reasoning about relationships between stored facts.
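A minimal NumPy sketch of this attention pattern, with a single head and toy dimensions (weight names and sizes are illustrative, not the paper's code):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

num_slots, d_model = 4, 8                    # toy sizes
rng = np.random.default_rng(0)

M = rng.normal(size=(num_slots, d_model))    # memory matrix, one slot per row
x = rng.normal(size=(1, d_model))            # current input, already projected
Mx = np.concatenate([M, x], axis=0)          # [M; x]: memories plus the input row

# One attention head for brevity; the RMC uses several heads in parallel.
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))

Q = M @ Wq                                   # queries come from the memories only
K = Mx @ Wk                                  # keys and values come from the
V = Mx @ Wv                                  # memories AND the current input
M_tilde = softmax(Q @ K.T / np.sqrt(d_model)) @ V

print(M_tilde.shape)                         # (4, 8): one candidate per memory slot
```

Because each memory row's query scores every other memory row (as well as the input), information already stored in one slot can be routed into another slot at every timestep.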
Gating Mechanism
Like LSTMs, the RMC uses input and forget gates for stable updates. Each memory slot is a gated blend of its previous value and the attended candidate:

$$M_t = \sigma(f_t) \odot M_{t-1} + \sigma(i_t) \odot \tanh(\tilde{M}_t)$$

where the gates $f_t$ and $i_t$ are computed from the current input and the previous memory. This prevents catastrophic forgetting of important memories.
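A short sketch of that gated blend, assuming per-slot input and forget gates computed from the current input and the previous memory (names and shapes are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

num_slots, d_model = 4, 8
rng = np.random.default_rng(1)

M_prev  = rng.normal(size=(num_slots, d_model))   # memory from the previous step
M_tilde = rng.normal(size=(num_slots, d_model))   # attended candidate memory
x       = rng.normal(size=(d_model,))             # current (projected) input

Wf, Uf, Wi, Ui = (rng.normal(size=(d_model, d_model)) for _ in range(4))

f = sigmoid(x @ Wf + np.tanh(M_prev) @ Uf)        # forget gate, per slot element
i = sigmoid(x @ Wi + np.tanh(M_prev) @ Ui)        # input gate, per slot element

# Keep what the forget gate protects; write what the input gate admits.
M_next = f * M_prev + i * np.tanh(M_tilde)
```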
Architecture
Input → Linear projection → Concatenate with memories
→ Multi-head self-attention over all slots
→ MLP (residual)
→ Gated update
→ Output from attended memories
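Putting the pieces together, here is a self-contained sketch of one RMC step plus the recurrent loop that carries the memory across timesteps (single attention head, random weights, toy sizes; an illustration of the architecture above rather than the paper's implementation):

```python
import numpy as np

NUM_SLOTS, D = 4, 8                           # toy sizes
rng = np.random.default_rng(42)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def layer_norm(z, eps=1e-5):
    return (z - z.mean(axis=-1, keepdims=True)) / (z.std(axis=-1, keepdims=True) + eps)

# Randomly initialized weights; one attention head to keep the sketch short.
P = {name: rng.normal(scale=0.1, size=(D, D))
     for name in ["Win", "Wq", "Wk", "Wv", "W1", "W2", "Wf", "Uf", "Wi", "Ui"]}

def rmc_step(M, x_raw):
    """One step: project input, attend over [M; x], residual MLP, gated write."""
    x = x_raw @ P["Win"]                      # linear projection of the input
    Mx = np.vstack([M, x[None, :]])           # concatenate input as an extra row

    # Self-attention over all slots: queries from M, keys/values from [M; x].
    Q, K, V = M @ P["Wq"], Mx @ P["Wk"], Mx @ P["Wv"]
    att = softmax(Q @ K.T / np.sqrt(D)) @ V
    h = layer_norm(M + att)                   # residual connection + layer norm

    # Row-wise MLP with a second residual connection.
    M_tilde = layer_norm(h + np.tanh(h @ P["W1"]) @ P["W2"])

    # LSTM-style gated update of the memory slots.
    f = sigmoid(x @ P["Wf"] + np.tanh(M) @ P["Uf"])
    i = sigmoid(x @ P["Wi"] + np.tanh(M) @ P["Ui"])
    return f * M + i * np.tanh(M_tilde)

# Recurrent loop: the memory matrix is the state carried across timesteps.
M = np.zeros((NUM_SLOTS, D))
for x_t in rng.normal(size=(10, D)):          # a sequence of ten toy inputs
    M = rmc_step(M, x_t)

output = M.reshape(-1)                        # output read from the attended memories
print(output.shape)                           # (32,)
```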
Results
Language Modeling (WikiText-103)
| Model | Perplexity (lower is better) |
|---|---|
| LSTM | 48.7 |
| Transformer | 44.1 |
| Relational Memory | 31.6 |
Nth Farthest Task
Task: given a set of labeled vectors presented as a sequence, report the label of the Nth farthest vector (by Euclidean distance) from a query vector.
| Model | Accuracy |
|---|---|
| LSTM | 17% |
| DNC (Differentiable Neural Computer) | 37% |
| RMC | 91% |
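To make the task concrete, here is a small sketch that generates one instance and computes the target answer (toy sizes; the input encoding used in the paper is omitted):

```python
import numpy as np

rng = np.random.default_rng(7)

num_vectors, dim, n = 8, 2, 3                  # toy sizes; here "find the 3rd farthest"
vectors = rng.uniform(-1.0, 1.0, size=(num_vectors, dim))
query = rng.integers(num_vectors)              # reference vector, identified by label

# Ground truth: label of the vector that is nth farthest from the query vector.
dists = np.linalg.norm(vectors - vectors[query], axis=1)
farthest_first = np.argsort(-dists)            # labels ordered farthest -> nearest
target = farthest_first[n - 1]

print(f"query label {query}: the {n}th-farthest vector has label {target}")
```

A sequential model sees the vectors one at a time, so it must store all of them and then compare stored items against each other to rank distances, which is exactly the memory-memory interaction the RMC supplies.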
Why It Works
- Multiple memories: Can store several facts
- Memory interaction: Facts can “talk” to each other
- Attention routing: Dynamic retrieval based on relevance
- Temporal integration: Processes sequences naturally
Connection to Transformers
The RMC shares key ingredients with the Transformer:
- Multi-head attention
- Residual connections
- Layer normalization
The main difference: the RMC processes sequences recurrently, carrying a fixed-size memory across timesteps, while Transformers attend over the whole sequence in parallel.
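A shape-level sketch of the contrast (toy sizes; the sequential update is a stand-in for the full RMC step sketched earlier):

```python
import numpy as np

T, D, SLOTS = 10, 8, 4
rng = np.random.default_rng(3)
X = rng.normal(size=(T, D))                    # an entire input sequence

# Transformer-style: one parallel self-attention pass over all T positions.
scores = X @ X.T / np.sqrt(D)                  # (T, T) pairwise interactions at once
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
parallel_out = weights @ X                     # (T, D), computed in one shot

# RMC-style: T sequential steps, each touching only the current input
# and a fixed-size (SLOTS, D) memory carried forward in time.
M = np.zeros((SLOTS, D))
for x_t in X:
    M = 0.5 * M + 0.5 * np.tanh(x_t)           # placeholder for a full RMC step
```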
Key Paper
- Relational Recurrent Neural Networks — Santoro et al. (2018)
https://arxiv.org/abs/1806.01822