Neural Turing Machines

Neural networks augmented with external memory and attention-based read/write heads

Neural Turing Machines (NTMs) augment neural networks with external memory, allowing them to learn algorithms that require explicit storage and retrieval. They represent a step toward neural networks that can reason like computers.

Motivation

Standard neural networks store information only implicitly, in their weights and activations, which gives them limited and hard-to-address memory. Computers, by contrast, have explicit addressable memory. NTMs bridge this gap.

Architecture

An NTM consists of:

  1. Controller: Neural network (LSTM or feedforward)
  2. Memory Bank: An $N \times M$ matrix ($N$ locations, each an $M$-dimensional vector)
  3. Read Head: Retrieves information from memory
  4. Write Head: Modifies memory contents

Reading from Memory

The read head produces attention weights $w_t$ over memory locations:

$$r_t = \sum_{i=1}^{N} w_t(i) \cdot M_t(i)$$

The read vector $r_t$ is a weighted sum of memory rows.
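As a concrete illustration, here is a minimal NumPy sketch of the read operation (the function name and array shapes are illustrative assumptions, not taken from any reference implementation):

```python
import numpy as np

def read(memory: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Read vector r_t as an attention-weighted sum of memory rows.

    memory: (N, M) array, one M-dimensional vector per location
    w:      (N,) non-negative attention weights that sum to 1
    """
    return w @ memory  # shape (M,), i.e. sum_i w[i] * memory[i]
```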

Writing to Memory

Writing combines erase and add operations:

$$\tilde{M}_t(i) = M_{t-1}(i) \cdot [1 - w_t(i) \cdot e_t]$$

$$M_t(i) = \tilde{M}_t(i) + w_t(i) \cdot a_t$$

The erase vector $e_t$ clears old content, and the add vector $a_t$ writes new content; both effects are scaled by the attention weight $w_t(i)$ at each location.
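A matching NumPy sketch of the erase-then-add write, under the same illustrative conventions as the read example above:

```python
import numpy as np

def write(memory: np.ndarray, w: np.ndarray,
          erase: np.ndarray, add: np.ndarray) -> np.ndarray:
    """Return M_t given M_{t-1}, attention weights, and erase/add vectors.

    memory: (N, M) previous memory state M_{t-1}
    w:      (N,) attention weights
    erase:  (M,) erase vector e_t, entries in [0, 1]
    add:    (M,) add vector a_t
    """
    # Erase: scale each row down by its weight times the erase vector.
    erased = memory * (1.0 - np.outer(w, erase))
    # Add: each row receives its weighted share of the add vector.
    return erased + np.outer(w, add)
```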

Addressing Mechanisms

NTMs address memory through a differentiable attention mechanism with four stages:

1. Content Addressing

Compare a key vector $k_t$ to each memory row using cosine similarity $K$, scaled by a key strength $\beta_t$:

$$w_t^c(i) = \frac{\exp(\beta_t \cdot K(k_t, M_t(i)))}{\sum_j \exp(\beta_t \cdot K(k_t, M_t(j)))}$$

2. Interpolation

Blend the content-based weights with the previous time step's weights using an interpolation gate $g_t$:

$$w_t^g = g_t \cdot w_t^c + (1 - g_t) \cdot w_{t-1}$$

3. Convolutional Shift

A shift distribution $s_t$ rotates the weights by circular convolution, enabling location-based addressing such as stepping to the next memory slot.

4. Sharpening

Raise each weight to a power $\gamma_t \ge 1$ and renormalize, re-focusing attention that the shift may have blurred.
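Putting the four stages together, here is a NumPy sketch of the addressing pipeline. Treating the shift distribution as a length-$N$ vector and the exact parameter handling are simplifying assumptions for illustration:

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())
    return e / e.sum()

def address(memory, w_prev, k, beta, g, s, gamma):
    """Compute attention weights via content addressing, interpolation,
    circular shift, and sharpening.

    memory: (N, M) memory matrix      w_prev: (N,) previous weights
    k: (M,) key vector                beta:   key strength (>= 0)
    g: interpolation gate in [0, 1]   s:      (N,) shift distribution
    gamma: sharpening exponent (>= 1)
    """
    # 1. Content addressing: softmax over scaled cosine similarities.
    sim = memory @ k / (np.linalg.norm(memory, axis=1) * np.linalg.norm(k) + 1e-8)
    w_c = softmax(beta * sim)

    # 2. Interpolation with the previous time step's weights.
    w_g = g * w_c + (1.0 - g) * w_prev

    # 3. Circular convolution with the shift distribution s.
    N = len(w_g)
    w_shift = np.array([sum(w_g[j] * s[(i - j) % N] for j in range(N))
                        for i in range(N)])

    # 4. Sharpening: raise to gamma and renormalize.
    w_sharp = w_shift ** gamma
    return w_sharp / w_sharp.sum()
```

In the full model the controller emits the parameters $k_t, \beta_t, g_t, s_t, \gamma_t$ (and $e_t, a_t$ for writes) at every step, and every stage is differentiable, so the whole system can be trained end to end with gradient descent.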

What NTMs Can Learn

The original paper demonstrated that NTMs can learn simple algorithmic tasks:

  • Copy: Reproduce an arbitrary input sequence
  • Repeat Copy: Reproduce a sequence a specified number of times
  • Associative Recall: Retrieve a stored value given its key
  • Dynamic N-Grams: Predict the next symbol from recent context
  • Priority Sort: Reorder inputs by their priority values

Legacy

NTMs pioneered ideas now central to modern AI:

  • External memory augmentation
  • Differentiable attention mechanisms
  • Content-based retrieval

These concepts were developed further in the Differentiable Neural Computer (DNC), appeared in related work such as Memory Networks, and influenced the attention mechanisms at the heart of Transformers.

Key Paper
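
Graves, A., Wayne, G., & Danihelka, I. (2014). Neural Turing Machines. arXiv:1410.5401.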
