Neural Turing Machines

Neural networks augmented with external memory and attention-based read/write heads

Neural Turing Machines (NTMs) augment neural networks with external memory, allowing them to learn algorithms that require explicit storage and retrieval. They represent a step toward neural networks that can reason like computers.

Motivation

Standard neural networks store information only implicitly, in their weights and activations, which gives them limited and hard-to-address memory. Computers, by contrast, have explicit addressable memory. NTMs bridge this gap.

Architecture

An NTM consists of:

  1. Controller: Neural network (LSTM or feedforward)
  2. Memory Bank: An $N \times M$ matrix ($N$ locations, each an $M$-dimensional vector)
  3. Read Head: Retrieves information from memory
  4. Write Head: Modifies memory contents

Reading from Memory

The read head produces attention weights $w_t$ over memory locations:

$$r_t = \sum_{i=1}^{N} w_t(i) \cdot M_t(i)$$

The read vector $r_t$ is a weighted sum of memory rows.
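As a concrete illustration, here is a minimal NumPy sketch of the read operation (the function name and array shapes are illustrative assumptions, not taken from any reference implementation):

```python
import numpy as np

def read(memory: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Read vector r_t as an attention-weighted sum of memory rows.

    memory: (N, M) array, one M-dimensional vector per location
    w:      (N,) non-negative attention weights that sum to 1
    """
    return w @ memory  # shape (M,), i.e. sum_i w[i] * memory[i]
```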

Writing to Memory

Writing combines erase and add operations:

$$\tilde{M}_t(i) = M_{t-1}(i) \cdot [1 - w_t(i) \cdot e_t]$$

$$M_t(i) = \tilde{M}_t(i) + w_t(i) \cdot a_t$$

The erase vector $e_t$ clears old content, and the add vector $a_t$ writes new content; both effects are scaled by the attention weight $w_t(i)$ at each location.
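A matching NumPy sketch of the erase-then-add write, under the same illustrative conventions as the read example above:

```python
import numpy as np

def write(memory: np.ndarray, w: np.ndarray,
          erase: np.ndarray, add: np.ndarray) -> np.ndarray:
    """Return M_t given M_{t-1}, attention weights, and erase/add vectors.

    memory: (N, M) previous memory state M_{t-1}
    w:      (N,) attention weights
    erase:  (M,) erase vector e_t, entries in [0, 1]
    add:    (M,) add vector a_t
    """
    # Erase: scale each row down by its weight times the erase vector.
    erased = memory * (1.0 - np.outer(w, erase))
    # Add: each row receives its weighted share of the add vector.
    return erased + np.outer(w, add)
```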

Addressing Mechanisms

NTMs address memory through a differentiable attention mechanism with four stages:

1. Content Addressing

Compare a key vector $k_t$ to each memory row using cosine similarity $K$, scaled by a key strength $\beta_t$:

$$w_t^c(i) = \frac{\exp(\beta_t \cdot K(k_t, M_t(i)))}{\sum_j \exp(\beta_t \cdot K(k_t, M_t(j)))}$$

2. Interpolation

Blend the content-based weights with the previous time step's weights using an interpolation gate $g_t$:

$$w_t^g = g_t \cdot w_t^c + (1 - g_t) \cdot w_{t-1}$$

3. Convolutional Shift

A shift distribution $s_t$ rotates the weights by circular convolution, enabling location-based addressing such as stepping to the next memory slot.

4. Sharpening

Raise each weight to a power $\gamma_t \ge 1$ and renormalize, re-focusing attention that the shift may have blurred.
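Putting the four stages together, here is a NumPy sketch of the addressing pipeline. Treating the shift distribution as a length-$N$ vector and the exact parameter handling are simplifying assumptions for illustration:

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())
    return e / e.sum()

def address(memory, w_prev, k, beta, g, s, gamma):
    """Compute attention weights via content addressing, interpolation,
    circular shift, and sharpening.

    memory: (N, M) memory matrix      w_prev: (N,) previous weights
    k: (M,) key vector                beta:   key strength (>= 0)
    g: interpolation gate in [0, 1]   s:      (N,) shift distribution
    gamma: sharpening exponent (>= 1)
    """
    # 1. Content addressing: softmax over scaled cosine similarities.
    sim = memory @ k / (np.linalg.norm(memory, axis=1) * np.linalg.norm(k) + 1e-8)
    w_c = softmax(beta * sim)

    # 2. Interpolation with the previous time step's weights.
    w_g = g * w_c + (1.0 - g) * w_prev

    # 3. Circular convolution with the shift distribution s.
    N = len(w_g)
    w_shift = np.array([sum(w_g[j] * s[(i - j) % N] for j in range(N))
                        for i in range(N)])

    # 4. Sharpening: raise to gamma and renormalize.
    w_sharp = w_shift ** gamma
    return w_sharp / w_sharp.sum()
```

In the full model the controller emits the parameters $k_t, \beta_t, g_t, s_t, \gamma_t$ (and $e_t, a_t$ for writes) at every step, and every stage is differentiable, so the whole system can be trained end to end with gradient descent.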

What NTMs Can Learn

The original paper demonstrated that NTMs can learn simple algorithmic tasks:

  • Copy: Reproduce an arbitrary input sequence
  • Repeat Copy: Reproduce a sequence a specified number of times
  • Associative Recall: Retrieve a stored value given its key
  • Dynamic N-Grams: Predict the next symbol from recent context
  • Priority Sort: Reorder inputs by their priority values

Legacy

NTMs pioneered ideas now central to modern AI:

  • External memory augmentation
  • Differentiable attention mechanisms
  • Content-based retrieval

These concepts were developed further in the Differentiable Neural Computer (DNC), appeared in related work such as Memory Networks, and influenced the attention mechanisms at the heart of Transformers.

Key Paper
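
Graves, A., Wayne, G., & Danihelka, I. (2014). Neural Turing Machines. arXiv:1410.5401.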
