Pointer Networks

Neural architecture that outputs pointers to input positions, enabling variable-size outputs

Pointer Networks address a fundamental limitation of sequence-to-sequence models: a standard seq2seq decoder can only emit tokens from a fixed vocabulary. Pointer Networks instead output pointers to input positions, enabling tasks where the output size depends on the input.

The Problem

Standard seq2seq models produce outputs from a fixed dictionary:

P(y_i | y_1, ..., y_{i-1}, x) = \text{softmax}(W \cdot h_i)

But what if the output should reference input positions? For convex hull, the output is a sequence of indices into the input, so the effective vocabulary size equals the input size.
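
To make the limitation concrete, here is a minimal sketch of the standard output head (PyTorch, with illustrative sizes): the logits always have the dimensionality of the fixed vocabulary, no matter how long the input is.

```python
import torch
import torch.nn as nn

# Standard seq2seq output head: a fixed projection onto a fixed vocabulary.
vocab_size, hidden_size = 10_000, 256
W = nn.Linear(hidden_size, vocab_size)      # logits always have shape (vocab_size,)

h_i = torch.randn(1, hidden_size)           # decoder hidden state at step i
p = torch.softmax(W(h_i), dim=-1)           # P(y_i | y_1..y_{i-1}, x) over 10,000 fixed tokens

# For convex hull, the "vocabulary" is the set of input positions 1..n, and n
# changes from example to example, so no fixed W can represent it.
```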

The Pointer Mechanism

Instead of projecting to a fixed vocabulary, use attention over inputs as the output:

u_j^i = v^T \tanh(W_1 e_j + W_2 d_i)

P(y_i | y_1, ..., y_{i-1}, x) = \text{softmax}(u^i)

The softmax-normalized attention weights directly become the output distribution over input positions.
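
A minimal sketch of this scoring function (PyTorch; the module names W1, W2, and v mirror the equation and are otherwise illustrative):

```python
import torch
import torch.nn as nn

hidden = 256
W1 = nn.Linear(hidden, hidden, bias=False)   # applied to each encoder state e_j
W2 = nn.Linear(hidden, hidden, bias=False)   # applied to the decoder state d_i
v = nn.Linear(hidden, 1, bias=False)         # plays the role of v^T

def pointer_distribution(e, d_i):
    """e: (n, hidden) encoder states; d_i: (hidden,) decoder state.
    Returns P(y_i = j) over the n input positions."""
    u = v(torch.tanh(W1(e) + W2(d_i))).squeeze(-1)   # u^i, shape (n,)
    return torch.softmax(u, dim=-1)

e = torch.randn(7, hidden)                   # seven encoded input points
d_i = torch.randn(hidden)
p = pointer_distribution(e, d_i)             # one probability per input position
```

The resulting distribution is over exactly n positions, so it grows and shrinks with the input.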

Architecture

  1. Encoder: Process the input sequence (x_1, ..., x_n) to get representations (e_1, ..., e_n)
  2. Decoder: At each step i, produce a hidden state d_i
  3. Pointer: Compute attention over the encoder states and output the highest-attention position (a sketch combining all three steps follows below)
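
Putting the three components together, here is a bare-bones greedy decoder. This is a sketch, not the paper's full training setup: teacher forcing, the learned start symbol, and end-of-sequence handling are omitted, and all sizes and names are illustrative.

```python
import torch
import torch.nn as nn

class PointerNetwork(nn.Module):
    """Minimal greedy Pointer Network sketch; sizes and names are illustrative."""

    def __init__(self, input_dim=2, hidden=256):
        super().__init__()
        self.encoder = nn.LSTM(input_dim, hidden, batch_first=True, bidirectional=True)
        self.project = nn.Linear(2 * hidden, hidden)   # merge forward/backward states
        self.decoder = nn.LSTMCell(input_dim, hidden)
        self.W1 = nn.Linear(hidden, hidden, bias=False)
        self.W2 = nn.Linear(hidden, hidden, bias=False)
        self.v = nn.Linear(hidden, 1, bias=False)

    def forward(self, x, n_steps):
        # 1. Encoder: x is (1, n, input_dim) -> e is (1, n, hidden)
        enc_out, _ = self.encoder(x)
        e = self.project(enc_out)

        # 2. Decoder: start from a zero state and a zero "previous output"
        h = torch.zeros(1, e.size(-1))
        c = torch.zeros(1, e.size(-1))
        prev = torch.zeros(1, x.size(-1))

        pointers = []
        for _ in range(n_steps):
            h, c = self.decoder(prev, (h, c))
            # 3. Pointer: attention scores over the n encoder states
            u = self.v(torch.tanh(self.W1(e) + self.W2(h).unsqueeze(1))).squeeze(-1)  # (1, n)
            j = int(u.argmax(dim=-1))          # greedy choice of an input position
            pointers.append(j)
            prev = x[:, j, :]                  # feed the chosen input point back in
        return pointers

net = PointerNetwork()
print(net(torch.randn(1, 8, 2), n_steps=5))    # five pointers into 8 input points
```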

Interactive Demo

Watch a Pointer Network solve the convex hull problem by pointing to input coordinates:

[Interactive demo: the output is a sequence of pointers to input positions.]

  • Variable output size: output length depends on the input, which is impossible with a fixed vocabulary.
  • Attention as output: the attention weights become the output distribution over the inputs.

Applications

Convex Hull

Given points, output the subset forming the convex hull. Output size varies with input geometry.
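
One way to build training targets for this task is sketched below (assuming SciPy is available; this is illustrative data generation, not necessarily the paper's exact setup). ConvexHull.vertices returns the hull indices in counter-clockwise order for 2-D inputs.

```python
import numpy as np
from scipy.spatial import ConvexHull

# Build one convex-hull training example: inputs are points, targets are
# indices into those points.
points = np.random.rand(10, 2)     # ten random points in the unit square
hull = ConvexHull(points)
targets = hull.vertices            # indices of the hull points, counter-clockwise
print(len(points), "input points ->", len(targets), "output pointers:", targets)
```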

Delaunay Triangulation

Given points, output triangles. Number of triangles depends on point configuration.

Traveling Salesman Problem

Approximate TSP by learning to output city visitation order.

Sorting

Learn to sort sequences by outputting indices in sorted order.
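
Sorting makes the pointer formulation especially easy to see; a training pair can be built in two lines of NumPy (sizes illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(6)        # input: six numbers in [0, 1)
y = np.argsort(x)        # target: the positions that would sort x, i.e. pointers into x
print(x, y)

# The network is trained to emit y one index at a time; the target "vocabulary"
# {0, ..., 5} grows with the input length, which is exactly what pointers allow.
```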

Why Pointers Matter

| Standard Seq2Seq | Pointer Network |
| --- | --- |
| Fixed vocabulary | Input-dependent vocabulary |
| Can't reference input | Output references input |
| Fixed output size | Variable output size |

Key Equations

Encoder (bidirectional LSTM):

e_j = [\overrightarrow{h}_j; \overleftarrow{h}_j]
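
In PyTorch this concatenation comes for free from a bidirectional LSTM (sizes illustrative):

```python
import torch
import torch.nn as nn

# Bidirectional encoder: the output already concatenates the forward and
# backward hidden states along the feature dimension, matching e_j above.
enc = nn.LSTM(input_size=2, hidden_size=128, batch_first=True, bidirectional=True)
x = torch.randn(1, 10, 2)     # ten 2-D input points
e, _ = enc(x)                 # e: (1, 10, 256) = (batch, n, 2 * hidden)
```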

Decoder with attention:

d_i = \text{LSTM}(d_{i-1}, [y_{i-1}; c_{i-1}])

Pointer distribution:

P(y_i = j) = \frac{\exp(u_j^i)}{\sum_k \exp(u_k^i)}
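
A single decode step following the two equations above, as a sketch with illustrative names: the encoder states are assumed to be projected down to size H, and c_prev stands for the attention context carried over from the previous step.

```python
import torch
import torch.nn as nn

H, n, in_dim = 128, 10, 2
cell = nn.LSTMCell(in_dim + H, H)            # decoder input is [y_{i-1}; c_{i-1}]
W1 = nn.Linear(H, H, bias=False)
W2 = nn.Linear(H, H, bias=False)
v = nn.Linear(H, 1, bias=False)

e = torch.randn(1, n, H)                     # encoder states e_1..e_n (projected to H)
y_prev = torch.randn(1, in_dim)              # coordinates of the point chosen at step i-1
c_prev = torch.randn(1, H)                   # attention context from step i-1
d_prev, cell_prev = torch.zeros(1, H), torch.zeros(1, H)

# d_i = LSTM(d_{i-1}, [y_{i-1}; c_{i-1}])
d_i, cell_i = cell(torch.cat([y_prev, c_prev], dim=-1), (d_prev, cell_prev))

# u_j^i = v^T tanh(W1 e_j + W2 d_i), then a softmax over positions j
u = v(torch.tanh(W1(e) + W2(d_i).unsqueeze(1))).squeeze(-1)   # (1, n)
p = torch.softmax(u, dim=-1)                                  # P(y_i = j)
c_i = (p.unsqueeze(-1) * e).sum(dim=1)                        # context fed to step i+1
```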

Influence

Pointer Networks introduced the idea of using attention as output, which influenced:

  • Copy mechanisms in text generation
  • Pointer-generator networks for summarization
  • Graph neural network outputs

Key Paper

Vinyals, O., Fortunato, M., & Jaitly, N. (2015). Pointer Networks. Advances in Neural Information Processing Systems (NeurIPS) 28.
