A Simple Neural Network Module for Relational Reasoning

Relation Networks for learning to reason about object relationships

The paper *A Simple Neural Network Module for Relational Reasoning* (Santoro et al., 2017) introduced Relation Networks (RNs), a simple but powerful architecture for learning to reason about relationships between objects.

The Problem

Standard neural networks struggle with relational reasoning:

  • “Is object A larger than object B?”
  • “What is between the red and blue objects?”
  • “Are there more circles than squares?”

These require comparing pairs of entities—not easily captured by standard architectures.

Relation Networks

The key insight: explicitly consider all pairs of objects:

$$\text{RN}(O) = f_\phi\left(\sum_{i,j} g_\theta(o_i, o_j)\right)$$

where:

  • $o_i, o_j$ are object representations
  • $g_\theta$ processes each pair (the “relation” function)
  • $f_\phi$ aggregates all pairwise relations
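The formula above can be sketched directly in code. This is a minimal toy version, assuming nothing beyond the equation itself: the names `relation_network`, `g`, and `f` are illustrative stand-ins (the paper uses small MLPs for $g_\theta$ and $f_\phi$; here they are tanh/linear maps so the example runs without a deep-learning framework).

```python
import numpy as np

def relation_network(objects, g, f):
    # RN(O) = f( sum over all ordered pairs (i, j) of g(o_i, o_j) ).
    # g and f are any callables; small MLPs in the paper.
    pair_sum = sum(
        g(np.concatenate([o_i, o_j]))
        for o_i in objects
        for o_j in objects
    )
    return f(pair_sum)

# Toy g and f standing in for the MLPs.
rng = np.random.default_rng(0)
W_g = rng.normal(size=(8, 6))         # pair of 3-dim objects -> 8-dim relation
W_f = rng.normal(size=(2, 8))         # aggregated relation -> 2-dim output
g = lambda pair: np.tanh(W_g @ pair)
f = lambda agg: W_f @ agg

objects = [rng.normal(size=3) for _ in range(4)]   # n = 4 objects
out = relation_network(objects, g, f)
print(out.shape)  # (2,)
```

Because the sum runs over *all* ordered pairs, reversing the object list gives exactly the same output, which is the permutation invariance discussed below.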

Interactive Demo

Explore relational reasoning on simple visual scenes:

*(Interactive widget: select object pairs in a scene, answer visual-QA questions such as “What color is the object nearest to the red circle?”, and see the RN formula applied step by step.)*

Why Pairs Matter

For $n$ objects, the RN considers all $n^2$ ordered pairs. This:

  • Captures relations regardless of object order
  • Scales to variable numbers of objects
  • Avoids hardcoding specific relations
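Enumerating the pairs is just a Cartesian product of the object set with itself; this small snippet (object names are made up for illustration) shows why the count is exactly $n^2$ and why both orderings of each pair appear:

```python
from itertools import product

objects = ["red_circle", "blue_square", "green_circle"]

# All ordered pairs, including (o, o) self-pairs and both orderings.
pairs = list(product(objects, repeat=2))
print(len(pairs))  # 9 == n**2 for n == 3
```

Including both `(a, b)` and `(b, a)` lets $g_\theta$ learn asymmetric relations such as “left of” without any hardcoding.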

Architecture Details

For visual QA:

  1. CNN extracts feature map from image
  2. Objects = spatial locations in feature map
  3. Question embedding concatenated to each pair
  4. g network (MLP) processes each $(o_i, o_j, q)$ triple
  5. Sum over all pairs
  6. f network (MLP) produces answer
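The six steps above can be traced shape-by-shape. This sketch assumes hypothetical sizes (an 8×8 feature map, 32-dim question embedding, 10 answer classes) and replaces the CNN and the g/f MLPs with random linear maps, so only the data flow of the pipeline is being illustrated:

```python
import numpy as np

H, W, C = 8, 8, 24   # CNN output: 8x8 grid of 24-dim feature cells (assumed)
Q = 32               # question-embedding size (assumed)

rng = np.random.default_rng(1)
feature_map = rng.normal(size=(H, W, C))   # step 1: stand-in for CNN output
question = rng.normal(size=Q)

# Step 2: each spatial cell is an "object", tagged with its coordinates.
objects = [
    np.concatenate([feature_map[r, c], [r / H, c / W]])
    for r in range(H) for c in range(W)
]
D = C + 2   # object dim after coordinate tagging

# Steps 3-5: g over every (o_i, o_j, q) triple, summed over all pairs.
# A scaled linear map stands in for the g MLP.
W_g = rng.normal(size=(16, 2 * D + Q)) * 0.01
agg = sum(W_g @ np.concatenate([o_i, o_j, question])
          for o_i in objects for o_j in objects)

# Step 6: f produces answer logits (10 hypothetical answer classes).
W_f = rng.normal(size=(10, 16)) * 0.01
logits = W_f @ agg
print(logits.shape)  # (10,)
```

With 64 objects this already means 4,096 pairs per question, which is why the quadratic cost noted above matters in practice.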

Results on CLEVR

CLEVR is a visual reasoning benchmark with questions like “What size is the cylinder that is left of the brown metal thing?”

| Model | Accuracy |
| --- | --- |
| CNN + LSTM | 42.7% |
| CNN + LSTM + Attention | 68.5% |
| Relation Network | 95.5% |
| Human | 92.6% |

RNs achieved superhuman performance on this benchmark!

Key Properties

Permutation invariant: Summing over pairs is order-independent

Relation-centric: Explicitly models pairwise interactions

Data efficient: Strong inductive bias for relational tasks
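The permutation-invariance property is easy to verify numerically. In this self-contained check, `g` is a deliberately *asymmetric* toy relation function, showing that invariance comes from summing over all ordered pairs, not from any symmetry of g itself:

```python
import numpy as np

rng = np.random.default_rng(2)
objects = [rng.normal(size=4) for _ in range(5)]
g = lambda a, b: np.tanh(a + 2.0 * b)   # toy, asymmetric: g(a, b) != g(b, a)

def rn_sum(objs):
    # The RN aggregation step: sum g over all ordered pairs.
    return sum(g(a, b) for a in objs for b in objs)

# Reordering the object list leaves the pairwise sum unchanged.
print(np.allclose(rn_sum(objects), rn_sum(objects[::-1])))  # True
```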

Beyond Vision

RNs also improved:

  • Text QA (bAbI dataset)
  • Physical reasoning (predicting dynamics)
  • Graph problems (when combined with GNNs)

Connection to Attention

Self-attention can be viewed as a form of relation network:

$$\text{Attention}(Q, K, V)_i \approx \sum_j \text{softmax}(q_i \cdot k_j)\, v_j$$

Both aggregate pairwise interactions.
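The analogy can be made concrete: the loop below computes (unscaled, single-head) self-attention as an explicit sum over pairwise interactions, i.e. an RN whose pair function is the softmax-weighted value. Sizes and the random Q/K/V matrices are illustrative only:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(3)
n, d = 4, 6
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))

# Row i of the output is a weighted sum over pairwise interactions (i, j):
# structurally an RN, but with learned softmax weights instead of a uniform sum.
out = np.stack([
    sum(softmax(Q[i] @ K.T)[j] * V[j] for j in range(n))
    for i in range(n)
])
print(out.shape)  # (4, 6)
```

The key difference: an RN weights every pair equally and lets $g_\theta$ do the work, while attention learns data-dependent weights per pair.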

Key Paper

Santoro, A., Raposo, D., Barrett, D. G. T., Malinowski, M., Pascanu, R., Battaglia, P., & Lillicrap, T. (2017). “A Simple Neural Network Module for Relational Reasoning.” NeurIPS 2017. arXiv:1706.01427.
