Machine Super Intelligence

Shane Legg's PhD thesis formalizing universal intelligence and the AIXI agent

Machine Super Intelligence is Shane Legg's 2008 PhD thesis, supervised by Marcus Hutter. It gives a rigorous mathematical definition of intelligence and shows that the AIXI agent is optimal with respect to that definition.

Defining Intelligence

Legg proposes a formal definition:

\Upsilon(\pi) = \sum_{\mu \in E} 2^{-K(\mu)} V_\mu^\pi

where:

  • π is an agent (policy)
  • μ is an environment
  • E is the set of all computable environments
  • K(μ) is the Kolmogorov complexity of μ, i.e. the length of the shortest program that computes it
  • V_μ^π is the expected reward of π in μ

Intelligence is average performance across all computable environments, weighted by simplicity.
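
To make the definition concrete, here is a minimal Python sketch under strong simplifications: K(μ) is incomputable, so each toy environment carries a hand-assigned complexity in bits, and the environment set, names, and values are invented for illustration rather than taken from the thesis.

```python
# A minimal sketch of the universal intelligence measure on a toy
# environment class. Kolmogorov complexity is incomputable, so each
# environment carries a hand-assigned complexity (in bits) as a stand-in
# for K(mu). All names and values here are illustrative.

# Each environment: (complexity_bits, expected_reward_fn), where
# expected_reward_fn maps a policy to its expected normalized reward V.
ENVIRONMENTS = {
    "always_reward_action_0": (3, lambda policy: 1.0 if policy("") == 0 else 0.0),
    "always_reward_action_1": (3, lambda policy: 1.0 if policy("") == 1 else 0.0),
    "reward_alternation":     (7, lambda policy: 0.5),  # any fixed policy scores 0.5
}

def universal_intelligence(policy):
    """Upsilon(pi) = sum over environments of 2^-K(mu) * V_mu^pi."""
    return sum(2.0 ** -k * v(policy) for k, v in ENVIRONMENTS.values())

# A policy that always picks action 0 scores well on the simple
# environment that happens to agree with it:
print(universal_intelligence(lambda history: 0))  # 2^-3 * 1 + 2^-7 * 0.5
```

Simple environments dominate the sum, so an agent gains the most by performing well in low-complexity worlds.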

The AIXI Agent

AIXI is the theoretically optimal agent:

a^*_t = \arg\max_{a_t} \sum_{o_t r_t} r_t \sum_{\mu \in E} 2^{-K(\mu)} \, \mu(o_t r_t \mid a_t h_{<t})

(shown in its one-step form; the full AIXI definition extends the same expectimax over future actions and percepts up to a horizon)

At each step, AIXI:

  1. Considers all possible environments (weighted by complexity)
  2. Computes expected reward for each action
  3. Chooses the action maximizing expected future reward
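
A toy version of this loop, assuming a hand-enumerated two-environment class with made-up complexities. True AIXI sums over all computable environments and conditions its mixture on the full history, which is exactly what makes it incomputable.

```python
# A one-step sketch of AIXI-style action selection over a tiny enumerated
# environment class. Complexities and environments are invented; the
# mixture weights are unconditioned priors, whereas a real agent would
# condition on the history observed so far.

ACTIONS = [0, 1]
PERCEPTS = [(obs, r) for obs in (0, 1) for r in (0.0, 1.0)]

class Env:
    def __init__(self, k, cond_prob):
        self.k = k                  # stand-in for K(mu), in bits
        self.cond_prob = cond_prob  # callable: (history, action, obs, r) -> prob

# Environment A: action 0 deterministically yields reward 1; B favors action 1.
env_a = Env(3, lambda h, a, o, r: 1.0 if (o, r) == (0, float(a == 0)) else 0.0)
env_b = Env(5, lambda h, a, o, r: 1.0 if (o, r) == (1, float(a == 1)) else 0.0)
ENVS = [env_a, env_b]

def aixi_one_step(history):
    """Pick the action maximizing complexity-weighted expected reward."""
    def expected_reward(a):
        return sum(
            r * sum(2.0 ** -mu.k * mu.cond_prob(history, a, o, r) for mu in ENVS)
            for o, r in PERCEPTS
        )
    return max(ACTIONS, key=expected_reward)

print(aixi_one_step(history=()))  # 0: the simpler environment dominates the mixture
```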

Interactive Demo

[The original page embeds an interactive demo with four panels: Universal Intelligence (Υ(π) = Σ_μ 2^(−K(μ)) V_μ^π, performance weighted by environment simplicity); an Intelligence Hierarchy (Narrow AI → Human → AGI → AIXI); a Key Insight (intelligence as the ability to achieve goals across a wide range of environments); and a Practical Limit (AIXI requires solving the halting problem, so real systems must approximate).]

Solomonoff Induction

The prediction component of AIXI uses Solomonoff’s universal prior:

P(x) = \sum_{p\,:\,U(p) = x*} 2^{-|p|}

The probability of observing x is the sum over all programs p whose output on the universal machine U begins with x, each weighted by its brevity.
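
A runnable caricature: a real universal machine makes the prior incomputable, so this sketch substitutes a toy "machine" that simply repeats its program's bits, and enumerates programs up to a small length. The toy program set is not prefix-free, so the sums are unnormalized weights rather than probabilities.

```python
from itertools import product

# A toy stand-in for Solomonoff's universal prior. The "machine" below
# repeats its program bits forever, which keeps the enumeration finite
# and runnable; everything here is illustrative, not the real prior.

def run(program, n):
    """Toy 'universal machine': output = program bits repeated, first n bits."""
    return [program[i % len(program)] for i in range(n)]

def prior(x, max_len=12):
    """P(x) ~ sum of 2^-|p| over programs whose output begins with x."""
    total = 0.0
    for length in range(1, max_len + 1):
        for program in product((0, 1), repeat=length):
            if run(program, len(x)) == list(x):
                total += 2.0 ** -length
    return total

# Compressible strings accumulate far more weight from short programs:
print(prior((0, 0, 0, 0, 0, 0)))  # large: every short all-zero program matches
print(prior((0, 1, 1, 0, 1, 0)))  # small: few short programs generate it
```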

Key Results

Theorem (Optimality): AIXI is the most intelligent agent:

\Upsilon(\text{AIXI}) \geq \Upsilon(\pi) \quad \forall \pi

No other agent achieves higher expected performance across all environments.

Theorem (Incomputability): AIXI cannot be computed:

The universal prior requires solving the halting problem. Real systems must approximate.

The Compression-Intelligence Connection

A key insight: compression and prediction are equivalent.

K(x_{1:n}) \approx -\log P(x_{1:n})

A good predictor is a good compressor, and vice versa. This connects AIXI to practical language models.
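
A small illustration of the equivalence: under arithmetic coding, a sequence x can be stored in about −log₂ P(x) bits, so better predictive probabilities mean shorter codes. The predictor below is a simple Laplace-smoothed bit-frequency model, chosen for brevity rather than taken from the thesis.

```python
import math

# "Good predictor = good compressor": with arithmetic coding, each symbol b
# costs -log2 p(b) bits under the predictor's probability p. A sequence the
# model predicts well therefore compresses well.

def code_length_bits(bits):
    """-log2 P(x) under a sequential Laplace (add-one) bit predictor."""
    counts = [1, 1]  # pseudo-counts: Laplace smoothing
    total_bits = 0.0
    for b in bits:
        p = counts[b] / (counts[0] + counts[1])  # predictive probability of b
        total_bits += -math.log2(p)              # ideal code length for b
        counts[b] += 1
    return total_bits

regular = [0] * 100
noisy = [0, 1] * 50  # alternating; this frequency model cannot exploit the pattern
print(code_length_bits(regular))  # ~7 bits: highly predictable, highly compressible
print(code_length_bits(noisy))    # ~100 bits: incompressible to this model
```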

Practical Approximations

Real AI systems approximate AIXI through:

  • Bounded computation: Limited search depth
  • Finite environments: Specific domain knowledge
  • Learned priors: Neural networks in place of the Solomonoff prior

Modern LLMs can be viewed as crude AIXI approximations trained on text.
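
A sketch of the first two bullets above in code: the incomputable mixture is replaced by a single hand-built model, and the infinite-horizon expectimax by depth-limited Monte Carlo rollouts. The model and numbers are invented; real approximations along these lines (e.g., MC-AIXI-CTW) are far more elaborate.

```python
import random

# Bounded-computation approximation of AIXI: one fixed model instead of a
# universal mixture, and depth-limited sampled rollouts instead of a full
# expectimax. Purely illustrative.

class ToyModel:
    """Stand-in for a learned environment model: action 1 is noisily rewarded."""
    def sample(self, history):
        action = history[-1]  # last entry is the most recent action
        reward = 1.0 if (action == 1 and random.random() < 0.8) else 0.0
        return 0, reward      # (observation, reward)

def rollout_value(model, history, action, depth=5, samples=200):
    """Estimate an action's value by sampling short futures from the model."""
    total = 0.0
    for _ in range(samples):
        h, ret = list(history) + [action], 0.0
        for _ in range(depth):      # bounded lookahead instead of infinite horizon
            obs, r = model.sample(h)
            ret += r
            h += [obs, action]      # replay the same action (a real agent re-plans)
        total += ret
    return total / samples

def act(model, history, actions=(0, 1)):
    return max(actions, key=lambda a: rollout_value(model, history, a))

print(act(ToyModel(), history=[0]))  # 1: rollouts show action 1 pays off
```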

Why Ilya Included This

This thesis provides:

  1. Theoretical grounding: Rigorous definition of intelligence
  2. Ultimate benchmark: AIXI as the theoretical ceiling
  3. Design principles: Compression, prediction, and universality

Understanding the theoretical optimum illuminates what practical systems are approximating.

Implications

  • Intelligence can be formalized mathematically
  • Optimal intelligence requires universal prediction
  • Real AI must make tractability/optimality tradeoffs
  • Scaling better predictors may move systems toward AIXI-like behavior, though never all the way

Key Resource

Shane Legg, Machine Super Intelligence, PhD thesis, University of Lugano, 2008.