Chain-of-Thought Prompting

Eliciting step-by-step reasoning in language models for complex problem solving

Chain-of-thought (CoT) prompting substantially improves LLM performance on reasoning tasks by asking the model to show its work: generating intermediate reasoning steps before the final answer.

The Problem

Standard prompting fails on multi-step reasoning:

Q: Roger has 5 tennis balls. He buys 2 cans of 3 balls each. How many does he have?
A: 8 ✗ (without intermediate steps, the model may jump straight to a wrong answer)

The Solution: Show Your Work

CoT prompting includes reasoning steps:

Q: Roger has 5 tennis balls. He buys 2 cans of 3 balls each. How many does he have?
A: Roger starts with 5 balls. 2 cans × 3 balls = 6 balls. 5 + 6 = 11 balls. ✓

Why It Works

By generating intermediate steps, the model:

  1. Decomposes complex problems into simpler sub-problems
  2. Maintains state through the reasoning process
  3. Reduces error by checking each step
  4. Allocates compute proportional to problem difficulty

Two Approaches

Few-Shot CoT

Provide examples with reasoning chains:

\text{Prompt} = [(x_1, r_1, y_1), \ldots, (x_k, r_k, y_k), x_{test}]

where the $r_i$ are the reasoning chains.
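The prompt-assembly step above can be sketched in a few lines of Python. This is a minimal sketch: the function name build_cot_prompt and the "The answer is" template are illustrative, not from any particular library.

```python
# Minimal sketch of few-shot CoT prompt assembly.
# build_cot_prompt and the answer template are illustrative names.
def build_cot_prompt(examples, test_question):
    """Format (question, reasoning, answer) triples followed by the test question."""
    parts = [f"Q: {x}\nA: {r} The answer is {y}." for x, r, y in examples]
    parts.append(f"Q: {test_question}\nA:")  # leave the answer open for the model
    return "\n\n".join(parts)

examples = [
    ("Roger has 5 tennis balls. He buys 2 cans of 3 balls each. "
     "How many does he have?",
     "Roger starts with 5 balls. 2 cans × 3 balls = 6 balls. 5 + 6 = 11 balls.",
     "11"),
]
prompt = build_cot_prompt(
    examples,
    "A store has 23 apples. 8 are sold and 12 more arrive. How many are there?",
)
```

The open-ended trailing "A:" invites the model to continue in the same reasoning style as the $k$ worked examples.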

Zero-Shot CoT

Simply append “Let’s think step by step”:

Q: [problem]
A: Let's think step by step.

This single phrase unlocks reasoning in large models.
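The zero-shot variant needs no examples at all; a minimal sketch, where the wrapper name is illustrative and the trigger phrase is the one quoted above:

```python
# Minimal sketch of zero-shot CoT; zero_shot_cot_prompt is an illustrative name.
def zero_shot_cot_prompt(question):
    """Append the zero-shot CoT trigger phrase to a question."""
    return f"Q: {question}\nA: Let's think step by step."

p = zero_shot_cot_prompt(
    "A store has 23 apples. 8 are sold and 12 more arrive. How many are there?"
)
```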

Worked Example: Standard vs. Chain-of-Thought

Problem:

A store has 23 apples. If 8 are sold in the morning and 12 more arrive in the afternoon, how many apples are there?

Standard prompting:

Q: A store has 23 apples. If 8 are sold in the morning and 12 more arrive in the afternoon, how many apples are there?
A: 27

The model jumps directly to the answer. It happens to be correct here, but on harder problems it often is not.

Chain-of-thought prompting:

Q: A store has 23 apples. If 8 are sold in the morning and 12 more arrive in the afternoon, how many apples are there?
A: The store starts with 23 apples. After 8 are sold: 23 − 8 = 15. After 12 arrive: 15 + 12 = 27. There are 27 apples. ✓
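Once a model emits a free-text reasoning chain, the final answer still has to be pulled out of it. A common heuristic is to take the last number in the completion; this is a sketch under that assumption, with an illustrative function name and regex:

```python
import re

# Heuristic answer extraction: take the last number in the completion.
# extract_final_answer and the regex are illustrative, not a standard API.
def extract_final_answer(cot_response):
    numbers = re.findall(r"-?\d+(?:\.\d+)?", cot_response)
    return numbers[-1] if numbers else None
```

More robust pipelines instead instruct the model to end with a fixed pattern such as "The answer is X" and parse that.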

Performance Gains

Task           Standard   Chain-of-Thought
GSM8K (math)   18%        57%
MultiArith     33%        93%
StrategyQA     65%        73%

Results for PaLM 540B

Emergent Ability

CoT benefits appear mainly at scale:

\text{Accuracy gain} \propto \log(\text{model size})

Small models may produce incoherent chains; large models generate meaningful reasoning.

Self-Consistency

Improve further by sampling multiple chains and voting:

\hat{y} = \arg\max_y \sum_{i=1}^{n} \mathbf{1}[y_i = y]

Generate $n$ reasoning paths, extract the final answer from each, and take the majority vote.
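The voting step is a one-liner; a minimal sketch assuming the $n$ chains have already been sampled (e.g. with temperature > 0) and their answers extracted:

```python
from collections import Counter

# Minimal sketch of the self-consistency vote; self_consistent_answer
# is an illustrative name. Sampling the chains is assumed done upstream.
def self_consistent_answer(answers):
    """Majority vote over final answers from n sampled reasoning chains."""
    return Counter(answers).most_common(1)[0][0]

votes = ["27", "27", "26", "27", "28"]  # answers from 5 hypothetical chains
```

Ties fall back to whichever answer Counter encountered first; a production version might break ties by chain log-probability instead.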

Variants

Method            Key Idea
CoT               Show reasoning steps
Zero-shot CoT     Append "Let's think step by step"
Self-Consistency  Sample multiple chains, vote
Tree of Thoughts  Explore branching reasoning paths
ReAct             Interleave reasoning and actions

When to Use CoT

Effective for:

  • Math word problems
  • Multi-step reasoning
  • Logical deduction
  • Code generation

Less helpful for:

  • Simple factual questions
  • Single-step tasks
  • Creative writing