Eliciting step-by-step reasoning in language models for complex problem solving
Chain-of-thought (CoT) prompting dramatically improves LLM performance on reasoning tasks by asking the model to show its work: generating intermediate reasoning steps before the final answer.
The Problem
Standard prompting fails on multi-step reasoning:
Q: Roger has 5 tennis balls. He buys 2 cans of 3 balls each. How many does he have?
A: 8 ✗ (the model jumps straight to an answer and often gets it wrong)
The Solution: Show Your Work
CoT prompting includes reasoning steps:
Q: Roger has 5 tennis balls. He buys 2 cans of 3 balls each. How many does he have?
A: Roger starts with 5 balls. 2 cans × 3 balls = 6 balls. 5 + 6 = 11 balls. ✓
Why It Works
By generating intermediate steps, the model:
- Decomposes complex problems into simpler sub-problems
- Maintains state through the reasoning process
- Reduces error by checking each step
- Allocates compute proportional to problem difficulty
Two Approaches
Few-Shot CoT
Provide examples with reasoning chains:
Each in-context example pairs a question with a fully worked solution, so the model imitates the reasoning pattern when answering the new question.
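A few-shot CoT prompt can be assembled as plain string concatenation. This is a minimal sketch; the `few_shot_cot_prompt` helper and the single worked example are illustrative, not from any particular library:

```python
# Sketch: assembling a few-shot CoT prompt. Each example pairs a question
# with a worked rationale; the new question is appended with a bare "A:"
# so the model continues with its own reasoning chain.
EXAMPLES = [
    {
        "q": "Roger has 5 tennis balls. He buys 2 cans of 3 balls each. "
             "How many does he have?",
        "a": "Roger starts with 5 balls. 2 cans × 3 balls = 6 balls. "
             "5 + 6 = 11 balls. The answer is 11.",
    },
]

def few_shot_cot_prompt(question: str) -> str:
    """Prepend worked examples (with reasoning) to the new question."""
    parts = [f"Q: {ex['q']}\nA: {ex['a']}" for ex in EXAMPLES]
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)
```

The resulting string is sent as the prompt; the model's completion after the final `A:` is its reasoning chain plus answer.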
Zero-Shot CoT
Simply append “Let’s think step by step”:
Q: [problem]
A: Let's think step by step.
This single phrase unlocks reasoning in large models.
Worked Comparison
Compare standard vs. chain-of-thought prompting on the same question:
Q: A store has 23 apples. If 8 are sold in the morning and 12 more arrive in the afternoon, how many apples are there?
A (standard): 27
A (CoT): The store starts with 23 apples. 23 − 8 = 15 after the morning sales. 15 + 12 = 27 after the afternoon delivery. The answer is 27. ✓
Performance Gains
| Task | Standard | Chain-of-Thought |
|---|---|---|
| GSM8K (math) | 18% | 57% |
| MultiArith | 33% | 93% |
| StrategyQA | 65% | 73% |
Results reported for PaLM 540B (Wei et al., 2022).
Emergent Ability
CoT benefits appear mainly at scale:
Small models tend to produce fluent but incoherent chains; only sufficiently large models generate reasoning that improves accuracy.
Self-Consistency
Improve further by sampling multiple chains and voting:
Generate reasoning paths, extract answers, take majority vote.
Variants
| Method | Key Idea |
|---|---|
| CoT | Show reasoning steps |
| Zero-shot CoT | Append “Let’s think step by step” |
| Self-Consistency | Sample multiple chains, vote |
| Tree of Thoughts | Explore branching reasoning paths |
| ReAct | Interleave reasoning and actions |
When to Use CoT
Effective for:
- Math word problems
- Multi-step reasoning
- Logical deduction
- Code generation
Less helpful for:
- Simple factual questions
- Single-step tasks
- Creative writing