Iterated Amplification

Type: Alignment Approach
Proposed By: Paul Christiano
Year: 2018
Status: Theoretical

Iterated Amplification, also known as Iterated Distillation and Amplification (IDA), is a proposed alignment technique that trains AI systems by recursively decomposing hard problems into easier subproblems that humans can solve or verify.

Core Idea

The key idea is that a human assisted by multiple copies of an AI can solve harder problems than a single copy of that AI can solve alone. The process is iterated as follows:

  1. Start with a weak AI assistant
  2. A human working with multiple copies of that AI solves harder problems
  3. Train a new AI to imitate this "amplified" human
  4. Repeat with the improved AI
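
The four steps above can be sketched as a loop. The following is a toy illustration only, not an implementation from the original proposal: the task (summing a tuple of numbers), the lookup-table "model", and the names `table_model`, `amplify`, `distill`, and `iterated_amplification` are all assumptions made for this example. The simulated human can only answer trivial questions directly and add two numbers, so everything harder has to come from delegating to the current model.

```python
from typing import Callable, Dict, List, Optional, Tuple

Question = Tuple[int, ...]                    # toy question: "what is the sum of these numbers?"
Model = Callable[[Question], Optional[int]]   # returns None when the model cannot answer


def table_model(table: Dict[Question, int]) -> Model:
    """Hypothetical 'AI': it only knows the answers stored in its table."""
    return lambda q: table.get(q)


def amplify(model: Model) -> Model:
    """Step 2: a simulated human answers by delegating to two copies of the model."""
    def amplified(q: Question) -> Optional[int]:
        if len(q) <= 1:
            return q[0] if q else 0           # trivial enough for the human alone
        mid = len(q) // 2
        left, right = model(q[:mid]), model(q[mid:])   # delegate both halves
        if left is None or right is None:
            return None                       # the current model is still too weak
        return left + right                   # the human only adds two numbers
    return amplified


def distill(amplified: Model, questions: List[Question]) -> Model:
    """Step 3: stand-in for training; the new model memorises the amplified answers."""
    table: Dict[Question, int] = {}
    for q in questions:
        answer = amplified(q)
        if answer is not None:
            table[q] = answer
    return table_model(table)


def iterated_amplification(questions: List[Question], rounds: int) -> Model:
    model = table_model({})                   # step 1: a weak AI that knows nothing
    for _ in range(rounds):                   # step 4: repeat with the improved AI
        model = distill(amplify(model), questions)
    return model


numbers = (3, 1, 4, 1, 5, 9, 2, 6)
# Training questions: every contiguous slice of `numbers`.
questions = [numbers[i:j] for i in range(len(numbers))
             for j in range(i + 1, len(numbers) + 1)]
```

In this toy setting the longest question the model can answer doubles each round: after three rounds `iterated_amplification(questions, 3)` answers slices of up to four numbers but returns `None` for the full tuple, and after four rounds it returns 31 for the full tuple. Memorisation is used here only to keep the sketch short; a real system would train a model that generalises.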

Amplification Process

When facing a complex question, the human can:

  • Break it into simpler subquestions
  • Delegate subquestions to AI copies
  • Synthesize the answers

The synthesized answer is intended to be better than what any single copy of the AI could produce on its own.
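
As a rough sketch of one amplification step, consider the toy task of finding the largest number in a list when the assistant can only handle two numbers at a time. The task, the size limit, and the names `weak_assistant`, `decompose`, and `amplified_max` are hypothetical and chosen for illustration; the simulated human never compares numbers, it only splits the question and forwards the pieces.

```python
from typing import Callable, List


def weak_assistant(numbers: List[int]) -> int:
    """Hypothetical weak AI: can only pick the largest of at most two numbers."""
    if not 1 <= len(numbers) <= 2:
        raise ValueError("question too hard for the weak assistant")
    return max(numbers)


def decompose(numbers: List[int], chunk_size: int = 2) -> List[List[int]]:
    """Break the question into subquestions the assistant can handle."""
    return [numbers[i:i + chunk_size] for i in range(0, len(numbers), chunk_size)]


def amplified_max(numbers: List[int], assistant: Callable[[List[int]], int]) -> int:
    """Simulated human: decompose, delegate to AI copies, then synthesize."""
    if not numbers:
        raise ValueError("empty question")
    while len(numbers) > 2:
        subquestions = decompose(numbers)
        # Each call stands for one copy of the assistant answering one subquestion.
        numbers = [assistant(sq) for sq in subquestions]
    # Synthesis: the remaining sub-answers form a question the assistant can handle.
    return assistant(numbers)
```

For example, `amplified_max([7, 2, 9, 4, 1], weak_assistant)` returns 9, while calling `weak_assistant([7, 2, 9, 4, 1])` directly raises a `ValueError`.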

Distillation

After amplification produces good answers, a new AI is trained to produce similar answers directly, without the expensive decomposition process. This "distilled" AI becomes the assistant for the next round.
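
Distillation can be pictured as ordinary supervised learning in which the amplified system is the teacher. The sketch below is a minimal illustration under strong assumptions: the "amplified" procedure is a deliberately slow function that computes 3x + 2 in many small steps, and the student is a one-variable least-squares fit. The names `amplified_answer` and `distill` are hypothetical and not part of any published implementation.

```python
from typing import Callable, Iterable


def amplified_answer(x: int) -> int:
    """Stand-in for the slow amplified system: builds 3*x + 2 in x small steps."""
    total = 2
    for _ in range(x):        # each step stands for one delegated sub-call
        total += 3
    return total


def distill(teacher: Callable[[int], int], inputs: Iterable[int]) -> Callable[[float], float]:
    """Fit a linear student to the teacher's answers with closed-form least squares."""
    xs = list(inputs)
    ys = [teacher(x) for x in xs]             # expensive: queries the amplified system
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        raise ValueError("need at least two distinct training inputs")
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept    # cheap: a single evaluation


student = distill(amplified_answer, range(10))
```

Here `student(100)` gives approximately 302 in one step, whereas `amplified_answer(100)` needs 100 steps. The student matches the teacher only because the toy target happens to be exactly linear; in general a distilled model approximates the amplified system rather than reproducing it.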

Advantages

  • Maintains human oversight at each step
  • Can potentially scale to superhuman tasks
  • Supports an inductive safety argument: each iteration is intended to be safe if the previous iteration was safe
  • Offers a possible path to aligned superintelligence

Challenges

  • Decomposition may not always be possible
  • Errors might compound across iterations
  • Practical implementation remains difficult
  • May be too slow compared to other training methods
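
To make the compounding-error concern concrete, here is a back-of-the-envelope calculation under a deliberately simple assumption that does not come from the original proposal: each round independently preserves correct behaviour with a fixed probability. The function name `reliability_after` and the numbers used are illustrative only.

```python
def reliability_after(rounds: int, per_round_fidelity: float) -> float:
    """Probability that behaviour is still correct after `rounds` iterations,
    assuming each round independently preserves it with `per_round_fidelity`."""
    if rounds < 0 or not 0.0 <= per_round_fidelity <= 1.0:
        raise ValueError("rounds must be >= 0 and fidelity must be in [0, 1]")
    return per_round_fidelity ** rounds


# Even 99% fidelity per round leaves only about 90% after ten rounds.
after_ten = reliability_after(10, 0.99)
```

Real errors are unlikely to be independent or constant across rounds, so this is only a way to see why small per-round losses matter, not a prediction.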

Last updated: November 28, 2025