Iterated Amplification
Iterated Amplification, also known as Iterated Distillation and Amplification (IDA), is a proposed alignment technique that trains AI systems by recursively decomposing hard problems into easier subproblems that humans can solve or verify.
Core Idea
The key insight is that a human assisted by multiple copies of an AI assistant can solve harder problems than a single copy of that assistant working alone. The process is then iterated:
- Start with a weak AI assistant
- Human + multiple AI copies solve harder problems
- Train a new AI to imitate this "amplified" human
- Repeat with the improved AI
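The loop above can be sketched with a toy model. This is only an illustration under strong assumptions: the "task" is summing a list of integers, an agent's capability is modeled as the longest list it can handle, and distillation is simulated by a check rather than by real training. The names `ToyAgent`, `amplify`, `distill`, and `iterate` are hypothetical and do not come from any IDA implementation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ToyAgent:
    """Toy assistant: sums a list correctly, but only up to max_len items."""
    max_len: int

    def answer(self, xs: List[int]) -> int:
        if len(xs) > self.max_len:
            raise ValueError("question too hard for this agent")
        return sum(xs)


def amplify(agent: ToyAgent) -> Callable[[List[int]], int]:
    """Human plus two agent copies: split the list, delegate halves, add results."""
    def amplified(xs: List[int]) -> int:
        mid = len(xs) // 2
        return agent.answer(xs[:mid]) + agent.answer(xs[mid:])
    return amplified


def distill(amplified: Callable[[List[int]], int], target_len: int) -> ToyAgent:
    """Stand-in for training: check the amplified system on one question of the
    target size, then return an agent that handles that size directly."""
    sample = list(range(target_len))
    if amplified(sample) != sum(sample):
        raise RuntimeError("amplified system failed on the training question")
    return ToyAgent(max_len=target_len)


def iterate(agent: ToyAgent, rounds: int) -> ToyAgent:
    """Repeat amplify-then-distill; each round doubles the toy capability."""
    for _ in range(rounds):
        agent = distill(amplify(agent), target_len=2 * agent.max_len)
    return agent


final_agent = iterate(ToyAgent(max_len=1), rounds=3)
```

In this toy setting each round doubles the list length the agent can handle, so three rounds take an agent from lists of length 1 to lists of length 8. Real tasks offer no such guarantee. The doubling is an artifact of the assumed task.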
Amplification Process
When facing a complex question, the human can:
- Break it into simpler subquestions
- Delegate subquestions to AI copies
- Synthesize the answers
The synthesized answer is intended to be better than what any single copy of the AI could produce on its own.
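A single amplification step (decompose, delegate, synthesize) might look like the following minimal sketch. The function `amplify_question` and the word-counting task are assumptions chosen for illustration. Here the "human" role is played by the `decompose` and `synthesize` callables, and the "AI copy" by `delegate`.

```python
from typing import Callable, List


def amplify_question(
    question: str,
    decompose: Callable[[str], List[str]],
    delegate: Callable[[str], int],
    synthesize: Callable[[List[int]], int],
) -> int:
    """One amplification step: split the question, hand each piece to an
    assistant copy, then combine the partial answers."""
    subquestions = decompose(question)
    sub_answers = [delegate(sub) for sub in subquestions]
    return synthesize(sub_answers)


def count_words_in_line(line: str) -> int:
    """Toy assistant: can only count the words in a single line."""
    return len(line.split())


document = "alpha beta\ngamma\ndelta epsilon zeta"

total_words = amplify_question(
    document,
    decompose=lambda doc: doc.splitlines(),  # human breaks the task into lines
    delegate=count_words_in_line,            # each AI copy handles one line
    synthesize=sum,                          # human adds up the partial counts
)
```

Keeping the three roles as separate parameters makes the human's contribution explicit: the assistant never sees the whole document, only the pieces it is given.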
Distillation
After amplification produces good answers, a new AI is trained to produce similar answers directly, without the expensive decomposition process. This "distilled" AI becomes the assistant for the next round.
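Distillation can be viewed as ordinary supervised learning on the amplified system's outputs. The sketch below is a deliberately small stand-in: `slow_amplified_system` is a hypothetical placeholder for the expensive human-plus-AI process, and the "new AI" is just a straight line fitted by least squares. A real system would train a neural network on far richer data.

```python
from typing import Callable, List, Tuple


def slow_amplified_system(x: float) -> float:
    """Hypothetical placeholder for the costly amplified process."""
    return 3.0 * x + 2.0


def fit_linear(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Closed-form least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


def distill(amplified: Callable[[float], float], inputs: List[float]) -> Callable[[float], float]:
    """Collect the amplified system's answers, then fit a cheap model to imitate them."""
    targets = [amplified(x) for x in inputs]
    slope, intercept = fit_linear(inputs, targets)
    return lambda x: slope * x + intercept


distilled_model = distill(slow_amplified_system, [0.0, 1.0, 2.0, 3.0, 4.0])
```

The distilled model answers in a single step, without calling the amplified process again, and it would serve as the assistant in the next round.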
Advantages
- Maintains human oversight at each step
- Can potentially scale to superhuman tasks
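- Supports an inductive safety argument: if amplification and distillation each preserve alignment, an iteration is safe whenever the previous one was
- Offers a possible path toward aligned systems with superhuman capabilities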
Challenges
- Decomposition may not always be possible
- Errors might compound across iterations
- Practical implementation remains difficult
- May be too slow compared to other training methods
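The concern about compounding errors can be made concrete with a back-of-the-envelope model. Assume, purely for illustration, that each round independently preserves the intended behavior with probability `1 - per_round_error`. Real errors are unlikely to be independent or constant, so this is a rough intuition pump, not a prediction.

```python
def fidelity_after(rounds: int, per_round_error: float) -> float:
    """Probability that every one of `rounds` iterations preserved the intended
    behavior, assuming independent rounds with a fixed error rate."""
    if rounds < 0:
        raise ValueError("rounds must be non-negative")
    if not 0.0 <= per_round_error <= 1.0:
        raise ValueError("per_round_error must be between 0 and 1")
    return (1.0 - per_round_error) ** rounds


# Fidelity shrinks geometrically as iterations accumulate.
fidelity_by_round = [fidelity_after(n, per_round_error=0.01) for n in (1, 10, 100)]
```

Under these assumptions even a 1% per-round error rate leaves only about 90% fidelity after 10 rounds. This is why the inductive safety argument depends on each step preserving alignment very reliably.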
Last updated: November 28, 2025