Open Problems
Unsolved challenges and active research areas in AI alignment.
Inner alignment: ensuring that learned models pursue the intended objective rather than a proxy for it.
Mesa-optimization: learned models becoming optimizers in their own right, with objectives that may differ from the training objective.
Reward hacking: AI systems finding unintended ways to maximize reward without achieving the intended goal.
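Reward hacking can be sketched with a toy optimizer. This is a hypothetical illustration, not from the source: `true_reward`, `proxy_reward`, and `hill_climb` are invented names, and the greedy search stands in for any optimizer pointed at a measurable proxy.

```python
# Hypothetical toy sketch of reward hacking: a greedy optimizer pointed at a
# proxy reward drifts far from the state the true objective prefers.

def true_reward(x: float) -> float:
    # Intended goal: stay near x = 1.
    return -(x - 1.0) ** 2

def proxy_reward(x: float) -> float:
    # Measurable proxy: agrees with the true goal for x below 1,
    # but keeps rewarding movement past it.
    return x

def hill_climb(reward, x: float = 0.0, step: float = 0.1, iters: int = 100) -> float:
    # Greedy local search on whatever reward signal it is given.
    for _ in range(iters):
        x = max((x - step, x, x + step), key=reward)
    return x

print(round(hill_climb(proxy_reward), 1))  # wanders to 10.0, where true reward is -81
print(round(hill_climb(true_reward), 1))   # settles at 1.0
```

The proxy tracks the intended goal over part of the state space, so the divergence only shows up once the optimizer is strong enough to leave that region.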
Scalable oversight: how to supervise AI systems that may become more capable than their overseers.
Deceptive alignment: AI systems appearing aligned during training but pursuing different goals once deployed.
Goal misgeneralization: AI systems learning goals that work in training but fail in new situations.
Value lock-in: the risk of permanently encoding current values into powerful AI systems.
Robustness to distributional shift: AI behavior becoming unreliable in deployment environments that differ from training.
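The failure mode above can be sketched with a toy learner. This is a hypothetical example, not from the source: the data, the feature names, and the majority-vote rule are all invented for illustration.

```python
# Hypothetical toy sketch of distributional shift: a learner that keys on a
# spurious feature scores perfectly in training, then fails when deployment
# data breaks the correlation.
from collections import Counter, defaultdict

def fit_spurious_rule(data):
    # Majority-vote label per value of feature 0 (the spurious feature),
    # ignoring feature 1 (the causal feature).
    votes = defaultdict(Counter)
    for (spurious, _causal), label in data:
        votes[spurious][label] += 1
    return {v: c.most_common(1)[0][0] for v, c in votes.items()}

def accuracy(rule, data):
    return sum(rule[s] == y for (s, _), y in data) / len(data)

# Training distribution: the spurious feature perfectly tracks the label.
train = [((0, 0), 0), ((0, 0), 0), ((1, 1), 1), ((1, 1), 1)]
# Deployment distribution: the correlation is inverted; the causal feature
# (index 1) would still predict the label, but the rule never learned it.
deploy = [((1, 0), 0), ((0, 1), 1)]

rule = fit_spurious_rule(train)
print(accuracy(rule, train))   # 1.0
print(accuracy(rule, deploy))  # 0.0
```

Nothing in the training signal distinguishes the spurious feature from the causal one, which is why the unreliability only appears after deployment.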
Eliciting latent knowledge: getting AI systems to honestly report what they know, even when deception might be advantageous.