Distributional Shift

Problem: Robustness
Type: Technical Challenge
Also Known As: Distribution Shift, Dataset Shift
Status: Active Research
Related: Goal Misgeneralization
Distributional shift occurs when the data or situations an AI system encounters during deployment differ from those seen during training. This can cause unpredictable failures even in systems that performed well in testing.
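As a toy illustration (the data, the sine ground truth, and the deliberately simple polynomial model are all synthetic choices for demonstration, not drawn from any real deployment), the sketch below fits a regressor on inputs from one region and then evaluates it on a shifted region: error stays near the noise floor in-distribution but grows by orders of magnitude on the shifted inputs.

```python
# Toy sketch of covariate shift: a model that fits well on the training
# region of input space fails badly once inputs move outside that region.
import numpy as np

rng = np.random.default_rng(0)

def true_fn(x):
    # Ground-truth relationship (unknown to the model).
    return np.sin(x)

# Training inputs drawn from one region of input space.
x_train = rng.uniform(-2.0, 2.0, size=500)
y_train = true_fn(x_train) + rng.normal(0.0, 0.05, size=500)

# A cubic polynomial is adequate on the training region.
coeffs = np.polyfit(x_train, y_train, deg=3)

def predict(x):
    return np.polyval(coeffs, x)

# Evaluate on in-distribution inputs vs. shifted inputs.
x_iid = rng.uniform(-2.0, 2.0, size=500)
x_shifted = rng.uniform(4.0, 8.0, size=500)  # region never seen in training

mse_iid = np.mean((predict(x_iid) - true_fn(x_iid)) ** 2)
mse_shifted = np.mean((predict(x_shifted) - true_fn(x_shifted)) ** 2)

print(f"in-distribution MSE: {mse_iid:.4f}")      # small (near the noise level)
print(f"shifted MSE:         {mse_shifted:.1f}")  # orders of magnitude larger
```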
Types of Shift
- Covariate shift: the input distribution p(x) changes while the input-output relationship p(y | x) stays fixed
- Label shift: the label distribution p(y) changes while the class-conditional input distribution p(x | y) stays fixed
- Concept drift: the relationship between inputs and outputs, p(y | x), itself changes over time
- Domain shift: the entire context changes (e.g., simulation to real world)
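The first two cases correspond to the two factorizations of the joint distribution, p(x, y) = p(y | x) p(x) = p(x | y) p(y): covariate shift moves only the p(x) factor, label shift moves only the p(y) factor, and concept drift changes the conditional p(y | x) itself.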
Examples
- Self-driving car trained in sunny California deployed in snowy conditions
- Medical AI trained on one hospital's equipment used with different equipment
- Language model trained on internet text used in specialized domains
- Trading algorithm trained on historical data facing novel market conditions
Why It Matters for Alignment
Distributional shift is especially concerning for alignment because:
- AI systems may be deployed in situations their developers never anticipated
- Behavior that appeared aligned during training might not transfer to deployment
- Safety constraints learned during training might not generalize
- Catastrophic failures could occur in exactly the novel situations where testing provides the least evidence
Connection to Other Problems
- Goal misgeneralization typically only becomes apparent under distributional shift
- Reward hacking may only become visible on distributions not covered during training
- Deceptive alignment could be triggered by a specific shift, such as the move from training to deployment
Mitigations
- Domain randomization during training, so the system sees a wider range of conditions
- Out-of-distribution detection to flag inputs unlike the training data (see the sketch after this list)
- Uncertainty quantification, e.g. via ensembles or calibrated predictive distributions
- Conservative or fallback behavior when the system is uncertain
- Continual learning with safety constraints, to track gradual drift
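As a minimal sketch of the out-of-distribution detection and conservative-fallback items above (the OODGuard class, its quantile threshold, and the upstream feature extractor are illustrative assumptions, not a standard API): fit a Gaussian to training-time features, flag inputs whose Mahalanobis distance exceeds a calibrated threshold, and defer to a safe default action for flagged inputs.

```python
# Minimal sketch: reject inputs that look unlike the training data and
# fall back to a conservative action instead of trusting the model.
import numpy as np

class OODGuard:
    """Flags inputs whose features are far from the training distribution."""

    def __init__(self, train_features: np.ndarray, quantile: float = 0.99):
        # Fit a Gaussian to training-time features.
        self.mean = train_features.mean(axis=0)
        cov = np.cov(train_features, rowvar=False)
        self.cov_inv = np.linalg.pinv(cov)  # pseudo-inverse for numerical stability
        # Calibrate the rejection threshold (here on the training features
        # themselves; a held-out split would be better in practice).
        dists = np.array([self._distance(f) for f in train_features])
        self.threshold = float(np.quantile(dists, quantile))

    def _distance(self, feature: np.ndarray) -> float:
        # Mahalanobis distance to the training-feature Gaussian.
        diff = feature - self.mean
        return float(np.sqrt(diff @ self.cov_inv @ diff))

    def is_ood(self, feature: np.ndarray) -> bool:
        return self._distance(feature) > self.threshold


def act(feature, model_action, safe_fallback, guard: OODGuard):
    """Conservative behavior: use a safe default on out-of-distribution inputs."""
    if guard.is_ood(feature):
        return safe_fallback
    return model_action(feature)
```

In practice the threshold would be calibrated on a held-out split, and the fallback would be a task-specific safe action such as stopping, deferring to a human, or declining to act.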