Distributional Shift

Problem: Robustness
Type: Technical Challenge
Also Known As: Distribution Shift, Dataset Shift
Status: Active Research
Related: Goal Misgeneralization
Distributional shift occurs when the data or situations an AI system encounters during deployment differ from those seen during training. This can cause unpredictable failures even in systems that performed well in testing.
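As a toy illustration (the data, the sine ground truth, and the deliberately simple polynomial model are all synthetic choices for demonstration, not drawn from any real deployment), the sketch below fits a regressor on inputs from one region and then evaluates it on a shifted region: error stays near the noise floor in-distribution but grows by orders of magnitude on the shifted inputs.

```python
# Toy sketch of covariate shift: a model that fits well on the training
# region of input space fails badly once inputs move outside that region.
import numpy as np

rng = np.random.default_rng(0)

def true_fn(x):
    # Ground-truth relationship (unknown to the model).
    return np.sin(x)

# Training inputs drawn from one region of input space.
x_train = rng.uniform(-2.0, 2.0, size=500)
y_train = true_fn(x_train) + rng.normal(0.0, 0.05, size=500)

# A cubic polynomial is adequate on the training region.
coeffs = np.polyfit(x_train, y_train, deg=3)

def predict(x):
    return np.polyval(coeffs, x)

# Evaluate on in-distribution inputs vs. shifted inputs.
x_iid = rng.uniform(-2.0, 2.0, size=500)
x_shifted = rng.uniform(4.0, 8.0, size=500)  # region never seen in training

mse_iid = np.mean((predict(x_iid) - true_fn(x_iid)) ** 2)
mse_shifted = np.mean((predict(x_shifted) - true_fn(x_shifted)) ** 2)

print(f"in-distribution MSE: {mse_iid:.4f}")      # small (near the noise level)
print(f"shifted MSE:         {mse_shifted:.1f}")  # orders of magnitude larger
```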
Types of Shift
- Covariate shift: the input distribution p(x) changes while the input-output relationship p(y | x) stays fixed
- Label shift: the label distribution p(y) changes while the class-conditional input distribution p(x | y) stays fixed
- Concept drift: the relationship between inputs and outputs, p(y | x), itself changes over time
- Domain shift: the entire context changes (e.g., simulation to real world)
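The first two cases correspond to the two factorizations of the joint distribution, p(x, y) = p(y | x) p(x) = p(x | y) p(y): covariate shift moves only the p(x) factor, label shift moves only the p(y) factor, and concept drift changes the conditional p(y | x) itself.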
Examples
- Self-driving car trained in sunny California deployed in snowy conditions
- Medical AI trained on one hospital's equipment used with different equipment
- Language model trained on internet text used in specialized domains
- Trading algorithm trained on historical data facing novel market conditions
Why It Matters for Alignment
Distributional shift is especially concerning for alignment because:
- AI systems may be deployed in situations their developers never anticipated
- Behavior that appeared aligned during training might not transfer to deployment
- Safety constraints learned during training might not generalize
- Catastrophic failures could occur in exactly the novel situations where testing provides the least evidence
Connection to Other Problems
- Goal misgeneralization typically only becomes apparent under distributional shift
- Reward hacking may only become visible on distributions not covered during training
- Deceptive alignment could be triggered by a specific shift, such as the move from training to deployment
Mitigations
- Domain randomization during training, so the system sees a wider range of conditions
- Out-of-distribution detection to flag inputs unlike the training data (see the sketch after this list)
- Uncertainty quantification, e.g. via ensembles or calibrated predictive distributions
- Conservative or fallback behavior when the system is uncertain
- Continual learning with safety constraints, to track gradual drift
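As a minimal sketch of the out-of-distribution detection and conservative-fallback items above (the OODGuard class, its quantile threshold, and the upstream feature extractor are illustrative assumptions, not a standard API): fit a Gaussian to training-time features, flag inputs whose Mahalanobis distance exceeds a calibrated threshold, and defer to a safe default action for flagged inputs.

```python
# Minimal sketch: reject inputs that look unlike the training data and
# fall back to a conservative action instead of trusting the model.
import numpy as np

class OODGuard:
    """Flags inputs whose features are far from the training distribution."""

    def __init__(self, train_features: np.ndarray, quantile: float = 0.99):
        # Fit a Gaussian to training-time features.
        self.mean = train_features.mean(axis=0)
        cov = np.cov(train_features, rowvar=False)
        self.cov_inv = np.linalg.pinv(cov)  # pseudo-inverse for numerical stability
        # Calibrate the rejection threshold (here on the training features
        # themselves; a held-out split would be better in practice).
        dists = np.array([self._distance(f) for f in train_features])
        self.threshold = float(np.quantile(dists, quantile))

    def _distance(self, feature: np.ndarray) -> float:
        # Mahalanobis distance to the training-feature Gaussian.
        diff = feature - self.mean
        return float(np.sqrt(diff @ self.cov_inv @ diff))

    def is_ood(self, feature: np.ndarray) -> bool:
        return self._distance(feature) > self.threshold


def act(feature, model_action, safe_fallback, guard: OODGuard):
    """Conservative behavior: use a safe default on out-of-distribution inputs."""
    if guard.is_ood(feature):
        return safe_fallback
    return model_action(feature)
```

In practice the threshold would be calibrated on a held-out split, and the fallback would be a task-specific safe action such as stopping, deferring to a human, or declining to act.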