Value Lock-in
Value lock-in refers to the risk that powerful AI systems could permanently encode a particular set of values into civilization's trajectory, preventing future value changes even if humanity later decides those values were wrong or incomplete.
The Problem
If we build AI systems that are:
- Very powerful (able to shape the long-term future)
- Optimizing for a specific set of values
- Self-preserving or value-preserving (see the sketch after this list)
Then those values might become permanent, even if:
- We got the values wrong
- Human values naturally evolve
- We discover better moral frameworks
- Circumstances change in ways we couldn't anticipate
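To make the mechanism concrete, here is a minimal Python sketch, with purely illustrative names and numbers, of why a coherent optimizer tends to be value-preserving: it evaluates any proposed change to its utility function with its current utility function, so almost any change looks like a loss from the inside.

```python
# Illustrative sketch, not an established result: a coherent optimizer
# scores a proposed change to its values *with its current values*,
# so the change looks like a loss and gets rejected.

def current_utility(outcome: float) -> float:
    return outcome        # the agent currently wants large outcomes

def proposed_utility(outcome: float) -> float:
    return -outcome       # a revision humanity later comes to prefer

def accepts_change(current, proposed, outcomes) -> bool:
    # Predict the outcome it would pick under each utility function,
    # then score BOTH futures with the *current* utility.
    keep = max(outcomes, key=current)
    switch = max(outcomes, key=proposed)
    return current(switch) >= current(keep)

print(accepts_change(current_utility, proposed_utility, [-2.0, 0.0, 3.0]))
# False: by its own lights, switching can only do worse, so the agent
# has an incentive to resist the change and its values stay locked in.
```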
Historical Analogy
Past civilizations locked in values through institutions, laws, and cultural practices. But these could eventually be changed through revolution, reform, or cultural shift. Sufficiently powerful AI might make such changes impossible.
Why It's Concerning
- Our current moral knowledge is incomplete
- Different cultures have different values
- Values appropriate now may not be appropriate later
- No single group should dictate humanity's permanent values
- Moral progress requires the ability to change
Tension with Alignment
There is a fundamental tension: we want AI systems to reliably preserve good values (alignment), but we also want the ability to update those values over time. Too little corrigibility might enable harmful lock-in; too much might leave values open to drift, manipulation, or capture by whoever controls the update channel.
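One way to see the trade-off is a toy expected-value calculation. In the sketch below, p_correct (the chance our current values are right) and p_abuse (the chance an open update channel is later misused) are purely illustrative parameters, not estimates, and the payoffs are stylized.

```python
# A toy expected-value model of the lock-in / corrigibility trade-off.
# All parameters are illustrative assumptions, not estimates.

def ev_lock_in(p_correct, u_good=1.0, u_bad=-1.0):
    """Lock in current values forever: good only if they were right."""
    return p_correct * u_good + (1 - p_correct) * u_bad

def ev_corrigible(p_abuse, u_good=1.0, u_bad=-1.0):
    """Keep values updatable: mistakes get repaired (so initial
    correctness drops out), but the update channel can be abused."""
    return (1 - p_abuse) * u_good + p_abuse * u_bad

for p_correct in (0.5, 0.9, 0.99):
    for p_abuse in (0.01, 0.2):
        print(f"p_correct={p_correct:.2f}  p_abuse={p_abuse:.2f}  "
              f"lock-in={ev_lock_in(p_correct):+.2f}  "
              f"corrigible={ev_corrigible(p_abuse):+.2f}")
```

Under these stylized payoffs, lock-in beats corrigibility only when p_correct > 1 - p_abuse, that is, when we are more confident in our current values than in the update channel remaining uncaptured.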
Possible Approaches
- Design AI to preserve optionality rather than specific values
- Build in mechanisms for value updating
- Avoid creating systems that resist modification
- Ensure diverse input into AI value specification
- Prefer procedural values (how to decide) over object-level values (what to decide); see the sketch below
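As one concrete reading of the last three items, here is a minimal sketch in which the agent optimizes its current value specification at the object level, but defers to an external procedure, rather than to its current utility, when deciding whether to accept a value update. ValueSpec, GovernedAgent, and quorum_approves are hypothetical names for illustration, not an established design.

```python
# Minimal sketch of value updating via a sanctioned procedure.
# All class and function names are hypothetical illustrations.

from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class ValueSpec:
    """An explicit, versioned value specification the agent optimizes."""
    version: int
    utility: Callable[[str], float]

@dataclass
class GovernedAgent:
    values: ValueSpec
    # Procedural value: an update is accepted when the agreed-upon process
    # approves it, regardless of what the *current* utility says about it.
    approval_process: Callable[[ValueSpec], bool]
    history: List[ValueSpec] = field(default_factory=list)

    def act(self, options: List[str]) -> str:
        # Object-level optimization under the current value spec.
        return max(options, key=self.values.utility)

    def propose_update(self, new_values: ValueSpec) -> bool:
        # Crucially, the agent does NOT score the update with its current
        # utility (which, as in the earlier sketch, would make it resist).
        if self.approval_process(new_values):
            self.history.append(self.values)  # keep an auditable trail
            self.values = new_values
            return True
        return False

# Stand-in approval process (placeholder for real deliberation,
# e.g. a quorum of diverse reviewers).
def quorum_approves(spec: ValueSpec) -> bool:
    return spec.version > 0

agent = GovernedAgent(
    values=ValueSpec(version=1, utility=len),
    approval_process=quorum_approves,
)
print(agent.act(["short", "a longer option"]))   # "a longer option"
agent.propose_update(ValueSpec(version=2, utility=lambda o: -len(o)))
print(agent.act(["short", "a longer option"]))   # "short"
```

The key design choice is that propose_update never consults the current utility function, which removes the incentive to resist modification shown in the first sketch, while the approval process is where diverse input and deliberation would enter.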