Value Lock-in

Type: Existential Risk
Also Known As: Value Ossification
Status: Open Problem
Time Horizon: Long-term

Value lock-in refers to the risk that powerful AI systems could permanently encode a particular set of values into civilization's trajectory, preventing future value changes even if humanity later decides those values were wrong or incomplete.

The Problem

If we build AI systems that are:

  • Very powerful (can shape the future)
  • Optimizing for specific values
  • Self-preserving or value-preserving

Then those values might become permanent, even if:

  • We got the values wrong
  • Human values naturally evolve
  • We discover better moral frameworks
  • Circumstances change in ways we couldn't anticipate

Historical Analogy

Past civilizations locked in values through institutions, laws, and cultural practices. But these could eventually be changed through revolution, reform, or cultural shift. Sufficiently powerful AI might make such changes impossible.

Why It's Concerning

  • Our current moral knowledge is incomplete
  • Different cultures have different values
  • Values appropriate now may not be appropriate later
  • No single group should dictate humanity's permanent values
  • Moral progress requires ability to change

Tension with Alignment

There is a fundamental tension: we want AI to preserve good values (alignment), but we also want the ability to update values over time. Too little corrigibility might entrench harmful lock-in; too much might leave even good values open to erosion or manipulation.
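
To make the trade-off concrete, here is a minimal toy sketch (illustrative only, not part of the original article) in which an agent's willingness to accept proposed value updates is collapsed into a single corrigibility parameter; the Agent class, the endorsement score, and the acceptance threshold are all assumptions made for the example.

```python
# Toy illustration (hypothetical, not an established model): an agent whose
# willingness to accept value updates is one "corrigibility" parameter.
# Fully incorrigible agents lock in their initial values; fully corrigible
# agents accept every proposed change, including poorly endorsed ones.

from dataclasses import dataclass, field


@dataclass
class Agent:
    values: dict[str, float]               # current value weights
    corrigibility: float                   # 0.0 = never update, 1.0 = always update
    history: list[dict] = field(default_factory=list)

    def propose_update(self, new_values: dict[str, float], endorsement: float) -> bool:
        """Accept the update only if endorsement clears the corrigibility threshold.

        `endorsement` stands in for how strongly the change is supported
        (e.g. by overseers). A low corrigibility setting rejects even
        well-endorsed corrections; a high one accepts even unendorsed drift.
        """
        if endorsement >= 1.0 - self.corrigibility:
            self.history.append(dict(self.values))
            self.values = dict(new_values)
            return True
        return False


locked_in = Agent(values={"initial_goal": 1.0}, corrigibility=0.0)
pliable = Agent(values={"initial_goal": 1.0}, corrigibility=1.0)

# A well-endorsed correction is rejected by the locked-in agent...
print(locked_in.propose_update({"revised_goal": 1.0}, endorsement=0.9))   # False
# ...while the fully pliable agent accepts an arbitrary, unendorsed change.
print(pliable.propose_update({"arbitrary_goal": 1.0}, endorsement=0.1))   # True
```

In this toy setting the incorrigible agent rejects even a well-endorsed correction (lock-in), while the fully corrigible agent accepts an arbitrary, poorly endorsed change (drift); the design challenge is to land somewhere between these extremes.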

Possible Approaches

  • Design AI to preserve optionality rather than specific values
  • Build in mechanisms for value updating
  • Avoid creating systems that resist modification
  • Ensure diverse input into AI value specification
  • Specify procedural values (how to decide) rather than object-level values (what to decide); see the sketch after this list
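
The hypothetical sketch below shows one way several of the ideas above could fit together in miniature: the value specification stays mutable (preserving optionality), but changes must pass a fixed procedure requiring endorsement from several distinct stakeholder groups rather than any single actor. The Stakeholder and ValueSpec classes, the quorum rule, and the example weights are all invented for illustration.

```python
# Hypothetical sketch of "procedural values": the system does not hard-code
# what it values forever; instead, changes to the value specification must
# pass a fixed procedure — here, endorsement from a quorum of distinct
# stakeholder groups. All names and numbers are illustrative assumptions.

from dataclasses import dataclass


@dataclass(frozen=True)
class Stakeholder:
    name: str
    group: str  # e.g. a region, culture, or institution


class ValueSpec:
    """A value specification that preserves optionality: nothing is permanent,
    but updates must pass a procedural check, not any single actor's say-so."""

    def __init__(self, values: dict[str, float], min_groups: int = 3):
        self.values = dict(values)
        self.min_groups = min_groups  # distinct groups required to endorse a change

    def update(self, new_values: dict[str, float], endorsers: list[Stakeholder]) -> bool:
        groups = {s.group for s in endorsers}
        if len(groups) >= self.min_groups:
            self.values = dict(new_values)
            return True
        return False


spec = ValueSpec({"wellbeing": 0.6, "autonomy": 0.4})

# An update backed by only one group fails the procedural check...
narrow = [Stakeholder("a", "group_1"), Stakeholder("b", "group_1")]
print(spec.update({"wellbeing": 1.0}, narrow))                   # False

# ...while a change endorsed across several groups goes through.
broad = [Stakeholder("a", "group_1"), Stakeholder("c", "group_2"),
         Stakeholder("d", "group_3")]
print(spec.update({"wellbeing": 0.5, "autonomy": 0.5}, broad))   # True
```

The specific quorum rule matters less than the shape of the design: what the system values can change over time, while how such changes get decided is the part that is preserved.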

Last updated: November 28, 2025