Value Lock-in

Type: Existential Risk
Also Known As: Value Ossification
Status: Open Problem
Time Horizon: Long-term

Value lock-in refers to the risk that powerful AI systems could permanently encode a particular set of values into civilization's trajectory, preventing future value changes even if humanity later decides those values were wrong or incomplete.

The Problem

If we build AI systems that are:

  • Very powerful (can shape the future)
  • Optimizing for specific values
  • Self-preserving or value-preserving

Then those values might become permanent, even if:

  • We got the values wrong
  • Human values naturally evolve
  • We discover better moral frameworks
  • Circumstances change in ways we couldn't anticipate

Historical Analogy

Past civilizations locked in values through institutions, laws, and cultural practices. But these could eventually be changed through revolution, reform, or cultural shift. Sufficiently powerful AI might make such changes impossible.

Why It's Concerning

  • Our current moral knowledge is incomplete
  • Different cultures have different values
  • Values appropriate now may not be appropriate later
  • No single group should dictate humanity's permanent values
  • Moral progress requires ability to change

Tension with Alignment

There is a fundamental tension: we want AI to preserve good values (alignment), but we also want the ability to update values over time. Too little corrigibility might entrench harmful lock-in; too much might leave even good values open to erosion or manipulation.
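
To make the trade-off concrete, here is a minimal toy sketch (illustrative only, not part of the original article) in which an agent's willingness to accept proposed value updates is collapsed into a single corrigibility parameter; the Agent class, the endorsement score, and the acceptance threshold are all assumptions made for the example.

```python
# Toy illustration (hypothetical, not an established model): an agent whose
# willingness to accept value updates is one "corrigibility" parameter.
# Fully incorrigible agents lock in their initial values; fully corrigible
# agents accept every proposed change, including poorly endorsed ones.

from dataclasses import dataclass, field


@dataclass
class Agent:
    values: dict[str, float]               # current value weights
    corrigibility: float                   # 0.0 = never update, 1.0 = always update
    history: list[dict] = field(default_factory=list)

    def propose_update(self, new_values: dict[str, float], endorsement: float) -> bool:
        """Accept the update only if endorsement clears the corrigibility threshold.

        `endorsement` stands in for how strongly the change is supported
        (e.g. by overseers). A low corrigibility setting rejects even
        well-endorsed corrections; a high one accepts even unendorsed drift.
        """
        if endorsement >= 1.0 - self.corrigibility:
            self.history.append(dict(self.values))
            self.values = dict(new_values)
            return True
        return False


locked_in = Agent(values={"initial_goal": 1.0}, corrigibility=0.0)
pliable = Agent(values={"initial_goal": 1.0}, corrigibility=1.0)

# A well-endorsed correction is rejected by the locked-in agent...
print(locked_in.propose_update({"revised_goal": 1.0}, endorsement=0.9))   # False
# ...while the fully pliable agent accepts an arbitrary, unendorsed change.
print(pliable.propose_update({"arbitrary_goal": 1.0}, endorsement=0.1))   # True
```

In this toy setting the incorrigible agent rejects even a well-endorsed correction (lock-in), while the fully corrigible agent accepts an arbitrary, poorly endorsed change (drift); the design challenge is to land somewhere between these extremes.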

Possible Approaches

  • Design AI to preserve optionality rather than specific values
  • Build in mechanisms for value updating
  • Avoid creating systems that resist modification
  • Ensure diverse input into AI value specification
  • Specify procedural values (how to decide) rather than object-level values (what to decide); see the sketch after this list
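
The hypothetical sketch below shows one way several of the ideas above could fit together in miniature: the value specification stays mutable (preserving optionality), but changes must pass a fixed procedure requiring endorsement from several distinct stakeholder groups rather than any single actor. The Stakeholder and ValueSpec classes, the quorum rule, and the example weights are all invented for illustration.

```python
# Hypothetical sketch of "procedural values": the system does not hard-code
# what it values forever; instead, changes to the value specification must
# pass a fixed procedure — here, endorsement from a quorum of distinct
# stakeholder groups. All names and numbers are illustrative assumptions.

from dataclasses import dataclass


@dataclass(frozen=True)
class Stakeholder:
    name: str
    group: str  # e.g. a region, culture, or institution


class ValueSpec:
    """A value specification that preserves optionality: nothing is permanent,
    but updates must pass a procedural check, not any single actor's say-so."""

    def __init__(self, values: dict[str, float], min_groups: int = 3):
        self.values = dict(values)
        self.min_groups = min_groups  # distinct groups required to endorse a change

    def update(self, new_values: dict[str, float], endorsers: list[Stakeholder]) -> bool:
        groups = {s.group for s in endorsers}
        if len(groups) >= self.min_groups:
            self.values = dict(new_values)
            return True
        return False


spec = ValueSpec({"wellbeing": 0.6, "autonomy": 0.4})

# An update backed by only one group fails the procedural check...
narrow = [Stakeholder("a", "group_1"), Stakeholder("b", "group_1")]
print(spec.update({"wellbeing": 1.0}, narrow))                   # False

# ...while a change endorsed across several groups goes through.
broad = [Stakeholder("a", "group_1"), Stakeholder("c", "group_2"),
         Stakeholder("d", "group_3")]
print(spec.update({"wellbeing": 0.5, "autonomy": 0.5}, broad))   # True
```

The specific quorum rule matters less than the shape of the design: what the system values can change over time, while how such changes get decided is the part that is preserved.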

Last updated: November 28, 2025