Jan Leike

Jan Leike
Role: Alignment Lead, Anthropic
Previous: Co-lead, Superalignment team, OpenAI
Known for: RLHF, scalable oversight
Education: PhD, Australian National University

Jan Leike is an AI safety researcher currently serving as Alignment Lead at Anthropic. Previously, he co-led the Superalignment team at OpenAI alongside Ilya Sutskever.

Career

DeepMind (2017-2021)

Leike worked on AI safety research at DeepMind, contributing to foundational work on reinforcement learning from human feedback (RLHF) and reward modeling.

OpenAI (2021-2024)

At OpenAI, Leike co-led the Superalignment team, which was formed to address the challenge of aligning superintelligent AI systems. He departed OpenAI in 2024 along with several other safety researchers.

Anthropic (2024-present)

Leike joined Anthropic as Alignment Lead, overseeing the company's alignment research efforts.

Key Contributions

  • RLHF paper: Co-author of the foundational 2017 paper "Deep Reinforcement Learning from Human Preferences" (Christiano et al.)
  • Scalable oversight: Research on supervising AI systems more capable than humans
  • Reward modeling: Work on learning reward functions from human preferences (see the sketch after this list)
  • AI safety via debate: Research on using debate for alignment
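
The reward modeling item above refers to learning a reward function from pairwise human preference comparisons. The following minimal, illustrative sketch in Python (using PyTorch) shows the Bradley-Terry style pairwise objective commonly used for this kind of preference-based reward learning. The names RewardModel and preference_loss, the network shape, and the random placeholder features are all hypothetical and are not taken from any of Leike's publications or codebases.

    # Illustrative sketch of pairwise preference-based reward learning
    # (Bradley-Terry style objective). Names and data are placeholders.
    import torch
    import torch.nn as nn

    class RewardModel(nn.Module):
        """Toy reward model: maps a fixed-size feature vector to a scalar reward."""
        def __init__(self, feature_dim: int = 16):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(feature_dim, 64),
                nn.ReLU(),
                nn.Linear(64, 1),
            )

        def forward(self, features: torch.Tensor) -> torch.Tensor:
            return self.net(features).squeeze(-1)

    def preference_loss(reward_preferred: torch.Tensor,
                        reward_rejected: torch.Tensor) -> torch.Tensor:
        """Negative log-likelihood that the preferred segment wins,
        under the Bradley-Terry model: P(a > b) = sigmoid(r_a - r_b)."""
        return -torch.nn.functional.logsigmoid(reward_preferred - reward_rejected).mean()

    # One gradient step on a batch of (preferred, rejected) comparisons.
    model = RewardModel()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    preferred = torch.randn(32, 16)  # placeholder features of preferred segments
    rejected = torch.randn(32, 16)   # placeholder features of rejected segments

    loss = preference_loss(model(preferred), model(rejected))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

Under this objective, segments that humans prefer are pushed toward higher scalar rewards; the resulting reward model can then serve as a training signal for a policy, which is the basic structure of RLHF.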

Research Interests

  • Scalable alignment techniques
  • Reward learning and preference modeling
  • Safe exploration in RL
  • Recursive reward modeling


Last updated: November 27, 2025