Jan Leike

Person: Researcher
Role: Alignment Lead, Anthropic
Previous: Co-lead, Superalignment, OpenAI
Known for: RLHF, scalable oversight
Education: PhD, Australian National University (ANU)
Jan Leike is an AI safety researcher currently serving as Alignment Lead at Anthropic. Previously, he co-led the Superalignment team at OpenAI alongside Ilya Sutskever.
Career
DeepMind (2017-2021)
At DeepMind, Leike conducted AI safety research, contributing to foundational work on RLHF and reward modeling.
OpenAI (2021-2024)
At OpenAI, Leike co-led the Superalignment team, which was formed to address the challenge of aligning superintelligent AI systems. He departed OpenAI in 2024 along with several other safety researchers.
Anthropic (2024-present)
Leike joined Anthropic as Alignment Lead, overseeing the company's alignment research efforts.
Key Contributions
- RLHF: Co-author of the foundational 2017 paper "Deep Reinforcement Learning from Human Preferences"
- Scalable oversight: Research on supervising AI systems that are more capable than their human overseers
- Reward modeling: Work on learning reward functions from human preferences (see the sketch after this list)
- AI safety via debate: Research on using debate for alignment
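The reward-modeling work above centers on fitting a reward function to pairwise human preference judgments. Below is a minimal illustrative sketch of the standard Bradley-Terry preference loss used in RLHF-style reward learning; it is not code from any of the papers listed, and the model architecture, embedding dimension, and variable names are hypothetical.

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Toy reward model: maps a fixed-size segment embedding to a scalar reward."""
    def __init__(self, embed_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # One scalar reward per input segment
        return self.net(x).squeeze(-1)

def preference_loss(reward_model, preferred, rejected):
    """Bradley-Terry / logistic loss: the human-preferred segment should score higher."""
    r_pref = reward_model(preferred)
    r_rej = reward_model(rejected)
    # -log sigmoid(r_preferred - r_rejected), averaged over the batch
    return -torch.nn.functional.logsigmoid(r_pref - r_rej).mean()

# Hypothetical usage: random embeddings stand in for labeled comparison pairs.
model = RewardModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
preferred = torch.randn(32, 128)  # embeddings of segments humans preferred
rejected = torch.randn(32, 128)   # embeddings of the segments they rejected
loss = preference_loss(model, preferred, rejected)
loss.backward()
optimizer.step()
```

The learned reward model can then be used as the training signal for a reinforcement-learning policy, which is the basic structure of the RLHF pipeline.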
Research Interests
- Scalable alignment techniques
- Reward learning and preference modeling
- Safe exploration in RL
- Recursive reward modeling
Last updated: November 27, 2025