Jan Leike

Person: Researcher
Role: Alignment Lead, Anthropic
Previous: Co-lead, Superalignment, OpenAI
Known for: RLHF, scalable oversight
Education: PhD, Australian National University (ANU)
Jan Leike is an AI safety researcher currently serving as Alignment Lead at Anthropic. Previously, he co-led the Superalignment team at OpenAI alongside Ilya Sutskever.
Career
DeepMind (2017-2021)
At DeepMind, Leike conducted AI safety research, contributing to foundational work on RLHF and reward modeling.
OpenAI (2021-2024)
At OpenAI, Leike co-led the Superalignment team, which was formed to address the challenge of aligning superintelligent AI systems. He departed OpenAI in 2024 along with several other safety researchers.
Anthropic (2024-present)
Leike joined Anthropic as Alignment Lead, overseeing the company's alignment research efforts.
Key Contributions
- RLHF: Co-author of the foundational 2017 paper "Deep Reinforcement Learning from Human Preferences"
- Scalable oversight: Research on supervising AI systems that are more capable than their human overseers
- Reward modeling: Work on learning reward functions from human preferences (see the sketch after this list)
- AI safety via debate: Research on using debate for alignment
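The reward-modeling work above centers on fitting a reward function to pairwise human preference judgments. Below is a minimal illustrative sketch of the standard Bradley-Terry preference loss used in RLHF-style reward learning; it is not code from any of the papers listed, and the model architecture, embedding dimension, and variable names are hypothetical.

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Toy reward model: maps a fixed-size segment embedding to a scalar reward."""
    def __init__(self, embed_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # One scalar reward per input segment
        return self.net(x).squeeze(-1)

def preference_loss(reward_model, preferred, rejected):
    """Bradley-Terry / logistic loss: the human-preferred segment should score higher."""
    r_pref = reward_model(preferred)
    r_rej = reward_model(rejected)
    # -log sigmoid(r_preferred - r_rejected), averaged over the batch
    return -torch.nn.functional.logsigmoid(r_pref - r_rej).mean()

# Hypothetical usage: random embeddings stand in for labeled comparison pairs.
model = RewardModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
preferred = torch.randn(32, 128)  # embeddings of segments humans preferred
rejected = torch.randn(32, 128)   # embeddings of the segments they rejected
loss = preference_loss(model, preferred, rejected)
loss.backward()
optimizer.step()
```

The learned reward model can then be used as the training signal for a reinforcement-learning policy, which is the basic structure of the RLHF pipeline.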
Research Interests
- Scalable alignment techniques
- Reward learning and preference modeling
- Safe exploration in RL
- Recursive reward modeling
Last updated: November 27, 2025