Paul Christiano

Role: ARC Founder
Known For: RLHF
Previous: Researcher, OpenAI
Education: PhD, UC Berkeley

Paul Christiano is an AI safety researcher and founder of the Alignment Research Center (ARC). He is known for his foundational work on reinforcement learning from human feedback (RLHF) and theoretical alignment research.

Career

OpenAI (2017-2021)

At OpenAI, Christiano worked on alignment research, including developing RLHF techniques that became foundational for training the language models behind ChatGPT and Claude. He co-authored the seminal 2017 paper "Deep Reinforcement Learning from Human Preferences" with Jan Leike and others.
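
The core idea of that paper is to train a reward model from human comparisons of trajectory segments rather than from a hand-written reward function. The following is a minimal sketch of that preference-comparison objective (a Bradley-Terry style loss), not code from the paper; the function and variable names are illustrative.

```python
# Minimal sketch of the preference-comparison objective used to fit a reward
# model from human comparisons (as in "Deep Reinforcement Learning from Human
# Preferences"). Names and shapes here are assumptions for illustration.
import numpy as np

def preference_loss(reward_a: np.ndarray, reward_b: np.ndarray, human_prefers_a: bool) -> float:
    """Cross-entropy loss on P(segment a preferred over segment b)."""
    # Predicted return of each trajectory segment: sum of per-step predicted rewards.
    return_a, return_b = reward_a.sum(), reward_b.sum()
    # Bradley-Terry model: probability assigned to "a is preferred" is a
    # logistic function of the return difference.
    p_a = 1.0 / (1.0 + np.exp(return_b - return_a))
    target = 1.0 if human_prefers_a else 0.0
    eps = 1e-8  # avoid log(0)
    return -(target * np.log(p_a + eps) + (1.0 - target) * np.log(1.0 - p_a + eps))

# Example: the reward model currently rates segment b higher, but the human
# preferred a, so the loss is large and training would push return_a upward.
loss = preference_loss(np.array([0.1, 0.2]), np.array([0.5, 0.6]), human_prefers_a=True)
print(round(float(loss), 3))
```

In the full method, the learned reward model then supplies the reward signal for a standard deep RL algorithm, with humans queried only on a small set of informative segment pairs.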

Alignment Research Center (2021-present)

Christiano founded ARC to pursue theoretical alignment research and develop evaluations for AI safety. The organization works on problems like eliciting latent knowledge and evaluating dangerous AI capabilities.

Key Contributions

  • RLHF: Co-developed reinforcement learning from human feedback
  • Iterated Amplification: Proposed method for scalable AI alignment
  • Eliciting Latent Knowledge (ELK): Research agenda for getting AI to report what it knows
  • AI Evaluations: Framework for testing dangerous capabilities

Research Interests

  • Scalable oversight of AI systems
  • Theoretical foundations of alignment
  • AI safety evaluations
  • Forecasting AI development


Last updated: November 27, 2025