Paul Christiano
Paul Christiano is an AI safety researcher and founder of the Alignment Research Center (ARC). He is known for his foundational work on reinforcement learning from human feedback (RLHF) and theoretical alignment research.
Career
OpenAI (2017-2021)
At OpenAI, Christiano worked on alignment research, including the RLHF techniques that later became foundational for training language models such as ChatGPT and Claude. He co-authored the seminal 2017 paper "Deep Reinforcement Learning from Human Preferences" with Jan Leike and others.
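In that paper, a reward model is fit to human comparisons of pairs of trajectory segments: the probability that one segment is preferred is a softmax over the two segments' summed predicted rewards, trained with cross-entropy against the human labels. The sketch below illustrates that comparison loss; the RewardModel architecture, tensor shapes, and the preference_loss helper are illustrative assumptions rather than the paper's actual implementation.

    import torch
    import torch.nn as nn

    class RewardModel(nn.Module):
        """Illustrative per-step reward predictor r_hat(s, a)."""
        def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(obs_dim + act_dim, hidden),
                nn.ReLU(),
                nn.Linear(hidden, 1),
            )

        def forward(self, obs, act):
            # Predicted reward for each (state, action) step in a segment.
            return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

    def preference_loss(reward_model, seg_a, seg_b, label):
        # seg_a, seg_b: (obs, act) tensors of shape (T, obs_dim) and (T, act_dim).
        # label: 1.0 if the human preferred segment A, 0.0 if segment B.
        ret_a = reward_model(*seg_a).sum()  # summed predicted reward over segment A
        ret_b = reward_model(*seg_b).sum()  # summed predicted reward over segment B
        log_probs = torch.log_softmax(torch.stack([ret_a, ret_b]), dim=0)
        target = torch.tensor([label, 1.0 - label])
        return -(target * log_probs).sum()  # cross-entropy against the human label

    # Example usage with random data:
    rm = RewardModel(obs_dim=4, act_dim=2)
    seg_a = (torch.randn(25, 4), torch.randn(25, 2))
    seg_b = (torch.randn(25, 4), torch.randn(25, 2))
    preference_loss(rm, seg_a, seg_b, label=1.0).backward()

The learned reward then stands in for the environment reward when training the policy with a standard RL algorithm; later language-model RLHF pipelines apply the same comparison loss to whole model responses rather than trajectory segments.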
Alignment Research Center (2021-present)
Christiano founded ARC to pursue theoretical alignment research and develop evaluations for AI safety. The organization works on problems like eliciting latent knowledge and evaluating dangerous AI capabilities.
Key Contributions
- RLHF: Co-developed reinforcement learning from human feedback
- Iterated Amplification: Proposed a method for scalable AI alignment based on recursive task decomposition and distillation (see the toy sketch after this list)
- Eliciting Latent Knowledge (ELK): Research agenda for getting AI to report what it knows
- AI Evaluations: Framework for testing dangerous capabilities
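The toy sketch below shows the decompose-and-delegate structure at the heart of iterated amplification: an overseer routine breaks a question the current model cannot answer directly into subquestions, answers each with copies of the model, and combines the results; distillation would then train the next model on the amplified answers. The summation task, the split-in-half decomposition, and the weak_model stand-in are invented here for illustration; the actual proposal pairs a learned model with a human overseer.

    def weak_model(numbers):
        # Stand-in for the current trained model: reliable only on tiny inputs.
        assert len(numbers) <= 2, "too hard for the weak model on its own"
        return sum(numbers)

    def amplify(model, numbers):
        # Overseer step: decompose, delegate to copies of the model, combine.
        if len(numbers) <= 2:
            return model(numbers)
        mid = len(numbers) // 2
        left = amplify(model, numbers[:mid])
        right = amplify(model, numbers[mid:])
        return model([left, right])

    # Distillation would train the next model to imitate the amplified system,
    # and the amplify/distill loop repeats with the stronger model.
    training_pairs = [(q, amplify(weak_model, q)) for q in ([1, 2, 3, 4], [5, 5, 5])]
    print(training_pairs)  # [([1, 2, 3, 4], 10), ([5, 5, 5], 15)]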
Research Interests
- Scalable oversight of AI systems
- Theoretical foundations of alignment
- AI safety evaluations
- Forecasting AI development
Key Papers
- "Deep Reinforcement Learning from Human Preferences" (2017)
- "Supervising Strong Learners by Amplifying Weak Experts" (2018)
- "Eliciting Latent Knowledge" (2021)
Last updated: November 27, 2025