Paul Christiano
Paul Christiano is an AI safety researcher and founder of the Alignment Research Center (ARC). He is known for his foundational work on reinforcement learning from human feedback (RLHF) and theoretical alignment research.
Career
OpenAI (2017-2021)
At OpenAI, Christiano worked on alignment research, including the RLHF techniques that later became foundational for training language models such as ChatGPT and Claude. He co-authored the seminal 2017 paper "Deep Reinforcement Learning from Human Preferences" with Jan Leike and others.
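In that paper, a reward model is fit to human comparisons of pairs of trajectory segments: the probability that one segment is preferred is a softmax over the two segments' summed predicted rewards, trained with cross-entropy against the human labels. The sketch below illustrates that comparison loss; the RewardModel architecture, tensor shapes, and the preference_loss helper are illustrative assumptions rather than the paper's actual implementation.

    import torch
    import torch.nn as nn

    class RewardModel(nn.Module):
        """Illustrative per-step reward predictor r_hat(s, a)."""
        def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(obs_dim + act_dim, hidden),
                nn.ReLU(),
                nn.Linear(hidden, 1),
            )

        def forward(self, obs, act):
            # Predicted reward for each (state, action) step in a segment.
            return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

    def preference_loss(reward_model, seg_a, seg_b, label):
        # seg_a, seg_b: (obs, act) tensors of shape (T, obs_dim) and (T, act_dim).
        # label: 1.0 if the human preferred segment A, 0.0 if segment B.
        ret_a = reward_model(*seg_a).sum()  # summed predicted reward over segment A
        ret_b = reward_model(*seg_b).sum()  # summed predicted reward over segment B
        log_probs = torch.log_softmax(torch.stack([ret_a, ret_b]), dim=0)
        target = torch.tensor([label, 1.0 - label])
        return -(target * log_probs).sum()  # cross-entropy against the human label

    # Example usage with random data:
    rm = RewardModel(obs_dim=4, act_dim=2)
    seg_a = (torch.randn(25, 4), torch.randn(25, 2))
    seg_b = (torch.randn(25, 4), torch.randn(25, 2))
    preference_loss(rm, seg_a, seg_b, label=1.0).backward()

The learned reward then stands in for the environment reward when training the policy with a standard RL algorithm; later language-model RLHF pipelines apply the same comparison loss to whole model responses rather than trajectory segments.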
Alignment Research Center (2021-present)
Christiano founded ARC to pursue theoretical alignment research and develop evaluations for AI safety. The organization works on problems like eliciting latent knowledge and evaluating dangerous AI capabilities.
Key Contributions
- RLHF: Co-developed reinforcement learning from human feedback
- Iterated Amplification: Proposed a method for scalable AI alignment based on recursive task decomposition and distillation (see the toy sketch after this list)
- Eliciting Latent Knowledge (ELK): Research agenda for getting AI to report what it knows
- AI Evaluations: Framework for testing dangerous capabilities
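The toy sketch below shows the decompose-and-delegate structure at the heart of iterated amplification: an overseer routine breaks a question the current model cannot answer directly into subquestions, answers each with copies of the model, and combines the results; distillation would then train the next model on the amplified answers. The summation task, the split-in-half decomposition, and the weak_model stand-in are invented here for illustration; the actual proposal pairs a learned model with a human overseer.

    def weak_model(numbers):
        # Stand-in for the current trained model: reliable only on tiny inputs.
        assert len(numbers) <= 2, "too hard for the weak model on its own"
        return sum(numbers)

    def amplify(model, numbers):
        # Overseer step: decompose, delegate to copies of the model, combine.
        if len(numbers) <= 2:
            return model(numbers)
        mid = len(numbers) // 2
        left = amplify(model, numbers[:mid])
        right = amplify(model, numbers[mid:])
        return model([left, right])

    # Distillation would train the next model to imitate the amplified system,
    # and the amplify/distill loop repeats with the stronger model.
    training_pairs = [(q, amplify(weak_model, q)) for q in ([1, 2, 3, 4], [5, 5, 5])]
    print(training_pairs)  # [([1, 2, 3, 4], 10), ([5, 5, 5], 15)]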
Research Interests
- Scalable oversight of AI systems
- Theoretical foundations of alignment
- AI safety evaluations
- Forecasting AI development
Key Papers
- "Deep Reinforcement Learning from Human Preferences" (2017)
- "Supervising Strong Learners by Amplifying Weak Experts" (2018)
- "Eliciting Latent Knowledge" (2021)
Last updated: November 27, 2025