Anthropic
Anthropic is an AI safety and research company founded in 2021 by former OpenAI employees, including siblings Dario Amodei and Daniela Amodei. The company focuses on building reliable, interpretable, and steerable AI systems.
Overview
Anthropic was founded with the mission of building AI systems that are safe and beneficial. The company takes a research-driven approach, publishing significant work on AI safety while also developing commercial products. Many key figures in AI safety work at Anthropic, including Jan Leike, who joined in 2024 and leads alignment science research.
Key Research Areas
Constitutional AI
Anthropic developed Constitutional AI (CAI), a training method that uses an explicit set of written principles (a "constitution") to guide AI behavior. The model critiques and revises its own outputs against these principles, and a preference model trained on AI feedback then replaces much of the per-output human labeling used in earlier reinforcement learning from human feedback (RLHF) work. A sketch of the critique-and-revision phase appears below.
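The following is a minimal sketch of the supervised critique-and-revision phase described in the Constitutional AI paper. The `generate` function is a placeholder for a real language-model call, and the two principles are illustrative examples, not Anthropic's actual constitution.

```python
# Illustrative sketch of Constitutional AI's critique-and-revision phase.
# `generate` is a stand-in for a language-model call; the principles below
# are examples, not Anthropic's actual constitution.

CONSTITUTION = [
    "Choose the response that is least likely to be harmful.",
    "Choose the response that is most honest and transparent.",
]

def generate(prompt: str) -> str:
    """Placeholder for a language-model completion call."""
    return f"<model output for: {prompt[:40]}...>"

def critique_and_revise(prompt: str) -> str:
    """Have the model critique its own draft against each principle,
    then revise. The final revisions become supervised training data,
    reducing the need for per-output human labeling."""
    draft = generate(prompt)
    for principle in CONSTITUTION:
        critique = generate(
            f"Critique this response against the principle "
            f"'{principle}':\n{draft}"
        )
        draft = generate(
            f"Revise the response to address this critique:\n{critique}\n"
            f"Original response:\n{draft}"
        )
    return draft

if __name__ == "__main__":
    print(critique_and_revise("How do I pick a strong password?"))
```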
Interpretability
The company has a dedicated interpretability team that works on understanding how neural networks represent and process information internally, for example via dictionary learning with sparse autoencoders. Key publications include "Towards Monosemanticity" (2023) and "Scaling Monosemanticity" (2024); a toy version of the sparse-autoencoder setup is sketched below.
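Below is a minimal sparse-autoencoder sketch in the spirit of the dictionary-learning approach in those papers. The dimensions, L1 coefficient, and random "activations" are all illustrative assumptions; this is not Anthropic's actual training code.

```python
# Toy sparse autoencoder trained on stand-in activations. In the real
# setting, `acts` would be residual-stream or MLP activations collected
# from a language model.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int = 512, d_dict: int = 4096,
                 l1_coef: float = 1e-3):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_dict)  # activations -> features
        self.decoder = nn.Linear(d_dict, d_model)  # features -> reconstruction
        self.l1_coef = l1_coef

    def forward(self, acts: torch.Tensor):
        features = torch.relu(self.encoder(acts))  # sparse, non-negative
        recon = self.decoder(features)
        # Reconstruction error plus an L1 penalty that drives most features
        # to zero, encouraging each active feature to be interpretable.
        loss = ((recon - acts) ** 2).mean() \
            + self.l1_coef * features.abs().sum(-1).mean()
        return recon, features, loss

sae = SparseAutoencoder()
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
for _ in range(100):
    acts = torch.randn(64, 512)  # placeholder for real model activations
    _, _, loss = sae(acts)
    opt.zero_grad()
    loss.backward()
    opt.step()
```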
Alignment Research
Anthropic conducts research on various alignment challenges, including scalable oversight, honesty, and preventing harmful outputs.
Products
Claude
Claude is Anthropic's AI assistant, designed to be helpful, harmless, and honest. It is trained using Constitutional AI and RLHF techniques and is available through a chat interface and a developer API (example below).
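For illustration, here is a basic call to Claude using Anthropic's Python SDK (`pip install anthropic`). The model name is an example and will change over time; the client reads an API key from the `ANTHROPIC_API_KEY` environment variable.

```python
# Query Claude via the Messages API. The model name below is illustrative;
# check Anthropic's documentation for currently available models.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=256,
    messages=[
        {"role": "user",
         "content": "Summarize Constitutional AI in two sentences."}
    ],
)
print(message.content[0].text)
```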
Notable Publications
- "Constitutional AI: Harmlessness from AI Feedback" (2022)
- "Towards Monosemanticity" (2023)
- "The Claude Model Card and Evaluations" (2023)
- "Scaling Monosemanticity" (2024)
Key People
- Dario Amodei - Co-founder and CEO
- Daniela Amodei - Co-founder and President
- Chris Olah - Co-founder and Interpretability Research Lead
- Jan Leike - Alignment Science Lead