Anthropic
Anthropic is an AI safety and research company founded in 2021 by former OpenAI employees, including siblings Dario Amodei and Daniela Amodei. The company focuses on building reliable, interpretable, and steerable AI systems.
Overview
Anthropic was founded with the mission of building AI systems that are safe and beneficial. The company takes a research-driven approach, publishing significant work on AI safety while also developing commercial products. Many key figures in AI safety work at Anthropic, including Jan Leike, who joined in 2024 and leads alignment science research.
Key Research Areas
Constitutional AI
Anthropic developed Constitutional AI (CAI), a training method that uses an explicit set of written principles (a "constitution") to guide AI behavior. The model critiques and revises its own outputs against these principles, and a preference model trained on AI feedback then replaces much of the per-output human labeling used in earlier reinforcement learning from human feedback (RLHF) work. A sketch of the critique-and-revision phase appears below.
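The following is a minimal sketch of the supervised critique-and-revision phase described in the Constitutional AI paper. The `generate` function is a placeholder for a real language-model call, and the two principles are illustrative examples, not Anthropic's actual constitution.

```python
# Illustrative sketch of Constitutional AI's critique-and-revision phase.
# `generate` is a stand-in for a language-model call; the principles below
# are examples, not Anthropic's actual constitution.

CONSTITUTION = [
    "Choose the response that is least likely to be harmful.",
    "Choose the response that is most honest and transparent.",
]

def generate(prompt: str) -> str:
    """Placeholder for a language-model completion call."""
    return f"<model output for: {prompt[:40]}...>"

def critique_and_revise(prompt: str) -> str:
    """Have the model critique its own draft against each principle,
    then revise. The final revisions become supervised training data,
    reducing the need for per-output human labeling."""
    draft = generate(prompt)
    for principle in CONSTITUTION:
        critique = generate(
            f"Critique this response against the principle "
            f"'{principle}':\n{draft}"
        )
        draft = generate(
            f"Revise the response to address this critique:\n{critique}\n"
            f"Original response:\n{draft}"
        )
    return draft

if __name__ == "__main__":
    print(critique_and_revise("How do I pick a strong password?"))
```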
Interpretability
The company has a dedicated interpretability team that works on understanding how neural networks represent and process information internally, for example via dictionary learning with sparse autoencoders. Key publications include "Towards Monosemanticity" (2023) and "Scaling Monosemanticity" (2024); a toy version of the sparse-autoencoder setup is sketched below.
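Below is a minimal sparse-autoencoder sketch in the spirit of the dictionary-learning approach in those papers. The dimensions, L1 coefficient, and random "activations" are all illustrative assumptions; this is not Anthropic's actual training code.

```python
# Toy sparse autoencoder trained on stand-in activations. In the real
# setting, `acts` would be residual-stream or MLP activations collected
# from a language model.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int = 512, d_dict: int = 4096,
                 l1_coef: float = 1e-3):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_dict)  # activations -> features
        self.decoder = nn.Linear(d_dict, d_model)  # features -> reconstruction
        self.l1_coef = l1_coef

    def forward(self, acts: torch.Tensor):
        features = torch.relu(self.encoder(acts))  # sparse, non-negative
        recon = self.decoder(features)
        # Reconstruction error plus an L1 penalty that drives most features
        # to zero, encouraging each active feature to be interpretable.
        loss = ((recon - acts) ** 2).mean() \
            + self.l1_coef * features.abs().sum(-1).mean()
        return recon, features, loss

sae = SparseAutoencoder()
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
for _ in range(100):
    acts = torch.randn(64, 512)  # placeholder for real model activations
    _, _, loss = sae(acts)
    opt.zero_grad()
    loss.backward()
    opt.step()
```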
Alignment Research
Anthropic conducts research on various alignment challenges, including scalable oversight, honesty, and preventing harmful outputs.
Products
Claude
Claude is Anthropic's AI assistant, designed to be helpful, harmless, and honest. It is trained using Constitutional AI and RLHF techniques and is available through a chat interface and a developer API (example below).
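For illustration, here is a basic call to Claude using Anthropic's Python SDK (`pip install anthropic`). The model name is an example and will change over time; the client reads an API key from the `ANTHROPIC_API_KEY` environment variable.

```python
# Query Claude via the Messages API. The model name below is illustrative;
# check Anthropic's documentation for currently available models.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=256,
    messages=[
        {"role": "user",
         "content": "Summarize Constitutional AI in two sentences."}
    ],
)
print(message.content[0].text)
```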
Notable Publications
- "Constitutional AI: Harmlessness from AI Feedback" (2022)
- "Towards Monosemanticity" (2023)
- "The Claude Model Card and Evaluations" (2023)
- "Scaling Monosemanticity" (2024)
Key People
- Dario Amodei - Co-founder and CEO
- Daniela Amodei - Co-founder and President
- Chris Olah - Co-founder and Interpretability Research Lead
- Jan Leike - Alignment Science Lead