Sam Bowman
Researcher
Role: Anthropic Research Scientist
Known For: NLP Evaluation, Safety Benchmarks
Previous: Professor, NYU
Education: PhD, Stanford
Sam Bowman is a researcher at Anthropic specializing in natural language processing and AI evaluation. He is known for creating influential benchmarks for measuring language model capabilities and safety.
Career
New York University (2016-2023)
As a professor at NYU, Bowman led the ML² (Machine Learning for Language) lab. Building on SNLI, which he created during his Stanford PhD, his group developed important benchmarks including MultiNLI and contributed to SuperGLUE.
Anthropic (2023-present)
At Anthropic, Bowman works on model evaluation, safety benchmarks, and understanding language model capabilities and limitations.
Key Contributions
- SNLI/MultiNLI: Standard natural language inference benchmarks
- SuperGLUE: Co-developed a challenging language understanding benchmark
- Evaluation Methods: Techniques for measuring model capabilities
- Safety Benchmarks: Methods for testing dangerous capabilities
- Sycophancy Research: Studying when models tell users what they want to hear
Research Focus
Bowman's research addresses critical questions:
- How do we measure what language models can actually do?
- When do models exhibit dangerous capabilities?
- How can we detect deception or sycophancy?
- What evaluation methods scale to more capable systems?
Selected Publications
- "A large annotated corpus for learning natural language inference" (2015)
- "A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference" (2018)
- "Towards Detecting Whether Language Models Can Be Trusted" (2023)
- "Language Models Don't Always Say What They Think" (2023)
Last updated: November 28, 2025