Sam Bowman

PersonResearcher
Suggest Edit
Sam Bowman
RoleAnthropic Research Scientist
Known ForNLP Evaluation, Safety Benchmarks
PreviousProfessor, NYU
EducationPhD, Stanford

Sam Bowman is a researcher at Anthropicspecializing in natural language processing and AI evaluation. He is known for creating influential benchmarks for measuring language model capabilities and safety.

Career

New York University (2016-2023)

As a professor at NYU, Bowman led the ML² (Machine Learning for Language) lab. His group developed important benchmarks including SNLI, MultiNLI, and contributed to SuperGLUE.

Anthropic (2023-present)

At Anthropic, Bowman works on model evaluation, safety benchmarks, and understanding language model capabilities and limitations.

Key Contributions

  • SNLI/MultiNLI: Standard natural language inference benchmarks
  • SuperGLUE: Contributed to difficult language understanding benchmark
  • Evaluation Methods: Techniques for measuring model capabilities
  • Safety Benchmarks: Methods for testing dangerous capabilities
  • Sycophancy Research: Studying when models tell users what they want to hear

Research Focus

Bowman's research addresses critical questions:

  • How do we measure what language models can actually do?
  • When do models exhibit dangerous capabilities?
  • How can we detect deception or sycophancy?
  • What evaluation methods scale to more capable systems?

Selected Publications

  • "A large annotated corpus for learning natural language inference" (2015)
  • "A Broad-Coverage Challenge Corpus for Sentence Understanding" (2018)
  • "Towards Detecting Whether Language Models Can Be Trusted" (2023)
  • "Language Models Don't Always Say What They Think" (2023)

See Also

Last updated: November 28, 2025