Neel Nanda

Role: Research Scientist, Google DeepMind
Known For: Mechanistic Interpretability
Previous: Anthropic
Created: TransformerLens library

Neel Nanda is a leading researcher in mechanistic interpretability, focused on reverse-engineering how transformer language models work. He is known for creating educational resources that have helped many enter the field.

Career

Anthropic (2022)

Nanda worked on Anthropic's interpretability team, contributing to research on understanding neural network internals.

Google DeepMind (2022-present)

At DeepMind, Nanda continues mechanistic interpretability research, investigating how language models implement specific algorithms and capabilities.

Key Contributions

  • TransformerLens: Open-source library for transformer interpretability research
  • 200 Concrete Open Problems in Mechanistic Interpretability: Curated list of tractable interpretability projects
  • Grokking Research: Understanding how models suddenly generalize
  • Educational Content: Videos, tutorials, and workshops on mech interp
  • Induction Heads: Research on in-context learning mechanisms
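The induction-head mechanism behind that last item is simple to state: at each position, the head attends to the token that follows the previous occurrence of the current token, letting the model copy repeated patterns from earlier in the context. A minimal sketch of that idealized attention target (illustrative only, not taken from any of the research code):

```python
def induction_targets(tokens):
    """For each position i, return the position an idealized induction head
    would attend to: the token *after* the previous occurrence of tokens[i].
    Returns -1 where the current token has not appeared before."""
    targets = [-1] * len(tokens)
    last_seen = {}
    for i, t in enumerate(tokens):
        if t in last_seen:
            targets[i] = last_seen[t] + 1  # attend one past the prior match
        last_seen[t] = i
    return targets

# On the repeated sequence A B C A B C, the head at the second "A"
# should attend to the first "B", enabling it to predict "B" next.
print(induction_targets(["A", "B", "C", "A", "B", "C"]))
# [-1, -1, -1, 1, 2, 3]
```

Real induction heads approximate this pattern in their attention weights; this sketch just computes the target each position's attention should concentrate on.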

TransformerLens

TransformerLens is a library designed to make mechanistic interpretability research on GPT-2-style transformers easy. It provides tools for:

  • Accessing model activations at any layer
  • Performing activation patching experiments
  • Analyzing attention patterns
  • Studying circuit-level behavior
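The activation-patching workflow those tools support can be illustrated with a deliberately tiny stand-in model. This is a library-free sketch, not the TransformerLens API: the two toy "layers" and function names are invented for the example, but the experimental logic (cache an activation from a clean run, then overwrite the same activation in a corrupted run) is the same one the library applies to real transformers via hooks.

```python
def layer1(x):
    return [v * 2 for v in x]  # toy hidden activation

def layer2(h):
    return sum(h)              # toy scalar "output"

def run(x, patch=None):
    """Run the toy two-layer model; if `patch` is given, replace the
    layer-1 activation with it (this is the patching step)."""
    h = layer1(x) if patch is None else patch
    return layer2(h), h

clean_out, clean_h = run([1, 2, 3])             # clean run: cache the activation
corrupt_out, _ = run([0, 0, 0])                 # corrupted run: baseline
patched_out, _ = run([0, 0, 0], patch=clean_h)  # corrupted run + clean activation

# If patching restores the clean output, the patched activation carries
# the information that distinguishes the two inputs.
print(clean_out, corrupt_out, patched_out)
# 12 0 12
```

In the real library the cached activations come from a forward pass with hooks attached, and "patching" means overwriting a specific hook point (a head's output, a residual-stream slice) rather than a whole toy layer.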

Community Building

Nanda has been instrumental in growing the mechanistic interpretability community:

  • Runs workshops and reading groups
  • Creates beginner-friendly tutorials
  • Maintains lists of open problems
  • Active on social media explaining research

Last updated: November 28, 2025