Neel Nanda
Researcher
Role: Research Scientist, Google DeepMind
Known For: Mechanistic Interpretability
Previous: Anthropic
Created: TransformerLens library
Neel Nanda is a leading researcher in mechanistic interpretability, focused on reverse-engineering how transformer language models work internally. He is also known for creating educational resources, tutorials, and curated problem lists that have helped many researchers enter the field.
Career
Anthropic (2022)
Nanda worked on Anthropic's interpretability team under Chris Olah, co-authoring foundational circuits research on transformer internals, including "A Mathematical Framework for Transformer Circuits" and the induction heads work on in-context learning.
Google DeepMind (2022-present)
At DeepMind, Nanda leads the mechanistic interpretability team, investigating how language models implement specific algorithms and capabilities.
Key Contributions
- TransformerLens: Open-source library for transformer interpretability research
- 200 Concrete Open Problems in Mechanistic Interpretability: A curated list of tractable research projects for newcomers to the field
- Grokking Research: Reverse-engineering how small models trained on algorithmic tasks abruptly shift from memorization to generalization
- Educational Content: Videos, tutorials, and workshops on mechanistic interpretability
- Induction Heads: Co-authored research on the attention heads that drive in-context learning (a detection sketch follows this list)
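To give a concrete flavor of this line of work: induction heads are attention heads that, on a repeated sequence, attend from the current token back to the token that followed its previous occurrence, letting the model continue patterns it has seen in context. The sketch below scores heads on this diagnostic using TransformerLens. It is a simplified illustration rather than the methodology of the original paper; the choice of GPT-2 small, the sequence length, and the 0.4 threshold are arbitrary assumptions for the example.

```python
import torch
from transformer_lens import HookedTransformer, utils

# Load GPT-2 small and build a random token sequence repeated twice.
model = HookedTransformer.from_pretrained("gpt2")
L = 20
rand = torch.randint(1000, 10000, (1, L))
tokens = torch.cat([rand, rand], dim=1)  # shape [1, 2L]

_, cache = model.run_with_cache(tokens)

# An induction head makes query position q attend to key position
# q - (L - 1): the token right after the previous occurrence of the
# current token. Those targets all lie on one diagonal of the pattern.
for layer in range(model.cfg.n_layers):
    pattern = cache[utils.get_act_name("pattern", layer)]  # [batch, head, q, k]
    stripe = pattern.diagonal(offset=-(L - 1), dim1=-2, dim2=-1)
    scores = stripe[..., 1:].mean(dim=-1).squeeze(0)  # queries in the 2nd copy
    for head, score in enumerate(scores.tolist()):
        if score > 0.4:  # arbitrary threshold for this sketch
            print(f"layer {layer}, head {head}: induction score {score:.2f}")
```

Heads scoring high on this diagnostic are candidate induction heads; in practice one would confirm the attribution with further experiments such as ablations.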
TransformerLens
TransformerLens is a library designed to make it easy to do mechanistic interpretability research on GPT-2 style transformers; a minimal usage sketch follows the list below. It provides tools for:
- Accessing model activations at any layer
- Performing activation patching experiments
- Analyzing attention patterns
- Studying circuit-level behavior
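A minimal sketch of that workflow: loading a model, caching activations, inspecting an attention pattern, and running a simple hook-based intervention. The call names follow the TransformerLens documentation (HookedTransformer.from_pretrained, run_with_cache, run_with_hooks), though exact signatures may vary across versions; the prompt and the ablated head are arbitrary choices for illustration.

```python
from transformer_lens import HookedTransformer, utils

# Load GPT-2 small in TransformerLens's standardized weight layout.
model = HookedTransformer.from_pretrained("gpt2")
tokens = model.to_tokens("Mechanistic interpretability is")

# Run a forward pass while caching every intermediate activation.
logits, cache = model.run_with_cache(tokens)

# Inspect layer 0 attention patterns: [batch, head, query_pos, key_pos].
pattern = cache[utils.get_act_name("pattern", 0)]
print("attention pattern shape:", pattern.shape)

# A simple intervention: zero-ablate head 5 in layer 0 via a forward hook.
# hook_z (per-head attention output) has shape [batch, pos, head, d_head].
def zero_head_5(z, hook):
    z[:, :, 5, :] = 0.0
    return z

ablated_logits = model.run_with_hooks(
    tokens,
    fwd_hooks=[(utils.get_act_name("z", 0), zero_head_5)],
)
print("max logit shift at final position:",
      (logits - ablated_logits)[0, -1].abs().max().item())
```

Because every activation is exposed through a named hook point, activation patching follows the same pattern: cache activations from a clean run, then write them into a corrupted run through run_with_hooks.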
Community Building
Nanda has been instrumental in growing the mechanistic interpretability community:
- Runs workshops and reading groups
- Creates beginner-friendly tutorials
- Maintains lists of open problems
- Explains research to broad audiences on social media