Redwood Research

Type: AI Safety Research Lab
Founded: 2021
Founders: Nate Thomas, Buck Shlegeris
Headquarters: Berkeley, CA
Focus: Applied alignment research

Redwood Research is a nonprofit AI safety research lab focused on applied alignment research. The organization develops practical techniques for making AI systems safer, emphasizing empirical approaches over purely theoretical frameworks.

Overview

Founded by researchers from the effective altruism community, Redwood Research takes a hands-on approach to alignment. Its researchers work directly with language models to develop and test safety techniques that can be applied to current systems.

Key Research Areas

Adversarial Training

Redwood developed techniques for training models to be robust against adversarial inputs, including methods for preventing language models from producing harmful outputs even when prompted to do so.
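The core loop behind this kind of adversarial training can be illustrated with a toy sketch: train a classifier, search for inputs that fool it, add those failures to the training set, and retrain. This is a minimal illustration of the general technique, not Redwood's actual pipeline; the model, data, and attack here are all synthetic.

```python
import numpy as np

def train(X, y, lr=0.1, steps=500):
    """Fit a tiny 1-D logistic-regression classifier (toy stand-in)."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        p = 1 / (1 + np.exp(-(w * X + b)))
        w -= lr * np.mean((p - y) * X)
        b -= lr * np.mean(p - y)
    return w, b

def find_adversarial(w, b, X, y, eps=0.5):
    """Shift each input toward the decision boundary; keep ones
    the classifier now gets wrong (the discovered failures)."""
    X_adv = X - eps * np.sign(w) * (2 * y - 1)
    p_adv = 1 / (1 + np.exp(-(w * X_adv + b)))
    mask = (p_adv > 0.5) != (y > 0.5)
    return X_adv[mask], y[mask]

rng = np.random.default_rng(0)
X0 = np.concatenate([rng.normal(-2, 1, 100), rng.normal(2, 1, 100)])
y0 = np.concatenate([np.zeros(100), np.ones(100)])

# Adversarial training loop: retrain on discovered failures.
X, y = X0.copy(), y0.copy()
w, b = train(X, y)
for _ in range(3):
    X_bad, y_bad = find_adversarial(w, b, X, y)
    if len(X_bad) == 0:
        break
    X = np.concatenate([X, X_bad])
    y = np.concatenate([y, y_bad])
    w, b = train(X, y)
```

The same idea scales up when the "attack" is a human or automated red team searching for prompts that elicit harmful model outputs.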

Causal Scrubbing

The lab developed "causal scrubbing," a technique for rigorously testing claims about how neural networks work internally. This contributes to the broader field of interpretability.
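The flavor of the test can be sketched in a few lines: if a hypothesis claims some internal activation is irrelevant to a behavior, replace that activation with its value on a resampled input and check that the output is unchanged. This toy network is constructed so that one hidden unit really is irrelevant (its outgoing weight is zero); it is an illustrative sketch of the idea, not Redwood's full methodology.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer network; hidden unit 1 has zero outgoing weight,
# so a hypothesis claiming it is irrelevant should pass scrubbing.
W1 = np.array([[1.0, 0.5], [0.3, -0.2]])  # input -> hidden
W2 = np.array([2.0, 0.0])                  # hidden -> output

def forward(x, hidden_override=None):
    h = np.maximum(W1 @ x, 0.0)
    if hidden_override is not None:
        for i, v in hidden_override.items():
            h[i] = v
    return W2 @ h

# Scrubbing-style check: swap in the "irrelevant" unit's activation
# from a randomly resampled input and measure the output change.
X = rng.normal(size=(100, 2))
errs = []
for x in X:
    x_other = X[rng.integers(len(X))]
    h_other = np.maximum(W1 @ x_other, 0.0)
    scrubbed = forward(x, hidden_override={1: h_other[1]})
    errs.append(abs(scrubbed - forward(x)))
# max(errs) is ~0 here, so this hypothesis survives the test
```

A hypothesis that wrongly claimed unit 0 was irrelevant would produce large output changes under the same resampling, and would be rejected.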

Activation Engineering

Redwood has researched directly manipulating model activations to control behavior, related to steering vectors and representation engineering approaches.
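The steering-vector idea can be shown with a minimal numeric sketch: add a fixed direction to a hidden activation and observe the output shift toward a target behavior. Everything here (the readout matrix, the hidden state, the way the direction is chosen) is synthetic for illustration; in practice steering directions are typically extracted from contrasting prompts.

```python
import numpy as np

rng = np.random.default_rng(0)
W_out = rng.normal(size=(2, 8))  # toy readout over 2 "behaviors"
h = rng.normal(size=8)           # hidden activation for some input

# Steering direction: here, the difference of the two behaviors'
# readout rows (a synthetic stand-in for a learned direction).
steer = W_out[1] - W_out[0]

def logits(h, alpha=0.0):
    """Add the steering vector to the hidden state, scaled by alpha."""
    return W_out @ (h + alpha * steer)

base = logits(h)
steered = logits(h, alpha=2.0)
# Steering raises the target behavior's logit relative to the other.
```

The key design point is that intervention happens on activations at inference time, with no weight updates, so the strength of the effect can be tuned continuously via `alpha`.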

Notable Projects

  • Adversarial training for language models
  • Causal scrubbing methodology
  • Injury classifier project
  • Representation engineering research


Last updated: November 28, 2025