Redwood Research
Redwood Research is a nonprofit AI safety research lab focused on applied alignment research. The organization develops practical techniques for making AI systems safer, emphasizing empirical approaches over purely theoretical frameworks.
Overview
Founded by researchers from the effective altruism community, Redwood Research takes a hands-on approach to alignment. The lab works directly with language models to develop and test safety techniques that can be applied to current systems.
Key Research Areas
Adversarial Training
Redwood developed techniques for training models to be robust against adversarial inputs, including methods for preventing language models from producing harmful outputs even when prompted to do so.
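As a rough illustration of the general idea, the sketch below applies classic FGSM-style adversarial training to a toy PyTorch classifier: adversarial inputs are crafted by ascending the loss gradient, then folded back into training. The architecture, data, and epsilon value are illustrative assumptions; Redwood's actual work targeted language models and relied on human and automated red-teaming rather than gradient perturbations.

```python
# Minimal adversarial-training sketch (FGSM-style) on a toy classifier.
# Illustrative only: not Redwood's actual setup, which attacked language
# models with human and automated red-teaming.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
epsilon = 0.1  # perturbation budget (assumed hyperparameter)

for step in range(100):
    x = torch.randn(64, 10)         # stand-in for real inputs
    y = (x.sum(dim=1) > 0).long()   # stand-in for real labels

    # Craft adversarial inputs: ascend the loss gradient w.r.t. x.
    x_adv = x.clone().requires_grad_(True)
    loss_fn(model(x_adv), y).backward()
    x_adv = (x_adv + epsilon * x_adv.grad.sign()).detach()

    # Train on clean and adversarial batches together.
    opt.zero_grad()
    loss = loss_fn(model(x), y) + loss_fn(model(x_adv), y)
    loss.backward()
    opt.step()
```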
Causal Scrubbing
The lab developed "causal scrubbing," a technique for rigorously testing hypotheses about how neural networks work internally. This contributes to the broader field of interpretability.
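The core operation behind causal scrubbing is a resampling ablation: activations that a hypothesis claims are irrelevant are replaced with activations drawn from other inputs, and the model's behavior is checked for change. The sketch below shows that single operation on a toy network; the full method resamples recursively according to a structured hypothesis about the computation, and the model, task, and choice of K here are assumptions for illustration.

```python
# Minimal resampling-ablation sketch in the spirit of causal scrubbing.
# Hypothesis under test: only the first K hidden units matter for the output.
import torch
import torch.nn as nn

torch.manual_seed(0)
lin1, lin2 = nn.Linear(8, 16), nn.Linear(16, 1)
K = 4  # claimed-relevant units [0, K) (assumed for illustration)

x = torch.randn(256, 8)
with torch.no_grad():
    h = lin1(x)                      # hidden activations
    clean_out = lin2(torch.relu(h))

    # Resample the claimed-irrelevant units from a random permutation
    # of the batch, keeping the claimed-relevant units intact.
    perm = torch.randperm(x.shape[0])
    h_scrubbed = h.clone()
    h_scrubbed[:, K:] = h[perm, K:]
    scrubbed_out = lin2(torch.relu(h_scrubbed))

# If the hypothesis were right, scrubbing would barely change the output.
print("mean |delta output|:", (clean_out - scrubbed_out).abs().mean().item())
```

A small output change supports the hypothesis; a large one refutes it, which is what makes the test falsifiable rather than merely suggestive.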
Activation Engineering
Research on directly manipulating model activations to control behavior, related to steering vectors and representation engineering approaches.
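A common recipe in this line of work derives a steering vector as the difference of mean activations on two contrasting input sets, then adds it back into a layer's activations at inference time. The toy model, contrast data, and scale factor below are illustrative assumptions, not Redwood's specific setup.

```python
# Minimal steering-vector sketch: difference-of-means direction added back
# via a forward hook. Toy model and hyperparameters are assumptions.
import torch
import torch.nn as nn

torch.manual_seed(0)
layer1, layer2 = nn.Linear(16, 32), nn.Linear(32, 16)
model = nn.Sequential(layer1, nn.ReLU(), layer2)

# Two input distributions standing in for, e.g., contrasting prompt styles.
pos = torch.randn(128, 16) + 0.5
neg = torch.randn(128, 16) - 0.5
with torch.no_grad():
    steer = layer1(pos).mean(0) - layer1(neg).mean(0)  # difference of means

scale = 2.0  # steering strength (assumed hyperparameter)

def add_steer(module, inputs, output):
    # Returning a value from a forward hook replaces the layer's output.
    return output + scale * steer

handle = layer1.register_forward_hook(add_steer)
steered = model(torch.randn(4, 16))  # forward pass with steered activations
handle.remove()                      # restore unmodified behavior
```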
Notable Projects
- Adversarial training for language models
- Causal scrubbing methodology
- Injury classifier project
- Representation engineering research