METR (Model Evaluation & Threat Research)
METR (Model Evaluation & Threat Research), formerly known as ARC Evals, is an organization focused on evaluating AI systems for dangerous capabilities. Founded by Beth Barnes, METR develops and conducts evaluations to assess AI risks before and during deployment.
Overview
METR spun out of the Alignment Research Center (ARC) to focus specifically on the evaluation problem. The organization works with AI labs to assess whether models have dangerous capabilities, such as autonomous replication, deception, or the ability to acquire resources.
Key Work
Dangerous Capability Evaluations
METR develops standardized tests to assess whether AI systems can perform potentially dangerous tasks like hacking, manipulation, or autonomous operation. These evaluations help labs understand model capabilities before deployment.
Red Teaming
The organization conducts adversarial testing of AI systems, attempting to elicit harmful behaviors or find ways systems could be misused.
Task Frameworks
METR has developed frameworks for assessing AI agent capabilities on realistic tasks, including coding, research, and autonomous operation.
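One common shape for such a framework is a task definition pairing instructions with an automated scorer, plus a harness that runs an agent against each task and aggregates results. The sketch below is purely illustrative and assumes nothing about METR's actual interface; the names `Task`, `score`, and `run_evaluation` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical sketch of an agent-evaluation harness; not METR's real API.

@dataclass
class Task:
    name: str
    instructions: str
    # Maps the agent's output to a score in [0, 1].
    score: Callable[[str], float]

def run_evaluation(agent: Callable[[str], str],
                   tasks: list[Task]) -> dict[str, float]:
    """Run the agent on each task's instructions and collect per-task scores."""
    return {t.name: t.score(agent(t.instructions)) for t in tasks}

# Toy example: a task that checks whether the agent's reply contains a token.
tasks = [
    Task(
        name="echo_check",
        instructions="Reply with the word DONE.",
        score=lambda out: 1.0 if "DONE" in out else 0.0,
    )
]
results = run_evaluation(lambda prompt: "DONE", tasks)
# results == {"echo_check": 1.0}
```

Separating the task specification from the harness lets the same agent be scored on many tasks, and automated scorers make runs repeatable across models.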
Industry Collaboration
METR has conducted evaluations for major AI labs, including OpenAI, Anthropic, and Google DeepMind, providing independent assessments of model capabilities ahead of major releases.