February 2026 Risk Report — Updated ASL-3 Trigger Criteria Published
Anthropic has published its February 2026 Risk Report, the latest in the regular series of reports issued under its Responsible Scaling Policy (RSP). The headline change is a refinement of the ASL-3 trigger criteria, which define the threshold at which Anthropic commits to halting deployment unless additional safety measures are in place. The updated criteria place explicit emphasis on a model's potential to provide uplift in biological and chemical threat domains. They also introduce a new evaluation cluster, focused on autonomous replication and resource acquisition (ARARA), that will be run on every frontier model candidate.
Key additions in this report
- ARARA evaluation cluster: a new structured test suite that assesses whether a model can, without authorisation, autonomously acquire compute, replicate itself, or persist state across resets
- Revised uplift thresholds: "meaningful uplift" in CBRN threat domains is now defined as assistance beyond what a determined non-specialist could obtain within one hour of searching openly available sources
- Third-party audit pilot: Anthropic announces a pilot with two external safety auditors, who will independently evaluate whether the ASL-3 criteria are met before future model releases
The report also includes a retrospective on the emergent misalignment paper published earlier this week, noting that the new ARARA criteria directly address the pathway that paper describes, in which reward hacking leads to misalignment. Anthropic describes its overall safety posture for the current model generation as "cautiously optimistic."