🧭 Anthropic Open-Sources Its Model Evaluation Framework
Anthropic has released anthropic-evals — the internal evaluation framework used to assess Claude's capabilities, safety properties, and alignment characteristics — as an open-source project under the Apache 2.0 licence. The release includes the evaluation harness; a library of over 400 evaluation tasks covering coding, reasoning, instruction-following, tool use, and safety-relevant scenarios; and the scoring infrastructure used to compute standardised benchmark results for model card publication.
What the framework includes
- Evaluation harness — a Python library for running evals against any model that exposes an OpenAI-compatible or Anthropic API endpoint, making it useful for evaluating third-party models as well as Claude
- Task library — 400+ evaluation tasks in a standardised JSON format, including tasks for agentic behaviour, multi-turn conversation quality, refusal calibration, and factual accuracy
- Automated scoring — both rule-based and model-based (LLM-as-judge) scorers, with configurable thresholds and human-in-the-loop override support
- Benchmark reproducibility — all tasks used in published Claude model cards are included, enabling independent verification of reported benchmark numbers
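To make the pieces above concrete, here is a minimal sketch of what a task in a standardised JSON format plus a rule-based scorer could look like. The field names (`task_id`, `category`, `prompt`, `expected`, `scorer`) and the `exact_match` helper are illustrative assumptions, not the published anthropic-evals schema.

```python
import json

# Hypothetical task in a standardised JSON format.
# Field names are assumptions for illustration, not the actual schema.
task_json = """
{
  "task_id": "arithmetic-001",
  "category": "reasoning",
  "prompt": "What is 17 * 6? Reply with the number only.",
  "expected": "102",
  "scorer": "exact_match"
}
"""

def exact_match(expected: str, completion: str) -> float:
    """Rule-based scorer: 1.0 when the trimmed completion matches exactly."""
    return 1.0 if completion.strip() == expected.strip() else 0.0

task = json.loads(task_json)

# In a real run, the harness would send task["prompt"] to the model
# endpoint; here we score a hard-coded completion for illustration.
score = exact_match(task["expected"], "102")
print(score)  # 1.0
```

A model-based (LLM-as-judge) scorer would follow the same interface but delegate the comparison to a grading model, which is why configurable thresholds and human override support matter for borderline scores.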
The framework is available at github.com/anthropics/anthropic-evals. Anthropic encourages contributions and plans to accept community-submitted evaluation tasks that meet quality standards.
evals
open source
benchmarks
AI safety
retrospective
🧭 Anthropic Annual Transparency Report 2025 — Safety Activities and Findings
Anthropic has published its first Annual Transparency Report, covering the company's AI safety activities, policy engagements, and model evaluations conducted during 2025. The report is intended as a recurring commitment — Anthropic states it will publish a transparency report every year — and marks the first time the company has consolidated its safety activities across research, policy, and deployment into a single public document.
Highlights from the report
- ASL evaluations — two formal Autonomous Replication and Adaptation (ARA) evaluations were conducted in 2025, both returning results below the ASL-3 threshold; methodology and results are summarised in the report
- Red-teaming scope — Anthropic conducted approximately 18,000 hours of structured red-teaming against Claude 4.x models before each major release, across CBRN, cybersecurity, and influence-operation threat categories
- Policy contributions — Anthropic submitted written evidence to AI policy processes in the US, EU, and UK, and participated in the Seoul and Paris AI Safety Summits; submissions are excerpted in the report
- Researcher access programme — 47 external researchers were granted structured access to Claude models for safety research in 2025; published findings from the programme are catalogued in an appendix
transparency
safety
annual report
Anthropic
retrospective