🧭 Anthropic Open-Sources Its Model Evaluation Framework
Anthropic has released anthropic-evals — the internal evaluation framework used to assess Claude's capabilities, safety properties, and alignment characteristics — as an open-source project under the Apache 2.0 licence. The release includes the evaluation harness; a library of over 400 evaluation tasks covering coding, reasoning, instruction-following, tool use, and safety-relevant scenarios; and the scoring infrastructure used to compute standardised benchmark results for model card publication.
What the framework includes
- Evaluation harness — a Python library for running evals against any model that exposes an OpenAI-compatible or Anthropic API endpoint, making it useful for evaluating third-party models as well as Claude
- Task library — 400+ evaluation tasks in a standardised JSON format, including tasks for agentic behaviour, multi-turn conversation quality, refusal calibration, and factual accuracy
- Automated scoring — both rule-based and model-based (LLM-as-judge) scorers, with configurable thresholds and human-in-the-loop override support
- Benchmark reproducibility — all tasks used in published Claude model cards are included, enabling independent verification of reported benchmark numbers
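To make the pieces above concrete, here is a minimal sketch of what a task in a standardised JSON format plus a rule-based scorer could look like. The field names (`task_id`, `category`, `prompt`, `expected`, `scorer`) and the `exact_match` helper are illustrative assumptions, not the published anthropic-evals schema.

```python
import json

# Hypothetical task in a standardised JSON format.
# Field names are assumptions for illustration, not the actual schema.
task_json = """
{
  "task_id": "arithmetic-001",
  "category": "reasoning",
  "prompt": "What is 17 * 6? Reply with the number only.",
  "expected": "102",
  "scorer": "exact_match"
}
"""

def exact_match(expected: str, completion: str) -> float:
    """Rule-based scorer: 1.0 when the trimmed completion matches exactly."""
    return 1.0 if completion.strip() == expected.strip() else 0.0

task = json.loads(task_json)

# In a real run, the harness would send task["prompt"] to the model
# endpoint; here we score a hard-coded completion for illustration.
score = exact_match(task["expected"], "102")
print(score)  # 1.0
```

A model-based (LLM-as-judge) scorer would follow the same interface but delegate the comparison to a grading model, which is why configurable thresholds and human override support matter for borderline scores.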
The framework is available at github.com/anthropics/anthropic-evals. Anthropic encourages contributions and plans to accept community-submitted evaluation tasks that meet quality standards.
evals
open source
benchmarks
AI safety
retrospective
🧭 Anthropic Annual Transparency Report 2025 — Safety Activities and Findings
Anthropic has published its first Annual Transparency Report, covering the company's AI safety activities, policy engagements, and model evaluations conducted during 2025. The report is intended as a recurring commitment — Anthropic states it will publish a transparency report every year — and marks the first time the company has consolidated its safety activities across research, policy, and deployment into a single public document.
Highlights from the report
- ASL evaluations — two formal Autonomous Replication and Adaptation (ARA) evaluations were conducted in 2025, both returning results below the ASL-3 threshold; methodology and results are summarised in the report
- Red-teaming scope — Anthropic conducted approximately 18,000 hours of structured red-teaming against Claude 4.x models before each major release, across CBRN, cybersecurity, and influence-operation threat categories
- Policy contributions — Anthropic submitted written evidence to AI policy processes in the US, EU, and UK, and participated in the Seoul and Paris AI Safety Summits; submissions are excerpted in the report
- Researcher access programme — 47 external researchers were granted structured access to Claude models for safety research in 2025; published findings from the programme are catalogued in an appendix
transparency
safety
annual report
Anthropic
retrospective