2026-01-14 🧭 Daily News

Amazon Bedrock Adds Cross-Region Inference & Anthropic API Posts 30% Latency Improvement


🧭 Amazon Bedrock Cross-Region Inference — Automatic Routing to Lowest-Latency Region

Amazon Web Services has added cross-region inference support for Claude models on Amazon Bedrock. The feature allows customers to create a cross-region inference profile that automatically routes each request to the AWS region with the best available capacity and lowest current latency — without requiring the application to manage region selection or failover logic explicitly. For production deployments that currently hard-code a single Bedrock region, the new profile acts as a transparent drop-in replacement with meaningful availability and latency improvements during regional load spikes.
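In practice, adopting a cross-region profile means changing only the model identifier your application passes to Bedrock. A minimal sketch using the boto3 Converse API shape — the specific profile ID below is illustrative, not taken from this announcement:

```python
import json

# Hypothetical cross-region inference profile ID: the "us." geo prefix asks
# Bedrock to route across US regions; the model ID suffix is illustrative.
PROFILE_ID = "us.anthropic.claude-3-5-sonnet-20240620-v1:0"


def build_converse_request(prompt: str, max_tokens: int = 512) -> dict:
    """Build kwargs for bedrock-runtime's converse() call.

    Pointing modelId at an inference profile instead of a single-region
    model ID is the only application-side change -- routing and failover
    are handled by Bedrock, not by this code.
    """
    return {
        "modelId": PROFILE_ID,
        "messages": [
            {"role": "user", "content": [{"text": prompt}]},
        ],
        "inferenceConfig": {"maxTokens": max_tokens},
    }


request = build_converse_request("Summarise today's deploy notes.")
print(json.dumps(request, indent=2))

# To actually send it (requires AWS credentials and Bedrock model access):
#   import boto3
#   client = boto3.client("bedrock-runtime")
#   response = client.converse(**request)
#   print(response["output"]["message"]["content"][0]["text"])
```

Because the profile ID occupies the same `modelId` slot as a regional model ID, existing Converse-based code paths need no structural changes.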


Tags: Amazon Bedrock, AWS, infrastructure, availability, retrospective

🧭 Anthropic API Latency Down 30% Since November — Infrastructure Update

Anthropic has shared that time-to-first-token latency on the Claude API has improved by approximately 30% since November 2025, the result of a series of infrastructure changes rolled out over the past ten weeks. The improvements come from three parallel initiatives: inference kernel optimisations that reduce per-token compute time on current-generation hardware, routing changes that reduce geographic hop count between the API gateway and inference nodes, and a pre-warming strategy that reduces cold-start overhead for new connection establishment.

The improvement is most pronounced at the p95 level — developers operating in tail-latency-sensitive contexts will see the largest gains. Anthropic notes the changes required no API-level modification and are live for all existing integrations. The team states that further infrastructure work is ongoing with additional latency reductions targeted for Q1 2026.

For developers benchmarking latency: if you established baseline measurements before November 2025, the current API performs materially better. It is worth re-running any latency-based model-selection evaluations to confirm that routing decisions still reflect current performance characteristics.
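When re-running those evaluations, it helps to track the median and the tail separately, since the reported gains are largest at p95. A minimal sketch using only Python's standard library — the sample numbers are illustrative placeholders, not Anthropic's figures:

```python
import statistics


def summarise_latency(samples_ms):
    """Summarise time-to-first-token samples, in milliseconds.

    The tail (p95) is where the reported improvement is most pronounced,
    so report it alongside the median rather than averaging everything.
    """
    return {
        "p50": statistics.median(samples_ms),
        # quantiles(n=20) yields 19 cut points; the last one estimates p95
        "p95": statistics.quantiles(samples_ms, n=20)[-1],
    }


# Illustrative numbers only -- replace with your own measured TTFT samples.
baseline_nov_2025 = [420, 430, 435, 440, 445, 450, 455, 460, 880, 900]
print(summarise_latency(baseline_nov_2025))
```

If the p95 gap against your pre-November baseline has narrowed, any model-selection thresholds keyed to tail latency may be stale and worth revisiting.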

Tags: API, performance, infrastructure, latency, retrospective