Amazon Bedrock Cross-Region Inference — Automatic Routing to Lowest-Latency Region
Amazon Web Services has added cross-region inference support for Claude models on Amazon Bedrock. The feature allows customers to create a cross-region inference profile that automatically routes each request to the AWS region with the best available capacity and lowest current latency — without requiring the application to manage region selection or failover logic explicitly. For production deployments that currently hard-code a single Bedrock region, the new profile acts as a transparent drop-in replacement with meaningful availability and latency improvements during regional load spikes.
How cross-region inference works
- Inference profiles — a new Bedrock resource type that specifies an ordered list of regions and a routing policy (latency-optimised or priority-ordered); the ARN of the profile replaces the model ARN in API calls
- No application changes beyond the ARN — the request and response schema are identical to standard single-region invocations; clients do not need to handle routing logic
- Data residency controls — administrators can restrict the allowed regions list within an inference profile to maintain data residency requirements while still benefiting from cross-region failover within the permitted set
- Unified logging — CloudWatch logs aggregate across all regions used by the profile, with a
regionfield added to each log entry for attribution