🧭 Claude Service Outage — What Happened and How to Build for Resilience
On the morning of March 11, Claude experienced a service disruption affecting the Claude.ai web, desktop, and mobile interfaces as well as Claude Code sessions. Reports on Downdetector peaked at over 1,400, with users encountering login failures and degraded response times. Crucially, the Claude API remained fully operational throughout the incident. Anthropic acknowledged the disruption promptly via its status page and resolved the issue within approximately two hours. No data loss was reported.
Building resilient Claude-powered applications
The fact that the API stayed up while the web UI went down is a useful reminder about where to invest reliability engineering effort:
- Use the API, not the web UI, for production workloads — web interface outages don't propagate to API consumers
- Implement exponential backoff — transient errors during partial outages respond well to retry logic with jitter
- Monitor the Anthropic status page at status.anthropic.com — it's updated in near real-time during incidents
- Design for graceful degradation — if Claude is unavailable, what does your application do? Cache the last response, surface a human fallback, or queue the request for later?
- Set meaningful timeouts — don't let a slow Claude response block your entire application thread
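The backoff-with-jitter advice above can be sketched in a few lines. This is a minimal illustration, not SDK code: `TransientAPIError` is a hypothetical stand-in for whatever retryable error your client raises (e.g. a 429 or 529 response), and the parameter names are illustrative.

```python
import random
import time

class TransientAPIError(Exception):
    """Hypothetical stand-in for a retryable error (rate limit, overload)."""

def call_with_backoff(fn, max_retries=5, base_delay=1.0, cap=30.0):
    """Retry fn() on transient errors, sleeping with exponential backoff
    and full jitter between attempts."""
    for attempt in range(max_retries):
        try:
            return fn()
        except TransientAPIError:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error to the caller
            # Full jitter: random delay between 0 and the (capped) exponential step
            delay = random.uniform(0, min(cap, base_delay * 2 ** attempt))
            time.sleep(delay)
```

Full jitter (a random delay between zero and the exponential cap) spreads retries out so a fleet of clients doesn't hammer a recovering endpoint in lockstep.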
Practical pattern: wrap your Claude API calls in a circuit breaker. After N consecutive failures, open the circuit and route to a fallback path rather than hammering a degraded endpoint. Libraries like pybreaker (Python) or opossum (Node.js) make this straightforward.
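Those libraries handle the bookkeeping for you; as a rough sketch of the mechanics they implement, here is a minimal hand-rolled breaker. The class and parameter names are illustrative (loosely mirroring pybreaker's `fail_max`/`reset_timeout` knobs), not any library's actual API.

```python
import time

class CircuitOpenError(Exception):
    """Raised when the circuit is open and calls are short-circuited."""

class CircuitBreaker:
    """Opens after fail_max consecutive failures; after reset_timeout
    seconds, lets one trial call through (half-open)."""

    def __init__(self, fail_max=5, reset_timeout=60.0):
        self.fail_max = fail_max
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open: use fallback path")
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.fail_max:
                self.opened_at = time.monotonic()  # trip the breaker
            raise
        self.failures = 0  # a success fully closes the circuit
        return result
```

The caller catches `CircuitOpenError` and routes to the fallback path (cached response, human handoff, or queue) instead of retrying a degraded endpoint.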
Tags: reliability, best practices, API, resilience, retrospective
🧭 Code Review Tool — The $15–$25 Per-Review Cost Sparks ROI Debate
Two days after the Claude Code Review tool launched, developer teams started doing the maths, and the numbers are generating debate. Anthropic reported that 54% of pull requests now receive substantive automated review comments, up from 16% with previous approaches. But a per-review cost of $15–$25 and an average completion time of around 20 minutes are prompting engineering teams to think carefully about where automated review adds genuine value, and where it is an expensive way to surface low-priority style feedback.
Where the ROI calculation works — and where it doesn't
- High-value cases: security-sensitive code, complex business logic, PRs touching shared infrastructure — where a missed issue has real downstream cost
- Lower-value cases: trivial UI tweaks, documentation-only changes, hotfixes where speed matters more than depth
- The 20-minute window: for fast-moving teams merging many small PRs per day, the wait time may slow CI pipelines more than the review value justifies
- Selective triggering: several teams are experimenting with label-based triggers, running the full multi-agent review only on PRs tagged `review:deep`
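A selective-trigger policy like the one above can be expressed as a small gating function in CI. Everything here is illustrative: the `review:deep` label comes from the text, but the other labels, the path prefixes, and the function name are hypothetical examples of a policy a team might choose.

```python
# Illustrative path prefixes a team might treat as review-worthy by default
SENSITIVE_PREFIXES = ("infra/", "auth/", "billing/")

def should_run_deep_review(labels, changed_paths):
    """Decide whether to trigger the expensive multi-agent review.

    Policy (hypothetical): explicit opt-in wins; docs-only changes and
    hotfixes skip it; otherwise run it only for sensitive code paths.
    """
    if "review:deep" in labels:
        return True  # explicit opt-in via label
    if "docs-only" in labels or "hotfix" in labels:
        return False  # speed matters more than depth here
    return any(p.startswith(SENSITIVE_PREFIXES) for p in changed_paths)
```

Encoding the policy as code makes it reviewable and testable itself, and keeps per-review spend predictable as PR volume grows.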
Bottom line: treat the Code Review tool like a senior contractor — valuable for the right job, expensive to use on everything. Define a clear policy for which PR types trigger it before rolling out broadly, or costs will scale faster than benefits.
Tags: Claude Code, code review, best practices, cost management, retrospective
🧭 Google Deepens Pentagon Engagement as Anthropic Steps Back
CNBC reported on March 10–11 that Google was actively deepening its own Pentagon AI engagement in the wake of Anthropic's lawsuit. The reporting highlighted a structural consequence of the standoff: when one AI provider draws principled lines around certain use cases, defence contracts don't disappear — they redirect to providers willing to take them. For developers and enterprise architects building on AI platforms, the episode raises a practical question about supplier risk that goes beyond the immediate legal dispute.
What this means for enterprise AI strategy
- Supplier differentiation is real: different AI providers have materially different policies on autonomous decision-making, weapons applications, and surveillance. These differences matter for regulated industries
- Multi-model architectures reduce lock-in risk: teams running Claude, Gemini, and GPT-4 in parallel for different tasks are better insulated from single-supplier policy or availability events
- Evaluate the policy, not just the capability: before committing a critical workflow to any AI provider, read their published usage policy and model spec — understand what they will and won't do
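The multi-model insulation point above boils down to a priority-ordered fallback at the call site. This is a bare sketch under stated assumptions: provider names and the callable interface are hypothetical, and a real router would also normalise prompts and responses across vendors.

```python
def complete(prompt, providers):
    """Try providers in priority order; fall back when one fails.

    `providers` maps a name to a callable taking a prompt string.
    All names and the interface are illustrative.
    """
    errors = {}
    for name, call in providers.items():
        try:
            return name, call(prompt)
        except Exception as exc:  # availability, policy refusal, timeout
            errors[name] = exc
    raise RuntimeError(f"all providers failed: {list(errors)}")
```

Pairing this with per-task routing (each workload pinned to its best-fit model, with the others as fallbacks) is what insulates a team from a single supplier's policy or availability events.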
Tags: enterprise, AI strategy, policy, multi-model, retrospective