2025-12-30 💡 Tips 'n' Tricks

Speed Up Your Claude Workflows Before the New Year

💡 Streaming & Time-to-First-Token — Making Claude Feel Instant

One of the most impactful UX improvements you can make to a Claude-powered product costs nothing in tokens and requires only a few lines of code: enable streaming. Without streaming, your UI shows a spinner until Claude finishes generating the entire response, then displays it all at once — a wait that can stretch to 30+ seconds for long outputs. With streaming enabled, the first token appears within a second or two, and the response flows progressively. Users perceive streaming interfaces as dramatically faster even when the total generation time is identical.

Streaming best practices

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
prompt = "Explain time-to-first-token in one paragraph."

with client.messages.stream(
    model="claude-haiku-4-5",   # Haiku has the lowest TTFT
    max_tokens=1024,
    messages=[{"role": "user", "content": prompt}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

Model selection for latency-critical paths

Claude Haiku has the lowest time-to-first-token of the production models. For real-time chat, autocomplete, or any user-facing feature where latency is visible, default to Haiku and escalate to Sonnet or Opus only when capability requires it.
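That default-to-Haiku policy can be captured in a tiny routing helper. A minimal sketch, assuming a simple boolean capability flag; the Sonnet model name here is illustrative, not taken from the post:

```python
# Hypothetical routing helper: default to the low-latency tier, escalate
# only when the task genuinely needs more capability.
LATENCY_TIER = "claude-haiku-4-5"       # lowest TTFT, per the note above
CAPABILITY_TIER = "claude-sonnet-4-5"   # illustrative model name

def pick_model(needs_deep_reasoning: bool) -> str:
    # Latency-visible paths stay on Haiku; escalate on capability grounds only.
    if needs_deep_reasoning:
        return CAPABILITY_TIER
    return LATENCY_TIER
```

In a real product the flag would come from something observable — request type, prompt length, or a cheap classifier — rather than being hard-coded per call site.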

Tags: streaming, latency, TTFT, UX, retrospective

💡 Batch Processing — Run High-Volume Claude Workloads at 50% Off

If you have a workload that doesn't need to complete in real time — document summarisation queues, nightly classification runs, bulk data extraction, evaluation pipelines — the Message Batches API is the obvious tool to reach for. Batch requests are processed asynchronously and priced at approximately 50% of the standard API rate. You submit a batch of up to 10,000 requests in a single call, poll for completion, then retrieve all results. For high-volume workflows, this halves your Claude infrastructure costs with essentially no code complexity added.
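The submit-poll-retrieve flow described above can be sketched as a small poll helper. This is written against a generic status callable so it isn't tied to any SDK surface; the `"ended"` terminal value mirrors the Batches API's `processing_status` field, but treat the exact strings as an assumption to verify against the API docs:

```python
import time

def wait_for_batch(get_status, poll_seconds=60):
    """Poll until the batch reports a terminal status.

    get_status is any zero-argument callable returning the batch's
    processing_status string (e.g. a lambda wrapping your SDK's
    batch-retrieve call); "ended" is assumed to be the terminal value.
    """
    while True:
        status = get_status()
        if status == "ended":
            return status
        time.sleep(poll_seconds)
```

Keep `poll_seconds` generous — batch jobs are meant to take minutes to hours, and tight polling loops just burn requests.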

When to use batch vs. real-time

Batch suits anything a user isn't actively waiting on — summarisation queues, nightly classification runs, bulk extraction, evaluation pipelines — where results arriving within hours is acceptable. Real-time (ideally streamed) remains the right choice for anything latency-visible, such as chat, autocomplete, and other interactive features.

Idempotency tip

Each request in a batch takes a custom_id. Set these to meaningful, stable identifiers (e.g. a document hash) so that if your batch partially fails and you need to retry, you can submit only the failed items — avoiding double-processing and double billing.
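A minimal sketch of that pattern, assuming the Batches API request shape of `{custom_id, params}` and using a truncated content hash as the stable identifier (the `Summarise:` prompt and helper names are illustrative):

```python
import hashlib

def custom_id_for(doc_text: str) -> str:
    # Stable ID: the same document always maps to the same custom_id,
    # so retries can be matched against previously submitted items.
    return hashlib.sha256(doc_text.encode("utf-8")).hexdigest()[:32]

def requests_for(docs, model="claude-haiku-4-5"):
    # Build one batch request per document, keyed by its content hash.
    return [
        {
            "custom_id": custom_id_for(d),
            "params": {
                "model": model,
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": f"Summarise:\n\n{d}"}],
            },
        }
        for d in docs
    ]

def retry_payload(docs, failed_ids):
    # Resubmit only the items whose custom_id appeared in the failed set,
    # avoiding double-processing (and double billing) of successes.
    return [r for r in requests_for(docs) if r["custom_id"] in failed_ids]
```

The resulting list would be passed to the batch-create endpoint; because the IDs are derived from content rather than list position, a retry batch stays correct even if the document queue is reordered between runs.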

Tags: batch API, cost optimisation, throughput, async, retrospective