
Bedrock in production: notes from the March meetup

A short recap of our GenAI on AWS meetup. Costs, latency, prompts, guardrails, and the stuff nobody warns you about.

Our March meetup brought together a few engineers who have actually shipped Bedrock-powered apps. Here are the bits I keep thinking about a week later.

Cost is sneakier than you think

The headline price per token is the easy number. The real costs tend to be:

  • Retries on rate limits that silently double your spend
  • Embedding generation on every doc, every update
  • Eval runs during prompt iteration, which add up fast

One team mentioned that 40% of their Bedrock bill came from eval runs, not production traffic. Which is wild.
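To find out where your own bill goes, one option is to tag every call with its purpose and tally spend per tag. Here is a minimal sketch, assuming you already log token counts and attempt counts per call. The prices, the tag names, and the `CallRecord` type are all made up for illustration and are not Bedrock pricing or a Bedrock API. It also assumes every attempt is billed in full, which is pessimistic; check how retries are charged on your account.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical prices in USD per 1,000 tokens; substitute your model's real rates.
PRICE_PER_1K_INPUT = 0.003
PRICE_PER_1K_OUTPUT = 0.015


@dataclass
class CallRecord:
    purpose: str          # hypothetical tags, e.g. "production", "eval", "embedding"
    input_tokens: int
    output_tokens: int
    attempts: int = 1     # 1 = succeeded first time, more than 1 = retried


def call_cost(record: CallRecord) -> float:
    """Cost of one logical call, assuming every attempt is billed in full."""
    per_attempt = (
        record.input_tokens / 1000 * PRICE_PER_1K_INPUT
        + record.output_tokens / 1000 * PRICE_PER_1K_OUTPUT
    )
    return per_attempt * record.attempts


def spend_share(records: list[CallRecord]) -> dict[str, float]:
    """Fraction of total spend per purpose tag (empty dict if nothing was spent)."""
    totals: dict[str, float] = defaultdict(float)
    for record in records:
        totals[record.purpose] += call_cost(record)
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {purpose: cost / grand_total for purpose, cost in totals.items()}


if __name__ == "__main__":
    records = [
        CallRecord("production", input_tokens=1200, output_tokens=300),
        CallRecord("eval", input_tokens=4000, output_tokens=800, attempts=2),
        CallRecord("embedding", input_tokens=9000, output_tokens=0),
    ]
    for purpose, share in sorted(spend_share(records).items()):
        print(f"{purpose}: {share:.0%}")
```

Even a rough tally like this makes it obvious when eval runs or retries are eating more than production traffic.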

Latency budgets matter

If your app shows a streaming response, users will tolerate around 1.5s to the first token, and they care much less about speed once text is flowing. If it’s batch (say, summarisation), they’re a lot less patient than you’d guess. Set hard timeouts and degrade gracefully.
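A hard timeout with a graceful fallback can be sketched with plain `asyncio`. This is a minimal illustration, not a Bedrock client: `slow_model` and `fast_model` are hypothetical stand-ins for your real model call, and the fallback text is a placeholder. The timeout here covers the whole call; for a streaming response you would more likely put the budget on the first chunk.

```python
import asyncio

# Placeholder fallback; in a real app this might be a cached or shorter answer.
FALLBACK_TEXT = "Sorry, this is taking longer than expected. Please try again."


async def call_with_timeout(model_call, timeout_s: float, fallback: str = FALLBACK_TEXT) -> str:
    """Await model_call(); return the fallback if it takes longer than timeout_s seconds."""
    try:
        return await asyncio.wait_for(model_call(), timeout=timeout_s)
    except asyncio.TimeoutError:
        # wait_for cancels the pending call before raising, so nothing keeps running.
        return fallback


async def slow_model() -> str:
    # Hypothetical stand-in for a model call that blows the latency budget.
    await asyncio.sleep(0.2)
    return "full answer"


async def fast_model() -> str:
    # Hypothetical stand-in for a model call that answers in time.
    return "full answer"


if __name__ == "__main__":
    print(asyncio.run(call_with_timeout(fast_model, timeout_s=0.05)))
    print(asyncio.run(call_with_timeout(slow_model, timeout_s=0.05)))
```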

Guardrails: start simple

The temptation is to layer four guardrails before going live. Don’t. Start with one (the AWS-managed content filter is fine), measure false positives, and add more only when you have data telling you that you need to.
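Measuring false positives does not need much machinery. Below is a minimal sketch that assumes you have a small hand-labelled sample of inputs and some function that says whether the guardrail blocked a given input. The `toy_filter` keyword check is a deliberately crude stand-in so the example runs on its own, not how the AWS-managed filter works, and the sample data is invented.

```python
from typing import Callable, Iterable, Tuple


def false_positive_rate(
    samples: Iterable[Tuple[str, bool]],
    is_blocked: Callable[[str], bool],
) -> float:
    """Share of known-benign samples that the guardrail blocks.

    samples: (text, is_harmful) pairs labelled by a human.
    is_blocked: returns True when the guardrail rejects the text.
    """
    benign = [text for text, is_harmful in samples if not is_harmful]
    if not benign:
        raise ValueError("need at least one benign sample")
    blocked = sum(1 for text in benign if is_blocked(text))
    return blocked / len(benign)


def toy_filter(text: str) -> bool:
    # Hypothetical stand-in for a real guardrail: blocks anything containing "kill".
    return "kill" in text.lower()


# Invented, hand-labelled sample: (text, is_harmful)
LABELLED_SAMPLE = [
    ("How do I kill a stuck process on Linux?", False),
    ("Summarise this meeting transcript.", False),
    ("Tell me how to kill someone.", True),
]

if __name__ == "__main__":
    rate = false_positive_rate(LABELLED_SAMPLE, toy_filter)
    print(f"false positive rate: {rate:.0%}")
```

Run the same sample again after each guardrail you add, so you can see what the extra layer costs you in blocked legitimate requests.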

Prompts are code

Treat them like code. Version-controlled. Reviewed. Tested. The teams that treat prompts as throwaway strings are the ones who get paged at 2am because someone “improved” the prompt and broke 30% of outputs.
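In practice, "tested" can be as small as a handful of golden cases and a pass-rate gate that runs in CI whenever the prompt changes. Here is a rough sketch under some loud assumptions: the prompt lives in the repo as a versioned template, `stub_model` is a hypothetical stand-in for the real model call, and the cases and checks are invented for illustration.

```python
import string
from typing import Callable, List, Tuple

# The prompt lives in version control next to the code that uses it.
PROMPT_VERSION = "summarise-v2"  # hypothetical version label
PROMPT_TEMPLATE = "Summarise the following text in one sentence:\n\n$document"


def render_prompt(document: str) -> str:
    """Fill the template; raises KeyError if a placeholder is left unfilled."""
    return string.Template(PROMPT_TEMPLATE).substitute(document=document)


def stub_model(prompt: str) -> str:
    # Hypothetical stand-in for the real model: echoes the document's first sentence.
    document = prompt.split("\n\n", 1)[1]
    return document.split(".")[0] + "."


# Golden cases: an input document and a check its output must satisfy.
Case = Tuple[str, Callable[[str], bool]]
CASES: List[Case] = [
    (
        "Bedrock is a managed service. It hosts several models.",
        lambda out: "Bedrock" in out and out.count(".") == 1,
    ),
    (
        "Latency matters. Users leave when apps feel slow.",
        lambda out: 0 < len(out) < 50,
    ),
]


def pass_rate(model: Callable[[str], str], cases: List[Case]) -> float:
    """Fraction of golden cases whose output satisfies its check."""
    passed = sum(1 for document, check in cases if check(model(render_prompt(document))))
    return passed / len(cases)


if __name__ == "__main__":
    rate = pass_rate(stub_model, CASES)
    print(f"{PROMPT_VERSION}: {rate:.0%} of golden cases pass")
    # Gate the change: fail the build if the edited prompt regresses.
    assert rate >= 0.9, "prompt change broke golden cases"
```

With a gate like this in review, an "improved" prompt that breaks a chunk of outputs fails the build instead of paging someone.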


Photos and the speaker decks are linked on the events page.