Real-time agent observability for Provision's runtime
Provision is a managed cloud runtime for AI agents: sandboxed browsers, agent inboxes, and channel integrations, all behind a single API. We built the observability layer that turns "the agent failed somewhere" into a replayable trace with cost, tool calls, and browser actions, cutting mean time to debug by roughly 45%.
- MTTD reduction: ~45%
- Spans/sec ingested: ~120K
- Replay fidelity: 100%
- Shipped in: 7 weeks
[Dashboard preview: Tokens 12,481 (input + output) · Cost $0.084 (planning $0.05, tools $0.03) · Tool calls 9 (7 browser, 2 inbox) · p95 step 420ms (rolling 100 steps) · Replans 1 (step 11). Panels: Span timeline; Live viewport (step 14, sandboxed Chrome); Cost ledger (↓ 18% vs avg).]
The brief
Provision's product is the place agents run: managed compute, a real Chrome, an actual inbox, channel integrations. What its customers actually buy is the ability to ship agents to production without a six-month detour through infrastructure. That works until something goes wrong. Before this project, customers were then debugging blind: the logs lacked the context they needed, and there was no way to replay what had happened.
The bet: a first-class observability and replay layer would convert Provision's runtime from "agents work, mostly" into "agents work, and when they don't, you know exactly why."
What we built
An OpenTelemetry-shaped tracing layer that captures every step of an agent run as a span — LLM call, tool call, browser action, inbox read, plan revision — with timing, token cost, and tool arguments attached. Spans stream into a purpose-built time-series store sized for agent workloads (steps are bursty; tokens are expensive; cardinality is high).
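To make the span model above concrete, here is a minimal sketch of what one such record might look like. It is not Provision's actual schema: the class name `AgentSpan`, the field names, and the span-kind identifiers are all assumptions chosen to mirror the step types and attributes listed in the paragraph.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional
import uuid

# Step types named in the text; the identifiers themselves are assumptions.
SPAN_KINDS = {"llm_call", "tool_call", "browser_action", "inbox_read", "plan_revision"}


@dataclass
class AgentSpan:
    """One step of an agent run (hypothetical schema, for illustration only)."""
    run_id: str
    step: int
    kind: str
    start_ms: float
    end_ms: float
    input_tokens: int = 0
    output_tokens: int = 0
    cost_usd: float = 0.0
    tool_args: Dict[str, Any] = field(default_factory=dict)
    parent_id: Optional[str] = None
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex)

    def __post_init__(self) -> None:
        # Reject malformed spans early so bad data never reaches the store.
        if self.kind not in SPAN_KINDS:
            raise ValueError(f"unknown span kind: {self.kind}")
        if self.end_ms < self.start_ms:
            raise ValueError("span ends before it starts")

    @property
    def duration_ms(self) -> float:
        return self.end_ms - self.start_ms

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens
```

Keeping timing, token counts, cost, and tool arguments on the same record is what lets a single span answer "how long, how much, and with what inputs" without a join against a separate log.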
The replay layer is the most-loved feature. Every browser action and every inbox interaction is captured in enough detail that a customer can replay the run frame by frame: watch the Chrome viewport, follow the LLM's reasoning, and see exactly which tool argument was wrong. The replay is byte-identical to the original; we built a custom WebSocket protocol to stream the captured frames into the customer's browser without rehydrating the entire run from cold storage.
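The wire protocol itself is not described here, so the sketch below stays away from it and only illustrates two ideas from the paragraph: playing captured frames back in their original order and pacing, and checking that a replayed run is byte-identical to the capture. The `Frame` type, its fields, and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple
import hashlib


@dataclass(frozen=True)
class Frame:
    """A captured replay frame (hypothetical shape, for illustration only)."""
    seq: int              # capture order within the run
    captured_at_ms: int   # capture timestamp
    payload: bytes        # raw captured bytes


def replay_schedule(frames: Iterable[Frame]) -> Iterator[Tuple[int, Frame]]:
    """Yield (delay_ms, frame) in capture order, preserving original pacing."""
    previous = None
    for frame in sorted(frames, key=lambda f: f.seq):
        delay = 0 if previous is None else max(0, frame.captured_at_ms - previous.captured_at_ms)
        yield delay, frame
        previous = frame


def run_digest(frames: Iterable[Frame]) -> str:
    """SHA-256 over all frames in capture order; equal digests mean equal bytes."""
    digest = hashlib.sha256()
    for frame in sorted(frames, key=lambda f: f.seq):
        # Length-prefix each payload so frame boundaries affect the digest.
        digest.update(frame.seq.to_bytes(8, "big"))
        digest.update(len(frame.payload).to_bytes(8, "big"))
        digest.update(frame.payload)
    return digest.hexdigest()
```

A player could sleep for each `delay_ms` before rendering the frame, and a capture-side digest compared with a playback-side digest gives a cheap check on the byte-identical claim for any single run.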
Per-step cost surfacing closed a question every Provision customer had been asking: which step was expensive? The answer is now a hover.
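As a rough illustration of how a "which step was expensive?" answer could be derived from span data, the sketch below rolls cost entries up by step and picks the largest. The entry shape, the function names, and the use of integer micro-dollars (to sidestep floating-point rounding) are assumptions, not a description of the shipped implementation.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

# (step number, category, cost in micro-dollars); this shape is an assumption.
CostEntry = Tuple[int, str, int]


def cost_by_step(entries: Iterable[CostEntry]) -> Dict[int, int]:
    """Sum cost per step across all categories (planning, tools, ...)."""
    totals: Dict[int, int] = defaultdict(int)
    for step, _category, micros in entries:
        totals[step] += micros
    return dict(totals)


def most_expensive_step(entries: Iterable[CostEntry]) -> Optional[Tuple[int, int]]:
    """Return (step, total_micros) for the costliest step, or None if empty."""
    totals = cost_by_step(entries)
    if not totals:
        return None
    # On ties, prefer the earliest step so the result is deterministic.
    step = max(totals, key=lambda s: (totals[s], -s))
    return step, totals[step]


def format_usd(micros: int) -> str:
    """Render micro-dollars as a dollar string with three decimals."""
    return f"${micros / 1_000_000:.3f}"
```

For example, `format_usd(84_000)` returns `"$0.084"`, the same shape as the run total shown in the dashboard; a hover tooltip would apply the same formatting to a single step's total.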
Outcome
Mean-time-to-debug on production agent failures dropped ~45%, with the biggest wins on multi-step failures where the customer previously had to reconstruct the sequence by hand. The system ingests ~120,000 spans per second at peak. Replay fidelity is 100% — every captured run can be played back exactly.
Time from design-partner selection to GA was seven weeks. The observability layer is now a primary selling point in Provision's enterprise demos.
"We were debugging blind for the first six months. Now we know exactly where the agent decided wrong, what it cost, and what it should have done instead."
Keep exploring.
A modern policyholder portal for a 75-year-old insurance co-op
Co-operators is one of Canada's largest insurance co-operatives, with 600+ locations, generations of customer trust, and a legacy self-service experience built around PDFs and phone trees. We rebuilt the policyholder portal around real-time claim status, digital ID cards, and a single-tap "file a new claim" flow that absorbs roughly 28% of inbound contact-centre volume.
- Inbound call reduction: ~28%
- Time-to-claim filed: 4 min
An "Ask your vault" AI layer for wealth-management advisors
FutureVault sits at the centre of how RIAs organise client documents — estate, tax, insurance, identity. We built a retrieval-grounded AI layer on top, letting an advisor ask plain-English questions across an entire household's document history and get cited answers in seconds instead of digging through folders.
- Document lookup: 3.2× faster
- Citation accuracy: ~96%