AI Infrastructure / Agent Platform

2025

Real-time agent observability for Provision's runtime

Provision is a managed cloud runtime for AI agents: sandboxed browsers, agent inboxes, and channel integrations, all behind a single API. We built the observability layer that turns "the agent failed somewhere" into a replayable trace with cost, tool calls, and browser actions, cutting mean-time-to-debug by roughly 45%.

MTTD reduction: ~45%
Spans/sec ingested: ~120K
Replay fidelity: 100%
Shipped in: 7 weeks

[Dashboard illustration: live run view for provision/agent, planner-v8, run abe1f3. Panels: Tokens, Cost, Tool calls, p95 step, Replans, Span timeline (per-step call, duration, cost, status), Live viewport (sandboxed Chrome) with Replay scrubber, Cost ledger (spend by model and tool).]

The brief

Provision's product is the place agents run: managed compute, a real Chrome, an actual inbox, channel integrations. What their customers actually buy is the ability to ship agents to production without a six-month detour through infrastructure. That works until something goes wrong. Before this project, a failure left customers debugging blind, with logs that lacked the right context and no way to replay what had happened.

The bet: a first-class observability and replay layer would convert Provision's runtime from "agents work, mostly" into "agents work, and when they don't, you know exactly why."

What we built

An OpenTelemetry-shaped tracing layer that captures every step of an agent run as a span — LLM call, tool call, browser action, inbox read, plan revision — with timing, token cost, and tool arguments attached. Spans stream into a purpose-built time-series store sized for agent workloads (steps are bursty; tokens are expensive; cardinality is high).
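To make the "every step is a span" idea concrete, here is a minimal sketch in plain Python. It is not Provision's schema or the OpenTelemetry SDK: the `Span` fields, the `RunTracer` class, and the `step()` helper are hypothetical names chosen for illustration, and the tracer keeps spans in memory where a real exporter would stream them to a store.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, List, Optional


@dataclass
class Span:
    """One step of an agent run (hypothetical schema, for illustration only)."""
    run_id: str
    step: int
    kind: str                       # "llm", "tool", "browser", or "inbox"
    name: str                       # e.g. "browser.click"
    args: Dict[str, Any] = field(default_factory=dict)
    start_ns: int = 0
    end_ns: int = 0
    tokens: int = 0
    cost_usd: float = 0.0
    error: Optional[str] = None

    @property
    def duration_ms(self) -> float:
        return (self.end_ns - self.start_ns) / 1e6


class RunTracer:
    """Collects spans for one run, assuming steps execute sequentially."""

    def __init__(self, run_id: Optional[str] = None) -> None:
        self.run_id = run_id or uuid.uuid4().hex[:6]
        self.spans: List[Span] = []

    @contextmanager
    def step(self, kind: str, name: str, **args: Any) -> Iterator[Span]:
        span = Span(self.run_id, len(self.spans) + 1, kind, name, dict(args))
        span.start_ns = time.monotonic_ns()
        try:
            yield span               # caller may set span.tokens / span.cost_usd
        except Exception as exc:
            span.error = repr(exc)   # record the failure, then re-raise it
            raise
        finally:
            span.end_ns = time.monotonic_ns()
            self.spans.append(span)
```

A caller would wrap each action, for example `with tracer.step("llm", "llm.plan", attempt=2) as s:` and then set `s.tokens` and `s.cost_usd` once the model responds. Recording the span in `finally` means a failed step still appears in the trace with its arguments and error attached.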

The replay layer is the most-loved feature. Every browser action and every inbox interaction is captured in enough detail that a customer can replay the run frame by frame: watch the Chrome viewport, follow the LLM's reasoning at each step, and see exactly which tool argument was wrong. The replay is byte-identical to the original. We built a custom WebSocket protocol to ferry the captured frames into the customer's browser without rehydrating the entire run from cold storage.
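The case study does not describe the replay protocol's internals, so the following is only a sketch of one plausible ingredient: a sorted frame index that maps a scrubber position to the frame visible at that moment, so a server could send just the frames a playback window needs. `Frame`, `ReplayIndex`, and their fields are hypothetical names, and the transport itself (the WebSocket layer) is deliberately left out.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Frame:
    t_ms: int        # offset from run start, in milliseconds
    step: int        # step that produced the frame
    payload: bytes   # captured viewport or inbox state (opaque here)


class ReplayIndex:
    """Sorted frame index: look up a position by binary search, not a full scan."""

    def __init__(self, frames: Sequence[Frame]) -> None:
        self._frames: List[Frame] = sorted(frames, key=lambda f: f.t_ms)
        self._times: List[int] = [f.t_ms for f in self._frames]

    def frame_at(self, t_ms: int) -> Frame:
        """Return the latest frame captured at or before t_ms."""
        i = bisect_right(self._times, t_ms) - 1
        if i < 0:
            raise ValueError("position precedes the first captured frame")
        return self._frames[i]

    def window(self, start_ms: int, end_ms: int) -> List[Frame]:
        """Frames for the window [start_ms, end_ms], including the one visible at start."""
        lo = max(bisect_right(self._times, start_ms) - 1, 0)
        hi = bisect_right(self._times, end_ms)
        return self._frames[lo:hi]
```

With an index like this, seeking to 00:08.2 in a 12-second run costs one binary search plus the frames in the requested window, independent of how long the run is.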

Per-step cost surfacing closed a question every Provision customer had been asking: which step was expensive? The answer is now a hover.
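A small sketch of the arithmetic behind that hover, assuming each span already carries a dollar cost. The `StepCost` record, the labels, and the helper names are hypothetical. `Decimal` is used so that summing many sub-cent amounts does not pick up binary floating-point error.

```python
from collections import defaultdict
from decimal import Decimal
from typing import Dict, List, NamedTuple, Tuple


class StepCost(NamedTuple):
    step: int
    label: str         # illustrative grouping key, e.g. "planner" or "search"
    cost_usd: Decimal


def most_expensive(steps: List[StepCost]) -> StepCost:
    """Answer 'which step was expensive?' for a single run."""
    return max(steps, key=lambda s: s.cost_usd)


def ledger(steps: List[StepCost]) -> List[Tuple[str, Decimal, int]]:
    """Group cost by label; return (label, total, percent of run), largest first."""
    totals: Dict[str, Decimal] = defaultdict(Decimal)
    for s in steps:
        totals[s.label] += s.cost_usd
    run_total = sum(totals.values(), Decimal(0))
    rows = []
    for label, total in totals.items():
        # Guard against a zero-cost run before dividing.
        pct = int((total / run_total * 100).to_integral_value()) if run_total else 0
        rows.append((label, total, pct))
    return sorted(rows, key=lambda r: r[1], reverse=True)
```

Given spans for a run, `most_expensive` returns the single costliest step, while `ledger` produces the per-label rollup with each label's share of the run total.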

Outcome

Mean-time-to-debug on production agent failures dropped ~45%, with the biggest wins on multi-step failures where the customer previously had to reconstruct the sequence by hand. The system ingests ~120,000 spans per second at peak. Replay fidelity is 100% — every captured run can be played back exactly.

Time from design-partner selection to GA was seven weeks. The observability layer is now a primary selling point in Provision's enterprise demos.

"We were debugging blind for the first six months. Now we know exactly where the agent decided wrong, what it cost, and what it should have done instead."

Keep exploring.

Insurance / Financial Services

2025

A modern policyholder portal for a 75-year-old insurance co-op

Co-operators is one of Canada's largest insurance co-operatives, with 600+ locations, generations of customer trust, and a legacy self-service experience built around PDFs and phone trees. We rebuilt the policyholder portal around real-time claim status, digital ID cards, and a single-tap "file a new claim" flow that absorbs roughly 28% of the inbound contact-centre volume.

Inbound call reduction: ~28%
Time-to-claim filed: 4 min

WealthTech / Enterprise Document AI

2025

An "Ask your vault" AI layer for wealth-management advisors

FutureVault sits at the centre of how RIAs organise client documents — estate, tax, insurance, identity. We built a retrieval-grounded AI layer on top, letting an advisor ask plain-English questions across an entire household's document history and get cited answers in seconds instead of digging through folders.

Document lookup: 3.2× faster
Citation accuracy: ~96%