LLM Workbench · Protocol v1.0.0
Ship LLM agents you can debug, fork, and replay.
LLM Workbench turns each run of your agent into a tamper-evident, model-agnostic, human-gated bundle: trace events, artifacts, gates, and cost — signed, exportable, and replayable.
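One common way to make a sequence of trace events tamper-evident is a hash chain, where each entry commits to the one before it. The sketch below illustrates that general technique only. The event shape, the `GENESIS` constant, and the helpers `hashEvent`, `appendEvent`, and `verifyChain` are hypothetical names chosen for this example, not the LLM Workbench bundle format. A real bundle would presumably also sign the final hash and use a canonical serialization.

```typescript
import { createHash } from "node:crypto";

// Hypothetical event shape; the actual bundle schema is not shown here.
interface TraceEvent {
  seq: number;
  kind: string;
  payload: unknown;
}

interface ChainedEvent extends TraceEvent {
  prevHash: string; // hash of the previous entry, or GENESIS for the first
  hash: string; // SHA-256 over prevHash plus this event's content
}

const GENESIS = "0".repeat(64);

function hashEvent(prevHash: string, event: TraceEvent): string {
  // Assumption: JSON.stringify is stable enough for a demo. A production
  // format would need a canonical serialization with fixed key order.
  return createHash("sha256")
    .update(prevHash)
    .update(JSON.stringify([event.seq, event.kind, event.payload]))
    .digest("hex");
}

// Returns a new chain with one more entry linked to the current tail.
function appendEvent(chain: ChainedEvent[], kind: string, payload: unknown): ChainedEvent[] {
  const last = chain[chain.length - 1];
  const prevHash = last ? last.hash : GENESIS;
  const event: TraceEvent = { seq: chain.length, kind, payload };
  return [...chain, { ...event, prevHash, hash: hashEvent(prevHash, event) }];
}

// Recomputes every hash; any edited, reordered, or dropped entry breaks the chain.
function verifyChain(chain: ChainedEvent[]): boolean {
  let prevHash = GENESIS;
  return chain.every((entry, i) => {
    const ok =
      entry.seq === i &&
      entry.prevHash === prevHash &&
      entry.hash === hashEvent(prevHash, entry);
    prevHash = entry.hash;
    return ok;
  });
}
```

Because each hash covers the previous one, changing an early event invalidates every later entry, so a verifier only needs to trust the final hash.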
0 runs persisted · v1.0.0 · Proprietary · Source
Built for agents, auditors, and the engineers between them.
A reference plane that is as legible to curl as it is to Cursor — same protocol, same bundles, different consumers.
Every run emits a structured spine.
Events stream into the bundle as the DAG advances: gates, model I/O, artifacts, integrity — all machine-addressable before a human touches the UI.
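To make "machine-addressable" concrete, here is a minimal sketch of how the four event families named above (gates, model I/O, artifacts, integrity) could be modeled as a discriminated union and queried without a UI. The field names, the `SpineEvent` type, and the address scheme in `eventAddress` are all assumptions made for illustration. The protocol's real schema may differ.

```typescript
// Hypothetical event union; kinds mirror the families named in the text.
type SpineEvent =
  | { kind: "gate"; seq: number; gate: string; status: "pending" | "approved" | "rejected" }
  | { kind: "model_io"; seq: number; model: string; inputTokens: number; outputTokens: number }
  | { kind: "artifact"; seq: number; name: string; bytes: number }
  | { kind: "integrity"; seq: number; digest: string };

type GateEvent = Extract<SpineEvent, { kind: "gate" }>;

// Assumed address scheme: run id plus the event's sequence number.
function eventAddress(runId: string, event: SpineEvent): string {
  return `${runId}/events/${event.seq}`;
}

// Names of gates still waiting on a human decision.
function pendingGates(events: SpineEvent[]): string[] {
  return events
    .filter((e): e is GateEvent => e.kind === "gate")
    .filter((e) => e.status === "pending")
    .map((e) => e.gate);
}

// Total tokens across all model I/O events, a rough input to cost accounting.
function totalTokens(events: SpineEvent[]): number {
  return events.reduce(
    (sum, e) => (e.kind === "model_io" ? sum + e.inputTokens + e.outputTokens : sum),
    0,
  );
}
```

A tagged union like this lets a script, an auditor's tool, or an agent narrow on `kind` and read the same events a human would see in the UI.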
The near future isn't only prompt engineering — it's audit trails for cognition: runs you can diff, sign, and replay.
What you write changes by one import.
Drop in tracedGenerateText from @llm-workbench/ai-sdk and every call becomes a structured trace event, persisted into the run bundle and gated by your workflow policy.
    import { generateText } from 'ai';
    import { openai } from '@ai-sdk/openai';

    const result = await generateText({
      model: openai('gpt-4o-mini'),
      system: 'You are the DeLorean flight computer.',
      prompt: 'Power needed to hit 88 mph: ' + plan,
    });

    console.log(result.text);
    // every call is opaque to your platform
    // no run id, no spans, no human gate, no replay

Every event is durable, replayable, and exported in the run bundle.
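How a drop-in traced call might work can be sketched with a plain higher-order function that wraps any async call and emits an event before and after it. This is not the actual tracedGenerateText implementation or signature, which is not shown here. `traced`, `TraceRecord`, `Sink`, and the stand-in `fakeGenerateText` are hypothetical names so the sketch runs offline without a model provider.

```typescript
// Hypothetical record shape for this sketch only.
interface TraceRecord {
  runId: string;
  seq: number;
  type: "call.started" | "call.finished" | "call.failed";
  at: number; // epoch milliseconds
  data: unknown;
}

type Sink = (record: TraceRecord) => void;

// Wraps an async function so each invocation emits start and end records.
function traced<Args, Result>(
  fn: (args: Args) => Promise<Result>,
  runId: string,
  sink: Sink,
): (args: Args) => Promise<Result> {
  let seq = 0;
  return async (args: Args): Promise<Result> => {
    sink({ runId, seq: seq++, type: "call.started", at: Date.now(), data: args });
    try {
      const result = await fn(args);
      sink({ runId, seq: seq++, type: "call.finished", at: Date.now(), data: result });
      return result;
    } catch (error) {
      // Record the failure, then rethrow so callers see the original error.
      sink({ runId, seq: seq++, type: "call.failed", at: Date.now(), data: String(error) });
      throw error;
    }
  };
}

// Stand-in for a model call (assumption: the real call would go to a provider).
async function fakeGenerateText(args: { prompt: string }): Promise<{ text: string }> {
  return { text: `echo: ${args.prompt}` };
}
```

Calling `traced(fakeGenerateText, "run-1", sink)` returns a function with the same argument and return types as the original, which is why the call site only needs a different import.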
Step into the playground, or drive the same contract from your agent.
Every surface — UI, HTTP, MCP — agrees on the same run bundle. Pick yours and start persisting reality.