v1.0.0 · Open Source · Apache 2.0

Unit Testing for the
Voice AI Era

Simulate thousands of concurrent calls. Detect hallucinations instantly. Score latency down to the millisecond. Works with any voice agent.

$ pip install git+https://github.com/unforkopensource-org/decibench.git

How it works

Write YAML. Run tests. Ship.

Decibench synthesizes caller audio, calls your agent over any protocol, transcribes the response, and scores it across 10 metrics — all from one command.

# 1. Install

$ pip install git+https://github.com/unforkopensource-org/decibench.git

# 2. Test the built-in demo agent (zero config)

$ decibench run target=demo suite=quick

# 3. Test YOUR agent

$ decibench run target=ws://localhost:8080/ws suite=standard

# 4. View results in the dashboard

$ decibench serve

✓ Dashboard running at http://localhost:8100

Testing Modes

Three levels of depth.

Deterministic

Exact string matching, regex, keyword checks. Sub-millisecond. Runs entirely locally with zero API costs.

FREE · ~ms per test
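As a rough illustration of what deterministic checks look like, here is a minimal Python sketch that applies an exact match, a regex, and keyword checks to a transcript. The function and class names (`run_deterministic_checks`, `CheckResult`) are hypothetical and are not Decibench's API; the sketch only shows the general technique.

```python
import re
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool


def run_deterministic_checks(transcript, *, exact=None, pattern=None, keywords=()):
    """Apply exact-match, regex, and keyword checks to one transcript.

    Hypothetical helper for illustration only; not part of Decibench.
    """
    text = transcript.strip().lower()
    results = []
    if exact is not None:
        # Case-insensitive comparison of the whole transcript.
        results.append(CheckResult("exact", text == exact.strip().lower()))
    if pattern is not None:
        # The regex may match anywhere in the transcript.
        matched = re.search(pattern, transcript, re.IGNORECASE) is not None
        results.append(CheckResult("regex", matched))
    for kw in keywords:
        # Plain substring test, case-insensitive.
        results.append(CheckResult(f"keyword:{kw}", kw.lower() in text))
    return results


results = run_deterministic_checks(
    "Your appointment is confirmed for 3 PM on Friday.",
    pattern=r"\b\d{1,2}\s?(am|pm)\b",
    keywords=["confirmed", "friday"],
)
all_passed = all(r.passed for r in results)
```

Checks like these need no network call or model, which is why this mode can run locally at no API cost.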

Semantic

LLM-as-Judge scores accuracy, compliance, and hallucination rates. Works with GPT-4o, Claude, Gemini, or Ollama.

~$0.01/call · ~2s per test

RAG-Augmented

Upload your knowledge base. Decibench auto-generates adversarial test suites that actively try to break your agent.

~$0.03/call · ~5s per test

Evaluators

10 metrics. Every call.

Every call is automatically scored across all applicable metrics. No configuration needed.

Latency

p50 / p90 / p95 / TTFB
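For readers unfamiliar with percentile latency, the following sketch computes p50, p90, and p95 from a list of per-turn response times using the nearest-rank method. This is an assumption for illustration: the page does not say which percentile definition Decibench uses, and the names `percentile` and `latency_summary` are hypothetical.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Rank is 1-based; clamp to 1 so very small percentiles still return a sample.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def latency_summary(latencies_ms):
    """Summarize response latencies (milliseconds) as p50, p90, and p95."""
    return {f"p{p}": percentile(latencies_ms, p) for p in (50, 90, 95)}


latencies_ms = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
summary = latency_summary(latencies_ms)
```

TTFB (time to first byte) is a separate measurement taken per response, so it is not derived from this list.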

WER / CER

Word & character error rates
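Word error rate is conventionally defined as the edit distance between the reference and the transcribed hypothesis, divided by the number of reference words; character error rate applies the same idea to characters. The sketch below implements that textbook definition with simple lowercasing and whitespace splitting. The normalization Decibench actually applies is not stated on this page, so treat these helpers as illustrative only.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitution, insertion, deletion cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance over reference word count."""
    ref_words = reference.lower().split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.lower().split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance over reference length."""
    ref_chars = list(reference.lower())
    if not ref_chars:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_chars, list(hypothesis.lower())) / len(ref_chars)


example_wer = wer("book a table for two", "book a table for you")
example_cer = cer("cat", "cut")
```

Because insertions count as errors, both rates can exceed 1.0 when the hypothesis is much longer than the reference.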

Hallucination

LLM-graded factual accuracy

Task Completion

Did the agent achieve the goal?

Compliance

Mandatory disclosures & disclaimers

Interruption

Barge-in handling robustness

Silence

Dead air detection
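One simple way to detect dead air is to look for long runs of near-silent samples. The sketch below does this with a per-sample amplitude threshold. It is an assumption-laden illustration: the function name, the threshold, and the minimum duration are all hypothetical, and the page does not describe how Decibench detects silence. Production systems more commonly use frame energy or a voice-activity detector.

```python
def find_dead_air(samples, sample_rate, threshold=0.01, min_seconds=2.0):
    """Return (start_s, end_s) spans where |amplitude| stays below threshold for min_seconds or more.

    samples: floats in [-1.0, 1.0]; sample_rate: samples per second.
    Threshold and duration defaults are illustrative, not Decibench's values.
    """
    min_len = int(min_seconds * sample_rate)
    gaps = []
    start = None  # index where the current quiet run began, if any
    for i, s in enumerate(samples):
        if abs(s) < threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                gaps.append((start / sample_rate, i / sample_rate))
            start = None
    # Handle a quiet run that lasts until the end of the audio.
    if start is not None and len(samples) - start >= min_len:
        gaps.append((start / sample_rate, len(samples) / sample_rate))
    return gaps


# One second of sound, three seconds of silence, one second of sound (10 samples per second).
audio = [0.5] * 10 + [0.0] * 30 + [0.5] * 10
gaps = find_dead_air(audio, sample_rate=10)
```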

MOS / STOI

Audio quality & intelligibility

Composite Score

Weighted aggregate — single number
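A weighted aggregate can be sketched as a weighted mean over whichever metrics apply to a call. The weights and metric names below are made up for illustration, and skipping inapplicable metrics is an assumption; the page does not state Decibench's actual weights or normalization.

```python
def composite_score(scores, weights):
    """Weighted mean of the applicable metric scores.

    scores: metric name -> score in [0, 1], or None when the metric does not apply.
    weights: metric name -> non-negative weight (hypothetical values).
    """
    applicable = {k: v for k, v in scores.items() if v is not None and k in weights}
    total_weight = sum(weights[k] for k in applicable)
    if total_weight == 0:
        raise ValueError("no applicable metrics with a positive weight")
    # Renormalize by the weights actually used so skipped metrics do not drag the score down.
    return sum(applicable[k] * weights[k] for k in applicable) / total_weight


scores = {"task_completion": 1.0, "latency": 0.5, "compliance": None}
weights = {"task_completion": 2.0, "latency": 1.0, "compliance": 1.0}
overall = composite_score(scores, weights)
```

Dividing by the sum of the weights in use keeps the result in the same 0 to 1 range as the inputs, whichever metrics are present.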

Connectors

Works with your stack.

No SDK to install in your agent. Decibench connects to your agent — not the other way around.

WebSocket · ws:// · ✅

ElevenLabs · elevenlabs:// · ✅

Twilio Mock · twilio:// · ✅

HTTP · http:// · ✅

Process · exec:"…" · ✅

Vapi · vapi:// · 🧪

Retell · retell:// · 🧪

LiveKit · — · 📋

Stop testing manually.
Start shipping with confidence.

Decibench is free, open source, and ready for production.
