When checked, the server emits VadAnalysisFrame messages (about 50 per second)
carrying the raw VAD confidence and volume signals. State-transition messages
(VadStateEvent) are emitted whether or not this option is checked.
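A minimal client-side sketch of routing these two message types. It assumes, hypothetically, that messages arrive as JSON text with a type field matching the names above, and that frames carry numeric confidence and volume fields. The real wire format may differ. At roughly 50 Hz a frame arrives about every 20 ms, so the sketch keeps only a bounded history.

```javascript
// Hypothetical message shapes: the field names "type", "confidence",
// "volume" and "state" are assumptions, not a documented protocol.
const MAX_FRAMES = 500; // about 10 s of history at ~50 Hz

function createVadMonitor() {
  const frames = [];      // recent VadAnalysisFrame samples
  const transitions = []; // every VadStateEvent seen so far

  function handleMessage(raw) {
    const msg = JSON.parse(raw);
    switch (msg.type) {
      case "VadAnalysisFrame":
        frames.push({ confidence: msg.confidence, volume: msg.volume });
        if (frames.length > MAX_FRAMES) frames.shift(); // drop the oldest frame
        return "frame";
      case "VadStateEvent":
        transitions.push(msg.state);
        return "state";
      default:
        return "ignored"; // not a VAD message
    }
  }

  return { handleMessage, frames, transitions };
}

// Usage: in a browser this would be wired to a WebSocket, e.g.
// ws.addEventListener("message", (e) => monitor.handleMessage(e.data));
const monitor = createVadMonitor();
monitor.handleMessage(JSON.stringify({ type: "VadAnalysisFrame", confidence: 0.9, volume: 0.3 }));
monitor.handleMessage(JSON.stringify({ type: "VadStateEvent", state: "Speech" }));
```

Frames and state events are kept separately because state events still arrive when frame telemetry is turned off.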
Inference Configuration
TTS Configuration
Note: Get your API key and voice IDs from elevenlabs.io.
Popular models: eleven_turbo_v2_5 (faster) and
eleven_multilingual_v2 (higher quality).
No audio loaded.
High Quality: batches text by sentence and synthesizes each sentence with the full
acoustic context (ICL, in-context learning). Gives the best voice similarity and prosody, at the cost of higher latency.
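Sentence-level batching means streamed text is held until a sentence boundary before it goes to TTS, so each synthesis call sees a whole sentence. Below is a minimal sketch assuming a naive punctuation-based boundary rule. Real segmenters also handle abbreviations, decimals and similar cases. The function name is hypothetical and this is not the server's implementation.

```javascript
// Accumulates streamed text fragments (e.g. LLM tokens) and emits complete
// sentences. Assumed boundary rule: ".", "!" or "?" followed by whitespace.
// This naive rule would wrongly split after abbreviations such as "Dr. ".
function createSentenceBatcher(onSentence) {
  let buffer = "";
  return {
    push(fragment) {
      buffer += fragment;
      const re = /[.!?]+\s+/g;
      let start = 0;
      let match;
      while ((match = re.exec(buffer)) !== null) {
        const end = match.index + match[0].length;
        const sentence = buffer.slice(start, end).trim();
        if (sentence) onSentence(sentence);
        start = end;
      }
      buffer = buffer.slice(start); // keep the incomplete tail for later
    },
    flush() {
      // Emit whatever is left, e.g. when the LLM response ends.
      const rest = buffer.trim();
      buffer = "";
      if (rest) onSentence(rest);
    },
  };
}

const sentences = [];
const batcher = createSentenceBatcher((s) => sentences.push(s));
["Hel", "lo there. How ", "are you? I'm", " fine"].forEach((t) => batcher.push(t));
batcher.flush();
// sentences: ["Hello there.", "How are you?", "I'm fine"]
```

Waiting for whitespace after the punctuation avoids splitting "3.5" mid-number. The cost is that the last sentence is only released on flush, which is part of why this mode has higher latency.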
Audio Capture
Not capturing
Packets sent: 0
Bytes sent: 0
Data rate: 0 KB/s
Replay Recording
Load a recording (mp3 / wav / opus / m4a), drag to select a range on the waveform,
and replay only that range through the same WebSocket session that mic
capture uses. With VAD frame telemetry enabled, the server's confidence
and volume signals are drawn over the waveform, which helps when
debugging questions like "did VAD trigger on this segment?".
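A sketch of the client side of range replay. It assumes the recording has already been decoded to mono Float32 samples. In a browser, AudioContext.decodeAudioData yields an AudioBuffer whose getChannelData(0) returns such an array. It also assumes the server accepts 16-bit PCM in fixed-duration packets. The 20 ms packet size, the sample format and the helper names are assumptions; match whatever mic capture actually sends.

```javascript
// Return the samples in [startSec, endSec), clamped to the recording.
function sliceRange(samples, sampleRate, startSec, endSec) {
  const start = Math.max(0, Math.floor(startSec * sampleRate));
  const end = Math.min(samples.length, Math.floor(endSec * sampleRate));
  return samples.subarray(start, Math.max(start, end));
}

// Float [-1, 1] -> signed 16-bit PCM. Int16Array uses the platform's byte
// order (little-endian on common hardware).
function floatToInt16(samples) {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp out-of-range values
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return out;
}

// Split PCM into packets of packetMs each (assumed 20 ms by default).
function packetize(pcm, sampleRate, packetMs = 20) {
  const perPacket = Math.round((sampleRate * packetMs) / 1000);
  const packets = [];
  for (let i = 0; i < pcm.length; i += perPacket) {
    packets.push(pcm.subarray(i, i + perPacket)); // last packet may be shorter
  }
  return packets;
}

// Usage: 1 s of silence at 16 kHz, replaying only 0.25 s to 0.75 s.
const rate = 16000;
const decoded = new Float32Array(rate);
const packets = packetize(floatToInt16(sliceRange(decoded, rate, 0.25, 0.75)), rate);
// Each packet is an Int16Array view that could be passed to ws.send(packet).
```

At 16 kHz a 20 ms packet holds 320 samples, so the 0.5 s selection becomes 25 packets.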
States: Si = Silence, St = Starting, Sp = Speech, En = Ending. Backbuffer: audio captured before the trigger.
Hover the waveform to inspect a point in time.
VAD: — | conf=— | vol=— | last packet=—
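The four states and the backbuffer can be illustrated with a small hysteresis state machine. This is a hypothetical sketch, not the server's VAD. The thresholds, frame counts and backbuffer length are made-up values, and the input is assumed to be one confidence value per frame. It shows why a backbuffer exists: the onset of speech happens before the VAD is confident enough to trigger, so recent frames are kept and released once Speech is confirmed.

```javascript
// Illustrative VAD state machine: Silence -> Starting -> Speech -> Ending.
// All thresholds, frame counts and sizes are assumptions.
const VAD_CONFIG = {
  onThreshold: 0.6,     // confidence needed to start speech
  offThreshold: 0.4,    // confidence below which speech starts ending
  startFrames: 3,       // consecutive high frames to confirm Speech
  endFrames: 5,         // consecutive low frames to return to Silence
  backbufferFrames: 10, // most recent frames retained before a trigger
};

function createVad(cfg = VAD_CONFIG) {
  let state = "Silence";
  let count = 0;
  const backbuffer = [];

  function remember(frame) {
    backbuffer.push(frame);
    if (backbuffer.length > cfg.backbufferFrames) backbuffer.shift();
  }

  // Feed one frame and its confidence. Returns the new state plus any
  // frames that should be forwarded downstream.
  function step(confidence, frame) {
    const high = confidence >= cfg.onThreshold;
    const low = confidence < cfg.offThreshold;
    let emitted = [];
    switch (state) {
      case "Silence":
        remember(frame);
        if (high) { state = "Starting"; count = 1; }
        break;
      case "Starting":
        remember(frame);
        if (!high) {
          state = "Silence"; // false start: keep buffering
        } else if (++count >= cfg.startFrames) {
          state = "Speech";
          emitted = backbuffer.splice(0); // release pre-trigger audio first
        }
        break;
      case "Speech":
        emitted = [frame];
        if (low) { state = "Ending"; count = 1; }
        break;
      case "Ending":
        emitted = [frame];
        if (!low) {
          state = "Speech"; // speech resumed before the hangover expired
        } else if (++count >= cfg.endFrames) {
          state = "Silence";
        }
        break;
    }
    return { state, emitted };
  }

  return { step };
}

const vad = createVad();
let last;
for (let i = 0; i < 12; i++) last = vad.step(0.1, i);  // silence
for (let i = 12; i < 15; i++) last = vad.step(0.9, i); // speech onset
// last.state is "Speech"; last.emitted holds frames 5..14 (the backbuffer)
```

Using separate on and off thresholds plus frame counts keeps the state from flickering when confidence hovers near a single cutoff.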
Text Input
Inference Control
Idle
Note: Use this to manually trigger a response, such as generating a greeting
at the start of a conversation or forcing a response without waiting for VAD.
Direct Speech
Note: Sends text straight to TTS, bypassing the LLM.
Any active inference is interrupted. When "include in history" is checked, the text is added to the
conversation history as one of the LLM's own turns, so the LLM treats it as something it said.
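A minimal sketch of the ordering and history handling described above. It assumes, hypothetically, that history is an array of { role, content } messages, as in common chat-completion APIs. The directSpeech function and the speak and interrupt callbacks are illustrative stand-ins, not this console's API.

```javascript
// Hypothetical: speak() stands in for the TTS call and interrupt() for
// cancelling in-flight inference. Returns the (possibly extended) history.
function directSpeech(history, text, includeInHistory, { speak, interrupt }) {
  interrupt(); // any active inference is cancelled first
  speak(text); // text goes straight to TTS, bypassing the LLM
  return includeInHistory
    ? [...history, { role: "assistant", content: text }] // recorded as the LLM's own turn
    : history;
}

const log = [];
const stubs = {
  interrupt: () => log.push("interrupt"),
  speak: (t) => log.push(`speak:${t}`),
};
const before = [{ role: "user", content: "Hi" }];
const after = directSpeech(before, "Welcome back!", true, stubs);
// log: ["interrupt", "speak:Welcome back!"]; after has 2 messages, before still has 1
```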
Conversation Query
Note: Runs a one-shot inference over the conversation history without modifying
the conversation. Useful for summarization, extracting action items, and similar tasks.
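One way to guarantee the conversation is left untouched is to build the query's message list from a copy of the history. The sketch below assumes, hypothetically, { role, content } messages and that the query is appended as a user message. The helper name is illustrative.

```javascript
// Build the message list for a one-shot query without touching the stored
// conversation. Message shape and the "user" role are assumptions.
function buildQueryMessages(history, query) {
  // Copy each message so the returned array cannot be used to mutate
  // the original history.
  const messages = history.map((m) => ({ ...m }));
  messages.push({ role: "user", content: query });
  return messages;
}

const history = [
  { role: "user", content: "Can we move the meeting to Friday?" },
  { role: "assistant", content: "Sure, Friday at 10 works." },
];
const messages = buildQueryMessages(history, "List the action items from this conversation.");
// history.length is still 2; messages.length is 3
```

The response to messages would be shown to the user and then discarded, never appended to history.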
Audio Playback
Idle
Chunks received: 0
Bytes received: 0
Queue size: 0
Tool Calling
Tool Definitions
⚡ Pending Tool Calls
Tool Call History
No tool calls yet
Tool Handler Script
Write a JavaScript function that handles tool calls automatically.
The function receives name (the tool name) and
parameters (an object), and should return a string result.
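For example, a handler might dispatch on the tool name and serialize structured results to a string. The tool names (get_current_time, add_numbers), the function name handleToolCall and the way the console invokes the script are all assumptions. The only contract stated above is the (name, parameters) input and the string return value.

```javascript
// Hypothetical tool handler: dispatch on the tool name and always return
// a string, as the console expects.
function handleToolCall(name, parameters) {
  switch (name) {
    case "get_current_time":
      return new Date().toISOString();
    case "add_numbers": {
      const { a, b } = parameters ?? {}; // tolerate missing parameters
      if (typeof a !== "number" || typeof b !== "number") {
        return JSON.stringify({ error: "a and b must be numbers" });
      }
      return String(a + b);
    }
    default:
      // Errors are serialized too, so the result is always a string.
      return JSON.stringify({ error: `Unknown tool: ${name}` });
  }
}
```

For example, handleToolCall("add_numbers", { a: 2, b: 3 }) returns "5". Returning errors as JSON strings rather than throwing gives the LLM something it can read and react to.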