Corvidae Debug Interface

Connection & Session

Disconnected

WebSocket URL

Auto-initialize session after connect

Input Audio Line

Sample Rate (Hz)

Channel Count

Sample Format

Output Audio Line

Sample Rate (Hz)

Channel Count

Sample Format

VAD Configuration

Confidence Threshold (0.0-1.0)

Minimum Volume (0.0-1.0)

Start Duration (seconds)

Stop Duration (seconds)

Backbuffer Duration (seconds)

Enable per-frame VAD telemetry (debug)

When checked, the server emits VadAnalysisFrame messages (~50 Hz) with raw confidence/volume signals. State transitions (VadStateEvent) are emitted regardless.
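A minimal sketch of how a client might tally the two message kinds named above. The wire format is not shown on this page, so the JSON shape is an assumption: the field names `type`, `confidence`, `volume`, and `state` are hypothetical and should be checked against the real server messages.

```javascript
// Hypothetical message shapes: `type`, `confidence`, `volume`, and `state`
// are assumed field names, not the server's documented schema.
function createVadTracker() {
  const summary = { frames: 0, peakConfidence: 0, peakVolume: 0, transitions: [] };

  function handle(rawMessage) {
    let msg;
    try {
      msg = JSON.parse(rawMessage);
    } catch {
      return summary; // not JSON (for example a binary audio payload): ignore
    }
    if (msg === null || typeof msg !== "object") return summary;

    if (msg.type === "VadAnalysisFrame") {
      // Per-frame telemetry, only sent when the debug checkbox is enabled.
      summary.frames += 1;
      summary.peakConfidence = Math.max(summary.peakConfidence, Number(msg.confidence) || 0);
      summary.peakVolume = Math.max(summary.peakVolume, Number(msg.volume) || 0);
    } else if (msg.type === "VadStateEvent") {
      // State transitions, sent regardless of the telemetry setting.
      summary.transitions.push(msg.state);
    }
    return summary;
  }

  return { handle, summary };
}
```

In a browser this could be wired up with `socket.addEventListener("message", (e) => tracker.handle(e.data))`, where `socket` is the page's WebSocket and `tracker` is the object returned above.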

Inference Configuration

System Prompt

TTS Configuration

TTS Provider

API Key (cached in browser)

Voice ID

Model ID (optional)

Location

Enable Voice Settings

Note: Get your API key and voice IDs from elevenlabs.io. Popular models: eleven_turbo_v2_5 (fast), eleven_multilingual_v2 (quality).

Audio Capture

Not capturing

Audio Statistics

Packets sent: 0
Bytes sent: 0
Data rate: 0 KB/s
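As a sanity check on the data rate counter, the expected rate for uncompressed PCM follows from the Input Audio Line settings. This is a sketch under assumptions: it assumes raw PCM is sent without compression or framing overhead, that 1 KB means 1000 bytes, and that the sample format maps to a fixed byte width (for example 2 bytes for 16-bit samples). The function names are illustrative.

```javascript
// Bytes per second of uncompressed PCM for one audio line.
function pcmBytesPerSecond(sampleRateHz, channelCount, bytesPerSample) {
  return sampleRateHz * channelCount * bytesPerSample;
}

// Observed throughput from the "Bytes sent" counter over an elapsed window,
// in KB/s (assuming 1 KB = 1000 bytes).
function measuredKBps(bytesSent, elapsedMs) {
  if (elapsedMs <= 0) return 0;
  return bytesSent / 1000 / (elapsedMs / 1000);
}

// 16 kHz, mono, 16-bit samples: 32000 bytes/s, i.e. 32 KB/s.
console.log(pcmBytesPerSecond(16000, 1, 2) / 1000);
```

If the displayed rate sits far below the expected value while capturing, packets are probably being dropped or throttled before they reach the socket.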

Replay Recording

Load a recording (mp3 / wav / opus / m4a), drag a range on the waveform, and replay just that range through the same WebSocket session that mic capture uses. With VAD frame telemetry enabled, the server's confidence and volume signals are drawn on top of the waveform, which helps answer "did VAD trigger on this segment?".

Audio file

No file loaded.

Mouse: drag = select range | shift+drag = pan | wheel = zoom at cursor | double-click = reset zoom | view: —

States: Si = Silence | St = Starting | Sp = Speech | En = Ending | backbuffer (audio captured before the trigger)
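The four states in the legend, together with the VAD Configuration fields (confidence threshold, minimum volume, start and stop durations), suggest a small state machine. The server's actual logic is not shown on this page, so the following is only an illustrative sketch: the transition rules, the 20 ms frame length (taken from the ~50 Hz telemetry rate), and all names are assumptions.

```javascript
// Illustrative VAD state machine; NOT the server's implementation.
// Assumed rule: a frame counts as "voiced" when both confidence and volume
// meet their thresholds.
function createVadStateMachine(config) {
  const frameSec = config.frameSec ?? 0.02; // assumed 20 ms frames (~50 Hz)
  const startFrames = Math.max(1, Math.round(config.startDuration / frameSec));
  const stopFrames = Math.max(1, Math.round(config.stopDuration / frameSec));
  let state = "Silence";
  let run = 0; // consecutive frames spent in Starting or Ending

  return function step(confidence, volume) {
    const voiced =
      confidence >= config.confidenceThreshold && volume >= config.minVolume;

    switch (state) {
      case "Silence":
        if (voiced) { state = "Starting"; run = 1; }
        break;
      case "Starting":
        if (voiced) { run += 1; } else { state = "Silence"; run = 0; }
        break;
      case "Speech":
        if (!voiced) { state = "Ending"; run = 1; }
        break;
      case "Ending":
        if (voiced) { state = "Speech"; run = 0; } else { run += 1; }
        break;
    }

    // Promote once the voiced or unvoiced run has lasted long enough.
    if (state === "Starting" && run >= startFrames) { state = "Speech"; run = 0; }
    if (state === "Ending" && run >= stopFrames) { state = "Silence"; run = 0; }
    return state;
  };
}
```

Under these assumptions, a longer start duration filters out short noises, and a longer stop duration keeps brief pauses inside one speech segment.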

Hover the waveform to inspect a point in time.

Range start (s)

Range end (s)
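Replaying a selected range comes down to converting the start and end times into sample indices. The sketch below assumes the recording has already been decoded to mono samples in a `Float32Array`. The function name and the clamping behaviour are illustrative, not taken from this page's code.

```javascript
// Return a view of the samples between startSec and endSec.
// Out-of-bounds times are clamped; an inverted or empty range yields an
// empty view.
function sliceRange(samples, sampleRateHz, startSec, endSec) {
  const clamp = (i) => Math.min(Math.max(i, 0), samples.length);
  const start = clamp(Math.round(startSec * sampleRateHz));
  const end = clamp(Math.round(endSec * sampleRateHz));
  if (end <= start) return samples.subarray(0, 0);
  return samples.subarray(start, end); // shares memory with the original
}
```

Because `subarray` returns a view rather than a copy, the selected range can be chunked and sent without duplicating the decoded audio.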

Inference Trigger Mode

VAD: — | conf=— | vol=— | last packet=—

Text Input

Inference Trigger Mode

Message

Inference Control

Idle

Extra Instructions (optional)

Note: Use this to manually trigger a response, such as generating a greeting at the start of a conversation or forcing a response without waiting for VAD.

Direct Speech

Text to Speak

Include in history (LLM will know this was spoken)

Note: Speaks text directly via TTS without going through the LLM. Any active inference is interrupted. When "include in history" is checked, the LLM treats this as something it said.

Conversation Query

System Prompt Override (optional)

Instructions (optional)

Result:

Note: Runs a one-shot inference with the conversation history. Does not modify the conversation. Useful for summarization, action items, etc.

Audio Playback

Idle

Playback Statistics

Chunks received: 0
Bytes received: 0
Queue size: 0

Volume

Report playback position on interrupt (enables precise context truncation)
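For the playback position option, one plausible way to derive the position is from the number of PCM bytes already played, using the Output Audio Line settings. This is a sketch under assumptions: it assumes uncompressed PCM with a fixed byte width per sample, and the function name is hypothetical. How the page actually reports the position is not shown here.

```javascript
// Seconds of audio actually played, given the bytes consumed by the output.
// Partial frames (an incomplete sample group) are not counted.
function playbackPositionSeconds(bytesPlayed, sampleRateHz, channelCount, bytesPerSample) {
  const bytesPerFrame = channelCount * bytesPerSample;
  const framesPlayed = Math.floor(bytesPlayed / bytesPerFrame);
  return framesPlayed / sampleRateHz;
}
```

With such a position available at interrupt time, the context can be truncated to what was actually heard rather than to everything that was queued.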

Tool Calling (0)

Tool Definitions

Tool Definitions (JSON array)
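Since the field expects a JSON array, it can help to validate the text before sending it. The tool schema is not given on this page, so the sketch below only checks what can reasonably be inferred: that the text parses, that the top level is an array, and that each entry has a non-empty string `name` (assumed because the handler receives a tool name). Any further fields, such as descriptions or parameter schemas, are left unchecked.

```javascript
// Validate the "Tool Definitions" text. Only the array shape and a string
// `name` per entry are checked; the rest of the schema is an unknown here.
function parseToolDefinitions(text) {
  let parsed;
  try {
    parsed = JSON.parse(text);
  } catch (err) {
    return { ok: false, error: `Invalid JSON: ${err.message}` };
  }
  if (!Array.isArray(parsed)) {
    return { ok: false, error: "Expected a JSON array" };
  }
  for (const [index, tool] of parsed.entries()) {
    const hasName =
      tool !== null && typeof tool === "object" &&
      typeof tool.name === "string" && tool.name !== "";
    if (!hasName) {
      return { ok: false, error: `Entry ${index} needs a non-empty string "name"` };
    }
  }
  return { ok: true, tools: parsed };
}
```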

Current Tools: 0

⚡ Pending Tool Calls

Tool Call History

No tool calls yet

Tool Handler Script

Write a JavaScript function to automatically handle tool calls.
The function receives name (tool name) and parameters (object), and should return a string result.
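A minimal sketch of such a handler. The contract used here (receives `name` and `parameters`, returns a string) comes from the description above. The function name `handleToolCall` and the tools `get_time` and `add` are hypothetical examples, and the exact form the Handler Function field expects may differ.

```javascript
// Example auto-handler: dispatch on the tool name and always return a string.
// The tools below are made up for illustration.
function handleToolCall(name, parameters) {
  switch (name) {
    case "get_time": // hypothetical tool with no parameters
      return new Date().toISOString();
    case "add": { // hypothetical tool taking numeric `a` and `b`
      const sum = Number(parameters.a) + Number(parameters.b);
      return String(sum);
    }
    default:
      // Report unknown tools as a string result instead of throwing.
      return JSON.stringify({ error: `Unknown tool: ${name}` });
  }
}
```

Returning an error string for unknown tools, rather than throwing, keeps the result a string in every case, which matches the stated contract.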

Enable Auto-Handler (currently disabled)

Handler Function

Test Handler

Tool Name

Parameters (JSON)

Chat History

Await transcriptions

User Transcriptions

Event Log
