Quick, copy-pasteable presets for common deployment scenarios. Adjust to suit your trunks and network.
See also: docs/Configuration-Reference.md for detailed option effects and ranges.
```yaml
barge_in:
  enabled: true
  initial_protection_ms: 300
  min_ms: 280
  energy_threshold: 1200
  cooldown_ms: 800
  post_tts_end_protection_ms: 250

streaming:
  jitter_buffer_ms: 90
  min_start_ms: 280
  low_watermark_ms: 180
```

Why: Lower energy threshold and `min_ms` make barge-in more responsive. Keep jitter and warm-up modest.
```yaml
barge_in:
  enabled: true
  initial_protection_ms: 400
  min_ms: 450
  energy_threshold: 2200
  cooldown_ms: 1200
  post_tts_end_protection_ms: 300

vad:
  webrtc_aggressiveness: 1
  webrtc_end_silence_frames: 35
  min_utterance_duration_ms: 2600

streaming:
  jitter_buffer_ms: 140
  min_start_ms: 350
  low_watermark_ms: 240
```

Why: Higher thresholds reduce false barge-ins; larger buffers handle jitter; slightly longer utterances improve STT.
```yaml
barge_in:
  enabled: true
  initial_protection_ms: 250
  min_ms: 280
  energy_threshold: 1400
  cooldown_ms: 600
  post_tts_end_protection_ms: 250

streaming:
  jitter_buffer_ms: 80
  min_start_ms: 250
  low_watermark_ms: 160
```

Why: Minimal buffering and faster barge-in. Expect a higher risk of underruns and occasional self-echo on some trunks.
```yaml
barge_in:
  enabled: true
  initial_protection_ms: 450
  min_ms: 500
  energy_threshold: 2000
  cooldown_ms: 1200
  post_tts_end_protection_ms: 300

streaming:
  jitter_buffer_ms: 150
  min_start_ms: 380
  low_watermark_ms: 260
  provider_grace_ms: 600
```

Why: Larger buffers and conservative barge-in minimize glitches at the cost of slightly slower starts.
Enable on the provider to improve turn-taking (optional):

```yaml
providers:
  openai_realtime:
    # ...
    turn_detection:
      type: "server_vad"
      silence_duration_ms: 200
      threshold: 0.5
      prefix_padding_ms: 200
```

Use with or without local VAD. If both run, prefer conservative local VAD (e.g., longer end-silence) to avoid clashes.
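If local VAD stays enabled alongside server VAD, one conservative sketch (values illustrative, reusing keys shown elsewhere in this guide) is to lengthen local end-silence so the server's decision usually wins:

```yaml
vad:
  enabled: true
  webrtc_aggressiveness: 1        # passive: defer to server VAD
  webrtc_end_silence_frames: 40   # longer end-silence than the local-only presets
```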
- `downstream_mode: stream`: best UX, but requires a stable network; tune the `streaming.*` buffers.
- `downstream_mode: file`: more tolerant of jitter and provider hiccups, at the cost of response latency.
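The mode lives under the `streaming` section, as in the provider presets, e.g.:

```yaml
streaming:
  downstream_mode: "file"   # or "stream" for the lowest latency
```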
```ini
; Dialplan handshake (optional but recommended)
exten => s,n,Set(AI_TRANSPORT_FORMAT=slin16) ; or ulaw
exten => s,n,Set(AI_TRANSPORT_RATE=8000)     ; 8k/16k/24k
```

- If the dialplan omits those variables, the engine falls back to `config.audiosocket.format` and logs the default it is using.
- The engine auto-aligns downstream streaming targets; when a provider's config disagrees, it will log an actionable warning (with the exact YAML keys to fix) and expose `ai_agent_codec_alignment{call_id,provider}` as `0` in `/metrics`.
- To keep providers in PCM16 internally while the AudioSocket leg remains μ-law, enable:

  ```yaml
  streaming:
    egress_force_mulaw: true
  ```

  This converts outbound streaming audio back to μ-law/8 kHz right before it is written to Asterisk, regardless of provider output.

- RMS/DC offset diagnostics for each stage are published as `ai_agent_audio_rms{stage=...}` and `ai_agent_audio_dc_offset{stage=...}`, so you can alert on silent or biased audio before customers notice.
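For intuition, here is a minimal sketch of what an RMS/DC-offset probe computes on a single PCM16 frame (pure Python for illustration; `frame_stats` is not part of the engine):

```python
import math
import struct

def frame_stats(pcm16: bytes) -> tuple[float, float]:
    """Return (rms, dc_offset) for a little-endian PCM16 frame."""
    n = len(pcm16) // 2
    if n == 0:
        return 0.0, 0.0
    samples = struct.unpack(f"<{n}h", pcm16[: n * 2])
    dc = sum(samples) / n                              # mean sample value (DC bias)
    rms = math.sqrt(sum(s * s for s in samples) / n)   # signal energy
    return rms, dc

# A silent frame has rms == 0; a constant-offset frame shows a nonzero DC bias.
silent = struct.pack("<4h", 0, 0, 0, 0)
biased = struct.pack("<4h", 100, 100, 100, 100)
```

An RMS near zero on the provider-output stage while the ingress stage is active is the classic "agent went silent" signature these metrics are meant to catch.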
v4.0 introduces modular pipelines where STT, LLM, and TTS are separate components. Tuning considerations differ from those for monolithic providers.
Optimal Settings for 3-7 second response time:
```yaml
pipelines:
  local_hybrid:
    stt: local_stt
    llm: openai_llm
    tts: local_tts
    options:
      stt:
        chunk_ms: 320                # Balanced latency vs. context
        streaming: true
        stream_format: "pcm16_16k"   # Match internal format (no transcoding)
      llm:
        model: "gpt-4o-mini"         # Fast, cost-effective
        temperature: 0.7
        max_tokens: 150              # Keep responses concise
      tts:
        format:
          encoding: ulaw
          sample_rate: 8000

vad:
  enabled: true
  mode: "webrtc"
  webrtc_aggressiveness: 2     # Normal (good for Vosk)
  min_speech_duration_ms: 250
  silence_duration_ms: 700     # Balance responsiveness vs. cutting off
  speech_pad_ms: 300

barge_in:
  enabled: true
  min_ms: 400                  # Prevent false interrupts during silence
  energy_threshold: 1800
```

Why: Vosk performs better with slightly longer audio chunks. gpt-4o-mini is faster than gpt-4o (500 ms vs. 1.5 s). Pipeline mode requires `downstream_mode: file`.
Hardware Note: Modern CPU (2020+) required for acceptable latency. See HARDWARE_REQUIREMENTS.md.
Optimal Settings for <1.5 second response time:
```yaml
default_provider: "openai_realtime"

providers:
  openai_realtime:
    enabled: true
    turn_detection:
      type: "server_vad"         # Use OpenAI's server-side VAD
      silence_duration_ms: 200   # Quick turn detection
      threshold: 0.5
      prefix_padding_ms: 200

vad:
  enabled: true
  webrtc_aggressiveness: 1    # CRITICAL: Low for OpenAI (prevents conflicts)
  silence_duration_ms: 1000   # Let server VAD handle turn detection

barge_in:
  enabled: false              # Server-side turn handling sufficient

streaming:
  jitter_buffer_ms: 90        # Low-latency streaming
  min_start_ms: 250
  low_watermark_ms: 180
  downstream_mode: "stream"   # Real-time streaming for best UX
```

Why: OpenAI Realtime has server-side VAD that handles turn detection. Keep local VAD passive (aggressiveness: 1) to avoid conflicts. Streaming mode provides the lowest latency.
Critical: `webrtc_aggressiveness: 1` is essential; higher values cause audio clipping with OpenAI's VAD.
Optimal Settings for 1-2 second response time:
```yaml
default_provider: "deepgram"

providers:
  deepgram:
    enabled: true
    model: "nova-2-general"
    tts_model: "aura-asteria-en"
    continuous_input: true       # Stream audio continuously
    input_encoding: "linear16"   # Best quality (PCM16)
    input_sample_rate_hz: 8000
    output_encoding: "mulaw"
    output_sample_rate_hz: 8000

barge_in:
  enabled: false   # Deepgram handles turn-taking

vad:
  enabled: true
  webrtc_aggressiveness: 1   # Low (Deepgram has server-side VAD)
  silence_duration_ms: 800

streaming:
  jitter_buffer_ms: 100
  min_start_ms: 300
  low_watermark_ms: 200
  downstream_mode: "file"    # More robust for Deepgram
```

Why: Deepgram Voice Agent includes a "Think" stage for reasoning. `file` mode is more robust than `stream` with Deepgram's multi-stage processing. `continuous_input: true` is required for real-time conversation.
Cost: ~$0.03/min (cheaper than OpenAI Realtime)
| Configuration | Response Time | Cost/min | Tuning Complexity | Best For |
|---|---|---|---|---|
| OpenAI Realtime | 0.5-1.5s | ~$0.06 | Low (server VAD) | Speed, simplicity |
| Deepgram | 1-2s | ~$0.03 | Low (server VAD) | Cost, reasoning |
| Local Hybrid | 3-7s | ~$0.002 | Medium (local VAD) | Privacy, cost |
Use Prometheus metrics to validate tuning:
```promql
# Turn response latency (target: <2s p95)
histogram_quantile(0.95, rate(ai_agent_turn_response_seconds_bucket[5m]))

# Underflow rate (target: 0 per call)
rate(ai_agent_stream_underflow_events_total[5m]) / rate(ai_agent_streaming_sessions_total[5m])

# Barge-in effectiveness (if enabled)
rate(ai_agent_barge_in_triggered_total[5m])
```
See MONITORING_GUIDE.md for dashboard setup and alert thresholds.
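As a sketch, the zero-underflow target above could be wired into a Prometheus alerting rule (group and alert names here are hypothetical; tune the threshold and `for` window to your SLOs):

```yaml
groups:
  - name: ai-agent-tuning   # hypothetical group name
    rules:
      - alert: AIAgentStreamUnderflows
        expr: rate(ai_agent_stream_underflow_events_total[5m]) > 0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Streaming underflows detected; consider raising jitter_buffer_ms / min_start_ms"
```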