Problem
The `use_provider_vad` setting under `vad:` in `ai-agent.yaml` is a global toggle: when enabled, it disables local Enhanced VAD + WebRTC VAD for ALL providers. This creates a conflict, because providers differ in what they handle natively:
- OpenAI Realtime: Has true end-to-end VAD + barge-in (`has_native_vad=True`, `has_native_barge_in=True`). Local VAD is redundant and can interfere; `use_provider_vad=True` is ideal.
- Google Live: Has server-side VAD for turn detection, but no AEC (acoustic echo cancellation). On telephony, gating sends silence during TTS, so Google's VAD can't detect user speech for barge-in. `local_vad_fallback` (Enhanced VAD + WebRTC) is the only working barge-in mechanism, and `use_provider_vad=True` cripples it.
- Deepgram: Text-based output, no echo risk. Local VAD fallback works well.
- Local/Pipeline: Requires local VAD entirely.
Evidence from Testing (Feb 28, 2026)
Two test calls on 10.44.0.103 with Google Live + ExternalMedia:
| Setting | Call ID | Barge-in criteria | Confidence | Risk |
|---|---|---|---|---|
| `use_provider_vad: true` | 1772321249.201 | 1 of 1 (energy only) | 0.0 | High false-positive risk |
| `use_provider_vad: false` | 1772321376.205 | 4 of 4 (VAD + energy + confidence + WebRTC) | 0.838–1.0 | Low, robust |
With provider VAD on, Enhanced VAD and WebRTC VAD are never initialized (`self.vad_manager = None`, `self.webrtc_vad = None`), so barge-in degrades to energy-only detection. That is a single point of failure that will false-trigger on background noise or speakerphone echo.
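To show why the first call logged "1 of 1", here is a minimal sketch of a multi-signal barge-in rule whose inputs go missing when the VAD objects are `None`. `FrameSignals`, `evaluate_barge_in`, and `CONFIDENCE_THRESHOLD` are hypothetical names and values, not the engine's actual code; the assumption is that the rule only counts signals that are available.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

CONFIDENCE_THRESHOLD = 0.6  # hypothetical value, not taken from the engine
MIN_CRITERIA = 2  # the ">= 2 of 4" rule from the acceptance criteria


@dataclass
class FrameSignals:
    energy_ok: bool  # frame energy above the barge-in threshold
    vad_speech: Optional[bool] = None  # Enhanced VAD verdict; None if vad_manager is None
    vad_confidence: Optional[float] = None  # Enhanced VAD confidence; None if vad_manager is None
    webrtc_speech: Optional[bool] = None  # WebRTC VAD verdict; None if webrtc_vad is None


def evaluate_barge_in(sig: FrameSignals) -> Tuple[int, int, bool]:
    """Return (criteria_met, criteria_available, trigger)."""
    checks = [sig.energy_ok]  # energy is always computable
    if sig.vad_speech is not None:
        checks.append(sig.vad_speech)
    if sig.vad_confidence is not None:
        checks.append(sig.vad_confidence >= CONFIDENCE_THRESHOLD)
    if sig.webrtc_speech is not None:
        checks.append(sig.webrtc_speech)
    met = sum(checks)
    available = len(checks)
    # With only one signal available, the rule collapses to "1 of 1".
    required = min(MIN_CRITERIA, available)
    return met, available, met >= required
```

With all four signals present, a noisy frame that only passes the energy check (`FrameSignals(True, False, 0.1, False)`) yields `(1, 4, False)`; with provider VAD on, the same frame arrives as `FrameSignals(energy_ok=True)` and yields `(1, 1, True)`, which is the false trigger described above.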
Research Findings
- Google Gemini Live API has no built-in AEC (confirmed via official docs, Perplexity, Exa research)
- Google sends `serverContent.interrupted = true` as its native barge-in signal, but this only works when Google receives real audio (not silence from gating).
- In telephony without client-side AEC, gating is necessary to prevent echo false triggers.
- Pipecat uses WebRTC AEC preprocessing; Android uses `AcousticEchoCanceler`. We have neither on Asterisk telephony.
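To make the gating problem concrete, here is a minimal sketch assuming 16-bit little-endian PCM frames; `gate_for_provider` and `rms` are hypothetical helpers, not functions from this codebase. While TTS plays, the provider receives digital silence, so its server-side VAD (and therefore `serverContent.interrupted`) has no caller speech to react to, whereas a local VAD that inspects the ungated frame still sees the energy.

```python
import struct
from math import sqrt

SAMPLE_WIDTH = 2  # bytes per sample; assumes 16-bit little-endian PCM


def gate_for_provider(frame: bytes, tts_playing: bool) -> bytes:
    """Return the frame to forward upstream: silence while TTS is playing."""
    if tts_playing:
        return bytes(len(frame))  # all-zero PCM16 is digital silence
    return frame


def rms(frame: bytes) -> float:
    """Root-mean-square amplitude of a PCM16 frame (0.0 for silence)."""
    n = len(frame) // SAMPLE_WIDTH
    if n == 0:
        return 0.0
    samples = struct.unpack(f"<{n}h", frame[: n * SAMPLE_WIDTH])
    return sqrt(sum(s * s for s in samples) / n)
```

For a caller frame `struct.pack("<4h", 1000, -1000, 1000, -1000)` sent during TTS, `rms` of the original frame is `1000.0`, but `rms` of what the provider receives is `0.0`.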
Proposed Solution
Make VAD mode per-provider instead of global. Three approaches:
Option A: Runtime per-call decision (minimal config change)
Keep local VAD always initialized. At runtime in `_maybe_provider_barge_in_fallback`, check the active provider's `ProviderCapabilities`:

```python
# Skip local VAD fallback only for providers that truly handle barge-in natively
caps = provider.get_capabilities()
if caps.has_native_barge_in and caps.has_native_vad:
    return  # Provider handles everything
# Otherwise run local VAD as usual
```

Option B: Per-context/per-provider YAML config
```yaml
contexts:
  demo_google_live:
    vad:
      use_provider_vad: false  # Override: Google needs local VAD
  demo_openai:
    vad:
      use_provider_vad: true  # OpenAI handles its own VAD
```

Option C: Automatic based on ProviderCapabilities
The engine checks `has_native_barge_in` at call start and automatically enables or disables local VAD for that call. No config is needed; capabilities drive the behavior.
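A minimal sketch of the per-call decision, assuming a `ProviderCapabilities` shape that carries only the two flags named in this issue (the real class may have more fields). It deliberately uses the same both-flags check as Option A rather than `has_native_barge_in` alone, so a provider that sets the barge-in flag without end-to-end VAD still keeps local VAD. `use_local_vad_for_call` is a hypothetical name, and the Google Live flag values below are assumed for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderCapabilities:
    # Only the two fields named in this issue; the real class may differ.
    has_native_vad: bool = False
    has_native_barge_in: bool = False


def use_local_vad_for_call(caps: ProviderCapabilities) -> bool:
    """Decide once at call start whether local VAD handles barge-in."""
    # Only a provider that owns both turn detection and barge-in may skip local VAD.
    provider_handles_everything = caps.has_native_vad and caps.has_native_barge_in
    return not provider_handles_everything


OPENAI_REALTIME = ProviderCapabilities(has_native_vad=True, has_native_barge_in=True)
GOOGLE_LIVE = ProviderCapabilities(has_native_vad=True)  # assumed flags
LOCAL_PIPELINE = ProviderCapabilities()
```

Here `use_local_vad_for_call(OPENAI_REALTIME)` is `False`, while Google Live and the local pipeline both return `True`. Because the decision is made per call, the Enhanced VAD and WebRTC VAD instances would need to be created per call (or kept initialized and simply skipped) rather than gated once at engine startup.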
Current State
- `use_provider_vad` is defined in `src/config.py:491` as a global `VADConfig` field
- Read once at engine startup (`src/engine.py:415-416`)
- Controls initialization of `self.vad_manager` (`EnhancedVADManager`) and `self.webrtc_vad`
- `ProviderCapabilities` (`has_native_vad`, `has_native_barge_in`) exist but are not wired into VAD init or runtime fallback logic
- Admin UI exposes it as a global toggle on the VAD settings page
Files Involved
- `src/engine.py`: VAD initialization + `_maybe_provider_barge_in_fallback`
- `src/config.py`: `VADConfig` model
- `admin_ui/frontend/src/pages/`: VAD settings UI
- `admin_ui/backend/main.py`: config API
Acceptance Criteria
- Local VAD (Enhanced + WebRTC) stays active for Google Live calls regardless of setting
- OpenAI Realtime can opt into provider-managed VAD without affecting other providers
- Barge-in criteria remain multi-signal (≥2 of 4) for providers using local VAD
- Admin UI reflects per-provider or automatic VAD mode
- No regression on existing barge-in behavior for any provider
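One way to check the first two criteria against a concrete rule is to combine Option B's per-context override with a capability guard. This is a sketch under stated assumptions: `effective_use_provider_vad` is a hypothetical helper, `ProviderCapabilities` is reduced to the two flags named above, and the Google Live flags are assumed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProviderCapabilities:
    has_native_vad: bool = False
    has_native_barge_in: bool = False


def effective_use_provider_vad(
    global_setting: bool,
    context_override: Optional[bool],
    caps: ProviderCapabilities,
) -> bool:
    """Resolve provider-managed VAD for one call (True = skip local VAD)."""
    # A per-context value (Option B) wins over the global vad.use_provider_vad.
    requested = global_setting if context_override is None else context_override
    # Guard: a provider without end-to-end VAD + barge-in always keeps local VAD,
    # so no setting can reduce its barge-in to energy-only detection.
    return requested and caps.has_native_vad and caps.has_native_barge_in


OPENAI_REALTIME = ProviderCapabilities(has_native_vad=True, has_native_barge_in=True)
GOOGLE_LIVE = ProviderCapabilities(has_native_vad=True)  # assumed flags
```

With the global toggle on, a Google Live context still resolves to `False` (local VAD stays active), while an OpenAI context can opt in through its own override even when the global setting is off.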