Per-Provider VAD Configuration (replace global use_provider_vad toggle) #255

@hkjarral

Problem

The use_provider_vad setting under vad: in ai-agent.yaml is a global toggle that disables local Enhanced VAD + WebRTC VAD for ALL providers when enabled. This creates a conflict:

  • OpenAI Realtime: Has true end-to-end VAD + barge-in (has_native_vad=True, has_native_barge_in=True). Local VAD is redundant and can interfere. use_provider_vad=True is ideal.
  • Google Live: Has server-side VAD for turn detection, but no AEC (acoustic echo cancellation). On telephony, gating sends silence during TTS, so Google's VAD can't detect user speech for barge-in. local_vad_fallback (Enhanced VAD + WebRTC) is the only working barge-in mechanism. use_provider_vad=True cripples it.
  • Deepgram: Text-based output, no echo risk. Local VAD fallback works well.
  • Local/Pipeline: Requires local VAD entirely.

Evidence from Testing (Feb 28, 2026)

Two test calls on 10.44.0.103 with Google Live + ExternalMedia:

| Setting | Call ID | Barge-in criteria | Confidence | Risk |
|---|---|---|---|---|
| use_provider_vad: true | 1772321249.201 | 1 of 1 (energy only) | 0.0 | High false-positive risk |
| use_provider_vad: false | 1772321376.205 | 4 of 4 (VAD+energy+confidence+WebRTC) | 0.838–1.0 | Low, robust |

With provider VAD ON, Enhanced VAD and WebRTC VAD are not initialized (self.vad_manager = None, self.webrtc_vad = None), so barge-in degrades to energy-only detection. That is a single point of failure that will false-trigger on background noise or speakerphone echo.
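One way to picture the difference between the two calls is as a vote over independent signals. The sketch below is illustrative only and is not the engine's code: `BargeInSignals`, `criteria_met`, and `should_barge_in` are hypothetical names, a detector that was never initialized is modeled as `None`, and the threshold of 2 is borrowed from the acceptance criteria in this issue. Note that under this assumed policy a lone energy spike would not trigger barge-in, unlike the energy-only behavior seen in the first test call.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class BargeInSignals:
    """Hypothetical per-frame signals; None means the detector is not initialized."""
    energy: bool
    enhanced_vad: Optional[bool] = None
    confidence: Optional[bool] = None
    webrtc_vad: Optional[bool] = None


def criteria_met(signals: BargeInSignals) -> Tuple[int, int]:
    """Return (criteria that fired, criteria that were available)."""
    values = [signals.energy, signals.enhanced_vad,
              signals.confidence, signals.webrtc_vad]
    available = [v for v in values if v is not None]
    return sum(available), len(available)


def should_barge_in(signals: BargeInSignals, min_criteria: int = 2) -> bool:
    """Require at least `min_criteria` signals to agree (assumed policy)."""
    fired, _ = criteria_met(signals)
    return fired >= min_criteria
```

With only the energy signal available the tally is 1 of 1, which matches the first row of the table; with all four detectors running and agreeing it is 4 of 4.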

Research Findings

  • Google Gemini Live API has no built-in AEC (confirmed via official docs, Perplexity, Exa research)
  • Google sends serverContent.interrupted = true as its native barge-in signal, but this only works when Google receives real audio (not silence from gating)
  • In telephony without client-side AEC, gating is necessary to prevent echo false triggers
  • Pipecat uses WebRTC AEC preprocessing; Android uses AcousticEchoCanceler — we have neither on Asterisk telephony
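To make the gating problem concrete, here is a minimal sketch of what gating does to the uplink while TTS is playing. `gate_uplink_frame` is a hypothetical helper, not a function from this codebase, and it assumes 16-bit linear PCM frames, where silence is all-zero bytes. The provider only ever sees the returned frame, so during TTS its server-side VAD has nothing to detect.

```python
def gate_uplink_frame(frame: bytes, tts_playing: bool) -> bytes:
    """Return the audio frame that would be forwarded to the provider.

    While TTS is playing, the caller's audio is replaced with silence of the
    same length so the agent's own echo never reaches the provider.
    Assumes 16-bit linear PCM, where silence is all-zero bytes.
    """
    if tts_playing:
        return bytes(len(frame))  # same duration, all zeros
    return frame
```

This is why a local detector that reads the caller's audio before the gate is the only component able to notice speech during playback.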

Proposed Solution

Make VAD mode per-provider instead of global. Three approaches:

Option A: Runtime per-call decision (minimal config change)

Keep local VAD always initialized. At runtime in _maybe_provider_barge_in_fallback, check the active provider's ProviderCapabilities:

# Skip local VAD fallback only for providers that truly handle barge-in natively
caps = provider.get_capabilities()
if caps.has_native_barge_in and caps.has_native_vad:
    return  # Provider handles everything
# Otherwise run local VAD as usual
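A self-contained version of that check, for experimenting outside the engine. The two flag names come from this issue; the `ProviderCapabilities` dataclass here is only a stand-in for the real class, and `needs_local_barge_in_fallback` is a hypothetical helper. One caveat: this only keeps local VAD running for Google Live if its capabilities report at least one of the two flags as False, which this issue does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderCapabilities:
    """Stand-in for the project's class; only the two flags named in this issue."""
    has_native_vad: bool = False
    has_native_barge_in: bool = False


def needs_local_barge_in_fallback(caps: ProviderCapabilities) -> bool:
    """True unless the provider handles both VAD and barge-in itself."""
    return not (caps.has_native_vad and caps.has_native_barge_in)
```

The fallback method could then return early when this helper returns False and run the local VAD path otherwise.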

Option B: Per-context/per-provider YAML config

contexts:
  demo_google_live:
    vad:
      use_provider_vad: false  # Override: Google needs local VAD
  demo_openai:
    vad:
      use_provider_vad: true   # OpenAI handles its own VAD
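For Option B, the lookup could be as simple as "context value wins, otherwise fall back to the global one". A rough sketch, operating on the already-parsed YAML as a plain dict: `resolve_use_provider_vad` is a hypothetical name, and the default of False when nothing is configured is an assumption, not the current default in src/config.py.

```python
from typing import Any, Mapping


def resolve_use_provider_vad(config: Mapping[str, Any], context_name: str) -> bool:
    """Context-level vad.use_provider_vad wins; otherwise use the global vad value."""
    # Global setting; False is an assumed default when the key is absent.
    global_vad = config.get("vad") or {}
    global_value = bool(global_vad.get("use_provider_vad", False))

    # Per-context override, tolerating missing or empty sections.
    contexts = config.get("contexts") or {}
    context = contexts.get(context_name) or {}
    context_vad = context.get("vad") or {}
    if "use_provider_vad" in context_vad:
        return bool(context_vad["use_provider_vad"])
    return global_value
```

Contexts without a `vad` block keep today's behavior, so existing configs would not need to change.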

Option C: Automatic based on ProviderCapabilities

Engine checks has_native_barge_in at call start and automatically enables/disables local VAD per call. No config needed — capabilities drive behavior.
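A sketch of what the per-call decision for Option C might look like, assuming the engine keeps a small registry keyed by call ID instead of a single startup-time flag. `VADMode`, `CallVADRegistry`, and the `ProviderCapabilities` stub are all hypothetical; only the `has_native_barge_in` check comes from the text above. Defaulting unknown calls to local VAD is an assumed safe fallback.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class VADMode(Enum):
    LOCAL = "local"        # Enhanced VAD + WebRTC VAD run in the engine
    PROVIDER = "provider"  # provider handles turn detection and barge-in


@dataclass(frozen=True)
class ProviderCapabilities:
    """Stand-in for the project's class; only the flags named in this issue."""
    has_native_vad: bool = False
    has_native_barge_in: bool = False


class CallVADRegistry:
    """Decides the VAD mode once per call and remembers it until the call ends."""

    def __init__(self) -> None:
        self._modes: Dict[str, VADMode] = {}

    def on_call_start(self, call_id: str, caps: ProviderCapabilities) -> VADMode:
        mode = VADMode.PROVIDER if caps.has_native_barge_in else VADMode.LOCAL
        self._modes[call_id] = mode
        return mode

    def uses_local_vad(self, call_id: str) -> bool:
        # Unknown calls fall back to local VAD (assumed safe default).
        return self._modes.get(call_id, VADMode.LOCAL) is VADMode.LOCAL

    def on_call_end(self, call_id: str) -> None:
        self._modes.pop(call_id, None)
```

This requires the local detectors to stay initialized for the engine's lifetime, since two concurrent calls on different providers can end up in different modes.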

Current State

  • use_provider_vad is defined in src/config.py:491 as a global VADConfig field
  • Read once at engine startup (src/engine.py:415-416)
  • Controls initialization of self.vad_manager (EnhancedVADManager) and self.webrtc_vad
  • ProviderCapabilities (has_native_vad, has_native_barge_in) exist but are not wired into VAD init or runtime fallback logic
  • Admin UI exposes it as a global toggle on the VAD settings page

Files Involved

  • src/engine.py — VAD initialization + _maybe_provider_barge_in_fallback
  • src/config.py — VADConfig model
  • admin_ui/frontend/src/pages/ — VAD settings UI
  • admin_ui/backend/main.py — config API

Acceptance Criteria

  • Local VAD (Enhanced + WebRTC) stays active for Google Live calls regardless of setting
  • OpenAI Realtime can opt into provider-managed VAD without affecting other providers
  • Barge-in criteria remain multi-signal (≥2 of 4) for providers using local VAD
  • Admin UI reflects per-provider or automatic VAD mode
  • No regression on existing barge-in behavior for any provider
