
Voice Agent TTS audio renders as static despite correct config (request_id: fb4f1cce-d56e-49fd-9fd1-941c9f5e916b) #443

@brnpoor

Description


I'm using the Deepgram Voice Agent API in my production application (ANARA, an AI personal assistant). The WebSocket connection is established, the Settings message is applied, and binary audio frames arrive as expected, but all output plays back as full-spectrum static instead of intelligible speech.

Environment

  • Browser: Chrome (latest stable) on Windows 11
  • SDK: @deepgram/sdk (latest)
  • Integration: WebSocket to wss://agent.deepgram.com/v1/agent/converse with token subprotocol auth
  • Voice Model: aura-2-helena-en
  • Audio Config: linear16, 16 kHz, container "none"

Steps to Reproduce

  1. Open browser developer console
  2. Connect WebSocket to Voice Agent endpoint using token subprotocol auth
  3. Send the Settings message using the configuration in the attached voice-agent-config.json
  4. Send microphone audio stream as Int16 PCM @ 16 kHz
  5. Receive binary frames and play them via the Web Audio API (AudioContext, AudioBuffer.copyToChannel, AudioBufferSourceNode)
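For reference, here is a minimal TypeScript sketch of the sample conversions that steps 4 and 5 rely on. It assumes that linear16 means signed 16-bit little-endian mono PCM. The names floatToInt16LE and Linear16Decoder are hypothetical and are not part of @deepgram/sdk. The decoder carries a trailing odd byte over to the next frame. Dropping or mis-pairing a single byte shifts every later sample by one byte, which is one way a stream of valid PCM can play back as broadband noise.

```typescript
// Minimal sketch, assuming linear16 = signed 16-bit little-endian mono PCM.
// floatToInt16LE and Linear16Decoder are hypothetical helpers, not @deepgram/sdk APIs.

/** Step 4: convert Float32 microphone samples in [-1, 1] to Int16 LE bytes. */
function floatToInt16LE(samples: Float32Array): Uint8Array {
  const out = new Uint8Array(samples.length * 2);
  const view = new DataView(out.buffer);
  for (let i = 0; i < samples.length; i++) {
    // Clamp, then scale so -1 maps to -32768 and +1 maps to 32767.
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(i * 2, Math.round(s < 0 ? s * 0x8000 : s * 0x7fff), true);
  }
  return out;
}

/** Step 5: turn binary WebSocket frames into Float32 samples for an AudioBuffer. */
class Linear16Decoder {
  // Holds at most one byte left over when the accumulated data has an odd length.
  private leftover = new Uint8Array(0);

  push(frame: ArrayBuffer | Uint8Array): Float32Array {
    const incoming = frame instanceof Uint8Array ? frame : new Uint8Array(frame);
    // Prepend the carried-over byte so samples stay aligned across frames.
    const bytes = new Uint8Array(this.leftover.length + incoming.length);
    bytes.set(this.leftover, 0);
    bytes.set(incoming, this.leftover.length);

    const sampleCount = Math.floor(bytes.length / 2);
    this.leftover = bytes.slice(sampleCount * 2);

    // DataView reads little-endian explicitly and has no 2-byte alignment
    // requirement, unlike new Int16Array(buffer, byteOffset).
    const view = new DataView(bytes.buffer, 0, sampleCount * 2);
    const out = new Float32Array(sampleCount);
    for (let i = 0; i < sampleCount; i++) {
      out[i] = view.getInt16(i * 2, true) / 0x8000;
    }
    return out;
  }
}
```

Each non-empty Float32Array returned by push() could then be copied into an AudioBuffer created with ctx.createBuffer(1, samples.length, 16000). You would call copyToChannel(samples, 0) and schedule the buffer on an AudioBufferSourceNode. Empty results should be skipped because createBuffer rejects a length of 0. The browser resamples the 16 kHz buffer to the 48 kHz context rate.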

Expected: Helena voice responds clearly
Actual: All playback is broadband static/noise

Key Evidence

  • Request ID: fb4f1cce-d56e-49fd-9fd1-941c9f5e916b
  • Control Test Result: Called the Deepgram /v1/speak endpoint with the same Helena voice; the WAV output is clear and intelligible (SHA256: D26DB80EED6C06E10D4D44BD15451926DBB7349382F000B70DA7510193BF731C)
  • Diagnostics Collected:
    • AudioContext sample rate (48 kHz, resampled from requested 16 kHz)
    • First 32 bytes of binary payload logged
    • Playback chain includes DC-block high-pass filter @ 20 Hz
    • No decoding errors or exceptions on the client side; every PCM frame decodes without throwing and contains no NaN/Infinity samples
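The diagnostics include logging the first 32 bytes of each binary payload. The helpers below sketch how those bytes might be examined further. hexHead reproduces the 32-byte dump. startsWithRiff flags a WAV header, which would be unexpected with container "none". guessLayout compares how smooth the samples look under three byte layouts. The heuristic assumes speech-like audio at 16 kHz, where consecutive samples usually differ far less than samples assembled from mismatched bytes. All function names here are hypothetical.

```typescript
// Hypothetical diagnostic helpers, not part of any Deepgram SDK.
// Heuristic assumption: correctly decoded speech PCM is locally smooth.

type Interpretation = "int16-le" | "int16-be" | "int16-le-offset1";

/** Hex dump of the first n bytes, e.g. for the 32-byte payload log. */
function hexHead(bytes: Uint8Array, n = 32): string {
  return Array.from(bytes.subarray(0, n), (b) => b.toString(16).padStart(2, "0")).join(" ");
}

/** True if the payload starts with a WAV "RIFF" header. */
function startsWithRiff(bytes: Uint8Array): boolean {
  return (
    bytes.length >= 4 &&
    String.fromCharCode(bytes[0], bytes[1], bytes[2], bytes[3]) === "RIFF"
  );
}

/** Mean absolute difference between consecutive Int16 samples; lower means smoother. */
function roughness(bytes: Uint8Array, offset: number, littleEndian: boolean): number {
  const count = Math.floor((bytes.length - offset) / 2);
  if (count < 2) return Number.POSITIVE_INFINITY;
  const view = new DataView(bytes.buffer, bytes.byteOffset + offset, count * 2);
  let prev = view.getInt16(0, littleEndian);
  let total = 0;
  for (let i = 1; i < count; i++) {
    const cur = view.getInt16(i * 2, littleEndian);
    total += Math.abs(cur - prev);
    prev = cur;
  }
  return total / (count - 1);
}

/** Pick the byte layout under which the samples look smoothest. */
function guessLayout(bytes: Uint8Array): Interpretation {
  const scores: Record<Interpretation, number> = {
    "int16-le": roughness(bytes, 0, true),
    "int16-be": roughness(bytes, 0, false),
    "int16-le-offset1": roughness(bytes, 1, true),
  };
  let best: Interpretation = "int16-le";
  for (const key of Object.keys(scores) as Interpretation[]) {
    if (scores[key] < scores[best]) best = key;
  }
  return best;
}
```

You could run guessLayout on several concatenated agent frames and on the PCM data of helena.wav after its header, assuming that file is 16-bit PCM. This would show whether both streams are being read with the same layout. If the agent stream returns a result other than "int16-le", the likely cause is byte order or alignment rather than the TTS model.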

Root Cause Analysis

The static is isolated to the Voice Agent streaming pipeline, not:

  • The Helena TTS model itself (proven by clean /v1/speak output)
  • Web Audio API playback (no errors, correct chain)
  • Browser configuration (16 kHz to 48 kHz resampling is handled)
  • Network or frame delivery (all frames arrive on schedule)

What I Need

  1. Internal trace for request fb4f1cce-d56e-49fd-9fd1-941c9f5e916b
  2. Confirmation that Voice Agent + Helena voice configuration is valid
  3. Guidance on next debugging steps or notification if this is a known issue

Urgency

This is blocking production launch. Any guidance would be greatly appreciated.

Attachments

The following files are attached to this issue:

  1. voice-agent-session-log.txt – timeline, environment, session details
  2. browser-audio-diagnostics.txt – AudioContext stats, console logs, playback observations
  3. voice-agent-config.json – exact Settings payload sent (credentials removed)
  4. tts-control-test.txt – proof that TTS endpoint outputs clean audio
  5. client-instrumentation-snippet.txt – code snippet showing how diagnostics were captured (no proprietary logic)
  6. helena.wav – reference clean TTS audio for comparison

browser-audio-diagnostics.txt

client-instrumentation-snippet.txt

GITHUB_ISSUE_READY_TO_POST.txt

tts-control-test.txt

voice-agent-config.json

voice-agent-session-log.txt
