[sarvam tts] output_audio_codec="wav" causes "Invalid WAV file: missing RIFF/WAVE" error #5267

@rs0125

Bug Description

Note: This occurred a few times and I have not been able to reproduce it consistently. Filing anyway as the root cause is clear from the traceback and source code inspection.

When using the Sarvam TTS plugin with output_audio_codec="wav", the Sarvam API returns raw PCM bytes instead of a proper WAV container file. The plugin sets mime_type to audio/wav, which causes LiveKit's internal decoder to expect a RIFF/WAVE header. Since none is present, decoding fails on the first audio chunk with "Invalid WAV file: missing RIFF/WAVE".
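To confirm the diagnosis, the first bytes of the payload can be checked for a container header. This is a minimal sketch using only the standard layout of a WAV file; `looks_like_wav` is a hypothetical helper, not part of the plugin or of LiveKit.

```python
def looks_like_wav(data: bytes) -> bool:
    """Return True if `data` starts with a RIFF/WAVE container header.

    A WAV file begins with the ASCII tag "RIFF", a 4-byte little-endian
    chunk size, and then the ASCII tag "WAVE". Raw PCM has none of these.
    """
    return len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE"
```

Logging the result of this check on the first chunk received from Sarvam should show whether the response is raw PCM or a real WAV file.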

Expected Behavior

The plugin should do one of the following: prepend a valid RIFF/WAVE header to the raw PCM bytes returned by Sarvam before pushing them to the emitter, remove wav from ALLOWED_OUTPUT_AUDIO_CODECS, or raise a clear error explaining that wav is unsupported in this context.

Reproduction Steps

from livekit.plugins import sarvam

OurTTS = sarvam.TTS(
    target_language_code="hi-IN",
    model="bulbul:v3",
    speaker="ritu",
    pace=1.0,
    temperature=0.8,
    output_audio_codec="wav"  # triggers the bug
)


Run any agent that synthesizes speech using the above TTS instance.

Operating System

Fedora Linux

Models Used

TTS: Sarvam bulbul:v3

Package Versions

livekit-agents: latest from GitHub
livekit-plugins-sarvam: latest from GitHub
Python 3.13

Session/Room/Call IDs

No response

Proposed Solution

Prepend a RIFF/WAVE header to the raw PCM data when output_audio_codec == "wav" before pushing to the emitter. This needs to be done in two places:

ChunkedStream._run (non-streaming path)
SynthesizeStream._handle_audio_message (WebSocket streaming path)
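The header step could look roughly like the sketch below. It assumes the PCM from Sarvam is 16-bit little-endian mono and that the sample rate matches the one requested; none of this is confirmed here. `wav_header` and `pcm_to_wav` are hypothetical helpers, not existing plugin functions.

```python
import struct


def wav_header(sample_rate: int, data_size: int,
               num_channels: int = 1, bits_per_sample: int = 16) -> bytes:
    """Build a canonical 44-byte RIFF/WAVE header for uncompressed PCM."""
    block_align = num_channels * bits_per_sample // 8
    byte_rate = sample_rate * block_align
    return struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", 36 + data_size, b"WAVE",   # RIFF chunk descriptor
        b"fmt ", 16, 1,                     # fmt chunk, format tag 1 = PCM
        num_channels, sample_rate, byte_rate, block_align, bits_per_sample,
        b"data", data_size,                 # data chunk header
    )


def pcm_to_wav(pcm: bytes, sample_rate: int) -> bytes:
    """Wrap raw PCM in a WAV container; pass through data that already has one."""
    if pcm[:4] == b"RIFF" and pcm[8:12] == b"WAVE":
        return pcm  # already a WAV file, do not add a second header
    return wav_header(sample_rate, len(pcm)) + pcm
```

In the non-streaming path the full payload length is known, so the header sizes are exact. In the WebSocket path the header would need to go on the first chunk only, and the total length is not known up front. How the decoder treats a placeholder size there has not been verified.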


Workaround: Use output_audio_codec="mp3". Sarvam returns a proper mp3 bitstream for this codec, and it works correctly end-to-end.

Additional Context

The issue is that wav is listed in ALLOWED_OUTPUT_AUDIO_CODECS in tts.py and accepted without error, giving users no indication it will fail at runtime. The Sarvam API itself returns raw PCM (no container) for the wav format. This is a Sarvam API behaviour that the plugin needs to account for.
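If a header fix is not wanted, a fail-fast check at construction time would at least surface the problem early. The sketch below is illustrative only: `EXAMPLE_ALLOWED_CODECS` stands in for the plugin's real ALLOWED_OUTPUT_AUDIO_CODECS (which may contain more entries than the two mentioned in this report), and `check_output_codec` is a hypothetical name.

```python
# Stand-in for the plugin's codec list; only the two codecs named in this
# report are included here.
EXAMPLE_ALLOWED_CODECS = {"mp3", "wav"}


def check_output_codec(codec: str) -> str:
    """Validate the codec up front instead of failing on the first audio chunk."""
    if codec not in EXAMPLE_ALLOWED_CODECS:
        raise ValueError(f"unsupported output_audio_codec: {codec!r}")
    if codec == "wav":
        # Sarvam returns headerless PCM for "wav", which the decoder rejects.
        raise ValueError(
            'output_audio_codec="wav" is not supported: the API returns raw '
            'PCM without a RIFF/WAVE header. Use "mp3" instead.'
        )
    return codec
```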
