[sarvam tts] output_audio_codec="wav" causes "Invalid WAV file: missing RIFF/WAVE" error #5267
Description
Bug Description
Note: This occurred a few times and I have not been able to reproduce it consistently. Filing anyway as the root cause is clear from the traceback and source code inspection.
When using the Sarvam TTS plugin with output_audio_codec="wav", the Sarvam API returns raw PCM bytes instead of a proper WAV container file. The plugin sets mime_type to audio/wav, which causes LiveKit's internal decoder to expect a RIFF/WAVE header. Since none is present, it crashes immediately on the first audio chunk.
Expected Behavior
Either the plugin should prepend a valid RIFF/WAVE header to the raw PCM bytes returned by Sarvam before pushing them to the emitter, or wav should be removed from ALLOWED_OUTPUT_AUDIO_CODECS, or selecting it should raise a clear error explaining that it is unsupported in this context.
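The "clear error" option could look roughly like the sketch below. It is only an illustration: the set contents and the names HEADERLESS_CODECS and validate_output_audio_codec are hypothetical and are not taken from the plugin source.

```python
# Hypothetical stand-in for the plugin's ALLOWED_OUTPUT_AUDIO_CODECS;
# the real contents in tts.py may differ.
ALLOWED_OUTPUT_AUDIO_CODECS = {"mp3", "wav"}

# Assumption based on this report: codecs the API accepts but for which
# it returns headerless data that the decoder cannot parse.
HEADERLESS_CODECS = {"wav"}


def validate_output_audio_codec(codec: str) -> str:
    """Fail at construction time instead of on the first audio chunk."""
    if codec not in ALLOWED_OUTPUT_AUDIO_CODECS:
        raise ValueError(f"unsupported output_audio_codec: {codec!r}")
    if codec in HEADERLESS_CODECS:
        raise ValueError(
            f"output_audio_codec={codec!r} is returned as raw PCM without a "
            "RIFF/WAVE header and cannot be decoded here; use 'mp3' instead"
        )
    return codec
```

Calling a check like this from the TTS constructor would surface the problem immediately rather than at synthesis time.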
Reproduction Steps
    from livekit.plugins import sarvam

    OurTTS = sarvam.TTS(
        target_language_code="hi-IN",
        model="bulbul:v3",
        speaker="ritu",
        pace=1.0,
        temperature=0.8,
        output_audio_codec="wav",  # triggers the bug
    )
Run any agent that synthesizes speech using the above TTS instance.
Operating System
Fedora Linux
Models Used
TTS: Sarvam bulbul:v3
Package Versions
livekit-agents: latest from GitHub
livekit-plugins-sarvam: latest from GitHub
Python 3.13
Session/Room/Call IDs
No response
Proposed Solution
Prepend a RIFF/WAVE header to the raw PCM data when output_audio_codec == "wav" before pushing to the emitter. This needs to be done in two places:
ChunkedStream._run (non-streaming path)
SynthesizeStream._handle_audio_message (WebSocket streaming path)
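A minimal sketch of the header step, using only the Python standard library. The helper names (has_riff_wave_header, pcm_to_wav) are hypothetical, and the sample rate, channel count, and 16-bit sample width are assumptions that would have to match what was actually requested from Sarvam.

```python
import io
import wave


def has_riff_wave_header(data: bytes) -> bool:
    """Return True if the bytes already start with a RIFF/WAVE container header."""
    return len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE"


def pcm_to_wav(
    pcm: bytes,
    sample_rate: int,
    num_channels: int = 1,
    sample_width: int = 2,  # bytes per sample; 2 means 16-bit PCM (assumed)
) -> bytes:
    """Wrap raw PCM in a WAV container; pass data through if it is already WAV."""
    if has_riff_wave_header(pcm):
        return pcm
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(num_channels)
        wf.setsampwidth(sample_width)
        wf.setframerate(sample_rate)
        wf.writeframes(pcm)
    return buf.getvalue()
```

This fits the non-streaming path most directly, where the full payload is available. In the streaming path the total length is not known up front and only the first chunk should carry a header, so pushing the chunks as raw PCM with a matching mime type may be the simpler option there.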
Workaround: Use output_audio_codec="mp3". Sarvam returns a proper mp3 bitstream for this codec and it works correctly end-to-end.
Additional Context
The issue is that wav is listed in ALLOWED_OUTPUT_AUDIO_CODECS in tts.py and accepted without error, giving users no indication it will fail at runtime. The Sarvam API itself returns raw PCM (no container) for the wav format. This is a Sarvam API behaviour that the plugin needs to account for.