fix(sarvam-tts): wrap raw PCM in RIFF/WAVE header when wav codec returns headerless bytes by IgnazioDS · Pull Request #5280 · livekit/agents

IgnazioDS · 2026-03-30T18:30:12Z

Summary

When output_audio_codec="wav" is set, the Sarvam API can return raw PCM bytes without a RIFF/WAVE header. The plugin still calls output_emitter.initialize(mime_type="audio/wav"), so downstream decoders expect a RIFF/WAVE header and fail with "Invalid WAV file: missing RIFF/WAVE".

Root cause

mime_type = f"audio/{self._opts.output_audio_codec}" correctly signals audio/wav, but base64.b64decode(b64) yields raw PCM bytes when Sarvam omits the container header.

Fix

Added a _pcm_to_wav() helper that prepends the standard RIFF/WAVE header (computed from sample_rate, num_channels, bit_depth=16) to raw PCM data.
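
For illustration, a minimal sketch of what such a helper could look like, assuming the canonical 44-byte PCM header layout. The name pcm_to_wav and its signature are hypothetical here and are not copied from the PR diff.

```python
import struct


def pcm_to_wav(pcm: bytes, sample_rate: int, num_channels: int = 1, bit_depth: int = 16) -> bytes:
    """Prepend a canonical 44-byte RIFF/WAVE header to raw little-endian PCM.

    Hypothetical stand-in for the PR's _pcm_to_wav(); the real helper may differ.
    """
    block_align = num_channels * bit_depth // 8  # bytes per sample frame
    byte_rate = sample_rate * block_align        # bytes per second of audio
    header = struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF",
        36 + len(pcm),  # RIFF chunk size: total size minus the 8-byte preamble
        b"WAVE",
        b"fmt ",
        16,             # size of the fmt chunk body for plain PCM
        1,              # audio format 1 = uncompressed PCM
        num_channels,
        sample_rate,
        byte_rate,
        block_align,
        bit_depth,
        b"data",
        len(pcm),       # size of the sample data that follows
    )
    return header + pcm
```

The result can be checked with the standard-library wave module, which refuses input that lacks a valid RIFF/WAVE header.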

REST path (Synthesize._run):

Collect all base64-decoded chunks
If bytes don't start with b"RIFF", wrap them with _pcm_to_wav()
Push the complete WAV to the emitter
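
The three REST steps above can be sketched roughly as follows. This is an assumption-laden illustration: assemble_wav is a hypothetical name, the stdlib wave module stands in for _pcm_to_wav(), and the input is assumed to be either one complete WAV container or entirely headerless 16-bit PCM.

```python
import base64
import io
import wave


def assemble_wav(b64_chunks: list[str], sample_rate: int, num_channels: int = 1) -> bytes:
    """Decode base64 chunks and return bytes that always start with a RIFF header."""
    audio = b"".join(base64.b64decode(chunk) for chunk in b64_chunks)
    if audio.startswith(b"RIFF"):
        # The API already sent a container; pass it through untouched.
        return audio
    out = io.BytesIO()
    with wave.open(out, "wb") as writer:
        writer.setnchannels(num_channels)
        writer.setsampwidth(2)  # 16-bit samples, matching bit_depth=16 in the PR text
        writer.setframerate(sample_rate)
        writer.writeframes(audio)
    return out.getvalue()
```

Because of the prefix check, feeding the function its own output leaves the bytes unchanged.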

Streaming WebSocket path (SynthesizeStream):

When output_audio_codec == "wav", buffer raw PCM chunks in _wav_buffer (instead of pushing individual chunks, which can't form valid standalone WAV frames)
On event_type == "final", assemble the buffer into a complete WAV and push before calling output_emitter.end_input()

The b"RIFF" check on the assembled bytes makes the fix safe for Sarvam API responses that do include a proper WAV header.
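
As a rough sketch of the streaming behavior described above, the buffering could be modeled like this. WavStreamBuffer, push, and wrap are hypothetical names for illustration; in the plugin the equivalents would be _wav_buffer, the output emitter, and _pcm_to_wav().

```python
from typing import Callable


class WavStreamBuffer:
    """Hypothetical model: hold PCM chunks and emit a single WAV on "final"."""

    def __init__(self, push: Callable[[bytes], None], wrap: Callable[[bytes], bytes]) -> None:
        self._push = push  # receives the finished audio (stand-in for the emitter)
        self._wrap = wrap  # adds a RIFF/WAVE header (stand-in for _pcm_to_wav)
        self._chunks: list[bytes] = []

    def on_audio(self, chunk: bytes) -> None:
        # Nothing is pushed yet: a single chunk is not a valid standalone WAV.
        self._chunks.append(chunk)

    def on_final(self) -> None:
        audio = b"".join(self._chunks)
        self._chunks.clear()
        if not audio:
            return  # nothing buffered, so nothing to emit
        if not audio.startswith(b"RIFF"):
            audio = self._wrap(audio)
        self._push(audio)
```

One trade-off of this design is latency: no audio reaches the consumer until the whole utterance has been received.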

Test plan

output_audio_codec="wav" with bulbul:v3 no longer raises "Invalid WAV file: missing RIFF/WAVE"
output_audio_codec="mp3" (default) is unaffected
Streaming path emits audio correctly after "final" event with wav codec
_pcm_to_wav() produces a parseable WAV file (standard RIFF header verified)

CLAassistant · 2026-03-30T18:30:19Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 609115a9e1

chatgpt-codex-connector · 2026-03-30T18:33:01Z

livekit-plugins/livekit-plugins-sarvam/livekit/plugins/sarvam/tts.py

+            if self._opts.output_audio_codec == "wav":
+                # Buffer raw PCM; the complete WAV container will be pushed once
+                # the "final" event is received (see _handle_event_message).
+                self._wav_buffer.append(audio_bytes)


Flush buffered WAV audio on non-final stream termination

For output_audio_codec == "wav", _handle_audio_message now only appends chunks to _wav_buffer, and the buffer is emitted only in _handle_event_message when event_type == "final". If send_completion_event=False (a supported option wired into the WebSocket URL) or the server closes the socket without sending a final event, _run still calls end_input() but never pushes buffered audio, so the stream can complete with missing/empty output. Please add a fallback flush path when the WS loop exits without a final event.
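
One possible shape for such a fallback, sketched outside the plugin: the message dictionaries, consume_stream, push, and wrap are all assumptions made for illustration and do not reflect the actual Sarvam WebSocket schema or the plugin's internals. The idea is an idempotent flush called both on "final" and in a finally block.

```python
import asyncio
from typing import AsyncIterator, Callable


async def consume_stream(
    messages: AsyncIterator[dict],
    push: Callable[[bytes], None],
    wrap: Callable[[bytes], bytes],
) -> None:
    """Buffer audio messages and flush exactly once, with or without "final"."""
    buffer: list[bytes] = []

    def flush() -> None:
        # Idempotent: a second call finds an empty buffer and does nothing.
        if not buffer:
            return
        audio = b"".join(buffer)
        buffer.clear()
        push(audio if audio.startswith(b"RIFF") else wrap(audio))

    try:
        async for msg in messages:
            if msg["type"] == "audio":
                buffer.append(msg["data"])
            elif msg["type"] == "final":
                flush()
    finally:
        # Fallback path: also runs when the stream ends without a "final" message.
        flush()
```

Flushing in finally also covers early exits caused by exceptions, which may or may not be desirable depending on how partial audio should be treated.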


devin-ai-integration

Devin Review found 1 potential issue.

View 3 additional findings in Devin Review.

devin-ai-integration · 2026-03-30T18:34:55Z

livekit-plugins/livekit-plugins-sarvam/livekit/plugins/sarvam/tts.py

+            if self._opts.output_audio_codec == "wav":
+                # Buffer raw PCM; the complete WAV container will be pushed once
+                # the "final" event is received (see _handle_event_message).
+                self._wav_buffer.append(audio_bytes)


🔴 Streaming wav audio silently dropped when WebSocket closes without "final" event

The new wav buffering logic in _handle_audio_message (line 1070-1073) appends audio chunks to self._wav_buffer instead of pushing them to the emitter. The buffer is only flushed in _handle_event_message when a "final" event is received (line 1122-1132). However, when send_completion_event=False is configured (a user-facing option at tts.py:395), the server never sends a "final" event — the WebSocket simply closes after all audio is streamed. The recv_task breaks out of its loop on the WS close at line 937, _run_ws completes normally, and the buffered audio is never flushed. The _run method's finally block (line 831-833) calls output_emitter.end_input() which closes the emitter's write channel, permanently discarding the buffered data.

Before this PR, audio chunks were pushed to the emitter immediately regardless of codec, so audio was always delivered. This is a regression that causes silent audio data loss.

Prompt for agents

In livekit-plugins/livekit-plugins-sarvam/livekit/plugins/sarvam/tts.py, the _wav_buffer is only flushed when a "final" event is received in _handle_event_message (lines 1122-1132). When send_completion_event=False, the server never sends this event, so the buffer is never flushed. To fix this, add a fallback flush of the wav buffer at the end of _run_ws (around lines 983-987, in the inner finally block after asyncio.gather completes). After the gracefully_cancel call and before setting tasks to None, add:

    if self._wav_buffer:
        all_pcm = b"".join(self._wav_buffer)
        self._wav_buffer.clear()
        if not all_pcm.startswith(b"RIFF"):
            all_pcm = _pcm_to_wav(all_pcm, self._opts.speech_sample_rate, 1)
        output_emitter.push(all_pcm)

This ensures that even when the "final" event is never received (e.g. send_completion_event=False, or unexpected WS close), any buffered wav audio is still emitted.


IgnazioDS wants to merge 1 commit into livekit:main from IgnazioDS:fix/sarvam-wav-raw-pcm
