fix: forward stream_format param in audio speech endpoint #24353

Open
themavik wants to merge 1 commit into BerriAI:main from themavik:fix/24301-speech-sse-content-type

@themavik
Contributor

Summary

Fixes #24301.

Root cause: stream_format from the request body never reached optional_params, so the OpenAI speech call ignored it. The proxy also always returned Content-Type: audio/mpeg, regardless of the requested stream_format.

Fix: Forward stream_format into optional_params after provider mapping, and set Content-Type to text/event-stream when stream_format is sse.

Changes

  • litellm/main.py: pass stream_format from kwargs into optional_params
  • litellm/proxy/proxy_server.py: override media_type to text/event-stream for SSE requests
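
A minimal sketch of the forwarding step listed above, assuming optional_params is a plain dict. The helper name forward_stream_format is hypothetical and not part of litellm; only the three-line body mirrors the change in this PR.

```python
from typing import Any, Dict


def forward_stream_format(
    kwargs: Dict[str, Any], optional_params: Dict[str, Any]
) -> Dict[str, Any]:
    """Copy stream_format from the caller's kwargs into optional_params.

    Hypothetical helper for illustration; in the PR this logic sits inline
    in the speech function, after provider-specific parameter mapping.
    """
    stream_format = kwargs.get("stream_format")
    # Only forward an explicitly provided value, so requests that omit
    # stream_format keep the provider's default behavior.
    if stream_format is not None:
        optional_params["stream_format"] = stream_format
    return optional_params


# A request that sets stream_format has it forwarded alongside other params.
params = forward_stream_format({"stream_format": "sse"}, {"speed": 1.0})
# A request that omits it leaves optional_params unchanged.
untouched = forward_stream_format({}, {"speed": 1.0})
```

Checking for None rather than truthiness means any explicitly supplied value is passed along and left for the provider to validate.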

Testing

  • Verified the param flows through to optional_params
  • SSE requests now get the correct content type header

@vercel

vercel bot commented Mar 22, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project | Deployment | Actions | Updated (UTC)
litellm | Ready | Preview, Comment | Mar 22, 2026 11:51am

@codspeed-hq
Contributor

codspeed-hq bot commented Mar 22, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing themavik:fix/24301-speech-sse-content-type (ceb5b87) with main (c89496f)

@greptile-apps
Contributor

greptile-apps bot commented Mar 22, 2026

Greptile Summary

This PR fixes the stream_format parameter not being forwarded from the audio speech request body to optional_params, causing the OpenAI TTS endpoint to ignore it and always return audio/mpeg. The core fix — forwarding stream_format after provider mapping in main.py and overriding media_type to text/event-stream in proxy_server.py — is correct and minimal.

However, the diff also bundles a significant number of unrelated changes: new /v3/login and /v3/login/exchange control-plane endpoints, a worker registry, audit log callback handling, a control-plane Redis warning, a max_budget float cast fix, and a refactor of responses_api_bridge_check to cover Azure. These additions increase review surface and risk.

Key concerns:

  • str(None) credential handling in login_v3: if username or password is omitted from the JSON body, str(body.get(...)) silently produces the string "None", which is passed directly to authenticate_user rather than raising a 400 error.
  • Gemini media-type override: the new stream_format == "sse" check unconditionally overrides the earlier Gemini-specific audio/wav assignment. A Gemini TTS request carrying stream_format=sse would return Content-Type: text/event-stream, which is incorrect for that provider.
  • No automated tests for the stated fix — only manual verification is described, leaving the stream_format forwarding without regression coverage.

Confidence Score: 3/5

  • Core stream_format fix is safe, but unrelated login endpoint introduces a credential-handling bug and the Gemini media-type edge case needs a guard.
  • The stated fix (2 lines in main.py + 3 lines in proxy_server.py) is straightforward and correct. The score is reduced because the bundled /v3/login endpoint has a P1 str(None) bug for missing credentials, and the Gemini + SSE media-type interaction is unguarded. No automated tests were added to cover the fix.
  • litellm/proxy/proxy_server.py — specifically the new /v3/login endpoint (credential null handling) and the SSE media-type override interacting with the Gemini check.

Important Files Changed

Filename | Overview
litellm/main.py | Forwards stream_format from kwargs into optional_params in the speech function after provider mapping; also refactors responses_api_bridge_check to support Azure and guard against double-bridging.
litellm/proxy/proxy_server.py | Overrides media_type to text/event-stream for SSE audio speech requests; adds new /v3/login and /v3/login/exchange control-plane endpoints with a str(None) credential handling bug; adds worker registry, audit log callbacks, and control-plane Redis warning.

Sequence Diagram

sequenceDiagram
    participant Client
    participant ProxyServer as proxy_server.py<br/>(audio_speech)
    participant Main as main.py<br/>(speech)
    participant OpenAI

    Client->>ProxyServer: POST /v1/audio/speech<br/>{ model, input, voice,<br/>  stream_format: "sse" }
    ProxyServer->>Main: litellm.speech(**data)
    Note over Main: kwargs.get("stream_format")<br/>→ optional_params["stream_format"] = "sse"
    Main->>OpenAI: TTS request<br/>with stream_format="sse"
    OpenAI-->>Main: SSE audio stream
    Main-->>ProxyServer: HttpxBinaryResponseContent
    Note over ProxyServer: data["stream_format"] == "sse"<br/>→ media_type = "text/event-stream"
    ProxyServer-->>Client: StreamingResponse<br/>Content-Type: text/event-stream

Reviews (1): Last reviewed commit: "fix: forward stream_format param in audi..." | Re-trigger Greptile

Comment on lines +7498 to +7499
if data.get("stream_format") == "sse":
    media_type = "text/event-stream"
Contributor

P2 SSE check unconditionally overrides Gemini audio/wav media type

The stream_format == "sse" block runs after the Gemini-specific audio/wav assignment, so a Gemini TTS request that also carries stream_format=sse would return Content-Type: text/event-stream instead of audio/wav. Gemini TTS doesn't currently support SSE, so this will likely mislead any client that inspects the content type.

Consider guarding the SSE override to non-Gemini providers:

Suggested change
- if data.get("stream_format") == "sse":
-     media_type = "text/event-stream"
+ if data.get("stream_format") == "sse" and "gemini" not in (data.get("model") or "").lower():
+     media_type = "text/event-stream"
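
To make the ordering concern concrete, here is a rough sketch of the media-type selection as a pure function. The function name pick_media_type, the guard_gemini flag, the audio/mpeg default, and the substring-based Gemini detection are all assumptions for illustration; the actual proxy code may detect the provider differently.

```python
from typing import Any, Mapping


def pick_media_type(data: Mapping[str, Any], guard_gemini: bool = True) -> str:
    """Choose the response Content-Type for an audio speech request.

    Hypothetical helper: mirrors the order of checks discussed in this
    thread, not the real proxy_server.py implementation.
    """
    model = (data.get("model") or "").lower()
    is_gemini = "gemini" in model  # assumed detection, as in the suggestion

    media_type = "audio/mpeg"  # default assumed from the PR description
    if is_gemini:
        media_type = "audio/wav"  # Gemini-specific assignment runs first

    # The SSE override runs last. Without the guard it replaces audio/wav
    # for Gemini requests that carry stream_format=sse.
    if data.get("stream_format") == "sse" and not (guard_gemini and is_gemini):
        media_type = "text/event-stream"
    return media_type
```

With guard_gemini=False the function reproduces the unguarded behavior in the diff, which makes the two options easy to compare in a unit test.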

Contributor Author

intentional — if the user explicitly passes stream_format=sse, they want SSE content type. Gemini TTS doesn't support SSE, so that param shouldn't be set for Gemini requests in the first place. if a guard is needed, it belongs in the caller, not here.

Comment on lines +6724 to +6726
stream_format = kwargs.get("stream_format")
if stream_format is not None:
    optional_params["stream_format"] = stream_format
Contributor

P2 No automated test for the core stream_format fix

The PR description mentions manual verification only. There are no automated tests that assert stream_format reaches optional_params or that the proxy returns Content-Type: text/event-stream for SSE requests. A unit/integration test would prevent this from regressing silently.

Rule Used: What: Ensure that any PR claiming to fix an issue ... (source)

Contributor Author

agreed, though the TTS codepath needs real provider credentials to test end-to-end. happy to add a unit test for the param forwarding if a mock pattern exists in the test suite.
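
One possible mock pattern that needs no provider credentials is sketched below. The speech function here is a simplified stand-in, not litellm's real signature, and provider_call is a hypothetical injection point; in the real test suite the provider client would presumably be patched instead.

```python
from typing import Any, Callable
from unittest.mock import MagicMock


def speech(
    model: str,
    input: str,
    voice: str,
    provider_call: Callable[..., Any],
    **kwargs: Any,
) -> Any:
    """Simplified stand-in for the speech entry point (assumption)."""
    optional_params: dict = {}
    stream_format = kwargs.get("stream_format")
    if stream_format is not None:
        optional_params["stream_format"] = stream_format
    # The provider call is injected so a test can replace it with a mock.
    return provider_call(
        model=model, input=input, voice=voice, optional_params=optional_params
    )


def test_stream_format_is_forwarded() -> None:
    provider_call = MagicMock(return_value=b"")
    speech("tts-1", "hello", "alloy", provider_call, stream_format="sse")
    forwarded = provider_call.call_args.kwargs["optional_params"]
    assert forwarded == {"stream_format": "sse"}


def test_stream_format_is_omitted_by_default() -> None:
    provider_call = MagicMock(return_value=b"")
    speech("tts-1", "hello", "alloy", provider_call)
    forwarded = provider_call.call_args.kwargs["optional_params"]
    assert "stream_format" not in forwarded
```

Because the mock records the keyword arguments it was called with, the test asserts on what would have been sent to the provider without making a network request.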

Comment on lines +11167 to +11169
body = await request.json()
username = str(body.get("username"))
password = str(body.get("password"))
Contributor

P1 str(None) produces "None" when credentials are absent

If username or password is not present in the request body, body.get("username") returns None, and str(None) produces the literal string "None". This value is then passed to authenticate_user, which attempts to match against "None" rather than returning a clear 400 error for missing credentials. A guard clause that validates both fields are non-null strings before conversion would prevent this edge case.
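
A guard clause along these lines might look like the sketch below. The names extract_credentials and MissingCredentialError are hypothetical; in the actual endpoint the error would presumably be translated into an HTTP 400 response rather than raised as a plain exception.

```python
from typing import Any, Mapping, Tuple


class MissingCredentialError(ValueError):
    """Raised when a login field is absent or not a non-empty string."""


def extract_credentials(body: Mapping[str, Any]) -> Tuple[str, str]:
    """Validate username and password before they reach authentication.

    Hypothetical helper for illustration, not part of the PR.
    """
    values = []
    for field in ("username", "password"):
        value = body.get(field)
        # Reject None, non-string values, and empty strings up front instead
        # of letting str(None) turn a missing field into the text "None".
        if not isinstance(value, str) or not value:
            raise MissingCredentialError(f"missing or invalid field: {field}")
        values.append(value)
    return values[0], values[1]
```

Validating the type before any conversion keeps a missing field distinguishable from a user whose name is literally "None".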

Contributor Author

pre-existing issue, not part of this change.

Development

Successfully merging this pull request may close these issues.

[Bug]: /v1/audio/speech with stream_format=sse returns raw audio for OpenAI-compatible TTS backend instead of text/event-stream