fix: forward stream_format param in audio speech endpoint#24353
fix: forward stream_format param in audio speech endpoint#24353themavik wants to merge 1 commit intoBerriAI:mainfrom
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
Greptile SummaryThis PR fixes the However, the diff also bundles a significant number of unrelated changes: new Key concerns:
Confidence Score: 3/5
|
| Filename | Overview |
|---|---|
| litellm/main.py | Forwards stream_format from kwargs into optional_params in the speech function after provider mapping; also refactors responses_api_bridge_check to support Azure and guard against double-bridging. |
| litellm/proxy/proxy_server.py | Overrides media_type to text/event-stream for SSE audio speech requests; adds new /v3/login and /v3/login/exchange control-plane endpoints with a str(None) credential handling bug; adds worker registry, audit log callbacks, and control-plane Redis warning. |
Sequence Diagram
sequenceDiagram
participant Client
participant ProxyServer as proxy_server.py<br/>(audio_speech)
participant Main as main.py<br/>(speech)
participant OpenAI
Client->>ProxyServer: POST /v1/audio/speech<br/>{ model, input, voice,<br/> stream_format: "sse" }
ProxyServer->>Main: litellm.speech(**data)
Note over Main: kwargs.get("stream_format")<br/>→ optional_params["stream_format"] = "sse"
Main->>OpenAI: TTS request<br/>with stream_format="sse"
OpenAI-->>Main: SSE audio stream
Main-->>ProxyServer: HttpxBinaryResponseContent
Note over ProxyServer: data["stream_format"] == "sse"<br/>→ media_type = "text/event-stream"
ProxyServer-->>Client: StreamingResponse<br/>Content-Type: text/event-stream
Reviews (1): Last reviewed commit: "fix: forward stream_format param in audi..." | Re-trigger Greptile
| if data.get("stream_format") == "sse": | ||
| media_type = "text/event-stream" |
There was a problem hiding this comment.
SSE check unconditionally overrides Gemini
audio/wav media type
The stream_format == "sse" block runs after the Gemini-specific audio/wav assignment, so a Gemini TTS request that also carries stream_format=sse would return Content-Type: text/event-stream instead of audio/wav. Gemini TTS doesn't currently support SSE, so this will likely mislead any client that inspects the content type.
Consider guarding the SSE override to non-Gemini providers:
| if data.get("stream_format") == "sse": | |
| media_type = "text/event-stream" | |
| if data.get("stream_format") == "sse" and "gemini" not in (data.get("model") or "").lower(): | |
| media_type = "text/event-stream" |
There was a problem hiding this comment.
intentional — if the user explicitly passes stream_format=sse, they want SSE content type. Gemini TTS doesn't support SSE, so that param shouldn't be set for Gemini requests in the first place. if a guard is needed, it belongs in the caller, not here.
| stream_format = kwargs.get("stream_format") | ||
| if stream_format is not None: | ||
| optional_params["stream_format"] = stream_format |
There was a problem hiding this comment.
No automated test for the core
stream_format fix
The PR description mentions manual verification only. There are no automated tests that assert stream_format reaches optional_params or that the proxy returns Content-Type: text/event-stream for SSE requests. A unit/integration test would prevent this from regressing silently.
Rule Used: What: Ensure that any PR claiming to fix an issue ... (source)
There was a problem hiding this comment.
agreed, though the TTS codepath needs real provider credentials to test end-to-end. happy to add a unit test for the param forwarding if a mock pattern exists in the test suite.
| body = await request.json() | ||
| username = str(body.get("username")) | ||
| password = str(body.get("password")) |
There was a problem hiding this comment.
str(None) produces "None" when credentials are absent
If username or password is not present in the request body, body.get("username") returns None, and str(None) produces the literal string "None". This value is then passed to authenticate_user, which attempts to match against "None" rather than returning a clear 400 error for missing credentials. A guard clause that validates both fields are non-null strings before conversion would prevent this edge case.
There was a problem hiding this comment.
pre-existing issue, not part of this change.
Summary
Fixes #24301.
Root cause:
stream_formatfrom the request body never reachedoptional_params, so the OpenAI speech call ignored it. The proxy always returnedaudio/mpegregardless.Fix: Forward
stream_formatintooptional_paramsafter provider mapping, and setContent-Typetotext/event-streamwhenstream_formatissse.Changes
litellm/main.py: passstream_formatfrom kwargs intooptional_paramslitellm/proxy/proxy_server.py: overridemedia_typetotext/event-streamfor SSE requestsTesting
optional_params