fix: ensure role='assistant' in Azure streaming with include_usage#24326

Open
majiayu000 wants to merge 4 commits into BerriAI:main from majiayu000:fix/issue-24221-azure-streaming-role-missing

Conversation

@majiayu000
Contributor

Fixes #24221

Relevant issues

Fixes #24221 — LiteLLM proxy doesn't include role for /chat/completions stream=true with Azure OpenAI and stream_options.include_usage=true

What this PR does

Root cause: In streaming_handler.py chunk_creator(), when original_chunk has no choices (Azure's prompt_filter_results chunk) and include_usage=True, the code returns model_response without calling strip_role_from_delta(). This means:

  1. The empty-choices chunk has no role in its delta
  2. __next__/__anext__ then sets sent_first_chunk=True
  3. When the actual first content chunk arrives with role='assistant' from Azure, strip_role_from_delta() sees sent_first_chunk=True and strips the role

Net result: no chunk ever has role='assistant'.

Fix: Call self.strip_role_from_delta(model_response) before returning model_response at line 1559. This is consistent with the other return paths in chunk_creator (lines 895 and 989) that already call strip_role_from_delta.
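The effect of the fix can be sketched as follows. This is an illustrative simplification, not the actual LiteLLM code (the dataclasses and StreamWrapper here are stand-ins): it shows how routing the empty-choices chunk through strip_role_from_delta() makes that chunk the carrier of role='assistant', so the real first content chunk correctly has its role stripped.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Delta:
    role: Optional[str] = None
    content: Optional[str] = None


@dataclass
class Choice:
    delta: Delta = field(default_factory=Delta)


@dataclass
class ModelResponse:
    choices: List[Choice] = field(default_factory=list)


class StreamWrapper:
    """Stand-in for CustomStreamWrapper's role-handling behavior."""

    def __init__(self) -> None:
        self.sent_first_chunk = False

    def strip_role_from_delta(self, model_response: ModelResponse) -> ModelResponse:
        # First emitted chunk keeps/gains role='assistant'; later chunks lose it.
        delta = model_response.choices[0].delta
        if not self.sent_first_chunk:
            delta.role = "assistant"
            self.sent_first_chunk = True
        else:
            delta.role = None
        return model_response


wrapper = StreamWrapper()

# Chunk 1: Azure prompt_filter_results chunk (empty choices, padded to one
# StreamingChoices entry). With the fix it now passes through
# strip_role_from_delta() and carries the role.
first = wrapper.strip_role_from_delta(ModelResponse(choices=[Choice()]))

# Chunk 2: Azure's real first content chunk arrives with role='assistant',
# but sent_first_chunk is already True, so the role is stripped.
second = wrapper.strip_role_from_delta(
    ModelResponse(choices=[Choice(Delta(role="assistant", content="Hello"))])
)

print(first.choices[0].delta.role)   # assistant
print(second.choices[0].delta.role)  # None
```

Before the fix, the first chunk bypassed strip_role_from_delta() entirely, so no emitted chunk ever carried the role.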

Pre-Submission checklist

  • I have added testing in the tests/test_litellm/ directory (adding at least 1 test is a hard requirement; see details)
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Type

🐛 Bug Fix

Changes

  • litellm/litellm_core_utils/streaming_handler.py: Changed return model_response to return self.strip_role_from_delta(model_response) in the include_usage empty-choices path
  • tests/test_litellm/litellm_core_utils/test_streaming_handler.py: Added test_azure_streaming_role_with_include_usage covering both sync and async iteration with mock Azure chunks
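The Azure chunk sequence such a regression test replays can be sketched as plain dicts. Field names follow the OpenAI streaming schema; the actual test file may construct these chunks differently, and the token counts below are made up for illustration.

```python
# Mock Azure streaming sequence with stream_options.include_usage=True:
# prompt_filter_results -> first content -> content -> finish -> usage.
azure_chunks = [
    # 1. prompt_filter_results chunk: empty choices, no usage
    {"id": "cmpl-1", "choices": [],
     "prompt_filter_results": [{"prompt_index": 0}]},
    # 2. first content chunk: carries role='assistant' from Azure
    {"id": "cmpl-1",
     "choices": [{"index": 0, "delta": {"role": "assistant", "content": ""}}]},
    # 3. actual content
    {"id": "cmpl-1",
     "choices": [{"index": 0, "delta": {"content": "Hello"}}]},
    # 4. finish chunk
    {"id": "cmpl-1",
     "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]},
    # 5. final usage-only chunk (only emitted when include_usage=True)
    {"id": "cmpl-1", "choices": [],
     "usage": {"prompt_tokens": 5, "completion_tokens": 1, "total_tokens": 6}},
]

print(len(azure_chunks))  # 5
```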

@vercel

vercel bot commented Mar 21, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project | Deployment | Actions | Updated (UTC)
litellm | Ready | Preview, Comment | Mar 22, 2026 5:21am


@codspeed-hq
Contributor

codspeed-hq bot commented Mar 21, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing majiayu000:fix/issue-24221-azure-streaming-role-missing (1162925) with main (f5194b5)

Open in CodSpeed

When Azure sends an initial chunk with no choices (prompt_filter_results)
and stream_options.include_usage=True, the chunk was returned without
calling strip_role_from_delta(). This caused sent_first_chunk to be set
True prematurely, so the actual first content chunk had its role stripped.

Call strip_role_from_delta() on the empty-choices return path, consistent
with the other return paths in chunk_creator().

Fixes BerriAI#24221

Signed-off-by: majiayu000 <1835304752@qq.com>
Reformat with Black 23.x to match CI's version requirement.
The previous formatting commit used a newer Black version.

Signed-off-by: majiayu000 <1835304752@qq.com>
@majiayu000 force-pushed the fix/issue-24221-azure-streaming-role-missing branch from a8582fd to 1162925 on March 22, 2026 05:20
@majiayu000 marked this pull request as ready for review on March 22, 2026 05:45
@greptile-apps
Contributor

greptile-apps bot commented Mar 22, 2026

Greptile Summary

This PR fixes a missing role='assistant' in Azure OpenAI streaming responses when stream_options.include_usage=True by calling self.strip_role_from_delta(model_response) on the early-return path for empty-choices chunks, making it consistent with all other return paths in chunk_creator.

Key changes:

  • streaming_handler.py: One-line fix, changing return model_response to return self.strip_role_from_delta(model_response) in the include_usage empty-choices branch (line 1559). This is consistent with the two other strip_role_from_delta call sites at lines 895 and 989.
  • test_streaming_handler.py: Adds test_azure_streaming_role_with_include_usage (sync + async parameterized) reproducing the exact Azure chunk sequence (prompt_filter_results → first content → finish → usage). All other test-file changes are cosmetic reformatting (Black line-length).
  • audit_logs.py: Purely cosmetic reformatting; zero logic changes.

Correctness notes: model_response_creator() always ensures choices has at least one StreamingChoices entry, so calling strip_role_from_delta on the previously empty-choices path is safe — no IndexError risk. The fix makes the prompt_filter_results chunk the carrier of role='assistant', while the subsequent first real content chunk correctly has its role stripped, which is OpenAI-spec-compliant behavior. The fix has no impact on the final usage-only chunk (comes after content, so sent_first_chunk=True at that point — strip_role_from_delta safely no-ops the pop).
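The "safe no-op pop" on the final usage-only chunk can be illustrated with a minimal sketch (assumed behavior, not the real implementation, using a dict-style delta for brevity):

```python
from typing import Dict, Optional


def strip_role(delta: Dict[str, Optional[str]], sent_first_chunk: bool) -> Dict[str, Optional[str]]:
    """Sketch: once the first chunk has been sent, drop any 'role' key."""
    if sent_first_chunk:
        # pop with a default is a no-op when 'role' is absent,
        # which is exactly the usage-only chunk's case (empty delta)
        delta.pop("role", None)
    return delta


usage_delta: Dict[str, Optional[str]] = {}  # final usage chunk: empty delta
result = strip_role(usage_delta, sent_first_chunk=True)
print(result)  # {}
```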

Confidence Score: 4/5

  • Safe to merge — minimal, well-targeted one-line fix with a new mock-only regression test; no network calls, no backwards-incompatible changes.
  • The core fix is a single-line change consistent with existing patterns in the same function. model_response_creator guarantees choices[0] is always present, so no IndexError risk. The only minor concern is that the new test assertion is weaker than ideal (checks "any chunk has role" rather than "first chunk has role, others don't"), but the fix itself is correct and the test does catch the original regression.
  • No files require special attention beyond the minor test-assertion suggestion on test_streaming_handler.py.

Important Files Changed

Filename | Overview
litellm/litellm_core_utils/streaming_handler.py | One-line fix calling strip_role_from_delta in the include_usage empty-choices path, consistent with other return paths; correctly resolves the Azure prompt_filter_results chunk causing role='assistant' to be skipped.
tests/test_litellm/litellm_core_utils/test_streaming_handler.py | Adds test_azure_streaming_role_with_include_usage covering both sync and async paths using pre-canned mock chunks; other changes are purely cosmetic reformatting; test assertion is slightly weaker than ideal (checks any chunk has role, not first chunk).
litellm/proxy/management_helpers/audit_logs.py | Purely cosmetic reformatting for line length (Black-style); no logic changes.

Sequence Diagram

sequenceDiagram
    participant Azure as Azure OpenAI
    participant CSW as CustomStreamWrapper
    participant Client as API Consumer

    Note over Azure,Client: stream_options.include_usage=True

    Azure->>CSW: Chunk 1: prompt_filter_results (choices=[])
    Note over CSW: chunk_creator: else branch (no choices)<br/>include_usage=True
    Note over CSW: ✅ FIXED: calls strip_role_from_delta()<br/>sent_first_chunk=False → sets role='assistant', sent_first_chunk=True
    CSW->>Client: {choices:[{delta:{role:'assistant'}}]}

    Azure->>CSW: Chunk 2: first content (role='assistant', content='')
    Note over CSW: strip_role_from_delta()<br/>sent_first_chunk=True → strips role
    CSW->>Client: {choices:[{delta:{content:''}}]}

    Azure->>CSW: Chunk 3: content ('Hello')
    CSW->>Client: {choices:[{delta:{content:'Hello'}}]}

    Azure->>CSW: Chunk 4: finish_reason='stop'
    CSW->>Client: {choices:[{finish_reason:'stop', delta:{}}]}

    Azure->>CSW: Chunk 5: usage chunk (choices=[])
    Note over CSW: include_usage=True, strip_role_from_delta()<br/>sent_first_chunk=True → safe no-op pop
    CSW->>Client: {choices:[…], usage:{…}}

Last reviewed commit: "style: fix Black for..."

Comment on lines +1980 to +1992
# At least one chunk must have role='assistant' in its delta
has_role = any(
    hasattr(c, "choices")
    and len(c.choices) > 0
    and hasattr(c.choices[0], "delta")
    and getattr(c.choices[0].delta, "role", None) == "assistant"
    for c in chunks
)
assert (
    has_role
), "No chunk contained role='assistant' in delta. " "Chunk deltas: " + str(
    [c.choices[0].delta if c.choices else "no choices" for c in chunks]
)

P2 Consider a stronger assertion for role placement

The current assertion only verifies that at least one chunk in the entire stream carries role='assistant'. After the fix, the role is attached to the empty prompt_filter_results chunk (the first chunk emitted), while the actual first content chunk has its role stripped by strip_role_from_delta. A stricter test would also confirm that the role appears on the correct chunk (the first yielded chunk) and is absent from later content chunks, preventing a future regression where both chunks could accidentally carry the role:

# Verify role appears in exactly the first emitted chunk
assert len(chunks) > 0, "No chunks were yielded"
first_chunk = chunks[0]
assert (
    len(first_chunk.choices) > 0
    and getattr(first_chunk.choices[0].delta, "role", None) == "assistant"
), f"Expected role='assistant' in the first chunk, got: {first_chunk.choices[0].delta if first_chunk.choices else 'no choices'}"

# Verify subsequent chunks do NOT repeat the role
for chunk in chunks[1:]:
    if chunk.choices:
        assert getattr(chunk.choices[0].delta, "role", None) != "assistant", \
            f"Unexpected role='assistant' in non-first chunk: {chunk.choices[0].delta}"



Development

Successfully merging this pull request may close these issues.

[Bug]: LiteLLM proxy doesn't include "role" for /chat/completions stream=true with Azure OpenAI and stream_options.include_usage=true