fix: litellm async calls with stream reports token usage and span status #2480
Conversation
CLA Assistant Lite bot: All contributors have signed the CLA ✍️ ✅

I have read the CLA Document and I hereby sign the CLA

@codefromthecrypt could I get some feedback for these changes 😄?
I don't have workflow approval, but I think the main step is to add a unit test, or another row in an existing unit test, for this case. The main thing is you need to verify the changes are needed, especially the status thing, so look at litellm's existing tests. There is some code in litellm for async, but basically the openai code is always in the best shape, as that's the most used. You can steal patterns from python/instrumentation/openinference-instrumentation-openai/tests/openinference/instrumentation/openai/test_instrumentor.py, and then you need to run the tests the same way they run in CI.
ps one tip is that to run the tests, enter the python directory and do |
Force-pushed from 8d4c769 to 78956e8
Thanks for the feedback @codefromthecrypt. I added tests 😄 and validated that they are passing using
@caroger would it be possible to get some feedback on these changes? Thanks in advance 😄
Bump @caroger. This is a pretty straightforward bug affecting streaming sync completions as well. Here's a monkeypatch for reference.

def _patch_litellm_streaming_token_counts() -> None:
    """
    Patch openinference-instrumentation-litellm to fix streaming token counts.

    The library has a bug where streaming handlers pass `litellm.Usage` directly to
    `_set_token_counts_from_usage`, but that function expects an object WITH a `.usage`
    attribute. This causes token counts to be missing for all streaming responses.

    Fix pending upstream: https://github.com/Arize-ai/openinference/pull/2480
    """
    try:
        from types import SimpleNamespace  # needed for the wrapper below

        import openinference.instrumentation.litellm as litellm_instr
        from litellm.types.utils import Usage

        original_set_token_counts = litellm_instr._set_token_counts_from_usage

        def patched_set_token_counts(span, result):
            # Streaming handlers pass Usage directly, but the function expects an object with .usage
            if isinstance(result, Usage):
                result = SimpleNamespace(usage=result)
            return original_set_token_counts(span, result)

        litellm_instr._set_token_counts_from_usage = patched_set_token_counts
    except Exception:
        pass
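A minimal sketch of applying the workaround, assuming LiteLLMInstrumentor is the instrumentor class exported by openinference-instrumentation-litellm and that the streaming finalizers look _set_token_counts_from_usage up through the module namespace at call time (so the patch takes effect whether it runs before or after instrument()):

from openinference.instrumentation.litellm import LiteLLMInstrumentor

_patch_litellm_streaming_token_counts()  # install the workaround defined above
LiteLLMInstrumentor().instrument()       # then instrument litellm as usual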
While testing Arize locally, I noticed that when using the acompletion function with stream=True and stream_options={"include_usage": True}, the reported traces did not include the token counts, and the status was undefined.

I checked the code and noticed that the token count issue was that we pass the usage_stats object to _set_token_counts_from_usage directly. When the function checks whether the object has the usage attribute, it returns false and exits, causing the span to not report token counts.

I modified the code so the object includes the usage attribute, and called the _set_span_status function. After those changes, the token count and the status are reported.
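For anyone who wants to reproduce the check locally, here is a rough verification sketch rather than the PR's test code. It assumes a model/API key is configured, that LiteLLMInstrumentor is the instrumentor exported by openinference-instrumentation-litellm, and that token counts land on OpenInference llm.token_count.* span attributes:

import asyncio

import litellm
from openinference.instrumentation.litellm import LiteLLMInstrumentor
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.sdk.trace.export.in_memory_span_exporter import InMemorySpanExporter

# Collect spans in memory so token counts and status can be inspected after the call.
exporter = InMemorySpanExporter()
tracer_provider = TracerProvider()
tracer_provider.add_span_processor(SimpleSpanProcessor(exporter))
LiteLLMInstrumentor().instrument(tracer_provider=tracer_provider)

async def main() -> None:
    response = await litellm.acompletion(
        model="gpt-4o-mini",  # any configured model works
        messages=[{"role": "user", "content": "Say hi"}],
        stream=True,
        stream_options={"include_usage": True},
    )
    async for _chunk in response:  # drain the stream so the span is finalized
        pass

    for span in exporter.get_finished_spans():
        print(span.name, span.status.status_code)
        print({k: v for k, v in span.attributes.items() if k.startswith("llm.token_count")})

asyncio.run(main())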
Note

Ensure acompletion/completion streaming reports token counts and sets span status, and add tests covering async streaming and usage reporting.

- Wraps usage with SimpleNamespace(usage=...) before calling _set_token_counts_from_usage in _finalize_sync_streaming_span and _finalize_streaming_span.
- Calls _set_span_status(span, aggregated_output) after async streaming completes.
- Adds tests for acompletion validating output value, status OK, and token counts when stream_options={"include_usage": True}.

Written by Cursor Bugbot for commit 78956e8. This will update automatically on new commits.
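To make the SimpleNamespace wrapping concrete, a tiny self-contained demonstration of why it matters follows; FakeUsage is a hypothetical stand-in for litellm's Usage type, not the real class:

from types import SimpleNamespace

class FakeUsage:
    # hypothetical stand-in for litellm.types.utils.Usage
    prompt_tokens = 10
    completion_tokens = 5
    total_tokens = 15

usage_stats = FakeUsage()

# The helper bails out when its argument has no .usage attribute, which is exactly
# what happened when the streaming handlers passed the usage object in directly.
print(hasattr(usage_stats, "usage"))                         # False -> token counts dropped
print(hasattr(SimpleNamespace(usage=usage_stats), "usage"))  # True  -> token counts recorded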