
Conversation

@GustavoCaso (Contributor) commented Nov 26, 2025

While testing Arize locally, I noticed that when using the acompletion function with the stream=True, stream_options={"include_usage": True} options, the reported traces did not include the token counts, and the span status was undefined.

I checked the code and found that the token-count issue came from passing the usage_stats object directly to _set_token_counts_from_usage. When that function checks whether the object has a usage attribute, the check fails and the function returns early, so the span never reports token counts.
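
For illustration, a minimal sketch of the failure mode, assuming the helper looks roughly like this (simplified, not the actual library code):

def _set_token_counts_from_usage(span, result):
    # Only reads token counts when the object it receives has a `usage` attribute.
    usage = getattr(result, "usage", None)
    if usage is None:
        return  # passing a litellm Usage object directly ends up here and exits early
    span.set_attribute("llm.token_count.prompt", usage.prompt_tokens)
    span.set_attribute("llm.token_count.completion", usage.completion_tokens)
    span.set_attribute("llm.token_count.total", usage.total_tokens)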

[Screenshot 2025-11-26 at 19 22 51]

I modified the code so that the object passed in exposes a usage attribute, and added a call to the _set_span_status function.

After those changes, both the token counts and the span status are reported:
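
Roughly, the fix wraps the aggregated usage so that the helper's attribute check passes, and then records the span status; a sketch of the idea (not the exact diff):

from types import SimpleNamespace

# In the streaming finalizers: wrap the aggregated usage so the helper's
# usage-attribute check passes, then record the span status.
_set_token_counts_from_usage(span, SimpleNamespace(usage=usage_stats))
_set_span_status(span, aggregated_output)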

[Screenshot 2025-11-26 at 19 25 23]

Note

Ensure acompletion/completion streaming reports token counts and sets span status, and add tests covering async streaming and usage reporting.

  • Instrumentation (LiteLLM):
    • Wrap streaming usage with SimpleNamespace(usage=...) before calling _set_token_counts_from_usage in _finalize_sync_streaming_span and _finalize_streaming_span.
    • Set span status via _set_span_status(span, aggregated_output) after async streaming completes.
  • Tests:
    • Add async streaming tests for acompletion validating output value, status OK, and token counts when stream_options={"include_usage": True}.
    • Cover context attributes in both standard and usage-included async streaming paths.

Written by Cursor Bugbot for commit 78956e8. This will update automatically on new commits.

@GustavoCaso requested a review from a team as a code owner November 26, 2025 18:39
dosubot bot added the size:XS (This PR changes 0-9 lines, ignoring generated files.) label Nov 26, 2025

github-actions bot commented Nov 26, 2025

CLA Assistant Lite bot: All contributors have signed the CLA ✍️ ✅

@GustavoCaso (Contributor Author)

I have read the CLA Document and I hereby sign the CLA

github-actions bot added a commit that referenced this pull request Nov 26, 2025
@GustavoCaso changed the title from "ensure litellm async calls with strem reports token usage and span status" to "fix: litellm async calls with strem reports token usage and span status" Nov 26, 2025
@GustavoCaso (Contributor Author)

@codefromthecrypt could I get some feedback on these changes 😄?

@codefromthecrypt (Contributor)

I don't have workflow approval, but I think the main step is to add a unit test, or another row in an existing unit test, for this case.

The main thing is that you need to verify the changes are needed, especially the status change. You can look at litellm's existing test_completion_sync_streaming, which needs to be ported for async.

There is some code in litellm for async, but the openai code is basically always in the best shape, since it's the most used. You can steal patterns from python/instrumentation/openinference-instrumentation-openai/tests/openinference/instrumentation/openai/test_instrumentor.py.

Then you need to run the tests the same way they run in CI.

@codefromthecrypt (Contributor)

PS: one tip is that to run the tests, enter the python directory and run uvx --with tox-uv tox run -e py313-ci-litellm; this is the main thing that will ensure CI is happy.

@GustavoCaso force-pushed the gustavocaso/litellm-stream-async-token-count-and-status branch from 8d4c769 to 78956e8 on December 7, 2025 10:28
dosubot bot added the size:L (This PR changes 100-499 lines, ignoring generated files.) label and removed the size:XS (This PR changes 0-9 lines, ignoring generated files.) label Dec 7, 2025

GustavoCaso commented Dec 7, 2025

Thanks for the feedback @codefromthecrypt.

I added tests 😄 and validated that they are passing using uvx --with tox-uv tox run -e py313-ci-litellm.
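
For readers following along, a rough sketch of the kind of async streaming test described here, assuming pytest with pytest-asyncio, an in-memory OTel span exporter, and litellm's mock_response to avoid network calls (whether the mock emits a usage chunk under include_usage is an assumption; the merged tests may differ):

import litellm
import pytest
from openinference.instrumentation.litellm import LiteLLMInstrumentor
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.sdk.trace.export.in_memory_span_exporter import InMemorySpanExporter


@pytest.mark.asyncio
async def test_acompletion_streaming_reports_usage_and_status() -> None:
    exporter = InMemorySpanExporter()
    tracer_provider = TracerProvider()
    tracer_provider.add_span_processor(SimpleSpanProcessor(exporter))
    LiteLLMInstrumentor().instrument(tracer_provider=tracer_provider)
    try:
        response = await litellm.acompletion(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": "Hello"}],
            mock_response="Hi there!",  # litellm's built-in mock, no API call
            stream=True,
            stream_options={"include_usage": True},
        )
        async for _ in response:  # drain the stream so the span is finalized
            pass
    finally:
        LiteLLMInstrumentor().uninstrument()

    (span,) = exporter.get_finished_spans()
    attributes = dict(span.attributes or {})
    assert span.status.is_ok
    assert attributes.get("llm.token_count.total", 0) > 0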

@GustavoCaso changed the title from "fix: litellm async calls with strem reports token usage and span status" to "fix: litellm async calls with stream reports token usage and span status" Dec 7, 2025
@GustavoCaso (Contributor Author)

@caroger would it be possible to get some feedback on these changes?

Thanks in advance 😄

@caroger self-requested a review December 10, 2025 18:18
@hewliyang

Bump @caroger. This is a pretty straightforward bug affecting streaming sync completions as well.

Here's a monkeypatch for reference.

def _patch_litellm_streaming_token_counts() -> None:
    """
    Patch openinference-instrumentation-litellm to fix streaming token counts.

    The library has a bug where streaming handlers pass `litellm.Usage` directly to
    `_set_token_counts_from_usage`, but that function expects an object WITH a `.usage`
    attribute. This causes token counts to be missing for all streaming responses.

    Fix pending upstream: https://github.com/Arize-ai/openinference/pull/2480
    """
    try:
        from types import SimpleNamespace

        import openinference.instrumentation.litellm as litellm_instr
        from litellm.types.utils import Usage

        original_set_token_counts = litellm_instr._set_token_counts_from_usage

        def patched_set_token_counts(span, result):
            # Streaming handlers pass Usage directly, but function expects object with .usage
            if isinstance(result, Usage):
                result = SimpleNamespace(usage=result)
            return original_set_token_counts(span, result)

        litellm_instr._set_token_counts_from_usage = patched_set_token_counts
    except Exception:
        pass
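
To use the workaround, apply the patch alongside the usual instrumentation setup; a usage sketch, assuming the standard LiteLLMInstrumentor entry point:

from openinference.instrumentation.litellm import LiteLLMInstrumentor

_patch_litellm_streaming_token_counts()  # apply the workaround
LiteLLMInstrumentor().instrument()       # then instrument as usual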

dosubot bot added the lgtm (This PR has been approved by a maintainer) label Jan 8, 2026
@caroger merged commit fce93ef into Arize-ai:main Jan 8, 2026
13 of 15 checks passed
github-actions bot mentioned this pull request Jan 8, 2026