fix: litellm async calls with stream reports token usage and span status #2480
Conversation
CLA Assistant Lite bot: All contributors have signed the CLA ✍️ ✅

I have read the CLA Document and I hereby sign the CLA

@codefromthecrypt could I get some feedback for these changes 😄?
I don't have workflow approval, but I think the main step is to add a unit test, or another row in an existing unit test, for this case. The main thing is you need to verify the changes are needed, especially the status thing, so look at litellm's existing tests. There is some code in litellm for async, but basically the openai code is always in the best shape, as that's the most used. You can steal patterns from python/instrumentation/openinference-instrumentation-openai/tests/openinference/instrumentation/openai/test_instrumentor.py, and then you need to run the tests the same way they run in CI.
ps one tip is that to run the tests, enter the python directory and do |
Force-pushed from 8d4c769 to 78956e8
Thanks for the feedback @codefromthecrypt. I added tests 😄 and validated that they are passing using
@caroger would it be possible to get some feedback on these changes? Thanks in advance 😄
Bump @caroger. This is a pretty straightforward bug affecting streaming sync completions as well. Here's a monkeypatch for reference.

def _patch_litellm_streaming_token_counts() -> None:
    """
    Patch openinference-instrumentation-litellm to fix streaming token counts.

    The library has a bug where streaming handlers pass `litellm.Usage` directly to
    `_set_token_counts_from_usage`, but that function expects an object WITH a `.usage`
    attribute. This causes token counts to be missing for all streaming responses.

    Fix pending upstream: https://github.com/Arize-ai/openinference/pull/2480
    """
    try:
        from types import SimpleNamespace  # needed for the wrapper below

        import openinference.instrumentation.litellm as litellm_instr
        from litellm.types.utils import Usage

        original_set_token_counts = litellm_instr._set_token_counts_from_usage

        def patched_set_token_counts(span, result):
            # Streaming handlers pass Usage directly, but the function expects an object with .usage
            if isinstance(result, Usage):
                result = SimpleNamespace(usage=result)
            return original_set_token_counts(span, result)

        litellm_instr._set_token_counts_from_usage = patched_set_token_counts
    except Exception:
        pass
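A minimal sketch of applying the workaround, assuming LiteLLMInstrumentor is the instrumentor class exported by openinference-instrumentation-litellm and that the streaming finalizers look _set_token_counts_from_usage up through the module namespace at call time (so the patch takes effect whether it runs before or after instrument()):

from openinference.instrumentation.litellm import LiteLLMInstrumentor

_patch_litellm_streaming_token_counts()  # install the workaround defined above
LiteLLMInstrumentor().instrument()       # then instrument litellm as usual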
While testing Arize locally, I noticed that when using the acompletion function with stream=True and stream_options={"include_usage": True}, the reported traces did not include the token counts, and the status was undefined.

I checked the code and noticed that the token count issue was that we pass the usage_stats object to _set_token_counts_from_usage directly. When the function checks whether the object has the usage attribute, it returns false and exits, causing the span to not report token counts.

I modified the code so the object includes the usage attribute, and called the _set_span_status function. After those changes, the token count and the status are reported.
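For anyone who wants to reproduce the check locally, here is a rough verification sketch rather than the PR's test code. It assumes a model/API key is configured, that LiteLLMInstrumentor is the instrumentor exported by openinference-instrumentation-litellm, and that token counts land on OpenInference llm.token_count.* span attributes:

import asyncio

import litellm
from openinference.instrumentation.litellm import LiteLLMInstrumentor
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.sdk.trace.export.in_memory_span_exporter import InMemorySpanExporter

# Collect spans in memory so token counts and status can be inspected after the call.
exporter = InMemorySpanExporter()
tracer_provider = TracerProvider()
tracer_provider.add_span_processor(SimpleSpanProcessor(exporter))
LiteLLMInstrumentor().instrument(tracer_provider=tracer_provider)

async def main() -> None:
    response = await litellm.acompletion(
        model="gpt-4o-mini",  # any configured model works
        messages=[{"role": "user", "content": "Say hi"}],
        stream=True,
        stream_options={"include_usage": True},
    )
    async for _chunk in response:  # drain the stream so the span is finalized
        pass

    for span in exporter.get_finished_spans():
        print(span.name, span.status.status_code)
        print({k: v for k, v in span.attributes.items() if k.startswith("llm.token_count")})

asyncio.run(main())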
Note

Ensure acompletion/completion streaming reports token counts and sets span status, and add tests covering async streaming and usage reporting.

- Wraps usage with SimpleNamespace(usage=...) before calling _set_token_counts_from_usage in _finalize_sync_streaming_span and _finalize_streaming_span.
- Calls _set_span_status(span, aggregated_output) after async streaming completes.
- Adds tests for acompletion validating output value, status OK, and token counts when stream_options={"include_usage": True}.

Written by Cursor Bugbot for commit 78956e8. This will update automatically on new commits.
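To make the SimpleNamespace wrapping concrete, a tiny self-contained demonstration of why it matters follows; FakeUsage is a hypothetical stand-in for litellm's Usage type, not the real class:

from types import SimpleNamespace

class FakeUsage:
    # hypothetical stand-in for litellm.types.utils.Usage
    prompt_tokens = 10
    completion_tokens = 5
    total_tokens = 15

usage_stats = FakeUsage()

# The helper bails out when its argument has no .usage attribute, which is exactly
# what happened when the streaming handlers passed the usage object in directly.
print(hasattr(usage_stats, "usage"))                         # False -> token counts dropped
print(hasattr(SimpleNamespace(usage=usage_stats), "usage"))  # True  -> token counts recorded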