Replies: 2 comments 2 replies
callback is your best bet, but:
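For concreteness, a minimal sketch of the callback approach, following LiteLLM's documented custom-callback signature (the model name and prompt here are placeholders): LiteLLM attaches the cost it computes to the callback kwargs as `response_cost`, and for streamed calls the success callback fires once after the final chunk.

```python
import litellm

# Success callback: LiteLLM invokes this after a call completes and
# passes its own computed cost in kwargs["response_cost"].
def track_cost_callback(kwargs, completion_response, start_time, end_time):
    response_cost = kwargs.get("response_cost", 0)
    print(f"response_cost: {response_cost}")

litellm.success_callback = [track_cost_callback]

# Works for streaming too: the callback runs once the stream finishes.
response = litellm.completion(
    model="gpt-4o-mini",  # placeholder model
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
)
for chunk in response:
    pass  # consume the stream; the callback fires at the end
```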
Well OpenRouter has two ways to get accurate billing:
Happy to collaborate on this. We at Kodu have migrated our entire agent stack to be built on top of litellm, but now we are pretty much stuck: we can't accurately display token usage in our VS Code extension, nor can we accurately bill the user, because of a mismatch between the cost and token estimates (on OpenRouter, and on Gemini 2.5 when prompt caching is enabled).
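For context on pulling the actual billed amount from OpenRouter, here is a hedged sketch of one mechanism: OpenRouter's generation endpoint, which reports what a finished request actually cost. The endpoint path and the `total_cost` field follow my reading of OpenRouter's public API docs; verify the field names against the current reference.

```python
import os
import requests

def openrouter_actual_cost(generation_id: str) -> float:
    """Fetch the actual charged cost for a finished OpenRouter generation."""
    resp = requests.get(
        "https://openrouter.ai/api/v1/generation",
        params={"id": generation_id},
        headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
        timeout=30,
    )
    resp.raise_for_status()
    data = resp.json()["data"]
    # "total_cost" is, per the docs, the credits actually charged.
    return data["total_cost"]
```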
I searched for information in issues, in discussions, and in the documentation. It seems that streaming requests get no cost from completion_cost at all, yet if you fetch the cost through a callback, a cost does come back. I haven't figured out how callbacks work yet. Can you tell me where the feature that makes the callback work is implemented? I want to understand how response_cost is calculated there.
According to the code documentation, completion_cost only works with regular (non-streaming) requests; I haven't found what to do for streaming (even passing each chunk in separately doesn't work).
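On the streaming point specifically: completion_cost needs a complete response object with usage information, which is why pricing individual chunks fails. A minimal sketch, assuming litellm.stream_chunk_builder behaves as documented (it reassembles collected chunks into a full ModelResponse that completion_cost can then price); as I understand it, the cost itself is computed by LiteLLM's cost-calculator module and attached to callback kwargs as response_cost by the logging layer, but treat that location as my reading of the codebase, not a confirmed pointer.

```python
import litellm

messages = [{"role": "user", "content": "Hello"}]

# Collect every chunk of the stream.
chunks = []
for chunk in litellm.completion(
    model="gpt-4o-mini",  # placeholder model
    messages=messages,
    stream=True,
):
    chunks.append(chunk)

# Rebuild a complete, non-streaming-style response from the chunks,
# then price it; lone chunks lack the full usage data this needs.
full_response = litellm.stream_chunk_builder(chunks, messages=messages)
print(litellm.completion_cost(completion_response=full_response))
```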