Is there any way to get the token usage count after calling the OpenAI streaming API? #1769
Replies: 1 comment
-
You're correct: the OpenAI streaming API does return token usage, but only in the final chunk of the stream, which is why you don't see it until the very end. When you call the API with `stream_options={"include_usage": True}`, the last chunk carries a usage object like:

    {
      "usage": {
        "prompt_tokens": 123,
        "completion_tokens": 456,
        "total_tokens": 579
      }
    }

The issue is that when you're using instructor with streaming, instructor consumes the raw chunks internally and only yields the validated model objects, so that final usage chunk never surfaces in your code. That's why you couldn't find a way to access it in your current snippet. If you need token usage, you have two main paths: handle the raw OpenAI stream yourself, or estimate the counts client-side (e.g. with `tiktoken`).

At the moment, there's no "magic" way to do this through instructor's streaming interface. So if token counts are critical for you (e.g. for logging, billing, or token-budgeting), the safest approach is to handle the raw stream yourself and then validate the output with instructor afterwards.
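A minimal sketch of that raw-stream approach, assuming the request was made with `stream=True` and `stream_options={"include_usage": True}`: the helper below accumulates the streamed content deltas and captures the usage object from the final chunk. The chunks here are stand-in objects with hypothetical data, not a live API call; in real use they would come from `AsyncOpenAI().chat.completions.create(...)`.

```python
from types import SimpleNamespace

def consume_stream(chunks):
    """Accumulate streamed text and capture the usage object from the
    final chunk (present only when the request was made with
    stream_options={"include_usage": True})."""
    text_parts = []
    usage = None
    for chunk in chunks:
        # Content deltas arrive via chunk.choices; the usage-only
        # final chunk has an empty choices list.
        if chunk.choices and chunk.choices[0].delta.content:
            text_parts.append(chunk.choices[0].delta.content)
        if chunk.usage is not None:
            usage = chunk.usage
    return "".join(text_parts), usage

# Stand-in chunks mimicking the OpenAI streaming shape (hypothetical data):
def _chunk(content=None, usage=None):
    choices = [SimpleNamespace(delta=SimpleNamespace(content=content))] if content is not None else []
    return SimpleNamespace(choices=choices, usage=usage)

stream = [
    _chunk("Hello, "),
    _chunk("world!"),
    _chunk(usage=SimpleNamespace(prompt_tokens=123, completion_tokens=456, total_tokens=579)),
]
text, usage = consume_stream(stream)
```

Once the stream is fully consumed, `text` holds the complete response (which you can then validate with instructor) and `usage` holds the token counts.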
-
I couldn't find any way to get token usage after calling the streaming API. OpenAI returns the usage data in the last chunk, but I couldn't find any way to access it through instructor. Here's my code snippet.
Here, `client` is:
instructor.from_openai(AsyncOpenAI())