Description
System Info
Using Gemini Inference Provider
Information
- The official example scripts
- My own modified scripts
🐛 Describe the bug
Bug is described here: https://discuss.ai.google.dev/t/endpoint-https-generativelanguage-googleapis-com-v1beta-openai-chat-completions-is-not-compliant-with-api-specs/127400
Workarounds suggested:
- For Custom Implementations: Modify your stream-processing loop so it does not accumulate the usage field; instead, capture the usage data only from the very last chunk (or the chunk where finish_reason is not null).
- For Effect-TS Users: This specific bug was fixed in version 3.14.1 of @effect/ai-openai; update to the latest version to handle "arbitrary length StreamChunkParts."
- For OpenAI SDK Users: Some developers have found that setting stream_options: { include_usage: false } and using a separate token_count metadata call instead is safer until Google aligns its endpoint with the spec.
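A minimal sketch of the first workaround, assuming OpenAI-style streaming chunks. The chunk dicts and the `collect_stream` helper below are illustrative stand-ins, not part of any SDK: the point is that `usage` is overwritten with the latest value rather than summed, so only the final chunk's figures survive even when the Gemini endpoint reports usage on every chunk.

```python
def collect_stream(chunks):
    """Concatenate content deltas; keep usage from the last chunk only."""
    text_parts, usage = [], None
    for chunk in chunks:
        choices = chunk.get("choices") or []
        if choices and choices[0].get("delta", {}).get("content"):
            text_parts.append(choices[0]["delta"]["content"])
        if chunk.get("usage") is not None:
            # Overwrite, never accumulate: the endpoint repeats usage on
            # every chunk, so summing would overcount tokens.
            usage = chunk["usage"]
    return "".join(text_parts), usage


# Illustrative chunks mimicking the non-compliant stream:
chunks = [
    {"choices": [{"delta": {"content": "Hel"}, "finish_reason": None}],
     "usage": {"prompt_tokens": 5, "completion_tokens": 1, "total_tokens": 6}},
    {"choices": [{"delta": {"content": "lo"}, "finish_reason": None}],
     "usage": {"prompt_tokens": 5, "completion_tokens": 2, "total_tokens": 7}},
    {"choices": [{"delta": {}, "finish_reason": "stop"}],
     "usage": {"prompt_tokens": 5, "completion_tokens": 2, "total_tokens": 7}},
]

text, usage = collect_stream(chunks)
print(text)                   # Hello
print(usage["total_tokens"])  # 7, not 6 + 7 + 7
```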
Error logs
n/a
Expected behavior
Only return usage tokens in the last chunk. Don't overcount by accumulating usage from each chunk, and don't include usage in every chunk.