Gemini OpenAI API overcounts tokens in streaming mode #5122

@raghotham

Description

System Info

Using Gemini Inference Provider

Information

  • The official example scripts
  • My own modified scripts

🐛 Describe the bug

Bug is described here: https://discuss.ai.google.dev/t/endpoint-https-generativelanguage-googleapis-com-v1beta-openai-chat-completions-is-not-compliant-with-api-specs/127400

Workarounds suggested:

  • For Custom Implementations: Modify your stream-processing loop so that it does not accumulate the usage field across chunks. Instead, capture the usage data only from the very last chunk (or from the chunk whose finish_reason is not null).

  • For Effect-TS Users: This specific bug was fixed in version 3.14.1 of @effect/ai-openai. You should update to the latest version to handle "arbitrary length StreamChunkParts."

  • For OpenAI SDK Users: Some developers have found that setting stream_options: { include_usage: false } and instead using a separate token_count metadata call is safer until Google aligns its endpoint with the spec.
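The first workaround above can be sketched as follows. This is a minimal illustration, not the project's actual implementation: the `Chunk` and `Usage` dataclasses are hypothetical stand-ins for the OpenAI SDK's streaming chunk shape, and the simulated stream mimics the buggy behavior where every chunk carries cumulative usage.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

@dataclass
class Usage:
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int

@dataclass
class Chunk:
    content: str
    usage: Optional[Usage]  # the buggy endpoint populates this on every chunk

def consume_stream(chunks: Iterable[Chunk]) -> tuple[str, Optional[Usage]]:
    """Accumulate text, but take usage only from the last chunk that has it."""
    text = []
    final_usage: Optional[Usage] = None
    for chunk in chunks:
        text.append(chunk.content)
        if chunk.usage is not None:
            final_usage = chunk.usage  # overwrite, never add
    return "".join(text), final_usage

# Simulated stream where every chunk carries cumulative usage; naively
# summing the usage fields would report 4 + 9 + 15 = 28 total tokens.
stream = [
    Chunk("Hel", Usage(3, 1, 4)),
    Chunk("lo ", Usage(7, 2, 9)),
    Chunk("world", Usage(12, 3, 15)),
]
text, usage = consume_stream(stream)
print(text)                # Hello world
print(usage.total_tokens)  # 15
```

Because the final usage value simply overwrites earlier ones, the loop is correct both for endpoints that (per the spec) send usage only in the last chunk and for Gemini's current behavior of sending it in every chunk.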

Error logs

n/a

Expected behavior

Usage tokens should be returned only in the last chunk; the endpoint should not report usage in every chunk or overcount by accumulating it across chunks.

Metadata

Assignees: No one assigned
Labels: bug (Something isn't working)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
