Commit 236e349

Merge pull request #346 from Portkey-AI/chore/gemini-thinking
update docs for gemini thinking
2 parents ad42bb8 + 078af20 commit 236e349

File tree: 2 files changed (+48 additions, −5 deletions)

integrations/llms/gemini.mdx

Lines changed: 47 additions & 4 deletions
@@ -238,8 +238,19 @@ Grounding is invoked by passing the `google_search` tool (for newer models like
 If you mix regular tools with grounding tools, vertex might throw an error saying only one tool can be used at a time.
 </Warning>
 
-## thinking models
 
+## Extended Thinking (Reasoning Models) (Beta)
+
+<Note>
+The assistant's thinking response is returned in the `response_chunk.choices[0].delta.content_blocks` array, not the `response.choices[0].message.content` string.
+</Note>
+
+Models like `gemini-2.5-flash-preview-04-17` support [extended thinking](https://cloud.google.com/vertex-ai/generative-ai/docs/partner-models/use-claude#claude-3-7-sonnet).
+This is similar to OpenAI's reasoning models, but here you also get the model's reasoning as it processes the request.
+
+Note that you will have to set [`strict_open_ai_compliance=False`](/product/ai-gateway/strict-open-ai-compliance) in the headers to use this feature.
+
+### Single turn conversation
 <CodeGroup>
 ```py Python
 from portkey_ai import Portkey
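The hunk above says `strict_open_ai_compliance=False` must be set "in the headers". As a quick illustration, here is a hedged sketch of what that header might look like on a raw HTTP request; the header name is my assumption based on Portkey's `x-portkey-*` naming convention and should be verified against the linked strict_open_ai_compliance docs page.

```python
# Hedged sketch: headers for a raw HTTP call through the Portkey gateway.
# The strict-compliance header spelling below is an assumption following the
# x-portkey-* convention, not a value confirmed by this diff.
headers = {
    "x-portkey-api-key": "YOUR_PORTKEY_API_KEY",            # placeholder
    "x-portkey-strict-open-ai-compliance": "false",          # assumed spelling
}

print(headers["x-portkey-strict-open-ai-compliance"])
```

With the SDKs shown in this diff, the same flag is passed as the `strict_open_ai_compliance` client option instead of a hand-built header.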
@@ -273,6 +284,16 @@ If you mix regular tools with grounding tools, vertex might throw an error sayin
     ]
 )
 print(response)
+# in case of streaming responses you'd have to parse the response_chunk.choices[0].delta.content_blocks array
+# response = portkey.chat.completions.create(
+#     ...same config as above but with stream=True
+# )
+# for chunk in response:
+#     if chunk.choices[0].delta:
+#         content_blocks = chunk.choices[0].delta.get("content_blocks")
+#         if content_blocks is not None:
+#             for content_block in content_blocks:
+#                 print(content_block)
 ```
 ```ts NodeJS
 import Portkey from 'portkey-ai';
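The commented streaming logic in the Python hunk above can be sketched as a small standalone parser. This is a minimal sketch under an assumption: the fake chunks below are plain dicts standing in for the SDK's typed response objects, and `make_fake_stream` is a hypothetical stand-in for a real streamed response.

```python
# Minimal sketch of collecting content blocks from a streamed response.
# The chunk dicts below are assumed shapes mirroring the
# response_chunk.choices[0].delta.content_blocks structure described in the docs.

def make_fake_stream():
    """Hypothetical stand-in for a streamed chat.completions response."""
    yield {"choices": [{"delta": {"content_blocks": [
        {"type": "thinking", "thinking": "Let me reason about this..."}]}}]}
    yield {"choices": [{"delta": {"content_blocks": [
        {"type": "text", "text": "The answer is 42."}]}}]}
    yield {"choices": [{"delta": {}}]}  # a final chunk may carry no blocks

def collect_content_blocks(stream):
    """Gather every content block emitted across all chunks."""
    blocks = []
    for chunk in stream:
        delta = chunk["choices"][0]["delta"]
        content_blocks = delta.get("content_blocks")
        if content_blocks is not None:
            blocks.extend(content_blocks)
    return blocks

blocks = collect_content_blocks(make_fake_stream())
print([b["type"] for b in blocks])  # → ['thinking', 'text']
```

The same shape applies to the NodeJS snippets that follow; only the optional-chaining syntax differs.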
@@ -307,7 +328,18 @@ If you mix regular tools with grounding tools, vertex might throw an error sayin
     ]
   });
   console.log(response);
-
+  // in case of streaming responses you'd have to parse the response_chunk.choices[0].delta.content_blocks array
+  // const response = await portkey.chat.completions.create({
+  //   ...same config as above but with stream: true
+  // });
+  // for await (const chunk of response) {
+  //   if (chunk.choices[0].delta?.content_blocks) {
+  //     for (const contentBlock of chunk.choices[0].delta.content_blocks) {
+  //       console.log(contentBlock);
+  //     }
+  //   }
+  // }
+}
 // Call the function
 getChatCompletionFunctions();
 ```
@@ -349,6 +381,17 @@ If you mix regular tools with grounding tools, vertex might throw an error sayin
   });
 
   console.log(response)
+  // in case of streaming responses you'd have to parse the response_chunk.choices[0].delta.content_blocks array
+  // const response = await openai.chat.completions.create({
+  //   ...same config as above but with stream: true
+  // });
+  // for await (const chunk of response) {
+  //   if (chunk.choices[0].delta?.content_blocks) {
+  //     for (const contentBlock of chunk.choices[0].delta.content_blocks) {
+  //       console.log(contentBlock);
+  //     }
+  //   }
+  // }
 }
 await getChatCompletionFunctions();
 ```
@@ -421,10 +464,10 @@ If you mix regular tools with grounding tools, vertex might throw an error sayin
 </CodeGroup>
 
 <Note>
-To disable thinking for gemini models like `google.gemini-2.5-flash-preview-04-17`, you are required to explicitly set `budget_tokens` to `0` and `type` to `disabled`.
+To disable thinking for gemini models like `gemini-2.5-flash-preview-04-17`, you are required to explicitly set `budget_tokens` to `0`.
 ```json
 "thinking": {
-    "type": "disabled",
+    "type": "enabled",
     "budget_tokens": 0
 }
 ```
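The corrected note in the hunk above is a little counter-intuitive: thinking is disabled by keeping `"type": "enabled"` and zeroing the budget. A hedged sketch of a request payload carrying that fragment — the `model` and `messages` values here are illustrative placeholders; only the `thinking` object mirrors the JSON in the diff.

```python
# Sketch of a request body that disables thinking for a Gemini model.
# Per the updated docs, "type" stays "enabled"; budget_tokens: 0 does the
# disabling. Model name and messages are placeholders for illustration.
payload = {
    "model": "gemini-2.5-flash-preview-04-17",
    "messages": [{"role": "user", "content": "Hello"}],
    "thinking": {
        "type": "enabled",   # not "disabled" -- the zero budget disables it
        "budget_tokens": 0,
    },
}

print(payload["thinking"])
```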

integrations/llms/vertex-ai.mdx

Lines changed: 1 addition & 1 deletion
@@ -265,7 +265,7 @@ curl --location 'https://api.portkey.ai/v1/chat/completions' \
 <Note>
 The assistant's thinking response is returned in the `response_chunk.choices[0].delta.content_blocks` array, not the `response.choices[0].message.content` string.
 
-Gemini models do no return their chain-of-thought-messages, so content_blocks are not required for Gemini models.
+Gemini models do not support plugging the reasoning back into multi-turn conversations, so you don't need to send the thinking message back to the model.
 </Note>
 
 Models like `google.gemini-2.5-flash-preview-04-17` and `anthropic.claude-3-7-sonnet@20250219` support [extended thinking](https://cloud.google.com/vertex-ai/generative-ai/docs/partner-models/use-claude#claude-3-7-sonnet).
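Since, per the note above, Gemini does not accept its reasoning back in later turns, the assistant message you append for the next turn only needs the text blocks. A minimal sketch, assuming content blocks shaped like the dicts below (the real SDK types may differ):

```python
# Sketch: build the assistant message for the next turn from content blocks,
# dropping "thinking" blocks since Gemini does not accept them back.
# The block dicts are assumed shapes, not verbatim SDK types.

def assistant_message_for_next_turn(content_blocks):
    """Keep only text blocks when echoing the assistant turn back."""
    text = "".join(
        b.get("text", "") for b in content_blocks if b.get("type") == "text"
    )
    return {"role": "assistant", "content": text}

blocks = [
    {"type": "thinking", "thinking": "step 1... step 2..."},
    {"type": "text", "text": "Final answer."},
]
print(assistant_message_for_next_turn(blocks))
# → {'role': 'assistant', 'content': 'Final answer.'}
```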
