
Commit f9d4d6d

Better defaults for openai/anthropic/google/bedrock providers

Enable thinking and interleaved thinking by default with a medium/dynamic thinking budget. For the bedrock provider, the defaults only apply to Anthropic models.

Signed-off-by: Christopher Petito <chrisjpetito@gmail.com>

1 parent 89c6d98 commit f9d4d6d

File tree

3 files changed: +332 −31 lines

docs/USAGE.md

Lines changed: 87 additions & 13 deletions
@@ -283,9 +283,11 @@ models:
 
 Determine how much the model should think by setting the `thinking_budget`
 
-- **OpenAI**: use effort levels — `minimal`, `low`, `medium`, `high`
-- **Anthropic**: set an integer token budget. Range is 1024–32768; must be strictly less than `max_tokens`.
-- **Google (Gemini)**: set an integer token budget. `0` -> disable thinking, `-1` -> dynamic thinking (model decides). Most models: 0–24576 tokens. Gemini 2.5 Pro: 128–32768 tokens (and cannot disable thinking).
+- **OpenAI**: use effort levels — `minimal`, `low`, `medium`, `high`. Default: `medium`
+- **Anthropic**: set an integer token budget. Range is 1024–32768; must be strictly less than `max_tokens`. Default: `8192` with `interleaved_thinking: true`
+- **Google (Gemini 2.5)**: set an integer token budget. `0` -> disable thinking, `-1` -> dynamic thinking (model decides). Default: `-1` (dynamic)
+- **Google (Gemini 3)**: use effort levels — `minimal` (Flash only), `low`, `medium`, `high`. Default: `high` for Pro, `medium` for Flash
+- **Amazon Bedrock (Claude models)**: set an integer token budget, same as Anthropic. Default: `8192` with `interleaved_thinking: true`
 
 Examples (OpenAI):
 

@@ -317,7 +319,7 @@ agents:
     instruction: you are a helpful assistant that doesn't think very much
 ```
 
-Examples (Google):
+Examples (Google Gemini 2.5 - token-based):
 
 ```yaml
 models:
@@ -329,7 +331,7 @@ models:
   gemini-dynamic:
     provider: google
     model: gemini-2.5-flash
-    thinking_budget: -1 # Dynamic thinking (model decides)
+    thinking_budget: -1 # Dynamic thinking (model decides) - this is the default
 
   gemini-fixed:
     provider: google
@@ -342,29 +344,101 @@ agents:
     instruction: you are a helpful assistant
 ```
 
-#### Interleaved Thinking (Anthropic)
+Examples (Google Gemini 3 - level-based):
 
-Anthropic's interleaved thinking feature uses the Beta Messages API to provide tool calling during model reasoning. You can control this behavior using the `interleaved_thinking` provider option:
+```yaml
+models:
+  # Gemini 3 Pro: supports "low" and "high" levels
+  gemini-3-pro-high:
+    provider: google
+    model: gemini-3-pro
+    thinking_budget: high # Default for Pro models
+
+  gemini-3-pro-low:
+    provider: google
+    model: gemini-3-pro
+    thinking_budget: low
+
+  # Gemini 3 Flash: supports "minimal", "low", "medium", "high" levels
+  gemini-3-flash-medium:
+    provider: google
+    model: gemini-3-flash
+    thinking_budget: medium # Default for Flash models
+
+  gemini-3-flash-minimal:
+    provider: google
+    model: gemini-3-flash
+    thinking_budget: minimal
+
+agents:
+  root:
+    model: gemini-3-pro-high
+    instruction: you are a helpful assistant
+```
+
+Examples (Amazon Bedrock Claude):
+
+```yaml
+models:
+  bedrock-claude:
+    provider: amazon-bedrock
+    model: global.anthropic.claude-sonnet-4-5-20250929-v1:0
+    # thinking_budget defaults to 8192 and interleaved_thinking defaults to true for Claude models
+    provider_opts:
+      region: us-east-1
+
+  bedrock-claude-custom:
+    provider: amazon-bedrock
+    model: anthropic.claude-sonnet-4-20250514-v1:0
+    thinking_budget: 16384 # Override default
+    provider_opts:
+      region: eu-west-1
+      interleaved_thinking: true
+
+agents:
+  root:
+    model: bedrock-claude
+    instruction: you are a helpful assistant
+```
+
+#### Interleaved Thinking (Anthropic and Bedrock Claude)
+
+Anthropic's interleaved thinking feature uses the Beta Messages API to provide tool calling during model reasoning. This is now enabled by default for both `anthropic` and `amazon-bedrock` (Claude models) providers. You can control this behavior using the `interleaved_thinking` provider option:
 
 ```yaml
 models:
   claude:
     provider: anthropic
     model: claude-sonnet-4-5-20250929
-    thinking_budget: 8192 # Optional: defaults to 16384 when interleaved thinking is enabled
+    # thinking_budget defaults to 8192
+    # interleaved_thinking defaults to true
     provider_opts:
-      interleaved_thinking: true # Enable interleaved thinking (default: false)
+      interleaved_thinking: false # Disable if needed
+
+  bedrock-claude:
+    provider: amazon-bedrock
+    model: global.anthropic.claude-sonnet-4-5-20250929-v1:0
+    # thinking_budget defaults to 8192 for Claude models
+    # interleaved_thinking defaults to true for Claude models
+    provider_opts:
+      region: us-east-1
+      interleaved_thinking: false # Disable if needed
 ```
 
 Notes:
 
-- **OpenAI**: If an invalid effort value is set, the request will fail with a clear error
-- **Anthropic**: Values < 1024 or ≥ `max_tokens` are ignored (warning logged). When `interleaved_thinking` is enabled,
-  Docker `cagent` uses Anthropic's Beta Messages API with a default thinking budget of 16384 tokens if not specified
-- **Google**:
+- **OpenAI**: If an invalid effort value is set, the request will fail with a clear error. Default: `medium`
+- **Anthropic**: Values < 1024 or ≥ `max_tokens` are ignored (warning logged). Default: `thinking_budget: 8192` with `interleaved_thinking: true`
+- **Amazon Bedrock (Claude)**: Same behavior as Anthropic. Non-Claude Bedrock models are not affected by defaults
+- **Google (Gemini 2.5)**:
   - Most models support values between -1 and 24576 tokens. Set to `0` to disable, `-1` for dynamic thinking
   - Gemini 2.5 Pro: supports 128–32768 tokens. Cannot be disabled (minimum 128)
   - Gemini 2.5 Flash-Lite: supports 512–24576 tokens. Set to `0` to disable, `-1` for dynamic thinking
+  - Default: `-1` (dynamic thinking)
+- **Google (Gemini 3)**:
+  - Uses effort levels instead of token budgets: `minimal` (Flash only), `low`, `medium`, `high`
+  - Gemini 3 Pro default: `high`
+  - Gemini 3 Flash default: `medium`
 - For unsupported providers, `thinking_budget` has no effect
 - Debug logs include the applied effort (e.g., "OpenAI request using thinking_budget", "Gemini request using thinking_budget")
 
pkg/model/provider/gemini/client.go

Lines changed: 85 additions & 18 deletions
@@ -308,28 +308,20 @@ func (c *Client) buildConfig() *genai.GenerateContentConfig {
 		config.PresencePenalty = genai.Ptr(float32(*c.ModelConfig.PresencePenalty))
 	}
 
-	// Apply thinking budget for Gemini models using token-based configuration.
+	// Apply thinking configuration for Gemini models.
 	// Per official docs: https://ai.google.dev/gemini-api/docs/thinking
+	//
+	// Gemini 2.5 models use token-based configuration (thinkingBudget):
 	// - Set thinkingBudget to 0 to disable thinking
 	// - Set thinkingBudget to -1 for dynamic thinking (model decides)
-	// - Set to a specific value for a fixed token budget,
-	//   maximum is 24576 for all models except Gemini 2.5 Pro (max 32768)
+	// - Set to a specific value for a fixed token budget
+	//   (max 24576 for most models, 32768 for Gemini 2.5 Pro)
+	//
+	// Gemini 3 models use level-based configuration (thinkingLevel):
+	// - Gemini 3 Pro: "low", "high"
+	// - Gemini 3 Flash: "minimal", "low", "medium", "high"
 	if c.ModelConfig.ThinkingBudget != nil {
-		if config.ThinkingConfig == nil {
-			config.ThinkingConfig = &genai.ThinkingConfig{}
-		}
-		config.ThinkingConfig.IncludeThoughts = true
-		tokens := c.ModelConfig.ThinkingBudget.Tokens
-		config.ThinkingConfig.ThinkingBudget = genai.Ptr(int32(tokens))
-
-		switch tokens {
-		case 0:
-			slog.Debug("Gemini request with thinking disabled", "budget_tokens", tokens)
-		case -1:
-			slog.Debug("Gemini request with dynamic thinking", "budget_tokens", tokens)
-		default:
-			slog.Debug("Gemini request using thinking_budget", "budget_tokens", tokens)
-		}
+		c.applyThinkingConfig(config)
 	}
 
 	if structuredOutput := c.ModelOptions.StructuredOutput(); structuredOutput != nil {
@@ -340,6 +332,81 @@ func (c *Client) buildConfig() *genai.GenerateContentConfig {
 	return config
 }
 
+// applyThinkingConfig applies the appropriate thinking configuration based on model type.
+func (c *Client) applyThinkingConfig(config *genai.GenerateContentConfig) {
+	if config.ThinkingConfig == nil {
+		config.ThinkingConfig = &genai.ThinkingConfig{}
+	}
+	config.ThinkingConfig.IncludeThoughts = true
+
+	model := strings.ToLower(c.ModelConfig.Model)
+
+	// Gemini 3 models use ThinkingLevel (effort-based)
+	if strings.HasPrefix(model, "gemini-3-") {
+		c.applyGemini3ThinkingLevel(config)
+		return
+	}
+
+	// Gemini 2.5 and other models use ThinkingBudget (token-based)
+	c.applyGemini25ThinkingBudget(config)
+}
+
+// applyGemini3ThinkingLevel applies level-based thinking for Gemini 3 models.
+func (c *Client) applyGemini3ThinkingLevel(config *genai.GenerateContentConfig) {
+	effort := strings.ToLower(c.ModelConfig.ThinkingBudget.Effort)
+
+	var level genai.ThinkingLevel
+	switch effort {
+	case "minimal":
+		level = genai.ThinkingLevelMinimal
+	case "low":
+		level = genai.ThinkingLevelLow
+	case "medium":
+		level = genai.ThinkingLevelMedium
+	case "high":
+		level = genai.ThinkingLevelHigh
+	default:
+		// If effort is not set but tokens are, fall back to token-based config
+		if c.ModelConfig.ThinkingBudget.Tokens != 0 {
+			slog.Warn("Gemini 3 models use thinkingLevel, not thinkingBudget tokens; falling back to token-based config",
+				"model", c.ModelConfig.Model,
+				"tokens", c.ModelConfig.ThinkingBudget.Tokens,
+			)
+			c.applyGemini25ThinkingBudget(config)
+			return
+		}
+		// Default to high if no valid effort specified
+		level = genai.ThinkingLevelHigh
+		slog.Debug("Gemini 3 using default thinking level",
+			"model", c.ModelConfig.Model,
+			"level", "high",
+		)
+		config.ThinkingConfig.ThinkingLevel = level
+		return
+	}
+
+	config.ThinkingConfig.ThinkingLevel = level
+	slog.Debug("Gemini 3 request using thinkingLevel",
+		"model", c.ModelConfig.Model,
+		"level", effort,
+	)
+}
+
+// applyGemini25ThinkingBudget applies token-based thinking for Gemini 2.5 and other models.
+func (c *Client) applyGemini25ThinkingBudget(config *genai.GenerateContentConfig) {
+	tokens := c.ModelConfig.ThinkingBudget.Tokens
+	config.ThinkingConfig.ThinkingBudget = genai.Ptr(int32(tokens))
+
+	switch tokens {
+	case 0:
+		slog.Debug("Gemini request with thinking disabled", "budget_tokens", tokens)
+	case -1:
+		slog.Debug("Gemini request with dynamic thinking", "budget_tokens", tokens)
+	default:
+		slog.Debug("Gemini request using thinking_budget", "budget_tokens", tokens)
+	}
+}
+
 // convertToolsToGemini converts tools to Gemini format
 func convertToolsToGemini(requestTools []tools.Tool) ([]*genai.Tool, error) {
 	if len(requestTools) == 0 {