fix(claude): enable interleaved-thinking beta, decode AMP error gzip, fix budget 400 by Blue-B · Pull Request #1940 · router-for-me/CLIProxyAPI

Blue-B · 2026-03-07T14:39:41Z

Three small fixes found while running Claude models through the proxy:

1. interleaved-thinking beta header missing
applyClaudeHeaders never adds the interleaved-thinking-2025-05-14 beta flag, so thinking blocks silently disappear. Added a guard that appends it when absent.

2. AMP proxy swallows gzip-encoded error bodies
ModifyResponse skipped non-2xx responses, so 4xx/5xx bodies with hidden gzip encoding reached the client garbled. Removed the status-code guard.

3. normalizeClaudeBudget leaves impossible budget → 400
When adjustedBudget < minBudget, the old code returned the body unchanged, which the API rejects. Now it sets max_tokens = budgetTokens + 1 so the budget stays valid.

…oding, normalizeClaudeBudget max_tokens

1. Always include the interleaved-thinking-2025-05-14 beta header so that thinking blocks are returned correctly for all Claude models.
2. Remove the status-code guard in the AMP reverse proxy ModifyResponse so that error responses (4xx/5xx) with hidden gzip encoding are decoded properly; this prevents garbled error messages from reaching the client.
3. In normalizeClaudeBudget, when the adjusted budget falls below the model minimum, set max_tokens = budgetTokens+1 instead of leaving the request unchanged (which causes a 400 from the API).

gemini-code-assist · 2026-03-07T14:39:56Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses three distinct issues related to Claude model interactions and the AMP proxy. It ensures proper application of a critical Claude beta header, improves error handling for gzip-encoded responses in the AMP proxy, and corrects budget normalization logic for Claude requests to prevent API rejection. These changes collectively enhance the reliability and functionality of the Claude integration.

Highlights

- Claude Interleaved Thinking Beta: Ensured the 'interleaved-thinking-2025-05-14' beta header is always applied to Claude requests, preventing thinking blocks from silently disappearing.
- AMP Proxy Error Handling: Modified the AMP proxy to correctly decode gzip-encoded error responses (4xx/5xx), preventing garbled error messages from reaching the client.
- Claude Budget Normalization: Fixed an issue in 'normalizeClaudeBudget' where an adjusted budget below the minimum would result in an invalid request (400 error); now, 'max_tokens' is set to 'budgetTokens + 1' to ensure a valid budget.

Changelog

internal/api/modules/amp/proxy.go
- Removed the status code check from the 'ModifyResponse' function, allowing gzip decoding for all HTTP response statuses, including errors.
internal/runtime/executor/claude_executor.go
- Added logic to 'applyClaudeHeaders' to append the 'interleaved-thinking-2025-05-14' beta flag if it is not already present in the 'baseBetas' string.
internal/thinking/provider/claude/apply.go
- Modified 'normalizeClaudeBudget' to explicitly set 'max_tokens' to 'budgetTokens + 1' when the 'adjustedBudget' falls below 'minBudget', instead of returning the unchanged body.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request introduces three fixes for Claude models, enabling the interleaved-thinking beta flag, fixing gzip decoding for AMP proxy error responses, and resolving a budget normalization issue. While these changes address the described functional issues, the fix for gzip-encoded error bodies in the AMP proxy introduces a potential security risk. By removing the status code guard, memory-intensive decompression logic is now exposed to all response types without enforcing size limits, which could lead to Denial of Service (DoS) via memory exhaustion. It is recommended to implement size limits on all response body processing. Additionally, a suggestion is made in internal/runtime/executor/claude_executor.go to improve the robustness of beta flag handling for consistency.

I am having trouble creating individual review comments. Click here to see my feedback.

internal/api/modules/amp/proxy.go (111-114)

The removal of the status code check here allows the proxy to attempt decompression of gzip-encoded bodies for all responses, including error responses (4xx/5xx). While this fixes the issue of garbled error bodies reaching the client, the underlying decompression logic (specifically the io.ReadAll calls on lines 132 and 155) does not enforce any size limits. This exposes the proxy to a Denial of Service (DoS) attack via memory exhaustion if a malicious or compromised upstream sends an extremely large response or a 'gzip bomb'.

By removing this guard, the attack surface is increased to include error paths. To mitigate this risk, you should use io.LimitReader to enforce a maximum acceptable size for the response body before reading it into memory.

internal/runtime/executor/claude_executor.go (835-837)

While this correctly adds the interleaved-thinking beta flag, using strings.Contains for this check is fragile. It could lead to false positives if another beta flag contains this substring (e.g., a hypothetical disable-interleaved-thinking flag). A more robust approach is to parse the comma-separated string into a set of strings and check for the exact beta flag. This pattern is already used for extraBetas later in this function.

Adopting a set-based approach for all beta flag manipulations would improve correctness and maintainability. It would also resolve a minor inconsistency where the oauth check is performed on the original val while this check is on the potentially modified baseBetas.
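A minimal sketch of the exact-match check described above, assuming the beta flags travel as a single comma-separated string. ensureBeta is a hypothetical helper, not the function in claude_executor.go, and the flag names other than interleaved-thinking-2025-05-14 are made up for the example.

```go
package main

import (
	"fmt"
	"strings"
)

const interleavedThinkingBeta = "interleaved-thinking-2025-05-14"

// ensureBeta returns the comma-separated beta list with flag appended
// unless an exact match is already present. As a side effect it trims
// whitespace, drops empty entries, and removes duplicates.
func ensureBeta(betas, flag string) string {
	seen := make(map[string]struct{})
	var kept []string
	for _, part := range strings.Split(betas, ",") {
		name := strings.TrimSpace(part)
		if name == "" {
			continue
		}
		if _, dup := seen[name]; dup {
			continue
		}
		seen[name] = struct{}{}
		kept = append(kept, name)
	}
	if _, ok := seen[flag]; !ok {
		kept = append(kept, flag)
	}
	return strings.Join(kept, ",")
}

func main() {
	// A substring check would wrongly treat this hypothetical flag as a match.
	fmt.Println(ensureBeta("disable-interleaved-thinking-2025-05-14", interleavedThinkingBeta))
	// prints "disable-interleaved-thinking-2025-05-14,interleaved-thinking-2025-05-14"

	// Already present: the list is returned without a second copy.
	fmt.Println(ensureBeta("some-other-beta, interleaved-thinking-2025-05-14", interleavedThinkingBeta))
	// prints "some-other-beta,interleaved-thinking-2025-05-14"
}
```

Because the helper normalizes the whole list, running every beta-flag manipulation through it would also remove the question of whether a check sees the original or the already modified string.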

luispater

Thanks for the fixes. The Claude header change and the AMP gzip handling look good, but I think the budget normalization change still has a correctness gap.

Blocking:

In internal/thinking/provider/claude/apply.go, when adjustedBudget < minBudget the new branch sets max_tokens to budgetTokens + 1. That can exceed the model's completion limit on models where Thinking.Max is higher than MaxCompletionTokens (for example claude-sonnet-4-5-20250929 has Thinking.Max = 128000 but MaxCompletionTokens = 64000).
In that case we still return an invalid request, just with a different shape. I think this path needs to clamp against the model completion max or fall back to reducing/disabling thinking instead of unconditionally raising max_tokens.

Test plan:

Reviewed the diff and related model definitions.
Ran go test ./internal/thinking/... ./internal/api/modules/amp/... ./internal/runtime/executor/....

When adjustedBudget < minBudget, the previous fix blindly set max_tokens = budgetTokens+1, which could exceed MaxCompletionTokens. Now: cap max_tokens at MaxCompletionTokens, recalculate the budget, and disable thinking entirely if the constraints are unsatisfiable. Add unit tests covering raise, clamp, disable, and no-op scenarios.

Blue-B · 2026-03-09T13:14:01Z

Addressed the blocking feedback — pushed a fix:

normalizeClaudeBudget now clamps max_tokens to MaxCompletionTokens before raising it. If even after clamping the budget still falls below minBudget, thinking is disabled entirely (thinking key removed).
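The clamping behavior described here can be sketched as a pure function. This is an illustration under assumptions, not the code in apply.go: normalizeBudget and budgetResult are hypothetical names, the real function operates on a JSON request body, and the minimum budget of 1024 used in the example is an assumed value. The 128000 and 64000 figures are the ones quoted in the review above.

```go
package main

import "fmt"

// budgetResult is a hypothetical result type for this sketch.
type budgetResult struct {
	MaxTokens       int
	BudgetTokens    int
	ThinkingEnabled bool
}

// normalizeBudget keeps budgetTokens strictly below max_tokens while
// respecting minBudget and the model's completion limit. It assumes
// budgetTokens itself was already validated against minBudget.
func normalizeBudget(budgetTokens, maxTokens, minBudget, maxCompletion int) budgetResult {
	// No-op: the budget already fits below max_tokens.
	if budgetTokens < maxTokens {
		return budgetResult{maxTokens, budgetTokens, true}
	}
	// Prefer lowering the budget to max_tokens-1 when that stays valid.
	if adjusted := maxTokens - 1; adjusted >= minBudget {
		return budgetResult{maxTokens, adjusted, true}
	}
	// Otherwise raise max_tokens, but never past the completion limit.
	raised := budgetTokens + 1
	if maxCompletion > 0 && raised > maxCompletion {
		raised = maxCompletion
	}
	budget := budgetTokens
	if budget > raised-1 {
		budget = raised - 1
	}
	// Unsatisfiable: even the clamped budget is below the minimum,
	// so thinking is disabled and max_tokens is left untouched.
	if budget < minBudget {
		return budgetResult{maxTokens, 0, false}
	}
	return budgetResult{raised, budget, true}
}

func main() {
	const minBudget = 1024 // assumed minimum thinking budget

	fmt.Println(normalizeBudget(2000, 4000, minBudget, 64000))  // no-op
	fmt.Println(normalizeBudget(5000, 4000, minBudget, 64000))  // budget lowered to 3999
	fmt.Println(normalizeBudget(2000, 500, minBudget, 64000))   // max_tokens raised to 2001
	fmt.Println(normalizeBudget(128000, 500, minBudget, 64000)) // clamped to 64000 / 63999
	fmt.Println(normalizeBudget(2000, 500, minBudget, 1000))    // thinking disabled
}
```

Returning a small value type rather than mutating a request makes each of the listed scenarios a one-line table entry in a unit test.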

Also added unit tests covering:

- Normal raise (budgetTokens+1 within model limit)
- Clamp to model max (budgetTokens+1 > MaxCompletionTokens)
- Disable thinking (unsatisfiable constraints)
- No-op (budget already valid)
- Budget adjustment to max_tokens-1

Please take another look when you get a chance.

gemini-code-assist bot reviewed Mar 7, 2026

luispater requested changes Mar 8, 2026

Blue-B wants to merge 2 commits into router-for-me:main from Blue-B:fix/claude-interleaved-thinking-amp-gzip-budget

2 participants
