fix(claude): enable interleaved-thinking beta, decode AMP error gzip, fix budget 400#1940
Conversation
…oding, normalizeClaudeBudget max_tokens 1. Always include interleaved-thinking-2025-05-14 beta header so that thinking blocks are returned correctly for all Claude models. 2. Remove status-code guard in AMP reverse proxy ModifyResponse so that error responses (4xx/5xx) with hidden gzip encoding are decoded properly — prevents garbled error messages reaching the client. 3. In normalizeClaudeBudget, when the adjusted budget falls below the model minimum, set max_tokens = budgetTokens+1 instead of leaving the request unchanged (which causes a 400 from the API).
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses three distinct issues related to Claude model interactions and the AMP proxy. It ensures proper application of a critical Claude beta header, improves error handling for gzip-encoded responses in the AMP proxy, and corrects budget normalization logic for Claude requests to prevent API rejection. These changes collectively enhance the reliability and functionality of the Claude integration. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces three fixes for Claude models, enabling the interleaved-thinking beta flag, fixing gzip decoding for AMP proxy error responses, and resolving a budget normalization issue. While these changes address the described functional issues, the fix for gzip-encoded error bodies in the AMP proxy introduces a potential security risk. By removing the status code guard, memory-intensive decompression logic is now exposed to all response types without enforcing size limits, which could lead to Denial of Service (DoS) via memory exhaustion. It is recommended to implement size limits on all response body processing. Additionally, a suggestion is made in internal/runtime/executor/claude_executor.go to improve the robustness of beta flag handling for consistency.
I am having trouble creating individual review comments. Click here to see my feedback.
internal/api/modules/amp/proxy.go (111-114)
The removal of the status code check here allows the proxy to attempt decompression of gzip-encoded bodies for all responses, including error responses (4xx/5xx). While this fixes the issue of garbled error bodies reaching the client, the underlying decompression logic (specifically the io.ReadAll calls on lines 132 and 155) does not enforce any size limits. This exposes the proxy to a Denial of Service (DoS) attack via memory exhaustion if a malicious or compromised upstream sends an extremely large response or a 'gzip bomb'.
By removing this guard, the attack surface is increased to include error paths. To mitigate this risk, you should use io.LimitReader to enforce a maximum acceptable size for the response body before reading it into memory.
internal/runtime/executor/claude_executor.go (835-837)
While this correctly adds the interleaved-thinking beta flag, using strings.Contains for this check is fragile. It could lead to false positives if another beta flag contains this substring (e.g., a hypothetical disable-interleaved-thinking flag). A more robust approach is to parse the comma-separated string into a set of strings and check for the exact beta flag. This pattern is already used for extraBetas later in this function.
Adopting a set-based approach for all beta flag manipulations would improve correctness and maintainability. It would also resolve a minor inconsistency where the oauth check is performed on the original val while this check is on the potentially modified baseBetas.
luispater
left a comment
There was a problem hiding this comment.
Thanks for the fixes. The Claude header change and the AMP gzip handling look good, but I think the budget normalization change still has a correctness gap.
Blocking:
- In
internal/thinking/provider/claude/apply.go, whenadjustedBudget < minBudgetthe new branch setsmax_tokenstobudgetTokens + 1. That can exceed the model's completion limit on models whereThinking.Maxis higher thanMaxCompletionTokens(for exampleclaude-sonnet-4-5-20250929hasThinking.Max = 128000butMaxCompletionTokens = 64000). - In that case we still return an invalid request, just with a different shape. I think this path needs to clamp against the model completion max or fall back to reducing/disabling thinking instead of unconditionally raising
max_tokens.
Test plan:
- Reviewed the diff and related model definitions.
- Ran
go test ./internal/thinking/... ./internal/api/modules/amp/... ./internal/runtime/executor/....
When adjustedBudget < minBudget, the previous fix blindly set max_tokens = budgetTokens+1 which could exceed MaxCompletionTokens. Now: cap max_tokens at MaxCompletionTokens, recalculate budget, and disable thinking entirely if constraints are unsatisfiable. Add unit tests covering raise, clamp, disable, and no-op scenarios.
|
Addressed the blocking feedback — pushed a fix: normalizeClaudeBudget now clamps Also added unit tests covering:
Please take another look when you get a chance. |
Three small fixes found while running Claude models through the proxy
1. interleaved-thinking beta header missing
applyClaudeHeaders never adds the interleaved-thinking-2025-05-14 beta flag, so thinking blocks silently disappear. Added a guard that appends it when absent.
2. AMP proxy swallows gzip-encoded error bodies
ModifyResponse skipped non-2xx responses, so 4xx/5xx bodies with hidden gzip encoding reached the client garbled. Removed the status-code guard.
3. normalizeClaudeBudget leaves impossible budget → 400
When adjustedBudget < minBudget, the old code returned the body unchanged, which the API rejects. Now it sets max_tokens = budgetTokens + 1 so the budget stays valid.