-
-
Notifications
You must be signed in to change notification settings - Fork 2.5k
Description
Is it a request payload issue?
[ ] Yes, this is a request payload issue. I am using a client/cURL to send a request payload, but I received an unexpected error.
[x] No, it's another issue.
Describe the bug
When using Claude models (Opus 4.6, Sonnet 4.6) through the Antigravity provider, thinking content is returned as raw text with <thinking> XML tags mixed into type: "text" content blocks, instead of proper type: "thinking" content blocks. Clients like OpenCode that render thinking blocks separately (dark grey, visually separated) cannot distinguish thinking from response text.
This works correctly for Gemini models on the same Antigravity backend (where the backend sets thought: true on thought parts), and also works correctly with providers that return native Anthropic format (e.g., Z.ai for GLM-5).
Root cause: The Antigravity backend for Claude models does not annotate thinking with thought: true in the response parts. Instead, Claude's thinking arrives as regular text wrapped in <thinking>...</thinking> XML tags. Since CLIProxyAPI's response translator relies exclusively on the thought: true flag, the thinking content passes through as regular text.
CLI Type
Antigravity (Google backend for Claude models)
Model Name
claude-opus-4-6-thinking, claude-sonnet-4-6-thinking (via Antigravity)
LLM Client
OpenCode 1.2.15 with @ai-sdk/anthropic SDK
Request Information
# Streaming request with thinking enabled
curl -s http://localhost:8317/v1/messages \
-H "x-api-key: <key>" -H "content-type: application/json" \
-H "anthropic-version: 2023-06-01" \
-d '{"model":"claude-opus-4-6-thinking","max_tokens":500,"stream":true,"thinking":{"type":"enabled","budget_tokens":8192},"messages":[{"role":"user","content":"Solve: what is the square root of 252557324527? Show your work."}]}'Streaming response shows only content_block_start with "type":"text" and text_delta events — zero thinking blocks or thinking_delta events, even when the model produces thinking content.
When thinking IS produced by the backend, it appears as raw <thinking>...</thinking> XML tags within the text content block, e.g.:
502,550² = 252,556,302,500
252,557,324,527 - 252,556,302,500 = 1,022,027
</thinking>
√252,557,324,527 ≈ 502,550.818
<thinking>
The square root is approximately 502,550.818.
</thinking>
Expected behavior
Thinking content should arrive as separate content blocks:
{
"content": [
{ "type": "thinking", "thinking": "Let me calculate...", "signature": "..." },
{ "type": "text", "text": "√252,557,324,527 ≈ 502,550.818" }
]
}This is what happens with native Anthropic format providers (e.g., Z.ai with GLM-5), where OpenCode renders thinking in dark grey, visually separated from the response.
Code Analysis
The response translator correctly handles thought: true when present — the issue is at the backend layer:
Response translator (internal/translator/antigravity/claude/antigravity_claude_response.go, line 136):
if partResult.Get("thought").Bool() {
// Creates proper content_block_start with type:"thinking"
// Creates thinking_delta events
// ← This code path works correctly (verified with Gemini models)
} else {
// Regular text content → type:"text" blocks
// ← Claude thinking lands here because backend doesn't set thought: true
}Executor (internal/runtime/executor/antigravity_executor.go, lines 652-664):
thought := part.Get("thought").Bool()
if thought || part.Get("text").Exists() {
kind := "text"
if thought {
kind = "thought"
}
// Claude via Antigravity: thought is always false, kind is always "text"
}Data flow comparison:
| Provider | Backend behavior | Result |
|---|---|---|
| Antigravity/Claude | Thinking embedded as <thinking> XML tags in regular text, no thought: true flag |
BROKEN: Raw XML tags in text blocks |
| Antigravity/Gemini | Thinking annotated with thought: true |
OK: Proper thinking blocks |
| Z.ai/GLM-5 | Native Anthropic format with type: "thinking" content blocks |
OK: Proper thinking blocks |
Suggested Fix
Since the Antigravity backend doesn't annotate Claude's thinking with thought: true, CLIProxyAPI could work around this by detecting <thinking> XML tags in text content during response translation:
- In
antigravity_claude_response.go, when processing text parts withoutthought: true, check for<thinking>...</thinking>patterns - Split the text into segments and emit thinking segments as
type: "thinking"content blocks - Gate this behavior on Claude models only (check model name) to avoid false positives
- Handle streaming edge cases where
<thinking>open/close tags may span multiple chunks
Alternatively, if there's a way to configure the Antigravity backend to return thought: true for Claude models (different API parameter or flag), that would be cleaner since the existing handler would work as-is.
OS Type
- OS: Linux (WSL2)
- Kernel: 6.6.87.2-microsoft-standard-WSL2
Additional context
- CLIProxyAPI version: v6.8.35
- Related: feat(thinking): support Claude output_config.effort parameter (Opus 4.6) #1540 (adaptive thinking
output_config.effortparameter) - The request-side thinking config translation works correctly (budget/thinkingLevel are set) — this is purely a response-side issue
- cc @luispater