refactor: clean up Anthropic translation layer (for PR #45 review) #76
Closed
…atibility

- Add /v1/messages endpoint with Anthropic-format request/response
- Support both x-api-key and Bearer token authentication
- Implement Anthropic <-> OpenAI format translation for messages, tools, and responses
- Add streaming wrapper converting OpenAI SSE to Anthropic SSE events
- Handle tool_use blocks with proper stop_reason detection
- Fix NoneType iteration bug in tool_calls handling

- Add AnthropicThinkingConfig model and thinking parameter to the request
- Translate Anthropic thinking config to reasoning_effort for providers
- Handle reasoning_content in the streaming wrapper (thinking_delta events)
- Convert reasoning_content to thinking blocks in non-streaming responses
When no thinking config is provided in the request, Opus models now automatically use reasoning_effort=high with custom_reasoning_budget=True. This ensures Opus 4.5 uses the full 32768 token thinking budget instead of the backend's auto mode (thinkingBudget: -1) which may use less. Opus always uses the -thinking variant regardless, but this change guarantees maximum thinking capacity for better reasoning quality.
…ling

- Add validation to ensure maxOutputTokens > thinkingBudget for Claude extended thinking (prevents 400 INVALID_ARGUMENT API errors)
- Improve streaming error handling to send proper message_start and content blocks before the error event for better client compatibility
- Minor code formatting improvements
Track each tool_use block index separately and emit content_block_stop for all blocks (thinking, text, and each tool_use) when stream ends. Fixes Claude Code stopping mid-action due to malformed streaming events.
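The per-block cleanup described above amounts to tracking which Anthropic content block indices are still open and closing all of them at end of stream. A minimal sketch, with assumed names (the real wrapper tracks indices as the blocks are opened):

```python
def close_open_blocks(open_block_indices: set[int]):
    """Emit content_block_stop for every block still open when the stream ends."""
    for index in sorted(open_block_indices):
        # Covers thinking, text, and each tool_use block, in index order.
        yield {"type": "content_block_stop", "index": index}
    open_block_indices.clear()
```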
…nabled

- Fixed bug where budget_tokens between 10000 and 32000 would get a ÷4 reduction
- Now any explicit thinking request sets custom_reasoning_budget=True
- Added logging to show thinking budget, effort level, and custom_budget flag
- Simplified budget tier logic (removed redundant >= 32000 check)

Before: 31999 tokens requested → 8192 tokens actual (÷4 applied)
After: 31999 tokens requested → 32768 tokens actual (full "high" budget)
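The corrected tier logic can be sketched as below. The thresholds here are illustrative assumptions; what the commit guarantees is only that an explicit request always sets `custom_reasoning_budget=True` and that 31999 requested tokens resolve to the full "high" budget.

```python
def resolve_thinking(budget_tokens: int) -> tuple[str, bool]:
    """Map an explicit Anthropic budget to (reasoning_effort, custom_reasoning_budget)."""
    if budget_tokens >= 10000:
        effort = "high"
    elif budget_tokens >= 4000:
        effort = "medium"
    else:
        effort = "low"
    # Any explicit thinking request opts out of the ÷4 reduced default.
    return effort, True
```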
When using /v1/chat/completions with Opus and reasoning_effort="high" or "medium", automatically set custom_reasoning_budget=true to get full thinking tokens instead of the ÷4 reduced default. This makes the OpenAI endpoint behave consistently with the Anthropic endpoint for Opus models - if you're using Opus with high reasoning, you want the full thinking budget. Adds logging: "🧠 Thinking: auto-enabled custom_reasoning_budget for Opus"
…treaming

Claude Code and other Anthropic SDK clients require message_start to be sent before any other SSE events. When a stream completed quickly without content chunks, the wrapper would send message_stop without message_start, causing clients to silently discard all output.
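The essence of the fix is a guard that guarantees `message_start` is emitted even when the upstream produces no chunks. A simplified sketch with schematic event dicts (the real wrapper emits full Anthropic SSE payloads):

```python
def wrap_stream(chunks):
    """Yield Anthropic-style events, guaranteeing message_start precedes everything."""
    started = False
    for chunk in chunks:
        if not started:
            yield {"type": "message_start"}
            started = True
        yield {"type": "content_block_delta", "delta": chunk}
    if not started:
        # Stream ended with no content: still open the message before closing it,
        # otherwise SDK clients silently discard the output.
        yield {"type": "message_start"}
    yield {"type": "message_stop"}
```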
Signed-off-by: Moeeze Hassan <[email protected]>
This reverts commit e80645e.
…ing is enabled" This reverts commit 2ee549d.
- Create rotator_library/anthropic_compat module with models, translator, and streaming
- Add anthropic_messages() and anthropic_count_tokens() methods to RotatingClient
- Simplify main.py endpoints to use library methods
- Remove ~762 lines of duplicate code from main.py
- Fix: Use UUID instead of time.time() for tool/message IDs (avoids collisions)
- Fix: Remove unused accumulated_text/accumulated_thinking variables
- Fix: Map top_k parameter from Anthropic to OpenAI format

- Add comment explaining that the metadata parameter is intentionally not mapped (OpenAI has no equivalent field)
- Use safer regex pattern matching for Opus model detection (avoids false positives like "magnum-opus-model")
- Document reasoning budget thresholds and the // 4 reduction behavior
- Conserve thinking tokens for Opus auto-detection (use // 4 like other models); only set custom_reasoning_budget=True when the user explicitly requests 32000+ tokens
Tool results with images (e.g., from the Read tool) were being dropped during Anthropic→OpenAI translation, and not properly converted to Gemini format.

- translator.py: Extract image blocks from tool_result content and convert to OpenAI image_url format
- antigravity_provider.py: Handle multimodal tool responses by converting image_url to Gemini inlineData format
- Force default Claude thinking budget to 31999 when thinking is enabled
- Inject interleaved thinking hint for Claude tool calls
- Log request headers and raw/unwrapped Claude responses for debugging
- Preserve thinking signatures across Anthropic compat translation
- Improve thinking signature validation/filtering in Antigravity provider

Signed-off-by: Moeeze Hassan <[email protected]>
Pass through the exact budget_tokens value from the Anthropic request instead of using a hardcoded constant. This allows Claude Code and other clients to control the thinking budget directly.

Changes:
- translator.py: Pass thinking_budget from request.thinking.budget_tokens
- antigravity_provider.py: Accept and use the thinking_budget parameter in _get_thinking_config(), falling back to the default if not provided

Signed-off-by: Moeeze Hassan <[email protected]>
When thinking is enabled but the last assistant message has no thinking block AND no tool calls (simple text response), Claude API rejects with "Expected thinking but found text". Add synthetic user message to start a fresh turn, allowing thinking to be generated naturally. Signed-off-by: Moeeze Hassan <[email protected]>
Require a thinking block before each tool call and after tool results for Claude interleaved thinking. Signed-off-by: Moeeze Hassan <[email protected]>
…config

Claude models always return early before reaching the model-specific budgets section, making the `or is_claude` condition dead code.
Logs a debug message when skipping non-data URL images, helping developers troubleshoot why images may not appear in requests.
Google's promptTokenCount INCLUDES cached tokens, but Anthropic's input_tokens EXCLUDES cached tokens. This fix:

- Extract cachedContentTokenCount from Google's usageMetadata
- Subtract cached tokens from input_tokens in responses
- Include cache_read_input_tokens and cache_creation_input_tokens
- Apply the fix to both streaming and non-streaming responses
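The subtraction described above can be sketched as a small usage translator. Field names on the input side follow Google's usageMetadata and on the output side Anthropic's usage object; the function name and the `cache_creation_input_tokens = 0` default are assumptions for illustration.

```python
def translate_usage(usage_metadata: dict) -> dict:
    """Convert Google usageMetadata to Anthropic-style usage counts."""
    prompt = usage_metadata.get("promptTokenCount", 0)
    cached = usage_metadata.get("cachedContentTokenCount", 0) or 0
    return {
        # Google's promptTokenCount includes cached tokens; Anthropic's
        # input_tokens excludes them, so subtract the cached portion.
        "input_tokens": prompt - cached,
        "output_tokens": usage_metadata.get("candidatesTokenCount", 0),
        "cache_read_input_tokens": cached,
        "cache_creation_input_tokens": 0,
    }
```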
1. Session ID for Prompt Caching (High Priority)
   - Derive a stable session ID from the first user message hash
   - Enables prompt caching continuity across conversation turns
   - Falls back to a random ID if no user message is found
2. Content Reordering (Medium Priority)
   - Reorder assistant content blocks: thinking → text → tool_use
   - Matches Anthropic's expected ordering
   - Sanitizes thinking blocks by removing cache_control
3. Document/PDF Handling (Low Priority)
   - Support for 'document' type content blocks
   - Converts base64/URL documents to OpenAI image_url format
   - Default media type: application/pdf
4. Gemini Output Token Cap (Low Priority)
   - Add GEMINI_MAX_OUTPUT_TOKENS constant (16384)
   - Cap maxOutputTokens for non-Claude models
   - Prevents errors from exceeding Gemini limits
5. Schema Sanitization Improvements (Low Priority)
   - Add _score_schema_option() for smarter anyOf/oneOf selection
   - Add _merge_all_of() to properly merge allOf schemas
   - Add description hints when flattening union types
   - Select the best option (objects > arrays > primitives > null)
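Item 1 (the stable session ID) can be sketched as below. The helper name, hash choice, and 16-character truncation are assumptions; the commit only specifies "hash of the first user message, random fallback".

```python
import hashlib
import uuid

def derive_session_id(messages: list[dict]) -> str:
    """Derive a stable session ID from the first user message, if any."""
    for msg in messages:
        if msg.get("role") == "user":
            content = str(msg.get("content", ""))
            # Same first user message -> same session ID on every turn,
            # which lets the backend reuse its prompt cache.
            return hashlib.sha256(content.encode()).hexdigest()[:16]
    return uuid.uuid4().hex[:16]  # no user message: fall back to a random ID
```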
Use structured format with CRITICAL prefix and bullet points to reduce skipped thinking blocks between tool calls. Signed-off-by: Moeeze Hassan <[email protected]>
…imit

Instead of silently capping max_tokens, raise a ValueError so Claude Code sees the error and can adjust its request. Fixes 400 INVALID_ARGUMENT errors when clients send max_tokens > 64000 for Claude models.
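A minimal sketch of this validation; the 64000 ceiling comes from the commit message, while the constant and function names are assumptions:

```python
CLAUDE_MAX_OUTPUT_TOKENS = 64000

def validate_max_tokens(max_tokens: int) -> int:
    """Reject over-limit requests instead of silently capping them."""
    if max_tokens > CLAUDE_MAX_OUTPUT_TOKENS:
        # Surface the limit so the client (e.g. Claude Code) can retry
        # with a smaller value, rather than hitting a 400 upstream.
        raise ValueError(
            f"max_tokens ({max_tokens}) exceeds the Claude limit of "
            f"{CLAUDE_MAX_OUTPUT_TOKENS}"
        )
    return max_tokens
```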
Signed-off-by: Moeeze Hassan <[email protected]>
- Integrate refactored utilities from dev (gemini_shared_utils, etc.)
- Keep Antigravity system prompts (critical for API)
- Add Claude interleaved thinking features from the feature branch
- Ensure the interleaved thinking hint comes AFTER the Antigravity prompts

- Add explicit_budget parameter to _get_thinking_config
- Cap Claude thinking budget at 31999 when an explicit budget is provided
- Pass the thinking_budget kwarg from the Anthropic translator to the provider
Ignore client's budget_tokens value and always use 31999 for Claude via Anthropic routes to ensure full thinking capacity.
…y lacks thinking blocks
Token counting endpoints (/v1/token-count and /v1/messages/count_tokens) were returning inaccurate counts because they didn't include the Antigravity preprompts that get injected during actual API calls.

- Add get_antigravity_preprompt_text() helper to expose the preprompt text
- Update RotatingClient.token_count() to add preprompt tokens for Antigravity provider models

Signed-off-by: Moeeze Hassan <[email protected]>
Resolve conflicts by adopting dev's interleaved thinking implementation and removing duplicate code from this branch.
Remove remaining references to removed interleaved thinking attributes that were brought in during merge.
Bring in the latest dev changes, including:

- Docker support and workflows
- Fair cycle rotation and custom usage caps
- Smart cooldown waiting and fail-fast logic
- Centralized library defaults
- Dynamic custom OpenAI-compatible provider system
- Interactive connection recovery
…remove legacy hacks

Replaces ad-hoc thinking logic with a structured mapping from Anthropic `budget_tokens` to `reasoning_effort` levels. This change aligns the translation layer with standard provider capabilities and cleans up deprecated workarounds.

- Implement `_budget_to_reasoning_effort` to convert token counts to reasoning levels (e.g., "low", "medium", "high", "granular").
- Remove legacy logic that forced the maximum thinking budget for Claude Opus models.
- Remove the workaround for injecting "[Continue]" messages into conversation history.
- Delete unused helper functions in `AntigravityProvider` (signature validation, content reordering, and explicit budget overrides).
…ompatibility

This change introduces a hierarchical logging structure to better trace requests passing through the translation layer.

- Update `TransactionLogger` to support nested directories (`parent_dir`) and custom filenames, allowing internal OpenAI transactions to be logged as children of the original Anthropic request.
- Implement full response reconstruction in `anthropic_streaming_wrapper` to accumulate and log the final state of streaming interactions (including thinking blocks and tool calls).
- Modify `RotatingClient` to pass logging context down to the translation layer.
- Switch `proxy_app` to use `RawIOLogger` when enabled for better debugging of the proxy boundary.
The previous implementation using `delta.get("tool_calls", [])` would return `None` if the provider explicitly sent `"tool_calls": null`, bypassing the default value. This change ensures `tool_calls` always resolves to a list using the `or []` pattern, preventing potential errors during iteration.
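The pitfall is easy to reproduce: `dict.get`'s default only applies when the key is absent, not when it is present with an explicit null. Normalizing with `or []` covers both cases:

```python
delta = {"tool_calls": None}  # provider explicitly sent "tool_calls": null

broken = delta.get("tool_calls", [])   # -> None; iterating would raise TypeError
fixed = delta.get("tool_calls") or []  # -> []; always safe to iterate

for call in fixed:  # no-op here, but never crashes
    pass
```

The same `or []` pattern also turns other falsy values (e.g. an empty string) into a list, which is the desired behavior for this field.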
Updates project documentation to reflect the new Anthropic API compatibility features:

- **README.md**: Add setup guides for Claude Code and the Anthropic Python SDK, plus API endpoint details.
- **DOCUMENTATION.md**: Add a deep dive into the `anthropic_compat` architecture, including translation logic and streaming behavior.
- **Library Docs**: Document the `anthropic_messages` and `anthropic_count_tokens` methods in `rotator_library`.
Summary
Replaces ad-hoc thinking logic with a structured mapping from Anthropic `budget_tokens` to `reasoning_effort` levels. This change aligns the translation layer with standard provider capabilities and cleans up deprecated workarounds.

Changes

- Implement `_budget_to_reasoning_effort()` to convert token counts to reasoning levels (low, medium, high, + granular for whitelisted providers)
- Delete unused helper functions in `AntigravityProvider` (signature validation, content reordering, explicit budget overrides)

Why
The translation layer should be provider-agnostic. Provider-specific logic (thinking budgets, model detection, history workarounds) belongs in the providers themselves, not the translation layer.
Files Changed
- `translator.py`
- `antigravity_provider.py` (kept `get_antigravity_preprompt_text` for token counting)

Related to: #45
Important
Refactor Anthropic translation layer to map token budgets to reasoning effort levels and remove deprecated logic.
- Added `_budget_to_reasoning_effort()` in `translator.py` to map `budget_tokens` to `reasoning_effort` levels.
- Removed legacy logic forcing the maximum thinking budget for Opus models in `translator.py`.
- Removed the "[Continue]" message injection workaround in `translator.py`.
- Deleted unused helper functions in `AntigravityProvider`.
- `translator.py`: Added budget-to-level mapping, removed provider-specific workarounds.
- `antigravity_provider.py`: Removed PR additions, kept `get_antigravity_preprompt_text` for token counting.
- Updated `DOCUMENTATION.md` to include Anthropic API compatibility.
- Updated `README.md` to reflect Anthropic-compatible endpoints.