refactor: clean up Anthropic translation layer (for PR #45 review) #76
Closed
…atibility

- Add /v1/messages endpoint with Anthropic-format request/response
- Support both x-api-key and Bearer token authentication
- Implement Anthropic <-> OpenAI format translation for messages, tools, and responses
- Add streaming wrapper converting OpenAI SSE to Anthropic SSE events
- Handle tool_use blocks with proper stop_reason detection
- Fix NoneType iteration bug in tool_calls handling

- Add AnthropicThinkingConfig model and thinking parameter to the request
- Translate Anthropic thinking config to reasoning_effort for providers
- Handle reasoning_content in the streaming wrapper (thinking_delta events)
- Convert reasoning_content to thinking blocks in non-streaming responses
When no thinking config is provided in the request, Opus models now automatically use reasoning_effort=high with custom_reasoning_budget=True. This ensures Opus 4.5 uses the full 32768 token thinking budget instead of the backend's auto mode (thinkingBudget: -1) which may use less. Opus always uses the -thinking variant regardless, but this change guarantees maximum thinking capacity for better reasoning quality.
…ling

- Add validation to ensure maxOutputTokens > thinkingBudget for Claude extended thinking (prevents 400 INVALID_ARGUMENT API errors)
- Improve streaming error handling to send proper message_start and content blocks before the error event for better client compatibility
- Minor code formatting improvements
Track each tool_use block index separately and emit content_block_stop for all blocks (thinking, text, and each tool_use) when stream ends. Fixes Claude Code stopping mid-action due to malformed streaming events.
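The per-block cleanup described above amounts to tracking which Anthropic content block indices are still open and closing all of them at end of stream. A minimal sketch, with assumed names (the real wrapper tracks indices as the blocks are opened):

```python
def close_open_blocks(open_block_indices: set[int]):
    """Emit content_block_stop for every block still open when the stream ends."""
    for index in sorted(open_block_indices):
        # Covers thinking, text, and each tool_use block, in index order.
        yield {"type": "content_block_stop", "index": index}
    open_block_indices.clear()
```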
…nabled

- Fixed bug where budget_tokens between 10000 and 32000 would get a ÷4 reduction
- Now any explicit thinking request sets custom_reasoning_budget=True
- Added logging to show thinking budget, effort level, and custom_budget flag
- Simplified budget tier logic (removed redundant >= 32000 check)

Before: 31999 tokens requested → 8192 tokens actual (÷4 applied)
After: 31999 tokens requested → 32768 tokens actual (full "high" budget)
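The corrected tier logic can be sketched as below. The thresholds here are illustrative assumptions; what the commit guarantees is only that an explicit request always sets `custom_reasoning_budget=True` and that 31999 requested tokens resolve to the full "high" budget.

```python
def resolve_thinking(budget_tokens: int) -> tuple[str, bool]:
    """Map an explicit Anthropic budget to (reasoning_effort, custom_reasoning_budget)."""
    if budget_tokens >= 10000:
        effort = "high"
    elif budget_tokens >= 4000:
        effort = "medium"
    else:
        effort = "low"
    # Any explicit thinking request opts out of the ÷4 reduced default.
    return effort, True
```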
When using /v1/chat/completions with Opus and reasoning_effort="high" or "medium", automatically set custom_reasoning_budget=true to get full thinking tokens instead of the ÷4 reduced default. This makes the OpenAI endpoint behave consistently with the Anthropic endpoint for Opus models - if you're using Opus with high reasoning, you want the full thinking budget. Adds logging: "🧠 Thinking: auto-enabled custom_reasoning_budget for Opus"
…treaming

Claude Code and other Anthropic SDK clients require message_start to be sent before any other SSE events. When a stream completed quickly without content chunks, the wrapper would send message_stop without message_start, causing clients to silently discard all output.
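The essence of the fix is a guard that guarantees `message_start` is emitted even when the upstream produces no chunks. A simplified sketch with schematic event dicts (the real wrapper emits full Anthropic SSE payloads):

```python
def wrap_stream(chunks):
    """Yield Anthropic-style events, guaranteeing message_start precedes everything."""
    started = False
    for chunk in chunks:
        if not started:
            yield {"type": "message_start"}
            started = True
        yield {"type": "content_block_delta", "delta": chunk}
    if not started:
        # Stream ended with no content: still open the message before closing it,
        # otherwise SDK clients silently discard the output.
        yield {"type": "message_start"}
    yield {"type": "message_stop"}
```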
Signed-off-by: Moeeze Hassan <[email protected]>
This reverts commit e80645e.
…ing is enabled" This reverts commit 2ee549d.
- Create rotator_library/anthropic_compat module with models, translator, and streaming
- Add anthropic_messages() and anthropic_count_tokens() methods to RotatingClient
- Simplify main.py endpoints to use library methods
- Remove ~762 lines of duplicate code from main.py
- Fix: Use UUID instead of time.time() for tool/message IDs (avoids collisions)
- Fix: Remove unused accumulated_text/accumulated_thinking variables
- Fix: Map top_k parameter from Anthropic to OpenAI format

- Add comment explaining that the metadata parameter is intentionally not mapped (OpenAI has no equivalent field)
- Use safer regex pattern matching for Opus model detection (avoids false positives like "magnum-opus-model")
- Document reasoning budget thresholds and the // 4 reduction behavior
- Conserve thinking tokens for Opus auto-detection (use // 4 like other models); only set custom_reasoning_budget=True when the user explicitly requests 32000+ tokens
Tool results with images (e.g., from the Read tool) were being dropped during Anthropic→OpenAI translation, and not properly converted to Gemini format.

- translator.py: Extract image blocks from tool_result content and convert to OpenAI image_url format
- antigravity_provider.py: Handle multimodal tool responses by converting image_url to Gemini inlineData format
- Force default Claude thinking budget to 31999 when thinking is enabled
- Inject interleaved thinking hint for Claude tool calls
- Log request headers and raw/unwrapped Claude responses for debugging
- Preserve thinking signatures across Anthropic compat translation
- Improve thinking signature validation/filtering in Antigravity provider

Signed-off-by: Moeeze Hassan <[email protected]>
Pass through the exact budget_tokens value from the Anthropic request instead of using a hardcoded constant. This allows Claude Code and other clients to control the thinking budget directly.

Changes:
- translator.py: Pass thinking_budget from request.thinking.budget_tokens
- antigravity_provider.py: Accept and use the thinking_budget parameter in _get_thinking_config(), falling back to the default if not provided

Signed-off-by: Moeeze Hassan <[email protected]>
When thinking is enabled but the last assistant message has no thinking block AND no tool calls (simple text response), Claude API rejects with "Expected thinking but found text". Add synthetic user message to start a fresh turn, allowing thinking to be generated naturally. Signed-off-by: Moeeze Hassan <[email protected]>
Require a thinking block before each tool call and after tool results for Claude interleaved thinking. Signed-off-by: Moeeze Hassan <[email protected]>
…config

Claude models always return early before reaching the model-specific budgets section, making the `or is_claude` condition dead code.
Logs a debug message when skipping non-data URL images, helping developers troubleshoot why images may not appear in requests.
Google's promptTokenCount INCLUDES cached tokens, but Anthropic's input_tokens EXCLUDES cached tokens. This fix:

- Extract cachedContentTokenCount from Google's usageMetadata
- Subtract cached tokens from input_tokens in responses
- Include cache_read_input_tokens and cache_creation_input_tokens
- Apply the fix to both streaming and non-streaming responses
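The subtraction described above can be sketched as a small usage translator. Field names on the input side follow Google's usageMetadata and on the output side Anthropic's usage object; the function name and the `cache_creation_input_tokens = 0` default are assumptions for illustration.

```python
def translate_usage(usage_metadata: dict) -> dict:
    """Convert Google usageMetadata to Anthropic-style usage counts."""
    prompt = usage_metadata.get("promptTokenCount", 0)
    cached = usage_metadata.get("cachedContentTokenCount", 0) or 0
    return {
        # Google's promptTokenCount includes cached tokens; Anthropic's
        # input_tokens excludes them, so subtract the cached portion.
        "input_tokens": prompt - cached,
        "output_tokens": usage_metadata.get("candidatesTokenCount", 0),
        "cache_read_input_tokens": cached,
        "cache_creation_input_tokens": 0,
    }
```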
1. Session ID for Prompt Caching (High Priority)
   - Derive a stable session ID from the first user message hash
   - Enables prompt caching continuity across conversation turns
   - Falls back to a random ID if no user message is found
2. Content Reordering (Medium Priority)
   - Reorder assistant content blocks: thinking → text → tool_use
   - Matches Anthropic's expected ordering
   - Sanitizes thinking blocks by removing cache_control
3. Document/PDF Handling (Low Priority)
   - Support for 'document' type content blocks
   - Converts base64/URL documents to OpenAI image_url format
   - Default media type: application/pdf
4. Gemini Output Token Cap (Low Priority)
   - Add GEMINI_MAX_OUTPUT_TOKENS constant (16384)
   - Cap maxOutputTokens for non-Claude models
   - Prevents errors from exceeding Gemini limits
5. Schema Sanitization Improvements (Low Priority)
   - Add _score_schema_option() for smarter anyOf/oneOf selection
   - Add _merge_all_of() to properly merge allOf schemas
   - Add description hints when flattening union types
   - Select the best option (objects > arrays > primitives > null)
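Item 1 (the stable session ID) can be sketched as below. The helper name, hash choice, and 16-character truncation are assumptions; the commit only specifies "hash of the first user message, random fallback".

```python
import hashlib
import uuid

def derive_session_id(messages: list[dict]) -> str:
    """Derive a stable session ID from the first user message, if any."""
    for msg in messages:
        if msg.get("role") == "user":
            content = str(msg.get("content", ""))
            # Same first user message -> same session ID on every turn,
            # which lets the backend reuse its prompt cache.
            return hashlib.sha256(content.encode()).hexdigest()[:16]
    return uuid.uuid4().hex[:16]  # no user message: fall back to a random ID
```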
Use structured format with CRITICAL prefix and bullet points to reduce skipped thinking blocks between tool calls. Signed-off-by: Moeeze Hassan <[email protected]>
…imit

Instead of silently capping max_tokens, raise a ValueError so Claude Code sees the error and can adjust its request. Fixes 400 INVALID_ARGUMENT errors when clients send max_tokens > 64000 for Claude models.
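A minimal sketch of this validation; the 64000 ceiling comes from the commit message, while the constant and function names are assumptions:

```python
CLAUDE_MAX_OUTPUT_TOKENS = 64000

def validate_max_tokens(max_tokens: int) -> int:
    """Reject over-limit requests instead of silently capping them."""
    if max_tokens > CLAUDE_MAX_OUTPUT_TOKENS:
        # Surface the limit so the client (e.g. Claude Code) can retry
        # with a smaller value, rather than hitting a 400 upstream.
        raise ValueError(
            f"max_tokens ({max_tokens}) exceeds the Claude limit of "
            f"{CLAUDE_MAX_OUTPUT_TOKENS}"
        )
    return max_tokens
```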
Signed-off-by: Moeeze Hassan <[email protected]>
- Integrate refactored utilities from dev (gemini_shared_utils, etc.)
- Keep Antigravity system prompts (critical for API)
- Add Claude interleaved thinking features from the feature branch
- Ensure the interleaved thinking hint comes AFTER the Antigravity prompts

- Add explicit_budget parameter to _get_thinking_config
- Cap Claude thinking budget at 31999 when an explicit budget is provided
- Pass the thinking_budget kwarg from the Anthropic translator to the provider
Ignore client's budget_tokens value and always use 31999 for Claude via Anthropic routes to ensure full thinking capacity.
…y lacks thinking blocks
Token counting endpoints (/v1/token-count and /v1/messages/count_tokens) were returning inaccurate counts because they didn't include the Antigravity preprompts that get injected during actual API calls.

- Add get_antigravity_preprompt_text() helper to expose the preprompt text
- Update RotatingClient.token_count() to add preprompt tokens for Antigravity provider models

Signed-off-by: Moeeze Hassan <[email protected]>
Resolve conflicts by adopting dev's interleaved thinking implementation and removing duplicate code from this branch.
Remove remaining references to removed interleaved thinking attributes that were brought in during merge.
Bring in the latest dev changes, including:

- Docker support and workflows
- Fair cycle rotation and custom usage caps
- Smart cooldown waiting and fail-fast logic
- Centralized library defaults
- Dynamic custom OpenAI-compatible provider system
- Interactive connection recovery
…remove legacy hacks

Replaces ad-hoc thinking logic with a structured mapping from Anthropic `budget_tokens` to `reasoning_effort` levels. This change aligns the translation layer with standard provider capabilities and cleans up deprecated workarounds.

- Implement `_budget_to_reasoning_effort` to convert token counts to reasoning levels (e.g., "low", "medium", "high", "granular").
- Remove legacy logic that forced the maximum thinking budget for Claude Opus models.
- Remove the workaround for injecting "[Continue]" messages into conversation history.
- Delete unused helper functions in `AntigravityProvider` (signature validation, content reordering, and explicit budget overrides).
…ompatibility

This change introduces a hierarchical logging structure to better trace requests passing through the translation layer.

- Update `TransactionLogger` to support nested directories (`parent_dir`) and custom filenames, allowing internal OpenAI transactions to be logged as children of the original Anthropic request.
- Implement full response reconstruction in `anthropic_streaming_wrapper` to accumulate and log the final state of streaming interactions (including thinking blocks and tool calls).
- Modify `RotatingClient` to pass logging context down to the translation layer.
- Switch `proxy_app` to use `RawIOLogger` when enabled for better debugging of the proxy boundary.
The previous implementation using `delta.get("tool_calls", [])` would return `None` if the provider explicitly sent `"tool_calls": null`, bypassing the default value. This change ensures `tool_calls` always resolves to a list using the `or []` pattern, preventing potential errors during iteration.
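The pitfall is easy to reproduce: `dict.get`'s default only applies when the key is absent, not when it is present with an explicit null. Normalizing with `or []` covers both cases:

```python
delta = {"tool_calls": None}  # provider explicitly sent "tool_calls": null

broken = delta.get("tool_calls", [])   # -> None; iterating would raise TypeError
fixed = delta.get("tool_calls") or []  # -> []; always safe to iterate

for call in fixed:  # no-op here, but never crashes
    pass
```

The same `or []` pattern also turns other falsy values (e.g. an empty string) into a list, which is the desired behavior for this field.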
Updates project documentation to reflect the new Anthropic API compatibility features:

- **README.md**: Add setup guides for Claude Code and the Anthropic Python SDK, plus API endpoint details.
- **DOCUMENTATION.md**: Add a deep dive into the `anthropic_compat` architecture, including translation logic and streaming behavior.
- **Library Docs**: Document the `anthropic_messages` and `anthropic_count_tokens` methods in `rotator_library`.
Summary
Replaces ad-hoc thinking logic with a structured mapping from Anthropic `budget_tokens` to `reasoning_effort` levels. This change aligns the translation layer with standard provider capabilities and cleans up deprecated workarounds.

Changes

- Implement `_budget_to_reasoning_effort()` to convert token counts to reasoning levels (low, medium, high, + granular for whitelisted providers)
- Delete unused helper functions in `AntigravityProvider` (signature validation, content reordering, explicit budget overrides)

Why
The translation layer should be provider-agnostic. Provider-specific logic (thinking budgets, model detection, history workarounds) belongs in the providers themselves, not the translation layer.
Files Changed
- `translator.py`
- `antigravity_provider.py` (kept `get_antigravity_preprompt_text` for token counting)

Related to: #45
Important
Refactor Anthropic translation layer to map token budgets to reasoning effort levels and remove deprecated logic.
- Added `_budget_to_reasoning_effort()` in `translator.py` to map `budget_tokens` to `reasoning_effort` levels.
- Removed legacy logic forcing the maximum thinking budget for Opus models in `translator.py`.
- Removed the "[Continue]" message injection workaround in `translator.py`.
- Deleted unused helper functions in `AntigravityProvider`.
- `translator.py`: Added budget-to-level mapping, removed provider-specific workarounds.
- `antigravity_provider.py`: Removed PR additions, kept `get_antigravity_preprompt_text` for token counting.
- Updated `DOCUMENTATION.md` to include Anthropic API compatibility.
- Updated `README.md` to reflect Anthropic-compatible endpoints.