Anthropic endpoint for claude code #45
Conversation
Starting my review of the Anthropic endpoint implementation. This is a significant addition bringing Claude Code compatibility via format translation between Anthropic and OpenAI APIs. Analyzing the endpoint handlers, model definitions, and streaming wrapper now.
Overall Assessment
This PR delivers a well-structured Anthropic API compatibility layer, enabling Claude Code and other Anthropic clients to use the proxy via format translation. The implementation demonstrates solid understanding of both API formats and handles edge cases thoughtfully (e.g., forcing high thinking for Opus models, defensive error handling in streaming).
Verdict: The code quality is good with comprehensive error handling and proper Anthropic error format responses. There are two minor improvements suggested inline, but neither is blocking.
Architectural Feedback
The approach of translating Anthropic format → OpenAI format → processing → translating back is sound and aligns with the existing proxy architecture. The streaming wrapper correctly maintains state across the SSE event lifecycle, and the block index tracking for tool calls is well-implemented.
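For reference, a well-formed Anthropic-style SSE stream, which the wrapper has to reproduce from OpenAI chunks, follows this event sequence (payloads abbreviated; event shapes match the `content_block_stop` events visible in the code excerpts later in this thread):

```
event: message_start
data: {"type": "message_start", "message": {"id": "msg_...", "role": "assistant", ...}}

event: content_block_start
data: {"type": "content_block_start", "index": 0, "content_block": {"type": "text", "text": ""}}

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Hello"}}

event: content_block_stop
data: {"type": "content_block_stop", "index": 0}

event: message_delta
data: {"type": "message_delta", "delta": {"stop_reason": "end_turn"}, "usage": {"output_tokens": 12}}

event: message_stop
data: {"type": "message_stop"}
```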
One architectural note: The code adds ~940 lines to main.py, which is already substantial. Consider whether these Anthropic-specific handlers could be extracted to a separate module (e.g., anthropic_compat.py) in a future refactor to improve maintainability.
Key Suggestions
- Tool ID collision risk (inline comment): The fallback `int(time.time())` for tool IDs could collide with parallel calls. Using a UUID is recommended.
- Unused state variables (inline comment): `accumulated_text` and `accumulated_thinking` in the streaming wrapper are tracked but never used.
Minor Observations (Not Inline)
- Token counting for tools: The `count_tokens` endpoint serializes tool definitions to JSON text and counts that. This is an approximation, since tokenizers may handle function schemas differently. Consider documenting this limitation (a minimal sketch of the approximation follows this list).
- Reasoning effort thresholds: The budget_tokens → reasoning_effort mapping (32000/10000/5000) works but lacks inline documentation explaining the rationale.
- Testing: No tests are included in this PR for the new endpoints. Consider adding test coverage for the format translation functions and endpoint handlers.
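A minimal sketch of the approximation flagged in the first observation (function names here are illustrative, not the PR's actual code):

```python
import json

def approximate_tool_tokens(tools: list[dict], count_text_tokens) -> int:
    # Serialize the tool/function schemas to JSON and count that as plain
    # text. Real tokenizers may encode function schemas differently, so
    # treat the result as an estimate, not an exact count.
    return count_text_tokens(json.dumps(tools))
```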
Questions for the Author
- Are the `accumulated_text` and `accumulated_thinking` variables in the streaming wrapper intended for future use (e.g., logging), or are they leftover from development?
- Is there a plan to add integration tests for the Anthropic endpoint compatibility?
This review was generated by an AI assistant.
src/proxy_app/main.py
Outdated
```python
content_blocks.append(
    {
        "type": "tool_use",
        "id": tc.get("id", f"toolu_{int(time.time())}"),
```
Using int(time.time()) as a fallback tool ID could cause collisions if multiple tool calls are processed within the same second. Since uuid is already imported in this file, consider using a UUID for more robust uniqueness.
| "id": tc.get("id", f"toolu_{int(time.time())}"), | |
| "id": tc.get("id", f"toolu_{uuid.uuid4().hex[:12]}"), |
src/proxy_app/main.py
Outdated
```python
thinking_block_started = False
current_block_index = 0
accumulated_text = ""
accumulated_thinking = ""
```
The variables accumulated_text and accumulated_thinking are initialized and updated throughout the streaming wrapper but are never read. If these are intended for debugging or future use (e.g., logging the complete response), consider adding a comment explaining their purpose. Otherwise, removing them would reduce cognitive overhead.
Pull request overview
This PR adds Anthropic API endpoint support to enable compatibility with Claude Code and other Anthropic API clients. The implementation translates between Anthropic's message format and the existing OpenAI-compatible format used internally.
Key changes:
- Added Anthropic Messages API endpoint (`/v1/messages`) with streaming and non-streaming support
- Implemented format translation between Anthropic and OpenAI request/response structures
- Added Claude extended thinking parameter validation to ensure max_tokens > thinking_budget
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 14 comments.
| File | Description |
|---|---|
| `src/rotator_library/providers/antigravity_provider.py` | Adds validation logic to ensure max_tokens exceeds thinking_budget for Claude models with extended thinking enabled |
| `src/proxy_app/main.py` | Implements Anthropic Messages API endpoint with request/response translation, streaming support, token counting endpoint, and dual authentication (x-api-key and Bearer token) |
src/proxy_app/main.py
Outdated
```python
        anthropic_usage["cache_read_input_tokens"] = details["cached_tokens"]

    return {
        "id": openai_response.get("id", f"msg_{int(time.time())}"),
```
Copilot AI (Dec 19, 2025)
Using time.time() as a fallback for message IDs can generate duplicate IDs if multiple messages are processed within the same second. Consider using uuid or a more robust unique ID generation strategy for the fallback case, similar to request_id generation at line 1672.
```python
# CRITICAL: For Claude with extended thinking, max_tokens MUST be > thinking.budget_tokens
# Per Claude docs: https://docs.claude.com/en/docs/build-with-claude/extended-thinking
# If this constraint is violated, the API returns 400 INVALID_ARGUMENT
```
Copilot AI (Dec 19, 2025)
The comment references Claude documentation, but the logic is being applied to all Claude models via the antigravity provider. Consider clarifying which specific models or model versions this constraint applies to, as extended thinking may not be available for all Claude models.
```diff
- # CRITICAL: For Claude with extended thinking, max_tokens MUST be > thinking.budget_tokens
- # Per Claude docs: https://docs.claude.com/en/docs/build-with-claude/extended-thinking
- # If this constraint is violated, the API returns 400 INVALID_ARGUMENT
+ # CRITICAL: For Claude models that support extended thinking (e.g., Sonnet 4.5 and Opus 4.5
+ # via the Antigravity proxy), max_tokens MUST be > thinking.budget_tokens when
+ # extended thinking is enabled. Per Claude docs:
+ # https://docs.claude.com/en/docs/build-with-claude/extended-thinking
+ # If this constraint is violated for those models, the API can return 400 INVALID_ARGUMENT
```
```python
# Check x-api-key first (Anthropic style)
if x_api_key and x_api_key == PROXY_API_KEY:
    return x_api_key
# Fall back to Bearer token (OpenAI style)
if auth and auth == f"Bearer {PROXY_API_KEY}":
    return auth
raise HTTPException(status_code=401, detail="Invalid or missing API Key")
```

Copilot AI (Dec 19, 2025)
When PROXY_API_KEY is not set or empty (open access mode), this function will always raise an HTTPException because neither condition will match. This is inconsistent with verify_api_key at line 794 which allows access when PROXY_API_KEY is not set. Consider adding a check similar to line 794 to allow open access mode.
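A minimal sketch of the suggested fix, assuming the same open-access convention as verify_api_key (an unset or empty PROXY_API_KEY means no authentication is enforced); names mirror the snippet above, and PROXY_API_KEY/HTTPException are assumed to be in scope:

```python
def verify_anthropic_auth(x_api_key: str | None, auth: str | None) -> str | None:
    # Open-access mode: if no proxy key is configured, skip auth entirely,
    # mirroring the verify_api_key behavior described in the comment above.
    if not PROXY_API_KEY:
        return None
    # Check x-api-key first (Anthropic style)
    if x_api_key and x_api_key == PROXY_API_KEY:
        return x_api_key
    # Fall back to Bearer token (OpenAI style)
    if auth and auth == f"Bearer {PROXY_API_KEY}":
        return auth
    raise HTTPException(status_code=401, detail="Invalid or missing API Key")
```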
src/proxy_app/main.py
Outdated
```python
elif "opus" in body.model.lower():
    # Force high thinking for Opus models when no thinking config is provided
    # Opus 4.5 always uses the -thinking variant, so we want maximum thinking budget
    # Without this, the backend defaults to thinkingBudget: -1 (auto) instead of high
    openai_request["reasoning_effort"] = "high"
    openai_request["custom_reasoning_budget"] = True
```
Copilot AI (Dec 19, 2025)
The model name matching logic uses a simple substring check ("opus" in body.model.lower()), which could match unintended model names. Consider using a more specific pattern or a list of known Opus model names to avoid false positives with models that might contain "opus" in their name but aren't Claude Opus models.
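One way to tighten the match, as a sketch (requiring both markers is an illustrative heuristic, not the PR's code):

```python
def is_claude_opus(model: str) -> bool:
    name = model.lower()
    # Require both "claude" and "opus" so unrelated names that merely contain
    # "opus" (e.g., the Helsinki-NLP "opus-mt" translation models) don't match.
    return "claude" in name and "opus" in name
```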
src/proxy_app/main.py
Outdated
```python
# Build OpenAI-compatible request
openai_request = {
    "model": body.model,
    "messages": openai_messages,
    "max_tokens": body.max_tokens,
    "stream": body.stream or False,
}

if body.temperature is not None:
    openai_request["temperature"] = body.temperature
if body.top_p is not None:
    openai_request["top_p"] = body.top_p
if body.stop_sequences:
    openai_request["stop"] = body.stop_sequences
if openai_tools:
    openai_request["tools"] = openai_tools
if openai_tool_choice:
    openai_request["tool_choice"] = openai_tool_choice
```
Copilot AI (Dec 19, 2025)
The top_k parameter from the Anthropic request (defined in AnthropicMessagesRequest at line 301) is not being mapped to the OpenAI request. Anthropic's top_k parameter controls sampling diversity and should be included if provided. Consider adding a check similar to lines 1699-1702 to include top_k in the openai_request when it's not None.
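That check would mirror the existing optional-parameter handling, e.g.:

```python
if body.top_k is not None:
    openai_request["top_k"] = body.top_k
```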
src/proxy_app/main.py
Outdated
```python
    openai_request["tools"] = openai_tools
if openai_tool_choice:
    openai_request["tool_choice"] = openai_tool_choice
```
Copilot AI (Dec 19, 2025)
The metadata parameter from the Anthropic request (defined in AnthropicMessagesRequest at line 306) is not being mapped or handled. While metadata may not directly map to OpenAI format, it could be useful for logging or tracking purposes. Consider whether this parameter should be preserved for request logging or passed through as custom fields.
```python
# Preserve Anthropic metadata for logging/tracking or downstream use
metadata = anthropic_request.get("metadata")
if metadata is not None:
    openai_request["metadata"] = metadata
```
```python
if current_max_tokens <= thinking_budget:
    lib_logger.warning(
        f"max_tokens ({current_max_tokens}) must be > thinkingBudget ({thinking_budget}). "
        f"Adjusting to {min_required_tokens}"
    )
    gen_config["maxOutputTokens"] = min_required_tokens
```
Copilot AI (Dec 19, 2025)
The condition checks if current_max_tokens <= thinking_budget, but the adjustment only occurs in this case. However, the comment states that max_tokens MUST be > thinking_budget. If current_max_tokens is between thinking_budget + 1 and thinking_budget + 1024 (exclusive), it satisfies the API requirement (> thinking_budget) but might not provide enough buffer for the actual response content. Consider changing the condition to check if current_max_tokens < min_required_tokens instead.
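A sketch of the suggested condition change (variable names taken from the snippet above; the comment implies min_required_tokens = thinking_budget plus a 1024-token buffer):

```python
# Also adjust when max_tokens clears the budget but not the response buffer.
if current_max_tokens < min_required_tokens:
    lib_logger.warning(
        f"max_tokens ({current_max_tokens}) is below the required minimum "
        f"({min_required_tokens}); adjusting."
    )
    gen_config["maxOutputTokens"] = min_required_tokens
```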
src/proxy_app/main.py
Outdated
```python
# Close any open thinking block
if thinking_block_started:
    yield f'event: content_block_stop\ndata: {{"type": "content_block_stop", "index": {current_block_index}}}\n\n'
    current_block_index += 1
    thinking_block_started = False
```
Copilot AI (Dec 19, 2025)
Variable thinking_block_started is not used.
```diff
  # Close any open thinking block
  if thinking_block_started:
      yield f'event: content_block_stop\ndata: {{"type": "content_block_stop", "index": {current_block_index}}}\n\n'
      current_block_index += 1
-     thinking_block_started = False
```
src/proxy_app/main.py
Outdated
```python
if content_block_started:
    yield f'event: content_block_stop\ndata: {{"type": "content_block_stop", "index": {current_block_index}}}\n\n'
    current_block_index += 1
    content_block_started = False
```
Copilot AI (Dec 19, 2025)
Variable content_block_started is not used.
```diff
- content_block_started = False
```
src/proxy_app/main.py
Outdated
```python
continue

delta = choices[0].get("delta", {})
finish_reason = choices[0].get("finish_reason")
```
Copilot AI (Dec 19, 2025)
Variable finish_reason is not used.
```diff
- finish_reason = choices[0].get("finish_reason")
```
Problem I noticed from the start: the only files edited are antigravity and main.py. main.py does not do anything by itself; logic should be in client.py or helper files called by it. TL;DR: the logic should be in the library.
Sure, go ahead.

Edit: oh, you want me to make these changes? Maybe I misunderstood.
Preferably you, but it can go either way. I didn't set up any guidance for issues or PRs, contributing guidelines, etc., so it is technically my fault you didn't do that. Not sure when I'll get around to this, but soon-ish I'll review and work on making this part of the library proper.

This is a full-on Anthropic translation layer, right? If it works like the bot says, it should be compatible with all providers automatically.
@mirrobot-agent what will it take to implement what I said? To move it to the library and make it a separate method for the Anthropic endpoint proper: converting to and from Anthropic to OpenAI, so that from OpenAI any existing provider can work as normal.
@Mirrowel, I'm analyzing the codebase to provide a detailed plan for moving the Anthropic translation layer from main.py into the library. Give me a moment to review the architecture...
@Mirrowel, I've completed my analysis of the codebase. Here's a detailed plan for moving the Anthropic translation layer from main.py into the library.

Summary

The goal is to extract the ~700 lines of Anthropic format translation code from main.py.
What Needs to Move

From main.py:
1. Pydantic Models (~130 lines, lines 218-348)
2. Format Translation Functions (~200 lines, lines 822-1068)
3. Streaming Wrapper (~280 lines, lines 1071-1351)
Proposed Library Architecture

New Files to Create: a plausible layout is sketched below. Integration Point: new methods on RotatingClient in client.py (see the table that follows).
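A plausible layout for the new package (hypothetical; file roles are inferred from the component table below and the anthropic_compat name used later in this thread):

```
src/rotator_library/anthropic_compat/
├── __init__.py
├── models.py      # Pydantic models for Anthropic request/response types
├── translator.py  # Anthropic <-> OpenAI format translation functions
└── streaming.py   # OpenAI SSE -> Anthropic SSE streaming wrapper
```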
| Component | Lines of Code | Complexity |
|---|---|---|
| Create `models.py` | ~130 | Low (copy/move) |
| Create `translator.py` | ~200 | Medium (refactor for reuse) |
| Create `streaming.py` | ~280 | Medium (remove FastAPI deps) |
| Update `client.py` | ~80 | Medium (new methods) |
| Simplify `main.py` | -650 / +50 | Low (remove/replace) |
| Total | ~560 new, -650 removed | Medium |
Questions
- Should the Anthropic models/functions be exposed in the library's public API (via `__init__.py`), or kept internal?
- Do you want me to implement this refactoring, or just provide this plan for FammasMaz or yourself to implement?
Let me know how you'd like to proceed!
This analysis was generated by an AI assistant.
@mirrobot-agent implement this, and open a separate PR with that against this PR. Make sure it is comprehensive and integrated with any provider the library and proxy work with. Maintain an OpenAI-compatible format internally, and convert to and from it for providers, so they can remain essentially unchanged. Try to do this iteratively: make a plan and go along with it, committing as you go. You are not allowed to commit to this PR, only the new one you will open.
@Mirrowel, I'm on it! I'll implement the refactoring plan to move the Anthropic translation layer from main.py into the library.

My Plan

I'll work iteratively, committing as I go:
Starting now... 🚀

This task is being handled by an AI assistant.
Refactoring PR Created

I've created PR #47, which refactors the Anthropic translation layer from this PR into the library.

Key Changes in #47:
Benefits:
See: #47
…atibility

- Add /v1/messages endpoint with Anthropic-format request/response
- Support both x-api-key and Bearer token authentication
- Implement Anthropic <-> OpenAI format translation for messages, tools, and responses
- Add streaming wrapper converting OpenAI SSE to Anthropic SSE events
- Handle tool_use blocks with proper stop_reason detection
- Fix NoneType iteration bug in tool_calls handling
- Add AnthropicThinkingConfig model and thinking parameter to request
- Translate Anthropic thinking config to reasoning_effort for providers
- Handle reasoning_content in streaming wrapper (thinking_delta events)
- Convert reasoning_content to thinking blocks in non-streaming responses
When no thinking config is provided in the request, Opus models now automatically use reasoning_effort=high with custom_reasoning_budget=True. This ensures Opus 4.5 uses the full 32768 token thinking budget instead of the backend's auto mode (thinkingBudget: -1) which may use less. Opus always uses the -thinking variant regardless, but this change guarantees maximum thinking capacity for better reasoning quality.
…ling

- Add validation to ensure maxOutputTokens > thinkingBudget for Claude extended thinking (prevents 400 INVALID_ARGUMENT API errors)
- Improve streaming error handling to send proper message_start and content blocks before the error event for better client compatibility
- Minor code formatting improvements
Track each tool_use block index separately and emit content_block_stop for all blocks (thinking, text, and each tool_use) when stream ends. Fixes Claude Code stopping mid-action due to malformed streaming events.
…nabled

- Fixed bug where budget_tokens between 10000-32000 would get the ÷4 reduction
- Now any explicit thinking request sets custom_reasoning_budget=True
- Added logging to show thinking budget, effort level, and custom_budget flag
- Simplified budget tier logic (removed redundant >= 32000 check)

Before: 31999 tokens requested → 8192 tokens actual (÷4 applied)
After: 31999 tokens requested → 32768 tokens actual (full "high" budget)
When using /v1/chat/completions with Opus and reasoning_effort="high" or "medium", automatically set custom_reasoning_budget=true to get full thinking tokens instead of the ÷4 reduced default. This makes the OpenAI endpoint behave consistently with the Anthropic endpoint for Opus models - if you're using Opus with high reasoning, you want the full thinking budget. Adds logging: "🧠 Thinking: auto-enabled custom_reasoning_budget for Opus"
…treaming

Claude Code and other Anthropic SDK clients require message_start to be sent before any other SSE events. When a stream completed quickly without content chunks, the wrapper would send message_stop without message_start, causing clients to silently discard all output.
Signed-off-by: Moeeze Hassan <[email protected]>
This reverts commit e80645e.
Follow-up Review
This incremental update adds a tool result thinking reminder feature for Claude and strengthens the interleaved thinking enforcement.
Assessment of New Changes:
The implementation is clean and well-integrated:
- Tool Result Thinking Reminder: New feature that appends a `<system-reminder>` block to tool results for Claude, reinforcing the interleaved thinking requirement. The implementation correctly handles both multimodal (list) and string content paths. Configurable via the `ANTIGRAVITY_ENABLE_CLAUDE_TOOL_RESULT_REMINDER` env var.
- Strengthened Interleaved Thinking Hint: The hint is now both prepended AND appended to system instructions. The prompt language was also strengthened ("IS UNCOMPROMISINGLY A MUST DO"). This is a reasonable strategy to improve Claude's compliance with thinking requirements.
- Code Formatting: Import reordering and consistent line-break formatting throughout. No functional impact.
Minor Observations (not blocking):
- The duplicate hint injection (prepend + append) doubles the token overhead for this instruction. If token budget ever becomes a concern, one injection point may suffice.
- For dict tool results, the `_system_reminder` key could theoretically collide with tool output. A more unique key like `__anthropic_system_reminder__` would reduce this edge-case risk.
Previous Feedback Status:
Two non-blocking suggestions from my 2026-01-02 review remain open but are unrelated to this update:
- ⚠️ Code duplication: `_reorder_assistant_content()` duplicated between `translator.py` and `antigravity_provider.py`
- ⚠️ Mutation side effect: `_merge_all_of()` mutates the input schema in-place
Overall Status:
The new changes are solid and ready to merge. No blocking issues identified.
This review was generated by an AI assistant.
- Integrate refactored utilities from dev (gemini_shared_utils, etc.)
- Keep Antigravity system prompts (critical for API)
- Add Claude interleaved thinking features from feature branch
- Ensure interleaved thinking hint comes AFTER Antigravity prompts
- Add explicit_budget parameter to _get_thinking_config
- Cap Claude thinking budget at 31999 when explicit budget provided
- Pass thinking_budget kwarg from Anthropic translator to provider
Ignore client's budget_tokens value and always use 31999 for Claude via Anthropic routes to ensure full thinking capacity.
Follow-up Review
This update is a significant refactoring that extracts shared logic between Antigravity and Gemini CLI providers into reusable utility modules, reducing code duplication and improving maintainability.
Key Changes Reviewed:
- New Utility Modules (`providers/utilities/`):
  - `gemini_shared_utils.py`: common constants (`FINISH_REASON_MAP`, `DEFAULT_SAFETY_SETTINGS`) and helpers (`env_bool`, `env_int`, `inline_schema_refs`, etc.)
  - `gemini_file_logger.py`: transaction logging classes
  - `gemini_tool_handler.py`: tool schema transformation mixin
  - `gemini_credential_manager.py`: credential/tier management mixin
  - `base_quota_tracker.py`: abstract base for quota tracking
  - `gemini_cli_quota_tracker.py`: Gemini CLI specific quota tracking
- Antigravity System Instruction Injection:
  - New `ANTIGRAVITY_AGENT_SYSTEM_INSTRUCTION` prepended to requests (mimics the CLIProxyAPI Go implementation)
  - Identity override instruction to neutralize the Antigravity persona for user-defined identities
  - Configurable via `ANTIGRAVITY_PREPEND_INSTRUCTION`, `ANTIGRAVITY_INJECT_IDENTITY_OVERRIDE`, `ANTIGRAVITY_USE_SHORT_PROMPTS`
- Simplified Thinking Logic:
  - Removed `_is_valid_thinking_signature()` validation checks
  - Removed `custom_reasoning_budget` parameter (simplified to just `reasoning_effort`)
  - Removed `_merge_all_of()` schema helper (addresses previous feedback ✅)
  - Added fallback signature `"skip_thought_signature_validator"` when the signature is missing
- Quota Tracking Updates:
  - Changed from percentage-based (`DEFAULT_QUOTA_COSTS`) to max-requests-based (`DEFAULT_MAX_REQUESTS`)
  - Added `normalize_model_for_tracking()` for public-facing model name normalization
  - Updated quota groups: `g3-pro`, `g3-flash`, `g25-flash`, `g25-lite` (separated Flash Lite)
- env:// Credential Support:
  - Multiple environment credentials via the `PROVIDER_N_ACCESS_TOKEN` pattern
  - Documented in README and DOCUMENTATION.md
Previous Feedback Status:
| Feedback | Status |
|---|---|
| `_merge_all_of()` mutation side effect | ✅ Resolved (function removed) |
| `_reorder_assistant_content()` duplication | ⚠️ Still open: `translator.py:18` and `antigravity_provider.py:466` (low priority) |
Potential Behavioral Changes to Verify:
- Cached Token Tracking Removed (`antigravity_provider.py:_build_usage()`):
  - `cachedContentTokenCount` is no longer extracted from Gemini metadata
  - `prompt_tokens_details.cached_tokens` is no longer populated
  - This field is used by the Anthropic translator to calculate `cache_read_input_tokens`
  - Impact: requests via the Anthropic compat layer will not show cache hit information
  - Question: intentional simplification or regression?
- anthropic-beta Header Removed:
  - The `anthropic-beta: interleaved-thinking-2025-05-14` header is no longer added for Claude thinking models
  - Previously added in `acompletion()` when `thinking_enabled=True`
  - Question: is this no longer required by the API?
- Claude Interleaved Thinking Hint:
  - Now only prepends the hint (removed the additional append at the end of system instructions)
  - Simplified from prepend+append to just prepend
Overall Status:
The refactoring is well-structured and significantly improves code organization. The behavioral changes noted above are non-blocking but worth verifying they were intentional.
This review was generated by an AI assistant.
Follow-up Review
This incremental update contains code formatting improvements and a thinking budget simplification in translator.py.
Assessment of New Changes:
- Formatting cleanup (lines 9-13, 20-24, 32-35, 43-46, 83-88): Multi-line formatting for better readability on signature checks, image_url dicts, and regex patterns. No functional impact.
- Thinking budget simplification (lines 53-76): Now always uses `31999` as the thinking budget for Claude via Anthropic routes, removing the previous pass-through of the client's `budget_tokens`. This is a reasonable simplification since 31999 is Claude's max thinking budget. Note that clients specifying lower budgets will now receive max thinking regardless.
- Opus model handling: Also applies the fixed budget for Opus models, ensuring consistent behavior.
Previous Feedback Status:
| Feedback | Status |
|---|---|
| `_reorder_assistant_content()` duplication | ⚠️ Still open: `translator.py:18` and `antigravity_provider.py` (low priority) |
| `_merge_all_of()` mutation side effect | ✅ Resolved (function removed) |
| Cached token tracking removal | ℹ️ Noted in previous review - awaiting clarification |
Overall Status:
Clean incremental update with no issues identified. The changes improve code formatting and simplify thinking configuration. Ready to merge.
This review was generated by an AI assistant.
…y lacks thinking blocks
Follow-up Review
This incremental update adds thinking history compatibility checking to the Anthropic translator, ensuring Claude API compliance when thinking is enabled mid-conversation.
Assessment of New Changes:
The implementation is clean and addresses a real edge case:
- `_history_supports_thinking()` (lines 500-512): Correctly detects when conversation history contains assistant messages without thinking blocks. The logic properly handles the first-turn case (no assistant messages = compatible) and iterates through all assistant messages to verify thinking consistency.
- `_inject_continue_for_fresh_thinking_turn()` (lines 515-517): Simple utility that appends a synthetic user message to close the current turn, allowing Claude to start fresh with thinking enabled.
- Integration (lines 576-588): The `[Continue]` injection is correctly gated on `not thinking_compatible` for both the explicit thinking config and the implicit Opus model handling.
This mirrors similar logic in antigravity_provider.py (lines 1537-1610) but at the translator layer, providing earlier detection before messages reach the provider. Good layered approach.
Previous Feedback Status:
| Feedback | Status |
|---|---|
| `_reorder_assistant_content()` duplication | ⚠️ Still open (`translator.py:18`, `antigravity_provider.py`) - low priority |
| `_merge_all_of()` mutation side effect | ✅ Resolved (function removed) |
| Cached token tracking removal | ℹ️ Noted previously - awaiting clarification |
Overall Status:
Clean implementation. One minor inline comment for consideration. Ready to merge.
This review was generated by an AI assistant.
```python
def _inject_continue_for_fresh_thinking_turn(openai_messages: List[dict]) -> List[dict]:
    openai_messages.append({"role": "user", "content": "[Continue]"})
    return openai_messages
```
Minor (non-blocking): This function mutates the input list in-place while also returning it. Consider using return openai_messages + [{"role": "user", "content": "[Continue]"}] to avoid side effects, which would be safer for future maintainability.
The current implementation works correctly - this is just a stylistic suggestion.
Token counting endpoints (/v1/token-count and /v1/messages/count_tokens) were returning inaccurate counts because they didn't include the Antigravity preprompts that get injected during actual API calls.

- Add get_antigravity_preprompt_text() helper to expose preprompt text
- Update RotatingClient.token_count() to add preprompt tokens for Antigravity provider models

Signed-off-by: Moeeze Hassan <[email protected]>
Follow-up Review
This incremental update adds preprompt-aware token counting for Antigravity provider models, ensuring token count endpoints accurately reflect actual API usage.
Assessment of New Changes:
The implementation is clean and well-designed:
- `get_antigravity_preprompt_text()` (`antigravity_provider.py:452-485`): New helper function that correctly mirrors the logic from `_inject_antigravity_system_instruction()`, respecting all three configuration flags (`PREPEND_INSTRUCTION`, `USE_SHORT_ANTIGRAVITY_PROMPTS`, `INJECT_IDENTITY_OVERRIDE`). Good docstring explaining the purpose.
- `token_count()` enhancement (`client.py:2489-2504`): Properly detects Antigravity provider models and adds preprompt tokens to the base count. The lazy import pattern (`from .providers.antigravity_provider import ...`) avoids circular dependency issues, and the `try/except ImportError` provides a graceful fallback.
Technical Notes:
- Token counts won't include Claude-specific hints (interleaved thinking reminder) that are added conditionally at runtime—this is an acceptable simplification since those hints are relatively small and model-dependent.
Previous Feedback Status:
| Feedback | Status |
|---|---|
| `_reorder_assistant_content()` duplication | ⚠️ Still open (`translator.py`, `antigravity_provider.py`) - low priority cleanup |
| All other previous feedback | ✅ Addressed |
Overall Status:
The changes are solid, well-documented, and ready to merge. No issues identified.
This review was generated by an AI assistant.
Follow-up Review
This incremental update adds full URL support to the Quota Viewer TUI, allowing users to configure remotes using complete URLs (e.g., https://api.example.com/v1) instead of just host:port pairs.
Assessment of New Changes:
The implementation is clean and well-structured:
- `is_full_url()` helper: Simple and correct check for URL prefixes
- `_build_endpoint_url()` with path deduplication: Intelligent handling to prevent `/v1/v1/...` duplication when the base URL already contains a path segment. Good docstring with clear examples (a sketch of the idea follows this list).
- `check_connection()` update: Correctly hits the root domain rather than the path (which would 404)
- Display formatting: Properly handles display of full URLs vs host:port pairs throughout the UI
- Config type updates: `port` now accepts `Optional[Union[int, str]]` with proper normalization
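A sketch of the deduplication idea described above (helper names and exact behavior are assumptions, not the TUI's actual code):

```python
from urllib.parse import urlsplit

def is_full_url(remote: str) -> bool:
    # Anything with an explicit scheme is treated as a full URL.
    return remote.startswith(("http://", "https://"))

def build_endpoint_url(base: str, path: str = "/v1/models") -> str:
    # If the base URL already ends with the first segment of `path`
    # (e.g., base "https://api.example.com/v1" and path "/v1/models"),
    # drop the duplicate segment so we don't emit ".../v1/v1/models".
    base_path = urlsplit(base).path.rstrip("/")
    first_segment = "/" + path.lstrip("/").split("/", 1)[0]
    if base_path.endswith(first_segment):
        path = path[len(first_segment):]
    return base.rstrip("/") + path
```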
The user experience improvements (hints in add/edit dialogs, wider display columns for long URLs) are thoughtful additions.
Previous Feedback Status:
| Feedback | Status |
|---|---|
| `_reorder_assistant_content()` duplication | ⚠️ Still open |
| All other previous feedback | ✅ Addressed |
Overall Status:
No issues identified. The changes are solid and ready to merge.
This review was generated by an AI assistant.
Resolve conflicts by adopting dev's interleaved thinking implementation and removing duplicate code from this branch.
Remove remaining references to removed interleaved thinking attributes that were brought in during merge.
Follow-up Review
This incremental update brings significant improvements across workflow reliability, logging architecture, and Antigravity provider capabilities.
Key Changes Reviewed:
- Workflow Hardening (bot-reply.yml, pr-review.yml, compliance-check.yml):
  - Bot user checks prevent infinite loops when bot comments trigger bot actions
  - Comment validation filters triggers in code blocks/quotes (prevents false positives)
  - Clean conditional propagation via the `should_proceed` output
  - Build workflow now correctly scoped to dev/main branches
- Logging Architecture Refactoring:
  - New `TransactionLogger` in `transaction_logger.py` provides unified, correlated logging
  - Clear separation: "Transaction Logging" (library-level with provider correlation) vs "Raw I/O Logging" (proxy-boundary HTTP capture)
  - `RawIOLogger` (renamed from `DetailedLogger`) clarifies purpose
  - `ProviderLogger` base class with `AntigravityProviderLogger` extension
  - Removed scattered provider-specific loggers in favor of the centralized system
- Antigravity Provider Enhancements:
  - Tool name sanitization: new `_sanitize_tool_name()`/`_restore_tool_name()` handles invalid characters (slashes) with collision detection
  - Schema handling: new `_score_schema_option()`, `_try_merge_enum_from_union()`, `_merge_all_of()` for intelligent anyOf/oneOf/allOf handling
  - Interleaved thinking: `CLAUDE_INTERLEAVED_THINKING_HINT` and user-message reminder injection
  - Unified streaming: non-streaming now collects from streaming internally (simpler code path)
  - Thinking model detection: Opus/`-thinking`-suffix models always enable thinking
- Library Updates:
  - Version bump: 1.05 → 1.15
  - Python requirement: 3.7 → 3.10
Previous Feedback Status:
| Feedback | Status |
|---|---|
| `_reorder_assistant_content()` duplication | ⚠️ Still open: `translator.py:18` and `antigravity_provider.py:538` (low-priority cleanup) |
| `_merge_all_of()` mutation concern | ℹ️ Re-introduced with shallow copy; nested mutation still possible but likely acceptable given usage context |
Overall Status:
Excellent incremental update with no blocking issues. The logging consolidation significantly improves debuggability, and the workflow hardening prevents potential bot loops. Ready to merge.
This review was generated by an AI assistant.
Bring in latest dev changes including:
- Docker support and workflows
- Fair cycle rotation and custom usage caps
- Smart cooldown waiting and fail-fast logic
- Centralized library defaults
- Dynamic custom OpenAI-compatible provider system
- Interactive connection recovery
…remove legacy hacks

Replaces ad-hoc thinking logic with a structured mapping from Anthropic `budget_tokens` to `reasoning_effort` levels. This change aligns the translation layer with standard provider capabilities and cleans up deprecated workarounds.

- Implement `_budget_to_reasoning_effort` to convert token counts to reasoning levels (e.g., "low", "medium", "high", "granular").
- Remove legacy logic that forced max thinking budget for Claude Opus models.
- Remove workaround for injecting "[Continue]" messages into conversation history.
- Delete unused helper functions in `AntigravityProvider` (signature validation, content reordering, and explicit budget overrides).
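A minimal sketch of such a mapping, with thresholds borrowed from the 32000/10000/5000 tiers mentioned earlier in this thread; the actual cutoffs and the semantics of "granular" in the merged code are not confirmed here:

```python
def budget_to_reasoning_effort(budget_tokens: int | None) -> str | None:
    # Illustrative tiers only; the merged implementation may differ.
    if budget_tokens is None:
        return None
    if budget_tokens >= 32000:
        return "high"
    if budget_tokens >= 10000:
        return "medium"
    if budget_tokens >= 5000:
        return "low"
    # The commit message also mentions a "granular" level for other cases.
    return "granular"
```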
Hey @FammasMaz! 👋 I've been reviewing this PR and did some refactoring to make the Anthropic translation layer more provider-agnostic. Changes proposed in #76:
The goal is to keep the translation layer pure: just format conversion between Anthropic ↔ OpenAI, without provider-specific logic baked in.

See: #76

AI-generated^^
…ompatibility

This change introduces a hierarchical logging structure to better trace requests passing through the translation layer.

- Update `TransactionLogger` to support nested directories (`parent_dir`) and custom filenames, allowing internal OpenAI transactions to be logged as children of the original Anthropic request.
- Implement full response reconstruction in `anthropic_streaming_wrapper` to accumulate and log the final state of streaming interactions (including thinking blocks and tool calls).
- Modify `RotatingClient` to pass logging context down to the translation layer.
- Switch `proxy_app` to use `RawIOLogger` when enabled for better debugging of the proxy boundary.
The previous implementation using `delta.get("tool_calls", [])` would return `None` if the provider explicitly sent `"tool_calls": null`, bypassing the default value. This change ensures `tool_calls` always resolves to a list using the `or []` pattern, preventing potential errors during iteration.
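The difference is easy to demonstrate:

```python
delta = {"tool_calls": None}  # provider explicitly sent "tool_calls": null

delta.get("tool_calls", [])    # -> None: the default only applies when the key is absent
delta.get("tool_calls") or []  # -> []: always safe to iterate
```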
Updates project documentation to reflect the new Anthropic API compatibility features:

- README.md: Add setup guides for Claude Code and Anthropic Python SDK, plus API endpoint details.
- DOCUMENTATION.md: Add deep dive into the `anthropic_compat` architecture, including translation logic and streaming behavior.
- Library Docs: Document `anthropic_messages` and `anthropic_count_tokens` methods in `rotator_library`.
Important
Adds Anthropic endpoint for Claude code with format translation and updates to Antigravity provider for compatibility.
- `/v1/messages` and `/v1/messages/count_tokens` endpoints in `main.py` for Anthropic API compatibility.
- `anthropic_messages` and `anthropic_count_tokens` functions to handle requests and responses in Anthropic format.
- `AnthropicTextBlock`, `AnthropicImageSource`, `AnthropicImageBlock`, `AnthropicToolUseBlock`, `AnthropicToolResultBlock`, `AnthropicMessage`, `AnthropicTool`, `AnthropicThinkingConfig`, `AnthropicMessagesRequest`, `AnthropicUsage`, `AnthropicMessagesResponse`, `AnthropicCountTokensRequest`, and `AnthropicCountTokensResponse` models in `main.py`.
- `anthropic_to_openai_messages`, `anthropic_to_openai_tools`, `anthropic_to_openai_tool_choice`, and `openai_to_anthropic_response` in `main.py` for format conversion.
- `anthropic_streaming_wrapper` in `main.py` to convert OpenAI streaming format to Anthropic streaming format.
- `_transform_to_antigravity_format` in `antigravity_provider.py` to ensure `max_tokens` is greater than `thinking.budget_tokens` for Claude models.

This description was created by
for aef492b. You can customize this summary. It will automatically update as commits are pushed.