Skip to content

Conversation

@mirrobot-agent
Copy link
Contributor

@mirrobot-agent mirrobot-agent bot commented Dec 19, 2025

Summary

This PR refactors the Anthropic endpoint support from PR #45 by moving the translation layer into the rotator_library as a proper, reusable module.

Related to: #45

Changes

New Library Module: rotator_library/anthropic_compat/

  • models.py: Pydantic models for Anthropic API (requests, responses, content blocks)
  • translator.py: Format translation functions between Anthropic and OpenAI formats
  • streaming.py: Framework-agnostic streaming wrapper that converts OpenAI SSE to Anthropic SSE
  • __init__.py: Public exports

Updated rotator_library/client.py

Added two new methods to RotatingClient:

  • anthropic_messages() - Handle Anthropic Messages API requests
  • anthropic_count_tokens() - Handle token counting

Simplified proxy_app/main.py

  • Removed ~663 lines of local Anthropic code
  • Now imports models and functions from rotator_library.anthropic_compat
  • Endpoints work the same way but use library components
  • Fixed verify_anthropic_api_key to support open access mode

Benefits

  1. Reusability: The Anthropic translation layer can now be used by other applications using rotator_library
  2. Maintainability: Clear separation between library code and application code
  3. Testability: Library components can be unit tested independently
  4. Consistency: Follows the existing library architecture patterns

Files Changed

src/rotator_library/
├── anthropic_compat/
│   ├── __init__.py          (NEW)
│   ├── models.py            (NEW)
│   ├── translator.py        (NEW)
│   └── streaming.py         (NEW)
├── client.py                (MODIFIED)
└── __init__.py              (MODIFIED)

src/proxy_app/
└── main.py                  (MODIFIED)

Important

Refactor Anthropic translation layer into a reusable library module for improved code reusability and maintainability.

  • New Library Module: rotator_library/anthropic_compat/
    • models.py: Pydantic models for Anthropic API requests and responses.
    • translator.py: Functions for translating between Anthropic and OpenAI formats.
    • streaming.py: Converts OpenAI SSE to Anthropic SSE.
  • Client Updates in client.py:
    • Added anthropic_messages() and anthropic_count_tokens() methods to RotatingClient.
  • Main Application Simplification in main.py:
    • Removed ~663 lines of Anthropic-specific code.
    • Now uses rotator_library.anthropic_compat for Anthropic functionality.
    • Updated verify_anthropic_api_key() to support open access mode.

This description was created by Ellipsis for 9d30ea6. You can customize this summary. It will automatically update as commits are pushed.

FammasMaz and others added 18 commits December 19, 2025 15:03
…atibility

- Add /v1/messages endpoint with Anthropic-format request/response
- Support both x-api-key and Bearer token authentication
- Implement Anthropic <-> OpenAI format translation for messages, tools, and responses
- Add streaming wrapper converting OpenAI SSE to Anthropic SSE events
- Handle tool_use blocks with proper stop_reason detection
- Fix NoneType iteration bug in tool_calls handling
- Add AnthropicThinkingConfig model and thinking parameter to request
- Translate Anthropic thinking config to reasoning_effort for providers
- Handle reasoning_content in streaming wrapper (thinking_delta events)
- Convert reasoning_content to thinking blocks in non-streaming responses
When no thinking config is provided in the request, Opus models now
automatically use reasoning_effort=high with custom_reasoning_budget=True.

This ensures Opus 4.5 uses the full 32768 token thinking budget instead
of the backend's auto mode (thinkingBudget: -1) which may use less.

Opus always uses the -thinking variant regardless, but this change
guarantees maximum thinking capacity for better reasoning quality.
…ling

- Add validation to ensure maxOutputTokens > thinkingBudget for Claude
  extended thinking (prevents 400 INVALID_ARGUMENT API errors)
- Improve streaming error handling to send proper message_start and
  content blocks before error event for better client compatibility
- Minor code formatting improvements
Track each tool_use block index separately and emit content_block_stop
for all blocks (thinking, text, and each tool_use) when stream ends.
Fixes Claude Code stopping mid-action due to malformed streaming events.
…nabled

- Fixed bug where budget_tokens between 10000-32000 would get ÷4 reduction
- Now any explicit thinking request sets custom_reasoning_budget=True
- Added logging to show thinking budget, effort level, and custom_budget flag
- Simplified budget tier logic (removed redundant >= 32000 check)

Before: 31999 tokens requested → 8192 tokens actual (÷4 applied)
After:  31999 tokens requested → 32768 tokens actual (full "high" budget)
When using /v1/chat/completions with Opus and reasoning_effort="high" or
"medium", automatically set custom_reasoning_budget=true to get full
thinking tokens instead of the ÷4 reduced default.

This makes the OpenAI endpoint behave consistently with the Anthropic
endpoint for Opus models - if you're using Opus with high reasoning,
you want the full thinking budget.

Adds logging: "🧠 Thinking: auto-enabled custom_reasoning_budget for Opus"
…treaming

Claude Code and other Anthropic SDK clients require message_start to be
sent before any other SSE events. When a stream completed quickly without
content chunks, the wrapper would send message_stop without message_start,
causing clients to silently discard all output.
Extract Anthropic API models and format translation functions from main.py into reusable library module:

- models.py: Pydantic models for Anthropic Messages API (request/response)
- translator.py: Functions to convert between Anthropic and OpenAI formats
  - anthropic_to_openai_messages()
  - anthropic_to_openai_tools()
  - anthropic_to_openai_tool_choice()
  - openai_to_anthropic_response()
  - translate_anthropic_request() - High-level request translation

This is part of the refactoring to make Anthropic compatibility a proper library feature.
Add framework-agnostic streaming wrapper for Anthropic format:

- streaming.py: Converts OpenAI SSE format to Anthropic SSE format
  - Handles message_start, content_block_start/delta/stop, message_delta, message_stop
  - Supports text, thinking, and tool_use content blocks
  - Uses callback-based disconnect detection instead of FastAPI Request
  - Proper error handling with client-visible error blocks

- __init__.py: Export all models, translator functions, and streaming wrapper

The streaming wrapper is now reusable outside of FastAPI.
…ethods

Add high-level Anthropic API methods to RotatingClient:

- anthropic_messages(): Handle Anthropic Messages API requests
  - Accepts AnthropicMessagesRequest, translates to OpenAI format
  - Routes through existing acompletion() with full retry/rotation logic
  - Returns response in Anthropic format (streaming or non-streaming)

- anthropic_count_tokens(): Handle token counting for Anthropic requests
  - Counts tokens for messages and tools
  - Returns count in Anthropic format

These methods enable any application using rotator_library to support
Anthropic clients without needing to implement format translation.
…rary

Add lazy-loaded exports for Anthropic API models:

- AnthropicMessagesRequest
- AnthropicMessagesResponse
- AnthropicCountTokensRequest
- AnthropicCountTokensResponse

These can now be imported directly from rotator_library:
  from rotator_library import AnthropicMessagesRequest
Refactor main.py to use the anthropic_compat library module:

- Import models from rotator_library.anthropic_compat instead of local definitions
- Import translation functions from the library
- Remove ~130 lines of local Anthropic model definitions
- Keep endpoints functional by importing needed functions

The endpoints still work the same way, but now use the library's
reusable components. This is a stepping stone - the full client
methods (anthropic_messages, anthropic_count_tokens) can be used
for even simpler endpoints in the future.

Net reduction: ~130 lines removed from main.py
@mirrobot-agent
Copy link
Contributor Author

Time to examine my own handiwork! Let's see what past-me was thinking when moving 1,300+ lines of Anthropic compatibility code into the library. This should be interesting... 🔍

Copy link
Contributor Author

@mirrobot-agent mirrobot-agent bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Self-Review Assessment

Well, well, well... time to grade my own homework! 📝

Overall, this is a solid refactoring that moves the Anthropic compatibility layer from proxy_app into rotator_library as a reusable module. The architecture follows the existing library patterns, and the separation of concerns is clear. Past-me did a decent job here... mostly.

Architectural Reflections

The module structure (models.py, translator.py, streaming.py) is clean and intuitive. I particularly like that:

  • The streaming wrapper is framework-agnostic (accepts an optional disconnect callback rather than a FastAPI Request)
  • The lazy-loading in __init__.py maintains fast import times
  • The new RotatingClient methods provide a clean interface for Anthropic requests

Key Fixes I Should Make

  1. Code duplication in main.py: I created a nice translate_anthropic_request() function but then... didn't use it in the endpoint? Classic past-me.

  2. Type validation in models: The AnthropicThinkingConfig.type field accepts any string. Should use Literal["enabled", "disabled"] for proper validation.

  3. Silent JSON failures: When parsing tool call arguments fails, I silently return an empty dict. Adding a debug log would make troubleshooting much easier.

Things That Actually Look Good

  • The antigravity_provider.py addition for validating max_tokens > thinkingBudget is a good defensive check
  • Error handling in the streaming wrapper properly ensures message_start is sent before any error events
  • The token counting endpoint correctly handles both messages and tool definitions

Overall, this PR is ready to merge once the minor improvements above are addressed. Future-me thanks past-me for the decent documentation and clean module structure!

This self-review was generated by an AI assistant.

Comment on lines +1036 to +1089
# Convert Anthropic request to OpenAI format
anthropic_request = body.model_dump(exclude_none=True)

openai_messages = anthropic_to_openai_messages(
anthropic_request.get("messages", []), anthropic_request.get("system")
)

openai_tools = anthropic_to_openai_tools(anthropic_request.get("tools"))
openai_tool_choice = anthropic_to_openai_tool_choice(
anthropic_request.get("tool_choice")
)

# Build OpenAI-compatible request
openai_request = {
"model": body.model,
"messages": openai_messages,
"max_tokens": body.max_tokens,
"stream": body.stream or False,
}

if body.temperature is not None:
openai_request["temperature"] = body.temperature
if body.top_p is not None:
openai_request["top_p"] = body.top_p
if body.stop_sequences:
openai_request["stop"] = body.stop_sequences
if openai_tools:
openai_request["tools"] = openai_tools
if openai_tool_choice:
openai_request["tool_choice"] = openai_tool_choice

# Handle Anthropic thinking config -> reasoning_effort translation
if body.thinking:
if body.thinking.type == "enabled":
# Map budget_tokens to reasoning_effort level
# Default to "medium" if enabled but budget not specified
budget = body.thinking.budget_tokens or 10000
if budget >= 32000:
openai_request["reasoning_effort"] = "high"
openai_request["custom_reasoning_budget"] = True
elif budget >= 10000:
openai_request["reasoning_effort"] = "high"
elif budget >= 5000:
openai_request["reasoning_effort"] = "medium"
else:
openai_request["reasoning_effort"] = "low"
elif body.thinking.type == "disabled":
openai_request["reasoning_effort"] = "disable"
elif "opus" in body.model.lower():
# Force high thinking for Opus models when no thinking config is provided
# Opus 4.5 always uses the -thinking variant, so we want maximum thinking budget
# Without this, the backend defaults to thinkingBudget: -1 (auto) instead of high
openai_request["reasoning_effort"] = "high"
openai_request["custom_reasoning_budget"] = True
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, it seems past-me got a bit ahead of himself! I specifically created translate_anthropic_request() in the library to handle this translation, but then proceeded to duplicate the logic here manually.

The thinking budget mapping logic (lines 1067-1089) is nearly identical to what I put in translator.py (lines 283-304). This could lead to drift if one is updated without the other.

I should either:

  1. Use translate_anthropic_request() directly, or
  2. Document why the endpoint needs different handling than the library method

Comment on lines +80 to +84
class AnthropicThinkingConfig(BaseModel):
"""Anthropic thinking configuration."""

type: str # "enabled" or "disabled"
budget_tokens: Optional[int] = None
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Past-me was feeling a bit lazy here. The type field should be constrained to valid values ("enabled" or "disabled"). Using a plain str allows invalid values to slip through validation.

Consider using:

from typing import Literal
# ...
type: Literal["enabled", "disabled"]

This gives proper IDE autocomplete and catches typos at validation time.

Comment on lines +246 to +249
try:
input_data = json.loads(func.get("arguments", "{}"))
except json.JSONDecodeError:
input_data = {}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My past-self used a silent fallback here - while pragmatic, it could mask issues where tool arguments are malformed. When tools mysteriously receive empty inputs, this will be hard to debug.

A quick logging statement would help future-me:

except json.JSONDecodeError as e:
    logging.debug(f"Failed to parse tool arguments: {e}")
    input_data = {}

@Mirrowel Mirrowel added enhancement New feature or request Agent Monitored Monitored for AI Agent to review PR's and commits Priority labels Dec 19, 2025
@b3nw
Copy link
Contributor

b3nw commented Jan 9, 2026

BUG Report: Anthropic Compatibility & Claude Code Integration

Executive Summary

This document tracks critical bugs identified and resolved during the testing of PR #47 (Anthropic Compatibility) when used with the Claude Code v2.1.2 CLI client.

Resolved Issues:

  • Antigravity 400 Error (Tool Schema): Prevented crashes during tool-use by stripping unsupported enumDescriptions from JSON schemas.
  • Streaming Wrapper TypeError: Fixed a 500 Internal Server Error caused by a parameter mismatch in the async streaming generator.
  • Redundant Translation Logic: Refactored main.py to use core library methods, ensuring consistent behavior and easier maintenance.
  • Pydantic Validation Failures: Relaxed validation to allow "extra" fields, supporting Claude Code's newer beta features like interleaved-thinking.

1. Antigravity API 400 Error (Tool Schemas)

Issue Description

Requests failed with a 400 Bad Request error from the Antigravity backend whenever Claude Code attempted to use tools (indexing, bash commands, etc.).

Root Cause

The Antigravity API is a strict Proto-based interface. Claude Code includes an enumDescriptions field in its tool parameter schemas which is not part of the standard Google Gemini tool specification.

Error Message

"message": "Invalid JSON payload received. Unknown name \"enumDescriptions\" at 'request.tools[0]... Cannot find field."

Resolution

Modified antigravity_provider.py to recursively strip enumDescriptions from all tool definitions before sending the request to the Google backend.


2. TypeError in Anthropic Streaming Wrapper

Issue Description

Claude Code sessions triggered a server-side 500 error immediately upon starting a stream.

Root Cause

A parameter mismatch in the /v1/messages endpoint within main.py. The code was incorrectly passing the FastAPI Request object into a generator that expected an openai_stream object, causing an iteration failure.

Error Message

Error in Anthropic streaming wrapper: 'async for' requires an object with __aiter__ method, got Request
TypeError: Object of type async_generator is not JSON serializable

Resolution

Refactored the endpoints in main.py to delegate request handling to the RotatingClient.anthropic_messages() method, which handles the parameter passing and stream wrapping correctly.


3. Pydantic 422 "Unprocessable Content"

Issue Description

Requests containing newer Anthropic beta headers (like interleaved-thinking-2025-05-14) were rejected with a 422 error before reaching the provider logic.

Root Cause

The Pydantic models for AnthropicMessagesRequest were too strict and did not allow extra fields. Claude Code frequently adds experimental fields to its JSON payload.

Resolution

Updated models.py to set model_config = ConfigDict(extra="allow") for all Anthropic request objects, ensuring future-proof compatibility with evolving Anthropic SDKs.


@b3nw
Copy link
Contributor

b3nw commented Jan 9, 2026

Issue: Hardcoded Background Model Probes in Claude Code

Description

When using Claude Code (v2.1.2+) with the LLM-API-Key-Proxy, the client performs background "probes" for specific model names that are not explicitly requested by the user in the --model flag.

These probes appear in the proxy logs as requests for models such as:

  • claude-haiku-4-5-20251001
  • gemini-claude-opus-4-5-thinking
  • claude-3-5-sonnet-20241022

Root Cause

These model names are hardcoded into the Claude Code binary for two primary purposes:

  1. Capability Discovery: Checking if the endpoint supports specific beta features (e.g., interleaved-thinking, computer-use).
  2. Task Scaling: Attempting to use a cheaper "Haiku-class" model for background indexing or file scanning tasks to optimize performance and cost.

Because these requests are generated internally by the client, they lack the antigravity/ provider prefix and use futuristic version strings that do not exist in the proxy's default mapping tables, resulting in 404 Not Found errors from the backend.

Impact

  • Endless Thinking: Claude Code may hang or show a "thinking" spinner indefinitely while it retries these failed background probes.
  • Log Noise: Proxy logs are cluttered with 400/404 errors for models that are not configured.
  • Session Instability: Critical background tasks like code indexing may fail to complete.

Recommended Workaround

Since this behavior is hardcoded in the client and cannot be disabled via --no-stats or environment variables, the recommended solution is to provide a "safety net" mapping in the Proxy's configuration.

Add the following to your LLM-API-Key-Proxy/.env file to funnel these probes into the models you actually have credentials for:

# Map prefix-less background probes to the Antigravity provider
ANTIGRAVITY_MODELS='{
  "claude-haiku-4-5-20251001": {"id": "claude-sonnet-4-5"},
  "gemini-claude-opus-4-5-thinking": {"id": "claude-opus-4-5"},
  "claude-3-5-sonnet-20241022": {"id": "claude-sonnet-4-5"}
}'

Status

@Mirrowel Mirrowel closed this Jan 15, 2026
@Mirrowel Mirrowel deleted the refactor/anthropic-library-integration branch January 16, 2026 02:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Agent Monitored Monitored for AI Agent to review PR's and commits enhancement New feature or request Priority

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants