Merged

40 commits
- `b4df352` feat(proxy): add Anthropic Messages API endpoint for Claude Code comp… (FammasMaz, Dec 10, 2025)
- `7e229f4` feat(anthropic): add extended thinking support to /v1/messages endpoint (FammasMaz, Dec 12, 2025)
- `7aea08e` feat(anthropic): force high thinking budget for Opus models by default (FammasMaz, Dec 12, 2025)
- `05d89a2` fix: ensure max_tokens exceeds thinking budget and improve error hand… (FammasMaz, Dec 13, 2025)
- `e35f3f0` fix(anthropic): properly close all content blocks in streaming wrapper (FammasMaz, Dec 14, 2025)
- `4ec92ec` fix(anthropic): add missing uuid import for /v1/messages endpoint (FammasMaz, Dec 14, 2025)
- `b70efdf` fix(anthropic): always set custom_reasoning_budget when thinking is e… (FammasMaz, Dec 14, 2025)
- `4bd879b` feat(openai): auto-enable full thinking budget for Opus (FammasMaz, Dec 14, 2025)
- `758b4b5` fix(anthropic): add missing JSONResponse import for non-streaming res… (FammasMaz, Dec 14, 2025)
- `f2d7288` fix(anthropic): ensure message_start is sent before message_stop in s… (FammasMaz, Dec 15, 2025)
- `de88557` feat: add /context endpoint for anthropic routes (FammasMaz, Dec 16, 2025)
- `beed0bc` Revert "feat(openai): auto-enable full thinking budget for Opus" (FammasMaz, Dec 19, 2025)
- `2c93a68` Revert "fix(anthropic): always set custom_reasoning_budget when think… (FammasMaz, Dec 19, 2025)
- `b19526c` refactor: Move Anthropic translation layer to rotator_library (FammasMaz, Dec 20, 2025)
- `d91f98b` fix(anthropic): improve model detection and document thinking budget (FammasMaz, Dec 20, 2025)
- `16c889f` fix(anthropic): handle images in tool results for Claude Code (FammasMaz, Dec 22, 2025)
- `545d0d5` fix(anthropic): force Claude thinking budget and interleaved hint (FammasMaz, Dec 31, 2025)
- `765df7a` fix(anthropic): read thinking budget from client request (FammasMaz, Dec 31, 2025)
- `5af1f10` fix(anthropic): handle thinking toggle for text-only assistant messages (FammasMaz, Jan 1, 2026)
- `0bb8a52` fix(anthropic): strengthen interleaved thinking hint (FammasMaz, Jan 1, 2026)
- `991a8e3` fix(antigravity): remove unreachable is_claude condition in thinking … (FammasMaz, Jan 1, 2026)
- `354ac17` fix(antigravity): add debug logging for non-data URL images (FammasMaz, Jan 1, 2026)
- `b81ca57` fix(anthropic): correct cache token handling in usage responses (FammasMaz, Jan 2, 2026)
- `97ef2d1` feat(anthropic): add 5 translation improvements from reference (FammasMaz, Jan 2, 2026)
- `dc19691` fix(antigravity): make interleaved thinking hint more explicit (FammasMaz, Jan 2, 2026)
- `5a8258c` fix(antigravity): reject requests exceeding Claude's 64K max_tokens l… (FammasMaz, Jan 5, 2026)
- `bbc1060` experimental: try to be more explicit about must think instruction (FammasMaz, Jan 5, 2026)
- `3fc1436` Merge origin/dev into feature/anthropic-endpoints (FammasMaz, Jan 8, 2026)
- `d4ad8af` feat(anthropic): respect explicit thinking_budget from Anthropic routes (FammasMaz, Jan 8, 2026)
- `9d568fe` feat(anthropic): always use max thinking budget (31999) for Claude (FammasMaz, Jan 8, 2026)
- `67ffea5` fix(anthropic): inject [Continue] for fresh thinking turn when histor… (FammasMaz, Jan 8, 2026)
- `b7b5d07` fix(token-count): include Antigravity preprompt tokens in count (FammasMaz, Jan 8, 2026)
- `4aa703f` Merge remote-tracking branch 'origin/dev' into feature/anthropic-endp… (FammasMaz, Jan 8, 2026)
- `9d4799e` Merge origin/dev into feature/anthropic-endpoints (FammasMaz, Jan 9, 2026)
- `49d2e47` fix(antigravity): remove stale interleaved thinking references (FammasMaz, Jan 10, 2026)
- `aa88eb3` Merge origin/dev into feature/anthropic-endpoints (Mirrowel, Jan 15, 2026)
- `8e10a66` refactor(rotator_library): 🔨 standardize thinking budget mapping and … (Mirrowel, Jan 15, 2026)
- `d9f2ddb` feat(logging): ✨ implement nested transaction logging for anthropic c… (Mirrowel, Jan 15, 2026)
- `6d9f9cc` fix(anthropic-compat): 🐛 handle null tool_calls in streaming delta (Mirrowel, Jan 15, 2026)
- `1798e75` docs: 📚 document anthropic api compatibility layer and client usage (Mirrowel, Jan 15, 2026)
103 changes: 103 additions & 0 deletions DOCUMENTATION.md
@@ -10,6 +10,7 @@ The project is a monorepo containing two primary components:
* **Batch Manager**: Optimizes high-volume embedding requests.
* **Detailed Logger**: Provides per-request file logging for debugging.
* **OpenAI-Compatible Endpoints**: `/v1/chat/completions`, `/v1/embeddings`, etc.
* **Anthropic-Compatible Endpoints**: `/v1/messages`, `/v1/messages/count_tokens` for Claude Code and other Anthropic API clients.
* **Model Filter GUI**: Visual interface for configuring model ignore/whitelist rules per provider (see Section 6).
2. **The Resilience Library (`rotator_library`)**: This is the core engine that provides high availability. It is consumed by the proxy app to manage a pool of API keys, handle errors gracefully, and ensure requests are completed successfully even when individual keys or provider endpoints face issues.

@@ -816,6 +817,108 @@ When a custom cap triggers a cooldown longer than the exhaustion threshold, it a

**Defaults:** See `src/rotator_library/config/defaults.py` for all configurable defaults.

### 2.21. Anthropic API Compatibility (`anthropic_compat/`)

A translation layer that enables Anthropic API clients (like Claude Code) to use any OpenAI-compatible provider through the proxy.

#### Architecture

The module consists of three components:

| File | Purpose |
|------|---------|
| `models.py` | Pydantic models for Anthropic request/response formats (`AnthropicMessagesRequest`, `AnthropicMessage`, `AnthropicTool`, etc.) |
| `translator.py` | Bidirectional format translation functions |
| `streaming.py` | SSE format conversion for streaming responses |

#### Request Translation (`translate_anthropic_request`)

Converts Anthropic Messages API requests to OpenAI Chat Completions format:

**Message Conversion:**
- Anthropic `system` field → OpenAI system message
- `content` blocks (text, image, tool_use, tool_result) → OpenAI format
- Image blocks with base64 data → OpenAI `image_url` with data URI
- Document blocks (PDF, etc.) → OpenAI `image_url` format
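
As a rough sketch of the image mapping, an Anthropic base64 image block becomes an OpenAI `image_url` part carrying a data URI (the helper name here is hypothetical, not the module's actual function):

```python
def image_block_to_openai(block: dict) -> dict:
    """Illustrative: convert an Anthropic base64 image block to an OpenAI image_url part."""
    source = block["source"]  # e.g. {"type": "base64", "media_type": "image/png", "data": "..."}
    data_uri = f"data:{source['media_type']};base64,{source['data']}"
    return {"type": "image_url", "image_url": {"url": data_uri}}
```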

**Tool Conversion:**
- Anthropic `tools` with `input_schema` → OpenAI `tools` with `parameters`
- `tool_choice.type: "any"` → `"required"`
- `tool_choice.type: "tool"` → `{"type": "function", "function": {"name": ...}}`
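
A minimal sketch of the tool and `tool_choice` mapping described above, with hypothetical helper names:

```python
def tool_to_openai(tool: dict) -> dict:
    """Map an Anthropic tool definition onto the OpenAI function-tool shape."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            "parameters": tool["input_schema"],  # JSON Schema carries over as-is
        },
    }

def tool_choice_to_openai(tool_choice: dict) -> object:
    """Apply the tool_choice mapping from the bullets above."""
    if tool_choice["type"] == "any":
        return "required"
    if tool_choice["type"] == "tool":
        return {"type": "function", "function": {"name": tool_choice["name"]}}
    return "auto"  # "auto" passes through unchanged
```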

**Thinking Configuration:**
- `thinking.type: "enabled"` → `reasoning_effort: "high"` + `thinking_budget`
- `thinking.type: "disabled"` → `reasoning_effort: "disable"`
- Opus models default to thinking enabled
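
Sketched as a pure function (names are illustrative; the Anthropic side uses `budget_tokens`, and the OpenAI-side fields follow the bullets above):

```python
def thinking_to_openai(thinking: dict | None, model: str) -> dict:
    """Illustrative: derive reasoning parameters from an Anthropic thinking config."""
    if thinking and thinking.get("type") == "enabled":
        params = {"reasoning_effort": "high"}
        if "budget_tokens" in thinking:
            params["thinking_budget"] = thinking["budget_tokens"]
        return params
    if thinking and thinking.get("type") == "disabled":
        return {"reasoning_effort": "disable"}
    if "opus" in model.lower():
        return {"reasoning_effort": "high"}  # Opus defaults to thinking enabled
    return {}
```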

**Special Handling:**
- Reorders assistant content blocks: thinking → text → tool_use
- Injects `[Continue]` prompt for fresh thinking turns
- Preserves thinking signatures for multi-turn conversations
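
The reordering step can be pictured like this (illustrative only; `sorted()` is stable, so relative order within each group is preserved):

```python
BLOCK_ORDER = {"thinking": 0, "text": 1, "tool_use": 2}

def reorder_assistant_blocks(blocks: list[dict]) -> list[dict]:
    """Illustrative: sort assistant blocks thinking -> text -> tool_use."""
    return sorted(blocks, key=lambda b: BLOCK_ORDER.get(b["type"], 1))
```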

#### Response Translation (`openai_to_anthropic_response`)

Converts OpenAI Chat Completions responses to Anthropic Messages format:

**Content Blocks:**
- `reasoning_content` → thinking block with signature
- `content` → text block
- `tool_calls` → tool_use blocks with parsed JSON input

**Field Mapping:**
- `finish_reason: "stop"` → `stop_reason: "end_turn"`
- `finish_reason: "length"` → `stop_reason: "max_tokens"`
- `finish_reason: "tool_calls"` → `stop_reason: "tool_use"`

**Usage Translation:**
- `prompt_tokens` minus `cached_tokens` → `input_tokens`
- `completion_tokens` → `output_tokens`
- `prompt_tokens_details.cached_tokens` → `cache_read_input_tokens`
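
A condensed sketch of the field mapping and usage arithmetic, assuming an OpenAI-style response dict:

```python
STOP_REASON_MAP = {
    "stop": "end_turn",
    "length": "max_tokens",
    "tool_calls": "tool_use",
}

def usage_to_anthropic(usage: dict) -> dict:
    """Illustrative: cached tokens are split out of the input count."""
    cached = usage.get("prompt_tokens_details", {}).get("cached_tokens", 0)
    return {
        "input_tokens": usage["prompt_tokens"] - cached,
        "output_tokens": usage["completion_tokens"],
        "cache_read_input_tokens": cached,
    }
```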

#### Streaming Wrapper (`anthropic_streaming_wrapper`)

Converts OpenAI SSE streaming format to Anthropic's event-based format:

**Event Types Generated:**
```
message_start → Initial message metadata
content_block_start → Start of text/thinking/tool_use block
content_block_delta → Incremental content (text_delta, thinking_delta, input_json_delta)
content_block_stop → End of content block
message_delta → Final metadata (stop_reason, usage)
message_stop → End of message
```

**Features:**
- Accumulates tool call arguments across chunks
- Handles thinking/reasoning content from `delta.reasoning_content`
- Maintains correct block indexing across multiple content blocks
- Tracks cache tokens in usage statistics
- Recovers from errors while emitting a well-formed message structure
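
A simplified, text-only skeleton of this conversion (the real wrapper also tracks tool_use and thinking blocks and cache tokens; this sketch only shows the event choreography):

```python
import json

STOP_MAP = {"stop": "end_turn", "length": "max_tokens", "tool_calls": "tool_use"}

def sse(event: str, data: dict) -> str:
    """Render one Anthropic-style SSE event."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"

async def to_anthropic_sse(openai_chunks):
    """Illustrative: convert OpenAI streaming chunks into Anthropic events (text deltas only)."""
    yield sse("message_start", {"type": "message_start",
                                "message": {"role": "assistant", "content": []}})
    yield sse("content_block_start", {"type": "content_block_start", "index": 0,
                                      "content_block": {"type": "text", "text": ""}})
    finish = None
    async for chunk in openai_chunks:
        choice = chunk["choices"][0]
        text = (choice.get("delta") or {}).get("content")
        if text:
            yield sse("content_block_delta",
                      {"type": "content_block_delta", "index": 0,
                       "delta": {"type": "text_delta", "text": text}})
        finish = choice.get("finish_reason") or finish
    yield sse("content_block_stop", {"type": "content_block_stop", "index": 0})
    yield sse("message_delta", {"type": "message_delta",
                                "delta": {"stop_reason": STOP_MAP.get(finish, "end_turn")}})
    yield sse("message_stop", {"type": "message_stop"})
```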

#### Client Integration

The `RotatingClient` provides two methods for Anthropic compatibility:

```python
async def anthropic_messages(self, request, raw_request=None, pre_request_callback=None):
"""Handle Anthropic Messages API requests."""
# 1. Translate Anthropic request to OpenAI format
# 2. Call acompletion() with translated request
# 3. Convert response back to Anthropic format
# 4. For streaming: wrap with anthropic_streaming_wrapper

async def anthropic_count_tokens(self, request):
"""Count tokens for Anthropic-format request."""
# Translates messages and tools, then uses token_count()
```
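
A hypothetical usage sketch; whether these methods take a plain dict or an `AnthropicMessagesRequest` instance is an assumption here:

```python
async def demo(client):
    """client: a configured RotatingClient (construction not shown here)."""
    request = {  # Anthropic Messages format; dict vs. Pydantic model is assumed
        "model": "gemini/gemini-3-flash",
        "max_tokens": 512,
        "messages": [{"role": "user", "content": "Hello!"}],
    }
    response = await client.anthropic_messages(request)    # Anthropic-format response
    counts = await client.anthropic_count_tokens(request)  # token count, same request
    return response, counts
```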

#### Authentication

The proxy accepts both Anthropic and OpenAI authentication styles:
- `x-api-key` header (Anthropic style)
- `Authorization: Bearer` header (OpenAI style)
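
For example, both of these requests should authenticate against the documented `/v1/messages` endpoint (sketch with placeholder URL and key):

```python
import requests

body = {
    "model": "gemini/gemini-3-flash",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "ping"}],
}

# Anthropic style
r1 = requests.post("http://127.0.0.1:8000/v1/messages",
                   headers={"x-api-key": "your-proxy-api-key"}, json=body)

# OpenAI style
r2 = requests.post("http://127.0.0.1:8000/v1/messages",
                   headers={"Authorization": "Bearer your-proxy-api-key"}, json=body)
```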

### 3.5. Antigravity (`antigravity_provider.py`)

The most sophisticated provider implementation, supporting Google's internal Antigravity API for Gemini 3 and Claude models (including **Claude Opus 4.5**, Anthropic's most powerful model).
54 changes: 50 additions & 4 deletions README.md
@@ -4,19 +4,20 @@

**One proxy. Any LLM provider. Zero code changes.**

A self-hosted proxy that provides a single, OpenAI-compatible API endpoint for all your LLM providers. Works with any application that supports custom OpenAI base URLs—no code changes required in your existing tools.
A self-hosted proxy that provides OpenAI- and Anthropic-compatible API endpoints for all your LLM providers. Works with any application that supports custom OpenAI or Anthropic base URLs—including Claude Code, Opencode, and more—with no code changes required in your existing tools.

This project consists of two components:

1. **The API Proxy** — A FastAPI application providing a universal `/v1/chat/completions` endpoint
1. **The API Proxy** — A FastAPI application providing universal `/v1/chat/completions` (OpenAI) and `/v1/messages` (Anthropic) endpoints
2. **The Resilience Library** — A reusable Python library for intelligent API key management, rotation, and failover

---

## Why Use This?

- **Universal Compatibility** — Works with any app supporting OpenAI-compatible APIs: Opencode, Continue, Roo/Kilo Code, JanitorAI, SillyTavern, custom applications, and more
- **Universal Compatibility** — Works with any app supporting OpenAI or Anthropic APIs: Claude Code, Opencode, Continue, Roo/Kilo Code, Cursor, JanitorAI, SillyTavern, custom applications, and more
- **One Endpoint, Many Providers** — Configure Gemini, OpenAI, Anthropic, and [any LiteLLM-supported provider](https://docs.litellm.ai/docs/providers) once. Access them all through a single API key
- **Anthropic API Compatible** — Use Claude Code or any Anthropic SDK client with non-Anthropic providers like Gemini, OpenAI, or custom models
- **Built-in Resilience** — Automatic key rotation, failover on errors, rate limit handling, and intelligent cooldowns
- **Exclusive Provider Support** — Includes custom providers not available elsewhere: **Antigravity** (Gemini 3 + Claude Sonnet/Opus 4.5), **Gemini CLI**, **Qwen Code**, and **iFlow**

@@ -177,12 +178,57 @@ In your configuration file (e.g., `config.json`):

</details>

<details>
<summary><b>Claude Code</b></summary>

Claude Code natively supports custom Anthropic API endpoints. The recommended setup is to edit your Claude Code `settings.json`:

```json
{
"env": {
"ANTHROPIC_AUTH_TOKEN": "your-proxy-api-key",
"ANTHROPIC_BASE_URL": "http://127.0.0.1:8000",
"ANTHROPIC_DEFAULT_OPUS_MODEL": "gemini/gemini-3-pro",
"ANTHROPIC_DEFAULT_SONNET_MODEL": "gemini/gemini-3-flash",
"ANTHROPIC_DEFAULT_HAIKU_MODEL": "openai/gpt-5-mini"
}
}
```

Now you can use Claude Code with Gemini, OpenAI, or any other configured provider.

</details>

<details>
<summary><b>Anthropic Python SDK</b></summary>

```python
from anthropic import Anthropic

client = Anthropic(
base_url="http://127.0.0.1:8000",
api_key="your-proxy-api-key"
)

# Use any provider through Anthropic's API format
response = client.messages.create(
model="gemini/gemini-3-flash", # provider/model format
max_tokens=1024,
messages=[{"role": "user", "content": "Hello!"}]
)
print(response.content[0].text)
```

</details>

### API Endpoints

| Endpoint | Description |
|----------|-------------|
| `GET /` | Status check — confirms proxy is running |
| `POST /v1/chat/completions` | Chat completions (main endpoint) |
| `POST /v1/chat/completions` | Chat completions (OpenAI format) |
| `POST /v1/messages` | Chat completions (Anthropic format) — Claude Code compatible |
| `POST /v1/messages/count_tokens` | Count tokens for Anthropic-format requests |
| `POST /v1/embeddings` | Text embeddings |
| `GET /v1/models` | List all available models with pricing & capabilities |
| `GET /v1/models/{model_id}` | Get details for a specific model |