Commit 66cb51b

feat: add Claude Sonnet 4.5 support with global cross-region inference (#180)
This commit adds comprehensive support for Claude Sonnet 4.5 (claude-sonnet-4-5-20250929), Anthropic's most intelligent model, with enhanced coding capabilities and support for complex agents.

Changes:

- Added global cross-region inference profile discovery (global.anthropic.*); a minimal sketch of the check appears below
- Fixed temperature/topP compatibility for Claude Sonnet 4.5 (the model doesn't support both parameters simultaneously)
- Fixed reasoning_effort parameter handling to prevent a KeyError
- Added extended thinking/interleaved thinking support via the extra_body parameter
- Updated documentation with Claude Sonnet 4.5 examples (English and Chinese)
- Updated the README with the Sonnet 4.5 announcement

Technical Details:

- src/api/models/bedrock.py: Added global profile support in list_bedrock_models()
- src/api/models/bedrock.py: Added Claude Sonnet 4.5 detection to remove the topP parameter
- src/api/models/bedrock.py: Changed pop("topP") to pop("topP", None) to prevent a KeyError
- docs/Usage.md: Added a Chat Completions section with Sonnet 4.5 examples
- docs/Usage.md: Updated the Interleaved thinking section with Sonnet 4.5 examples
- docs/Usage_CN.md: Added Chinese versions of all Sonnet 4.5 documentation

Model ID: global.anthropic.claude-sonnet-4-5-20250929-v1:0
1 parent 371d11d commit 66cb51b
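For orientation, here is the sketch referenced above. It illustrates the two behaviours at the heart of this commit: discovering a `global.` cross-region inference profile for a model, and stripping `topP` for Claude Sonnet 4.5 with a defaulted `pop` so an absent key no longer raises `KeyError`. It is a minimal, hypothetical sketch; the function names are invented for illustration, and the authoritative change is the src/api/models/bedrock.py diff further down.

```python
# Minimal, hypothetical sketch of the commit's two core behaviours.
# Function names are invented for illustration; see the bedrock.py diff below
# for the actual change inside list_bedrock_models() and _parse_request().

def discover_global_profile(model_id: str, profile_list: set) -> str | None:
    """Return the global cross-region inference profile ID if Bedrock exposes one."""
    # e.g. "global." + "anthropic.claude-sonnet-4-5-20250929-v1:0"
    global_profile_id = "global." + model_id
    return global_profile_id if global_profile_id in profile_list else None


def sanitize_inference_config(model_id: str, inference_config: dict) -> dict:
    """Claude Sonnet 4.5 rejects requests that set both temperature and topP."""
    if "claude-sonnet-4-5" in model_id.lower():
        # pop with a default so an absent key no longer raises KeyError
        inference_config.pop("topP", None)
    return inference_config
```

For example, `sanitize_inference_config("global.anthropic.claude-sonnet-4-5-20250929-v1:0", {"temperature": 0.7, "topP": 0.9})` returns `{"temperature": 0.7}`.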

File tree

4 files changed: +180 -7 lines changed


README.md

Lines changed: 3 additions & 1 deletion
@@ -4,7 +4,9 @@ OpenAI-compatible RESTful APIs for Amazon Bedrock
 
 ## What's New 🔥
 
-This project supports reasoning for both **Claude 3.7 Sonnet** and **DeepSeek R1**, check [How to Use](./docs/Usage.md#reasoning) for more details. You need to first run the Models API to refresh the model list.
+This project now supports **Claude Sonnet 4.5**, Anthropic's most intelligent model with enhanced coding capabilities and complex agent support, available via global cross-region inference.
+
+It also supports reasoning for both **Claude 3.7 Sonnet** and **DeepSeek R1**. Check [How to Use](./docs/Usage.md#reasoning) for more details. You need to first run the Models API to refresh the model list.
 
 ## Overview
 
docs/Usage.md

Lines changed: 81 additions & 2 deletions
@@ -51,6 +51,43 @@ curl -s $OPENAI_BASE_URL/models -H "Authorization: Bearer $OPENAI_API_KEY" | jq
 ]
 ```
 
+## Chat Completions API
+
+### Basic Example with Claude Sonnet 4.5
+
+Claude Sonnet 4.5 is Anthropic's most intelligent model, excelling at coding, complex reasoning, and agent-based tasks. It's available via global cross-region inference profiles.
+
+**Example Request**
+
+```bash
+curl $OPENAI_BASE_URL/chat/completions \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer $OPENAI_API_KEY" \
+  -d '{
+    "model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
+    "messages": [
+      {
+        "role": "user",
+        "content": "Write a Python function to calculate the Fibonacci sequence using dynamic programming."
+      }
+    ]
+  }'
+```
+
+**Example SDK Usage**
+
+```python
+from openai import OpenAI
+
+client = OpenAI()
+completion = client.chat.completions.create(
+    model="global.anthropic.claude-sonnet-4-5-20250929-v1:0",
+    messages=[{"role": "user", "content": "Write a Python function to calculate the Fibonacci sequence using dynamic programming."}],
+)
+
+print(completion.choices[0].message.content)
+```
+
 ## Embedding API
 
 **Important Notice**: Please carefully review the following points before using this proxy API for embedding.
@@ -451,10 +488,31 @@ for chunk in response:
 Extended thinking with tool use in Claude 4 models supports [interleaved thinking](https://docs.aws.amazon.com/bedrock/latest/userguide/claude-messages-extended-thinking.html#claude-messages-extended-thinking-tool-use-interleaved), which enables Claude 4 models to think between tool calls and run more sophisticated reasoning after receiving tool results. This is helpful for more complex agentic interactions.
 With interleaved thinking, the `budget_tokens` can exceed the `max_tokens` parameter because it represents the total budget across all thinking blocks within one assistant turn.
 
+**Supported Models**: Claude Sonnet 4, Claude Sonnet 4.5
 
 **Example Request**
 
-- Non-Streaming
+- Non-Streaming (Claude Sonnet 4.5)
+
+```bash
+curl http://127.0.0.1:8000/api/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer bedrock" \
+  -d '{
+    "model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
+    "max_tokens": 2048,
+    "messages": [{
+      "role": "user",
+      "content": "Explain how to implement a binary search tree with self-balancing capabilities."
+    }],
+    "extra_body": {
+      "anthropic_beta": ["interleaved-thinking-2025-05-14"],
+      "thinking": {"type": "enabled", "budget_tokens": 4096}
+    }
+  }'
+```
+
+- Non-Streaming (Claude Sonnet 4)
 
 ```bash
 curl http://127.0.0.1:8000/api/v1/chat/completions \
@@ -474,7 +532,28 @@ curl http://127.0.0.1:8000/api/v1/chat/completions \
   }'
 ```
 
-- Streaming
+- Streaming (Claude Sonnet 4.5)
+
+```bash
+curl http://127.0.0.1:8000/api/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer bedrock" \
+  -d '{
+    "model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
+    "max_tokens": 2048,
+    "messages": [{
+      "role": "user",
+      "content": "Explain how to implement a binary search tree with self-balancing capabilities."
+    }],
+    "stream": true,
+    "extra_body": {
+      "anthropic_beta": ["interleaved-thinking-2025-05-14"],
+      "thinking": {"type": "enabled", "budget_tokens": 4096}
+    }
+  }'
+```
+
+- Streaming (Claude Sonnet 4)
 
 ```bash
 curl http://127.0.0.1:8000/api/v1/chat/completions \

docs/Usage_CN.md

Lines changed: 80 additions & 2 deletions
@@ -49,6 +49,42 @@ curl -s $OPENAI_BASE_URL/models -H "Authorization: Bearer $OPENAI_API_KEY" | jq
 ]
 ```
 
+## Chat Completions API
+
+### Claude Sonnet 4.5 基础示例
+
+Claude Sonnet 4.5 是 Anthropic 最智能的模型,在编码、复杂推理和基于代理的任务方面表现出色。它通过全球跨区域推理配置文件提供。
+
+**Request 示例**
+
+```bash
+curl $OPENAI_BASE_URL/chat/completions \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer $OPENAI_API_KEY" \
+  -d '{
+    "model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
+    "messages": [
+      {
+        "role": "user",
+        "content": "编写一个使用动态规划计算斐波那契数列的Python函数。"
+      }
+    ]
+  }'
+```
+
+**SDK 使用示例**
+
+```python
+from openai import OpenAI
+
+client = OpenAI()
+completion = client.chat.completions.create(
+    model="global.anthropic.claude-sonnet-4-5-20250929-v1:0",
+    messages=[{"role": "user", "content": "编写一个使用动态规划计算斐波那契数列的Python函数。"}],
+)
+
+print(completion.choices[0].message.content)
+```
 
 ## Embedding API
 
@@ -452,10 +488,31 @@ Claude 4 模型支持借助工具使用的扩展思维功能(Extended Thinking
 
 在交错思考模式下,budget_tokens 可以超过 max_tokens 参数,因为它代表一次助手回合中所有思考块的总 Token 预算。
 
+**支持的模型**: Claude Sonnet 4, Claude Sonnet 4.5
 
 **Request 示例**
 
-- Non-Streaming
+- Non-Streaming (Claude Sonnet 4.5)
+
+```bash
+curl http://127.0.0.1:8000/api/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer bedrock" \
+  -d '{
+    "model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
+    "max_tokens": 2048,
+    "messages": [{
+      "role": "user",
+      "content": "解释如何实现一个具有自平衡功能的二叉搜索树。"
+    }],
+    "extra_body": {
+      "anthropic_beta": ["interleaved-thinking-2025-05-14"],
+      "thinking": {"type": "enabled", "budget_tokens": 4096}
+    }
+  }'
+```
+
+- Non-Streaming (Claude Sonnet 4)
 
 ```bash
 curl http://127.0.0.1:8000/api/v1/chat/completions \
@@ -475,7 +532,28 @@ curl http://127.0.0.1:8000/api/v1/chat/completions \
   }'
 ```
 
-- Streaming
+- Streaming (Claude Sonnet 4.5)
+
+```bash
+curl http://127.0.0.1:8000/api/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer bedrock" \
+  -d '{
+    "model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
+    "max_tokens": 2048,
+    "messages": [{
+      "role": "user",
+      "content": "解释如何实现一个具有自平衡功能的二叉搜索树。"
+    }],
+    "stream": true,
+    "extra_body": {
+      "anthropic_beta": ["interleaved-thinking-2025-05-14"],
+      "thinking": {"type": "enabled", "budget_tokens": 4096}
+    }
+  }'
+```
+
+- Streaming (Claude Sonnet 4)
 
 ```bash
 curl http://127.0.0.1:8000/api/v1/chat/completions \

src/api/models/bedrock.py

Lines changed: 16 additions & 2 deletions
@@ -158,6 +158,11 @@ def list_bedrock_models() -> dict:
         if profile_id in profile_list:
             model_list[profile_id] = {"modalities": input_modalities}
 
+        # Add global cross-region inference profiles
+        global_profile_id = "global." + model_id
+        if global_profile_id in profile_list:
+            model_list[global_profile_id] = {"modalities": input_modalities}
+
         # Add application inference profiles (emit all profiles for this model)
         if model_id in app_profiles_by_model:
             for profile_arn in app_profiles_by_model[model_id]:
@@ -521,6 +526,11 @@ def _parse_request(self, chat_request: ChatRequest) -> dict:
             "topP": chat_request.top_p,
         }
 
+        # Claude Sonnet 4.5 doesn't support both temperature and topP
+        # Remove topP for this model
+        if "claude-sonnet-4-5" in chat_request.model.lower():
+            inference_config.pop("topP", None)
+
         if chat_request.stop is not None:
             stop = chat_request.stop
             if isinstance(stop, str):
@@ -547,7 +557,7 @@ def _parse_request(self, chat_request: ChatRequest) -> dict:
             )
             inference_config["maxTokens"] = max_tokens
             # unset topP - Not supported
-            inference_config.pop("topP")
+            inference_config.pop("topP", None)
 
             args["additionalModelRequestFields"] = {
                 "reasoning_config": {"type": "enabled", "budget_tokens": budget_tokens}
@@ -573,8 +583,12 @@ def _parse_request(self, chat_request: ChatRequest) -> dict:
             args["toolConfig"] = tool_config
         # add Additional fields to enable extend thinking
         if chat_request.extra_body:
-           # reasoning_config will not be used
+            # reasoning_config will not be used
             args["additionalModelRequestFields"] = chat_request.extra_body
+            # Extended thinking doesn't support both temperature and topP
+            # Remove topP to avoid validation error
+            if "thinking" in chat_request.extra_body:
+                inference_config.pop("topP", None)
         return args
 
     def _create_response(
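Putting the docs and code changes together, a client can reach the new global Sonnet 4.5 profile with interleaved thinking through the OpenAI Python SDK's extra_body pass-through. The sketch below is illustrative rather than part of the commit: it assumes the proxy is running with OPENAI_BASE_URL and OPENAI_API_KEY pointing at it, and it mirrors the curl examples added in docs/Usage.md above. Note the nesting: the SDK's extra_body kwarg merges its contents into the request JSON, so the proxy's top-level extra_body field has to appear as a key inside it.

```python
from openai import OpenAI

# Assumes OPENAI_BASE_URL and OPENAI_API_KEY point at the running proxy,
# as in the docs/Usage.md examples added by this commit.
client = OpenAI()

completion = client.chat.completions.create(
    model="global.anthropic.claude-sonnet-4-5-20250929-v1:0",
    max_tokens=2048,
    messages=[{
        "role": "user",
        "content": "Explain how to implement a binary search tree with self-balancing capabilities.",
    }],
    # The SDK merges this dict into the request body, producing the top-level
    # "extra_body" field that the proxy forwards to Bedrock via
    # additionalModelRequestFields (and which also triggers the topP removal above).
    extra_body={
        "extra_body": {
            "anthropic_beta": ["interleaved-thinking-2025-05-14"],
            "thinking": {"type": "enabled", "budget_tokens": 4096},
        }
    },
)

print(completion.choices[0].message.content)
```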
