
Commit 4095c2e

Support of reasoning
1 parent a46e329 commit 4095c2e

4 files changed: +165 −9 lines changed

4 files changed

+165
-9
lines changed

README.md

Lines changed: 5 additions & 4 deletions
@@ -24,10 +24,11 @@ If you find this GitHub repository useful, please consider giving it a free star
 - [x] Support streaming response via server-sent events (SSE)
 - [x] Support Model APIs
 - [x] Support Chat Completion APIs
-- [x] Support Tool Call (**new**)
-- [x] Support Embedding API (**new**)
-- [x] Support Multimodal API (**new**)
-- [x] Support Cross-Region Inference (**new**)
+- [x] Support Tool Call
+- [x] Support Embedding API
+- [x] Support Multimodal API
+- [x] Support Cross-Region Inference
+- [x] Support Reasoning (**new**)

 Please check [Usage Guide](./docs/Usage.md) for more details about how to use the new APIs.

README_CN.md

Lines changed: 5 additions & 4 deletions
@@ -26,10 +26,11 @@ integrate seamlessly with OpenAI APIs or SDKs and try Amazon Bedrock models without needing to
 - [x] Support streaming responses via server-sent events (SSE)
 - [x] Support Model APIs
 - [x] Support Chat Completion APIs
-- [x] Support Tool Call (**new**)
-- [x] Support Embedding API (**new**)
-- [x] Support Multimodal API (**new**)
-- [x] Support Cross-Region Inference (**new**)
+- [x] Support Tool Call
+- [x] Support Embedding API
+- [x] Support Multimodal API
+- [x] Support Cross-Region Inference
+- [x] Support Reasoning Mode (**new**)

 Please check the [Usage Guide](./docs/Usage_CN.md) for more details on how to use the new APIs.

docs/Usage.md

Lines changed: 76 additions & 0 deletions
@@ -317,3 +317,79 @@ curl $OPENAI_BASE_URL/chat/completions \
You can try it with different questions, such as:

1. Hello, who are you? (No tools are needed)
2. What is the weather like today? (Should use get_current_location tool first)

## Reasoning

**Important Notice**: Please carefully review the following points before using reasoning mode with the Chat Completion API.

- The only model that supports reasoning so far is Claude 3.7 Sonnet (extended thinking). Please make sure the model you use supports reasoning.
- Reasoning mode (also called thinking mode) is not enabled by default; you must pass the additional `reasoning_effort` parameter in your request.
- The reasoning response (CoT, thoughts) is returned in an additional `reasoning_content` field, which is not officially supported by OpenAI. This follows the [DeepSeek Reasoning Model](https://api-docs.deepseek.com/guides/reasoning_model#api-example) convention and may change in the future.
- Please provide an appropriate `max_tokens` (or `max_completion_tokens`) value in your request. The `budget_tokens` value is derived from `reasoning_effort` (low: 30%, medium: 60%, high: 100% of max tokens), with a minimum `budget_tokens` of 1,024; Anthropic recommends at least 4,000 tokens for comprehensive reasoning. Check the [Bedrock documentation](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-anthropic-claude-37.html) for more details. A small sketch of this calculation follows this list.

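To make the effort-to-budget mapping concrete, here is a minimal sketch of the rule above (the percentages plus the 1,024-token floor). It is an illustration only, not the gateway's actual code, and the helper name is hypothetical:

```python
# Hypothetical helper illustrating the budget rule described above;
# not the gateway's actual implementation.
def estimate_budget_tokens(reasoning_effort: str, max_tokens: int) -> int:
    ratios = {"low": 0.3, "medium": 0.6, "high": 1.0}
    # budget_tokens has a floor of 1,024; Anthropic recommends 4,000+ tokens
    # for comprehensive reasoning.
    return max(1024, int(max_tokens * ratios[reasoning_effort]))

print(estimate_budget_tokens("low", 4096))     # 1228
print(estimate_budget_tokens("medium", 4096))  # 2457
```
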
**Example Request**
```bash
curl $OPENAI_BASE_URL/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "us.anthropic.claude-3-7-sonnet-20250219-v1:0",
    "messages": [
      {
        "role": "user",
        "content": "which one is bigger, 3.9 or 3.11?"
      }
    ],
    "max_completion_tokens": 4096,
    "reasoning_effort": "low",
    "stream": false
  }'
```

**Example Response**
```json
{
  "id": "chatcmpl-83fb7a88",
  "created": 1740545278,
  "model": "us.anthropic.claude-3-7-sonnet-20250219-v1:0",
  "system_fingerprint": "fp",
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "logprobs": null,
      "message": {
        "role": "assistant",
        "content": "3.9 is bigger than 3.11.\n\nWhen comparing decimal numbers, we need to understand what these numbers actually represent:...",
        "reasoning_content": "I need to compare the decimal numbers 3.9 and 3.11.\n\nFor decimal numbers, we first compare the whole number parts, and if they're equal, we compare the decimal parts. \n\nBoth numbers ..."
      }
    }
  ],
  "object": "chat.completion",
  "usage": {
    "prompt_tokens": 51,
    "completion_tokens": 565,
    "total_tokens": 616
  }
}
```

You can also use the OpenAI SDK (run `pip3 install -U openai` first):

```python
from openai import OpenAI

client = OpenAI()

messages = [{"role": "user", "content": "which one is bigger, 3.9 or 3.11?"}]
response = client.chat.completions.create(
    model="us.anthropic.claude-3-7-sonnet-20250219-v1:0",
    messages=messages,
    reasoning_effort="low",
    max_completion_tokens=4096,
)

reasoning_content = response.choices[0].message.reasoning_content
content = response.choices[0].message.content
```
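
Streaming is not covered by the examples in this commit. If the gateway mirrors DeepSeek's streaming convention for reasoning models (an assumption, not confirmed by this diff), the thinking tokens would arrive on `delta.reasoning_content` separately from `delta.content`, roughly like this:

```python
# Sketch only: assumes streamed chunks expose reasoning_content on the delta,
# as in DeepSeek's reasoning API; this is not confirmed by this commit.
from openai import OpenAI

client = OpenAI()

stream = client.chat.completions.create(
    model="us.anthropic.claude-3-7-sonnet-20250219-v1:0",
    messages=[{"role": "user", "content": "which one is bigger, 3.9 or 3.11?"}],
    reasoning_effort="low",
    max_completion_tokens=4096,
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta
    # Thinking tokens and answer tokens arrive in separate fields.
    if getattr(delta, "reasoning_content", None):
        print(delta.reasoning_content, end="", flush=True)
    elif delta.content:
        print(delta.content, end="", flush=True)
```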

docs/Usage_CN.md

Lines changed: 79 additions & 1 deletion
@@ -313,4 +313,82 @@ curl $OPENAI_BASE_URL/chat/completions \
You can try it with different questions, such as:

1. Hello, who are you? (No tools are needed)
2. What is the weather like today? (Should use get_current_location tool first)

## Reasoning

**Important**: Please carefully review the following points before using reasoning mode.

- Only the Claude 3.7 Sonnet model supports reasoning so far. Please make sure the model you use supports reasoning.
- Reasoning mode (also called thinking mode) is not enabled by default; you must pass the additional `reasoning_effort` parameter in your request. Allowed values are `low`, `medium`, and `high`.
- The reasoning output (chain of thought, thinking process) is returned in an additional `reasoning_content` field, which is not an officially supported OpenAI format. This design follows the [DeepSeek Reasoning Model](https://api-docs.deepseek.com/guides/reasoning_model#api-example) convention and may change in the future.
- Please provide an appropriate `max_tokens` (or `max_completion_tokens`) value in your request. The `budget_tokens` value is derived from `reasoning_effort` (low: 30%, medium: 60%, high: 100% of max tokens), with a minimum `budget_tokens` of 1,024; Anthropic recommends at least 4,000 tokens for comprehensive reasoning. See the [Bedrock documentation](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-anthropic-claude-37.html) for more details.

**Example Request**

```bash
curl $OPENAI_BASE_URL/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "us.anthropic.claude-3-7-sonnet-20250219-v1:0",
    "messages": [
      {
        "role": "user",
        "content": "which one is bigger, 3.9 or 3.11?"
      }
    ],
    "max_completion_tokens": 4096,
    "reasoning_effort": "low",
    "stream": false
  }'
```

**Example Response**

```json
{
  "id": "chatcmpl-83fb7a88",
  "created": 1740545278,
  "model": "us.anthropic.claude-3-7-sonnet-20250219-v1:0",
  "system_fingerprint": "fp",
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "logprobs": null,
      "message": {
        "role": "assistant",
        "content": "3.9 is bigger than 3.11.\n\nWhen comparing decimal numbers, we need to understand what these numbers actually represent:...",
        "reasoning_content": "I need to compare the decimal numbers 3.9 and 3.11.\n\nFor decimal numbers, we first compare the whole number parts, and if they're equal, we compare the decimal parts. \n\nBoth numbers ..."
      }
    }
  ],
  "object": "chat.completion",
  "usage": {
    "prompt_tokens": 51,
    "completion_tokens": 565,
    "total_tokens": 616
  }
}
```

Or use the OpenAI SDK (run `pip3 install -U openai` first to upgrade to the latest version):

```python
from openai import OpenAI

client = OpenAI()

messages = [{"role": "user", "content": "which one is bigger, 3.9 or 3.11?"}]
response = client.chat.completions.create(
    model="us.anthropic.claude-3-7-sonnet-20250219-v1:0",
    messages=messages,
    reasoning_effort="low",
    max_completion_tokens=4096,
)

reasoning_content = response.choices[0].message.reasoning_content
content = response.choices[0].message.content
```
