Commit 36d7933

committed
test
1 parent 2231db7 commit 36d7933

2 files changed

+230
-8
lines changed

articles/ai-foundry/model-inference/includes/use-chat-reasoning/python.md

Lines changed: 109 additions & 3 deletions
@@ -29,6 +29,8 @@ To complete this tutorial, you need:
2929

3030
First, create the client to consume the model. The following code uses an endpoint URL and key that are stored in environment variables.
3131

32+
# [API version 2025-04-01](#tab/2025-04-01)
33+
3234
```python
3335
import os
3436
from azure.ai.inference import ChatCompletionsClient
@@ -41,11 +43,30 @@ client = ChatCompletionsClient(
4143
)
4244
```
4345

46+
# [API version 2024-05-01-preview](#tab/2024-05-01-preview)
47+
48+
```python
49+
import os
50+
from azure.ai.inference import ChatCompletionsClient
51+
from azure.core.credentials import AzureKeyCredential
52+
53+
client = ChatCompletionsClient(
54+
endpoint="https://<resource>.services.ai.azure.com/models",
55+
credential=AzureKeyCredential(os.environ["AZURE_INFERENCE_CREDENTIAL"]),
56+
model="deepseek-r1",
57+
api_version="2024-05-01-preview"
58+
)
59+
```
60+
61+
---
62+
4463
> [!TIP]
4564
> Verify that you have deployed the model to an Azure AI Services resource with the Azure AI model inference API. `Deepseek-R1` is also available as a serverless API endpoint. However, those endpoints don't take the `model` parameter as explained in this tutorial. To check, go to [Azure AI Foundry portal]() > Models + endpoints and confirm that the model is listed under the section **Azure AI Services**.
4665
4766
If you have configured the resource with **Microsoft Entra ID** support, you can use the following code snippet to create a client.
4867

68+
# [API version 2025-04-01](#tab/2025-04-01)
69+
4970
```python
5071
import os
5172
from azure.ai.inference import ChatCompletionsClient
@@ -54,11 +75,28 @@ from azure.identity import DefaultAzureCredential
5475
client = ChatCompletionsClient(
5576
endpoint="https://<resource>.services.ai.azure.com/models",
5677
credential=DefaultAzureCredential(),
57-
credential_scopes=["https://cognitiveservices.azure.com/.default"],
5878
model="deepseek-r1"
5979
)
6080
```
6181

82+
# [API version 2024-05-01-preview](#tab/2024-05-01-preview)
83+
84+
```python
85+
import os
86+
from azure.ai.inference import ChatCompletionsClient
87+
from azure.identity import DefaultAzureCredential
88+
89+
client = ChatCompletionsClient(
90+
endpoint="https://<resource>.services.ai.azure.com/models",
91+
credential=DefaultAzureCredential(),
92+
credential_scopes=["https://cognitiveservices.azure.com/.default"],
93+
model="deepseek-r1",
94+
api_version="2024-05-01-preview"
95+
)
96+
```
97+
98+
---
99+
62100
### Create a chat completion request
63101

64102
The following example shows how you can create a basic chat request to the model.
@@ -77,6 +115,28 @@ response = client.complete(
77115

78116
The response is as follows, where you can see the model's usage statistics:
79117

118+
# [API version 2025-04-01](#tab/2025-04-01)
119+
120+
```python
121+
print("Response:", response.choices[0].message.content)
122+
print("Model:", response.model)
123+
print("Usage:")
124+
print("\tPrompt tokens:", response.usage.prompt_tokens)
125+
print("\tTotal tokens:", response.usage.total_tokens)
126+
print("\tCompletion tokens:", response.usage.completion_tokens)
127+
```
128+
129+
```console
130+
Response: As of now, it's estimated that there are about 7,000 languages spoken around the world. However, this number can vary as some languages become extinct and new ones develop. It's also important to note that the number of speakers can greatly vary between languages, with some having millions of speakers and others only a few hundred.
131+
Model: deepseek-r1
132+
Usage:
133+
Prompt tokens: 11
134+
Total tokens: 897
135+
Completion tokens: 886
136+
```
137+
138+
# [API version 2024-05-01-preview](#tab/2024-05-01-preview)
139+
80140
```python
81141
print("Response:", response.choices[0].message.content)
82142
print("Model:", response.model)
@@ -95,10 +155,28 @@ Usage:
95155
Completion tokens: 886
96156
```
97157

158+
---
98159

99160
### Reasoning content
100161

101-
Some reasoning models, like DeepSeek-R1, generate completions and include the reasoning behind it. The reasoning associated with the completion is included in the response's content within the tags `<think>` and `</think>`. The model may select on which scenarios to generate reasoning content. You can extract the reasoning content from the response to understand the model's thought process as follows:
162+
Some reasoning models, like DeepSeek-R1, generate completions and include the reasoning behind them.
163+
164+
# [API version 2025-04-01](#tab/2025-04-01)
165+
166+
The reasoning associated with the completion is included in the response's `reasoning_content` field. The model may choose in which scenarios to generate reasoning content.
167+
168+
```python
169+
print("Thinking:", response.choices[0].message.reasoning_content)
170+
```
171+
172+
```console
173+
Thinking: Okay, the user is asking how many languages exist in the world. I need to provide a clear and accurate answer...
174+
```
175+
176+
177+
# [API version 2024-05-01-preview](#tab/2024-05-01-preview)
178+
179+
The reasoning associated with the completion is included in the response's content within the tags `<think>` and `</think>`. The model may choose in which scenarios to generate reasoning content. You can extract the reasoning content from the response to understand the model's thought process as follows:
102180

103181
```python
104182
import re
@@ -129,6 +207,8 @@ Usage:
129207
Completion tokens: 886
130208
```
131209

210+
---
211+
132212
When making multi-turn conversations, it's useful to avoid sending the reasoning content in the chat history as reasoning tends to generate long explanations.
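
As a minimal sketch of that advice, the following snippet removes a `<think>` block from an answer before adding it to the chat history. It assumes the 2024-05-01-preview response shape, where the reasoning is inline in the content; with API version 2025-04-01 the reasoning arrives in the separate `reasoning_content` field, so appending only `content` would have the same effect. The helper names `strip_reasoning` and `append_assistant_turn` are hypothetical, and plain dictionaries stand in for the message objects.

```python
import re

# Matches one <think>...</think> block (across newlines) and trailing whitespace.
THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", re.DOTALL)


def strip_reasoning(text: str) -> str:
    """Return the answer with any <think>...</think> block removed."""
    return THINK_BLOCK.sub("", text).strip()


def append_assistant_turn(history: list, content: str) -> list:
    """Append the assistant's answer to the history without its reasoning."""
    history.append({"role": "assistant", "content": strip_reasoning(content)})
    return history


# Illustrative values only; a real response would come from client.complete().
history = [{"role": "user", "content": "How many languages are in the world?"}]
raw = "<think>The user asks about languages...</think>\n\nAbout 7,000."
append_assistant_turn(history, raw)
print(history[-1]["content"])
```

The next request can then send `history` plus the new user message, which keeps the prompt short.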
133213

134214
### Stream content
@@ -139,7 +219,6 @@ You can _stream_ the content to get it as it's being generated. Streaming conten
139219

140220
To stream completions, set `stream=True` when you call the model.
141221

142-
143222
```python
144223
result = client.complete(
145224
model="deepseek-r1",
@@ -153,6 +232,31 @@ result = client.complete(
153232

154233
To visualize the output, define a helper function to print the stream. The following example implements a routine that streams only the answer without the reasoning content:
155234

235+
# [API version 2025-04-01](#tab/2025-04-01)
236+
237+
```python
238+
def print_stream(result):
239+
    """
240+
    Prints the chat completion with streaming.
241+
    """
242+
    is_thinking = False
243+
    for event in result:
244+
        if event.choices:
245+
            content = event.choices[0].delta.get("content")
246+
            reasoning_content = event.choices[0].delta.get("reasoning_content")
247+
            if reasoning_content and not is_thinking:  # announce thinking once
248+
                is_thinking = True
249+
                print("🧠 Thinking...", end="", flush=True)
250+
            elif is_thinking and content:  # first answer token ends thinking
251+
                is_thinking = False
252+
                print("🛑\n\n")
253+
            if content:
254+
                print(content, end="", flush=True)
255+
```
256+
257+
258+
# [API version 2024-05-01-preview](#tab/2024-05-01-preview)
259+
156260
```python
157261
def print_stream(result):
158262
"""
@@ -172,6 +276,8 @@ def print_stream(result):
172276
print(content, end="", flush=True)
173277
```
174278

279+
---
280+
175281
You can visualize how streaming generates content:
176282

177283

articles/ai-foundry/model-inference/includes/use-chat-reasoning/rest.md

Lines changed: 121 additions & 5 deletions
@@ -27,23 +27,47 @@ To complete this tutorial, you need:
2727

2828
First, create the client to consume the model. The following code uses an endpoint URL and key that are stored in environment variables.
2929

30+
# [API version 2025-04-01](#tab/2025-04-01)
31+
32+
```http
33+
POST https://<resource>.services.ai.azure.com/models/chat/completions?api-version=2025-04-01
34+
Content-Type: application/json
35+
api-key: <key>
36+
```
37+
38+
# [API version 2024-05-01-preview](#tab/2024-05-01-preview)
39+
3040
```http
3141
POST https://<resource>.services.ai.azure.com/models/chat/completions?api-version=2024-05-01-preview
3242
Content-Type: application/json
3343
api-key: <key>
3444
```
3545

46+
---
47+
3648
> [!TIP]
3749
> Verify that you have deployed the model to an Azure AI Services resource with the Azure AI model inference API. `Deepseek-R1` is also available as a serverless API endpoint. However, those endpoints don't take the `model` parameter as explained in this tutorial. To check, go to [Azure AI Foundry portal]() > Models + endpoints and confirm that the model is listed under the section **Azure AI Services**.
3850
3951
If you have configured the resource with **Microsoft Entra ID** support, pass your token in the `Authorization` header with the format `Bearer <token>`. Use scope `https://cognitiveservices.azure.com/.default`.
4052

53+
# [API version 2025-04-01](#tab/2025-04-01)
54+
4155
```http
4256
POST https://<resource>.services.ai.azure.com/models/chat/completions?api-version=2025-04-01
4357
Content-Type: application/json
4458
Authorization: Bearer <token>
4559
```
4660

61+
# [API version 2024-05-01-preview](#tab/2024-05-01-preview)
62+
63+
```http
64+
POST https://<resource>.services.ai.azure.com/models/chat/completions?api-version=2024-05-01-preview
65+
Content-Type: application/json
66+
Authorization: Bearer <token>
67+
```
68+
69+
---
70+
4771
Using Microsoft Entra ID may require additional configuration in your resource to grant access. Learn how to [configure key-less authentication with Microsoft Entra ID](../../how-to/configure-entra-id.md).
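
For illustration, the request line and headers above could be assembled in Python with the standard library as sketched below. The helper name `build_chat_request`, the resource name `my-resource`, and the request body are assumptions made for this example; the sketch only builds the request object and doesn't send it or acquire a token.

```python
import json
import urllib.request


def build_chat_request(resource, token, body, api_version="2025-04-01"):
    """Build (but don't send) a chat completions request with a bearer token."""
    url = (
        f"https://{resource}.services.ai.azure.com"
        f"/models/chat/completions?api-version={api_version}"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# Placeholder values; replace them with your resource name and a real token.
body = {
    "model": "DeepSeek-R1",
    "messages": [{"role": "user", "content": "How many languages are in the world?"}],
}
req = build_chat_request("my-resource", "<token>", body)
print(req.get_method(), req.full_url)
```

Passing `api_version="2024-05-01-preview"` targets the older API version with the same headers.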
4872

4973
### Create a chat completion request
@@ -66,6 +90,40 @@ The following example shows how you can create a basic chat request to the model
6690

6791
The response is as follows, where you can see the model's usage statistics:
6892

93+
# [API version 2025-04-01](#tab/2025-04-01)
94+
95+
The reasoning associated with the completion is included in the response's `reasoning_content` field. The model may choose in which scenarios to generate reasoning content.
96+
97+
```json
98+
{
99+
"id": "0a1234b5de6789f01gh2i345j6789klm",
100+
"object": "chat.completion",
101+
"created": 1718726686,
102+
"model": "DeepSeek-R1",
103+
"choices": [
104+
{
105+
"index": 0,
106+
"message": {
107+
"role": "assistant",
108+
"content": "The exact number of languages in the world is challenging to determine due to differences in definitions (e.g., distinguishing languages from dialects) and ongoing documentation efforts. However, widely cited estimates suggest there are approximately **7,000 languages** globally.",
109+
"reasoning_content": "Okay, the user is asking how many languages exist in the world. I need to provide a clear and accurate answer. Let's start by recalling the general consensus from linguistic sources. I remember that the number often cited is around 7,000, but maybe I should check some reputable organizations.\n\nEthnologue is a well-known resource for language data, and I think they list about 7,000 languages. But wait, do they update their numbers? It might be around 7,100 or so. Also, the exact count can vary because some sources might categorize dialects differently or have more recent data. \n\nAnother thing to consider is language endangerment. Many languages are endangered, with some having only a few speakers left. Organizations like UNESCO track endangered languages, so mentioning that adds context. Also, the distribution isn't even. Some countries have hundreds of languages, like Papua New Guinea with over 800, while others have just a few. \n\nA user might also wonder why the exact number is hard to pin down. It's because the distinction between a language and a dialect can be political or cultural. For example, Mandarin and Cantonese are considered dialects of Chinese by some, but they're mutually unintelligible, so others classify them as separate languages. Also, some regions are under-researched, making it hard to document all languages. \n\nI should also touch on language families. The 7,000 languages are grouped into families like Indo-European, Sino-Tibetan, Niger-Congo, etc. Maybe mention a few of the largest families. But wait, the question is just about the count, not the families. Still, it's good to provide a bit more context. \n\nI need to make sure the information is up-to-date. Let me think – recent estimates still hover around 7,000. However, languages are dying out rapidly, so the number decreases over time. Including that note about endangerment and language extinction rates could be helpful. For instance, it's often stated that a language dies every few weeks. \n\nAnother point is sign languages. Does the count include them? Ethnologue includes some, but not all sources might. If the user is including sign languages, that adds more to the count, but I think the 7,000 figure typically refers to spoken languages. For thoroughness, maybe mention that there are also over 300 sign languages. \n\nSummarizing, the answer should state around 7,000, mention Ethnologue's figure, explain why the exact number varies, touch on endangerment, and possibly note sign languages as a separate category. Also, a brief mention of Papua New Guinea as the most linguistically diverse country. \n\nWait, let me verify Ethnologue's current number. As of their latest edition (25th, 2022), they list 7,168 living languages. But I should check if that's the case. Some sources might round to 7,000. Also, SIL International publishes Ethnologue, so citing them as reference makes sense. \n\nOther sources, like Glottolog, might have a different count because they use different criteria. Glottolog might list around 7,000 as well, but exact numbers vary. It's important to highlight that the count isn't exact because of differing definitions and ongoing research. \n\nIn conclusion, the approximate number is 7,000, with Ethnologue being a key source, considerations of endangerment, and the challenges in counting due to dialect vs. language distinctions. I should make sure the answer is clear, acknowledges the variability, and provides key points succinctly.",
110+
"tool_calls": null
111+
},
112+
"finish_reason": "stop"
113+
}
114+
],
115+
"usage": {
116+
"prompt_tokens": 11,
117+
"total_tokens": 897,
118+
"completion_tokens": 886
119+
}
120+
}
121+
```
122+
123+
# [API version 2024-05-01-preview](#tab/2024-05-01-preview)
124+
125+
The reasoning associated with the completion is included in the response's content within the tags `<think>` and `</think>`. The model may choose in which scenarios to generate reasoning content.
126+
69127
```json
70128
{
71129
"id": "0a1234b5de6789f01gh2i345j6789klm",
@@ -91,11 +149,11 @@ The response is as follows, where you can see the model's usage statistics:
91149
}
92150
```
93151

94-
### Reasoning content
152+
---
95153

96-
Some reasoning models, like DeepSeek-R1, generate completions and include the reasoning behind it. The reasoning associated with the completion is included in the response's content within the tags `<think>` and `</think>`. The model may select on which scenarios to generate reasoning content.
154+
### Reasoning content
97155

98-
When making multi-turn conversations, it's useful to avoid sending the reasoning content in the chat history as reasoning tends to generate long explanations.
156+
Some reasoning models, like DeepSeek-R1, generate completions and include the reasoning behind them. In multi-turn conversations, it's useful to avoid sending the reasoning content in the chat history because reasoning tends to produce long explanations.
99157

100158
### Stream content
101159

@@ -105,7 +163,6 @@ You can _stream_ the content to get it as it's being generated. Streaming conten
105163

106164
To stream completions, set `"stream": true` when you call the model.
107165

108-
109166
```json
110167
{
111168
"model": "DeepSeek-R1",
@@ -124,7 +181,32 @@ To stream completions, set `"stream": true` when you call the model.
124181
}
125182
```
126183

127-
To visualize the output, define a helper function to print the stream. The following example implements a routing that stream only the answer without the reasoning content:
184+
The output looks as follows:
185+
186+
# [API version 2025-04-01](#tab/2025-04-01)
187+
188+
```json
189+
{
190+
"id": "23b54589eba14564ad8a2e6978775a39",
191+
"object": "chat.completion.chunk",
192+
"created": 1718726371,
193+
"model": "DeepSeek-R1",
194+
"choices": [
195+
{
196+
"index": 0,
197+
"delta": {
198+
"role": "assistant",
199+
"content": "",
200+
"reasoning_content": ""
201+
},
202+
"finish_reason": null,
203+
"logprobs": null
204+
}
205+
]
206+
}
207+
```
208+
209+
# [API version 2024-05-01-preview](#tab/2024-05-01-preview)
128210

129211
```json
130212
{
@@ -146,8 +228,40 @@ To visualize the output, define a helper function to print the stream. The follo
146228
}
147229
```
148230

231+
---
232+
149233
The last message in the stream has `finish_reason` set, indicating the reason for the generation process to stop.
150234

235+
# [API version 2025-04-01](#tab/2025-04-01)
236+
237+
```json
238+
{
239+
"id": "23b54589eba14564ad8a2e6978775a39",
240+
"object": "chat.completion.chunk",
241+
"created": 1718726371,
242+
"model": "DeepSeek-R1",
243+
"choices": [
244+
{
245+
"index": 0,
246+
"delta": {
247+
"content": "",
248+
"reasoning_content": ""
249+
},
250+
"finish_reason": "stop",
251+
"logprobs": null
252+
}
253+
],
254+
"usage": {
255+
"prompt_tokens": 11,
256+
"total_tokens": 897,
257+
"completion_tokens": 886
258+
}
259+
}
260+
```
261+
262+
263+
# [API version 2024-05-01-preview](#tab/2024-05-01-preview)
264+
151265
```json
152266
{
153267
"id": "23b54589eba14564ad8a2e6978775a39",
@@ -172,6 +286,8 @@ The last message in the stream has `finish_reason` set, indicating the reason fo
172286
}
173287
```
174288

289+
---
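
The chunk shapes above can be consumed along the lines of the following sketch, which accumulates the answer and the reasoning separately and records `finish_reason` when it appears. It assumes the chunks have already been parsed into dictionaries shaped like the 2025-04-01 examples; the helper name `collect_stream` and the sample values are hypothetical and aren't real model output.

```python
def collect_stream(chunks):
    """Accumulate answer text, reasoning text, and the final finish_reason."""
    answer, reasoning, finish_reason = [], [], None
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            # Either field may be missing or empty in a given chunk.
            answer.append(delta.get("content") or "")
            reasoning.append(delta.get("reasoning_content") or "")
            # Only the last chunk is expected to carry a finish_reason.
            finish_reason = choice.get("finish_reason") or finish_reason
    return "".join(answer), "".join(reasoning), finish_reason


# Invented sample chunks that follow the structure shown above.
sample = [
    {"choices": [{"index": 0, "delta": {"role": "assistant", "content": "",
                                        "reasoning_content": "Let me think."},
                  "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"content": "About 7,000."},
                  "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"content": "", "reasoning_content": ""},
                  "finish_reason": "stop"}]},
]
answer, reasoning, reason = collect_stream(sample)
print(reason, "|", answer)
```

Keeping the two buffers separate makes it straightforward to show or store only the answer.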
290+
175291
### Parameters
176292

177293
In general, reasoning models don't support the following parameters you can find in chat completion models:
