
Commit a92cb26

Merge pull request #255 from msakande/fix-response-format-for-chat-completions-models
Fix response format for chat completions models (AI Studio)
2 parents 54de0f3 + b91a479

9 files changed: +70 / -74 lines changed
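
The change is the same across the article diffs shown below: the Python snippets stop importing the retired `ChatCompletionsResponseFormatText` and `ChatCompletionsResponseFormatJSON` helper classes and instead pass `response_format` as a dictionary whose `type` value comes from `ChatCompletionsResponseFormat`. The following is a minimal sketch of the updated pattern, not part of the diff itself; it assumes the `azure-ai-inference` Python package and placeholder `AZURE_INFERENCE_ENDPOINT` / `AZURE_INFERENCE_CREDENTIAL` environment variables pointing at a serverless deployment you supply.

```python
import os

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import (
    ChatCompletionsResponseFormat,
    SystemMessage,
    UserMessage,
)
from azure.core.credentials import AzureKeyCredential

# Endpoint and key variable names are illustrative; substitute your own deployment values.
client = ChatCompletionsClient(
    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["AZURE_INFERENCE_CREDENTIAL"]),
)

# Updated style: request plain-text output with a dict that references the
# ChatCompletionsResponseFormat enum, rather than instantiating
# ChatCompletionsResponseFormatText() as the old snippets did.
response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="How many languages are in the world?"),
    ],
    temperature=0,
    top_p=1,
    response_format={ "type": ChatCompletionsResponseFormat.TEXT },
)

print(response.choices[0].message.content)
```

For models that support JSON mode, such as Mistral Nemo, the same dictionary shape takes `ChatCompletionsResponseFormat.JSON_OBJECT` instead of `TEXT`.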

articles/ai-studio/how-to/deploy-models-jais.md

Lines changed: 11 additions & 11 deletions
@@ -201,7 +201,7 @@ print_stream(result)
 Explore other parameters that you can specify in the inference client. For a full list of all the supported parameters and their corresponding documentation, see [Azure AI Model Inference API reference](https://aka.ms/azureai/modelinference).
 
 ```python
-from azure.ai.inference.models import ChatCompletionsResponseFormatText
+from azure.ai.inference.models import ChatCompletionsResponseFormat
 
 response = client.complete(
     messages=[
@@ -214,12 +214,12 @@ response = client.complete(
     stop=["<|endoftext|>"],
     temperature=0,
     top_p=1,
-    response_format=ChatCompletionsResponseFormatText(),
+    response_format={ "type": ChatCompletionsResponseFormat.TEXT },
 )
 ```
 
 > [!WARNING]
-> Jais doesn't support JSON output formatting (`response_format = { "type": "json_object" }`). You can always prompt the model to generate JSON outputs. However, such outputs are not guaranteed to be valid JSON.
+> Jais models don't support JSON output formatting (`response_format = { "type": "json_object" }`). You can always prompt the model to generate JSON outputs. However, such outputs are not guaranteed to be valid JSON.
 
 If you want to pass a parameter that isn't in the list of supported parameters, you can pass it to the underlying model using *extra parameters*. See [Pass extra parameters to the model](#pass-extra-parameters-to-the-model).

@@ -482,7 +482,7 @@ var response = await client.path("/chat/completions").post({
 ```
 
 > [!WARNING]
-> Jais doesn't support JSON output formatting (`response_format = { "type": "json_object" }`). You can always prompt the model to generate JSON outputs. However, such outputs are not guaranteed to be valid JSON.
+> Jais models don't support JSON output formatting (`response_format = { "type": "json_object" }`). You can always prompt the model to generate JSON outputs. However, such outputs are not guaranteed to be valid JSON.
 
 If you want to pass a parameter that isn't in the list of supported parameters, you can pass it to the underlying model using *extra parameters*. See [Pass extra parameters to the model](#pass-extra-parameters-to-the-model).
 
@@ -580,7 +580,7 @@ Deployment to a serverless API endpoint doesn't require quota from your subscrip
 
 ### The inference package installed
 
-You can consume predictions from this model by using the `Azure.AI.Inference` package from [Nuget](https://www.nuget.org/). To install this package, you need the following prerequisites:
+You can consume predictions from this model by using the `Azure.AI.Inference` package from [NuGet](https://www.nuget.org/). To install this package, you need the following prerequisites:
 
 * The endpoint URL. To construct the client library, you need to pass in the endpoint URL. The endpoint URL has the form `https://your-host-name.your-azure-region.inference.ai.azure.com`, where `your-host-name` is your unique model deployment host name and `your-azure-region` is the Azure region where the model is deployed (for example, eastus2).
 * Depending on your model deployment and authentication preference, you need either a key to authenticate against the service, or Microsoft Entra ID credentials. The key is a 32-character string.
@@ -606,7 +606,7 @@ using Azure.Identity;
 using Azure.AI.Inference;
 ```
 
-This example also use the following namespaces but you may not always need them:
+This example also uses the following namespaces but you may not always need them:
 
 
 ```csharp
@@ -775,7 +775,7 @@ Console.WriteLine($"Response: {response.Value.Choices[0].Message.Content}");
 ```
 
 > [!WARNING]
-> Jais doesn't support JSON output formatting (`response_format = { "type": "json_object" }`). You can always prompt the model to generate JSON outputs. However, such outputs are not guaranteed to be valid JSON.
+> Jais models don't support JSON output formatting (`response_format = { "type": "json_object" }`). You can always prompt the model to generate JSON outputs. However, such outputs are not guaranteed to be valid JSON.
 
 If you want to pass a parameter that isn't in the list of supported parameters, you can pass it to the underlying model using *extra parameters*. See [Pass extra parameters to the model](#pass-extra-parameters-to-the-model).
 
@@ -1088,7 +1088,7 @@ Explore other parameters that you can specify in the inference client. For a ful
 ```
 
 > [!WARNING]
-> Jais doesn't support JSON output formatting (`response_format = { "type": "json_object" }`). You can always prompt the model to generate JSON outputs. However, such outputs are not guaranteed to be valid JSON.
+> Jais models don't support JSON output formatting (`response_format = { "type": "json_object" }`). You can always prompt the model to generate JSON outputs. However, such outputs are not guaranteed to be valid JSON.
 
 If you want to pass a parameter that isn't in the list of supported parameters, you can pass it to the underlying model using *extra parameters*. See [Pass extra parameters to the model](#pass-extra-parameters-to-the-model).

@@ -1165,14 +1165,14 @@ The following example shows how to handle events when the model detects harmful
 
 ## More inference examples
 
-For more examples of how to use Jais, see the following examples and tutorials:
+For more examples of how to use Jais models, see the following examples and tutorials:
 
 | Description | Language | Sample |
 |-------------------------------------------|-------------------|-----------------------------------------------------------------|
 | Azure AI Inference package for JavaScript | JavaScript | [Link](https://aka.ms/azsdk/azure-ai-inference/javascript/samples) |
 | Azure AI Inference package for Python | Python | [Link](https://aka.ms/azsdk/azure-ai-inference/python/samples) |
 
-## Cost and quota considerations for Jais family of models deployed as serverless API endpoints
+## Cost and quota considerations for Jais models deployed as serverless API endpoints
 
 Quota is managed per deployment. Each deployment has a rate limit of 200,000 tokens per minute and 1,000 API requests per minute. However, we currently limit one deployment per model per project. Contact Microsoft Azure Support if the current rate limits aren't sufficient for your scenarios.
 
@@ -1189,4 +1189,4 @@ For more information on how to track costs, see [Monitor costs for models offere
 * [Deploy models as serverless APIs](deploy-models-serverless.md)
 * [Consume serverless API endpoints from a different Azure AI Studio project or hub](deploy-models-serverless-connect.md)
 * [Region availability for models in serverless API endpoints](deploy-models-serverless-availability.md)
-* [Plan and manage costs (marketplace)](costs-plan-manage.md#monitor-costs-for-models-offered-through-the-azure-marketplace)
+* [Plan and manage costs (marketplace)](costs-plan-manage.md#monitor-costs-for-models-offered-through-the-azure-marketplace)

articles/ai-studio/how-to/deploy-models-llama.md

Lines changed: 11 additions & 11 deletions
@@ -255,7 +255,7 @@ print_stream(result)
 Explore other parameters that you can specify in the inference client. For a full list of all the supported parameters and their corresponding documentation, see [Azure AI Model Inference API reference](https://aka.ms/azureai/modelinference).
 
 ```python
-from azure.ai.inference.models import ChatCompletionsResponseFormatText
+from azure.ai.inference.models import ChatCompletionsResponseFormat
 
 response = client.complete(
     messages=[
@@ -268,12 +268,12 @@ response = client.complete(
     stop=["<|endoftext|>"],
     temperature=0,
     top_p=1,
-    response_format=ChatCompletionsResponseFormatText(),
+    response_format={ "type": ChatCompletionsResponseFormat.TEXT },
 )
 ```
 
 > [!WARNING]
-> Meta Llama doesn't support JSON output formatting (`response_format = { "type": "json_object" }`). You can always prompt the model to generate JSON outputs. However, such outputs are not guaranteed to be valid JSON.
+> Meta Llama models don't support JSON output formatting (`response_format = { "type": "json_object" }`). You can always prompt the model to generate JSON outputs. However, such outputs are not guaranteed to be valid JSON.
 
 If you want to pass a parameter that isn't in the list of supported parameters, you can pass it to the underlying model using *extra parameters*. See [Pass extra parameters to the model](#pass-extra-parameters-to-the-model).

@@ -610,7 +610,7 @@ var response = await client.path("/chat/completions").post({
 ```
 
 > [!WARNING]
-> Meta Llama doesn't support JSON output formatting (`response_format = { "type": "json_object" }`). You can always prompt the model to generate JSON outputs. However, such outputs are not guaranteed to be valid JSON.
+> Meta Llama models don't support JSON output formatting (`response_format = { "type": "json_object" }`). You can always prompt the model to generate JSON outputs. However, such outputs are not guaranteed to be valid JSON.
 
 If you want to pass a parameter that isn't in the list of supported parameters, you can pass it to the underlying model using *extra parameters*. See [Pass extra parameters to the model](#pass-extra-parameters-to-the-model).
 
@@ -765,7 +765,7 @@ For deployment to a self-hosted managed compute, you must have enough quota in y
 
 ### The inference package installed
 
-You can consume predictions from this model by using the `Azure.AI.Inference` package from [Nuget](https://www.nuget.org/). To install this package, you need the following prerequisites:
+You can consume predictions from this model by using the `Azure.AI.Inference` package from [NuGet](https://www.nuget.org/). To install this package, you need the following prerequisites:
 
 * The endpoint URL. To construct the client library, you need to pass in the endpoint URL. The endpoint URL has the form `https://your-host-name.your-azure-region.inference.ai.azure.com`, where `your-host-name` is your unique model deployment host name and `your-azure-region` is the Azure region where the model is deployed (for example, eastus2).
 * Depending on your model deployment and authentication preference, you need either a key to authenticate against the service, or Microsoft Entra ID credentials. The key is a 32-character string.
@@ -791,7 +791,7 @@ using Azure.Identity;
 using Azure.AI.Inference;
 ```
 
-This example also use the following namespaces but you may not always need them:
+This example also uses the following namespaces but you may not always need them:
 
 
 ```csharp
@@ -973,7 +973,7 @@ Console.WriteLine($"Response: {response.Value.Choices[0].Message.Content}");
 ```
 
 > [!WARNING]
-> Meta Llama doesn't support JSON output formatting (`response_format = { "type": "json_object" }`). You can always prompt the model to generate JSON outputs. However, such outputs are not guaranteed to be valid JSON.
+> Meta Llama models don't support JSON output formatting (`response_format = { "type": "json_object" }`). You can always prompt the model to generate JSON outputs. However, such outputs are not guaranteed to be valid JSON.
 
 If you want to pass a parameter that isn't in the list of supported parameters, you can pass it to the underlying model using *extra parameters*. See [Pass extra parameters to the model](#pass-extra-parameters-to-the-model).
 
@@ -1348,7 +1348,7 @@ Explore other parameters that you can specify in the inference client. For a ful
 ```
 
 > [!WARNING]
-> Meta Llama doesn't support JSON output formatting (`response_format = { "type": "json_object" }`). You can always prompt the model to generate JSON outputs. However, such outputs are not guaranteed to be valid JSON.
+> Meta Llama models don't support JSON output formatting (`response_format = { "type": "json_object" }`). You can always prompt the model to generate JSON outputs. However, such outputs are not guaranteed to be valid JSON.
 
 If you want to pass a parameter that isn't in the list of supported parameters, you can pass it to the underlying model using *extra parameters*. See [Pass extra parameters to the model](#pass-extra-parameters-to-the-model).

@@ -1441,7 +1441,7 @@ The following example shows how to handle events when the model detects harmful
 
 ## More inference examples
 
-For more examples of how to use Meta Llama, see the following examples and tutorials:
+For more examples of how to use Meta Llama models, see the following examples and tutorials:
 
 | Description | Language | Sample |
 |-------------------------------------------|-------------------|------------------------------------------------------------------- |
@@ -1453,7 +1453,7 @@ For more examples of how to use Meta Llama, see the following examples and tutor
 | LangChain | Python | [Link](https://aka.ms/meta-llama-3.1-405B-instruct-langchain) |
 | LiteLLM | Python | [Link](https://aka.ms/meta-llama-3.1-405B-instruct-litellm) |
 
-## Cost and quota considerations for Meta Llama family of models deployed as serverless API endpoints
+## Cost and quota considerations for Meta Llama models deployed as serverless API endpoints
 
 Quota is managed per deployment. Each deployment has a rate limit of 200,000 tokens per minute and 1,000 API requests per minute. However, we currently limit one deployment per model per project. Contact Microsoft Azure Support if the current rate limits aren't sufficient for your scenarios.
 
@@ -1463,7 +1463,7 @@ Each time a project subscribes to a given offer from the Azure Marketplace, a ne
 
 For more information on how to track costs, see [Monitor costs for models offered through the Azure Marketplace](costs-plan-manage.md#monitor-costs-for-models-offered-through-the-azure-marketplace).
 
-## Cost and quota considerations for Meta Llama family of models deployed to managed compute
+## Cost and quota considerations for Meta Llama models deployed to managed compute
 
 Meta Llama models deployed to managed compute are billed based on core hours of the associated compute instance. The cost of the compute instance is determined by the size of the instance, the number of instances running, and the run duration.

articles/ai-studio/how-to/deploy-models-mistral-nemo.md

Lines changed: 9 additions & 11 deletions
@@ -5,7 +5,7 @@ description: Learn how to use Mistral Nemo chat model with Azure AI Studio.
 ms.service: azure-ai-studio
 manager: scottpolly
 ms.topic: how-to
-ms.date: 08/08/2024
+ms.date: 09/12/2024
 ms.reviewer: kritifaujdar
 reviewer: fkriti
 ms.author: mopeakande
@@ -209,7 +209,7 @@ print_stream(result)
 Explore other parameters that you can specify in the inference client. For a full list of all the supported parameters and their corresponding documentation, see [Azure AI Model Inference API reference](https://aka.ms/azureai/modelinference).
 
 ```python
-from azure.ai.inference.models import ChatCompletionsResponseFormatText
+from azure.ai.inference.models import ChatCompletionsResponseFormat
 
 response = client.complete(
     messages=[
@@ -222,7 +222,7 @@ response = client.complete(
     stop=["<|endoftext|>"],
     temperature=0,
     top_p=1,
-    response_format=ChatCompletionsResponseFormatText(),
+    response_format={ "type": ChatCompletionsResponseFormat.TEXT },
 )
 ```

@@ -234,15 +234,13 @@ Mistral Nemo chat model can create JSON outputs. Set `response_format` to `json_
 
 ```python
-from azure.ai.inference.models import ChatCompletionsResponseFormatJSON
-
 response = client.complete(
     messages=[
         SystemMessage(content="You are a helpful assistant that always generate responses in JSON format, using."
                       " the following format: { ""answer"": ""response"" }."),
         UserMessage(content="How many languages are in the world?"),
     ],
-    response_format=ChatCompletionsResponseFormatJSON()
+    response_format={ "type": ChatCompletionsResponseFormat.JSON_OBJECT }
 )
 ```
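
Because the surrounding articles warn that JSON output from these models isn't always guaranteed to be valid, it's reasonable to parse the response defensively. The short sketch below is illustrative and not part of the diff; it assumes the `client`, `SystemMessage`, `UserMessage`, and `ChatCompletionsResponseFormat` names used in the updated snippets above.

```python
import json

# Request JSON output using the updated dict-style response_format.
response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant that always generates responses in JSON format,"
                              " using the following format: { \"answer\": \"response\" }."),
        UserMessage(content="How many languages are in the world?"),
    ],
    response_format={ "type": ChatCompletionsResponseFormat.JSON_OBJECT },
)

content = response.choices[0].message.content

# Validate before using the output; a parse failure surfaces malformed JSON early.
try:
    print(json.loads(content))
except json.JSONDecodeError:
    print("Model returned output that isn't valid JSON:", content)
```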

@@ -962,7 +960,7 @@ Deployment to a serverless API endpoint doesn't require quota from your subscrip
 
 ### The inference package installed
 
-You can consume predictions from this model by using the `Azure.AI.Inference` package from [Nuget](https://www.nuget.org/). To install this package, you need the following prerequisites:
+You can consume predictions from this model by using the `Azure.AI.Inference` package from [NuGet](https://www.nuget.org/). To install this package, you need the following prerequisites:
 
 * The endpoint URL. To construct the client library, you need to pass in the endpoint URL. The endpoint URL has the form `https://your-host-name.your-azure-region.inference.ai.azure.com`, where `your-host-name` is your unique model deployment host name and `your-azure-region` is the Azure region where the model is deployed (for example, eastus2).
 * Depending on your model deployment and authentication preference, you need either a key to authenticate against the service, or Microsoft Entra ID credentials. The key is a 32-character string.
@@ -988,7 +986,7 @@ using Azure.Identity;
 using Azure.AI.Inference;
 ```
 
-This example also use the following namespaces but you may not always need them:
+This example also uses the following namespaces but you may not always need them:
 
 
 ```csharp
@@ -2010,7 +2008,7 @@ The following example shows how to handle events when the model detects harmful
 
 ## More inference examples
 
-For more examples of how to use Mistral, see the following examples and tutorials:
+For more examples of how to use Mistral models, see the following examples and tutorials:
 
 | Description | Language | Sample |
 |-------------------------------------------|-------------------|-----------------------------------------------------------------|
@@ -2024,7 +2022,7 @@ For more examples of how to use Mistral, see the following examples and tutorial
 | LiteLLM | Python | [Link](https://aka.ms/mistral-large/litellm-sample) |
 
 
-## Cost and quota considerations for Mistral family of models deployed as serverless API endpoints
+## Cost and quota considerations for Mistral models deployed as serverless API endpoints
 
 Quota is managed per deployment. Each deployment has a rate limit of 200,000 tokens per minute and 1,000 API requests per minute. However, we currently limit one deployment per model per project. Contact Microsoft Azure Support if the current rate limits aren't sufficient for your scenarios.
 
@@ -2041,4 +2039,4 @@ For more information on how to track costs, see [Monitor costs for models offere
 * [Deploy models as serverless APIs](deploy-models-serverless.md)
 * [Consume serverless API endpoints from a different Azure AI Studio project or hub](deploy-models-serverless-connect.md)
 * [Region availability for models in serverless API endpoints](deploy-models-serverless-availability.md)
-* [Plan and manage costs (marketplace)](costs-plan-manage.md#monitor-costs-for-models-offered-through-the-azure-marketplace)
+* [Plan and manage costs (marketplace)](costs-plan-manage.md#monitor-costs-for-models-offered-through-the-azure-marketplace)
