Commit 2a015e2

Revert "Fix response format for chat completions models (AI Studio)"
1 parent 4e3e48c commit 2a015e2

9 files changed, with 74 additions and 70 deletions

articles/ai-studio/how-to/deploy-models-jais.md

Lines changed: 11 additions & 11 deletions
@@ -201,7 +201,7 @@ print_stream(result)
 Explore other parameters that you can specify in the inference client. For a full list of all the supported parameters and their corresponding documentation, see [Azure AI Model Inference API reference](https://aka.ms/azureai/modelinference).
 
 ```python
-from azure.ai.inference.models import ChatCompletionsResponseFormat
+from azure.ai.inference.models import ChatCompletionsResponseFormatText
 
 response = client.complete(
     messages=[
@@ -214,12 +214,12 @@ response = client.complete(
     stop=["<|endoftext|>"],
     temperature=0,
     top_p=1,
-    response_format={ "type": ChatCompletionsResponseFormat.TEXT },
+    response_format=ChatCompletionsResponseFormatText(),
 )
 ```
 
 > [!WARNING]
-> Jais models don't support JSON output formatting (`response_format = { "type": "json_object" }`). You can always prompt the model to generate JSON outputs. However, such outputs are not guaranteed to be valid JSON.
+> Jais doesn't support JSON output formatting (`response_format = { "type": "json_object" }`). You can always prompt the model to generate JSON outputs. However, such outputs are not guaranteed to be valid JSON.
 
 If you want to pass a parameter that isn't in the list of supported parameters, you can pass it to the underlying model using *extra parameters*. See [Pass extra parameters to the model](#pass-extra-parameters-to-the-model).
 
@@ -482,7 +482,7 @@ var response = await client.path("/chat/completions").post({
 ```
 
 > [!WARNING]
-> Jais models don't support JSON output formatting (`response_format = { "type": "json_object" }`). You can always prompt the model to generate JSON outputs. However, such outputs are not guaranteed to be valid JSON.
+> Jais doesn't support JSON output formatting (`response_format = { "type": "json_object" }`). You can always prompt the model to generate JSON outputs. However, such outputs are not guaranteed to be valid JSON.
 
 If you want to pass a parameter that isn't in the list of supported parameters, you can pass it to the underlying model using *extra parameters*. See [Pass extra parameters to the model](#pass-extra-parameters-to-the-model).
 
@@ -580,7 +580,7 @@ Deployment to a serverless API endpoint doesn't require quota from your subscrip
 
 ### The inference package installed
 
-You can consume predictions from this model by using the `Azure.AI.Inference` package from [NuGet](https://www.nuget.org/). To install this package, you need the following prerequisites:
+You can consume predictions from this model by using the `Azure.AI.Inference` package from [Nuget](https://www.nuget.org/). To install this package, you need the following prerequisites:
 
 * The endpoint URL. To construct the client library, you need to pass in the endpoint URL. The endpoint URL has the form `https://your-host-name.your-azure-region.inference.ai.azure.com`, where `your-host-name` is your unique model deployment host name and `your-azure-region` is the Azure region where the model is deployed (for example, eastus2).
 * Depending on your model deployment and authentication preference, you need either a key to authenticate against the service, or Microsoft Entra ID credentials. The key is a 32-character string.
@@ -606,7 +606,7 @@ using Azure.Identity;
 using Azure.AI.Inference;
 ```
 
-This example also uses the following namespaces but you may not always need them:
+This example also use the following namespaces but you may not always need them:
 
 
 ```csharp
@@ -775,7 +775,7 @@ Console.WriteLine($"Response: {response.Value.Choices[0].Message.Content}");
 ```
 
 > [!WARNING]
-> Jais models don't support JSON output formatting (`response_format = { "type": "json_object" }`). You can always prompt the model to generate JSON outputs. However, such outputs are not guaranteed to be valid JSON.
+> Jais doesn't support JSON output formatting (`response_format = { "type": "json_object" }`). You can always prompt the model to generate JSON outputs. However, such outputs are not guaranteed to be valid JSON.
 
 If you want to pass a parameter that isn't in the list of supported parameters, you can pass it to the underlying model using *extra parameters*. See [Pass extra parameters to the model](#pass-extra-parameters-to-the-model).
 
@@ -1088,7 +1088,7 @@ Explore other parameters that you can specify in the inference client. For a ful
 ```
 
 > [!WARNING]
-> Jais models don't support JSON output formatting (`response_format = { "type": "json_object" }`). You can always prompt the model to generate JSON outputs. However, such outputs are not guaranteed to be valid JSON.
+> Jais doesn't support JSON output formatting (`response_format = { "type": "json_object" }`). You can always prompt the model to generate JSON outputs. However, such outputs are not guaranteed to be valid JSON.
 
 If you want to pass a parameter that isn't in the list of supported parameters, you can pass it to the underlying model using *extra parameters*. See [Pass extra parameters to the model](#pass-extra-parameters-to-the-model).
 
@@ -1165,14 +1165,14 @@ The following example shows how to handle events when the model detects harmful
 
 ## More inference examples
 
-For more examples of how to use Jais models, see the following examples and tutorials:
+For more examples of how to use Jais, see the following examples and tutorials:
 
 | Description | Language | Sample |
 |-------------------------------------------|-------------------|-----------------------------------------------------------------|
 | Azure AI Inference package for JavaScript | JavaScript | [Link](https://aka.ms/azsdk/azure-ai-inference/javascript/samples) |
 | Azure AI Inference package for Python | Python | [Link](https://aka.ms/azsdk/azure-ai-inference/python/samples) |
 
-## Cost and quota considerations for Jais models deployed as serverless API endpoints
+## Cost and quota considerations for Jais family of models deployed as serverless API endpoints
 
 Quota is managed per deployment. Each deployment has a rate limit of 200,000 tokens per minute and 1,000 API requests per minute. However, we currently limit one deployment per model per project. Contact Microsoft Azure Support if the current rate limits aren't sufficient for your scenarios.
 
@@ -1189,4 +1189,4 @@ For more information on how to track costs, see [Monitor costs for models offere
 * [Deploy models as serverless APIs](deploy-models-serverless.md)
 * [Consume serverless API endpoints from a different Azure AI Studio project or hub](deploy-models-serverless-connect.md)
 * [Region availability for models in serverless API endpoints](deploy-models-serverless-availability.md)
-* [Plan and manage costs (marketplace)](costs-plan-manage.md#monitor-costs-for-models-offered-through-the-azure-marketplace)
+* [Plan and manage costs (marketplace)](costs-plan-manage.md#monitor-costs-for-models-offered-through-the-azure-marketplace)
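
For context, here is a minimal, self-contained sketch of how the text-format request reads after this revert, assuming the `azure-ai-inference` Python package used elsewhere in these articles and the `ChatCompletionsResponseFormatText` import shown in the hunks above; the endpoint and key environment-variable names are placeholders, not values from the docs.

```python
import os

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import (
    ChatCompletionsResponseFormatText,
    SystemMessage,
    UserMessage,
)
from azure.core.credentials import AzureKeyCredential

# Placeholder endpoint and key for a serverless API deployment.
client = ChatCompletionsClient(
    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["AZURE_INFERENCE_CREDENTIAL"]),
)

response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="How many languages are in the world?"),
    ],
    stop=["<|endoftext|>"],
    temperature=0,
    top_p=1,
    # Class-based text format, as restored by this commit.
    response_format=ChatCompletionsResponseFormatText(),
)

print(response.choices[0].message.content)
```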

articles/ai-studio/how-to/deploy-models-llama.md

Lines changed: 11 additions & 11 deletions
@@ -255,7 +255,7 @@ print_stream(result)
 Explore other parameters that you can specify in the inference client. For a full list of all the supported parameters and their corresponding documentation, see [Azure AI Model Inference API reference](https://aka.ms/azureai/modelinference).
 
 ```python
-from azure.ai.inference.models import ChatCompletionsResponseFormat
+from azure.ai.inference.models import ChatCompletionsResponseFormatText
 
 response = client.complete(
     messages=[
@@ -268,12 +268,12 @@ response = client.complete(
     stop=["<|endoftext|>"],
     temperature=0,
     top_p=1,
-    response_format={ "type": ChatCompletionsResponseFormat.TEXT },
+    response_format=ChatCompletionsResponseFormatText(),
 )
 ```
 
 > [!WARNING]
-> Meta Llama models don't support JSON output formatting (`response_format = { "type": "json_object" }`). You can always prompt the model to generate JSON outputs. However, such outputs are not guaranteed to be valid JSON.
+> Meta Llama doesn't support JSON output formatting (`response_format = { "type": "json_object" }`). You can always prompt the model to generate JSON outputs. However, such outputs are not guaranteed to be valid JSON.
 
 If you want to pass a parameter that isn't in the list of supported parameters, you can pass it to the underlying model using *extra parameters*. See [Pass extra parameters to the model](#pass-extra-parameters-to-the-model).
 
@@ -610,7 +610,7 @@ var response = await client.path("/chat/completions").post({
 ```
 
 > [!WARNING]
-> Meta Llama models don't support JSON output formatting (`response_format = { "type": "json_object" }`). You can always prompt the model to generate JSON outputs. However, such outputs are not guaranteed to be valid JSON.
+> Meta Llama doesn't support JSON output formatting (`response_format = { "type": "json_object" }`). You can always prompt the model to generate JSON outputs. However, such outputs are not guaranteed to be valid JSON.
 
 If you want to pass a parameter that isn't in the list of supported parameters, you can pass it to the underlying model using *extra parameters*. See [Pass extra parameters to the model](#pass-extra-parameters-to-the-model).
 
@@ -765,7 +765,7 @@ For deployment to a self-hosted managed compute, you must have enough quota in y
 
 ### The inference package installed
 
-You can consume predictions from this model by using the `Azure.AI.Inference` package from [NuGet](https://www.nuget.org/). To install this package, you need the following prerequisites:
+You can consume predictions from this model by using the `Azure.AI.Inference` package from [Nuget](https://www.nuget.org/). To install this package, you need the following prerequisites:
 
 * The endpoint URL. To construct the client library, you need to pass in the endpoint URL. The endpoint URL has the form `https://your-host-name.your-azure-region.inference.ai.azure.com`, where `your-host-name` is your unique model deployment host name and `your-azure-region` is the Azure region where the model is deployed (for example, eastus2).
 * Depending on your model deployment and authentication preference, you need either a key to authenticate against the service, or Microsoft Entra ID credentials. The key is a 32-character string.
@@ -791,7 +791,7 @@ using Azure.Identity;
 using Azure.AI.Inference;
 ```
 
-This example also uses the following namespaces but you may not always need them:
+This example also use the following namespaces but you may not always need them:
 
 
 ```csharp
@@ -973,7 +973,7 @@ Console.WriteLine($"Response: {response.Value.Choices[0].Message.Content}");
 ```
 
 > [!WARNING]
-> Meta Llama models don't support JSON output formatting (`response_format = { "type": "json_object" }`). You can always prompt the model to generate JSON outputs. However, such outputs are not guaranteed to be valid JSON.
+> Meta Llama doesn't support JSON output formatting (`response_format = { "type": "json_object" }`). You can always prompt the model to generate JSON outputs. However, such outputs are not guaranteed to be valid JSON.
 
 If you want to pass a parameter that isn't in the list of supported parameters, you can pass it to the underlying model using *extra parameters*. See [Pass extra parameters to the model](#pass-extra-parameters-to-the-model).
 
@@ -1348,7 +1348,7 @@ Explore other parameters that you can specify in the inference client. For a ful
 ```
 
 > [!WARNING]
-> Meta Llama models don't support JSON output formatting (`response_format = { "type": "json_object" }`). You can always prompt the model to generate JSON outputs. However, such outputs are not guaranteed to be valid JSON.
+> Meta Llama doesn't support JSON output formatting (`response_format = { "type": "json_object" }`). You can always prompt the model to generate JSON outputs. However, such outputs are not guaranteed to be valid JSON.
 
 If you want to pass a parameter that isn't in the list of supported parameters, you can pass it to the underlying model using *extra parameters*. See [Pass extra parameters to the model](#pass-extra-parameters-to-the-model).
 
@@ -1441,7 +1441,7 @@ The following example shows how to handle events when the model detects harmful
 
 ## More inference examples
 
-For more examples of how to use Meta Llama models, see the following examples and tutorials:
+For more examples of how to use Meta Llama, see the following examples and tutorials:
 
 | Description | Language | Sample |
 |-------------------------------------------|-------------------|------------------------------------------------------------------- |
@@ -1453,7 +1453,7 @@ For more examples of how to use Meta Llama models, see the following examples an
 | LangChain | Python | [Link](https://aka.ms/meta-llama-3.1-405B-instruct-langchain) |
 | LiteLLM | Python | [Link](https://aka.ms/meta-llama-3.1-405B-instruct-litellm) |
 
-## Cost and quota considerations for Meta Llama models deployed as serverless API endpoints
+## Cost and quota considerations for Meta Llama family of models deployed as serverless API endpoints
 
 Quota is managed per deployment. Each deployment has a rate limit of 200,000 tokens per minute and 1,000 API requests per minute. However, we currently limit one deployment per model per project. Contact Microsoft Azure Support if the current rate limits aren't sufficient for your scenarios.
 
@@ -1463,7 +1463,7 @@ Each time a project subscribes to a given offer from the Azure Marketplace, a ne
 
 For more information on how to track costs, see [Monitor costs for models offered through the Azure Marketplace](costs-plan-manage.md#monitor-costs-for-models-offered-through-the-azure-marketplace).
 
-## Cost and quota considerations for Meta Llama models deployed to managed compute
+## Cost and quota considerations for Meta Llama family of models deployed to managed compute
 
 Meta Llama models deployed to managed compute are billed based on core hours of the associated compute instance. The cost of the compute instance is determined by the size of the instance, the number of instances running, and the run duration.
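
The prerequisites hunk above mentions two ways to authenticate against a serverless endpoint: a 32-character key or Microsoft Entra ID credentials. As a rough sketch of both options (in Python rather than the C# `Azure.AI.Inference` package the hunk refers to, and assuming the `azure-ai-inference` and `azure-identity` packages), constructing the client might look like this; the endpoint value is the placeholder form from the hunk, and the environment-variable name is hypothetical.

```python
import os

from azure.ai.inference import ChatCompletionsClient
from azure.core.credentials import AzureKeyCredential
from azure.identity import DefaultAzureCredential

# Placeholder endpoint in the form described in the prerequisites.
endpoint = "https://your-host-name.your-azure-region.inference.ai.azure.com"

# Option 1: key-based authentication (the key is a 32-character string).
client = ChatCompletionsClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(os.environ["AZURE_INFERENCE_CREDENTIAL"]),
)

# Option 2: Microsoft Entra ID authentication via azure-identity.
client = ChatCompletionsClient(
    endpoint=endpoint,
    credential=DefaultAzureCredential(),
)
```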
14691469

articles/ai-studio/how-to/deploy-models-mistral-nemo.md

Lines changed: 11 additions & 9 deletions
@@ -5,7 +5,7 @@ description: Learn how to use Mistral Nemo chat model with Azure AI Studio.
 ms.service: azure-ai-studio
 manager: scottpolly
 ms.topic: how-to
-ms.date: 09/12/2024
+ms.date: 08/08/2024
 ms.reviewer: kritifaujdar
 reviewer: fkriti
 ms.author: mopeakande
@@ -209,7 +209,7 @@ print_stream(result)
 Explore other parameters that you can specify in the inference client. For a full list of all the supported parameters and their corresponding documentation, see [Azure AI Model Inference API reference](https://aka.ms/azureai/modelinference).
 
 ```python
-from azure.ai.inference.models import ChatCompletionsResponseFormat
+from azure.ai.inference.models import ChatCompletionsResponseFormatText
 
 response = client.complete(
     messages=[
@@ -222,7 +222,7 @@ response = client.complete(
     stop=["<|endoftext|>"],
     temperature=0,
     top_p=1,
-    response_format={ "type": ChatCompletionsResponseFormat.TEXT },
+    response_format=ChatCompletionsResponseFormatText(),
 )
 ```
 
@@ -234,13 +234,15 @@ Mistral Nemo chat model can create JSON outputs. Set `response_format` to `json_
 
 
 ```python
+from azure.ai.inference.models import ChatCompletionsResponseFormatJSON
+
 response = client.complete(
     messages=[
         SystemMessage(content="You are a helpful assistant that always generate responses in JSON format, using."
                       " the following format: { ""answer"": ""response"" }."),
         UserMessage(content="How many languages are in the world?"),
     ],
-    response_format={ "type": ChatCompletionsResponseFormat.JSON_OBJECT }
+    response_format=ChatCompletionsResponseFormatJSON()
 )
 ```
 
@@ -960,7 +962,7 @@ Deployment to a serverless API endpoint doesn't require quota from your subscrip
 
 ### The inference package installed
 
-You can consume predictions from this model by using the `Azure.AI.Inference` package from [NuGet](https://www.nuget.org/). To install this package, you need the following prerequisites:
+You can consume predictions from this model by using the `Azure.AI.Inference` package from [Nuget](https://www.nuget.org/). To install this package, you need the following prerequisites:
 
 * The endpoint URL. To construct the client library, you need to pass in the endpoint URL. The endpoint URL has the form `https://your-host-name.your-azure-region.inference.ai.azure.com`, where `your-host-name` is your unique model deployment host name and `your-azure-region` is the Azure region where the model is deployed (for example, eastus2).
 * Depending on your model deployment and authentication preference, you need either a key to authenticate against the service, or Microsoft Entra ID credentials. The key is a 32-character string.
@@ -986,7 +988,7 @@ using Azure.Identity;
 using Azure.AI.Inference;
 ```
 
-This example also uses the following namespaces but you may not always need them:
+This example also use the following namespaces but you may not always need them:
 
 
 ```csharp
@@ -2008,7 +2010,7 @@ The following example shows how to handle events when the model detects harmful
 
 ## More inference examples
 
-For more examples of how to use Mistral models, see the following examples and tutorials:
+For more examples of how to use Mistral, see the following examples and tutorials:
 
 | Description | Language | Sample |
 |-------------------------------------------|-------------------|-----------------------------------------------------------------|
@@ -2022,7 +2024,7 @@ For more examples of how to use Mistral models, see the following examples and t
 | LiteLLM | Python | [Link](https://aka.ms/mistral-large/litellm-sample) |
 
 
-## Cost and quota considerations for Mistral models deployed as serverless API endpoints
+## Cost and quota considerations for Mistral family of models deployed as serverless API endpoints
 
 Quota is managed per deployment. Each deployment has a rate limit of 200,000 tokens per minute and 1,000 API requests per minute. However, we currently limit one deployment per model per project. Contact Microsoft Azure Support if the current rate limits aren't sufficient for your scenarios.
 
@@ -2039,4 +2041,4 @@ For more information on how to track costs, see [Monitor costs for models offere
 * [Deploy models as serverless APIs](deploy-models-serverless.md)
 * [Consume serverless API endpoints from a different Azure AI Studio project or hub](deploy-models-serverless-connect.md)
 * [Region availability for models in serverless API endpoints](deploy-models-serverless-availability.md)
-* [Plan and manage costs (marketplace)](costs-plan-manage.md#monitor-costs-for-models-offered-through-the-azure-marketplace)
+* [Plan and manage costs (marketplace)](costs-plan-manage.md#monitor-costs-for-models-offered-through-the-azure-marketplace)
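
The JSON-output hunk above is the only place in this commit that switches to `ChatCompletionsResponseFormatJSON`. Here is a minimal sketch of requesting and then parsing that output, assuming the same `azure-ai-inference` Python package; the endpoint and key environment-variable names are placeholders.

```python
import json
import os

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import (
    ChatCompletionsResponseFormatJSON,
    SystemMessage,
    UserMessage,
)
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["AZURE_INFERENCE_CREDENTIAL"]),
)

response = client.complete(
    messages=[
        SystemMessage(content='You are a helpful assistant that always generates responses in JSON format,'
                              ' using the following format: { "answer": "response" }.'),
        UserMessage(content="How many languages are in the world?"),
    ],
    # JSON object output, as in the Mistral Nemo hunk above.
    response_format=ChatCompletionsResponseFormatJSON(),
)

# Because JSON output was requested, the message content should parse as JSON.
answer = json.loads(response.choices[0].message.content)
print(answer.get("answer"))
```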

0 commit comments
