articles/ai-studio/how-to/deploy-models-phi-3-5-vision.md (+216 −3 lines)
@@ -5,7 +5,7 @@ description: Learn how to use Phi-3.5 chat model with vision with Azure AI Studi
ms.service: azure-ai-studio
manager: scottpolly
ms.topic: how-to
- ms.date: 08/19/2024
+ ms.date: 08/29/2024
ms.reviewer: kritifaujdar
reviewer: fkriti
ms.author: mopeakande
@@ -41,6 +41,15 @@ To use Phi-3.5 chat model with vision with Azure AI Studio, you need the followi
### A model deployment

**Deployment to serverless APIs**

Phi-3.5 chat model with vision can be deployed to serverless API endpoints with pay-as-you-go billing. This kind of deployment provides a way to consume models as an API without hosting them on your subscription, while keeping the enterprise security and compliance that organizations need.

Deployment to a serverless API endpoint doesn't require quota from your subscription. If your model isn't deployed already, use the Azure AI Studio, Azure Machine Learning SDK for Python, the Azure CLI, or ARM templates to [deploy the model as a serverless API](deploy-models-serverless.md).

> [!div class="nextstepaction"]
> [Deploy the model to serverless API endpoints](deploy-models-serverless.md)

**Deployment to a self-hosted managed compute**

Phi-3.5 chat model with vision can be deployed to our self-hosted managed inference solution, which allows you to customize and control all the details about how the model is served.
> [!NOTE]
> Currently, serverless API endpoints do not support using Microsoft Entra ID for authentication.

### Get the model's capabilities

The `/info` route returns information about the model that is deployed to the endpoint. Return the model's information by calling the following method:
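The `/info` response reports fields such as the model's name, type, and provider. As a minimal sketch of working with that response, the helper and sample values below are hypothetical (not part of the SDK); they only mirror the shape of the route's output:

```python
def format_model_info(info: dict) -> str:
    """Render a /info-style response as a readable summary."""
    return (
        f"Model name: {info['model_name']}\n"
        f"Model type: {info['model_type']}\n"
        f"Model provider name: {info['model_provider_name']}"
    )

# Sample values mirroring what the route returns for this deployment.
sample = {
    "model_name": "Phi-3.5-vision-Instruct",
    "model_type": "chat-completions",
    "model_provider_name": "Microsoft",
}
print(format_model_info(sample))
```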
@@ -215,7 +227,7 @@ print_stream(result)
Explore other parameters that you can specify in the inference client. For a full list of all the supported parameters and their corresponding documentation, see [Azure AI Model Inference API reference](https://aka.ms/azureai/modelinference).

```python
- from azure.ai.inference.models import ChatCompletionsResponseFormatText
+ from azure.ai.inference.models import ChatCompletionsResponseFormat
```
@@ -266,6 +278,42 @@ The following extra parameters can be passed to Phi-3.5 chat model with vision:
| `n` | How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. | `int` |
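Extra parameters outside the Azure AI Model Inference API schema are forwarded to the model when the request carries the `extra-parameters: pass-through` header. As a sketch under that assumption, the `build_request` helper below is hypothetical; it only shows how such a request could be assembled:

```python
def build_request(messages, extra_params=None):
    """Build headers and body for a chat completions call that forwards
    extra parameters (such as `n`) to the underlying model."""
    headers = {"Content-Type": "application/json"}
    body = {"messages": messages}
    if extra_params:
        # Ask the endpoint to pass parameters it doesn't know through to the model.
        headers["extra-parameters"] = "pass-through"
        body.update(extra_params)
    return headers, body

headers, body = build_request(
    [{"role": "user", "content": "How many languages are in the world?"}],
    extra_params={"n": 2},
)
print(headers["extra-parameters"], body["n"])
```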
### Apply content safety

The Azure AI model inference API supports [Azure AI content safety](https://aka.ms/azureaicontentsafety). When you use deployments with Azure AI content safety turned on, inputs and outputs pass through an ensemble of classification models aimed at detecting and preventing the output of harmful content. The content filtering system detects and takes action on specific categories of potentially harmful content in both input prompts and output completions.

The following example shows how to handle events when the model detects harmful content in the input prompt and content safety is enabled.

```python
from azure.ai.inference.models import AssistantMessage, UserMessage, SystemMessage

try:
    response = client.complete(
        messages=[
            SystemMessage(content="You are an AI assistant that helps people find information."),
            UserMessage(content="Chopping tomatoes and cutting them into cubes or wedges are great ways to practice your knife skills."),
        ]
    )

    print(response.choices[0].message.content)

except HttpResponseError as ex:
    if ex.status_code == 400:
        response = ex.response.json()
        if isinstance(response, dict) and "error" in response:
            print(f"Your request triggered an {response['error']['code']} error:\n\t{response['error']['message']}")
        else:
            raise
    raise
```

> [!TIP]
> To learn more about how you can configure and control Azure AI content safety settings, check the [Azure AI content safety documentation](https://aka.ms/azureaicontentsafety).

> [!NOTE]
> Azure AI content safety is only available for models deployed as serverless API endpoints.
## Use chat completions with images

Phi-3.5-vision-Instruct can reason across text and images and generate text completions based on both kinds of input. In this section, you explore the capabilities of Phi-3.5-vision-Instruct for vision in a chat fashion:
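Images are usually supplied to the chat completions route as data URLs embedded in the message content. A sketch of assembling such a payload; the helper name is hypothetical, and the content-part layout follows the chat format the examples in this article use:

```python
import base64

def image_chat_messages(prompt: str, image_bytes: bytes, image_format: str = "jpeg"):
    """Build a chat message combining text and an image encoded as a data URL."""
    data_url = (
        f"data:image/{image_format};base64,"
        + base64.b64encode(image_bytes).decode("utf-8")
    )
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        }
    ]

# Placeholder bytes stand in for a real JPEG file read from disk.
messages = image_chat_messages(
    "Which conclusion can be drawn from this chart?", b"\xff\xd8fake-jpeg"
)
print(messages[0]["content"][1]["image_url"]["url"][:30])
```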
@@ -361,6 +409,15 @@ To use Phi-3.5 chat model with vision with Azure AI Studio, you need the followi
### A model deployment

**Deployment to serverless APIs**

Phi-3.5 chat model with vision can be deployed to serverless API endpoints with pay-as-you-go billing. This kind of deployment provides a way to consume models as an API without hosting them on your subscription, while keeping the enterprise security and compliance that organizations need.

Deployment to a serverless API endpoint doesn't require quota from your subscription. If your model isn't deployed already, use the Azure AI Studio, Azure Machine Learning SDK for Python, the Azure CLI, or ARM templates to [deploy the model as a serverless API](deploy-models-serverless.md).

> [!div class="nextstepaction"]
> [Deploy the model to serverless API endpoints](deploy-models-serverless.md)

**Deployment to a self-hosted managed compute**

Phi-3.5 chat model with vision can be deployed to our self-hosted managed inference solution, which allows you to customize and control all the details about how the model is served.
@@ -421,6 +478,9 @@ const client = new ModelClient(
);
```

> [!NOTE]
> Currently, serverless API endpoints do not support using Microsoft Entra ID for authentication.

### Get the model's capabilities

The `/info` route returns information about the model that is deployed to the endpoint. Return the model's information by calling the following method:
@@ -603,6 +663,48 @@ The following extra parameters can be passed to Phi-3.5 chat model with vision:
| `n` | How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. | `int` |
### Apply content safety

The Azure AI model inference API supports [Azure AI content safety](https://aka.ms/azureaicontentsafety). When you use deployments with Azure AI content safety turned on, inputs and outputs pass through an ensemble of classification models aimed at detecting and preventing the output of harmful content. The content filtering system detects and takes action on specific categories of potentially harmful content in both input prompts and output completions.

The following example shows how to handle events when the model detects harmful content in the input prompt and content safety is enabled.

```javascript
try {
    var messages = [
        { role: "system", content: "You are an AI assistant that helps people find information." },
        { role: "user", content: "Chopping tomatoes and cutting them into cubes or wedges are great ways to practice your knife skills." },
    ];

    var response = await client.path("/chat/completions").post({
        body: {
            messages: messages,
        }
    });

    console.log(response.body.choices[0].message.content);
}
catch (error) {
    if (error.status_code == 400) {
        var response = JSON.parse(error.response._content);
        if (response.error) {
            console.log(`Your request triggered an ${response.error.code} error:\n\t${response.error.message}`);
        }
        else
        {
            throw error;
        }
    }
}
```

> [!TIP]
> To learn more about how you can configure and control Azure AI content safety settings, check the [Azure AI content safety documentation](https://aka.ms/azureaicontentsafety).

> [!NOTE]
> Azure AI content safety is only available for models deployed as serverless API endpoints.
## Use chat completions with images

Phi-3.5-vision-Instruct can reason across text and images and generate text completions based on both kinds of input. In this section, you explore the capabilities of Phi-3.5-vision-Instruct for vision in a chat fashion:
@@ -704,6 +806,15 @@ To use Phi-3.5 chat model with vision with Azure AI Studio, you need the followi
### A model deployment

**Deployment to serverless APIs**

Phi-3.5 chat model with vision can be deployed to serverless API endpoints with pay-as-you-go billing. This kind of deployment provides a way to consume models as an API without hosting them on your subscription, while keeping the enterprise security and compliance that organizations need.

Deployment to a serverless API endpoint doesn't require quota from your subscription. If your model isn't deployed already, use the Azure AI Studio, Azure Machine Learning SDK for Python, the Azure CLI, or ARM templates to [deploy the model as a serverless API](deploy-models-serverless.md).

> [!div class="nextstepaction"]
> [Deploy the model to serverless API endpoints](deploy-models-serverless.md)

**Deployment to a self-hosted managed compute**

Phi-3.5 chat model with vision can be deployed to our self-hosted managed inference solution, which allows you to customize and control all the details about how the model is served.
@@ -779,6 +890,9 @@ client = new ChatCompletionsClient(
);
```

> [!NOTE]
> Currently, serverless API endpoints do not support using Microsoft Entra ID for authentication.

### Get the model's capabilities

The `/info` route returns information about the model that is deployed to the endpoint. Return the model's information by calling the following method:
@@ -958,6 +1072,48 @@ The following extra parameters can be passed to Phi-3.5 chat model with vision:
| `n` | How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. | `int` |
### Apply content safety

The Azure AI model inference API supports [Azure AI content safety](https://aka.ms/azureaicontentsafety). When you use deployments with Azure AI content safety turned on, inputs and outputs pass through an ensemble of classification models aimed at detecting and preventing the output of harmful content. The content filtering system detects and takes action on specific categories of potentially harmful content in both input prompts and output completions.

The following example shows how to handle events when the model detects harmful content in the input prompt and content safety is enabled.

```csharp
try
{
    requestOptions = new ChatCompletionsOptions()
    {
        Messages = {
            new ChatRequestSystemMessage("You are an AI assistant that helps people find information."),
            new ChatRequestUserMessage(
                "Chopping tomatoes and cutting them into cubes or wedges are great ways to practice your knife skills."
            ),
        },
    };

    response = client.Complete(requestOptions);
    Console.WriteLine(response.Value.Choices[0].Message.Content);
}
catch (RequestFailedException ex)
{
    if (ex.ErrorCode == "content_filter")
    {
        Console.WriteLine($"Your query has triggered Azure AI content safety: {ex.Message}");
    }
    else
    {
        throw;
    }
}
```

> [!TIP]
> To learn more about how you can configure and control Azure AI content safety settings, check the [Azure AI content safety documentation](https://aka.ms/azureaicontentsafety).

> [!NOTE]
> Azure AI content safety is only available for models deployed as serverless API endpoints.
## Use chat completions with images

Phi-3.5-vision-Instruct can reason across text and images and generate text completions based on both kinds of input. In this section, you explore the capabilities of Phi-3.5-vision-Instruct for vision in a chat fashion:
@@ -1044,6 +1200,15 @@ To use Phi-3.5 chat model with vision with Azure AI Studio, you need the followi
### A model deployment

**Deployment to serverless APIs**

Phi-3.5 chat model with vision can be deployed to serverless API endpoints with pay-as-you-go billing. This kind of deployment provides a way to consume models as an API without hosting them on your subscription, while keeping the enterprise security and compliance that organizations need.

Deployment to a serverless API endpoint doesn't require quota from your subscription. If your model isn't deployed already, use the Azure AI Studio, Azure Machine Learning SDK for Python, the Azure CLI, or ARM templates to [deploy the model as a serverless API](deploy-models-serverless.md).

> [!div class="nextstepaction"]
> [Deploy the model to serverless API endpoints](deploy-models-serverless.md)

**Deployment to a self-hosted managed compute**

Phi-3.5 chat model with vision can be deployed to our self-hosted managed inference solution, which allows you to customize and control all the details about how the model is served.
@@ -1073,6 +1238,9 @@ First, create the client to consume the model. The following code uses an endpoi
When you deploy the model to a self-hosted online endpoint with **Microsoft Entra ID** support, you can use the following code snippet to create a client.

> [!NOTE]
> Currently, serverless API endpoints do not support using Microsoft Entra ID for authentication.

### Get the model's capabilities

The `/info` route returns information about the model that is deployed to the endpoint. Return the model's information by calling the following method:
@@ -1323,6 +1491,47 @@ The following extra parameters can be passed to Phi-3.5 chat model with vision:
| `n` | How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. | `int` |
### Apply content safety

The Azure AI model inference API supports [Azure AI content safety](https://aka.ms/azureaicontentsafety). When you use deployments with Azure AI content safety turned on, inputs and outputs pass through an ensemble of classification models aimed at detecting and preventing the output of harmful content. The content filtering system detects and takes action on specific categories of potentially harmful content in both input prompts and output completions.

The following example shows how to handle events when the model detects harmful content in the input prompt and content safety is enabled.

```json
{
    "messages": [
        {
            "role": "system",
            "content": "You are an AI assistant that helps people find information."
        },
        {
            "role": "user",
            "content": "Chopping tomatoes and cutting them into cubes or wedges are great ways to practice your knife skills."
        }
    ]
}
```

```json
{
    "error": {
        "message": "The response was filtered due to the prompt triggering Microsoft's content management policy. Please modify your prompt and retry.",
        "type": null,
        "param": "prompt",
        "code": "content_filter",
        "status": 400
    }
}
```

> [!TIP]
> To learn more about how you can configure and control Azure AI content safety settings, check the [Azure AI content safety documentation](https://aka.ms/azureaicontentsafety).

> [!NOTE]
> Azure AI content safety is only available for models deployed as serverless API endpoints.
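A REST client can recognize this error shape programmatically by inspecting the `code` field. A minimal sketch, using a hypothetical helper that is not part of any SDK:

```python
import json

def content_filter_message(raw: str):
    """Return a friendly message if the payload is a content-filter error, else None."""
    payload = json.loads(raw)
    error = payload.get("error") or {}
    if error.get("code") == "content_filter":
        return f"Your request triggered an {error['code']} error:\n\t{error['message']}"
    return None

# Sample payload mirroring the error response shown in this article.
raw = json.dumps({
    "error": {
        "message": "The response was filtered due to the prompt triggering Microsoft's content management policy. Please modify your prompt and retry.",
        "type": None,
        "param": "prompt",
        "code": "content_filter",
        "status": 400,
    }
})
print(content_filter_message(raw))
```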
## Use chat completions with images

Phi-3.5-vision-Instruct can reason across text and images and generate text completions based on both kinds of input. In this section, you explore the capabilities of Phi-3.5-vision-Instruct for vision in a chat fashion:
@@ -1413,6 +1622,10 @@ For more examples of how to use Phi-3 family models, see the following examples
## Cost and quota considerations for Phi-3 family models deployed as serverless API endpoints

Quota is managed per deployment. Each deployment has a rate limit of 200,000 tokens per minute and 1,000 API requests per minute. However, we currently limit one deployment per model per project. Contact Microsoft Azure Support if the current rate limits aren't sufficient for your scenarios.
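A projected workload can be sanity-checked against these per-deployment limits. The helper below is a hypothetical sketch that just encodes the 200,000 tokens/minute and 1,000 requests/minute figures above:

```python
def within_limits(tokens_per_minute: int, requests_per_minute: int) -> bool:
    """Check a projected workload against the per-deployment serverless limits."""
    TOKEN_LIMIT = 200_000     # tokens per minute per deployment
    REQUEST_LIMIT = 1_000     # API requests per minute per deployment
    return tokens_per_minute <= TOKEN_LIMIT and requests_per_minute <= REQUEST_LIMIT

print(within_limits(150_000, 900))   # fits within one deployment
print(within_limits(250_000, 900))   # exceeds the token limit
```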
## Cost and quota considerations for Phi-3 family models deployed to managed compute

Phi-3 family models deployed to managed compute are billed based on core hours of the associated compute instance. The cost of the compute instance is determined by the size of the instance, the number of instances running, and the run duration.
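As a rough illustration of core-hour billing (the helper, core counts, and price are hypothetical; actual rates depend on the instance size and region):

```python
def compute_cost(instances: int, hours: float,
                 price_per_core_hour: float, cores_per_instance: int) -> float:
    """Estimate managed-compute billing: total core hours times the hourly core price."""
    return instances * cores_per_instance * hours * price_per_core_hour

# e.g., 2 four-core instances running for 10 hours at a hypothetical $0.50/core-hour
print(compute_cost(instances=2, hours=10, price_per_core_hour=0.50, cores_per_instance=4))
```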