Commit ac40dda

update azure ML article
1 parent 56e9ea4 commit ac40dda

File tree

1 file changed: +214 -1 lines changed

articles/machine-learning/how-to-deploy-models-phi-3-5-vision.md

Lines changed: 214 additions & 1 deletion
@@ -6,7 +6,7 @@ ms.service: azure-machine-learning
 ms.subservice: inferencing
 manager: scottpolly
 ms.topic: how-to
-ms.date: 08/19/2024
+ms.date: 08/29/2024
 ms.reviewer: kritifaujdar
 reviewer: fkriti
 ms.author: mopeakande
@@ -40,6 +40,16 @@ To use Phi-3.5 chat model with vision with Azure Machine Learning, you need the
 
 ### A model deployment
 
+**Deployment to serverless APIs**
+
+Phi-3.5 chat model with vision can be deployed to serverless API endpoints with pay-as-you-go billing. This kind of deployment provides a way to consume models as an API without hosting them on your subscription, while keeping the enterprise security and compliance that organizations need.
+
+Deployment to a serverless API endpoint doesn't require quota from your subscription. If your model isn't deployed already, use the Azure Machine Learning studio, Azure Machine Learning SDK for Python, the Azure CLI, or ARM templates to [deploy the model as a serverless API](how-to-deploy-models-serverless.md).
+
+> [!div class="nextstepaction"]
+> [Deploy models as serverless API endpoints](how-to-deploy-models-serverless.md)
+
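The serverless deployment added above can also be driven from the Azure CLI with a small endpoint definition file. A minimal sketch, assuming the `az ml serverless-endpoint` command group from the CLI v2 `ml` extension; the endpoint name and model asset path below are placeholder assumptions, not values from this article:

```yaml
# endpoint.yml -- illustrative sketch; name and model_id are placeholders
name: my-phi-35-vision-endpoint
model_id: azureml://registries/azureml/models/Phi-3.5-vision-instruct
```

You would then run something like `az ml serverless-endpoint create --file endpoint.yml`; see the linked serverless deployment article for the authoritative schema and steps.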
 **Deployment to a self-hosted managed compute**
 
 Phi-3.5 chat model with vision can be deployed to our self-hosted managed inference solution, which allows you to customize and control all the details about how the model is served.
@@ -102,6 +112,10 @@ client = ChatCompletionsClient(
 )
 ```
 
+> [!NOTE]
+> Currently, serverless API endpoints do not support using Microsoft Entra ID for authentication.
+
+
 ### Get the model's capabilities
 
 The `/info` route returns information about the model that is deployed to the endpoint. Return the model's information by calling the following method:
@@ -265,6 +279,44 @@ The following extra parameters can be passed to Phi-3.5 chat model with vision:
 | `n` | How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. | `int` |
 
 
+### Apply content safety
+
+The Azure AI model inference API supports [Azure AI content safety](https://aka.ms/azureaicontentsafety). When you use deployments with Azure AI content safety turned on, inputs and outputs pass through an ensemble of classification models aimed at detecting and preventing the output of harmful content. The content filtering system detects and takes action on specific categories of potentially harmful content in both input prompts and output completions.
+
+The following example shows how to handle events when the model detects harmful content in the input prompt and content safety is enabled.
+
+```csharp
+try
+{
+    requestOptions = new ChatCompletionsOptions()
+    {
+        Messages = {
+            new ChatRequestSystemMessage("You are an AI assistant that helps people find information."),
+            new ChatRequestUserMessage(
+                "Chopping tomatoes and cutting them into cubes or wedges are great ways to practice your knife skills."
+            ),
+        },
+    };
+
+    response = client.Complete(requestOptions);
+    Console.WriteLine(response.Value.Choices[0].Message.Content);
+}
+catch (RequestFailedException ex)
+{
+    if (ex.ErrorCode == "content_filter")
+    {
+        Console.WriteLine($"Your query has triggered Azure Content Safety: {ex.Message}");
+    }
+    else
+    {
+        throw;
+    }
+}
+```
+
+> [!TIP]
+> To learn more about how you can configure and control Azure AI content safety settings, check the [Azure AI content safety documentation](https://aka.ms/azureaicontentsafety).
+
+> [!NOTE]
+> Azure AI content safety is only available for models deployed as serverless API endpoints.
+
 ## Use chat completions with images
 
 Phi-3.5-vision-Instruct can reason across text and images and generate text completions based on both kinds of input. In this section, you explore the capabilities of Phi-3.5-vision-Instruct for vision in a chat fashion:
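Images are commonly passed to the chat completions API as base64 data URLs inside a multi-part user message. A minimal sketch of building such a payload; the helper names and the raw message shape are illustrative assumptions (the SDK examples in this article wrap the same structure in typed content items):

```python
import base64


def image_to_data_url(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a base64 data URL for an image_url content part."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_image_message(text: str, image_bytes: bytes) -> dict:
    """Build a user message that carries both a text part and an image part.

    Hypothetical helper; the dict mirrors the REST payload shape used for
    chat completions with images.
    """
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_to_data_url(image_bytes)}},
        ],
    }
```

The resulting dict can be placed in the `messages` array of a chat completions request alongside ordinary text-only messages.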
@@ -360,6 +412,16 @@ To use Phi-3.5 chat model with vision with Azure Machine Learning studio, you ne
 
 ### A model deployment
 
+**Deployment to serverless APIs**
+
+Phi-3.5 chat model with vision can be deployed to serverless API endpoints with pay-as-you-go billing. This kind of deployment provides a way to consume models as an API without hosting them on your subscription, while keeping the enterprise security and compliance that organizations need.
+
+Deployment to a serverless API endpoint doesn't require quota from your subscription. If your model isn't deployed already, use the Azure Machine Learning studio, Azure Machine Learning SDK for Python, the Azure CLI, or ARM templates to [deploy the model as a serverless API](how-to-deploy-models-serverless.md).
+
+> [!div class="nextstepaction"]
+> [Deploy models as serverless API endpoints](how-to-deploy-models-serverless.md)
+
 **Deployment to a self-hosted managed compute**
 
 Phi-3.5 chat model with vision can be deployed to our self-hosted managed inference solution, which allows you to customize and control all the details about how the model is served.
@@ -420,6 +482,10 @@ const client = new ModelClient(
 );
 ```
 
+> [!NOTE]
+> Currently, serverless API endpoints do not support using Microsoft Entra ID for authentication.
+
+
 ### Get the model's capabilities
 
 The `/info` route returns information about the model that is deployed to the endpoint. Return the model's information by calling the following method:
@@ -602,6 +668,44 @@ The following extra parameters can be passed to Phi-3.5 chat model with vision:
 | `n` | How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. | `int` |
 
 
+### Apply content safety
+
+The Azure AI model inference API supports [Azure AI content safety](https://aka.ms/azureaicontentsafety). When you use deployments with Azure AI content safety turned on, inputs and outputs pass through an ensemble of classification models aimed at detecting and preventing the output of harmful content. The content filtering system detects and takes action on specific categories of potentially harmful content in both input prompts and output completions.
+
+The following example shows how to handle events when the model detects harmful content in the input prompt and content safety is enabled.
+
+```csharp
+try
+{
+    requestOptions = new ChatCompletionsOptions()
+    {
+        Messages = {
+            new ChatRequestSystemMessage("You are an AI assistant that helps people find information."),
+            new ChatRequestUserMessage(
+                "Chopping tomatoes and cutting them into cubes or wedges are great ways to practice your knife skills."
+            ),
+        },
+    };
+
+    response = client.Complete(requestOptions);
+    Console.WriteLine(response.Value.Choices[0].Message.Content);
+}
+catch (RequestFailedException ex)
+{
+    if (ex.ErrorCode == "content_filter")
+    {
+        Console.WriteLine($"Your query has triggered Azure Content Safety: {ex.Message}");
+    }
+    else
+    {
+        throw;
+    }
+}
+```
+
+> [!TIP]
+> To learn more about how you can configure and control Azure AI content safety settings, check the [Azure AI content safety documentation](https://aka.ms/azureaicontentsafety).
+
+> [!NOTE]
+> Azure AI content safety is only available for models deployed as serverless API endpoints.
+
 ## Use chat completions with images
 
 Phi-3.5-vision-Instruct can reason across text and images and generate text completions based on both kinds of input. In this section, you explore the capabilities of Phi-3.5-vision-Instruct for vision in a chat fashion:
@@ -703,6 +807,16 @@ To use Phi-3.5 chat model with vision with Azure Machine Learning studio, you ne
 
 ### A model deployment
 
+**Deployment to serverless APIs**
+
+Phi-3.5 chat model with vision can be deployed to serverless API endpoints with pay-as-you-go billing. This kind of deployment provides a way to consume models as an API without hosting them on your subscription, while keeping the enterprise security and compliance that organizations need.
+
+Deployment to a serverless API endpoint doesn't require quota from your subscription. If your model isn't deployed already, use the Azure Machine Learning studio, Azure Machine Learning SDK for Python, the Azure CLI, or ARM templates to [deploy the model as a serverless API](how-to-deploy-models-serverless.md).
+
+> [!div class="nextstepaction"]
+> [Deploy models as serverless API endpoints](how-to-deploy-models-serverless.md)
+
 **Deployment to a self-hosted managed compute**
 
 Phi-3.5 chat model with vision can be deployed to our self-hosted managed inference solution, which allows you to customize and control all the details about how the model is served.
@@ -778,6 +892,10 @@ client = new ChatCompletionsClient(
 );
 ```
 
+> [!NOTE]
+> Currently, serverless API endpoints do not support using Microsoft Entra ID for authentication.
+
+
 ### Get the model's capabilities
 
 The `/info` route returns information about the model that is deployed to the endpoint. Return the model's information by calling the following method:
@@ -957,6 +1075,44 @@ The following extra parameters can be passed to Phi-3.5 chat model with vision:
 | `n` | How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. | `int` |
 
 
+### Apply content safety
+
+The Azure AI model inference API supports [Azure AI content safety](https://aka.ms/azureaicontentsafety). When you use deployments with Azure AI content safety turned on, inputs and outputs pass through an ensemble of classification models aimed at detecting and preventing the output of harmful content. The content filtering system detects and takes action on specific categories of potentially harmful content in both input prompts and output completions.
+
+The following example shows how to handle events when the model detects harmful content in the input prompt and content safety is enabled.
+
+```csharp
+try
+{
+    requestOptions = new ChatCompletionsOptions()
+    {
+        Messages = {
+            new ChatRequestSystemMessage("You are an AI assistant that helps people find information."),
+            new ChatRequestUserMessage(
+                "Chopping tomatoes and cutting them into cubes or wedges are great ways to practice your knife skills."
+            ),
+        },
+    };
+
+    response = client.Complete(requestOptions);
+    Console.WriteLine(response.Value.Choices[0].Message.Content);
+}
+catch (RequestFailedException ex)
+{
+    if (ex.ErrorCode == "content_filter")
+    {
+        Console.WriteLine($"Your query has triggered Azure Content Safety: {ex.Message}");
+    }
+    else
+    {
+        throw;
+    }
+}
+```
+
+> [!TIP]
+> To learn more about how you can configure and control Azure AI content safety settings, check the [Azure AI content safety documentation](https://aka.ms/azureaicontentsafety).
+
+> [!NOTE]
+> Azure AI content safety is only available for models deployed as serverless API endpoints.
+
 ## Use chat completions with images
 
 Phi-3.5-vision-Instruct can reason across text and images and generate text completions based on both kinds of input. In this section, you explore the capabilities of Phi-3.5-vision-Instruct for vision in a chat fashion:
@@ -1043,6 +1199,16 @@ To use Phi-3.5 chat model with vision with Azure Machine Learning studio, you ne
 
 ### A model deployment
 
+**Deployment to serverless APIs**
+
+Phi-3.5 chat model with vision can be deployed to serverless API endpoints with pay-as-you-go billing. This kind of deployment provides a way to consume models as an API without hosting them on your subscription, while keeping the enterprise security and compliance that organizations need.
+
+Deployment to a serverless API endpoint doesn't require quota from your subscription. If your model isn't deployed already, use the Azure Machine Learning studio, Azure Machine Learning SDK for Python, the Azure CLI, or ARM templates to [deploy the model as a serverless API](how-to-deploy-models-serverless.md).
+
+> [!div class="nextstepaction"]
+> [Deploy models as serverless API endpoints](how-to-deploy-models-serverless.md)
+
 **Deployment to a self-hosted managed compute**
 
 Phi-3.5 chat model with vision can be deployed to our self-hosted managed inference solution, which allows you to customize and control all the details about how the model is served.
@@ -1072,6 +1238,9 @@ First, create the client to consume the model. The following code uses an endpoi
 
 When you deploy the model to a self-hosted online endpoint with **Microsoft Entra ID** support, you can use the following code snippet to create a client.
 
+> [!NOTE]
+> Currently, serverless API endpoints do not support using Microsoft Entra ID for authentication.
+
 ### Get the model's capabilities
 
 The `/info` route returns information about the model that is deployed to the endpoint. Return the model's information by calling the following method:
@@ -1322,6 +1491,47 @@ The following extra parameters can be passed to Phi-3.5 chat model with vision:
 | `n` | How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. | `int` |
 
 
+### Apply content safety
+
+The Azure AI model inference API supports [Azure AI content safety](https://aka.ms/azureaicontentsafety). When you use deployments with Azure AI content safety turned on, inputs and outputs pass through an ensemble of classification models aimed at detecting and preventing the output of harmful content. The content filtering system detects and takes action on specific categories of potentially harmful content in both input prompts and output completions.
+
+The following example shows how to handle events when the model detects harmful content in the input prompt and content safety is enabled.
+
+```json
+{
+    "messages": [
+        {
+            "role": "system",
+            "content": "You are an AI assistant that helps people find information."
+        },
+        {
+            "role": "user",
+            "content": "Chopping tomatoes and cutting them into cubes or wedges are great ways to practice your knife skills."
+        }
+    ]
+}
+```
+
+```json
+{
+    "error": {
+        "message": "The response was filtered due to the prompt triggering Microsoft's content management policy. Please modify your prompt and retry.",
+        "type": null,
+        "param": "prompt",
+        "code": "content_filter",
+        "status": 400
+    }
+}
+```
+
+> [!TIP]
+> To learn more about how you can configure and control Azure AI content safety settings, check the [Azure AI content safety documentation](https://aka.ms/azureaicontentsafety).
+
+> [!NOTE]
+> Azure AI content safety is only available for models deployed as serverless API endpoints.
+
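A REST client can recognize the `content_filter` error payload shown under "Apply content safety" and handle it separately from other failures. A minimal sketch; the helper name is illustrative:

```python
import json


def is_content_filter_error(response_body: str) -> bool:
    """Return True when a response body matches the content_filter error shape
    (error.code == "content_filter" with HTTP status 400)."""
    try:
        payload = json.loads(response_body)
    except ValueError:
        return False
    error = payload.get("error") or {}
    return error.get("code") == "content_filter" and error.get("status") == 400
```

A caller would check this on a 400 response and, for example, prompt the user to rephrase instead of retrying the same request.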
 ## Use chat completions with images
 
 Phi-3.5-vision-Instruct can reason across text and images and generate text completions based on both kinds of input. In this section, you explore the capabilities of Phi-3.5-vision-Instruct for vision in a chat fashion:
@@ -1412,6 +1622,9 @@ For more examples of how to use Phi-3 family models, see the following examples
 | LiteLLM | Python | [Link](https://aka.ms/phi-3/litellm-sample) |
 
 
+## Cost and quota considerations for Phi-3 family models deployed as serverless API endpoints
+
+Quota is managed per deployment. Each deployment has a rate limit of 200,000 tokens per minute and 1,000 API requests per minute. However, we currently limit one deployment per model per project. Contact Microsoft Azure Support if the current rate limits aren't sufficient for your scenarios.
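The per-minute limits above can also be pre-checked on the client to avoid unnecessary throttling. A minimal sketch of a sliding-window budget; this is an illustrative guard, not part of the service or any SDK (the service still enforces the real limits and may return HTTP 429 when they are exceeded):

```python
import time
from collections import deque


class MinuteBudget:
    """Client-side guard for per-minute limits (by default 200,000 tokens and
    1,000 requests per minute, matching the serverless deployment quota above)."""

    def __init__(self, max_tokens=200_000, max_requests=1_000, clock=time.monotonic):
        self.max_tokens = max_tokens
        self.max_requests = max_requests
        self.clock = clock
        self.events = deque()  # (timestamp, tokens) per request in the last minute

    def _trim(self, now):
        # Drop requests older than 60 seconds from the window.
        while self.events and now - self.events[0][0] >= 60:
            self.events.popleft()

    def try_acquire(self, tokens: int) -> bool:
        """Record a request of `tokens` tokens if both limits allow it."""
        now = self.clock()
        self._trim(now)
        used_tokens = sum(t for _, t in self.events)
        if len(self.events) >= self.max_requests or used_tokens + tokens > self.max_tokens:
            return False
        self.events.append((now, tokens))
        return True
```

A caller would estimate the token count of a request, call `try_acquire`, and delay or queue the request when it returns `False`.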
 
 ## Cost and quota considerations for Phi-3 family models deployed to managed compute
 
 Phi-3 family models deployed to managed compute are billed based on core hours of the associated compute instance. The cost of the compute instance is determined by the size of the instance, the number of instances running, and the run duration.
