Commit 0d3e3d5

add Phi 3.5 MoE and vision (MaaP only)

1 parent 5e9708d
File tree: 2 files changed (+1151 −213 lines)

articles/ai-studio/how-to/deploy-models-phi-3-5-vision.md

0 additions, 213 deletions
@@ -39,15 +39,6 @@ To use Phi-3.5 chat model with vision with Azure AI Studio, you need the followi

### A model deployment

-**Deployment to serverless APIs**
-
-Phi-3.5 chat model with vision can be deployed to serverless API endpoints with pay-as-you-go billing. This kind of deployment provides a way to consume models as an API without hosting them on your subscription, while keeping the enterprise security and compliance that organizations need.
-
-Deployment to a serverless API endpoint doesn't require quota from your subscription. If your model isn't deployed already, use the Azure AI Studio, Azure Machine Learning SDK for Python, the Azure CLI, or ARM templates to [deploy the model as a serverless API](deploy-models-serverless.md).
-
-> [!div class="nextstepaction"]
-> [Deploy the model to serverless API endpoints](deploy-models-serverless.md)
-
**Deployment to a self-hosted managed compute**

Phi-3.5 chat model with vision can be deployed to our self-hosted managed inference solution, which allows you to customize and control all the details about how the model is served.
@@ -110,9 +101,6 @@ client = ChatCompletionsClient(
)
```

-> [!NOTE]
-> Currently, serverless API endpoints do not support using Microsoft Entra ID for authentication.
-
### Get the model's capabilities

The `/info` route returns information about the model that is deployed to the endpoint. Return the model's information by calling the following method:
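The `/info` route in the context lines above resolves against the endpoint's base URL. A minimal stdlib sketch of constructing that URL, independent of the SDK (the deployment host below is a hypothetical example, not a real endpoint):

```python
# Sketch: build the /info route URL from an endpoint base URL.
# The host name is hypothetical; substitute your own deployment's endpoint.

def model_info_url(endpoint: str) -> str:
    """Return the /info route for an Azure AI model inference endpoint."""
    return endpoint.rstrip("/") + "/info"

url = model_info_url("https://my-deployment.eastus2.inference.ai.azure.com/")
print(url)  # https://my-deployment.eastus2.inference.ai.azure.com/info
```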
@@ -276,42 +264,6 @@ The following extra parameters can be passed to Phi-3.5 chat model with vision:
| `n` | How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. | `int` |


-### Apply content safety
-
-The Azure AI model inference API supports [Azure AI content safety](https://aka.ms/azureaicontentsafety). When you use deployments with Azure AI content safety turned on, inputs and outputs pass through an ensemble of classification models aimed at detecting and preventing the output of harmful content. The content filtering system detects and takes action on specific categories of potentially harmful content in both input prompts and output completions.
-
-The following example shows how to handle events when the model detects harmful content in the input prompt and content safety is enabled.
-
-
-```python
-from azure.ai.inference.models import AssistantMessage, UserMessage, SystemMessage
-
-try:
-    response = client.complete(
-        messages=[
-            SystemMessage(content="You are an AI assistant that helps people find information."),
-            UserMessage(content="Chopping tomatoes and cutting them into cubes or wedges are great ways to practice your knife skills."),
-        ]
-    )
-
-    print(response.choices[0].message.content)
-
-except HttpResponseError as ex:
-    if ex.status_code == 400:
-        response = ex.response.json()
-        if isinstance(response, dict) and "error" in response:
-            print(f"Your request triggered an {response['error']['code']} error:\n\t {response['error']['message']}")
-        else:
-            raise
-    raise
-```
-
-> [!TIP]
-> To learn more about how you can configure and control Azure AI content safety settings, check the [Azure AI content safety documentation](https://aka.ms/azureaicontentsafety).
-
-> [!NOTE]
-> Azure AI content safety is only available for models deployed as serverless API endpoints.
-
## Use chat completions with images

Phi-3.5-vision-Instruct can reason across text and images and generate text completions based on both kinds of input. In this section, you explore the capabilities of Phi-3.5-vision-Instruct for vision in a chat fashion:
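The removed Python handler above keys off a 400 response whose body carries an `error` object. A stdlib-only sketch of the same parsing logic, run against a hard-coded sample body rather than a live `HttpResponseError` (the sample values are illustrative):

```python
import json

# Sample 400 body in the shape the removed handler expects (values illustrative).
raw = json.dumps({
    "error": {
        "code": "content_filter",
        "message": "The response was filtered due to the prompt triggering Microsoft's content management policy.",
    }
})

response = json.loads(raw)
if isinstance(response, dict) and "error" in response:
    # Same guard as the removed handler: only summarize well-formed error bodies.
    summary = f"Your request triggered an {response['error']['code']} error"
else:
    summary = "unrecognized error body"

print(summary)  # Your request triggered an content_filter error
```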
@@ -407,15 +359,6 @@ To use Phi-3.5 chat model with vision with Azure AI Studio, you need the followi

### A model deployment

-**Deployment to serverless APIs**
-
-Phi-3.5 chat model with vision can be deployed to serverless API endpoints with pay-as-you-go billing. This kind of deployment provides a way to consume models as an API without hosting them on your subscription, while keeping the enterprise security and compliance that organizations need.
-
-Deployment to a serverless API endpoint doesn't require quota from your subscription. If your model isn't deployed already, use the Azure AI Studio, Azure Machine Learning SDK for Python, the Azure CLI, or ARM templates to [deploy the model as a serverless API](deploy-models-serverless.md).
-
-> [!div class="nextstepaction"]
-> [Deploy the model to serverless API endpoints](deploy-models-serverless.md)
-
**Deployment to a self-hosted managed compute**

Phi-3.5 chat model with vision can be deployed to our self-hosted managed inference solution, which allows you to customize and control all the details about how the model is served.
@@ -476,9 +419,6 @@ const client = new ModelClient(
);
```

-> [!NOTE]
-> Currently, serverless API endpoints do not support using Microsoft Entra ID for authentication.
-
### Get the model's capabilities

The `/info` route returns information about the model that is deployed to the endpoint. Return the model's information by calling the following method:
@@ -661,48 +601,6 @@ The following extra parameters can be passed to Phi-3.5 chat model with vision:
| `n` | How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. | `int` |


-### Apply content safety
-
-The Azure AI model inference API supports [Azure AI content safety](https://aka.ms/azureaicontentsafety). When you use deployments with Azure AI content safety turned on, inputs and outputs pass through an ensemble of classification models aimed at detecting and preventing the output of harmful content. The content filtering system detects and takes action on specific categories of potentially harmful content in both input prompts and output completions.
-
-The following example shows how to handle events when the model detects harmful content in the input prompt and content safety is enabled.
-
-
-```javascript
-try {
-    var messages = [
-        { role: "system", content: "You are an AI assistant that helps people find information." },
-        { role: "user", content: "Chopping tomatoes and cutting them into cubes or wedges are great ways to practice your knife skills." },
-    ];
-
-    var response = await client.path("/chat/completions").post({
-        body: {
-            messages: messages,
-        }
-    });
-
-    console.log(response.body.choices[0].message.content);
-}
-catch (error) {
-    if (error.status_code == 400) {
-        var response = JSON.parse(error.response._content);
-        if (response.error) {
-            console.log(`Your request triggered an ${response.error.code} error:\n\t ${response.error.message}`);
-        }
-        else
-        {
-            throw error;
-        }
-    }
-}
-```
-
-> [!TIP]
-> To learn more about how you can configure and control Azure AI content safety settings, check the [Azure AI content safety documentation](https://aka.ms/azureaicontentsafety).
-
-> [!NOTE]
-> Azure AI content safety is only available for models deployed as serverless API endpoints.
-
## Use chat completions with images

Phi-3.5-vision-Instruct can reason across text and images and generate text completions based on both kinds of input. In this section, you explore the capabilities of Phi-3.5-vision-Instruct for vision in a chat fashion:
@@ -804,15 +702,6 @@ To use Phi-3.5 chat model with vision with Azure AI Studio, you need the followi

### A model deployment

-**Deployment to serverless APIs**
-
-Phi-3.5 chat model with vision can be deployed to serverless API endpoints with pay-as-you-go billing. This kind of deployment provides a way to consume models as an API without hosting them on your subscription, while keeping the enterprise security and compliance that organizations need.
-
-Deployment to a serverless API endpoint doesn't require quota from your subscription. If your model isn't deployed already, use the Azure AI Studio, Azure Machine Learning SDK for Python, the Azure CLI, or ARM templates to [deploy the model as a serverless API](deploy-models-serverless.md).
-
-> [!div class="nextstepaction"]
-> [Deploy the model to serverless API endpoints](deploy-models-serverless.md)
-
**Deployment to a self-hosted managed compute**

Phi-3.5 chat model with vision can be deployed to our self-hosted managed inference solution, which allows you to customize and control all the details about how the model is served.
@@ -888,9 +777,6 @@ client = new ChatCompletionsClient(
);
```

-> [!NOTE]
-> Currently, serverless API endpoints do not support using Microsoft Entra ID for authentication.
-
### Get the model's capabilities

The `/info` route returns information about the model that is deployed to the endpoint. Return the model's information by calling the following method:
@@ -1070,48 +956,6 @@ The following extra parameters can be passed to Phi-3.5 chat model with vision:
| `n` | How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. | `int` |


-### Apply content safety
-
-The Azure AI model inference API supports [Azure AI content safety](https://aka.ms/azureaicontentsafety). When you use deployments with Azure AI content safety turned on, inputs and outputs pass through an ensemble of classification models aimed at detecting and preventing the output of harmful content. The content filtering system detects and takes action on specific categories of potentially harmful content in both input prompts and output completions.
-
-The following example shows how to handle events when the model detects harmful content in the input prompt and content safety is enabled.
-
-
-```csharp
-try
-{
-    requestOptions = new ChatCompletionsOptions()
-    {
-        Messages = {
-            new ChatRequestSystemMessage("You are an AI assistant that helps people find information."),
-            new ChatRequestUserMessage(
-                "Chopping tomatoes and cutting them into cubes or wedges are great ways to practice your knife skills."
-            ),
-        },
-    };
-
-    response = client.Complete(requestOptions);
-    Console.WriteLine(response.Value.Choices[0].Message.Content);
-}
-catch (RequestFailedException ex)
-{
-    if (ex.ErrorCode == "content_filter")
-    {
-        Console.WriteLine($"Your query has trigger Azure Content Safety: {ex.Message}");
-    }
-    else
-    {
-        throw;
-    }
-}
-```
-
-> [!TIP]
-> To learn more about how you can configure and control Azure AI content safety settings, check the [Azure AI content safety documentation](https://aka.ms/azureaicontentsafety).
-
-> [!NOTE]
-> Azure AI content safety is only available for models deployed as serverless API endpoints.
-
## Use chat completions with images

Phi-3.5-vision-Instruct can reason across text and images and generate text completions based on both kinds of input. In this section, you explore the capabilities of Phi-3.5-vision-Instruct for vision in a chat fashion:
@@ -1198,15 +1042,6 @@ To use Phi-3.5 chat model with vision with Azure AI Studio, you need the followi

### A model deployment

-**Deployment to serverless APIs**
-
-Phi-3.5 chat model with vision can be deployed to serverless API endpoints with pay-as-you-go billing. This kind of deployment provides a way to consume models as an API without hosting them on your subscription, while keeping the enterprise security and compliance that organizations need.
-
-Deployment to a serverless API endpoint doesn't require quota from your subscription. If your model isn't deployed already, use the Azure AI Studio, Azure Machine Learning SDK for Python, the Azure CLI, or ARM templates to [deploy the model as a serverless API](deploy-models-serverless.md).
-
-> [!div class="nextstepaction"]
-> [Deploy the model to serverless API endpoints](deploy-models-serverless.md)
-
**Deployment to a self-hosted managed compute**

Phi-3.5 chat model with vision can be deployed to our self-hosted managed inference solution, which allows you to customize and control all the details about how the model is served.
@@ -1236,9 +1071,6 @@ First, create the client to consume the model. The following code uses an endpoi

When you deploy the model to a self-hosted online endpoint with **Microsoft Entra ID** support, you can use the following code snippet to create a client.

-> [!NOTE]
-> Currently, serverless API endpoints do not support using Microsoft Entra ID for authentication.
-
### Get the model's capabilities

The `/info` route returns information about the model that is deployed to the endpoint. Return the model's information by calling the following method:
@@ -1489,47 +1321,6 @@ The following extra parameters can be passed to Phi-3.5 chat model with vision:
| `n` | How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. | `int` |


-### Apply content safety
-
-The Azure AI model inference API supports [Azure AI content safety](https://aka.ms/azureaicontentsafety). When you use deployments with Azure AI content safety turned on, inputs and outputs pass through an ensemble of classification models aimed at detecting and preventing the output of harmful content. The content filtering system detects and takes action on specific categories of potentially harmful content in both input prompts and output completions.
-
-The following example shows how to handle events when the model detects harmful content in the input prompt and content safety is enabled.
-
-
-```json
-{
-    "messages": [
-        {
-            "role": "system",
-            "content": "You are an AI assistant that helps people find information."
-        },
-        {
-            "role": "user",
-            "content": "Chopping tomatoes and cutting them into cubes or wedges are great ways to practice your knife skills."
-        }
-    ]
-}
-```
-
-```json
-{
-    "error": {
-        "message": "The response was filtered due to the prompt triggering Microsoft's content management policy. Please modify your prompt and retry.",
-        "type": null,
-        "param": "prompt",
-        "code": "content_filter",
-        "status": 400
-    }
-}
-```
-
-> [!TIP]
-> To learn more about how you can configure and control Azure AI content safety settings, check the [Azure AI content safety documentation](https://aka.ms/azureaicontentsafety).
-
-> [!NOTE]
-> Azure AI content safety is only available for models deployed as serverless API endpoints.
-
## Use chat completions with images

Phi-3.5-vision-Instruct can reason across text and images and generate text completions based on both kinds of input. In this section, you explore the capabilities of Phi-3.5-vision-Instruct for vision in a chat fashion:
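The removed REST example above showed the 400 error body a content-filter rejection returns. A stdlib sketch of detecting that case from the sample body (the body is copied from the removed article text; the check itself is an illustration, not an official client):

```python
import json

# 400 response body in the shape shown by the removed REST example.
body = """
{
  "error": {
    "message": "The response was filtered due to the prompt triggering Microsoft's content management policy. Please modify your prompt and retry.",
    "type": null,
    "param": "prompt",
    "code": "content_filter",
    "status": 400
  }
}
"""

error = json.loads(body)["error"]
# A content-filter rejection is signaled by code "content_filter" with status 400.
is_content_filter = error.get("code") == "content_filter" and error.get("status") == 400
print(is_content_filter)  # True
```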
@@ -1620,10 +1411,6 @@ For more examples of how to use Phi-3 family models, see the following examples
| LiteLLM | Python | [Link](https://aka.ms/phi-3/litellm-sample) |


-## Cost and quota considerations for Phi-3 family models deployed as serverless API endpoints
-
-Quota is managed per deployment. Each deployment has a rate limit of 200,000 tokens per minute and 1,000 API requests per minute. However, we currently limit one deployment per model per project. Contact Microsoft Azure Support if the current rate limits aren't sufficient for your scenarios.
-
## Cost and quota considerations for Phi-3 family models deployed to managed compute

Phi-3 family models deployed to managed compute are billed based on core hours of the associated compute instance. The cost of the compute instance is determined by the size of the instance, the number of instances running, and the run duration.
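The removed serverless quota (200,000 tokens per minute and 1,000 API requests per minute, per deployment) can be checked against a planned workload with simple arithmetic; the workload numbers below are hypothetical:

```python
# Per-deployment serverless limits quoted in the removed section.
TOKENS_PER_MINUTE_LIMIT = 200_000
REQUESTS_PER_MINUTE_LIMIT = 1_000

# Hypothetical workload: 600 requests/min averaging 250 tokens each.
requests_per_minute = 600
avg_tokens_per_request = 250
tokens_per_minute = requests_per_minute * avg_tokens_per_request  # 150,000

fits = (tokens_per_minute <= TOKENS_PER_MINUTE_LIMIT
        and requests_per_minute <= REQUESTS_PER_MINUTE_LIMIT)
print(fits)  # True
```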

0 commit comments

Comments
 (0)