
Commit 1a8cbce

Merge pull request #2231 from msakande/phi-4-maas-support
Phi 4 maas support
2 parents: 4f27b8e + e58dc6e


6 files changed: +325 -111 lines changed


articles/ai-studio/how-to/deploy-models-phi-3-5-vision.md

Lines changed: 24 additions & 27 deletions
@@ -113,7 +113,7 @@ client = ChatCompletionsClient(
 ```

 > [!NOTE]
-> Currently, serverless API endpoints do not support using Microsoft Entra ID for authentication.
+> Currently, serverless API endpoints don't support using Microsoft Entra ID for authentication.

 ### Get the model's capabilities

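Since serverless API endpoints accept only key-based authentication, a minimal sketch of the corresponding client construction looks like the following; the environment variable names are illustrative, not prescribed by the article:

```python
import os

from azure.ai.inference import ChatCompletionsClient
from azure.core.credentials import AzureKeyCredential

# Illustrative environment variable names; substitute your endpoint URL and key.
client = ChatCompletionsClient(
    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["AZURE_INFERENCE_CREDENTIAL"]),
)
```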
@@ -227,7 +227,7 @@ print_stream(result)
 Explore other parameters that you can specify in the inference client. For a full list of all the supported parameters and their corresponding documentation, see [Azure AI Model Inference API reference](https://aka.ms/azureai/modelinference).

 ```python
-from azure.ai.inference.models import ChatCompletionsResponseFormat
+from azure.ai.inference.models import ChatCompletionsResponseFormatText

 response = client.complete(
     messages=[
@@ -240,12 +240,12 @@ response = client.complete(
     stop=["<|endoftext|>"],
     temperature=0,
     top_p=1,
-    response_format={ "type": ChatCompletionsResponseFormat.TEXT },
+    response_format={ "type": ChatCompletionsResponseFormatText() },
 )
 ```

 > [!WARNING]
-> Phi-3 family models don't support JSON output formatting (`response_format = { "type": "json_object" }`). You can always prompt the model to generate JSON outputs. However, such outputs are not guaranteed to be valid JSON.
+> Phi-3 family models don't support JSON output formatting (`response_format = { "type": "json_object" }`). You can always prompt the model to generate JSON outputs. However, such outputs aren't guaranteed to be valid JSON.

 If you want to pass a parameter that isn't in the list of supported parameters, you can pass it to the underlying model using *extra parameters*. See [Pass extra parameters to the model](#pass-extra-parameters-to-the-model).

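Assembled, the updated Python snippet from the two hunks above reads as follows; this is a sketch in which the message contents are placeholders, and `client` is the `ChatCompletionsClient` created earlier in the article:

```python
from azure.ai.inference.models import (
    ChatCompletionsResponseFormatText,
    SystemMessage,
    UserMessage,
)

# Placeholder messages; stop, temperature, top_p, and response_format
# mirror the article's snippet.
response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="How many languages are in the world?"),
    ],
    stop=["<|endoftext|>"],
    temperature=0,
    top_p=1,
    response_format={ "type": ChatCompletionsResponseFormatText() },
)
print(response.choices[0].message.content)
```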
@@ -272,10 +272,10 @@ The following extra parameters can be passed to Phi-3.5 chat model with vision:

 | Name | Description | Type |
 | -------------- | --------------------- | --------------- |
-| `logit_bias` | Accepts a JSON object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect will vary per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token. | `object` |
+| `logit_bias` | Accepts a JSON object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect varies per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token. | `object` |
 | `logprobs` | Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned in the `content` of `message`. | `bool` |
 | `top_logprobs` | An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. `logprobs` must be set to `true` if this parameter is used. | `int` |
-| `n` | How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. | `int` |
+| `n` | How many chat completion choices to generate for each input message. You're charged based on the number of generated tokens across all of the choices. | `int` |


 ### Apply content safety
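As a sketch of how these extra parameters can be passed with the Azure AI Inference Python SDK, the `model_extras` keyword forwards fields the client doesn't model natively to the underlying deployment; the values below are illustrative:

```python
from azure.ai.inference.models import UserMessage

# Illustrative values for the extra parameters documented in the table above.
response = client.complete(
    messages=[UserMessage(content="How many languages are in the world?")],
    model_extras={
        "logprobs": True,   # return log probabilities of the output tokens
        "top_logprobs": 3,  # top-3 most likely tokens at each position
        "n": 2,             # two completion choices; billed per generated token
    },
)
```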
@@ -479,7 +479,7 @@ const client = new ModelClient(
 ```

 > [!NOTE]
-> Currently, serverless API endpoints do not support using Microsoft Entra ID for authentication.
+> Currently, serverless API endpoints don't support using Microsoft Entra ID for authentication.

 ### Get the model's capabilities

@@ -625,7 +625,7 @@ var response = await client.path("/chat/completions").post({
 ```

 > [!WARNING]
-> Phi-3 family models don't support JSON output formatting (`response_format = { "type": "json_object" }`). You can always prompt the model to generate JSON outputs. However, such outputs are not guaranteed to be valid JSON.
+> Phi-3 family models don't support JSON output formatting (`response_format = { "type": "json_object" }`). You can always prompt the model to generate JSON outputs. However, such outputs aren't guaranteed to be valid JSON.

 If you want to pass a parameter that isn't in the list of supported parameters, you can pass it to the underlying model using *extra parameters*. See [Pass extra parameters to the model](#pass-extra-parameters-to-the-model).

@@ -657,10 +657,10 @@ The following extra parameters can be passed to Phi-3.5 chat model with vision:

 | Name | Description | Type |
 | -------------- | --------------------- | --------------- |
-| `logit_bias` | Accepts a JSON object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect will vary per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token. | `object` |
+| `logit_bias` | Accepts a JSON object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect varies per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token. | `object` |
 | `logprobs` | Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned in the `content` of `message`. | `bool` |
 | `top_logprobs` | An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. `logprobs` must be set to `true` if this parameter is used. | `int` |
-| `n` | How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. | `int` |
+| `n` | How many chat completion choices to generate for each input message. You're charged based on the number of generated tokens across all of the choices. | `int` |


 ### Apply content safety
@@ -891,7 +891,7 @@ client = new ChatCompletionsClient(
 ```

 > [!NOTE]
-> Currently, serverless API endpoints do not support using Microsoft Entra ID for authentication.
+> Currently, serverless API endpoints don't support using Microsoft Entra ID for authentication.

 ### Get the model's capabilities

@@ -1037,7 +1037,7 @@ Console.WriteLine($"Response: {response.Value.Choices[0].Message.Content}");
 ```

 > [!WARNING]
-> Phi-3 family models don't support JSON output formatting (`response_format = { "type": "json_object" }`). You can always prompt the model to generate JSON outputs. However, such outputs are not guaranteed to be valid JSON.
+> Phi-3 family models don't support JSON output formatting (`response_format = { "type": "json_object" }`). You can always prompt the model to generate JSON outputs. However, such outputs aren't guaranteed to be valid JSON.

 If you want to pass a parameter that isn't in the list of supported parameters, you can pass it to the underlying model using *extra parameters*. See [Pass extra parameters to the model](#pass-extra-parameters-to-the-model).

@@ -1066,10 +1066,10 @@ The following extra parameters can be passed to Phi-3.5 chat model with vision:

 | Name | Description | Type |
 | -------------- | --------------------- | --------------- |
-| `logit_bias` | Accepts a JSON object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect will vary per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token. | `object` |
+| `logit_bias` | Accepts a JSON object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect varies per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token. | `object` |
 | `logprobs` | Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned in the `content` of `message`. | `bool` |
 | `top_logprobs` | An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. `logprobs` must be set to `true` if this parameter is used. | `int` |
-| `n` | How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. | `int` |
+| `n` | How many chat completion choices to generate for each input message. You're charged based on the number of generated tokens across all of the choices. | `int` |


 ### Apply content safety
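For `logit_bias` in particular, a minimal Python sketch follows; the token IDs are hypothetical placeholders, since real IDs must come from the model's tokenizer:

```python
from azure.ai.inference.models import UserMessage

# Hypothetical token IDs; look up real IDs with the model's tokenizer.
response = client.complete(
    messages=[UserMessage(content="Name a color.")],
    model_extras={
        "logit_bias": {"15234": -100, "9876": 25},  # ban one token, favor another
    },
)
```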
@@ -1239,7 +1239,7 @@ First, create the client to consume the model. The following code uses an endpoi
 When you deploy the model to a self-hosted online endpoint with **Microsoft Entra ID** support, you can use the following code snippet to create a client.

 > [!NOTE]
-> Currently, serverless API endpoints do not support using Microsoft Entra ID for authentication.
+> Currently, serverless API endpoints don't support using Microsoft Entra ID for authentication.

 ### Get the model's capabilities

@@ -1446,7 +1446,7 @@ Explore other parameters that you can specify in the inference client. For a ful
 ```

 > [!WARNING]
-> Phi-3 family models don't support JSON output formatting (`response_format = { "type": "json_object" }`). You can always prompt the model to generate JSON outputs. However, such outputs are not guaranteed to be valid JSON.
+> Phi-3 family models don't support JSON output formatting (`response_format = { "type": "json_object" }`). You can always prompt the model to generate JSON outputs. However, such outputs aren't guaranteed to be valid JSON.

 If you want to pass a parameter that isn't in the list of supported parameters, you can pass it to the underlying model using *extra parameters*. See [Pass extra parameters to the model](#pass-extra-parameters-to-the-model).

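Because outputs prompted into JSON aren't guaranteed to be valid, a defensive pattern is to validate before use; a sketch, with placeholder message text:

```python
import json

from azure.ai.inference.models import SystemMessage, UserMessage

# Prompt for JSON explicitly, then validate, since the model can't enforce it.
response = client.complete(
    messages=[
        SystemMessage(content="Reply only with a JSON object with keys 'answer' and 'source'."),
        UserMessage(content="How many languages are in the world?"),
    ],
)
try:
    payload = json.loads(response.choices[0].message.content)
except json.JSONDecodeError:
    payload = None  # not valid JSON; retry or fall back to the raw text
```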
@@ -1485,10 +1485,10 @@ The following extra parameters can be passed to Phi-3.5 chat model with vision:

 | Name | Description | Type |
 | -------------- | --------------------- | --------------- |
-| `logit_bias` | Accepts a JSON object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect will vary per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token. | `object` |
+| `logit_bias` | Accepts a JSON object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect varies per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token. | `object` |
 | `logprobs` | Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned in the `content` of `message`. | `bool` |
 | `top_logprobs` | An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. `logprobs` must be set to `true` if this parameter is used. | `int` |
-| `n` | How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. | `int` |
+| `n` | How many chat completion choices to generate for each input message. You're charged based on the number of generated tokens across all of the choices. | `int` |


 ### Apply content safety
@@ -1542,7 +1542,7 @@ Phi-3.5-vision-Instruct can reason across text and images and generate text comp
 To see this capability, download an image and encode the information as `base64` string. The resulting data should be inside of a [data URL](https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/Data_URLs):

 > [!TIP]
-> You will need to construct the data URL using an scripting or programming language. This tutorial use [this sample image](../media/how-to/sdks/small-language-models-chart-example.jpg) in JPEG format. A data URL has a format as follows: `data:image/jpg;base64,0xABCDFGHIJKLMNOPQRSTUVWXYZ...`.
+> You need to construct the data URL using a scripting or programming language. This article uses [this sample image](../media/how-to/sdks/small-language-models-chart-example.jpg) in JPEG format. A data URL has a format as follows: `data:image/jpg;base64,0xABCDFGHIJKLMNOPQRSTUVWXYZ...`.

 Visualize the image:

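One way to construct the data URL the tip describes is with Python's standard library; the file path below is a placeholder for the downloaded sample image:

```python
import base64
import mimetypes

# Placeholder path; point this at the downloaded sample image.
image_path = "small-language-models-chart-example.jpg"
mime_type = mimetypes.guess_type(image_path)[0] or "image/jpeg"

# Read the image bytes and base64-encode them into a data URL.
with open(image_path, "rb") as f:
    encoded = base64.b64encode(f.read()).decode("utf-8")

data_url = f"data:{mime_type};base64,{encoded}"
```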
@@ -1613,14 +1613,11 @@ For more examples of how to use Phi-3 family models, see the following examples

 | Description | Language | Sample |
 |-------------------------------------------|-------------------|-----------------------------------------------------------------|
-| CURL request | Bash | [Link](https://aka.ms/phi-3/webrequests-sample) |
-| Azure AI Inference package for C# | C# | [Link](https://github.com/Azure/azure-sdk-for-net/tree/main/sdk/ai/Azure.AI.Inference/samples) |
-| Azure AI Inference package for JavaScript | JavaScript | [Link](https://github.com/Azure/azure-sdk-for-js/tree/main/sdk/ai/ai-inference-rest/samples) |
+| Azure AI Inference package for C# | C# | [Link](https://github.com/Azure/azure-sdk-for-net/tree/main/sdk/ai/Azure.AI.Inference/samples) |
+| Azure AI Inference package for JavaScript | JavaScript | [Link](https://github.com/Azure/azure-sdk-for-js/tree/main/sdk/ai/ai-inference-rest/samples) |
 | Azure AI Inference package for Python | Python | [Link](https://aka.ms/azsdk/azure-ai-inference/python/samples) |
-| Python web requests | Python | [Link](https://aka.ms/phi-3/webrequests-sample) |
-| OpenAI SDK (experimental) | Python | [Link](https://aka.ms/phi-3/openaisdk) |
-| LangChain | Python | [Link](https://aka.ms/phi-3/langchain-sample) |
-| LiteLLM | Python | [Link](https://aka.ms/phi-3/litellm-sample) |
+| LangChain | Python | [Link](https://aka.ms/azureai/langchain) |
+| Llama-Index | Python | [Link](https://aka.ms/azureai/llamaindex) |


 ## Cost and quota considerations for Phi-3 family models deployed as serverless API endpoints
@@ -1631,7 +1628,7 @@ Quota is managed per deployment. Each deployment has a rate limit of 200,000 tok

 Phi-3 family models deployed to managed compute are billed based on core hours of the associated compute instance. The cost of the compute instance is determined by the size of the instance, the number of instances running, and the run duration.

-It is a good practice to start with a low number of instances and scale up as needed. You can monitor the cost of the compute instance in the Azure portal.
+It's a good practice to start with a low number of instances and scale up as needed. You can monitor the cost of the compute instance in the Azure portal.

 ## Related content
