> [!NOTE]
-> Currently, serverless API endpoints do not support using Microsoft Entra ID for authentication.
+> Currently, serverless API endpoints don't support using Microsoft Entra ID for authentication.

### Get the model's capabilities
@@ -227,7 +227,7 @@ print_stream(result)
Explore other parameters that you can specify in the inference client. For a full list of all the supported parameters and their corresponding documentation, see [Azure AI Model Inference API reference](https://aka.ms/azureai/modelinference).

```python
-from azure.ai.inference.models import ChatCompletionsResponseFormat
+from azure.ai.inference.models import ChatCompletionsResponseFormatText
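```

Not part of the original hunk: a minimal sketch of how the renamed `ChatCompletionsResponseFormatText` class might be used, assuming the `azure-ai-inference` package and the endpoint environment variables used elsewhere in the article:

```python
import os

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import (
    ChatCompletionsResponseFormatText,
    SystemMessage,
    UserMessage,
)
from azure.core.credentials import AzureKeyCredential

# Assumed configuration: endpoint URL and key for a deployed Phi-3.5 model.
client = ChatCompletionsClient(
    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["AZURE_INFERENCE_CREDENTIAL"]),
)

response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="How many languages are in the world?"),
    ],
    # Explicitly request plain-text output with the renamed class.
    response_format=ChatCompletionsResponseFormatText(),
)
print(response.choices[0].message.content)
```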
> [!WARNING]
-> Phi-3 family models don't support JSON output formatting (`response_format = { "type": "json_object" }`). You can always prompt the model to generate JSON outputs. However, such outputs are not guaranteed to be valid JSON.
+> Phi-3 family models don't support JSON output formatting (`response_format = { "type": "json_object" }`). You can always prompt the model to generate JSON outputs. However, such outputs aren't guaranteed to be valid JSON.
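An editorial sketch, not from the original article: because JSON-formatted answers from Phi-3 family models aren't guaranteed to parse, it's worth validating them. This reuses the `client` from the previous sketch:

```python
import json

from azure.ai.inference.models import SystemMessage, UserMessage

response = client.complete(
    messages=[
        SystemMessage(content="Reply with a single JSON object and no prose."),
        UserMessage(content="Describe Paris as JSON with keys 'city' and 'country'."),
    ],
)

raw = response.choices[0].message.content
try:
    data = json.loads(raw)  # may fail: valid JSON is not guaranteed
except json.JSONDecodeError:
    data = None  # the model ignored the instruction; retry or fall back here
```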
If you want to pass a parameter that isn't in the list of supported parameters, you can pass it to the underlying model using *extra parameters*. See [Pass extra parameters to the model](#pass-extra-parameters-to-the-model).
@@ -272,10 +272,10 @@ The following extra parameters can be passed to Phi-3.5 chat model with vision:
-|`logit_bias`| Accepts a JSON object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect will vary per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token. |`object`|
+|`logit_bias`| Accepts a JSON object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect varies per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token. |`object`|
|`logprobs`| Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned in the `content` of `message`. |`bool`|
|`top_logprobs`| An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. `logprobs` must be set to `true` if this parameter is used. |`int`|
-|`n`| How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. |`int`|
+|`n`| How many chat completion choices to generate for each input message. You're charged based on the number of generated tokens across all of the choices. |`int`|
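A sketch (not in the original) of how these extra parameters could be sent from the Python client, assuming the `model_extras` argument described in the article's *Pass extra parameters to the model* section and the `client` created earlier:

```python
from azure.ai.inference.models import SystemMessage, UserMessage

response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="How many languages are in the world?"),
    ],
    model_extras={
        "logprobs": True,
        "top_logprobs": 2,  # requires logprobs=True
        "n": 2,             # two choices; you're billed for tokens across both
    },
)

for choice in response.choices:
    print(choice.index, choice.message.content)
```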
### Apply content safety
@@ -479,7 +479,7 @@ const client = new ModelClient(
```

> [!NOTE]
-> Currently, serverless API endpoints do not support using Microsoft Entra ID for authentication.
+> Currently, serverless API endpoints don't support using Microsoft Entra ID for authentication.

### Get the model's capabilities
@@ -625,7 +625,7 @@ var response = await client.path("/chat/completions").post({
```

> [!WARNING]
-> Phi-3 family models don't support JSON output formatting (`response_format = { "type": "json_object" }`). You can always prompt the model to generate JSON outputs. However, such outputs are not guaranteed to be valid JSON.
+> Phi-3 family models don't support JSON output formatting (`response_format = { "type": "json_object" }`). You can always prompt the model to generate JSON outputs. However, such outputs aren't guaranteed to be valid JSON.

If you want to pass a parameter that isn't in the list of supported parameters, you can pass it to the underlying model using *extra parameters*. See [Pass extra parameters to the model](#pass-extra-parameters-to-the-model).
@@ -657,10 +657,10 @@ The following extra parameters can be passed to Phi-3.5 chat model with vision:
-| `logit_bias` | Accepts a JSON object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect will vary per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token. | `object` |
+| `logit_bias` | Accepts a JSON object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect varies per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token. | `object` |
| `logprobs` | Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned in the `content` of `message`. | `bool` |
| `top_logprobs` | An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. `logprobs` must be set to `true` if this parameter is used. | `int` |
-| `n` | How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. | `int` |
+| `n` | How many chat completion choices to generate for each input message. You're charged based on the number of generated tokens across all of the choices. | `int` |
### Apply content safety
@@ -891,7 +891,7 @@ client = new ChatCompletionsClient(
```

> [!NOTE]
-> Currently, serverless API endpoints do not support using Microsoft Entra ID for authentication.
+> Currently, serverless API endpoints don't support using Microsoft Entra ID for authentication.

> [!WARNING]
-> Phi-3 family models don't support JSON output formatting (`response_format = { "type": "json_object" }`). You can always prompt the model to generate JSON outputs. However, such outputs are not guaranteed to be valid JSON.
+> Phi-3 family models don't support JSON output formatting (`response_format = { "type": "json_object" }`). You can always prompt the model to generate JSON outputs. However, such outputs aren't guaranteed to be valid JSON.
If you want to pass a parameter that isn't in the list of supported parameters, you can pass it to the underlying model using *extra parameters*. See [Pass extra parameters to the model](#pass-extra-parameters-to-the-model).
@@ -1066,10 +1066,10 @@ The following extra parameters can be passed to Phi-3.5 chat model with vision:
-| `logit_bias` | Accepts a JSON object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect will vary per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token. | `object` |
+| `logit_bias` | Accepts a JSON object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect varies per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token. | `object` |
| `logprobs` | Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned in the `content` of `message`. | `bool` |
| `top_logprobs` | An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. `logprobs` must be set to `true` if this parameter is used. | `int` |
-| `n` | How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. | `int` |
+| `n` | How many chat completion choices to generate for each input message. You're charged based on the number of generated tokens across all of the choices. | `int` |
### Apply content safety
@@ -1239,7 +1239,7 @@ First, create the client to consume the model. The following code uses an endpoi
When you deploy the model to a self-hosted online endpoint with **Microsoft Entra ID** support, you can use the following code snippet to create a client.
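The snippet itself isn't captured in this diff; purely as an illustration, a client with Microsoft Entra ID authentication might be created like this with the Python SDK and the `azure-identity` package (both assumptions, not taken from the original):

```python
import os

from azure.ai.inference import ChatCompletionsClient
from azure.identity import DefaultAzureCredential

# DefaultAzureCredential picks up the ambient Microsoft Entra ID identity
# (CLI login, managed identity, environment variables, and so on).
client = ChatCompletionsClient(
    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
    credential=DefaultAzureCredential(),
)
```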

> [!NOTE]
-> Currently, serverless API endpoints do not support using Microsoft Entra ID for authentication.
+> Currently, serverless API endpoints don't support using Microsoft Entra ID for authentication.

### Get the model's capabilities
@@ -1446,7 +1446,7 @@ Explore other parameters that you can specify in the inference client. For a ful
```

> [!WARNING]
-> Phi-3 family models don't support JSON output formatting (`response_format = { "type": "json_object" }`). You can always prompt the model to generate JSON outputs. However, such outputs are not guaranteed to be valid JSON.
+> Phi-3 family models don't support JSON output formatting (`response_format = { "type": "json_object" }`). You can always prompt the model to generate JSON outputs. However, such outputs aren't guaranteed to be valid JSON.

If you want to pass a parameter that isn't in the list of supported parameters, you can pass it to the underlying model using *extra parameters*. See [Pass extra parameters to the model](#pass-extra-parameters-to-the-model).
@@ -1485,10 +1485,10 @@ The following extra parameters can be passed to Phi-3.5 chat model with vision:
-|`logit_bias`| Accepts a JSON object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect will vary per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token. |`object`|
+| `logit_bias` | Accepts a JSON object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect varies per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token. | `object` |
| `logprobs` | Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned in the `content` of `message`. | `bool` |
| `top_logprobs` | An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. `logprobs` must be set to `true` if this parameter is used. | `int` |
-|`n`| How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. |`int`|
+| `n` | How many chat completion choices to generate for each input message. You're charged based on the number of generated tokens across all of the choices. | `int` |
### Apply content safety
@@ -1542,7 +1542,7 @@ Phi-3.5-vision-Instruct can reason across text and images and generate text comp
To see this capability, download an image and encode the information as a `base64` string. The resulting data should be inside a [data URL](https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/Data_URLs):
> [!TIP]
-> You will need to construct the data URL using an scripting or programming language. This tutorial use [this sample image](../media/how-to/sdks/small-language-models-chart-example.jpg) in JPEG format. A data URL has a format as follows: `data:image/jpg;base64,0xABCDFGHIJKLMNOPQRSTUVWXYZ...`.
+> You need to construct the data URL using a scripting or programming language. This article uses [this sample image](../media/how-to/sdks/small-language-models-chart-example.jpg) in JPEG format. A data URL has a format as follows: `data:image/jpg;base64,0xABCDFGHIJKLMNOPQRSTUVWXYZ...`.
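One way to build such a data URL, as an editorial Python sketch (the file name assumes the sample image from the tip has been downloaded locally):

```python
import base64
import mimetypes

image_path = "small-language-models-chart-example.jpg"
mime_type = mimetypes.guess_type(image_path)[0] or "image/jpeg"

# Read the image bytes and base64-encode them into a data URL.
with open(image_path, "rb") as image_file:
    encoded = base64.b64encode(image_file.read()).decode("ascii")

data_url = f"data:{mime_type};base64,{encoded}"
```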
Visualize the image:
@@ -1613,14 +1613,11 @@ For more examples of how to use Phi-3 family models, see the following examples
## Cost and quota considerations for Phi-3 family models deployed as serverless API endpoints
@@ -1631,7 +1628,7 @@ Quota is managed per deployment. Each deployment has a rate limit of 200,000 tok
Phi-3 family models deployed to managed compute are billed based on core hours of the associated compute instance. The cost of the compute instance is determined by the size of the instance, the number of instances running, and the run duration.

-It is a good practice to start with a low number of instances and scale up as needed. You can monitor the cost of the compute instance in the Azure portal.
+It's a good practice to start with a low number of instances and scale up as needed. You can monitor the cost of the compute instance in the Azure portal.
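As an illustrative example (numbers not from the original): a deployment of two 4-core instances that runs for three hours consumes 2 × 4 × 3 = 24 core hours of billable compute.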