> * [Jais](how-to-deploy-jais-models.md) family of models
> * [Jamba](how-to-deploy-models-jamba.md) family of models
> * [Phi-3](how-to-deploy-models-phi-3.md) family of models
Models deployed to [managed inference](concept-endpoints-online.md):
The API is compatible with Azure OpenAI model deployments.
> [!NOTE]
> The Azure AI model inference API is available in managed inference (Managed Online Endpoints) for __models deployed after June 24, 2024__. To take advantage of the API, redeploy your endpoint if the model was deployed before that date.
## Capabilities
The following section describes some of the capabilities the API exposes. For a full specification of the API, view the [reference section](reference-model-inference-info.md).
)
```
If you are using an endpoint with support for Microsoft Entra ID, you can create your client as follows:
```python
import os
from azure.ai.inference import ChatCompletionsClient
from azure.identity import DefaultAzureCredential

model = ChatCompletionsClient(
    endpoint=os.environ["AZUREAI_ENDPOINT_URL"],
    credential=DefaultAzureCredential(),
)
```
Explore our [samples](https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/ai/azure-ai-inference/samples) and read the [API reference documentation](https://aka.ms/azsdk/azure-ai-inference/python/reference) to get yourself started.
# [JavaScript](#tab/javascript)
);
```
For endpoints with support for Microsoft Entra ID, create your client with a token credential such as `DefaultAzureCredential` from the `@azure/identity` package instead of a key credential.
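A minimal sketch, mirroring the Python example above and assuming the `@azure-rest/ai-inference` and `@azure/identity` packages are installed:

```javascript
import ModelClient from "@azure-rest/ai-inference";
import { DefaultAzureCredential } from "@azure/identity";

// DefaultAzureCredential picks up Microsoft Entra ID credentials from the
// environment (Azure CLI login, managed identity, etc.) instead of an API key.
const client = new ModelClient(
    process.env.AZUREAI_ENDPOINT_URL,
    new DefaultAzureCredential()
);
```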
Explore our [samples](https://github.com/Azure/azure-sdk-for-js/tree/main/sdk/ai/ai-inference-rest/samples) and read the [API reference documentation](https://aka.ms/AAp1kxa) to get yourself started.
# [REST](#tab/rest)
        "safe_mode": True
    }
)

print(response.choices[0].message.content)
```
> [!TIP]
> When using the Azure AI Inference SDK, passing extra parameters with `model_extras` automatically configures the request with `extra-parameters: pass-through` for you.
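As an illustrative, standard-library-only sketch (nothing is actually sent over the wire, and the values shown are assumptions based on the REST examples in this article), the request produced by `model_extras` looks roughly like this: the extras are merged into the JSON body, and the `extra-parameters` header is set to `pass-through`:

```python
import json

# Hypothetical illustration of the HTTP request shape: the extra parameter
# lands in the JSON body, while the "extra-parameters" header tells the
# endpoint to forward unknown parameters to the model instead of rejecting
# the request.
headers = {
    "Content-Type": "application/json",
    "extra-parameters": "pass-through",
}
body = {
    "messages": [
        {"role": "user", "content": "How many languages are in the world?"}
    ],
    "safe_mode": True,  # model-specific extra parameter
}

payload = json.dumps(body)
```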
# [JavaScript](#tab/javascript)
```javascript
var response = await client.path("/chat/completions").post({
```
> [!NOTE]
> The default value for `extra-parameters` is `error`, which returns an error if an extra parameter is indicated in the payload. Alternatively, you can set `extra-parameters: drop` to drop any unknown parameter in the request. Use this capability when you're sending requests with extra parameters that you know the model doesn't support but you want the request to complete anyway. A typical example is indicating the `seed` parameter.
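To make the header values concrete, here is a hedged, standard-library sketch of the behavior described above. It models how an endpoint might treat unknown payload keys; it is not actual service code, and the `KNOWN_PARAMS` set is purely illustrative:

```python
# Illustrative model of "extra-parameters" handling; not service code.
KNOWN_PARAMS = {"messages", "temperature", "max_tokens"}

def apply_extra_parameters_mode(payload: dict, mode: str = "error") -> dict:
    unknown = set(payload) - KNOWN_PARAMS
    if not unknown:
        return payload
    if mode == "error":         # default: reject requests with unknown keys
        raise ValueError(f"Unknown parameters: {sorted(unknown)}")
    if mode == "drop":          # silently discard unknown keys
        return {k: v for k, v in payload.items() if k in KNOWN_PARAMS}
    if mode == "pass-through":  # forward everything to the model
        return payload
    raise ValueError(f"Unsupported mode: {mode}")

request = {"messages": [], "seed": 42}
cleaned = apply_extra_parameters_mode(request, "drop")
```

With `drop`, the unsupported `seed` key is removed before the request reaches the model; with the default `error`, the same request would be rejected.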
### Models with disparate set of capabilities
The following example shows the response for a chat completion request indicating JSON output:
# [Python](#tab/python)
```python
import json
from azure.ai.inference.models import SystemMessage, UserMessage, ChatCompletionsResponseFormat
from azure.core.exceptions import HttpResponseError

try:
    response = model.complete(
```
The following example shows the response for a chat completion request that has triggered content safety:
```python
from azure.ai.inference.models import AssistantMessage, UserMessage, SystemMessage
from azure.core.exceptions import HttpResponseError
```