Commit 5c731d5

Merge branch 'main' into release-blob-backup

2 parents 64937c2 + 6d1aa20

279 files changed: +5630 −2653 lines changed

.openpublishing.redirection.azure-monitor.json

Lines changed: 35 additions & 0 deletions

```diff
@@ -5564,6 +5564,41 @@
       "redirect_url": "/azure/azure-monitor/essentials/resource-manager-diagnostic-settings#diagnostic-setting-for-activity-log",
       "redirect_document_id": false
     },
+    {
+      "source_path_from_root": "/articles/azure-monitor/app/availability-overview.md",
+      "redirect_url": "/azure/azure-monitor/app/availability",
+      "redirect_document_id": false
+    },
+    {
+      "source_path_from_root": "/articles/azure-monitor/app/availability-standard-tests.md",
+      "redirect_url": "/azure/azure-monitor/app/availability",
+      "redirect_document_id": false
+    },
+    {
+      "source_path_from_root": "/articles/azure-monitor/app/availability-azure-functions.md",
+      "redirect_url": "/azure/azure-monitor/app/availability",
+      "redirect_document_id": false
+    },
+    {
+      "source_path_from_root": "/articles/azure-monitor/app/availability-private-test.md",
+      "redirect_url": "/azure/azure-monitor/app/availability",
+      "redirect_document_id": false
+    },
+    {
+      "source_path_from_root": "/articles/azure-monitor/app/availability-alerts.md",
+      "redirect_url": "/azure/azure-monitor/app/availability",
+      "redirect_document_id": false
+    },
+    {
+      "source_path_from_root": "/articles/azure-monitor/app/availability-test-migration.md",
+      "redirect_url": "/azure/azure-monitor/app/availability",
+      "redirect_document_id": false
+    },
+    {
+      "source_path_from_root": "/articles/azure-monitor/app/sla-report.md",
+      "redirect_url": "/azure/azure-monitor/app/availability",
+      "redirect_document_id": false
+    },
     {
       "source_path_from_root": "/articles/azure-monitor/app/tutorial-alert.md",
       "redirect_url": "/azure/azure-monitor/app/availability-standard-tests",
```

articles/ai-services/openai/how-to/monitoring.md

Lines changed: 3 additions & 1 deletion

```diff
@@ -6,7 +6,7 @@ ms.author: mbullwin
 ms.service: azure-ai-openai
 ms.topic: how-to
 ms.custom: subject-monitoring
-ms.date: 04/16/2024
+ms.date: 07/12/2024
 ---

 # Monitoring Azure OpenAI Service
@@ -56,6 +56,7 @@ The following table summarizes the current subset of metrics available in Azure
 |Metric|Category|Aggregation|Description|Dimensions|
 |---|---|---|---|---|
 |`Azure OpenAI Requests`|HTTP|Count|Total number of calls made to the Azure OpenAI API over a period of time. Applies to PayGo, PTU, and PTU-managed SKUs.| `ApiName`, `ModelDeploymentName`, `ModelName`, `ModelVersion`, `OperationName`, `Region`, `StatusCode`, `StreamType`|
+| `Active Tokens` | Usage | Sum | Total tokens minus cached tokens over a period of time. Applies to PTU and PTU-managed deployments. Use this metric to understand your TPS- or TPM-based utilization for PTUs and compare it to your benchmarks for target TPS or TPM for your scenarios. | `ModelDeploymentName`, `ModelName`, `ModelVersion` |
 | `Generated Completion Tokens` | Usage | Sum | Number of generated tokens (output) from an Azure OpenAI model. Applies to PayGo, PTU, and PTU-managed SKUs. | `ApiName`, `ModelDeploymentName`, `ModelName`, `Region`|
 | `Processed FineTuned Training Hours` | Usage | Sum | Number of training hours processed on an Azure OpenAI fine-tuned model. | `ApiName`, `ModelDeploymentName`, `ModelName`, `Region`|
 | `Processed Inference Tokens` | Usage | Sum | Number of inference tokens processed by an Azure OpenAI model. Calculated as prompt tokens (input) + generated tokens. Applies to PayGo, PTU, and PTU-managed SKUs. |`ApiName`, `ModelDeploymentName`, `ModelName`, `Region`|
@@ -64,6 +65,7 @@ The following table summarizes the current subset of metrics available in Azure
 |`Prompt Token Cache Match Rate` | HTTP | Average | **Provisioned-managed only**. The prompt token cache hit ratio expressed as a percentage. | `ModelDeploymentName`, `ModelVersion`, `ModelName`, `Region`|
 |`Time to Response` | HTTP | Average | Recommended latency (responsiveness) measure for streaming requests. **Applies to PTU and PTU-managed deployments.** This metric doesn't apply to standard pay-as-you-go deployments. Calculated as the time taken for the first response to appear after a user sends a prompt, as measured by the API gateway. This number increases as prompt size increases and/or cache hit size decreases. Note: this metric is an approximation, because measured latency depends heavily on multiple factors, including concurrent calls and overall workload pattern. In addition, it doesn't account for any client-side latency between your client and the API endpoint. Refer to your own logging for optimal latency tracking. | `ModelDeploymentName`, `ModelName`, `ModelVersion` |

+
 ## Configure diagnostic settings

 All of the metrics are exportable with [diagnostic settings in Azure Monitor](/azure/azure-monitor/essentials/diagnostic-settings). To analyze logs and metrics data with Azure Monitor Log Analytics queries, you need to configure diagnostic settings for your Azure OpenAI resource and your Log Analytics workspace.
```
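The new `Active Tokens` metric added in this diff is defined as total tokens minus cached tokens. A toy illustration of how you might reason about PTU utilization from it; the numbers and helper functions are hypothetical, not part of Azure Monitor:

```python
def active_tokens(total_tokens: int, cached_tokens: int) -> int:
    # Active Tokens, as defined for PTU/PTU-managed deployments:
    # total tokens minus tokens served from the prompt cache.
    return total_tokens - cached_tokens

def tokens_per_minute(total_tokens: int, cached_tokens: int, window_minutes: float) -> float:
    # TPM-based utilization derived from active tokens over an observation window.
    return active_tokens(total_tokens, cached_tokens) / window_minutes

# Hypothetical 5-minute window for one deployment:
tpm = tokens_per_minute(total_tokens=60_000, cached_tokens=15_000, window_minutes=5)
print(tpm)  # 9000.0
```

Comparing a value like this against your benchmarked target TPM is the comparison the metric description above suggests.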

articles/ai-studio/how-to/model-catalog-overview.md

Lines changed: 2 additions & 1 deletion

```diff
@@ -149,7 +149,8 @@ Phi-3-mini-128k-instruct <br> Phi-3-medium-4k-instruct <br> Phi-3-medium-128k-in

 [!INCLUDE [Feature preview](~/reusable-content/ce-skilling/azure/includes/ai-studio/includes/feature-preview.md)]

-Azure AI Studio implements a default configuration of [Azure AI Content Safety](../../ai-services/content-safety/overview.md) text moderation filters for harmful content (hate, self-harm, sexual, and violence) in language models deployed with MaaS. To learn more about content filtering (preview), see [harm categories in Azure AI Content Safety](../../ai-services/content-safety/concepts/harm-categories.md). Content filtering (preview) occurs synchronously as the service processes prompts to generate content, and you may be billed separately as per [AACS pricing](https://azure.microsoft.com/pricing/details/cognitive-services/content-safety/) for such use. You can disable content filtering for individual serverless endpoints when you first deploy a language model or in the deployment details page by clicking the content filtering toggle. You may be at higher risk of exposing users to harmful content if you turn off content filters.
+[!INCLUDE [content-safety-serverless-models](../includes/content-safety-serverless-models.md)]

 ### Network isolation for models deployed via Serverless APIs
```

articles/ai-studio/includes/content-safety-serverless-models.md

Lines changed: 24 additions & 0 deletions

```diff
@@ -0,0 +1,24 @@
+---
+title: include file
+description: include file
+ms.service: azure-ai-studio
+ms.topic: include
+ms.date: 07/12/2024
+ms.author: mopeakande
+author: msakande
+ms.reviewer: osiotugo
+reviewer: ositanachi
+ms.custom: include file
+
+# Also used in Azure Machine Learning documentation
+---
+
+For language models deployed via serverless APIs, Azure AI implements a default configuration of [Azure AI Content Safety](/azure/ai-services/content-safety/overview) text moderation filters that detect harmful content such as hate, self-harm, sexual, and violent content. To learn more about content filtering (preview), see [harm categories in Azure AI Content Safety](/azure/ai-services/content-safety/concepts/harm-categories).
+
+> [!TIP]
+> Content filtering (preview) isn't available for certain model types that are deployed via serverless APIs. These model types include embedding models and time series models.
+
+Content filtering (preview) occurs synchronously as the service processes prompts to generate content, and you might be billed separately as per [AACS pricing](https://azure.microsoft.com/pricing/details/cognitive-services/content-safety/) for such use. You can disable content filtering (preview) for individual serverless endpoints either when you first deploy a language model or later on the deployment details page by selecting the content filtering toggle.
+
+Suppose you decide to use an API other than the [Azure AI Model Inference API](/azure/ai-studio/reference/reference-model-inference-api) to work with a model that's deployed via a serverless API. In such a situation, content filtering (preview) isn't enabled unless you implement it separately by using Azure AI Content Safety. To learn more about getting started with Azure AI Content Safety, see [Quickstart: Analyze text content](/azure/ai-services/content-safety/quickstart-text). If you don't use content filtering (preview) when working with models that are deployed via serverless APIs, you run a higher risk of exposing users to harmful content.
```

articles/ai-studio/reference/reference-model-inference-api.md

Lines changed: 39 additions & 4 deletions

````diff
@@ -103,6 +103,19 @@ model = ChatCompletionsClient(
 )
 ```

+If you're using an endpoint with Microsoft Entra ID support, you can create your client as follows:
+
+```python
+import os
+from azure.ai.inference import ChatCompletionsClient
+from azure.identity import DefaultAzureCredential
+
+model = ChatCompletionsClient(
+    endpoint=os.environ["AZUREAI_ENDPOINT_URL"],
+    credential=DefaultAzureCredential(),
+)
+```
+
 # [JavaScript](#tab/javascript)

 Install the package `@azure-rest/ai-inference` using npm:
@@ -124,6 +137,19 @@ const client = new ModelClient(
 );
 ```

+For endpoints with Microsoft Entra ID support, you can create your client as follows:
+
+```javascript
+import ModelClient from "@azure-rest/ai-inference";
+import { isUnexpected } from "@azure-rest/ai-inference";
+import { DefaultAzureCredential } from "@azure/identity";
+
+const client = new ModelClient(
+    process.env.AZUREAI_ENDPOINT_URL,
+    new DefaultAzureCredential()
+);
+```
+
 # [REST](#tab/rest)

 Use the reference section to explore the API design and which parameters are available. For example, the reference section for [Chat completions](reference-model-inference-chat-completions.md) details how to use the route `/chat/completions` to generate predictions based on chat-formatted instructions:
@@ -143,11 +169,13 @@ The Azure AI Model Inference API specifies a set of modalities and parameters th

 By setting a header `extra-parameters: pass-through`, the API will attempt to pass any unknown parameter directly to the underlying model. If the model can handle that parameter, the request completes.

-The following example shows a request passing the parameter `safe_prompt` supported by Mistral-Large, which isn't specified in the Azure AI Model Inference API:
+The following example shows a request passing the parameter `safe_prompt` supported by Mistral-Large, which isn't specified in the Azure AI Model Inference API.

 # [Python](#tab/python)

 ```python
+from azure.ai.inference.models import SystemMessage, UserMessage
+
 response = model.complete(
     messages=[
         SystemMessage(content="You are a helpful assistant."),
@@ -157,8 +185,13 @@ response = model.complete(
         "safe_mode": True
     }
 )
+
+print(response.choices[0].message.content)
 ```

+> [!TIP]
+> When you use the Azure AI Inference SDK, passing extra parameters with `model_extras` configures the request with `extra-parameters: pass-through` automatically for you.
+
 # [JavaScript](#tab/javascript)

 ```javascript
@@ -174,6 +207,8 @@ var response = await client.path("/chat/completions").post({
         safe_mode: true
     }
 });
+
+console.log(response.body.choices[0].message.content);
 ```

 # [REST](#tab/rest)
@@ -208,8 +243,8 @@ extra-parameters: pass-through

 ---

-> [!TIP]
-> The default value for `extra-parameters` is `error` which returns an error if an extra parameter is indicated in the payload. Alternatively, you can set `extra-parameters: ignore` to drop any unknown parameter in the request. Use this capability in case you happen to be sending requests with extra parameters that you know the model won't support but you want the request to completes anyway. A typical example of this is indicating `seed` parameter.
+> [!NOTE]
+> The default value for `extra-parameters` is `error`, which returns an error if an extra parameter is indicated in the payload. Alternatively, you can set `extra-parameters: drop` to drop any unknown parameter in the request. Use this capability when you send requests with extra parameters that you know the model won't support but you want the request to complete anyway. A typical example is indicating the `seed` parameter.

 ### Models with disparate set of capabilities

@@ -220,7 +255,7 @@ The following example shows the response for a chat completion request indicatin

 # [Python](#tab/python)

 ```python
-from azure.ai.inference.models import ChatCompletionsResponseFormat
+from azure.ai.inference.models import SystemMessage, UserMessage, ChatCompletionsResponseFormat
 from azure.core.exceptions import HttpResponseError
 import json
```
````
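The `extra-parameters` header behavior described in this file's diff can be sketched locally. The endpoint URL and payload below are hypothetical and the helper is illustrative, not part of any SDK; it only shows which header value produces which service behavior:

```python
import json

def build_chat_request(endpoint: str, payload: dict, extra_parameters: str = "pass-through"):
    """Assemble a chat-completions REST call with the extra-parameters header.

    'pass-through' forwards unknown payload keys to the underlying model,
    'error' (the service default) rejects them, and 'drop' discards them.
    """
    url = endpoint.rstrip("/") + "/chat/completions"
    headers = {
        "Content-Type": "application/json",
        "extra-parameters": extra_parameters,
    }
    return url, headers, json.dumps(payload)

url, headers, body = build_chat_request(
    "https://example-endpoint.inference.ai.azure.com",  # hypothetical endpoint
    {
        "messages": [{"role": "user", "content": "How many languages are in the world?"}],
        "safe_prompt": True,  # Mistral-specific parameter, unknown to the common API
    },
)
print(headers["extra-parameters"])  # pass-through
```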

articles/aks/TOC.yml

Lines changed: 12 additions & 2 deletions

```diff
@@ -337,7 +337,17 @@
       - name: Use the Azure portal
         href: virtual-nodes-portal.md
   - name: Workloads
-    items:
+    items:
+      - name: Stateful workloads
+        items:
+          - name: Deploy a highly available PostgreSQL database
+            items:
+              - name: Overview
+                href: postgresql-ha-overview.md
+              - name: Create infrastructure resources
+                href: create-postgresql-ha.md
+              - name: Deploy and test PostgreSQL
+                href: deploy-postgresql-ha.md
   - name: GPU workloads
     items:
       - name: Use GPUs
@@ -973,4 +983,4 @@
   - name: Support options for AKS
     href: aks-support-help.md
   - name: Troubleshooting documentation for AKS
-    href: /troubleshoot/azure/azure-kubernetes/welcome-azure-kubernetes
+    href: /troubleshoot/azure/azure-kubernetes/welcome-azure-kubernetes
```
