Commit a5ff3a5
edit pass: deploy-and-monitor-flows
1 parent a019dff

4 files changed: +40, -52 lines

articles/ai-foundry/how-to/develop/trace-production-sdk.md

Lines changed: 13 additions & 13 deletions
@@ -27,8 +27,8 @@ In this article, you learn to enable tracing, collect aggregated metrics, and co
 ## Prerequisites

 - The Azure CLI and the Azure Machine Learning extension to the Azure CLI.
-- An Azure AI Foundry project. If you don't already have a project, you can [create one here](../../how-to/create-projects.md).
-- An Application Insights resource. If you don't already have an Application Insights resource, you can [create one here](/azure/azure-monitor/app/create-workspace-resource).
+- An Azure AI Foundry project. If you don't already have a project, you can [create one](../../how-to/create-projects.md).
+- An Application Insights resource. If you don't already have an Application Insights resource, you can [create one](/azure/azure-monitor/app/create-workspace-resource).
 - Azure role-based access controls are used to grant access to operations in Azure Machine Learning. To perform the steps in this article, you must have Owner or Contributor permissions on the selected resource group. For more information, see [Role-based access control in the Azure AI Foundry portal](../../concepts/rbac-ai-foundry.md).

 ## Deploy a flow for real-time inference

@@ -43,13 +43,13 @@ Use the latest prompt flow base image to deploy the flow so that it supports the

 If you're using the Azure AI Foundry portal to deploy, select **Deployment** > **Application Insights diagnostics** > **Advanced settings** in the deployment wizard. In this way, the tracing data and system metrics are collected to the project linked to Application Insights.

-If you're using the SDK or the CLI, add the `app_insights_enabled: true` property in the deployment yaml file that collects data to the project linked to Application Insights.
+If you're using the SDK or the CLI, add the `app_insights_enabled: true` property in the deployment .yaml file that collects data to the project linked to Application Insights.

 ```yaml
 app_insights_enabled: true
 ```

-You can also specify other application insights by the environment variable `APPLICATIONINSIGHTS_CONNECTION_STRING` in the deployment yaml file. You can find the connection string for Application Insights on the **Overview** page in the Azure portal.
+You can also specify another Application Insights resource by using the environment variable `APPLICATIONINSIGHTS_CONNECTION_STRING` in the deployment .yaml file. You can find the connection string for Application Insights on the **Overview** page in the Azure portal.

 ```yaml
 environment_variables:
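
For reference, a minimal sketch of the complete mapping, with the connection string value as a placeholder:

```yaml
environment_variables:
  APPLICATIONINSIGHTS_CONNECTION_STRING: <connection_string>
```
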
@@ -75,14 +75,14 @@ The **Dependency** type event records calls from your deployments. The name of t

 | Metrics name | Type | Dimensions | Description |
 |--------------------------------------|-----------|-------------------------------------------|---------------------------------------------------------------------------------|
-| `token_consumption` | counter | - flow <br> - node<br> - `llm_engine`<br> - `token_type`: `prompt_tokens`: LLM API input tokens; `completion_tokens`: LLM API response tokens; `total_tokens` = `prompt_tokens + completion tokens` | OpenAI token consumption metrics |
-| `flow_latency` | histogram | flow, `response_code`, streaming, `response_type` | request execution cost, `response_type` means whether it's full/firstbyte/lastbyte|
-| `flow_request` | counter | flow, `response_code`, exception, streaming | flow request count |
-| `node_latency` | histogram | flow, node, `run_status` | node execution cost |
-| `node_request` | counter | flow, node, exception, `run_status` | node execution count |
-| `rpc_latency` | histogram | flow, node, `api_call` | rpc cost |
-| `rpc_request` | counter | flow, node, `api_call`, exception | rpc count |
-| `flow_streaming_response_duration` | histogram | flow | streaming response sending cost, from sending first byte to sending last byte |
+| `token_consumption` | counter | - `flow` <br> - `node`<br> - `llm_engine`<br> - `token_type`: `prompt_tokens`: LLM API input tokens; `completion_tokens`: LLM API response tokens; `total_tokens` = `prompt_tokens` + `completion_tokens` | OpenAI token consumption metrics. |
+| `flow_latency` | histogram | `flow`, `response_code`, `streaming`, `response_type` | The request execution cost. `response_type` indicates whether the measurement covers the full response, the first byte, or the last byte.|
+| `flow_request` | counter | `flow`, `response_code`, `exception`, `streaming` | The flow request count. |
+| `node_latency` | histogram | `flow`, `node`, `run_status` | The node execution cost. |
+| `node_request` | counter | `flow`, `node`, `exception`, `run_status` | The node execution count. |
+| `rpc_latency` | histogram | `flow`, `node`, `api_call` | The RPC cost. |
+| `rpc_request` | counter | `flow`, `node`, `api_call`, `exception` | The RPC count. |
+| `flow_streaming_response_duration` | histogram | `flow` | The streaming response sending cost, from sending the first byte to sending the last byte. |

 You can find the workspace default Application Insights metrics on your workspace overview page in the Azure portal.

@@ -93,7 +93,7 @@ You can find the workspace default Application Insights metrics on your workspac

 Prompt flow serving provides a new `/feedback` API to help customers collect the feedback. The feedback payload can be any JSON format data. Prompt flow serving helps the customer save the feedback data to a trace span. Data is saved to the trace exporter target that the customer configured. Prompt flow serving also supports OpenTelemetry standard trace context propagation. It respects the trace context set in the request header and uses that context as the request parent span context. You can use the distributed tracing functionality to correlate the feedback trace to its chat request trace.

-The following sample code shows how to score a flow deployed to a managed endpoint that was enabled for tracing and send the feedback to the same trace span of a scoring request. The flow has the inputs `question` and `chat_history`. The output is `answer`. After the endpoint is scored, feedback is collected and sent to application insights that are specified when you deploy the flow.
+The following sample code shows how to score a flow deployed to a managed endpoint that was enabled for tracing and send the feedback to the same trace span of a scoring request. The flow has the inputs `question` and `chat_history`. The output is `answer`. After the endpoint is scored, feedback is collected and sent to the Application Insights resource specified when you deploy the flow.

 ```python
 import urllib.request
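
A minimal sketch of the pattern this sample describes (score the endpoint, then post feedback with the same trace context), assuming placeholder endpoint details and that the scoring response exposes a W3C `traceparent` header:

```python
import json
import urllib.request

# Assumed placeholders: the managed endpoint's scoring URL and API key.
url = "https://<endpoint-name>.<region>.inference.ml.azure.com/score"
api_key = "<api-key>"
headers = {"Content-Type": "application/json", "Authorization": "Bearer " + api_key}

# Score the flow; the input names match the flow's `question` and `chat_history`.
body = json.dumps({"question": "What is Application Insights?", "chat_history": []}).encode("utf-8")
response = urllib.request.urlopen(urllib.request.Request(url, data=body, headers=headers))
answer = json.loads(response.read())

# Propagate the scoring request's trace context (W3C `traceparent`) so the
# feedback lands on the same trace as the chat request.
traceparent = response.headers.get("traceparent")
if traceparent:
    headers["traceparent"] = traceparent

# The feedback payload can be any JSON data.
feedback = json.dumps({"feedback": "thumbs_up"}).encode("utf-8")
urllib.request.urlopen(
    urllib.request.Request(url.replace("/score", "/feedback"), data=feedback, headers=headers)
)
```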

articles/ai-foundry/how-to/flow-deploy.md

Lines changed: 7 additions & 9 deletions
@@ -74,14 +74,10 @@ To deploy a prompt flow as an online endpoint in the Azure AI Foundry portal:

 :::image type="content" source="../media/prompt-flow/how-to-deploy-for-real-time-inference/deployments-score-url-samples.png" alt-text="Screenshot that shows the deployment endpoint and code samples." lightbox = "../media/prompt-flow/how-to-deploy-for-real-time-inference/deployments-score-url-samples.png":::

-For more information, see the following sections.
-
 For information about how to deploy a base model, see [Deploy models with Azure AI Foundry](deploy-models-managed.md).

 ## Settings and configurations

-This section discusses settings and configurations.
-
 ### Requirements text file

 Optionally, you can specify extra packages that you need in `requirements.txt`. You can find `requirements.txt` in the root folder of your flow folder. When you deploy a prompt flow to a managed online endpoint in the UI, by default, the deployment uses the environment that was created based on the base image specified in `flow.dag.yaml` and the dependencies specified in `requirements.txt` of the flow.
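
As an illustration, `requirements.txt` follows the standard pip requirements format; the packages and versions below are hypothetical examples:

```txt
langchain==0.2.5
pandas>=2.0.0
```
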
@@ -126,7 +122,9 @@ System-assigned identity is autocreated after your endpoint is created. The user

 ##### System assigned

-Notice the option **Enforce access to connection secrets (preview)**. If your flow uses connections, the endpoint needs to access connections to perform inference. The option is enabled by default. The endpoint is granted the Azure Machine Learning Workspace Connection Secrets Reader role to access connections automatically if you have connection secrets reader permission. If you disable this option, you need to grant this role to the system-assigned identity manually or ask your admin for help. For more information, see [Grant permission to the endpoint identity](#grant-permissions-to-the-endpoint).
+Notice the option **Enforce access to connection secrets (preview)**. If your flow uses connections, the endpoint needs to access connections to perform inference. The option is enabled by default.
+
+The endpoint is granted the Azure Machine Learning Workspace Connection Secrets Reader role to access connections automatically if you have connection secrets reader permission. If you disable this option, you need to grant this role to the system-assigned identity manually or ask your admin for help. For more information, see [Grant permission to the endpoint identity](#grant-permissions-to-the-endpoint).

 ##### User assigned

@@ -136,9 +134,9 @@ If you created the associated endpoint with the **User Assigned Identity** optio

 |Scope|Role|Why it's needed|
 |---|---|---|
-|Azure AI Foundry project|**Azure Machine Learning Workspace Connection Secrets Reader** role or a customized role with `Microsoft.MachineLearningServices/workspaces/connections/listsecrets/action` | Get project connections.|
-|Azure AI Foundry project container registry |**ACR Pull** |Pull container image. |
-|Azure AI Foundry project default storage| **Storage Blob Data Reader**| Load model from storage. |
+|Azure AI Foundry project|**Azure Machine Learning Workspace Connection Secrets Reader** role or a customized role with `Microsoft.MachineLearningServices/workspaces/connections/listsecrets/action` | Gets project connections.|
+|Azure AI Foundry project container registry |**ACR Pull** |Pulls container images. |
+|Azure AI Foundry project default storage| **Storage Blob Data Reader**| Loads a model from storage. |
 |Azure AI Foundry project|**Azure Machine Learning Metrics Writer (preview)**| After you deploy the endpoint, if you want to monitor the endpoint-related metrics like CPU/GPU/Disk/Memory utilization, give this permission to the identity.<br/><br/>Optional|

 For more information about how to grant permissions to the endpoint identity, see [Grant permissions to the endpoint](#grant-permissions-to-the-endpoint).
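
As an illustration, granting one of these roles to the endpoint identity can be done with the Azure CLI; the principal ID and scope below are placeholders:

```azurecli
az role assignment create \
  --assignee "<endpoint-identity-principal-id>" \
  --role "Azure Machine Learning Workspace Connection Secrets Reader" \
  --scope "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.MachineLearningServices/workspaces/<project-name>"
```
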
@@ -214,7 +212,7 @@ For endpoints deployed from standard flow, you can input values in the form edit

 For endpoints deployed from a chat flow, you can test it in an immersive chat window.

-The `chat_input` was set during development of the chat flow. You can input the `chat_input` message in the input box. If your flow has multiple inputs, you can specify the values for other inputs besides the `chat_input` in the **Inputs** pane on the right side.
+The `chat_input` message was set during the development of the chat flow. You can enter the `chat_input` message in the input box. If your flow has multiple inputs, you can specify the values for other inputs besides the `chat_input` message on the **Inputs** pane on the right side.

 ## Consume the endpoint

articles/ai-foundry/how-to/monitor-quality-safety.md

Lines changed: 13 additions & 13 deletions
@@ -133,7 +133,7 @@ In this section, you learn how to deploy your prompt flow with inferencing data

 :::image type="content" source="../media/deploy-monitor/monitor/deployment-with-data-collection-enabled.png" alt-text="Screenshot that shows the Review page in the deployment wizard with all settings completed." lightbox = "../media/deploy-monitor/monitor/deployment-with-data-collection-enabled.png":::

-By default, all inputs and outputs of your deployed prompt flow application are collected to your Blob Storage. As users invoke the deployment, the data is collected for your monitor to use.
+By default, all inputs and outputs of your deployed prompt flow application are collected to your blob storage. As users invoke the deployment, the data is collected for your monitor to use.

 1. Select the **Test** tab on the deployment page. Then test your deployment to ensure that it's working properly.

@@ -164,7 +164,7 @@ In this section, you learn how to configure monitoring for your deployed prompt

 :::image type="content" source="../media/deploy-monitor/monitor/column-map-advanced-options.png" alt-text="Screenshot that shows advanced options when you map columns for monitoring metrics." lightbox = "../media/deploy-monitor/monitor/column-map-advanced-options.png":::

-If data collection isn't enabled for your deployment, creation of a monitor enables collection of inferencing data to your Blob Storage. This task takes the deployment offline for a few minutes.
+If data collection isn't enabled for your deployment, creation of a monitor enables collection of inferencing data to your blob storage. This task takes the deployment offline for a few minutes.

 1. Select **Create** to create your monitor.

@@ -196,7 +196,7 @@ from azure.identity import DefaultAzureCredential

 credential = DefaultAzureCredential()

-# Update your azure resources details
+# Update your Azure resource details
 subscription_id = "INSERT YOUR SUBSCRIPTION ID"
 resource_group = "INSERT YOUR RESOURCE GROUP NAME"
 project_name = "INSERT YOUR PROJECT NAME"  # This is the same as your Azure AI Foundry project name

@@ -212,7 +212,7 @@ monitor_name ="gen_ai_monitor_both_signals"
 defaulttokenstatisticssignalname = "token-usage-signal"
 defaultgsqsignalname = "gsq-signal"

-# Determine the frequency to run the monitor, and the emails to recieve email alerts
+# Determine the frequency to run the monitor and the emails to receive email alerts
 trigger_schedule = CronTrigger(expression="15 10 * * *")
 notification_emails_list = ["[email protected]", "[email protected]"]
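
The cron expression `15 10 * * *` runs the monitor daily at 10:15 (UTC, the default time zone for Azure Machine Learning schedules).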

@@ -235,7 +235,7 @@ aggregated_relevance_pass_rate = 0.7
 aggregated_coherence_pass_rate = 0.7
 aggregated_fluency_pass_rate = 0.7

-# Create an instance of gsq signal
+# Create an instance of a gsq signal
 generation_quality_thresholds = GenerationSafetyQualityMonitoringMetricThreshold(
     groundedness={"aggregated_groundedness_pass_rate": aggregated_groundedness_pass_rate},
     relevance={"aggregated_relevance_pass_rate": aggregated_relevance_pass_rate},

@@ -265,7 +265,7 @@ gsq_signal = GenerationSafetyQualitySignal(
     },
 )

-# Create an instance of token statistic signal
+# Create an instance of a token statistic signal
 token_statistic_signal = GenerationTokenStatisticsSignal()

 monitoring_signals = {
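
For context, a minimal sketch of how these fragments typically assemble into a monitor schedule with the `azure-ai-ml` SDK. The variable names reuse the ones defined in the snippets above; the Spark compute settings and the exact wiring are assumptions:

```python
from azure.ai.ml import MLClient
from azure.ai.ml.entities import (
    AlertNotification,
    MonitorDefinition,
    MonitorSchedule,
    ServerlessSparkCompute,
)

ml_client = MLClient(credential, subscription_id, resource_group, project_name)

# Bundle the signals defined above under their signal names.
monitoring_signals = {
    defaultgsqsignalname: gsq_signal,
    defaulttokenstatisticssignalname: token_statistic_signal,
}

monitor_definition = MonitorDefinition(
    compute=ServerlessSparkCompute(instance_type="standard_e4s_v3", runtime_version="3.3"),  # assumed compute
    monitoring_target=monitoring_target,
    monitoring_signals=monitoring_signals,
    alert_notification=AlertNotification(emails=notification_emails_list),
)

# Wrap the definition in a schedule driven by the cron trigger defined above.
monitor_schedule = MonitorSchedule(
    name=monitor_name,
    trigger=trigger_schedule,
    create_monitor=monitor_definition,
)

ml_client.schedules.begin_create_or_update(monitor_schedule)
```
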
@@ -301,7 +301,7 @@ After you create your monitor, it runs daily to compute the token usage and gene
 - **Prompt token count**: The number of prompt tokens used by the deployment during the selected time window.
 - **Completion token count**: The number of completion tokens used by the deployment during the selected time window.

-1. View the metrics on the **Token usage** tab. (This tab is selected by default.) Here, you can view the token usage of your application over time. You can also view the distribution of prompt and completion tokens over time. You can change the **Trendline scope** value to monitor all tokens in the entire application or token usage for a particular deployment (for example, gpt-4) used within your application.
+1. View the metrics on the **Token usage** tab. (This tab is selected by default.) Here, you can view the token usage of your application over time. You can also view the distribution of prompt and completion tokens over time. You can change the **Trendline scope** value to monitor all tokens in the entire application or token usage for a particular deployment (for example, GPT-4) used within your application.

 :::image type="content" source="../media/deploy-monitor/monitor/monitor-token-usage.png" alt-text="Screenshot that shows the token usage on the deployment's monitoring page." lightbox = "../media/deploy-monitor/monitor/monitor-token-usage.png":::

@@ -362,7 +362,7 @@ from azure.identity import DefaultAzureCredential

 credential = DefaultAzureCredential()

-# Update your azure resources details
+# Update your Azure resource details
 subscription_id = "INSERT YOUR SUBSCRIPTION ID"
 resource_group = "INSERT YOUR RESOURCE GROUP NAME"
 project_name = "INSERT YOUR PROJECT NAME"  # This is the same as your Azure AI Foundry project name

@@ -390,7 +390,7 @@ monitoring_target = MonitoringTarget(
     endpoint_deployment_id=f"azureml:{endpoint_name}:{deployment_name}",
 )

-# Create an instance of token statistic signal
+# Create an instance of a token statistic signal
 token_statistic_signal = GenerationTokenStatisticsSignal()

 monitoring_signals = {

@@ -439,7 +439,7 @@ from azure.identity import DefaultAzureCredential

 credential = DefaultAzureCredential()

-# Update your azure resources details
+# Update your Azure resource details
 subscription_id = "INSERT YOUR SUBSCRIPTION ID"
 resource_group = "INSERT YOUR RESOURCE GROUP NAME"
 project_name = "INSERT YOUR PROJECT NAME"  # This is the same as your Azure AI Foundry project name

@@ -454,7 +454,7 @@ app_trace_Version = "1"
 monitor_name = "gen_ai_monitor_generation_quality"
 defaultgsqsignalname = "gsq-signal"

-# Determine the frequency to run the monitor, and the emails to recieve email alerts
+# Determine the frequency to run the monitor and the emails to receive email alerts
 trigger_schedule = CronTrigger(expression="15 10 * * *")
 notification_emails_list = ["[email protected]", "[email protected]"]

@@ -471,13 +471,13 @@ monitoring_target = MonitoringTarget(
     endpoint_deployment_id=f"azureml:{endpoint_name}:{deployment_name}",
 )

-# Set thresholds for passing rate (0.7 = 70%)
+# Set thresholds for the passing rate (0.7 = 70%)
 aggregated_groundedness_pass_rate = 0.7
 aggregated_relevance_pass_rate = 0.7
 aggregated_coherence_pass_rate = 0.7
 aggregated_fluency_pass_rate = 0.7

-# Create an instance of gsq signal
+# Create an instance of a gsq signal
 generation_quality_thresholds = GenerationSafetyQualityMonitoringMetricThreshold(
     groundedness={"aggregated_groundedness_pass_rate": aggregated_groundedness_pass_rate},
     relevance={"aggregated_relevance_pass_rate": aggregated_relevance_pass_rate},
