Commit a5ff3a5
edit pass: deploy-and-monitor-flows
1 parent a019dff

4 files changed: +40, -52 lines

articles/ai-foundry/how-to/develop/trace-production-sdk.md

Lines changed: 13 additions & 13 deletions
@@ -27,8 +27,8 @@ In this article, you learn to enable tracing, collect aggregated metrics, and co
 ## Prerequisites

 - The Azure CLI and the Azure Machine Learning extension to the Azure CLI.
-- An Azure AI Foundry project. If you don't already have a project, you can [create one here](../../how-to/create-projects.md).
-- An Application Insights resource. If you don't already have an Application Insights resource, you can [create one here](/azure/azure-monitor/app/create-workspace-resource).
+- An Azure AI Foundry project. If you don't already have a project, you can [create one](../../how-to/create-projects.md).
+- An Application Insights resource. If you don't already have an Application Insights resource, you can [create one](/azure/azure-monitor/app/create-workspace-resource).
 - Azure role-based access controls are used to grant access to operations in Azure Machine Learning. To perform the steps in this article, you must have Owner or Contributor permissions on the selected resource group. For more information, see [Role-based access control in the Azure AI Foundry portal](../../concepts/rbac-ai-foundry.md).

 ## Deploy a flow for real-time inference

@@ -43,13 +43,13 @@ Use the latest prompt flow base image to deploy the flow so that it supports the

 If you're using the Azure AI Foundry portal to deploy, select **Deployment** > **Application Insights diagnostics** > **Advanced settings** in the deployment wizard. In this way, the tracing data and system metrics are collected to the project linked to Application Insights.

-If you're using the SDK or the CLI, add the `app_insights_enabled: true` property in the deployment yaml file that collects data to the project linked to Application Insights.
+If you're using the SDK or the CLI, add the `app_insights_enabled: true` property in the deployment .yaml file that collects data to the project linked to Application Insights.

 ```yaml
 app_insights_enabled: true
 ```

-You can also specify other application insights by the environment variable `APPLICATIONINSIGHTS_CONNECTION_STRING` in the deployment yaml file. You can find the connection string for Application Insights on the **Overview** page in the Azure portal.
+You can also specify another Application Insights resource by using the environment variable `APPLICATIONINSIGHTS_CONNECTION_STRING` in the deployment .yaml file. You can find the connection string for Application Insights on the **Overview** page in the Azure portal.

 ```yaml
 environment_variables:
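
For reference, a minimal sketch of the complete mapping, with the connection string value as a placeholder:

```yaml
environment_variables:
  APPLICATIONINSIGHTS_CONNECTION_STRING: <connection_string>
```
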
@@ -75,14 +75,14 @@ The **Dependency** type event records calls from your deployments. The name of t

 | Metrics name | Type | Dimensions | Description |
 |--------------------------------------|-----------|-------------------------------------------|---------------------------------------------------------------------------------|
-| `token_consumption` | counter | - flow <br> - node<br> - `llm_engine`<br> - `token_type`: `prompt_tokens`: LLM API input tokens; `completion_tokens`: LLM API response tokens; `total_tokens` = `prompt_tokens + completion tokens` | OpenAI token consumption metrics |
-| `flow_latency` | histogram | flow, `response_code`, streaming, `response_type` | request execution cost, `response_type` means whether it's full/firstbyte/lastbyte|
-| `flow_request` | counter | flow, `response_code`, exception, streaming | flow request count |
-| `node_latency` | histogram | flow, node, `run_status` | node execution cost |
-| `node_request` | counter | flow, node, exception, `run_status` | node execution count |
-| `rpc_latency` | histogram | flow, node, `api_call` | rpc cost |
-| `rpc_request` | counter | flow, node, `api_call`, exception | rpc count |
-| `flow_streaming_response_duration` | histogram | flow | streaming response sending cost, from sending first byte to sending last byte |
+| `token_consumption` | counter | - `flow` <br> - `node`<br> - `llm_engine`<br> - `token_type`: `prompt_tokens`: LLM API input tokens; `completion_tokens`: LLM API response tokens; `total_tokens` = `prompt_tokens` + `completion_tokens` | OpenAI token consumption metrics. |
+| `flow_latency` | histogram | `flow`, `response_code`, `streaming`, `response_type` | The request execution cost. `response_type` indicates whether the measurement covers the full response, the first byte, or the last byte.|
+| `flow_request` | counter | `flow`, `response_code`, `exception`, `streaming` | The flow request count. |
+| `node_latency` | histogram | `flow`, `node`, `run_status` | The node execution cost. |
+| `node_request` | counter | `flow`, `node`, `exception`, `run_status` | The node execution count. |
+| `rpc_latency` | histogram | `flow`, `node`, `api_call` | The RPC cost. |
+| `rpc_request` | counter | `flow`, `node`, `api_call`, `exception` | The RPC count. |
+| `flow_streaming_response_duration` | histogram | `flow` | The streaming response sending cost, from sending the first byte to sending the last byte. |

 You can find the workspace default Application Insights metrics on your workspace overview page in the Azure portal.

@@ -93,7 +93,7 @@ You can find the workspace default Application Insights metrics on your workspac

 Prompt flow serving provides a new `/feedback` API to help customers collect the feedback. The feedback payload can be any JSON format data. Prompt flow serving helps the customer save the feedback data to a trace span. Data is saved to the trace exporter target that the customer configured. Prompt flow serving also supports OpenTelemetry standard trace context propagation. It respects the trace context set in the request header and uses that context as the request parent span context. You can use the distributed tracing functionality to correlate the feedback trace to its chat request trace.

-The following sample code shows how to score a flow deployed to a managed endpoint that was enabled for tracing and send the feedback to the same trace span of a scoring request. The flow has the inputs `question` and `chat_history`. The output is `answer`. After the endpoint is scored, feedback is collected and sent to application insights that are specified when you deploy the flow.
+The following sample code shows how to score a flow deployed to a managed endpoint that was enabled for tracing and send the feedback to the same trace span of a scoring request. The flow has the inputs `question` and `chat_history`. The output is `answer`. After the endpoint is scored, feedback is collected and sent to the Application Insights resource specified when you deploy the flow.

 ```python
 import urllib.request
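
A minimal sketch of the pattern this sample describes (score the endpoint, then post feedback with the same trace context), assuming placeholder endpoint details and that the scoring response exposes a W3C `traceparent` header:

```python
import json
import urllib.request

# Assumed placeholders: the managed endpoint's scoring URL and API key.
url = "https://<endpoint-name>.<region>.inference.ml.azure.com/score"
api_key = "<api-key>"
headers = {"Content-Type": "application/json", "Authorization": "Bearer " + api_key}

# Score the flow; the input names match the flow's `question` and `chat_history`.
body = json.dumps({"question": "What is Application Insights?", "chat_history": []}).encode("utf-8")
response = urllib.request.urlopen(urllib.request.Request(url, data=body, headers=headers))
answer = json.loads(response.read())

# Propagate the scoring request's trace context (W3C `traceparent`) so the
# feedback lands on the same trace as the chat request.
traceparent = response.headers.get("traceparent")
if traceparent:
    headers["traceparent"] = traceparent

# The feedback payload can be any JSON data.
feedback = json.dumps({"feedback": "thumbs_up"}).encode("utf-8")
urllib.request.urlopen(
    urllib.request.Request(url.replace("/score", "/feedback"), data=feedback, headers=headers)
)
```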

articles/ai-foundry/how-to/flow-deploy.md

Lines changed: 7 additions & 9 deletions
@@ -74,14 +74,10 @@ To deploy a prompt flow as an online endpoint in the Azure AI Foundry portal:

 :::image type="content" source="../media/prompt-flow/how-to-deploy-for-real-time-inference/deployments-score-url-samples.png" alt-text="Screenshot that shows the deployment endpoint and code samples." lightbox = "../media/prompt-flow/how-to-deploy-for-real-time-inference/deployments-score-url-samples.png":::

-For more information, see the following sections.
-
 For information about how to deploy a base model, see [Deploy models with Azure AI Foundry](deploy-models-managed.md).

 ## Settings and configurations

-This section discusses settings and configurations.
-
 ### Requirements text file

 Optionally, you can specify extra packages that you need in `requirements.txt`. You can find `requirements.txt` in the root folder of your flow folder. When you deploy a prompt flow to a managed online endpoint in the UI, by default, the deployment uses the environment that was created based on the base image specified in `flow.dag.yaml` and the dependencies specified in `requirements.txt` of the flow.
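
As an illustration, `requirements.txt` follows the standard pip requirements format; the packages and versions below are hypothetical examples:

```txt
langchain==0.2.5
pandas>=2.0.0
```
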
@@ -126,7 +122,9 @@ System-assigned identity is autocreated after your endpoint is created. The user

 ##### System assigned

-Notice the option **Enforce access to connection secrets (preview)**. If your flow uses connections, the endpoint needs to access connections to perform inference. The option is enabled by default. The endpoint is granted the Azure Machine Learning Workspace Connection Secrets Reader role to access connections automatically if you have connection secrets reader permission. If you disable this option, you need to grant this role to the system-assigned identity manually or ask your admin for help. For more information, see [Grant permission to the endpoint identity](#grant-permissions-to-the-endpoint).
+Notice the option **Enforce access to connection secrets (preview)**. If your flow uses connections, the endpoint needs to access connections to perform inference. The option is enabled by default.
+
+The endpoint is granted the Azure Machine Learning Workspace Connection Secrets Reader role to access connections automatically if you have connection secrets reader permission. If you disable this option, you need to grant this role to the system-assigned identity manually or ask your admin for help. For more information, see [Grant permission to the endpoint identity](#grant-permissions-to-the-endpoint).

 ##### User assigned

@@ -136,9 +134,9 @@ If you created the associated endpoint with the **User Assigned Identity** optio

 |Scope|Role|Why it's needed|
 |---|---|---|
-|Azure AI Foundry project|**Azure Machine Learning Workspace Connection Secrets Reader** role or a customized role with `Microsoft.MachineLearningServices/workspaces/connections/listsecrets/action` | Get project connections.|
-|Azure AI Foundry project container registry |**ACR Pull** |Pull container image. |
-|Azure AI Foundry project default storage| **Storage Blob Data Reader**| Load model from storage. |
+|Azure AI Foundry project|**Azure Machine Learning Workspace Connection Secrets Reader** role or a customized role with `Microsoft.MachineLearningServices/workspaces/connections/listsecrets/action` | Gets project connections.|
+|Azure AI Foundry project container registry |**ACR Pull** |Pulls container images. |
+|Azure AI Foundry project default storage| **Storage Blob Data Reader**| Loads a model from storage. |
 |Azure AI Foundry project|**Azure Machine Learning Metrics Writer (preview)**| After you deploy the endpoint, if you want to monitor the endpoint-related metrics like CPU/GPU/Disk/Memory utilization, give this permission to the identity.<br/><br/>Optional|

 For more information about how to grant permissions to the endpoint identity, see [Grant permissions to the endpoint](#grant-permissions-to-the-endpoint).
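
As an illustration, granting one of these roles to the endpoint identity can be done with the Azure CLI; the principal ID and scope below are placeholders:

```azurecli
az role assignment create \
  --assignee "<endpoint-identity-principal-id>" \
  --role "Azure Machine Learning Workspace Connection Secrets Reader" \
  --scope "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.MachineLearningServices/workspaces/<project-name>"
```
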
@@ -214,7 +212,7 @@ For endpoints deployed from standard flow, you can input values in the form edit

 For endpoints deployed from a chat flow, you can test it in an immersive chat window.

-The `chat_input` was set during development of the chat flow. You can input the `chat_input` message in the input box. If your flow has multiple inputs, you can specify the values for other inputs besides the `chat_input` in the **Inputs** pane on the right side.
+The `chat_input` message was set during the development of the chat flow. You can enter the `chat_input` message in the input box. If your flow has multiple inputs, you can specify the values for other inputs besides the `chat_input` message on the **Inputs** pane on the right side.

 ## Consume the endpoint

articles/ai-foundry/how-to/monitor-quality-safety.md

Lines changed: 13 additions & 13 deletions
@@ -133,7 +133,7 @@ In this section, you learn how to deploy your prompt flow with inferencing data

 :::image type="content" source="../media/deploy-monitor/monitor/deployment-with-data-collection-enabled.png" alt-text="Screenshot that shows the Review page in the deployment wizard with all settings completed." lightbox = "../media/deploy-monitor/monitor/deployment-with-data-collection-enabled.png":::

-By default, all inputs and outputs of your deployed prompt flow application are collected to your Blob Storage. As users invoke the deployment, the data is collected for your monitor to use.
+By default, all inputs and outputs of your deployed prompt flow application are collected to your blob storage. As users invoke the deployment, the data is collected for your monitor to use.

 1. Select the **Test** tab on the deployment page. Then test your deployment to ensure that it's working properly.

@@ -164,7 +164,7 @@ In this section, you learn how to configure monitoring for your deployed prompt

 :::image type="content" source="../media/deploy-monitor/monitor/column-map-advanced-options.png" alt-text="Screenshot that shows advanced options when you map columns for monitoring metrics." lightbox = "../media/deploy-monitor/monitor/column-map-advanced-options.png":::

-If data collection isn't enabled for your deployment, creation of a monitor enables collection of inferencing data to your Blob Storage. This task takes the deployment offline for a few minutes.
+If data collection isn't enabled for your deployment, creation of a monitor enables collection of inferencing data to your blob storage. This task takes the deployment offline for a few minutes.

 1. Select **Create** to create your monitor.

@@ -196,7 +196,7 @@ from azure.identity import DefaultAzureCredential

 credential = DefaultAzureCredential()

-# Update your azure resources details
+# Update your Azure resource details
 subscription_id = "INSERT YOUR SUBSCRIPTION ID"
 resource_group = "INSERT YOUR RESOURCE GROUP NAME"
 project_name = "INSERT YOUR PROJECT NAME"  # This is the same as your Azure AI Foundry project name

@@ -212,7 +212,7 @@ monitor_name ="gen_ai_monitor_both_signals"
 defaulttokenstatisticssignalname = "token-usage-signal"
 defaultgsqsignalname = "gsq-signal"

-# Determine the frequency to run the monitor, and the emails to recieve email alerts
+# Determine the frequency to run the monitor and the emails to receive email alerts
 trigger_schedule = CronTrigger(expression="15 10 * * *")
 notification_emails_list = ["[email protected]", "[email protected]"]
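
The cron expression `15 10 * * *` runs the monitor daily at 10:15 (UTC, the default time zone for Azure Machine Learning schedules).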

@@ -235,7 +235,7 @@ aggregated_relevance_pass_rate = 0.7
 aggregated_coherence_pass_rate = 0.7
 aggregated_fluency_pass_rate = 0.7

-# Create an instance of gsq signal
+# Create an instance of a gsq signal
 generation_quality_thresholds = GenerationSafetyQualityMonitoringMetricThreshold(
     groundedness={"aggregated_groundedness_pass_rate": aggregated_groundedness_pass_rate},
     relevance={"aggregated_relevance_pass_rate": aggregated_relevance_pass_rate},

@@ -265,7 +265,7 @@ gsq_signal = GenerationSafetyQualitySignal(
     },
 )

-# Create an instance of token statistic signal
+# Create an instance of a token statistic signal
 token_statistic_signal = GenerationTokenStatisticsSignal()

 monitoring_signals = {
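
For context, a minimal sketch of how these fragments typically assemble into a monitor schedule with the `azure-ai-ml` SDK. The variable names reuse the ones defined in the snippets above; the Spark compute settings and the exact wiring are assumptions:

```python
from azure.ai.ml import MLClient
from azure.ai.ml.entities import (
    AlertNotification,
    MonitorDefinition,
    MonitorSchedule,
    ServerlessSparkCompute,
)

ml_client = MLClient(credential, subscription_id, resource_group, project_name)

# Bundle the signals defined above under their signal names.
monitoring_signals = {
    defaultgsqsignalname: gsq_signal,
    defaulttokenstatisticssignalname: token_statistic_signal,
}

monitor_definition = MonitorDefinition(
    compute=ServerlessSparkCompute(instance_type="standard_e4s_v3", runtime_version="3.3"),  # assumed compute
    monitoring_target=monitoring_target,
    monitoring_signals=monitoring_signals,
    alert_notification=AlertNotification(emails=notification_emails_list),
)

# Wrap the definition in a schedule driven by the cron trigger defined above.
monitor_schedule = MonitorSchedule(
    name=monitor_name,
    trigger=trigger_schedule,
    create_monitor=monitor_definition,
)

ml_client.schedules.begin_create_or_update(monitor_schedule)
```
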
@@ -301,7 +301,7 @@ After you create your monitor, it runs daily to compute the token usage and gene
 - **Prompt token count**: The number of prompt tokens used by the deployment during the selected time window.
 - **Completion token count**: The number of completion tokens used by the deployment during the selected time window.

-1. View the metrics on the **Token usage** tab. (This tab is selected by default.) Here, you can view the token usage of your application over time. You can also view the distribution of prompt and completion tokens over time. You can change the **Trendline scope** value to monitor all tokens in the entire application or token usage for a particular deployment (for example, gpt-4) used within your application.
+1. View the metrics on the **Token usage** tab. (This tab is selected by default.) Here, you can view the token usage of your application over time. You can also view the distribution of prompt and completion tokens over time. You can change the **Trendline scope** value to monitor all tokens in the entire application or token usage for a particular deployment (for example, GPT-4) used within your application.

 :::image type="content" source="../media/deploy-monitor/monitor/monitor-token-usage.png" alt-text="Screenshot that shows the token usage on the deployment's monitoring page." lightbox = "../media/deploy-monitor/monitor/monitor-token-usage.png":::

@@ -362,7 +362,7 @@ from azure.identity import DefaultAzureCredential

 credential = DefaultAzureCredential()

-# Update your azure resources details
+# Update your Azure resource details
 subscription_id = "INSERT YOUR SUBSCRIPTION ID"
 resource_group = "INSERT YOUR RESOURCE GROUP NAME"
 project_name = "INSERT YOUR PROJECT NAME"  # This is the same as your Azure AI Foundry project name

@@ -390,7 +390,7 @@ monitoring_target = MonitoringTarget(
     endpoint_deployment_id=f"azureml:{endpoint_name}:{deployment_name}",
 )

-# Create an instance of token statistic signal
+# Create an instance of a token statistic signal
 token_statistic_signal = GenerationTokenStatisticsSignal()

 monitoring_signals = {

@@ -439,7 +439,7 @@ from azure.identity import DefaultAzureCredential

 credential = DefaultAzureCredential()

-# Update your azure resources details
+# Update your Azure resource details
 subscription_id = "INSERT YOUR SUBSCRIPTION ID"
 resource_group = "INSERT YOUR RESOURCE GROUP NAME"
 project_name = "INSERT YOUR PROJECT NAME"  # This is the same as your Azure AI Foundry project name

@@ -454,7 +454,7 @@ app_trace_Version = "1"
 monitor_name = "gen_ai_monitor_generation_quality"
 defaultgsqsignalname = "gsq-signal"

-# Determine the frequency to run the monitor, and the emails to recieve email alerts
+# Determine the frequency to run the monitor and the emails to receive email alerts
 trigger_schedule = CronTrigger(expression="15 10 * * *")
 notification_emails_list = ["[email protected]", "[email protected]"]

@@ -471,13 +471,13 @@ monitoring_target = MonitoringTarget(
     endpoint_deployment_id=f"azureml:{endpoint_name}:{deployment_name}",
 )

-# Set thresholds for passing rate (0.7 = 70%)
+# Set thresholds for the passing rate (0.7 = 70%)
 aggregated_groundedness_pass_rate = 0.7
 aggregated_relevance_pass_rate = 0.7
 aggregated_coherence_pass_rate = 0.7
 aggregated_fluency_pass_rate = 0.7

-# Create an instance of gsq signal
+# Create an instance of a gsq signal
 generation_quality_thresholds = GenerationSafetyQualityMonitoringMetricThreshold(
     groundedness={"aggregated_groundedness_pass_rate": aggregated_groundedness_pass_rate},
     relevance={"aggregated_relevance_pass_rate": aggregated_relevance_pass_rate},
