
Commit 91e31ff

Merge pull request #250389 from lgayhardt/pfupdates0823

Prompt flow updates to existing docs part 1

2 parents: 1f3a90e + 1090dda

File tree

59 files changed: +186 additions, -171 deletions

articles/machine-learning/prompt-flow/how-to-bulk-test-evaluate-flow.md

Lines changed: 73 additions & 60 deletions
Large diffs are not rendered by default.

articles/machine-learning/prompt-flow/how-to-deploy-for-real-time-inference.md

Lines changed: 41 additions & 9 deletions
@@ -9,7 +9,7 @@ ms.topic: how-to
 author: likebupt
 ms.author: keli19
 ms.reviewer: lagayhar
-ms.date: 07/07/2023
+ms.date: 09/12/2023
 ---
@@ -45,9 +45,6 @@ If you didn't complete the tutorial, you need to build a flow. Testing the flow
 
 We'll use the sample flow **Web Classification** as an example to show how to deploy the flow. This sample flow is a standard flow. Deploying chat flows is similar. Evaluation flows don't support deployment.
 
-> [!NOTE]
-> Currently Prompt flow only supports **single deployment** of managed online endpoints, so we will simplify the *deployment* configuration in the UI.
-
 ## Create an online endpoint
 
 Now that you have built a flow and tested it properly, it's time to create your online endpoint for real-time inference.
@@ -105,9 +102,17 @@ Select the identity you want to use, and you'll notice a warning message to remi
 
 See detailed guidance about how to grant permissions to the endpoint identity in [Grant permissions to the endpoint](#grant-permissions-to-the-endpoint).
 
-#### Allow sharing sample input data for testing purpose only
+### Deployment
+
+In this step, you can specify the following properties:
+
+|Property| Description |
+|---|-----|
+|Deployment name| - Within the same endpoint, the deployment name should be unique. <br> - If you select an existing endpoint in the previous step and enter an existing deployment name, that deployment will be overwritten with the new configuration. |
+|Inference data collection| If you enable this, the flow inputs and outputs are automatically collected in an Azure Machine Learning data asset and can be used for later monitoring. To learn more, see [model monitoring](how-to-monitor-generative-ai-applications.md).|
+|Application Insights diagnostics| If you enable this, system metrics during inference time (such as token count, flow latency, and flow request count) are collected into the workspace default Application Insights. To learn more, see [prompt flow serving metrics](#view-prompt-flow-endpoints-specific-metrics-optional).|
 
-If the checkbox is selected, the first row of your input data will be used as sample input data for testing the endpoint later.
+:::image type="content" source="./media/how-to-deploy-for-real-time-inference/deploy-wizard-deployment.png" alt-text="Screenshot of the deployment step in the deploy wizard in the studio UI." lightbox = "./media/how-to-deploy-for-real-time-inference/deploy-wizard-deployment.png":::
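The overwrite rule for **Deployment name** in the table above can be sketched as a tiny helper (this helper and its data shapes are illustrative only, not part of the studio UI or any SDK):

```python
def plan_deployment(endpoint_deployments, name):
    """Decide what deploying `name` to an endpoint would do.

    endpoint_deployments: names of deployments that already exist on the
    target endpoint. Per the table above, reusing an existing name
    overwrites that deployment with the new configuration; a new name
    creates a new deployment.
    """
    if name in endpoint_deployments:
        return "overwrite"  # existing deployment gets the new configuration
    return "create"         # a new deployment is added to the endpoint

# Example: the endpoint already has a deployment named "blue".
existing = {"blue"}
print(plan_deployment(existing, "blue"))   # overwrite
print(plan_deployment(existing, "green"))  # create
```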
 
 ### Outputs

@@ -228,7 +233,9 @@ The `chat_input` was set during development of the chat flow. You can input the
 
 In the endpoint detail page, switch to the **Consume** tab. You can find the REST endpoint and key/token to consume your endpoint. There is also sample code for you to consume the endpoint in different languages.
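The REST endpoint and key from the **Consume** tab can be used from any HTTP client. A minimal Python sketch, assuming bearer-key authentication and a JSON body of flow inputs; the scoring URL, key, and input field are placeholders — copy the real values and sample code from the Consume tab:

```python
import json
import urllib.request

def build_scoring_request(scoring_url, api_key, flow_inputs):
    """Build an authenticated scoring request for an online endpoint.

    The endpoint key goes in the Authorization header as a bearer token,
    and the JSON body's fields are the flow's inputs.
    """
    body = json.dumps(flow_inputs).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + api_key,
    }
    return urllib.request.Request(scoring_url, data=body, headers=headers)

# Placeholder values -- use the URL and key shown on the Consume tab.
req = build_scoring_request(
    "https://<endpoint-name>.<region>.inference.ml.azure.com/score",
    "<api-key>",
    {"url": "https://www.microsoft.com"},  # e.g. a Web Classification input
)
# response = urllib.request.urlopen(req)   # uncomment to call the endpoint
# print(json.loads(response.read()))
```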
-## View metrics using Azure Monitor (optional)
+## View endpoint metrics
+
+### View managed online endpoints common metrics using Azure Monitor (optional)
 
 You can view various metrics (request numbers, request latency, network bytes, CPU/GPU/Disk/Memory utilization, and more) for an online endpoint and its deployments by following links from the endpoint's **Details** page in the studio. Following these links takes you to the exact metrics page in the Azure portal for the endpoint or deployment.

@@ -239,6 +246,33 @@ You can view various metrics (request numbers, request latency, network bytes, C
 
 For more information on how to view online endpoint metrics, see [Monitor online endpoints](../how-to-monitor-online-endpoints.md#metrics).
 
+### View prompt flow endpoints specific metrics (optional)
+
+If you enable **Application Insights diagnostics** in the UI deploy wizard, or set `app_insights_enabled=true` in the deployment definition using code, the following prompt flow specific metrics are collected in the workspace default Application Insights.
+
+| Metrics name | Type | Dimensions | Description |
+|---|---|---|---|
+| token_consumption | counter | - flow <br> - node <br> - llm_engine <br> - token_type: `prompt_tokens`: LLM API input tokens; `completion_tokens`: LLM API response tokens; `total_tokens` = `prompt_tokens` + `completion_tokens` | OpenAI token consumption metrics |
+| flow_latency | histogram | flow, response_code, streaming, response_type | request execution cost; response_type indicates whether the response is full/firstbyte/lastbyte |
+| flow_request | counter | flow, response_code, exception, streaming | flow request count |
+| node_latency | histogram | flow, node, run_status | node execution cost |
+| node_request | counter | flow, node, exception, run_status | node execution failure count |
+| rpc_latency | histogram | flow, node, api_call | RPC cost |
+| rpc_request | counter | flow, node, api_call, exception | RPC count |
+| flow_streaming_response_duration | histogram | flow | streaming response sending cost, from sending the first byte to sending the last byte |
+
+You can find the workspace default Application Insights on your workspace page in the Azure portal.
+
+:::image type="content" source="./media/how-to-deploy-for-real-time-inference/workspace-default-app-insights.png" alt-text="Screenshot of the workspace default Application Insights." lightbox = "./media/how-to-deploy-for-real-time-inference/workspace-default-app-insights.png":::
+
+Open the Application Insights, select **Usage and estimated costs** from the left navigation, select **Custom metrics (Preview)**, select **With dimensions**, and save the change.
+
+:::image type="content" source="./media/how-to-deploy-for-real-time-inference/enable-multidimensional-metrics.png" alt-text="Screenshot of enabling multidimensional metrics." lightbox = "./media/how-to-deploy-for-real-time-inference/enable-multidimensional-metrics.png":::
+
+Select the **Metrics** tab in the left navigation. Select **promptflow standard metrics** from **Metric Namespace**, and explore the metrics from the **Metric** dropdown list with different aggregation methods.
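The counter metrics above are summed over a time window and sliced by their dimensions. A rough illustration of how the `token_type` dimension of `token_consumption` rolls up (the record shape here is illustrative only, not an actual Application Insights API):

```python
from collections import defaultdict

# Illustrative token_consumption records, one per LLM call, carrying
# the dimensions from the table above (flow, node, token_type).
records = [
    {"flow": "web-classification", "node": "classify", "token_type": "prompt_tokens", "value": 120},
    {"flow": "web-classification", "node": "classify", "token_type": "completion_tokens", "value": 30},
    {"flow": "web-classification", "node": "summarize", "token_type": "prompt_tokens", "value": 200},
    {"flow": "web-classification", "node": "summarize", "token_type": "completion_tokens", "value": 50},
]

# Sum the counter per token_type, as a "Sum" aggregation sliced by the
# token_type dimension would.
totals = defaultdict(int)
for r in records:
    totals[r["token_type"]] += r["value"]

# Per the table: total_tokens = prompt_tokens + completion_tokens.
totals["total_tokens"] = totals["prompt_tokens"] + totals["completion_tokens"]
print(totals["total_tokens"])  # 400
```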
 
 ## Troubleshoot endpoints deployed from prompt flow
 
 ### Unable to fetch deployment schema
@@ -261,8 +295,6 @@ If you aren't going to use the endpoint after completing this tutorial, you should
 > [!NOTE]
 > The complete deletion may take approximately 20 minutes.
 
-
-
 ## Next Steps
 
 - [Iterate and optimize your flow by tuning prompts using variants](how-to-tune-prompts-using-variants.md)
