articles/machine-learning/prompt-flow/how-to-deploy-for-real-time-inference.md
---
ms.topic: how-to
author: likebupt
ms.author: keli19
ms.reviewer: lagayhar
ms.date: 09/12/2023
---
We'll use the sample flow **Web Classification** as an example to show how to deploy a flow. This sample flow is a standard flow. Deploying chat flows is similar; evaluation flows don't support deployment.
## Create an online endpoint
Now that you have built a flow and tested it properly, it's time to create your online endpoint for real-time inference.
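This article walks through the studio wizard. For reference, the same endpoint can be described in an Azure Machine Learning managed online endpoint YAML definition; this is a minimal sketch, and the endpoint name below is a hypothetical placeholder:

```yaml
# Hypothetical endpoint definition (endpoint.yaml) -- "my-pf-endpoint" is a placeholder name.
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.schema.json
name: my-pf-endpoint
auth_mode: key   # key-based authentication; use aml_token for Azure Machine Learning token auth
```

With the Azure CLI `ml` extension installed, such a definition would typically be applied with `az ml online-endpoint create --file endpoint.yaml`.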
See detailed guidance about how to grant permissions to the endpoint identity in [Grant permissions to the endpoint](#grant-permissions-to-the-endpoint).
### Deployment

In this step, you can specify the following properties:

|Property| Description |
|---|-----|
|Deployment name| - Within the same endpoint, the deployment name should be unique. <br> - If you select an existing endpoint in the previous step and input an existing deployment name, that deployment will be overwritten with the new configuration. |
|Inference data collection| If you enable this, the flow inputs and outputs will be automatically collected in an Azure Machine Learning data asset, and can be used for later monitoring. To learn more, see [model monitoring](how-to-monitor-generative-ai-applications.md).|
|Application Insights diagnostics| If you enable this, system metrics during inference time (such as token count, flow latency, and flow request count) will be collected into the workspace default Application Insights. To learn more, see [prompt flow serving metrics](#view-prompt-flow-endpoints-specific-metrics-optional).|

:::image type="content" source="./media/how-to-deploy-for-real-time-inference/deploy-wizard-deployment.png" alt-text="Screenshot of the deployment step in the deploy wizard in the studio UI." lightbox = "./media/how-to-deploy-for-real-time-inference/deploy-wizard-deployment.png":::
### Outputs
On the endpoint detail page, switch to the **Consume** tab. There you can find the REST endpoint and the key/token to consume your endpoint, along with sample code for consuming the endpoint in different languages.
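As a minimal sketch of consuming the endpoint from Python with only the standard library (the scoring URL, key, and input field below are placeholders; copy the real values from the **Consume** tab, and note that the `url` input matches the Web Classification sample flow):

```python
import json
import urllib.request


def build_request(scoring_url: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build an authenticated scoring request for a managed online endpoint."""
    body = json.dumps(payload).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # Key-based auth: the endpoint key (or token) goes in a Bearer header.
        "Authorization": f"Bearer {api_key}",
    }
    return urllib.request.Request(scoring_url, data=body, headers=headers, method="POST")


def score(scoring_url: str, api_key: str, payload: dict) -> dict:
    """POST the payload to the endpoint and return the parsed JSON response."""
    req = build_request(scoring_url, api_key, payload)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


# Placeholder values -- replace with the REST endpoint and key from the Consume tab:
# result = score("https://<endpoint>.<region>.inference.ml.azure.com/score",
#                "<endpoint-key>", {"url": "https://www.microsoft.com"})
```

The request/response shape follows the flow's input and output schema, so the payload keys depend on how your flow's inputs are named.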
## View endpoint metrics
### View managed online endpoints common metrics using Azure Monitor (optional)
You can view various metrics (request numbers, request latency, network bytes, CPU/GPU/disk/memory utilization, and more) for an online endpoint and its deployments by following links from the endpoint's **Details** page in the studio. Following these links takes you to the exact metrics page in the Azure portal for the endpoint or deployment.
For more information on how to view online endpoint metrics, see [Monitor online endpoints](../how-to-monitor-online-endpoints.md#metrics).
### View prompt flow endpoints specific metrics (optional)
If you enable **Application Insights diagnostics** in the UI deploy wizard, or set `app_insights_enabled=true` in the deployment definition using code, the following prompt flow specific metrics will be collected in the workspace default Application Insights.
| Metrics Name | Type | Dimensions | Description |
|---|---|---|---|
| flow_streaming_response_duration | histogram | flow | Streaming response sending cost, from sending the first byte to sending the last byte. |
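The code-first path mentioned above (setting `app_insights_enabled=true` in the deployment definition) can be sketched as a managed online deployment YAML; the names, instance settings, and file name here are hypothetical placeholders:

```yaml
# Hypothetical deployment definition (deployment.yaml) -- adjust names and
# instance settings to your own endpoint and flow.
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: blue
endpoint_name: my-pf-endpoint
app_insights_enabled: true   # enables collection of the prompt flow serving metrics above
instance_type: Standard_E16s_v3
instance_count: 1
```

A definition like this would typically be applied with `az ml online-deployment create --file deployment.yaml`.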
You can find the workspace default Application Insights on your workspace page in the Azure portal.
:::image type="content" source="./media/how-to-deploy-for-real-time-inference/workspace-default-app-insights.png" alt-text="Screenshot of the workspace default Application Insights." lightbox = "./media/how-to-deploy-for-real-time-inference/workspace-default-app-insights.png":::
Open Application Insights, and select **Usage and estimated costs** from the left navigation. Select **Custom metrics (Preview)**, select **With dimensions**, and save the change.

Select the **Metrics** tab in the left navigation. Select **promptflow standard metrics** from the **Metric Namespace**, and you can explore the metrics from the **Metric** dropdown list with different aggregation methods.