
Commit ae43c1c

Merge pull request #270774 from likebupt/pf-deploy-tsg
add article for troubleshooting prompt flow deployments
2 parents 014f161 + aba3ae3 commit ae43c1c

6 files changed (+153, -9 lines)

articles/ai-studio/tutorials/deploy-copilot-ai-studio.md

Lines changed: 1 addition & 0 deletions
@@ -414,6 +414,7 @@ Now that you have your evaluation dataset, you can evaluate your flow by followi

> [!NOTE]
> Evaluation with AI-assisted metrics needs to call another GPT model to do the calculation. For best performance, use a GPT-4 or gpt-35-turbo-16k model. If you didn't previously deploy a GPT-4 or gpt-35-turbo-16k model, you can deploy another model by following the steps in [Deploy a chat model](#deploy-a-chat-model). Then return to this step and select the model you deployed.
+> The evaluation process may consume many tokens, so it's recommended to use a model that can support at least 16k tokens.

1. Select **Add new dataset**. Then select **Next**.

articles/machine-learning/prompt-flow/how-to-bulk-test-evaluate-flow.md

Lines changed: 3 additions & 2 deletions
@@ -81,6 +81,7 @@ If an evaluation method uses Large Language Models (LLMs) to measure the perform

> [!NOTE]
> Some evaluation methods require GPT-4 or GPT-3 to run. You must provide valid connections for these evaluation methods before using them.
+> Some evaluation processes may consume many tokens, so it's recommended to use a model that can support at least 16k tokens.

After you finish the input mapping, select on **"Next"** to review your settings and select on **"Submit"** to start the batch run with evaluation.

@@ -125,7 +126,7 @@ You can select **Evaluate** to start another round of evaluation.

After setting up the configuration, you can select **"Submit"** for this new round of evaluation. After submission, you'll be able to see a new record in the prompt flow run list.

-After the evaluation run completed, similarly, you can check the result of evaluation in the **"Outputs"** tab of the batch run detail panel. You need select the new evaluation run to view its result.
+After the evaluation run is completed, similarly, you can check the result of the evaluation in the **"Outputs"** tab of the batch run detail panel. You need to select the new evaluation run to view its result.

:::image type="content" source="./media/how-to-bulk-test-evaluate-flow/batch-run-detail-output-new-evaluation.png" alt-text="Screenshot of batch run detail page on the output tab with checking the new evaluation output." lightbox = "./media/how-to-bulk-test-evaluate-flow/batch-run-detail-output-new-evaluation.png":::

@@ -182,7 +183,7 @@ System message, sometimes referred to as a metaprompt or [system prompt](../../c

## Further reading: Guidance for creating Golden Datasets used for Copilot quality assurance

-The creation of copilot that use Large Language Models (LLMs) typically involves grounding the model in reality using source datasets. However, to ensure that the LLMs provide the most accurate and useful responses to customer queries, a "Golden Dataset" is necessary.
+The creation of a copilot that uses Large Language Models (LLMs) typically involves grounding the model in reality using source datasets. However, to ensure that the LLMs provide the most accurate and useful responses to customer queries, a "Golden Dataset" is necessary.

A Golden Dataset is a collection of realistic customer questions and expertly crafted answers. It serves as a Quality Assurance tool for LLMs used by your copilot. Golden Datasets are not used to train an LLM or inject context into an LLM prompt. Instead, they are utilized to assess the quality of the answers generated by the LLM.

articles/machine-learning/prompt-flow/how-to-deploy-for-real-time-inference.md

Lines changed: 15 additions & 7 deletions
@@ -315,6 +315,12 @@ Select **Metrics** tab in the left navigation. Select **promptflow standard metr

## Troubleshoot endpoints deployed from prompt flow

+### Lack authorization to perform action "Microsoft.MachineLearningService/workspaces/datastores/read"
+
+If your flow contains the Index Look Up tool, after deploying the flow, the endpoint needs to access the workspace datastore to read the MLIndex yaml file or the FAISS folder containing chunks and embeddings. Hence, you need to manually grant the endpoint identity permission to do so.
+
+You can either grant the endpoint identity the **AzureML Data Scientist** role on the workspace scope, or a custom role that contains the "MachineLearningService/workspace/datastore/reader" action.
+
### MissingDriverProgram Error

If you deploy your flow with custom environment and encounter the following error, it might be because you didn't specify the `inference_config` in your custom environment definition.
@@ -335,7 +341,14 @@ If you deploy your flow with custom environment and encounter the following erro

There are 2 ways to fix this error.

-1. You can fix this error by adding `inference_config` in your custom environment definition. Learn more about [how to use customized environment](#use-customized-environment).
+- (Recommended) You can find the container image URI on your custom environment detail page, and set it as the flow base image in the flow.dag.yaml file. When you deploy the flow in the UI, you just select **Use environment of current flow definition**, and the backend service creates the customized environment based on this base image and `requirements.txt` for your deployment. Learn more about [the environment specified in the flow definition](#use-environment-of-current-flow-definition).
+
+:::image type="content" source="./media/how-to-deploy-for-real-time-inference/custom-environment-image-uri.png" alt-text="Screenshot of custom environment detail page." lightbox = "./media/how-to-deploy-for-real-time-inference/custom-environment-image-uri.png":::
+
+:::image type="content" source="./media/how-to-deploy-for-real-time-inference/flow-environment-image.png" alt-text="Screenshot of specifying base image in raw yaml file of the flow." lightbox = "./media/how-to-deploy-for-real-time-inference/flow-environment-image.png":::
+
+
+- You can fix this error by adding `inference_config` in your custom environment definition. Learn more about [how to use customized environment](#use-customized-environment).

Following is an example of customized environment definition.

@@ -358,12 +371,6 @@ inference_config:
    path: /score
```

-2. You can find the container image uri in your custom environment detail page, and set it as the flow base image in the flow.dag.yaml file. When you deploy the flow in UI, you just select **Use environment of current flow definition**, and the backend service will create the customized environment based on this base image and `requirement.txt` for your deployment. Learn more about [the environment specified in the flow definition](#use-environment-of-current-flow-definition).
-
-:::image type="content" source="./media/how-to-deploy-for-real-time-inference/custom-environment-image-uri.png" alt-text="Screenshot of custom environment detail page. " lightbox = "./media/how-to-deploy-for-real-time-inference/custom-environment-image-uri.png":::
-
-:::image type="content" source="./media/how-to-deploy-for-real-time-inference/flow-environment-image.png" alt-text="Screenshot of specifying base image in raw yaml file of the flow. " lightbox = "./media/how-to-deploy-for-real-time-inference/flow-environment-image.png":::
-
### Model response taking too long
Sometimes, you might notice that the deployment is taking too long to respond. There are several potential factors for this to occur.
@@ -398,3 +405,4 @@ If you aren't going use the endpoint after completing this tutorial, you should

- [Iterate and optimize your flow by tuning prompts using variants](how-to-tune-prompts-using-variants.md)
- [View costs for an Azure Machine Learning managed online endpoint](../how-to-view-online-endpoints-costs.md)
+- [Troubleshoot prompt flow deployments](how-to-troubleshoot-prompt-flow-deployment.md)

articles/machine-learning/prompt-flow/how-to-deploy-to-code.md

Lines changed: 1 addition & 0 deletions
@@ -462,5 +462,6 @@ request_settings:
- Learn more about [managed online endpoint schema](../reference-yaml-endpoint-online.md) and [managed online deployment schema](../reference-yaml-deployment-managed-online.md).
- Learn more about how to [test the endpoint in UI](./how-to-deploy-for-real-time-inference.md#test-the-endpoint-with-sample-data) and [monitor the endpoint](./how-to-deploy-for-real-time-inference.md#view-managed-online-endpoints-common-metrics-using-azure-monitor-optional).
- Learn more about how to [troubleshoot managed online endpoints](../how-to-troubleshoot-online-endpoints.md).
+- [Troubleshoot prompt flow deployments](how-to-troubleshoot-prompt-flow-deployment.md)
- Once you improve your flow, and would like to deploy the improved version with safe rollout strategy, see [Safe rollout for online endpoints](../how-to-safely-rollout-online-endpoints.md).
- Learn more about [deploy flows to other platforms, such as a local development service, Docker container, Azure APP service, etc.](https://microsoft.github.io/promptflow/how-to-guides/deploy-a-flow/index.html)

articles/machine-learning/prompt-flow/how-to-troubleshoot-prompt-flow-deployment.md

Lines changed: 131 additions & 0 deletions
@@ -0,0 +1,131 @@
---
title: Troubleshoot prompt flow deployments
titleSuffix: Azure Machine Learning
description: This article provides instructions on how to troubleshoot your prompt flow deployments.
manager: scottpolly
ms.service: machine-learning
ms.topic: how-to
ms.date: 04/01/2024
ms.reviewer: lagayhar
ms.author: keli19
author: likebupt
---

# Troubleshoot prompt flow deployments

This article provides instructions on how to troubleshoot your deployments from prompt flow.

## Lack authorization to perform action "Microsoft.MachineLearningService/workspaces/datastores/read"

If your flow contains the Index Look Up tool, after deploying the flow, the endpoint needs to access the workspace datastore to read the MLIndex yaml file or the FAISS folder containing chunks and embeddings. Hence, you need to manually grant the endpoint identity permission to do so.

You can either grant the endpoint identity the **AzureML Data Scientist** role on the workspace scope, or a custom role that contains the "MachineLearningService/workspace/datastore/reader" action.
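
For example, a minimal Azure CLI sketch for granting the built-in role to a system-assigned endpoint identity might look like the following; the endpoint, resource group, workspace, and subscription values are placeholders, not values from this article.

```azurecli
# Sketch only: look up the endpoint's system-assigned identity, then grant it
# the AzureML Data Scientist role on the workspace scope.
PRINCIPAL_ID=$(az ml online-endpoint show \
  --name <endpoint-name> \
  --resource-group <resource-group> \
  --workspace-name <workspace-name> \
  --query identity.principal_id -o tsv)

az role assignment create \
  --assignee "$PRINCIPAL_ID" \
  --role "AzureML Data Scientist" \
  --scope "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.MachineLearningServices/workspaces/<workspace-name>"
```

If you use a custom role instead, the role definition must include the datastore read action mentioned above.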

## Upstream request timeout issue when consuming the endpoint

If you use the CLI or SDK to deploy the flow, you may encounter a timeout error. By default, `request_timeout_ms` is 5000. You can specify a maximum of 5 minutes, which is 300,000 ms. The following example shows how to specify the request timeout in the deployment yaml file. To learn more, see the [deployment schema](../reference-yaml-deployment-managed-online.md).

```yaml
request_settings:
  request_timeout_ms: 300000
```

## OpenAI API hits authentication error

If you regenerate your Azure OpenAI key and manually update the connection used in prompt flow, you may encounter errors like "Unauthorized. Access token is missing, invalid, audience is incorrect or have expired." when invoking an existing endpoint that was created before the key was regenerated.

This is because the connections used in the endpoints and deployments aren't automatically updated. Any change to keys or secrets in deployments should be made by a manual update, which is intended to avoid impacting an online production deployment through an unintentional offline operation.

- If the endpoint was deployed in the studio UI, you can just redeploy the flow to the existing endpoint using the same deployment name.
- If the endpoint was deployed using the SDK or CLI, you need to make some modification to the deployment definition, such as adding a dummy environment variable, and then use `az ml online-deployment update` to update your deployment. A hedged example follows this list.
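
The following is a minimal sketch of that update, assuming the deployment is defined in a YAML file named `blue-deployment.yaml` (a placeholder name) to which you've already added a dummy environment variable:

```azurecli
# Sketch only: push the edited deployment definition so the deployment is
# updated and picks up the refreshed connection secrets.
az ml online-deployment update \
  --file blue-deployment.yaml \
  --resource-group <resource-group> \
  --workspace-name <workspace-name>
```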

## Vulnerability issues in prompt flow deployments

For prompt flow runtime related vulnerabilities, the following approaches can help mitigate them:

- Update the dependency packages in the requirements.txt file in your flow folder.
- If you're using a customized base image for your flow, update the prompt flow runtime to the latest version, rebuild your base image, and then redeploy the flow.

For any other vulnerabilities of managed online deployments, Azure Machine Learning fixes the issues on a monthly basis.

## "MissingDriverProgram Error" or "Could not find driver program in the request"

If you deploy your flow and encounter the following error, it might be related to the deployment environment.

```text
'error':
{
    'code': 'BadRequest',
    'message': 'The request is invalid.',
    'details':
        {'code': 'MissingDriverProgram',
         'message': 'Could not find driver program in the request.',
         'details': [],
         'additionalInfo': []
        }
}
```

```text
Could not find driver program in the request
```

There are two ways to fix this error.

- (Recommended) You can find the container image URI on your custom environment detail page, and set it as the flow base image in the flow.dag.yaml file. When you deploy the flow in the UI, you just select **Use environment of current flow definition**, and the backend service creates the customized environment based on this base image and `requirements.txt` for your deployment. Learn more about [the environment specified in the flow definition](how-to-deploy-for-real-time-inference.md#use-environment-of-current-flow-definition). A hedged flow.dag.yaml sketch follows the environment example below.

:::image type="content" source="./media/how-to-deploy-for-real-time-inference/custom-environment-image-uri.png" alt-text="Screenshot of custom environment detail page." lightbox = "./media/how-to-deploy-for-real-time-inference/custom-environment-image-uri.png":::

:::image type="content" source="./media/how-to-deploy-for-real-time-inference/flow-environment-image.png" alt-text="Screenshot of specifying base image in raw yaml file of the flow." lightbox = "./media/how-to-deploy-for-real-time-inference/flow-environment-image.png":::

- You can fix this error by adding `inference_config` to your custom environment definition.

The following is an example of a customized environment definition.

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/environment.schema.json
name: pf-customized-test
build:
  path: ./image_build
  dockerfile_path: Dockerfile
description: promptflow customized runtime
inference_config:
  liveness_route:
    port: 8080
    path: /health
  readiness_route:
    port: 8080
    path: /health
  scoring_route:
    port: 8080
    path: /score
```
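
If you instead take the recommended route of setting the base image in the flow definition, the following is a minimal sketch of the `environment` section of flow.dag.yaml, assuming the prompt flow flow schema; the image URI is a placeholder that you copy from your custom environment detail page.

```yaml
# flow.dag.yaml (excerpt) - sketch only; replace the image URI with the one
# shown on your custom environment detail page.
environment:
  image: <your-custom-environment-image-uri>
  python_requirements_txt: requirements.txt
```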

## Model response taking too long

Sometimes, you might notice that the deployment is taking too long to respond. There are several potential factors for this to occur.

- The model used in the flow isn't powerful enough (for example, use GPT 3.5 instead of text-ada)
- The index query isn't optimized and is taking too long
- The flow has many steps to process

Consider optimizing the endpoint with the above considerations to improve the performance of the model.

## Unable to fetch deployment schema

If you deploy the endpoint and want to test it in the **Test** tab on the endpoint detail page, but the **Test** tab shows **Unable to fetch deployment schema**, you can try the following two methods to mitigate this issue:

:::image type="content" source="./media/how-to-deploy-for-real-time-inference/unable-to-fetch-deployment-schema.png" alt-text="Screenshot of the error unable to fetch deployment schema in Test tab in endpoint detail page." lightbox = "./media/how-to-deploy-for-real-time-inference/unable-to-fetch-deployment-schema.png":::

- Make sure you have granted the correct permission to the endpoint identity. Learn more about [how to grant permission to the endpoint identity](how-to-deploy-for-real-time-inference.md#grant-permissions-to-the-endpoint).
- It might be because you ran your flow in an old version of the runtime and then deployed the flow, so the deployment used the environment of that old runtime version as well. To update the runtime, follow [Update a runtime on the UI](./how-to-create-manage-runtime.md#update-a-runtime-on-the-ui), rerun the flow in the latest runtime, and then deploy the flow again.

## Access denied to list workspace secret

If you encounter an error like "Access denied to list workspace secret", check whether you have granted the correct permission to the endpoint identity. Learn more about [how to grant permission to the endpoint identity](how-to-deploy-for-real-time-inference.md#grant-permissions-to-the-endpoint).
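
To confirm what the endpoint identity can currently do, you can list its role assignments. The following is a minimal Azure CLI sketch, reusing the placeholder principal ID and workspace scope from the earlier example.

```azurecli
# Sketch only: list the role assignments held by the endpoint identity
# on the workspace scope.
az role assignment list \
  --assignee "$PRINCIPAL_ID" \
  --scope "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.MachineLearningServices/workspaces/<workspace-name>" \
  --output table
```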

## Next steps

- Learn more about [managed online endpoint schema](../reference-yaml-endpoint-online.md) and [managed online deployment schema](../reference-yaml-deployment-managed-online.md).
- Learn more about how to [troubleshoot managed online endpoints](../how-to-troubleshoot-online-endpoints.md).

articles/machine-learning/toc.yml

Lines changed: 2 additions & 0 deletions
@@ -678,6 +678,8 @@
  href: ./prompt-flow/concept-llmops-maturity.md
- name: Monitor generative AI applications in production
  href: ./prompt-flow/how-to-monitor-generative-ai-applications.md
+- name: Troubleshoot prompt flow deployments
+  href: ./prompt-flow/how-to-troubleshoot-prompt-flow-deployment.md
- name: Transparency note
  href: ./prompt-flow/transparency-note.md
- name: Tools Reference
