articles/machine-learning/prompt-flow/how-to-create-manage-runtime.md
74 lines changed: 0 additions & 74 deletions
@@ -109,80 +109,6 @@ Go to runtime detail page and select update button at the top. You can change ne
> If you used a custom environment, you need to rebuild it using the latest prompt flow image first, and then update your runtime with the new custom environment.
### Common issues

#### My runtime failed with a system error **runtime not ready** when using a custom environment
:::image type="content" source="./media/how-to-create-manage-runtime/ci-failed-runtime-not-ready.png" alt-text="Screenshot of a failed run on the runtime detail page. " lightbox = "./media/how-to-create-manage-runtime/ci-failed-runtime-not-ready.png":::
First, go to the compute instance terminal and run `docker ps` to find the root cause.
Use `docker images` to check whether the image was pulled successfully. If it was, check whether the Docker container is running. If it's already running, locate this runtime and restart it; this attempts to restart both the runtime and the compute instance.
#### Run failed due to "No module named XXX"
This type of error is usually caused by the runtime lacking required packages. If you're using the default environment, make sure the image of your runtime is the latest version; see [runtime update](#update-runtime-from-ui). If you're using a custom image with a conda environment, make sure you've installed all required packages in your conda environment; see [customize Prompt flow environment](how-to-customize-environment-runtime.md#customize-environment-with-docker-context-for-runtime).
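Before rebuilding the environment, you can quickly confirm which imports are actually missing. A minimal sketch (the package list is illustrative, not part of prompt flow):

```python
import importlib.util

def missing_packages(required):
    """Return the packages that can't be imported in this environment."""
    return [name for name in required if importlib.util.find_spec(name) is None]

# Illustrative list; replace with the modules your flow actually imports.
required = ["json", "ssl", "some_missing_package"]
missing = missing_packages(required)
print(missing)  # any names printed here need to be installed in the runtime environment
```

Run this in the same environment the runtime uses (for example, the conda environment baked into your custom image) so the check reflects what the flow will see.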
#### Request timeout issue
##### Request timeout error shown in UI
**MIR runtime request timeout error in the UI:**
:::image type="content" source="./media/how-to-create-manage-runtime/mir-runtime-request-timeout.png" alt-text="Screenshot of a MIR runtime timeout error in the studio UI. " lightbox = "./media/how-to-create-manage-runtime/mir-runtime-request-timeout.png":::
The error in the example says "UserError: Upstream request timeout".
:::image type="content" source="./media/how-to-create-manage-runtime/ci-runtime-request-timeout.png" alt-text="Screenshot of a compute instance runtime timeout error in the studio UI. " lightbox = "./media/how-to-create-manage-runtime/ci-runtime-request-timeout.png":::
Error in the example says "UserError: Invoking runtime gega-ci timeout, error message: The request was canceled due to the configured HttpClient.Timeout of 100 seconds elapsing".
#### How to identify which node consumes the most time
1. Check the runtime logs
2. Try to find the following warning log format:
   `{node_name} has been running for {duration} seconds.`
For example:
- Case 1: A Python script node runs for a long time.
:::image type="content" source="./media/how-to-create-manage-runtime/runtime-timeout-running-for-long-time.png" alt-text="Screenshot of a timeout run logs in the studio UI. " lightbox = "./media/how-to-create-manage-runtime/runtime-timeout-running-for-long-time.png":::
In this case, you can see that `PythonScriptNode` was running for a long time (almost 300 seconds). You can then check the node details to find the problem.
- Case 2: An LLM node runs for a long time.
:::image type="content" source="./media/how-to-create-manage-runtime/runtime-timeout-by-language-model-timeout.png" alt-text="Screenshot of a timeout logs caused by LLM timeout in the studio UI. " lightbox = "./media/how-to-create-manage-runtime/runtime-timeout-by-language-model-timeout.png":::
In this case, if you find the message `request canceled` in the logs, it may be due to the OpenAI API call taking too long and exceeding the runtime limit.
An OpenAI API Timeout could be caused by a network issue or a complex request that requires more processing time. For more information, see [OpenAI API Timeout](https://help.openai.com/en/articles/6897186-timeout).
You can try waiting a few seconds and retrying your request. This usually resolves any network issues.
If retrying doesn't work, check whether you're using a long-context model, such as `gpt-4-32k`, and have set a large value for `max_tokens`. If so, this is expected behavior, because your prompt may generate a very long response that takes longer than the interactive mode's upper threshold. In this situation, we recommend trying 'Bulk test', because that mode doesn't have a timeout setting.
3. If you can't find anything in the runtime logs to indicate a specific node issue
Contact the Prompt Flow team ([promptflow-eng](mailto:[email protected])) with the runtime logs. We'll try to identify the root cause.
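The warning-format scan in step 2 can be automated. A minimal sketch, assuming logs in the `{node_name} has been running for {duration} seconds.` format (the log lines below are illustrative, not real runtime output):

```python
import re

# Matches the runtime warning format: "{node_name} has been running for {duration} seconds."
PATTERN = re.compile(r"(?P<node>\S+) has been running for (?P<duration>\d+(?:\.\d+)?) seconds\.")

def slowest_node(log_text):
    """Return (node_name, max_duration) from runtime logs, or None if nothing matched."""
    durations = {}
    for match in PATTERN.finditer(log_text):
        node = match.group("node")
        duration = float(match.group("duration"))
        durations[node] = max(duration, durations.get(node, 0.0))
    if not durations:
        return None
    return max(durations.items(), key=lambda item: item[1])

# Illustrative log lines, not real runtime output.
logs = """
INFO PythonScriptNode has been running for 295 seconds.
INFO llm_node has been running for 12 seconds.
"""
print(slowest_node(logs))  # → ('PythonScriptNode', 295.0)
```

This surfaces the worst offender directly instead of scanning the logs by eye.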
### Compute instance runtime related
#### How to find the compute instance runtime log for further investigation?
Go to the compute instance terminal and run `docker logs <runtime_container_name>`.
#### User doesn't have access to this compute instance. Please check if this compute instance is assigned to you and you have access to the workspace. Additionally, verify that you are on the correct network to access this compute instance.
:::image type="content" source="./media/how-to-create-manage-runtime/ci-flow-clone-others.png" alt-text="Screenshot of a don't have access error on the flow page. " lightbox = "./media/how-to-create-manage-runtime/ci-flow-clone-others.png":::
This is because you're cloning a flow from another user that uses a compute instance as its runtime. Because a compute instance runtime is user-isolated, you need to create your own compute instance runtime, or select a managed online deployment/endpoint runtime, which can be shared with others.
## Next steps
- [Develop a standard flow](how-to-develop-a-standard-flow.md)
articles/machine-learning/prompt-flow/how-to-secure-prompt-flow.md
1 line changed: 1 addition & 0 deletions
@@ -55,6 +55,7 @@ Workspace managed virtual network is the recommended way to support network isol
:::image type="content" source="./media/how-to-secure-prompt-flow/outbound-rule-non-azure-resources.png" alt-text="Screenshot of user defined outbound rule for non Azure resource." lightbox = "./media/how-to-secure-prompt-flow/outbound-rule-non-azure-resources.png":::
4. In a workspace that enables managed VNet, you can only deploy to a managed online endpoint. You can follow [Secure your managed online endpoints with network isolation](../how-to-secure-kubernetes-inferencing-environment.md) to secure your managed online endpoint.
## Secure prompt flow using your own virtual network
- To set up Azure Machine Learning related resources as private, see [Secure workspace resources](../how-to-secure-workspace-vnet.md).
articles/machine-learning/prompt-flow/tools-reference/troubleshoot-guidance.md
74 lines changed: 74 additions & 1 deletion
@@ -58,4 +58,77 @@ Prompt flow rely on fileshare to store snapshot of flow. If fileshare have some
- If you don't have this datastore, you need to add it to your workspace.
- Create a file share named `code-391ff5ac-6576-460f-ba4d-7e03433c68b6`.
- Create a datastore named `workspaceworkingdirectory`. See [Create datastores](../../how-to-datastore.md).
- If you have a `workspaceworkingdirectory` datastore but its type is `blob` instead of `fileshare`, create a new workspace, and use a storage account that doesn't have ADLS Gen2 hierarchical namespaces enabled as the workspace default storage account. See [Create workspace](../../how-to-manage-workspace.md#create-a-workspace).
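If you need to add the missing datastore, a definition for `az ml datastore create --file workingdir.yml` might look like the following sketch. This is an assumption based on the CLI v2 `azure_file` datastore YAML schema, and the storage account name is a placeholder:

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/azureFile.schema.json
name: workspaceworkingdirectory
type: azure_file
description: File share datastore for prompt flow snapshots.
account_name: <your-storage-account>  # placeholder, use your workspace default storage account
file_share_name: code-391ff5ac-6576-460f-ba4d-7e03433c68b6
```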
## Runtime-related issues
### My runtime failed with a system error **runtime not ready** when using a custom environment
:::image type="content" source="./media/how-to-create-manage-runtime/ci-failed-runtime-not-ready.png" alt-text="Screenshot of a failed run on the runtime detail page. " lightbox = "./media/how-to-create-manage-runtime/ci-failed-runtime-not-ready.png":::
First, go to the compute instance terminal and run `docker ps` to find the root cause.
Use `docker images` to check whether the image was pulled successfully. If it was, check whether the Docker container is running. If it's already running, locate this runtime and restart it; this attempts to restart both the runtime and the compute instance.
### Run failed due to "No module named XXX"
This type of error is usually caused by the runtime lacking required packages. If you're using the default environment, make sure the image of your runtime is the latest version; see [runtime update](#update-runtime-from-ui). If you're using a custom image with a conda environment, make sure you've installed all required packages in your conda environment; see [customize Prompt flow environment](how-to-customize-environment-runtime.md#customize-environment-with-docker-context-for-runtime).
### Request timeout issue
#### Request timeout error shown in UI
**MIR runtime request timeout error in the UI:**
:::image type="content" source="./media/how-to-create-manage-runtime/mir-runtime-request-timeout.png" alt-text="Screenshot of a MIR runtime timeout error in the studio UI. " lightbox = "./media/how-to-create-manage-runtime/mir-runtime-request-timeout.png":::
The error in the example says "UserError: Upstream request timeout".
:::image type="content" source="./media/how-to-create-manage-runtime/ci-runtime-request-timeout.png" alt-text="Screenshot of a compute instance runtime timeout error in the studio UI. " lightbox = "./media/how-to-create-manage-runtime/ci-runtime-request-timeout.png":::
Error in the example says "UserError: Invoking runtime gega-ci timeout, error message: The request was canceled due to the configured HttpClient.Timeout of 100 seconds elapsing".
### How to identify which node consumes the most time
1. Check the runtime logs
2. Try to find the following warning log format:
   `{node_name} has been running for {duration} seconds.`
For example:
- Case 1: A Python script node runs for a long time.
:::image type="content" source="./media/how-to-create-manage-runtime/runtime-timeout-running-for-long-time.png" alt-text="Screenshot of a timeout run logs in the studio UI. " lightbox = "./media/how-to-create-manage-runtime/runtime-timeout-running-for-long-time.png":::
In this case, you can see that `PythonScriptNode` was running for a long time (almost 300 seconds). You can then check the node details to find the problem.
- Case 2: An LLM node runs for a long time.
:::image type="content" source="./media/how-to-create-manage-runtime/runtime-timeout-by-language-model-timeout.png" alt-text="Screenshot of a timeout logs caused by LLM timeout in the studio UI. " lightbox = "./media/how-to-create-manage-runtime/runtime-timeout-by-language-model-timeout.png":::
In this case, if you find the message `request canceled` in the logs, it may be due to the OpenAI API call taking too long and exceeding the runtime limit.
An OpenAI API Timeout could be caused by a network issue or a complex request that requires more processing time. For more information, see [OpenAI API Timeout](https://help.openai.com/en/articles/6897186-timeout).
You can try waiting a few seconds and retrying your request. This usually resolves any network issues.
If retrying doesn't work, check whether you're using a long-context model, such as `gpt-4-32k`, and have set a large value for `max_tokens`. If so, this is expected behavior, because your prompt may generate a very long response that takes longer than the interactive mode's upper threshold. In this situation, we recommend trying 'Bulk test', because that mode doesn't have a timeout setting.
3. If you can't find anything in the runtime logs to indicate a specific node issue
Contact the Prompt Flow team ([promptflow-eng](mailto:[email protected])) with the runtime logs. We'll try to identify the root cause.
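The wait-and-retry advice above can be sketched generically. This is a minimal sketch, not a prompt flow API; `call_model` is a hypothetical stand-in for your OpenAI API call:

```python
import time

def call_with_retry(fn, attempts=3, base_delay=1.0):
    """Call fn(), retrying on TimeoutError with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except TimeoutError:
            if attempt == attempts - 1:
                raise  # out of retries; surface the timeout to the caller
            time.sleep(base_delay * (2 ** attempt))

# Hypothetical model call that times out once, then succeeds.
state = {"calls": 0}
def call_model():
    state["calls"] += 1
    if state["calls"] == 1:
        raise TimeoutError("Upstream request timeout")
    return "ok"

result = call_with_retry(call_model, base_delay=0.01)
print(result)  # → ok
```

Transient network timeouts usually clear on the first retry; persistent timeouts point to the long-response case discussed above.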
### How to find the compute instance runtime log for further investigation?
Go to the compute instance terminal and run `docker logs <runtime_container_name>`.
### User doesn't have access to this compute instance. Please check if this compute instance is assigned to you and you have access to the workspace. Additionally, verify that you are on the correct network to access this compute instance.
:::image type="content" source="./media/how-to-create-manage-runtime/ci-flow-clone-others.png" alt-text="Screenshot of a don't have access error on the flow page. " lightbox = "./media/how-to-create-manage-runtime/ci-flow-clone-others.png":::
This is because you're cloning a flow from another user that uses a compute instance as its runtime. Because a compute instance runtime is user-isolated, you need to create your own compute instance runtime, or select a managed online deployment/endpoint runtime, which can be shared with others.