articles/machine-learning/prompt-flow/how-to-create-manage-runtime.md
74 lines changed: 0 additions & 74 deletions
@@ -109,80 +109,6 @@ Go to runtime detail page and select update button at the top. You can change ne
> If you used a custom environment, you need to rebuild it using the latest prompt flow image first, and then update your runtime with the new custom environment.
### Common issues

#### My runtime failed with a system error **runtime not ready** when using a custom environment
:::image type="content" source="./media/how-to-create-manage-runtime/ci-failed-runtime-not-ready.png" alt-text="Screenshot of a failed run on the runtime detail page. " lightbox = "./media/how-to-create-manage-runtime/ci-failed-runtime-not-ready.png":::
First, go to the compute instance terminal and run `docker ps` to find the root cause.
Use `docker images` to check whether the image was pulled successfully. If it was, check whether the Docker container is running. If it's already running, locate this runtime and restart it; this attempts to restart both the runtime and the compute instance.
#### Run failed due to "No module named XXX"
This type of error is usually caused by the runtime lacking required packages. If you're using the default environment, make sure the image of your runtime is the latest version; see [runtime update](#update-runtime-from-ui). If you're using a custom image with a conda environment, make sure you've installed all required packages in your conda environment; see [customize Prompt flow environment](how-to-customize-environment-runtime.md#customize-environment-with-docker-context-for-runtime).
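Before rebuilding the environment, you can quickly confirm which imports are actually missing. A minimal sketch (the package list is illustrative, not part of prompt flow):

```python
import importlib.util

def missing_packages(required):
    """Return the packages that can't be imported in this environment."""
    return [name for name in required if importlib.util.find_spec(name) is None]

# Illustrative list; replace with the modules your flow actually imports.
required = ["json", "ssl", "some_missing_package"]
missing = missing_packages(required)
print(missing)  # any names printed here need to be installed in the runtime environment
```

Run this in the same environment the runtime uses (for example, the conda environment baked into your custom image) so the check reflects what the flow will see.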
#### Request timeout issue
##### Request timeout error shown in UI
**MIR runtime request timeout error in the UI:**
:::image type="content" source="./media/how-to-create-manage-runtime/mir-runtime-request-timeout.png" alt-text="Screenshot of a MIR runtime timeout error in the studio UI. " lightbox = "./media/how-to-create-manage-runtime/mir-runtime-request-timeout.png":::
The error in the example says "UserError: Upstream request timeout".
:::image type="content" source="./media/how-to-create-manage-runtime/ci-runtime-request-timeout.png" alt-text="Screenshot of a compute instance runtime timeout error in the studio UI. " lightbox = "./media/how-to-create-manage-runtime/ci-runtime-request-timeout.png":::
Error in the example says "UserError: Invoking runtime gega-ci timeout, error message: The request was canceled due to the configured HttpClient.Timeout of 100 seconds elapsing".
#### How to identify which node consumes the most time
1. Check the runtime logs
2. Try to find the following warning log format:
   `{node_name} has been running for {duration} seconds.`
For example:
- Case 1: A Python script node runs for a long time.
:::image type="content" source="./media/how-to-create-manage-runtime/runtime-timeout-running-for-long-time.png" alt-text="Screenshot of a timeout run logs in the studio UI. " lightbox = "./media/how-to-create-manage-runtime/runtime-timeout-running-for-long-time.png":::
In this case, you can see that `PythonScriptNode` was running for a long time (almost 300 seconds). You can then check the node details to find the problem.
- Case 2: An LLM node runs for a long time.
:::image type="content" source="./media/how-to-create-manage-runtime/runtime-timeout-by-language-model-timeout.png" alt-text="Screenshot of a timeout logs caused by LLM timeout in the studio UI. " lightbox = "./media/how-to-create-manage-runtime/runtime-timeout-by-language-model-timeout.png":::
In this case, if you find the message `request canceled` in the logs, it may be due to the OpenAI API call taking too long and exceeding the runtime limit.
An OpenAI API Timeout could be caused by a network issue or a complex request that requires more processing time. For more information, see [OpenAI API Timeout](https://help.openai.com/en/articles/6897186-timeout).
You can try waiting a few seconds and retrying your request. This usually resolves any network issues.
If retrying doesn't work, check whether you're using a long-context model, such as `gpt-4-32k`, and have set a large value for `max_tokens`. If so, this is expected behavior, because your prompt may generate a very long response that takes longer than the interactive mode's upper threshold. In this situation, we recommend trying 'Bulk test', because that mode doesn't have a timeout setting.
3. If you can't find anything in the runtime logs to indicate a specific node issue
Contact the Prompt Flow team ([promptflow-eng](mailto:[email protected])) with the runtime logs. We'll try to identify the root cause.
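The warning-format scan in step 2 can be automated. A minimal sketch, assuming logs in the `{node_name} has been running for {duration} seconds.` format (the log lines below are illustrative, not real runtime output):

```python
import re

# Matches the runtime warning format: "{node_name} has been running for {duration} seconds."
PATTERN = re.compile(r"(?P<node>\S+) has been running for (?P<duration>\d+(?:\.\d+)?) seconds\.")

def slowest_node(log_text):
    """Return (node_name, max_duration) from runtime logs, or None if nothing matched."""
    durations = {}
    for match in PATTERN.finditer(log_text):
        node = match.group("node")
        duration = float(match.group("duration"))
        durations[node] = max(duration, durations.get(node, 0.0))
    if not durations:
        return None
    return max(durations.items(), key=lambda item: item[1])

# Illustrative log lines, not real runtime output.
logs = """
INFO PythonScriptNode has been running for 295 seconds.
INFO llm_node has been running for 12 seconds.
"""
print(slowest_node(logs))  # → ('PythonScriptNode', 295.0)
```

This surfaces the worst offender directly instead of scanning the logs by eye.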
### Compute instance runtime related
#### How to find the compute instance runtime log for further investigation?
Go to the compute instance terminal and run `docker logs <runtime_container_name>`.
#### User doesn't have access to this compute instance. Please check if this compute instance is assigned to you and you have access to the workspace. Additionally, verify that you are on the correct network to access this compute instance.
:::image type="content" source="./media/how-to-create-manage-runtime/ci-flow-clone-others.png" alt-text="Screenshot of a don't have access error on the flow page. " lightbox = "./media/how-to-create-manage-runtime/ci-flow-clone-others.png":::
This is because you're cloning a flow from another user that uses a compute instance as its runtime. Because a compute instance runtime is user-isolated, you need to create your own compute instance runtime, or select a managed online deployment/endpoint runtime, which can be shared with others.
## Next steps
- [Develop a standard flow](how-to-develop-a-standard-flow.md)
articles/machine-learning/prompt-flow/how-to-secure-prompt-flow.md
1 line changed: 1 addition & 0 deletions
@@ -55,6 +55,7 @@ Workspace managed virtual network is the recommended way to support network isol
:::image type="content" source="./media/how-to-secure-prompt-flow/outbound-rule-non-azure-resources.png" alt-text="Screenshot of user defined outbound rule for non Azure resource." lightbox = "./media/how-to-secure-prompt-flow/outbound-rule-non-azure-resources.png":::
4. In a workspace that enables managed VNet, you can only deploy to a managed online endpoint. You can follow [Secure your managed online endpoints with network isolation](../how-to-secure-kubernetes-inferencing-environment.md) to secure your managed online endpoint.
## Secure prompt flow using your own virtual network
- To set up Azure Machine Learning related resources as private, see [Secure workspace resources](../how-to-secure-workspace-vnet.md).
articles/machine-learning/prompt-flow/tools-reference/troubleshoot-guidance.md
74 lines changed: 74 additions & 1 deletion
@@ -58,4 +58,77 @@ Prompt flow rely on fileshare to store snapshot of flow. If fileshare have some
- If you don't have this datastore, you need to add it to your workspace.
- Create a file share named `code-391ff5ac-6576-460f-ba4d-7e03433c68b6`.
- Create a datastore named `workspaceworkingdirectory`. See [Create datastores](../../how-to-datastore.md).
- If you have a `workspaceworkingdirectory` datastore but its type is `blob` instead of `fileshare`, create a new workspace, and use a storage account that doesn't have ADLS Gen2 hierarchical namespaces enabled as the workspace default storage account. See [Create workspace](../../how-to-manage-workspace.md#create-a-workspace).
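If you need to add the missing datastore, a definition for `az ml datastore create --file workingdir.yml` might look like the following sketch. This is an assumption based on the CLI v2 `azure_file` datastore YAML schema, and the storage account name is a placeholder:

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/azureFile.schema.json
name: workspaceworkingdirectory
type: azure_file
description: File share datastore for prompt flow snapshots.
account_name: <your-storage-account>  # placeholder, use your workspace default storage account
file_share_name: code-391ff5ac-6576-460f-ba4d-7e03433c68b6
```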
## Runtime-related issues
### My runtime failed with a system error **runtime not ready** when using a custom environment
:::image type="content" source="./media/how-to-create-manage-runtime/ci-failed-runtime-not-ready.png" alt-text="Screenshot of a failed run on the runtime detail page. " lightbox = "./media/how-to-create-manage-runtime/ci-failed-runtime-not-ready.png":::
First, go to the compute instance terminal and run `docker ps` to find the root cause.
Use `docker images` to check whether the image was pulled successfully. If it was, check whether the Docker container is running. If it's already running, locate this runtime and restart it; this attempts to restart both the runtime and the compute instance.
### Run failed due to "No module named XXX"
This type of error is usually caused by the runtime lacking required packages. If you're using the default environment, make sure the image of your runtime is the latest version; see [runtime update](#update-runtime-from-ui). If you're using a custom image with a conda environment, make sure you've installed all required packages in your conda environment; see [customize Prompt flow environment](how-to-customize-environment-runtime.md#customize-environment-with-docker-context-for-runtime).
### Request timeout issue
#### Request timeout error shown in UI
**MIR runtime request timeout error in the UI:**
:::image type="content" source="./media/how-to-create-manage-runtime/mir-runtime-request-timeout.png" alt-text="Screenshot of a MIR runtime timeout error in the studio UI. " lightbox = "./media/how-to-create-manage-runtime/mir-runtime-request-timeout.png":::
The error in the example says "UserError: Upstream request timeout".
:::image type="content" source="./media/how-to-create-manage-runtime/ci-runtime-request-timeout.png" alt-text="Screenshot of a compute instance runtime timeout error in the studio UI. " lightbox = "./media/how-to-create-manage-runtime/ci-runtime-request-timeout.png":::
Error in the example says "UserError: Invoking runtime gega-ci timeout, error message: The request was canceled due to the configured HttpClient.Timeout of 100 seconds elapsing".
### How to identify which node consumes the most time
1. Check the runtime logs
2. Try to find the following warning log format:
   `{node_name} has been running for {duration} seconds.`
For example:
- Case 1: A Python script node runs for a long time.
:::image type="content" source="./media/how-to-create-manage-runtime/runtime-timeout-running-for-long-time.png" alt-text="Screenshot of a timeout run logs in the studio UI. " lightbox = "./media/how-to-create-manage-runtime/runtime-timeout-running-for-long-time.png":::
In this case, you can see that `PythonScriptNode` was running for a long time (almost 300 seconds). You can then check the node details to find the problem.
- Case 2: An LLM node runs for a long time.
:::image type="content" source="./media/how-to-create-manage-runtime/runtime-timeout-by-language-model-timeout.png" alt-text="Screenshot of a timeout logs caused by LLM timeout in the studio UI. " lightbox = "./media/how-to-create-manage-runtime/runtime-timeout-by-language-model-timeout.png":::
In this case, if you find the message `request canceled` in the logs, it may be due to the OpenAI API call taking too long and exceeding the runtime limit.
An OpenAI API Timeout could be caused by a network issue or a complex request that requires more processing time. For more information, see [OpenAI API Timeout](https://help.openai.com/en/articles/6897186-timeout).
You can try waiting a few seconds and retrying your request. This usually resolves any network issues.
If retrying doesn't work, check whether you're using a long-context model, such as `gpt-4-32k`, and have set a large value for `max_tokens`. If so, this is expected behavior, because your prompt may generate a very long response that takes longer than the interactive mode's upper threshold. In this situation, we recommend trying 'Bulk test', because that mode doesn't have a timeout setting.
3. If you can't find anything in the runtime logs to indicate a specific node issue
Contact the Prompt Flow team ([promptflow-eng](mailto:[email protected])) with the runtime logs. We'll try to identify the root cause.
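The wait-and-retry advice above can be sketched generically. This is a minimal sketch, not a prompt flow API; `call_model` is a hypothetical stand-in for your OpenAI API call:

```python
import time

def call_with_retry(fn, attempts=3, base_delay=1.0):
    """Call fn(), retrying on TimeoutError with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except TimeoutError:
            if attempt == attempts - 1:
                raise  # out of retries; surface the timeout to the caller
            time.sleep(base_delay * (2 ** attempt))

# Hypothetical model call that times out once, then succeeds.
state = {"calls": 0}
def call_model():
    state["calls"] += 1
    if state["calls"] == 1:
        raise TimeoutError("Upstream request timeout")
    return "ok"

result = call_with_retry(call_model, base_delay=0.01)
print(result)  # → ok
```

Transient network timeouts usually clear on the first retry; persistent timeouts point to the long-response case discussed above.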
### How to find the compute instance runtime log for further investigation?
Go to the compute instance terminal and run `docker logs <runtime_container_name>`.
### User doesn't have access to this compute instance. Please check if this compute instance is assigned to you and you have access to the workspace. Additionally, verify that you are on the correct network to access this compute instance.
:::image type="content" source="./media/how-to-create-manage-runtime/ci-flow-clone-others.png" alt-text="Screenshot of a don't have access error on the flow page. " lightbox = "./media/how-to-create-manage-runtime/ci-flow-clone-others.png":::
This is because you're cloning a flow from another user that uses a compute instance as its runtime. Because a compute instance runtime is user-isolated, you need to create your own compute instance runtime, or select a managed online deployment/endpoint runtime, which can be shared with others.