Commit b3b7261

debug draft
1 parent c526cbb commit b3b7261

13 files changed, +43 -55 lines changed

articles/machine-learning/how-to-debug-pipeline-failure.md

Lines changed: 43 additions & 55 deletions
@@ -1,107 +1,95 @@
11
---
2-
title: 'How to use studio UI to debug pipeline failure'
2+
title: Use Azure Machine Learning studio to debug pipeline failures
33
titleSuffix: Azure Machine Learning
4-
description: Learn how to debug compare pipeline failure with pipeline UI in studio.
4+
description: Learn how to debug pipeline failures and compare pipelines by using the Azure Machine Learning studio UI.
55
ms.reviewer: lagayhar
66
author: likebupt
77
ms.author: keli19
88
services: machine-learning
99
ms.service: machine-learning
1010
ms.subservice: core
1111
ms.topic: how-to
12-
ms.date: 05/27/2023
12+
ms.date: 05/23/2024
1313
ms.custom: designer
1414
---
1515

16-
# How to use pipeline UI to debug Azure Machine Learning pipeline failures
16+
# Use Designer in Azure Machine Learning studio to debug pipeline failures
1717

18-
After submitting a pipeline, you'll see a link to the pipeline job in your Azure Machine Learning workspace. The link lands you in the pipeline job page in Azure Machine Learning studio, in which you can check result and debug your pipeline job.
19-
20-
This article introduces how to use the pipeline job page to debug machine learning pipeline failures.
18+
After you submit a pipeline job, you can select a link to the job in your workspace in Azure Machine Learning studio. The link opens the pipeline job detail page, where you can check results and debug your pipeline job. This article explains how to use the pipeline job detail page to debug machine learning pipeline failures.
2119

2220
> [!IMPORTANT]
23-
> Items marked (preview) in this article are currently in public preview.
24-
> The preview version is provided without a service level agreement, and it's not recommended for production workloads. Certain features might not be supported or might have constrained capabilities.
25-
> For more information, see [Supplemental Terms of Use for Microsoft Azure Previews](https://azure.microsoft.com/support/legal/preview-supplemental-terms/).
26-
21+
> Items marked (preview) in this article are currently in public preview. The preview version is provided without a service level agreement, and it's not recommended for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see [Supplemental Terms of Use for Microsoft Azure Previews](https://azure.microsoft.com/support/legal/preview-supplemental-terms/).
2722
28-
## Using outline to quickly find a node
23+
## Use outline to quickly find a node
2924

30-
In pipeline job detail page, there's an outline left to the canvas, which shows the overall structure of your pipeline job. Hovering on any row, you can select the "Locate" button to locate that node in the canvas.
25+
On the pipeline job detail page, the **Outline** pane on the left shows the overall structure of your pipeline job. Hover on any row and select the **Locate in canvas** icon to highlight that node on the canvas and open an information pane for the node on the right.
3126

32-
:::image type="content" source="./media/how-to-debug-pipeline-failure/outline.png" alt-text="Screenshot showing outline and locate in the canvas." lightbox= "./media/how-to-debug-pipeline-failure/outline.png":::
27+
:::image type="content" source="./media/how-to-debug-pipeline-failure/outline-detail.png" alt-text="Screenshot showing outline and locate in the canvas." lightbox= "./media/how-to-debug-pipeline-failure/outline.png":::
3328

34-
You can filter failed or completed nodes, and filter by only components or dataset for further search. The left pane shows the matched nodes with more information including status, duration, and created time.
29+
In the **Outline** pane, you can select the **Filter** icon to quickly filter the view to **Completed nodes only**, **Component only**, or **Dataset only**. You can also filter the list by entering node names or component names in the Search box, or by selecting **Add filter** and choosing from a list of filters.
3530

36-
:::image type="content" source="./media/how-to-debug-pipeline-failure/quick-filter.png" alt-text="Screenshot showing the quick filter by in outline > search." lightbox= "./media/how-to-debug-pipeline-failure/quick-filter.png":::
31+
:::image type="content" source="./media/how-to-debug-pipeline-failure/quick-filter-detail.png" alt-text="Screenshot showing quick filter and search in the Outline pane." lightbox= "./media/how-to-debug-pipeline-failure/quick-filter.png":::
3732

38-
You can also sort the filtered nodes.
33+
The left pane shows the matched nodes with more information including status, duration, and run time and date. You can sort the filtered nodes.
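
The filter-then-sort behavior described here can be pictured with a small local sketch. This is not the studio's implementation or any Azure Machine Learning API; the `Node` class, its fields, and `filter_nodes` are hypothetical names chosen only to mirror the filters (status, component or dataset, name search) and the sort described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    # Hypothetical stand-in for one row in the Outline pane.
    name: str
    kind: str          # "component" or "dataset"
    status: str        # for example "Completed" or "Failed"
    duration_s: float  # run duration in seconds


def filter_nodes(nodes: list[Node], status: Optional[str] = None,
                 kind: Optional[str] = None, name_contains: str = "") -> list[Node]:
    """Keep the nodes that match every filter that is set."""
    return [
        n for n in nodes
        if (status is None or n.status == status)
        and (kind is None or n.kind == kind)
        and name_contains.lower() in n.name.lower()
    ]


nodes = [
    Node("prep_data", "component", "Completed", 42.0),
    Node("train_model", "component", "Failed", 310.5),
    Node("raw_dataset", "dataset", "Completed", 0.0),
]

# Completed components only, longest-running first.
completed = sorted(filter_nodes(nodes, status="Completed", kind="component"),
                   key=lambda n: n.duration_s, reverse=True)
print([n.name for n in completed])
```

Applying the filters before sorting keeps the sort working on the smallest possible list, which matches the order of operations in the pane: filter first, then sort the matches.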
3934

40-
:::image type="content" source="./media/how-to-debug-pipeline-failure/sort.png" alt-text="Screenshot of sorting search result in outline > search." lightbox= "./media/how-to-debug-pipeline-failure/sort.png":::
35+
:::image type="content" source="./media/how-to-debug-pipeline-failure/sort-detail.png" alt-text="Screenshot of sorting search results in the Outline pane." lightbox= "./media/how-to-debug-pipeline-failure/sort.png":::
4136

42-
## Check logs and outputs of component
37+
## Check component logs and outputs
4338

4439
If your pipeline fails or gets stuck on a node, first view the logs.
4540

46-
1. You can select the specific node and open the right pane.
47-
48-
1. Select **Outputs+logs** tab and you can explore all the outputs and logs of this node.
41+
![Animated screenshot showing how to check node logs.](media/how-to-debug-pipeline-failure/node-logs.gif)
4942

50-
The **user_logs folder** contains information about user code generated logs. This folder is open by default, and the **std_log.txt** log is selected. The **std_log.txt** is where your code's logs (for example, print statements) show up.
43+
1. Select the node to open the information pane on the right.
5144

52-
The **system_logs folder** contains logs generated by Azure Machine Learning. Learn more about [View and download diagnostic logs](how-to-log-view-metrics.md#view-and-download-diagnostic-logs).
45+
1. Select **Outputs + logs** tab to view all the outputs and logs of this node.
5346

54-
![Screenshot of how to check node logs.](media/how-to-debug-pipeline-failure/node-logs.gif)
47+
:::image type="content" source="./media/how-to-debug-pipeline-failure/log-detail.png" alt-text="Screenshot of the user_logs in the node information pane." lightbox= "./media/how-to-debug-pipeline-failure/log-detail.png":::
48+
49+
- The *user_logs* folder contains information about user code generated logs. This folder is open by default, and the *std_log.txt* log is selected. The **std_log.txt** is where your code's logs (for example, print statements) show up.
5550

56-
If you don't see those folders, this is due to the compute run time update isn't released to the compute cluster yet, and you can look at **70_driver_log.txt** under **azureml-logs** folder first.
51+
- The *system_logs* folder contains logs generated by Azure Machine Learning. To learn more, see [View and download diagnostic logs](how-to-log-view-metrics.md#view-and-download-diagnostic-logs).
5752

58-
## Compare different pipelines to debug failure or other unexpected issues (preview)
53+
If you don't see those folders, the compute run time update might not be released to the compute cluster yet, and you can look at *70_driver_log.txt* in the *azureml-logs* folder first.
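
If you download a log such as *std_log.txt* to your machine, a short script can surface the lines worth reading first. The following is a minimal sketch that assumes the log is plain text already saved locally; the search pattern and the `find_error_lines` helper are illustrative assumptions, not part of Azure Machine Learning.

```python
import re

# Hypothetical pattern: adjust it to match whatever your own code logs.
ERROR_PATTERN = re.compile(r"Traceback \(most recent call last\)|Error|Exception")


def find_error_lines(log_text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs for lines that look like errors."""
    matches = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if ERROR_PATTERN.search(line):
            matches.append((number, line.rstrip()))
    return matches


if __name__ == "__main__":
    # Sample text standing in for a locally downloaded std_log.txt.
    sample = (
        "Loading data\n"
        "Traceback (most recent call last):\n"
        "MemoryError: out of memory\n"
    )
    for number, line in find_error_lines(sample):
        print(f"{number}: {line}")
```

To scan a real file, read it with `pathlib.Path("std_log.txt").read_text()` and pass the result to `find_error_lines`.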
5954

60-
Pipeline comparison identifies the differences (including topology, component properties, and job properties) between multiple jobs. For example you can compare a successful pipeline and a failed pipeline, which helps you find what modifications make your pipeline fail.
55+
## Compare pipeline jobs (preview)
6156

62-
Two major scenarios where you can use pipeline comparison to help with debugging:
57+
You can compare different pipeline jobs to debug failure or other unexpected issues (preview). Pipeline comparison identifies the differences, such as topology, component properties, and job properties, between pipeline jobs.
6358

64-
- Debug your failed pipeline job by comparing it to a completed one.
65-
- Debug your failed node in a pipeline by comparing it to a similar completed one.
59+
For example, you can compare successful and failed pipeline jobs to find differences that might have made one pipeline job fail. You can debug a failed pipeline job by comparing it to a completed job, or debug a failed node in a pipeline by comparing it to a similar completed node.
6660

67-
To enable this feature:
61+
To enable this feature in Azure Machine Learning studio, select the megaphone icon at top right to manage preview features. In the **Managed preview feature** panel, make sure **Compare pipeline jobs to debug failures or unexpected issues** is set to **Enabled**.
6862

69-
1. Navigate to Azure Machine Learning studio UI.
70-
2. Select **Manage preview features** (megaphone icon) among the icons on the top right side of the screen.
71-
3. In **Managed preview feature** panel, toggle on **Compare pipeline jobs to debug failures or unexpected issues** feature.
63+
:::image type="content" source="./media/how-to-debug-pipeline-failure/enable-preview.png" alt-text="Screenshot of the preview feature toggled on." lightbox= "./media/how-to-debug-pipeline-failure/enable-preview.png":::
7264

73-
:::image type="content" source="./media/how-to-debug-pipeline-failure/enable-preview.png" alt-text="Screenshot of manage preview features toggled on." lightbox= "./media/how-to-debug-pipeline-failure/enable-preview.png":::
65+
### Debug a failed pipeline job by comparing it to a completed job
7466

75-
### How to debug your failed pipeline job by comparing it to a completed one
67+
During iterative model development, you might clone and modify a successful baseline pipeline by changing a parameter, dataset, compute resource, or other setting. If the new pipeline fails, you can use pipeline comparison to help figure out the failure by identifying the changes from the parent pipeline.
7668

77-
During iterative model development, you may have a baseline pipeline, and then do some modifications such as changing a parameter, dataset or compute resource, etc. If your new pipeline failed, you can use pipeline comparison to identify what has changed by comparing it to the baseline pipeline, which could help with figuring out why it failed.
69+
For example, if you get an error message that your new pipeline failed due to an out-of-memory issue, you can use pipeline comparison to see what changed from a completed parent pipeline.
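
Conceptually, comparing a failed job with its parent comes down to finding the settings whose values differ. The sketch below illustrates that idea on plain dictionaries; it isn't how the studio computes the comparison, and the setting names (`batch_size`, `compute`, and so on) are made-up examples.

```python
from typing import Any


def diff_settings(parent: dict[str, Any], child: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Map each key whose value differs to (parent_value, child_value).

    A key that is missing on one side is reported with None on that side.
    """
    changed = {}
    for key in sorted(parent.keys() | child.keys()):
        before, after = parent.get(key), child.get(key)
        if before != after:
            changed[key] = (before, after)
    return changed


# Hypothetical settings for a completed parent job and a failed clone.
parent_job = {"batch_size": 32, "learning_rate": 0.01, "compute": "cpu-cluster"}
failed_job = {"batch_size": 512, "learning_rate": 0.01, "compute": "cpu-cluster"}

for name, (before, after) in diff_settings(parent_job, failed_job).items():
    print(f"{name}: {before} -> {after}")
```

In this made-up case, the only difference is the larger `batch_size`, which would be a natural first suspect for an out-of-memory failure.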
7870

7971
#### Compare a pipeline with its parent
8072

81-
The first thing you should check when debugging is to locate the failed node and check the logs.
82-
83-
For example, you may get an error message showing that your pipeline failed due to out-of-memory. If your pipeline is cloned from a completed parent pipeline, you can use pipeline comparison to see what has changed.
84-
85-
1. Select **Show lineage**.
86-
1. Select the link under "Cloned From". This will open a new browser tab with the parent pipeline.
73+
1. On the failed pipeline job page, select **Show lineage**.
74+
1. Select the link in the **Cloned from** popup to open the parent pipeline job page in a new browser tab.
8775

88-
:::image type="content" source="./media/how-to-debug-pipeline-failure/cloned-from.png" alt-text="Screenshot showing the cloned from link, with the previous step, the lineage button highlighted." lightbox= "./media/how-to-debug-pipeline-failure/cloned-from.png":::
76+
:::image type="content" source="./media/how-to-debug-pipeline-failure/cloned-from.png" alt-text="Screenshot showing the cloned from link, with the previous step, the lineage button highlighted." lightbox= "./media/how-to-debug-pipeline-failure/cloned-from.png":::
8977

90-
1. Select **Add to compare** on the failed pipeline and the parent pipeline. This adds them in the comparison candidate list.
78+
1. On both pages, select **Add to compare** on the top menu bar to add both jobs to the **Compare** list.
9179

92-
:::image type="content" source="./media/how-to-debug-pipeline-failure/comparison-list.png" alt-text="Screenshot showing the comparison list with a parent and child pipeline added." lightbox= "./media/how-to-debug-pipeline-failure/comparison-list.png":::
80+
:::image type="content" source="./media/how-to-debug-pipeline-failure/comparison-list-detail.png" alt-text="Screenshot showing the comparison list with a parent and child pipeline added." lightbox= "./media/how-to-debug-pipeline-failure/comparison-list.png":::
9381

94-
### Compare topology
82+
Once you add both pipelines to the comparison list, select **Compare detail** or **Compare graph**.
9583

96-
Once the two pipelines are added to the comparison list, you have two options: **Compare detail** and **Compare graph**. **Compare graph** allows you to compare pipeline topology.
84+
#### Compare graph
9785

98-
**Compare graph** shows you the graph topology changes between pipeline A and B. The special nodes in pipeline A are highlighted in red and marked with "A only". The special nodes in pipeline B are in green and marked with "B only". The shared nodes are in gray. If there are differences on the shared nodes, what has been changed is shown on the top of node.
86+
**Compare graph** shows the topology changes between pipelines **A** and **B**. Nodes specific to pipeline A are highlighted in red and marked with **A**, and nodes specific to pipeline B are highlighted in green and marked with **B**. A description of changes is shown at the tops of the nodes.
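
The grouping that the graph comparison displays (nodes only in A, nodes only in B, and nodes in both) is a set comparison at heart. The following sketch shows that idea with plain Python sets; the node names and the `compare_topology` helper are hypothetical, and the studio's actual matching logic might be more involved.

```python
def compare_topology(nodes_a: set[str], nodes_b: set[str]) -> dict[str, set[str]]:
    """Split node names into those only in A, only in B, and shared by both."""
    return {
        "a_only": nodes_a - nodes_b,
        "b_only": nodes_b - nodes_a,
        "shared": nodes_a & nodes_b,
    }


# Hypothetical node names for two pipeline jobs.
pipeline_a = {"prep_data", "train_model", "score_model"}
pipeline_b = {"prep_data", "train_model", "evaluate_model"}

result = compare_topology(pipeline_a, pipeline_b)
for group, names in result.items():
    print(group, sorted(names))
```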
9987

100-
There are three categories of changes with summaries viewable in the detail page, parameter change, input source, pipeline component. When the pipeline component is changed this means that there's a topology change inside or an inner node parameter change, you can select the folder icon on the pipeline component node to dig down into the details. Other changes can be detected by viewing the colored nodes in the compare graph.
88+
Select a node to open the **Component information** pane, where depending on the node selected you can see **Dataset properties** or **Component properties** like **parameters**, **runSettings**, and **outputSettings**.
10189

102-
:::image type="content" source="./media/how-to-debug-pipeline-failure/parameter-changed.png" alt-text="Screenshot showing the parameter changed and the component information tab." lightbox= "./media/how-to-debug-pipeline-failure/parameter-changed.png":::
90+
:::image type="content" source="./media/how-to-debug-pipeline-failure/parameter-changed.png" alt-text="Screenshot showing the parameter changed and the component information tab." lightbox= "./media/how-to-debug-pipeline-failure/parameter-changed.png":::
10391

104-
### Compare pipeline meta info and properties
92+
#### Compare pipeline metadata and properties
10593

10694
If you investigate the dataset difference and find that data or topology doesn't seem to be the root cause of failure, you can also check the pipeline details like pipeline parameter, output or run settings.
10795

@@ -122,11 +110,11 @@ To quickly check the topology comparison, select the pipeline name and select **
122110

123111
:::image type="content" source="./media/how-to-debug-pipeline-failure/compare-graph.png" alt-text="Screenshot of detail comparison with compare graph highlighted." lightbox= "./media/how-to-debug-pipeline-failure/compare-graph.png":::
124112

125-
### How to debug your failed node in a pipeline by comparing to similar completed node
113+
### Debug a failed node in a pipeline by comparing to a similar completed node
126114

127115
If you only updated node properties and changed nothing in the pipeline, then you can debug the node by comparing it with the jobs that are submitted from the same component.
128116

129-
#### Find the job to compare with
117+
To find the job to compare with
130118

131119
1. Find a successful job to compare with by viewing all runs submitted from the same component.
132120
1. Right select the failed node and select *View Jobs*. This gives you a list of all the jobs.