File: articles/machine-learning/how-to-debug-pipeline-failure.md
Lines changed: 32 additions & 42 deletions
Original file line number
Diff line number
Diff line change
@@ -13,9 +13,9 @@ ms.date: 05/23/2024
13
13
ms.custom: designer
14
14
---
15
15
16
-
# Use Designer in Azure Machine Learning studio to debug pipeline failures
16
+
# Use Azure Machine Learning studio to debug pipeline failures
17
17
18
-
After you submit a pipeline job, you can select a link to the job in your workspace in Azure Machine Learning studio. The link opens the pipeline job detail page, where you can check results and debug your pipeline job. This article explains how to use the pipeline job detail page to debug machine learning pipeline failures.
18
+
After you submit a pipeline job, you can select a link to the job in your workspace in Azure Machine Learning studio. The link opens the pipeline job detail page, where you can check results and debug failed pipeline jobs. This article explains how to use the pipeline job detail page and pipeline comparison (preview) to debug machine learning pipeline failures.
19
19
20
20
> [!IMPORTANT]
21
21
> Items marked (preview) in this article are currently in public preview. The preview version is provided without a service level agreement, and it's not recommended for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see [Supplemental Terms of Use for Microsoft Azure Previews](https://azure.microsoft.com/support/legal/preview-supplemental-terms/).
@@ -46,11 +46,12 @@ If your pipeline fails or gets stuck on a node, first view the logs.
46
46
47
47
:::image type="content" source="./media/how-to-debug-pipeline-failure/log-detail.png" alt-text="Screenshot of the user_logs in the node information pane." lightbox= "./media/how-to-debug-pipeline-failure/log-detail.png":::
48
48
49
-
- The *user_logs* folder contains information about user code generated logs. This folder is open by default, and the *std_log.txt* log is selected. The **std_log.txt** is where your code's logs (for example, print statements) show up.
49
+
- The *user_logs* folder contains information about user code generated logs. This folder is open by default, and the *std_log.txt* log is selected. Your code's logs, such as print statements, appear in the *std_log.txt*.
50
50
51
51
- The *system_logs* folder contains logs generated by Azure Machine Learning. To learn more, see [View and download diagnostic logs](how-to-log-view-metrics.md#view-and-download-diagnostic-logs).
52
52
53
-
If you don't see those folders, the compute run time update might not be released to the compute cluster yet, and you can look at *70_driver_log.txt* in the *azureml-logs* folder first.
53
+
> [!NOTE]
54
+
> If you don't see those folders, the compute run time update might not be released to the compute cluster yet, and you can look at *70_driver_log.txt* in the *azureml-logs* folder first.
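
As a minimal sketch of what ends up in *std_log.txt*, the following hypothetical Python component script writes progress messages with both `print` and the standard `logging` module. The function names (`build_logger`, `load_rows`) and the messages are illustrative assumptions, not part of the article, and the sketch assumes that anything the script writes to standard output is captured in the same log as print statements.

```python
import logging
import sys


def build_logger(name: str = "train") -> logging.Logger:
    """Return a logger that writes to stdout, next to plain print output."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid attaching duplicate handlers on repeated calls
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger


def load_rows(count: int) -> list[int]:
    """Hypothetical data-loading step that reports what it did."""
    logger = build_logger()
    # flush=True pushes the line out immediately, so it is not lost if the job crashes
    print(f"loading {count} rows", flush=True)
    rows = list(range(count))
    logger.info("loaded %d rows", len(rows))
    return rows


if __name__ == "__main__":
    load_rows(3)
```

Writing a short message before and after each expensive step makes it easier to tell from the log which step a failed or stuck node reached.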
54
55
55
56
## Compare pipeline jobs (preview)
56
57
@@ -62,11 +63,11 @@ To enable this feature in Azure Machine Learning studio, select the megaphone ic
62
63
63
64
:::image type="content" source="./media/how-to-debug-pipeline-failure/enable-preview.png" alt-text="Screenshot of the preview feature toggled on." lightbox= "./media/how-to-debug-pipeline-failure/enable-preview.png":::
64
65
65
-
### Debug a failed pipeline job by comparing it to a completed job
66
+
### Debug a failed pipeline job by comparing it to a successful job
66
67
67
68
During iterative model development, you might clone and modify a successful baseline pipeline by changing a parameter, dataset, compute resource, or other setting. If the new pipeline fails, you can use pipeline comparison to help figure out the failure by identifying the changes from the parent pipeline.
68
69
69
-
For example, if you get an error message that your new pipeline failed due to an out-of-memory issue, you can use pipeline comparison to see what changed from a completed parent pipeline.
70
+
For example, if you get an error message that your new pipeline failed due to an out-of-memory issue, you can use pipeline comparison to see what changes from the parent pipeline might cause memory issues.
70
71
71
72
#### Compare a pipeline with its parent
72
73
@@ -79,64 +80,53 @@ For example, if you get an error message that your new pipeline failed due to an
79
80
80
81
:::image type="content" source="./media/how-to-debug-pipeline-failure/comparison-list-detail.png" alt-text="Screenshot showing the comparison list with a parent and child pipeline added." lightbox= "./media/how-to-debug-pipeline-failure/comparison-list.png":::
81
82
82
-
Once you add both pipelines to the comparison list, select **Compare detail** or **Compare graph**.
83
+
Once you add both pipelines to the comparison list, you can select **Compare details** or **Compare graph**.
83
84
84
85
#### Compare graph
85
86
86
-
**Compare graph** shows the topology changes between pipelines **A** and **B**. Nodes specific to pipeline A are highlighted in red and marked with **A**, and nodes specific to pipeline B are highlighted in green and marked with **B**. A description of changes is shown at the tops of the nodes.
87
+
**Compare graph** shows the graph topology changes between pipelines **A** and **B**. On the canvas, nodes specific to pipeline A are marked **A** and highlighted in red, and nodes specific to pipeline B are marked **B** and highlighted in green. A description of changes appears at the tops of nodes that have differences.
87
88
88
-
Select a node to open the **Component information** pane, where depending on the node selected you can see **Dataset properties** or **Component properties** like **parameters**, **runSettings**, and **outputSettings**.
89
+
You can select any node to open a **Component information** pane, where you can see **Dataset properties** or **Component properties** like **parameters**, **runSettings**, and **outputSettings**. You can choose to **Show only differences** and to **See differences inline**.
89
90
90
91
:::image type="content" source="./media/how-to-debug-pipeline-failure/parameter-changed.png" alt-text="Screenshot showing the parameter changed and the component information tab." lightbox= "./media/how-to-debug-pipeline-failure/parameter-changed.png":::
91
92
92
-
#### Compare pipeline metadata and properties
93
+
In this view, you can select **Show compare details** at upper right to open the pipeline **Comparison overview**, which shows the same information as the **Details comparison** page.
93
94
94
-
If you investigate the dataset difference and find that data or topology doesn't seem to be the root cause of failure, you can also check the pipeline details like pipeline parameter, output or run settings.
95
+
#### Compare details
95
96
96
-
**Compare graph** is used to compare pipeline topology, **Compare detail** is used to compare pipeline properties link meta info or settings.
97
+
To see overall pipeline and job metadata, properties, and differences, select **Compare details** in the compare list. The **Details comparison** page shows **Pipeline properties** and **Job properties** for both pipeline jobs.
97
98
98
-
To access the detail comparison, go to the comparison list, select **Compare details** or select **Show compare details** on the pipeline comparison page.
99
+
- Pipeline properties include pipeline parameters, compute settings, and output settings.
100
+
- Run properties include run status, submit time and duration, and other run settings.
99
101
100
-
You'll see *Pipeline properties* and *Run properties*.
101
102
102
-
- Pipeline properties include pipeline parameters, run and output setting, etc.
103
-
- Run properties include job status, submit time and duration, etc.
104
-
105
-
The following screenshot shows an example of using the detail comparison, where the default compute setting might have been the reason for failure.
106
-
107
-
:::image type="content" source="./media/how-to-debug-pipeline-failure/compute.png" alt-text="Screenshot showing the comparison overview of the default compute." lightbox= "./media/how-to-debug-pipeline-failure/compute.png":::
108
-
109
-
To quickly check the topology comparison, select the pipeline name and select **Compare graph**.
103
+
You can choose to **Show only differences** and **See differences inline**, or select **Compare graph** at upper right to open the graph topology comparison.
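
To illustrate the idea behind **Show only differences**, here is a rough sketch that compares two settings dictionaries and reports only the entries that differ. This is not how the studio implements comparison and it doesn't call any Azure Machine Learning API. The helper names (`flatten`, `diff_settings`) and the compute names in the example are hypothetical, while `runSettings`, `parameters`, and `defaultCompute` are borrowed from the property names that the comparison view shows.

```python
from typing import Any

_MISSING = "<missing>"  # placeholder for a key present in only one pipeline


def flatten(settings: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into dotted keys, e.g. {"a": {"b": 1}} -> {"a.b": 1}."""
    flat: dict[str, Any] = {}
    for key, value in settings.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def diff_settings(a: dict[str, Any], b: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Return {dotted_key: (value_in_a, value_in_b)} for entries that differ."""
    flat_a, flat_b = flatten(a), flatten(b)
    changed: dict[str, tuple[Any, Any]] = {}
    for path in sorted(flat_a.keys() | flat_b.keys()):
        left = flat_a.get(path, _MISSING)
        right = flat_b.get(path, _MISSING)
        if left != right:
            changed[path] = (left, right)
    return changed


# Hypothetical settings for a successful parent (A) and a failed child (B).
pipeline_a = {"runSettings": {"defaultCompute": "cpu-large"}, "parameters": {"epochs": 10}}
pipeline_b = {"runSettings": {"defaultCompute": "cpu-small"}, "parameters": {"epochs": 10}}

for path, (left, right) in diff_settings(pipeline_a, pipeline_b).items():
    print(f"{path}: A={left!r} B={right!r}")
```

In this made-up example only the compute setting differs, which is the kind of change worth checking first when the child pipeline fails with an out-of-memory error.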
110
104
111
105
:::image type="content" source="./media/how-to-debug-pipeline-failure/compare-graph.png" alt-text="Screenshot of detail comparison with compare graph highlighted." lightbox= "./media/how-to-debug-pipeline-failure/compare-graph.png":::
112
106
113
-
### Debug a failed node in a pipeline by comparing to a similar completed node
107
+
The following screenshot shows an example of using the detail comparison where the **defaultCompute** setting might be the reason for failure.
114
108
115
-
If you only updated node properties and changed nothing in the pipeline, then you can debug the node by comparing it with the jobs that are submitted from the same component.
109
+
:::image type="content" source="./media/how-to-debug-pipeline-failure/compute-detail.png" alt-text="Screenshot showing the comparison overview of the default compute." lightbox= "./media/how-to-debug-pipeline-failure/compute.png":::
110
+
### Debug a failed node in a pipeline by comparing to a similar completed node
116
111
117
-
To find the job to compare with
112
+
If you only updated node properties, you can debug the node by comparing it with other jobs that used the same node.
118
113
119
-
1. Find a successful job to compare with by viewing all runs submitted from the same component.
120
-
1. Right select the failed node and select *View Jobs*. This gives you a list of all the jobs.
121
-
122
-
:::image type="content" source="./media/how-to-debug-pipeline-failure/view-jobs.png" alt-text="Screenshot that shows a failed node with view jobs highlighted." lightbox= "./media/how-to-debug-pipeline-failure/view-jobs.png":::
114
+
1. Right select a failed node and select **View jobs** to get a list of jobs.
123
115
124
-
1. Choose a completed job as a comparison target.
125
-
1. After you found a failed and completed job to compare with, add the two jobs to the comparison candidate list.
126
-
1. For the failed node, right select and select *Add to compare*.
127
-
1. For the completed job, go to its parent pipeline and located the completed job. Then select *Add to compare*.
128
-
1. Once the two jobs are in the comparison list, select **Compare detail** to show the differences.
116
+
:::image type="content" source="./media/how-to-debug-pipeline-failure/view-jobs-detail.png" alt-text="Screenshot that shows a failed node with view jobs highlighted." lightbox= "./media/how-to-debug-pipeline-failure/view-jobs.png":::
129
117
130
-
### Share the comparison results
118
+
1. Choose a completed job as a comparison target and open it.
119
+
1. On both job pages, select **Add to compare** on the top menu bar to add both jobs to the **Compare** list.
120
+
1. Once the two jobs are in the comparison list, select **Compare details** to show the differences.
131
121
132
-
To share your comparison results select **Share** and copying the link. For example, you might find out that the dataset difference might of lead to the failure but you aren't a dataset specialist, you can share the comparison result with a data engineer on your team.
122
+
### Share comparison results
133
123
134
-
:::image type="content" source="./media/how-to-debug-pipeline-failure/share.png" alt-text="Screenshot showing the share button and the link you should copy." lightbox= "./media/how-to-debug-pipeline-failure/share.png":::
124
+
To share the comparison results with your teammates or other stakeholders, select **Share** on the top menu bar. You can choose to **Copy shareable link to graph** or **Copy pipeline job ID**.
135
125
136
-
## Next steps
126
+
:::image type="content" source="./media/how-to-debug-pipeline-failure/share-detail.png" alt-text="Screenshot showing the share button and the link you should copy." lightbox= "./media/how-to-debug-pipeline-failure/share.png":::
137
127
138
-
In this article, you learned how to debug pipeline failures. To learn more about how you can use the pipeline, see the following articles:
128
+
## Related content
139
129
140
-
- [How to build pipeline using python sdk v2](./how-to-create-component-pipeline-python.md)
141
-
- [How to build pipeline using python CLI v2](./how-to-create-component-pipelines-cli.md)
142
-
- [What is machine learning component](./concept-component.md)