Commit f079426

Learn Editor: Update how-to-interactive-jobs.md
1 parent 60edfcf commit f079426

1 file changed

+42
-12
lines changed

articles/machine-learning/how-to-interactive-jobs.md

Lines changed: 42 additions & 12 deletions
Original file line number | Diff line number | Diff line change
@@ -14,20 +14,14 @@ ms.date: 03/15/2022
1414
#Customer intent: I'm a data scientist with ML knowledge in the machine learning space, looking to build ML models using data in Azure Machine Learning with full control of the model training including debugging and monitoring of live jobs.
1515
---
1616

17-
# Debug jobs and monitor training progress (preview)
18-
19-
> [!IMPORTANT]
20-
> Items marked (preview) in this article are currently in public preview.
21-
> The preview version is provided without a service level agreement, and it's not recommended for production workloads. Certain features might not be supported or might have constrained capabilities.
22-
> For more information, see [Supplemental Terms of Use for Microsoft Azure Previews](https://azure.microsoft.com/support/legal/preview-supplemental-terms/).
17+
# Debug jobs and monitor training progress
2318

2419
Machine learning model training is usually an iterative process and requires significant experimentation. With the Azure Machine Learning interactive job experience, data scientists can use the Azure Machine Learning Python SDK v2, the Azure Machine Learning CLI v2, or Azure Machine Learning studio to access the container where their job is running. Once they have access to the job container, users can iterate on training scripts, monitor training progress, or debug the job remotely, as they typically do on their local machines. You can interact with jobs through training applications including **JupyterLab, TensorBoard, VS Code**, or by connecting to the job container directly via **SSH**.
2520

2621
Interactive training is supported on **Azure Machine Learning Compute Clusters** and **Azure Arc-enabled Kubernetes Cluster**.
2722

2823
## Prerequisites
2924
- Review [getting started with training on Azure Machine Learning](./how-to-train-model.md).
30-
- To use this feature in Azure Machine Learning studio, enable the "Debug & monitor your training jobs" flight via the [preview panel](./how-to-enable-preview-features.md#how-do-i-enable-preview-features).
3125
- To use **VS Code**, [follow this guide](how-to-setup-vs-code.md) to set up the Azure Machine Learning extension.
3226
- Make sure your job environment has the `openssh-server` and `ipykernel ~=6.0` packages installed (all Azure Machine Learning curated training environments have these packages installed by default).
3327
- Interactive applications can't be enabled on distributed training runs where the distribution type is anything other than PyTorch, TensorFlow, or MPI. Custom distributed training setup (configuring multi-node training without using the above distribution frameworks) is not currently supported.
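
The package prerequisite above can be checked from inside a custom environment before submitting a job. The following is a minimal sketch, not part of the article: the helper names (`is_compatible_with_6_0`, `find_sshd`, `installed_ipykernel_version`) are hypothetical, the `~=6.0` check is simplified to "major version is 6", and the fallback `sshd` locations are assumptions about a typical Linux image.

```python
"""Sketch: check the interactive-job package prerequisites in an environment."""
import os
import re
import shutil
from importlib import metadata
from typing import Optional


def is_compatible_with_6_0(version: str) -> bool:
    """Approximate the `~=6.0` specifier (>=6.0, ==6.*) by checking the major version.

    Simplification (assumption): pre-release suffixes are not treated specially.
    """
    match = re.match(r"(\d+)", version)
    if match is None:
        return False
    return int(match.group(1)) == 6


def installed_ipykernel_version() -> Optional[str]:
    """Return the installed ipykernel version, or None if it is not installed."""
    try:
        return metadata.version("ipykernel")
    except metadata.PackageNotFoundError:
        return None


def find_sshd() -> Optional[str]:
    """Look for the sshd binary that the openssh-server package provides.

    sshd often lives in an sbin directory that is not on PATH for
    non-root users, so a few common locations (assumed) are checked too.
    """
    found = shutil.which("sshd")
    if found:
        return found
    for candidate in ("/usr/sbin/sshd", "/usr/local/sbin/sshd"):
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    version = installed_ipykernel_version()
    kernel_ok = version is not None and is_compatible_with_6_0(version)
    print(f"ipykernel: {version or 'not installed'} (compatible: {kernel_ok})")
    print(f"sshd: {find_sshd() or 'not found'}")
```

Running the script in the image used for the job environment gives a quick signal before the job is submitted; a curated training environment should not need this check, since the article states those packages are installed by default.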
@@ -70,8 +64,6 @@ By specifying interactive applications at job creation, you can connect directly
7064

7165
6. Review and create the job.
7266

73-
If you don't see the above options, make sure you have enabled the "Debug & monitor your training jobs" flight via the [preview panel](./how-to-enable-preview-features.md#how-do-i-enable-preview-features).
74-
7567
# [Python SDK](#tab/python)
7668
1. Define the interactive services you want to use for your job. Make sure to replace `your compute name` with your own value. If you want to use your own custom environment, follow the examples in [this tutorial](how-to-manage-environments-v2.md) to create a custom environment.
7769

@@ -170,11 +162,50 @@ To interact with your running job, click the button **Debug and monitor** on the
170162
:::image type="content" source="media/interactive-jobs/debug-and-monitor.png" alt-text="Screenshot of interactive jobs debug and monitor panel location.":::
171163

172164

165-203+
(39 blank lines added)
173204
Clicking the applications in the panel opens a new tab for the applications. You can access the applications only when they are in **Running** status and only the **job owner** is authorized to access the applications. If you're training on multiple nodes, you can pick the specific node you would like to interact with.
174205

175206
:::image type="content" source="media/interactive-jobs/interactive-jobs-application-list.png" alt-text="Screenshot of interactive jobs right panel information. Information content will vary depending on the user's data.":::
176207

177-
It might take a few minutes to start the job and the training applications specified during job creation. If you don't see the above options, make sure you have enabled the "Debug & monitor your training jobs" flight via the [preview panel](./how-to-enable-preview-features.md#how-do-i-enable-preview-features).
208+
It might take a few minutes to start the job and the training applications specified during job creation.
178209

179210
# [Python SDK](#tab/python)
180211
- Once the job is submitted, you can use `ml_client.jobs.show_services("<job name>", <compute node index>)` to view the interactive service endpoints.
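
As an illustration of the `show_services` call above, the sketch below collects the endpoints for one node into a plain dictionary. It rests on assumptions that are not confirmed by this diff: that `show_services` returns a mapping from service name to an object exposing an `endpoint` attribute (or `None` when nothing is available), and the helper name `service_endpoints` is hypothetical.

```python
from typing import Any, Dict


def service_endpoints(ml_client: Any, job_name: str, node_index: int = 0) -> Dict[str, str]:
    """Return {service name: endpoint} for one compute node of a job.

    Assumption: ml_client.jobs.show_services(...) returns a dict-like
    mapping of service name to an object with an `endpoint` attribute,
    or None. Services without an endpoint yet are skipped.
    """
    services = ml_client.jobs.show_services(job_name, node_index) or {}
    endpoints: Dict[str, str] = {}
    for name, service in services.items():
        endpoint = getattr(service, "endpoint", None)
        if endpoint:
            endpoints[name] = endpoint
    return endpoints


# Possible usage with a real client (placeholders must be replaced):
# from azure.ai.ml import MLClient
# from azure.identity import DefaultAzureCredential
# ml_client = MLClient(DefaultAzureCredential(), "<subscription id>",
#                      "<resource group>", "<workspace name>")
# for name, url in service_endpoints(ml_client, "<job name>", 0).items():
#     print(name, url)
```

Because applications are reachable only once they are in **Running** status, an empty result shortly after submission may simply mean the services have not started yet.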
@@ -211,8 +242,6 @@ When you click on the endpoints to interact when your job, you're taken to the u
211242
- If you have logged TensorFlow events for your job, you can use TensorBoard to monitor the metrics while your job is running.
212243

213244
:::image type="content" source="./media/interactive-jobs/tensorboard-open.png" alt-text="Screenshot of interactive jobs tensorboard panel when first opened. This information will vary depending upon customer data":::
214-
215-
If you don't see the above options, make sure you have enabled the "Debug & monitor your training jobs" flight via the [preview panel](./how-to-enable-preview-features.md#how-do-i-enable-preview-features).
216245

217246
### End job
218247
When you're done with interactive training, go to the job details page and cancel the job to release the compute resource. Alternatively, use `az ml job cancel -n <your job name>` in the CLI or `ml_client.jobs.cancel("<job name>")` in the SDK.
@@ -247,3 +276,4 @@ To submit a job with a debugger attached and the execution paused, you can use d
247276
## Next steps
248277

249278
+ Learn more about [how and where to deploy a model](./how-to-deploy-online-endpoints.md).
279+
