articles/machine-learning/how-to-interactive-jobs.md
Lines changed: 26 additions & 24 deletions
@@ -31,8 +31,8 @@ Interactive training is supported on **Azure Machine Learning Compute Clusters**
 - To use **VS Code**, [follow this guide](how-to-setup-vs-code.md) to set up the Azure Machine Learning extension.
 - Make sure your job environment has the `openssh-server` and `ipykernel ~=6.0` packages installed (all Azure Machine Learning curated training environments have these packages installed by default).
 - Interactive applications can't be enabled on distributed training runs where the distribution type is anything other than PyTorch, TensorFlow, or MPI. Custom distributed training setup (configuring multi-node training without using the above distribution frameworks) is not currently supported.
-
-
+- To use SSH, you need an SSH key pair. You can use the `ssh-keygen -f "<filepath>"` command to generate a public and private key pair.
+
 ## Interact with your job container
 
 By specifying interactive applications at job creation, you can connect directly to the container on the compute node where your job is running. Once you have access to the job container, you can test or debug your job in the exact same environment where it would run. You can also use VS Code to attach to the running process and debug as you would locally.
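The SSH prerequisite above produces a public key that later gets pasted into the job spec. Before pasting it, a quick local check can catch a truncated or mangled key. The sketch below is a hypothetical helper, not part of Azure Machine Learning or OpenSSH tooling. It assumes the key is in the single-line OpenSSH format that `ssh-keygen -f "<filepath>"` writes to `<filepath>.pub` (key type, base64 blob, optional comment), and it confirms that the key type named in the text matches the type encoded at the start of the blob.

```python
import base64
import binascii
import struct


def parse_openssh_public_key(line: str) -> tuple[str, str]:
    """Check one line of an OpenSSH public key (hypothetical helper).

    Returns (key_type, comment); raises ValueError if the line is malformed.
    """
    parts = line.strip().split(None, 2)
    if len(parts) < 2:
        raise ValueError("expected '<key-type> <base64-blob> [comment]'")
    key_type, blob_b64 = parts[0], parts[1]
    comment = parts[2] if len(parts) == 3 else ""
    try:
        blob = base64.b64decode(blob_b64, validate=True)
    except binascii.Error as exc:
        raise ValueError("key blob is not valid base64") from exc
    # The blob begins with a big-endian uint32 length, then the key type name.
    if len(blob) < 4:
        raise ValueError("key blob is too short")
    (name_len,) = struct.unpack(">I", blob[:4])
    if len(blob) < 4 + name_len:
        raise ValueError("key blob is truncated")
    embedded_type = blob[4:4 + name_len].decode("ascii", errors="replace")
    if embedded_type != key_type:
        raise ValueError(
            f"key type {key_type!r} does not match blob type {embedded_type!r}"
        )
    return key_type, comment
```

For example, reading the contents of `<filepath>.pub` and passing them to `parse_openssh_public_key` either returns the key type and comment or raises `ValueError` with a reason, which is cheaper to discover locally than after the job starts.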
@@ -133,27 +133,27 @@ If you don't see the above options, make sure you have enabled the "Debug & moni
 
 1. Create a job YAML file, `job.yaml`, with the sample content below. Make sure to replace `your compute name` with your own value. If you want to use a custom environment, follow the examples in [this tutorial](how-to-manage-environments-v2.md) to create one.
 ```yaml
-code: src
-command:
-python train.py
-# you can add a command like "sleep 1h" to keep the compute resource reserved after the script finishes running.
-nodes: all # For distributed jobs, use the `nodes` property to pick which node you want to enable interactive services on. If `nodes` is not set, interactive applications are enabled only on the head node by default. Values are "all" or a compute node index (for example, "0", "1", etc.)
-my_tensor_board:
-job_service_type: tensor_board
-log_dir: "output/tblogs" # relative path of TensorBoard logs (same as in your training script)
-nodes: all
-my_jupyter_lab:
-job_service_type: jupyter_lab
-nodes: all
-my_ssh:
-job_service_type: ssh
-ssh_public_keys: <paste the entire pub key content>
-nodes: all
+code: src
+command:
+python train.py
+# you can add a command like "sleep 1h" to keep the compute resource reserved after the script finishes running.
+nodes: all # For distributed jobs, use the `nodes` property to pick which node you want to enable interactive services on. If `nodes` is not set, interactive applications are enabled only on the head node by default. Values are "all" or a compute node index (for example, "0", "1", etc.)
+my_tensor_board:
+job_service_type: tensor_board
+log_dir: "output/tblogs" # relative path of TensorBoard logs (same as in your training script)
+nodes: all
+my_jupyter_lab:
+job_service_type: jupyter_lab
+nodes: all
+my_ssh:
+job_service_type: ssh
+ssh_public_keys: <paste the entire pub key content>
+nodes: all
 ```
 
 The `services` section specifies the training applications you want to interact with.
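The `nodes` comment in the sample job encodes a small rule: if `nodes` is omitted, only the head node gets the interactive services; `all` enables them on every node; a node index such as `"0"` or `"1"` enables them on just that node. The following is a hypothetical sketch of that rule for reasoning about distributed jobs, not Azure Machine Learning code. It assumes the nodes of a job are numbered `0` to `node_count - 1` and that the head node is index `0`; both are assumptions for illustration.

```python
from typing import Optional


def resolve_interactive_nodes(nodes: Optional[str], node_count: int) -> list[int]:
    """Return the node indices that get interactive services (hypothetical sketch).

    Assumes nodes are numbered 0..node_count-1 and the head node is index 0.
    """
    if node_count < 1:
        raise ValueError("a job needs at least one node")
    if nodes is None:
        # `nodes` not set: only the head node gets interactive applications.
        return [0]
    if nodes == "all":
        return list(range(node_count))
    if not nodes.isdecimal():
        raise ValueError(f"nodes must be 'all' or a node index, got {nodes!r}")
    index = int(nodes)
    if index >= node_count:
        raise ValueError(f"node index {index} is out of range for {node_count} nodes")
    return [index]
```

Rejecting an out-of-range index up front mirrors the intent of the comment (pick one existing node or all of them) rather than silently falling back to the head node.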
@@ -180,6 +180,8 @@ To interact with your running job, click the button **Debug and monitor** on the
 
 
 
+
+
 Clicking an application in the panel opens it in a new tab. You can access the applications only when they are in **Running** status, and only the **job owner** is authorized to access them. If you're training on multiple nodes, you can pick the specific node you want to interact with.
 
 :::image type="content" source="media/interactive-jobs/interactive-jobs-application-list.png" alt-text="Screenshot of interactive jobs right panel information. Information content will vary depending on the user's data.":::
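The access rule for interactive applications comes down to two conditions that must both hold (the application is **Running** and the requester is the **job owner**), plus a node choice on multi-node jobs. The sketch below models that rule with hypothetical names (`JobInfo`, `can_open_application`); it is not how Azure Machine Learning implements authorization. The node-range check is an added assumption that nodes are numbered from `0`.

```python
from dataclasses import dataclass


@dataclass
class JobInfo:
    """Hypothetical snapshot of a job's state, for illustration only."""

    owner: str
    application_status: str  # for example "Running"
    node_count: int = 1


def can_open_application(job: JobInfo, user: str, node: int = 0) -> bool:
    """True when the application is Running, the user owns the job, and the node exists."""
    return (
        job.application_status == "Running"
        and user == job.owner
        and 0 <= node < job.node_count
    )
```

Keeping the two documented conditions in a single boolean expression makes it explicit that neither one alone grants access.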
@@ -256,4 +258,4 @@ To submit a job with a debugger attached and the execution paused, you can use d
 
 ## Next steps
 
-+ Learn more about [how and where to deploy a model](./how-to-deploy-online-endpoints.md).
++ Learn more about [how and where to deploy a model](./how-to-deploy-online-endpoints.md).