Commit 2fe6eff

Learn Editor: Update how-to-interactive-jobs.md

1 parent 738fd83, commit 2fe6eff

1 file changed: +26 -24 lines changed

articles/machine-learning/how-to-interactive-jobs.md

Lines changed: 26 additions & 24 deletions
@@ -31,8 +31,8 @@ Interactive training is supported on **Azure Machine Learning Compute Clusters**
 - To use **VS Code**, [follow this guide](how-to-setup-vs-code.md) to set up the Azure Machine Learning extension.
 - Make sure your job environment has the `openssh-server` and `ipykernel ~=6.0` packages installed (all Azure Machine Learning curated training environments have these packages installed by default).
 - Interactive applications can't be enabled on distributed training runs whose distribution type is anything other than PyTorch, TensorFlow, or MPI. Custom distributed training setups (multi-node training configured without one of these distribution frameworks) aren't currently supported.
-
-
+- To use SSH, you need an SSH key pair. You can generate a public and private key pair with the `ssh-keygen -f "<filepath>"` command.
+
 ## Interact with your job container
 
 By specifying interactive applications at job creation, you can connect directly to the container on the compute node where your job is running. Once you have access to the job container, you can test or debug your job in exactly the same environment it runs in. You can also use VS Code to attach to the running process and debug as you would locally.
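The SSH prerequisite generates a key pair with `ssh-keygen -f "<filepath>"` (the private key at `<filepath>`, the public key at `<filepath>.pub`), and the job YAML asks you to paste the entire public key content into `ssh_public_keys`. A quick local check can catch pasting a private key or a fragment. The sketch below is a hypothetical helper (`check_public_key` is not part of any Azure Machine Learning tool); it assumes the standard one-line OpenSSH public key format, `<key-type> <base64-blob> [comment]`, where the decoded blob begins with a 4-byte length followed by a copy of the key type.

```python
import base64
import binascii
import struct


def check_public_key(line: str) -> str:
    """Return the key type of a one-line OpenSSH public key, or raise ValueError.

    Hypothetical helper: checks that the text looks like the whole contents of
    a .pub file ("<key-type> <base64-blob> [comment]") before it is pasted
    into `ssh_public_keys`.
    """
    parts = line.strip().split()
    if len(parts) < 2:
        raise ValueError("expected '<key-type> <base64-blob> [comment]'")
    key_type, blob_b64 = parts[0], parts[1]
    try:
        blob = base64.b64decode(blob_b64, validate=True)
    except binascii.Error as exc:
        raise ValueError("key blob is not valid base64") from exc
    # The decoded blob starts with a big-endian uint32 length, then the key type.
    if len(blob) < 4:
        raise ValueError("key blob is too short")
    (name_len,) = struct.unpack(">I", blob[:4])
    embedded_type = blob[4:4 + name_len].decode("ascii", errors="replace")
    if embedded_type != key_type:
        raise ValueError(f"key type mismatch: {key_type!r} vs {embedded_type!r}")
    return key_type
```

For example, passing the text of `<filepath>.pub` should return its key type (such as `ssh-ed25519` or `ssh-rsa`), while passing the contents of the private key file raises `ValueError` because its second token is not a valid base64 key blob.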
@@ -133,27 +133,27 @@ If you don't see the above options, make sure you have enabled the "Debug & moni
 
 1. Create a job YAML file, `job.yaml`, with the sample content below. Replace `<your compute name>` with the name of your compute. To use a custom environment, follow the examples in [this tutorial](how-to-manage-environments-v2.md) to create one.
 ```yaml
-code: src
-command:
-python train.py
-# you can add a command like "sleep 1h" to reserve the compute resource is reserved after the script finishes running.
-environment: azureml:AzureML-tensorflow-2.4-ubuntu18.04-py37-cuda11-gpu:41
-compute: azureml:<your compute name>
-services:
-my_vs_code:
-job_service_type: vs_code
-nodes: all # For distributed jobs, use the `nodes` property to pick which node you want to enable interactive services on. If `nodes` are not selected, by default, interactive applications are only enabled on the head node. Values are "all", or compute node index (for ex. "0", "1" etc.)
-my_tensor_board:
-job_service_type: tensor_board
-log_dir: "output/tblogs" # relative path of Tensorboard logs (same as in your training script)
-nodes: all
-my_jupyter_lab:
-job_service_type: jupyter_lab
-nodes: all
-my_ssh:
-job_service_type: ssh
-ssh_public_keys: <paste the entire pub key content>
-nodes: all
+code: src
+command:
+  python train.py
+  # you can add a command like "sleep 1h" to keep the compute resource reserved after the script finishes running.
+environment: azureml:AzureML-tensorflow-2.4-ubuntu18.04-py37-cuda11-gpu:41
+compute: azureml:<your compute name>
+services:
+  my_vs_code:
+    job_service_type: vs_code
+    nodes: all # For distributed jobs, use the `nodes` property to pick which nodes to enable interactive services on. If `nodes` isn't set, interactive applications are enabled only on the head node. Values are "all" or a compute node index (for example, "0" or "1").
+  my_tensor_board:
+    job_service_type: tensor_board
+    log_dir: "output/tblogs" # relative path of the TensorBoard logs (same as in your training script)
+    nodes: all
+  my_jupyter_lab:
+    job_service_type: jupyter_lab
+    nodes: all
+  my_ssh:
+    job_service_type: ssh
+    ssh_public_keys: <paste the entire pub key content>
+    nodes: all
 ```
 
 The `services` section specifies the training applications you want to interact with.
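The `nodes` comment in the sample YAML describes how each service is placed on a multi-node job: if `nodes` isn't set, only the head node gets the service; `"all"` enables it on every node; an index string such as `"0"` or `"1"` picks one compute node. A minimal sketch of that rule, assuming the head node is index 0 (the article doesn't say which index the head node has); `resolve_nodes` is a hypothetical helper, not an Azure Machine Learning API.

```python
from typing import List, Optional


def resolve_nodes(nodes: Optional[str], node_count: int, head_node: int = 0) -> List[int]:
    """Return the node indices an interactive service would be enabled on.

    Hypothetical helper mirroring the `nodes` comment in the sample job YAML:
      - not set       -> head node only (head_node=0 is an assumption)
      - "all"         -> every node in the job
      - "0", "1", ... -> that single compute node index
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    if nodes is None:
        return [head_node]
    if nodes == "all":
        return list(range(node_count))
    if not nodes.isdecimal():
        raise ValueError(f'nodes must be "all" or a node index, got {nodes!r}')
    index = int(nodes)
    if index >= node_count:
        raise ValueError(f"node index {index} is out of range for {node_count} nodes")
    return [index]


# A four-node job: "all" on one service, a single node on another, default on a third.
print(resolve_nodes("all", 4))  # [0, 1, 2, 3]
print(resolve_nodes("2", 4))    # [2]
print(resolve_nodes(None, 4))   # [0]
```

Keeping `head_node` as a parameter makes the assumed default easy to adjust if the head node turns out not to be index 0.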
@@ -180,6 +180,8 @@ To interact with your running job, click the button **Debug and monitor** on the
 
 
 
+
+
 Selecting an application in the panel opens it in a new tab. You can access the applications only while they're in **Running** status, and only the **job owner** is authorized to access them. If you're training on multiple nodes, you can pick the specific node you want to interact with.
 
 :::image type="content" source="media/interactive-jobs/interactive-jobs-application-list.png" alt-text="Screenshot of interactive jobs right panel information. Information content will vary depending on the user's data.":::
@@ -256,4 +258,4 @@ To submit a job with a debugger attached and the execution paused, you can use d
 
 ## Next steps
 
-+ Learn more about [how and where to deploy a model](./how-to-deploy-online-endpoints.md).
++ Learn more about [how and where to deploy a model](./how-to-deploy-online-endpoints.md).
