Commit c82bb46

Learn Editor: Update how-to-interactive-jobs.md
1 parent e440a33 commit c82bb46

1 file changed: 38 additions, 39 deletions

articles/machine-learning/how-to-interactive-jobs.md

@@ -131,56 +131,56 @@ If you don't see the above options, make sure you have enabled the "Debug & moni

# [Azure CLI](#tab/azurecli)

1. Create a job YAML file, `job.yaml`, with the following sample content. Replace `<your compute name>` with your own value. If you want to use a custom environment, follow the examples in [this tutorial](how-to-manage-environments-v2.md) to create one.

   ```yaml
   code: src
   command:
     python train.py
     # you can add a command like "sleep 1h" to keep the compute resource reserved after the script finishes running.
   environment: azureml:AzureML-tensorflow-2.4-ubuntu18.04-py37-cuda11-gpu:41
   compute: azureml:<your compute name>
   services:
     my_vs_code:
       job_service_type: vs_code
       nodes: all # For distributed jobs, use the `nodes` property to pick which node to enable interactive services on. If `nodes` is not set, interactive applications are enabled only on the head node by default. Values are "all" or a compute node index (for example, "0", "1", etc.)
     my_tensor_board:
       job_service_type: tensor_board
       properties:
         logDir: "output/tblogs" # relative path of the TensorBoard logs (same as in your training script)
       nodes: all
     my_jupyter_lab:
       job_service_type: jupyter_lab
       nodes: all
     my_ssh:
       job_service_type: ssh
       properties:
         sshPublicKeys: <paste the entire pub key content>
       nodes: all
   ```

   The `services` section specifies the training applications you want to interact with.

   You can put `sleep <specific time>` at the end of the command to specify how long you want to reserve the compute resource. The format is:

   * `sleep 1s`
   * `sleep 1m`
   * `sleep 1h`
   * `sleep 1d`

   You can also use `sleep infinity` to keep the job alive indefinitely.

> [!NOTE]
134+
1. 1. Create a job yaml `job.yaml` with below sample content. Make sure to replace `your compute name` with your own value. If you want to use custom environment, follow the examples in [this tutorial](how-to-manage-environments-v2.md) to create a custom environment.
135+
```dotnetcli
136+
code: src
137+
command:
138+
python train.py
139+
# you can add a command like "sleep 1h" to reserve the compute resource is reserved after the script finishes running.
140+
environment: azureml:AzureML-tensorflow-2.4-ubuntu18.04-py37-cuda11-gpu:41
141+
compute: azureml:<your compute name>
142+
services:
143+
my_vs_code:
144+
job_service_type: vs_code
145+
nodes: all # For distributed jobs, use the `nodes` property to pick which node you want to enable interactive services on. If `nodes` are not selected, by default, interactive applications are only enabled on the head node. Values are "all", or compute node index (for ex. "0", "1" etc.)
146+
my_tensor_board:
147+
job_service_type: tensor_board
148+
logDir: "output/tblogs" # relative path of Tensorboard logs (same as in your training script)
149+
nodes: all
150+
my_jupyter_lab:
151+
job_service_type: jupyter_lab
152+
nodes: all
153+
my_ssh:
154+
job_service_type: ssh
155+
properties:
156+
sshPublicKeys: <paste the entire pub key content>
157+
nodes: all
158+
```
159+
160+
The `services` section specifies the training applications you want to interact with.
161+
162+
You can put `sleep <specific time>` at the end of the command to specify the amount of time you want to reserve the compute resource. The format follows:
163+
* sleep 1s
164+
* sleep 1m
165+
* sleep 1h
166+
* sleep 1d
167+
168+
You can also use the `sleep infinity` command that would keep the job alive indefinitely.
169+
170+
> [!NOTE]
171171
> If you use `sleep infinity`, you need to manually [cancel the job](./how-to-interactive-jobs.md#end-job) to release the compute resource (and stop billing).

2. Run `az ml job create --file <path to your job yaml file> --workspace-name <your workspace name> --resource-group <your resource group name> --subscription <sub-id>` to submit your training job. For more details on running a job via CLI v2, see [this article](./how-to-train-model.md).

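The `nodes` behavior described in the comments of the sample `job.yaml` can be sketched as a small Python check you might run locally before submitting. This is a hypothetical helper, not part of the Azure Machine Learning CLI or SDK: `resolve_nodes` and `plan_services` are illustrative names, and treating node index 0 as the head node is an assumption the article does not state.

```python
from typing import Dict, List, Optional, Union

# Service types that appear in the sample job.yaml.
KNOWN_SERVICE_TYPES = {"vs_code", "tensor_board", "jupyter_lab", "ssh"}


def resolve_nodes(nodes: Optional[Union[str, int]], node_count: int) -> List[int]:
    """Return the node indices an interactive service is enabled on.

    Assumption (hypothetical): index 0 is the head node.
    """
    if nodes is None:
        # `nodes` not set: only the head node gets the service.
        return [0]
    value = str(nodes)  # YAML may load `nodes: 1` as an int
    if value == "all":
        return list(range(node_count))
    if value.isdigit() and int(value) < node_count:
        return [int(value)]
    raise ValueError(f"invalid nodes value {nodes!r} for a {node_count}-node job")


def plan_services(services: Dict[str, dict], node_count: int) -> Dict[str, List[int]]:
    """Map each service name in `services` to the node indices it would run on."""
    plan: Dict[str, List[int]] = {}
    for name, spec in services.items():
        service_type = spec.get("job_service_type")
        if service_type not in KNOWN_SERVICE_TYPES:
            raise ValueError(f"{name}: unknown job_service_type {service_type!r}")
        plan[name] = resolve_nodes(spec.get("nodes"), node_count)
    return plan


if __name__ == "__main__":
    # Services as they might look after loading job.yaml, for a two-node job.
    services = {
        "my_vs_code": {"job_service_type": "vs_code", "nodes": "all"},
        "my_jupyter_lab": {"job_service_type": "jupyter_lab"},
        "my_ssh": {"job_service_type": "ssh", "nodes": "1"},
    }
    print(plan_services(services, node_count=2))
    # {'my_vs_code': [0, 1], 'my_jupyter_lab': [0], 'my_ssh': [1]}
```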
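
The `sleep` values listed in step 1 map to reserved time as in the following sketch. It assumes the job command is run by a shell that provides GNU coreutils `sleep` (suffixes `s`, `m`, `h`, `d`, plus `infinity`); `sleep_seconds` and `with_sleep` are hypothetical helpers, not an Azure Machine Learning API.

```python
import math

# Seconds per suffix accepted by GNU coreutils `sleep`.
SECONDS_PER_UNIT = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def sleep_seconds(arg: str) -> float:
    """Return how long `sleep <arg>` keeps the job, and its compute, reserved."""
    if arg == "infinity":
        # Runs until you cancel the job, so billing continues until then.
        return math.inf
    if arg and arg[-1] in SECONDS_PER_UNIT:
        return float(arg[:-1]) * SECONDS_PER_UNIT[arg[-1]]
    # GNU sleep treats a bare number as seconds.
    return float(arg)


def with_sleep(command: str, duration: str) -> str:
    """Append a sleep to a job command (assumes the command runs in a shell).

    `;` rather than `&&` keeps the node reserved even if the script fails,
    which can help when you want to debug the failure interactively.
    """
    return f"{command}; sleep {duration}"


if __name__ == "__main__":
    for arg in ("1s", "1m", "1h", "1d", "infinity"):
        print(arg, sleep_seconds(arg))
    print(with_sleep("python train.py", "1h"))
```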
---
### Connect to endpoints
# [Azure Machine Learning Studio](#tab/ui)
To interact with your running job, select **Debug and monitor** on the job details page.

:::image type="content" source="media/interactive-jobs/debug-and-monitor.png" alt-text="Screenshot of interactive jobs debug and monitor panel location.":::

Selecting an application in the panel opens it in a new tab. You can access the applications only when they are in **Running** status, and only the **job owner** is authorized to access them. If you're training on multiple nodes, you can pick the specific node you want to interact with.

:::image type="content" source="media/interactive-jobs/interactive-jobs-application-list.png" alt-text="Screenshot of interactive jobs right panel information. Information content will vary depending on the user's data.":::
@@ -206,7 +206,6 @@ You can find the reference documentation for these commands [here](/cli/azure/ml

You can access the applications only when they are in **Running** status, and only the **job owner** is authorized to access them. If you're training on multiple nodes, you can pick the specific node you want to interact with by passing in the node index.

---
### Interact with the applications
When you select an endpoint to interact with your job, you're taken to the user container under your working directory, where you can access your code, inputs, outputs, and logs. If you run into issues while connecting to the applications, you can find the interactive capability and application logs under **system_logs->interactive_capability** on the **Outputs + logs** tab.