
Commit c765f19

Update how-to-interactive-jobs.md
1 parent edb299b commit c765f19

1 file changed: 42 additions, 49 deletions

articles/machine-learning/how-to-interactive-jobs.md

@@ -84,27 +84,20 @@ If you don't see the above options, make sure you have enabled the "Debug & moni
         environment="AzureML-tensorflow-2.7-ubuntu20.04-py38-cuda11-gpu@latest",
         compute="<name-of-compute>",
         services={
-            "My_jupyterlab": JobService(
-                job_service_type="jupyter_lab",
+            "My_jupyterlab": JupyterLabJobService(
                 nodes="all"  # For distributed jobs, use the `nodes` property to pick which node you want to enable interactive services on. If `nodes` is not set, interactive applications are enabled only on the head node by default. Values are "all", or the compute node index (for example "0", "1", etc.)
             ),
-            "My_vscode": JobService(
-                job_service_type="vs_code",
+            "My_vscode": VsCodeJobService(
                 nodes="all"
             ),
-            "My_tensorboard": JobService(
-                job_service_type="tensor_board",
+            "My_tensorboard": TensorBoardJobService(
                 nodes="all",
-                properties={
-                    "logDir": "output/tblogs"  # relative path of the Tensorboard logs (same as in your training script)
+                log_dir="output/tblogs"  # relative path of the Tensorboard logs (same as in your training script)
                 }
             ),
-            "My_ssh": JobService(
-                job_service_type="ssh",
-                sshPublicKeys="<add-public-key>",
+            "My_ssh": SshJobService(
+                ssh_public_keys="<add-public-key>",
                 nodes="all"
-                properties={
-                    "sshPublicKeys":"<add-public-key>"
                 }
             ),
         }
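The change above swaps the generic `JobService(job_service_type=...)` for one typed class per interactive service. A minimal sketch of the shape of the resulting `services` mapping, using hypothetical dataclass stand-ins (not the real `azure.ai.ml.entities` classes, which require the Azure ML SDK):

```python
from dataclasses import dataclass

# Hypothetical stand-ins illustrating the shape of the typed-service API
# introduced by this commit; the real classes live in azure.ai.ml.entities.
@dataclass
class JupyterLabJobService:
    nodes: str = "all"

@dataclass
class VsCodeJobService:
    nodes: str = "all"

@dataclass
class TensorBoardJobService:
    nodes: str = "all"
    log_dir: str = ""

@dataclass
class SshJobService:
    nodes: str = "all"
    ssh_public_keys: str = ""

# The services mapping mirrors the snippet in the diff above.
services = {
    "My_jupyterlab": JupyterLabJobService(nodes="all"),
    "My_vscode": VsCodeJobService(nodes="all"),
    "My_tensorboard": TensorBoardJobService(nodes="all", log_dir="output/tblogs"),
    "My_ssh": SshJobService(nodes="all", ssh_public_keys="<add-public-key>"),
}
```

Note that the service type is now carried by the class itself, so the `job_service_type` and `properties` arguments of the old generic `JobService` are no longer needed.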
@@ -131,43 +124,43 @@ If you don't see the above options, make sure you have enabled the "Debug & moni
 
 # [Azure CLI](#tab/azurecli)
 
-1. 1. Create a job YAML `job.yaml` with the sample content below. Make sure to replace `your compute name` with your own value. If you want to use a custom environment, follow the examples in [this tutorial](how-to-manage-environments-v2.md) to create one.
+1. Create a job YAML `job.yaml` with the sample content below. Make sure to replace `your compute name` with your own value. If you want to use a custom environment, follow the examples in [this tutorial](how-to-manage-environments-v2.md) to create one.
    ```dotnetcli
-   code: src
-   command:
-     python train.py
-     # you can add a command like "sleep 1h" to ensure the compute resource stays reserved after the script finishes running.
-   environment: azureml:AzureML-tensorflow-2.4-ubuntu18.04-py37-cuda11-gpu:41
-   compute: azureml:<your compute name>
-   services:
-     my_vs_code:
-       job_service_type: vs_code
-       nodes: all # For distributed jobs, use the `nodes` property to pick which node you want to enable interactive services on. If `nodes` is not set, interactive applications are enabled only on the head node by default. Values are "all", or the compute node index (for example "0", "1", etc.)
-     my_tensor_board:
-       job_service_type: tensor_board
-       log_dir: "output/tblogs" # relative path of the Tensorboard logs (same as in your training script)
-       nodes: all
-     my_jupyter_lab:
-       job_service_type: jupyter_lab
-       nodes: all
-     my_ssh:
-       job_service_type: ssh
-       ssh_public_keys: <paste the entire pub key content>
-       nodes: all
-   ```
-
-   The `services` section specifies the training applications you want to interact with.
-
-   You can put `sleep <specific time>` at the end of the command to specify how long to reserve the compute resource. The format follows:
-   * sleep 1s
-   * sleep 1m
-   * sleep 1h
-   * sleep 1d
-
-   You can also use the `sleep infinity` command to keep the job alive indefinitely.
+   code: src
+   command:
+     python train.py
+     # you can add a command like "sleep 1h" to ensure the compute resource stays reserved after the script finishes running.
+   environment: azureml:AzureML-tensorflow-2.4-ubuntu18.04-py37-cuda11-gpu:41
+   compute: azureml:<your compute name>
+   services:
+     my_vs_code:
+       job_service_type: vs_code
+       nodes: all # For distributed jobs, use the `nodes` property to pick which node you want to enable interactive services on. If `nodes` is not set, interactive applications are enabled only on the head node by default. Values are "all", or the compute node index (for example "0", "1", etc.)
+     my_tensor_board:
+       job_service_type: tensor_board
+       log_dir: "output/tblogs" # relative path of the Tensorboard logs (same as in your training script)
+       nodes: all
+     my_jupyter_lab:
+       job_service_type: jupyter_lab
+       nodes: all
+     my_ssh:
+       job_service_type: ssh
+       ssh_public_keys: <paste the entire pub key content>
+       nodes: all
+   ```
+
+   The `services` section specifies the training applications you want to interact with.
+
+   You can put `sleep <specific time>` at the end of the command to specify how long to reserve the compute resource. The format follows:
+   * sleep 1s
+   * sleep 1m
+   * sleep 1h
+   * sleep 1d
+
+   You can also use the `sleep infinity` command to keep the job alive indefinitely.
 
-   > [!NOTE]
-   > If you use `sleep infinity`, you'll need to manually [cancel the job](./how-to-interactive-jobs.md#end-job) to let go of the compute resource (and stop billing).
+   > [!NOTE]
+   > If you use `sleep infinity`, you'll need to manually [cancel the job](./how-to-interactive-jobs.md#end-job) to let go of the compute resource (and stop billing).
 
 2. Run the command `az ml job create --file <path to your job yaml file> --workspace-name <your workspace name> --resource-group <your resource group name> --subscription <sub-id>` to submit your training job. For more details on running a job via CLI v2, check out this [article](./how-to-train-model.md).

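The visible effect of this hunk is re-indenting the YAML block under the single numbered list item. As a quick sanity check that the `job.yaml` content is well-formed YAML, you can parse a lightly simplified copy of it (a sketch that assumes the third-party PyYAML package is installed; the angle-bracket placeholders are kept verbatim from the doc):

```python
import yaml  # third-party: pip install pyyaml

# Lightly simplified copy of the job.yaml body from the diff above
# (placeholders kept verbatim; the sleep comment is omitted).
JOB_YAML = """\
code: src
command: python train.py
environment: azureml:AzureML-tensorflow-2.4-ubuntu18.04-py37-cuda11-gpu:41
compute: azureml:<your compute name>
services:
  my_vs_code:
    job_service_type: vs_code
    nodes: all
  my_tensor_board:
    job_service_type: tensor_board
    log_dir: "output/tblogs"
    nodes: all
  my_jupyter_lab:
    job_service_type: jupyter_lab
    nodes: all
  my_ssh:
    job_service_type: ssh
    ssh_public_keys: <paste the entire pub key content>
    nodes: all
"""

job = yaml.safe_load(JOB_YAML)

# Every interactive service entry names its type and the nodes it targets.
for name, svc in job["services"].items():
    assert "job_service_type" in svc and svc["nodes"] == "all"
```

If the block is mis-indented (the bug this commit addresses), `yaml.safe_load` raises a parse error instead of returning the mapping.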
@@ -264,4 +257,4 @@ To submit a job with a debugger attached and the execution paused, you can use d
 
 ## Next steps
 
-+ Learn more about [how and where to deploy a model](./how-to-deploy-online-endpoints.md).
++ Learn more about [how and where to deploy a model](./how-to-deploy-online-endpoints.md).
