Commit 23b93ef

Update how-to-interactive-jobs.md
1 parent ec688b3 commit 23b93ef

1 file changed: +42 -51 lines changed

articles/machine-learning/how-to-interactive-jobs.md

Lines changed: 42 additions & 51 deletions
@@ -77,48 +77,48 @@ If you don't see the above options, make sure you have enabled the "Debug & moni
 
 Note that you have to import the `JobService` class from the `azure.ai.ml.entities` package to configure interactive services via the SDKv2.
 
-```python
-command_job = command(...
-code="./src", # local path where the code is stored
-command="python main.py", # you can add a command like "sleep 1h" to reserve the compute resource is reserved after the script finishes running
-environment="AzureML-tensorflow-2.7-ubuntu20.04-py38-cuda11-gpu@latest",
-compute="<name-of-compute>",
-services={
-"My_jupyterlab": JupyterLabJobService(
-nodes="all" # For distributed jobs, use the `nodes` property to pick which node you want to enable interactive services on. If `nodes` are not selected, by default, interactive applications are only enabled on the head node. Values are "all", or compute node index (for ex. "0", "1" etc.)
-),
-"My_vscode": VsCodeJobService(
-nodes="all"
-),
-"My_tensorboard": TensorBoardJobService(
-nodes="all",
-log_Dir="output/tblogs" # relative path of Tensorboard logs (same as in your training script)
-}
-),
-"My_ssh": SshJobService(
-ssh_Public_Keys="<add-public-key>",
-nodes="all"
-}
-),
-}
-)
-
-# submit the command
-returned_job = ml_client.jobs.create_or_update(command_job)
-```
-
-The `services` section specifies the training applications you want to interact with.
-
-You can put `sleep <specific time>` at the end of your command to specify the amount of time you want to reserve the compute resource. The format follows:
-* sleep 1s
-* sleep 1m
-* sleep 1h
-* sleep 1d
-
-You can also use the `sleep infinity` command that would keep the job alive indefinitely.
-
-> [!NOTE]
-> If you use `sleep infinity`, you will need to manually [cancel the job](./how-to-interactive-jobs.md#end-job) to let go of the compute resource (and stop billing).
+```python
+command_job = command(...
+code="./src", # local path where the code is stored
+command="python main.py", # you can add a command like "sleep 1h" to reserve the compute resource is reserved after the script finishes running
+environment="AzureML-tensorflow-2.7-ubuntu20.04-py38-cuda11-gpu@latest",
+compute="<name-of-compute>",
+services={
+"My_jupyterlab": JupyterLabJobService(
+nodes="all" # For distributed jobs, use the `nodes` property to pick which node you want to enable interactive services on. If `nodes` are not selected, by default, interactive applications are only enabled on the head node. Values are "all", or compute node index (for ex. "0", "1" etc.)
+),
+"My_vscode": VsCodeJobService(
+nodes="all"
+),
+"My_tensorboard": TensorBoardJobService(
+nodes="all",
+log_Dir="output/tblogs" # relative path of Tensorboard logs (same as in your training script)
+}
+),
+"My_ssh": SshJobService(
+ssh_Public_Keys="<add-public-key>",
+nodes="all"
+}
+),
+}
+)
+
+# submit the command
+returned_job = ml_client.jobs.create_or_update(command_job)
+```
+
+The `services` section specifies the training applications you want to interact with.
+
+You can put `sleep <specific time>` at the end of your command to specify the amount of time you want to reserve the compute resource. The format follows:
+* sleep 1s
+* sleep 1m
+* sleep 1h
+* sleep 1d
+
+You can also use the `sleep infinity` command that would keep the job alive indefinitely.
+
+> [!NOTE]
+> If you use `sleep infinity`, you will need to manually [cancel the job](./how-to-interactive-jobs.md#end-job) to let go of the compute resource (and stop billing).
 
 2. Submit your training job. For more details on how to train with the Python SDKv2, check out this [article](./how-to-train-model.md).
 

@@ -172,15 +172,6 @@ To interact with your running job, click the button **Debug and monitor** on the
 :::image type="content" source="media/interactive-jobs/debug-and-monitor.png" alt-text="Screenshot of interactive jobs debug and monitor panel location.":::
 
 
-
-
-
-
-
-
-
-
-
 Clicking the applications in the panel opens a new tab for the applications. You can access the applications only when they are in **Running** status and only the **job owner** is authorized to access the applications. If you're training on multiple nodes, you can pick the specific node you would like to interact with.
 
 :::image type="content" source="media/interactive-jobs/interactive-jobs-application-list.png" alt-text="Screenshot of interactive jobs right panel information. Information content will vary depending on the user's data.":::
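
The `services` snippet in the first hunk is not valid Python on either side of the diff: two `}` lines appear where no brace was opened, and the keyword arguments are written as `log_Dir` and `ssh_Public_Keys`. Below is a minimal corrected sketch, not the committed text. It assumes the `azure-ai-ml` package is installed and that `azure.ai.ml.entities` exposes the four service classes with the snake_case arguments `log_dir` and `ssh_public_keys`. `build_services` is a hypothetical helper name introduced here, and the import is deferred so the definition loads without the SDK.

```python
def build_services():
    """Return the interactive services mapping from the snippet (hypothetical helper)."""
    # Assumption: azure-ai-ml is installed and provides these classes.
    from azure.ai.ml.entities import (
        JupyterLabJobService,
        SshJobService,
        TensorBoardJobService,
        VsCodeJobService,
    )

    return {
        # nodes="all" enables the service on every node of a distributed job;
        # a node index such as "0" selects a single node instead.
        "My_jupyterlab": JupyterLabJobService(nodes="all"),
        "My_vscode": VsCodeJobService(nodes="all"),
        "My_tensorboard": TensorBoardJobService(
            nodes="all",
            # Relative path of the TensorBoard logs, same as in the training script.
            log_dir="output/tblogs",
        ),
        "My_ssh": SshJobService(
            ssh_public_keys="<add-public-key>",  # placeholder from the snippet
            nodes="all",
        ),
    }
```

The returned dictionary would be passed as `services=build_services()` in the `command(...)` call; the remaining arguments stay as in the snippet.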

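The `sleep` suffix described in the first hunk can also be attached to the command string programmatically. This is a minimal sketch, not part of the commit or of the Azure ML SDK: `with_sleep` is a hypothetical helper, and joining with `;` (so the sleep runs even if the script fails) is an assumption about the desired behavior. It accepts the duration forms the article lists (`1s`, `1m`, `1h`, `1d`) plus `infinity`.

```python
import re

# A positive integer followed by s, m, h, or d, or the literal "infinity".
_DURATION = re.compile(r"(?:[1-9]\d*[smhd]|infinity)")


def with_sleep(command: str, duration: str) -> str:
    """Append `sleep <duration>` to a job command string (hypothetical helper)."""
    if not _DURATION.fullmatch(duration):
        raise ValueError(f"unsupported sleep duration: {duration!r}")
    # ";" runs sleep regardless of the command's exit status (assumed intent);
    # use "&&" instead to hold the compute only after a successful run.
    return f"{command}; sleep {duration}"
```

For example, `with_sleep("python main.py", "1h")` produces the string `python main.py; sleep 1h`, which can be used as the `command=` argument. With `infinity`, the job still has to be cancelled manually to release the compute resource.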