articles/machine-learning/how-to-interactive-jobs.md
38 additions & 39 deletions
@@ -131,56 +131,56 @@ If you don't see the above options, make sure you have enabled the "Debug & moni

# [Azure CLI](#tab/azurecli)

-1. Create a job yaml `job.yaml` with below sample content. Make sure to replace `your compute name` with your own value. If you want to use custom environment, follow the examples in [this tutorial](how-to-manage-environments-v2.md) to create a custom environment.
-```dotnetcli
-code: src
-command:
-python train.py
-# you can add a command like "sleep 1h" to reserve the compute resource is reserved after the script finishes running.
-nodes: all # For distributed jobs, use the `nodes` property to pick which node you want to enable interactive services on. If `nodes` are not selected, by default, interactive applications are only enabled on the head node. Values are "all", or compute node index (for ex. "0", "1" etc.)
-my_tensor_board:
-job_service_type: tensor_board
-properties:
-logDir: "output/tblogs" # relative path of Tensorboard logs (same as in your training script)
-nodes: all
-my_jupyter_lab:
-job_service_type: jupyter_lab
-nodes: all
-my_ssh:
-job_service_type: ssh
-properties:
-sshPublicKeys: <paste the entire pub key content>
-nodes: all
-```
-The `services` section specifies the training applications you want to interact with.
-
-You can put `sleep <specific time>` at the end of the command to specify the amount of time you want to reserve the compute resource. The format follows:
-* sleep 1s
-* sleep 1m
-* sleep 1h
-* sleep 1d
-
-You can also use the `sleep infinity` command that would keep the job alive indefinitely.
-
-> [!NOTE]
+1. 1. Create a job yaml `job.yaml` with below sample content. Make sure to replace `your compute name` with your own value. If you want to use custom environment, follow the examples in [this tutorial](how-to-manage-environments-v2.md) to create a custom environment.
+```dotnetcli
+code: src
+command:
+python train.py
+# you can add a command like "sleep 1h" to reserve the compute resource is reserved after the script finishes running.
+nodes: all # For distributed jobs, use the `nodes` property to pick which node you want to enable interactive services on. If `nodes` are not selected, by default, interactive applications are only enabled on the head node. Values are "all", or compute node index (for ex. "0", "1" etc.)
+my_tensor_board:
+job_service_type: tensor_board
+logDir: "output/tblogs" # relative path of Tensorboard logs (same as in your training script)
+nodes: all
+my_jupyter_lab:
+job_service_type: jupyter_lab
+nodes: all
+my_ssh:
+job_service_type: ssh
+properties:
+sshPublicKeys: <paste the entire pub key content>
+nodes: all
+```
+
+The `services` section specifies the training applications you want to interact with.
+
+You can put `sleep <specific time>` at the end of the command to specify the amount of time you want to reserve the compute resource. The format follows:
+* sleep 1s
+* sleep 1m
+* sleep 1h
+* sleep 1d
+
+You can also use the `sleep infinity` command that would keep the job alive indefinitely.
+
+> [!NOTE]
> If you use `sleep infinity`, you must manually [cancel the job](./how-to-interactive-jobs.md#end-job) to release the compute resource (and stop billing).

2. Run the command `az ml job create --file <path to your job yaml file> --workspace-name <your workspace name> --resource-group <your resource group name> --subscription <sub-id>` to submit your training job. For more details on running a job via CLIv2, check out this [article](./how-to-train-model.md).
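
As a minimal sketch of step 2, the snippet below assembles the submit command with an explicit space between each flag and its value. The variable names, the placeholder values, and the `build_submit_cmd` helper are assumptions for illustration; only the `az ml job create` flags come from the step above. The helper prints the command instead of running it, so it can be reviewed first.

```shell
#!/usr/bin/env bash
# Hypothetical placeholder values; replace them with your own.
JOB_FILE="job.yaml"
WORKSPACE="my-workspace"
RESOURCE_GROUP="my-resource-group"
SUBSCRIPTION_ID="00000000-0000-0000-0000-000000000000"

# Hypothetical helper: prints the submit command without executing it.
build_submit_cmd() {
  printf 'az ml job create --file %s --workspace-name %s --resource-group %s --subscription %s\n' \
    "$JOB_FILE" "$WORKSPACE" "$RESOURCE_GROUP" "$SUBSCRIPTION_ID"
}

build_submit_cmd
```

To actually submit the job, run the printed command in a shell where the Azure CLI is installed and signed in.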
---
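
The `sleep` suffixes listed in step 1 (`s`, `m`, `h`, `d`) stand for seconds, minutes, hours, and days. As a rough sketch, the hypothetical helper `to_seconds` below converts one such duration into seconds, which can help when estimating how long the compute stays reserved after the script finishes. The helper name and its treatment of a bare number as seconds are assumptions for illustration; it handles a single integer with one suffix and doesn't cover `sleep infinity`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a duration such as "1h" into seconds.
to_seconds() {
  local duration="$1"
  local value="${duration%[smhd]}"   # numeric part, e.g. "1"
  local unit="${duration#"$value"}"  # suffix, e.g. "h" (empty if none)
  case "$unit" in
    ""|s) echo "$value" ;;
    m) echo $((value * 60)) ;;
    h) echo $((value * 3600)) ;;
    d) echo $((value * 86400)) ;;
  esac
}

to_seconds 1m   # prints 60
to_seconds 1h   # prints 3600
to_seconds 1d   # prints 86400
```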
-
### Connect to endpoints
# [Azure Machine Learning Studio](#tab/ui)
To interact with your running job, click the button **Debug and monitor** on the job details page.

:::image type="content" source="media/interactive-jobs/debug-and-monitor.png" alt-text="Screenshot of interactive jobs debug and monitor panel location.":::


+
Clicking the applications in the panel opens a new tab for the applications. You can access the applications only when they are in **Running** status and only the **job owner** is authorized to access the applications. If you're training on multiple nodes, you can pick the specific node you would like to interact with.

:::image type="content" source="media/interactive-jobs/interactive-jobs-application-list.png" alt-text="Screenshot of interactive jobs right panel information. Information content will vary depending on the user's data.":::
@@ -206,7 +206,6 @@ You can find the reference documentation for these commands [here](/cli/azure/ml
You can access the applications only when they are in **Running** status and only the **job owner** is authorized to access the applications. If you're training on multiple nodes, you can pick the specific node you would like to interact with by passing in the node index.

---
-
### Interact with the applications
When you click on the endpoints to interact with your job, you're taken to the user container under your working directory, where you can access your code, inputs, outputs, and logs. If you run into any issues while connecting to the applications, the interactive capability and application logs are available under **system_logs->interactive_capability** on the **Outputs + logs** tab.