-        nodes="all"  # For distributed jobs, use the `nodes` property to pick the nodes on which to enable interactive services. If `nodes` isn't set, interactive applications are enabled only on the head node. Valid values are "all" or a compute node index (for example, "0" or "1").
-    ),
-    "My_vscode": VsCodeJobService(
-        nodes="all"
-    ),
-    "My_tensorboard": TensorBoardJobService(
-        nodes="all",
-        log_dir="output/tblogs"  # relative path of the TensorBoard logs (same as in your training script)
-The `services` section specifies the training applications you want to interact with.
-
-You can put `sleep <specific time>` at the end of your command to specify how long you want to reserve the compute resource. The supported formats are:
-* sleep 1s
-* sleep 1m
-* sleep 1h
-* sleep 1d
-
-You can also use the `sleep infinity` command to keep the job alive indefinitely.
-
-> [!NOTE]
-> If you use `sleep infinity`, you need to manually [cancel the job](./how-to-interactive-jobs.md#end-job) to release the compute resource (and stop billing).
+```python
+command_job = command(...
+    code="./src",  # local path where the code is stored
+    command="python main.py",  # you can append a command like "sleep 1h" to keep the compute resource reserved after the script finishes running
+        nodes="all"  # For distributed jobs, use the `nodes` property to pick the nodes on which to enable interactive services. If `nodes` isn't set, interactive applications are enabled only on the head node. Valid values are "all" or a compute node index (for example, "0" or "1").
+    ),
+    "My_vscode": VsCodeJobService(
+        nodes="all"
+    ),
+    "My_tensorboard": TensorBoardJobService(
+        nodes="all",
+        log_dir="output/tblogs"  # relative path of the TensorBoard logs (same as in your training script)
+The `services` section specifies the training applications you want to interact with.
+
+You can put `sleep <specific time>` at the end of your command to specify how long you want to reserve the compute resource. The supported formats are:
+* sleep 1s
+* sleep 1m
+* sleep 1h
+* sleep 1d
+
+You can also use the `sleep infinity` command to keep the job alive indefinitely.
+
+> [!NOTE]
+> If you use `sleep infinity`, you need to manually [cancel the job](./how-to-interactive-jobs.md#end-job) to release the compute resource (and stop billing).
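
The `sleep` suffix described above can be built programmatically. The following is a minimal sketch, not part of the SDK: `with_reservation` is a hypothetical helper, and joining with `;` (so the reservation applies even if the script fails) is an assumption rather than something the article prescribes.

```python
import re


def with_reservation(command: str, duration: str) -> str:
    """Append `sleep <duration>` so the job holds its compute after the script ends.

    Hypothetical helper for illustration; accepts the formats listed in the
    article (a whole number followed by s, m, h, or d) or "infinity".
    """
    if duration != "infinity" and not re.fullmatch(r"\d+[smhd]", duration):
        raise ValueError(f"unsupported sleep duration: {duration!r}")
    # ';' runs sleep regardless of the script's exit code (an assumption;
    # use '&&' instead to reserve compute only after a successful run).
    return f"{command}; sleep {duration}"


reserved_command = with_reservation("python main.py", "1h")
print(reserved_command)
```

The resulting string could then be passed as the `command` argument of the job. With `"infinity"`, the job keeps running until it is canceled manually.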

 2. Submit your training job. For more details on how to train with the Python SDKv2, check out this [article](./how-to-train-model.md).

@@ -172,15 +172,6 @@ To interact with your running job, click the button **Debug and monitor** on the
 :::image type="content" source="media/interactive-jobs/debug-and-monitor.png" alt-text="Screenshot of interactive jobs debug and monitor panel location.":::


-
-
-
-
-
-
-
-
-
 Clicking an application in the panel opens it in a new tab. You can access the applications only while they're in **Running** status, and only the **job owner** is authorized to access them. If you're training on multiple nodes, you can pick the specific node you want to interact with.

 :::image type="content" source="media/interactive-jobs/interactive-jobs-application-list.png" alt-text="Screenshot of interactive jobs right panel information. Information content will vary depending on the user's data.":::