
Commit c7ffbdb

Add timeout info to Jobs guide docs (#3281)
* add timeout info to docs
* Update jobs.md
* Update jobs.md
1 parent 8318fe4 commit c7ffbdb

1 file changed, +86 -0 lines changed

docs/source/en/guides/jobs.md

Lines changed: 86 additions & 0 deletions
@@ -86,6 +86,11 @@ This feature is pay-as-you-go: you only pay for the seconds you use.
>>> run_uv_job("my_script.py")
```

<Tip warning>

**Important**: Jobs have a default timeout (30 minutes), after which they will automatically stop. For long-running tasks like model training, make sure to set a custom timeout using the `timeout` parameter. See [Configure Job Timeout](#configure-job-timeout) for details.

</Tip>

[`run_job`] returns the [`JobInfo`] which has the URL of the Job on Hugging Face, where you can see the Job status and the logs.
Save the Job ID from [`JobInfo`] to manage the job:

@@ -195,6 +200,87 @@ Available `flavor` options:

That's it! You're now running code on Hugging Face's infrastructure.

## Configure Job Timeout

Jobs have a default timeout (30 minutes), after which they stop automatically. Keep this limit in mind for long-running tasks such as model training.

### Setting a custom timeout

You can specify a custom timeout value using the `timeout` parameter when running a job. The timeout can be specified in two ways:

1. **As a number** (interpreted as seconds):
```python
>>> from huggingface_hub import run_job
>>> job = run_job(
...     image="pytorch/pytorch:2.6.0-cuda12.4-cudnn9-devel",
...     command=["python", "train_model.py"],
...     flavor="a10g-large",
...     timeout=7200,  # 2 hours in seconds
... )
```

2. **As a string with time units**:
```python
>>> # Using different time units
>>> job = run_job(
...     image="pytorch/pytorch:2.6.0-cuda12.4-cudnn9-devel",
...     command=["python", "train_model.py"],
...     flavor="a10g-large",
...     timeout="2h",  # 2 hours
... )

>>> # Other examples:
>>> # timeout="30m"    # 30 minutes
>>> # timeout="1.5h"   # 1.5 hours
>>> # timeout="1d"     # 1 day
>>> # timeout="3600s"  # 3600 seconds
```

Supported time units:
- `s` - seconds
- `m` - minutes
- `h` - hours
- `d` - days
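
To compare timeouts written in different forms, it can help to normalize them to seconds on the client side. The following is a minimal sketch based only on the units listed above; `timeout_to_seconds` is a hypothetical helper, not part of `huggingface_hub`, and the accepted pattern (a decimal number followed by one unit letter) is an assumption rather than the service's actual parsing rule.

```python
import re

# Seconds per unit, for the suffixes listed in this guide (s, m, h, d).
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}

# Assumed pattern: a non-negative decimal number followed by one unit letter.
_DURATION_RE = re.compile(r"(\d+(?:\.\d+)?)([smhd])")


def timeout_to_seconds(timeout):
    """Convert a timeout given as a number or a string like "1.5h" to seconds.

    Illustrative helper only; it is not provided by huggingface_hub.
    """
    if isinstance(timeout, bool):
        raise TypeError("timeout must be a number or a string, not bool")
    if isinstance(timeout, (int, float)):
        return float(timeout)  # bare numbers are already in seconds
    match = _DURATION_RE.fullmatch(timeout.strip())
    if match is None:
        raise ValueError(f"Unrecognized timeout: {timeout!r}")
    value, unit = match.groups()
    return float(value) * _UNIT_SECONDS[unit]
```

With this helper, `timeout_to_seconds("2h")` and `timeout_to_seconds(7200)` both give `7200.0`, which matches the two equivalent `run_job` calls shown earlier.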

### Using timeout with UV jobs

For UV jobs, you can also specify the timeout:

```python
>>> from huggingface_hub import run_uv_job
>>> job = run_uv_job(
...     "training_script.py",
...     flavor="a10g-large",
...     timeout="90m",  # 90 minutes
... )
```

<Tip warning>

If you don't specify a timeout, the default timeout (30 minutes) applies to your job. For long-running tasks like model training that may take hours, set an appropriate timeout to avoid unexpected job termination.

</Tip>

### Monitoring job duration

When running long tasks, it's good practice to:
- Estimate your job's expected duration and set a timeout with some buffer
- Monitor your job's progress through the logs
- Check the job status to ensure it hasn't timed out

```python
>>> from huggingface_hub import inspect_job, fetch_job_logs
>>> # Check job status
>>> job_info = inspect_job(job_id=job.id)
>>> if job_info.status.stage == "ERROR":
...     print(f"Job failed: {job_info.status.message}")
...     # Check logs for more details
...     for log in fetch_job_logs(job_id=job.id):
...         print(log)
```
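
For repeated status checks, a small polling loop with its own client-side deadline can be useful. The sketch below is a hypothetical helper, not a `huggingface_hub` function: it takes any callable that returns the current stage, and the stage names other than `"ERROR"` are assumptions that do not appear in this guide. The `max_wait` value only limits how long the loop waits; it does not change the Job's own `timeout`.

```python
import time

# Assumption: these stage names mark a finished job. Only "ERROR" appears in this guide.
TERMINAL_STAGES = ("COMPLETED", "ERROR", "CANCELED", "DELETED")


def wait_for_job(get_stage, poll_interval=30.0, max_wait=3600.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll get_stage() until it returns a terminal stage or max_wait seconds pass.

    Returns the terminal stage that was observed.
    Raises TimeoutError if the client-side deadline passes first.
    """
    deadline = clock() + max_wait
    while True:
        stage = get_stage()
        if stage in TERMINAL_STAGES:
            return stage
        if clock() >= deadline:
            raise TimeoutError(f"Job still in stage {stage!r} after {max_wait} seconds")
        sleep(poll_interval)
```

Under these assumptions, it could be called as `wait_for_job(lambda: inspect_job(job_id=job.id).status.stage)`, reusing the `inspect_job` call from the example above. The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting.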

For more details about the timeout parameter, see the [`run_job` API reference](https://huggingface.co/docs/huggingface_hub/package_reference/hf_api#huggingface_hub.HfApi.run_job.timeout).

## Pass Environment variables and Secrets

You can pass environment variables to your job using `env` and `secrets`:
