Commit 6e31992

lhoestq, davanstrien, julien-c, hanouticelina, and Wauplin authored

[Jobs] Add huggingface-cli jobs commands (#3211)
* jobs
* style
* docs
* mypy
* style
* minor
* remove hfjobs mentions
* add huggingface-cli jobs uv commands
* add some uv options
* add test
* fix for 3.8
* Update src/huggingface_hub/commands/jobs/uv.py (Co-authored-by: Julien Chaumond <[email protected]>)
* move to HfApi
* minor
* more comments
* uv run local_script.py
* lucain's comments
* more lucain's comments
* Apply suggestions from code review (Co-authored-by: célina <[email protected]>, Lucain <[email protected]>)
* style
* minor
* Remove JobUrl and add url in JobInfo directly
* Apply suggestions from code review (Co-authored-by: Lucain <[email protected]>)
* add namespace arg
* fix wrong job url
* add missing methods at top level
* add docs
* uv script url as env, not secret
* rename docs
* update test
* again
* improve docs
* add image arg to run_uv_job
* List flavors from SpaceHardware
* add to overview
* remove zero GPU from flavors
* add JobInfo etc. from _jobs_api in top level __init__
* add package_reference doc page
* minor - link JobInfo in docs
* JobInfo docstring

Co-authored-by: Daniel van Strien <[email protected]>
Co-authored-by: Julien Chaumond <[email protected]>
Co-authored-by: célina <[email protected]>
Co-authored-by: Lucain <[email protected]>
Co-authored-by: Lucain Pouget <[email protected]>
1 parent 9d25831 commit 6e31992

File tree

13 files changed: +1771 / -0 lines changed

docs/source/en/_toctree.yml

Lines changed: 4 additions & 0 deletions
Lines changed: 4 additions & 0 deletions

```diff
@@ -40,6 +40,8 @@
   title: Integrate a library
 - local: guides/webhooks
   title: Webhooks
+- local: guides/jobs
+  title: Jobs
 - title: 'Conceptual guides'
   sections:
   - local: concepts/git_vs_http
@@ -92,3 +94,5 @@
   title: Strict dataclasses
 - local: package_reference/oauth
   title: OAuth
+- local: package_reference/jobs
+  title: Jobs
```

docs/source/en/guides/cli.md

Lines changed: 144 additions & 0 deletions
The following section is appended after the environment dump example (`@@ -604,3 +604,147 @@`):

## huggingface-cli jobs

Run compute jobs on Hugging Face infrastructure with a familiar Docker-like interface.

`huggingface-cli jobs` is a command-line tool that lets you run anything on Hugging Face's infrastructure (including GPUs and TPUs!) with simple commands. Think `docker run`, but for running code on A100s.

```bash
# Directly run Python code
>>> huggingface-cli jobs run python:3.12 python -c "print('Hello from the cloud!')"

# Use GPUs without any setup
>>> huggingface-cli jobs run --flavor a10g-small pytorch/pytorch:2.6.0-cuda12.4-cudnn9-devel \
...     python -c "import torch; print(torch.cuda.get_device_name())"

# Run in an organization account
>>> huggingface-cli jobs run --namespace my-org-name python:3.12 python -c "print('Running in an org account')"

# Run from Hugging Face Spaces
>>> huggingface-cli jobs run hf.co/spaces/lhoestq/duckdb duckdb -c "select 'hello world'"

# Run a Python script with `uv` (experimental)
>>> huggingface-cli jobs uv run my_script.py
```

### ✨ Key Features

- 🐳 **Docker-like CLI**: Familiar commands (`run`, `ps`, `logs`, `inspect`) to run and manage jobs
- 🔥 **Any Hardware**: From CPUs to A100 GPUs and TPU pods - switch with a simple flag
- 📦 **Run Anything**: Use Docker images, HF Spaces, or your custom containers
- 🔐 **Simple Auth**: Just use your HF token
- 📊 **Live Monitoring**: Stream logs in real time, just like running locally
- 💰 **Pay-as-you-go**: Only pay for the seconds you use

### Quick Start

#### 1. Run your first job

```bash
# Run a simple Python script
>>> huggingface-cli jobs run python:3.12 python -c "print('Hello from HF compute!')"
```

This command runs the job and shows the logs. You can pass `--detach` to run the job in the background and only print the job ID.

#### 2. Check job status

```bash
# List your running jobs
>>> huggingface-cli jobs ps

# Inspect the status of a job
>>> huggingface-cli jobs inspect <job_id>

# View logs from a job
>>> huggingface-cli jobs logs <job_id>

# Cancel a job
>>> huggingface-cli jobs cancel <job_id>
```

#### 3. Run on GPU

You can also run jobs on GPUs or TPUs with the `--flavor` option. For example, to run a PyTorch job on an A10G GPU:

```bash
# Use an A10G GPU to check PyTorch CUDA
>>> huggingface-cli jobs run --flavor a10g-small pytorch/pytorch:2.6.0-cuda12.4-cudnn9-devel \
...     python -c "import torch; print(f'This code ran with the following GPU: {torch.cuda.get_device_name()}')"
```

Running this will show the following output:

```bash
This code ran with the following GPU: NVIDIA A10G
```

That's it! You're now running code on Hugging Face's infrastructure.

### Common Use Cases

- **Model Training**: Fine-tune or train models on GPUs (T4, A10G, A100) without managing infrastructure
- **Synthetic Data Generation**: Generate large-scale datasets using LLMs on powerful hardware
- **Data Processing**: Process massive datasets with high-CPU configurations for parallel workloads
- **Batch Inference**: Run offline inference on thousands of samples using optimized GPU setups
- **Experiments & Benchmarks**: Run ML experiments on consistent hardware for reproducible results
- **Development & Debugging**: Test GPU code without a local CUDA setup

### Pass Environment Variables and Secrets

You can pass environment variables to your job with `-e`/`--env-file`, and secrets with `-s`/`--secrets-file`:

```bash
# Pass environment variables
>>> huggingface-cli jobs run -e FOO=foo -e BAR=bar python:3.12 python -c "import os; print(os.environ['FOO'], os.environ['BAR'])"
```

```bash
# Pass environment variables from a local .env file
>>> huggingface-cli jobs run --env-file .env python:3.12 python -c "import os; print(os.environ['FOO'], os.environ['BAR'])"
```

```bash
# Pass secrets - they will be encrypted server side
>>> huggingface-cli jobs run -s MY_SECRET=psswrd python:3.12 python -c "import os; print(os.environ['MY_SECRET'])"
```

```bash
# Pass secrets from a local .env.secrets file - they will be encrypted server side
>>> huggingface-cli jobs run --secrets-file .env.secrets python:3.12 python -c "import os; print(os.environ['MY_SECRET'])"
```

### Hardware

Available `--flavor` options:

- CPU: `cpu-basic`, `cpu-upgrade`
- GPU: `t4-small`, `t4-medium`, `l4x1`, `l4x4`, `a10g-small`, `a10g-large`, `a10g-largex2`, `a10g-largex4`, `a100-large`
- TPU: `v5e-1x1`, `v5e-2x2`, `v5e-2x4`

(updated in 07/2025 from the Hugging Face [suggested_hardware docs](https://huggingface.co/docs/hub/en/spaces-config-reference))
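The flavor names are plain strings, so a typo is only caught once the job is submitted. A small client-side check can catch it earlier. This is a minimal sketch: `AVAILABLE_FLAVORS` is copied from the list above, and `check_flavor` is a hypothetical helper, not part of huggingface_hub:

```python
# Flavors documented above (as of 07/2025); not an official API surface.
AVAILABLE_FLAVORS = {
    "cpu-basic", "cpu-upgrade",
    "t4-small", "t4-medium", "l4x1", "l4x4",
    "a10g-small", "a10g-large", "a10g-largex2", "a10g-largex4", "a100-large",
    "v5e-1x1", "v5e-2x2", "v5e-2x4",
}

def check_flavor(flavor: str) -> str:
    """Return the flavor unchanged if it is known, else raise ValueError."""
    if flavor not in AVAILABLE_FLAVORS:
        raise ValueError(
            f"Unknown flavor {flavor!r}; choose one of {sorted(AVAILABLE_FLAVORS)}"
        )
    return flavor
```

Such a check could be run before shelling out to `huggingface-cli jobs run --flavor …`, keeping the set in sync with the documented list.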
### UV Scripts (Experimental)

Run UV scripts (Python scripts with inline dependencies) on HF infrastructure:

```bash
# Run a UV script (creates a temporary repo)
>>> huggingface-cli jobs uv run my_script.py

# Run with a persistent repo
>>> huggingface-cli jobs uv run my_script.py --repo my-uv-scripts

# Run with a GPU
>>> huggingface-cli jobs uv run ml_training.py --flavor t4-small

# Pass arguments to the script
>>> huggingface-cli jobs uv run process.py input.csv output.parquet --repo data-scripts

# Run a script directly from a URL
>>> huggingface-cli jobs uv run https://huggingface.co/datasets/username/scripts/resolve/main/example.py
```

UV scripts are Python scripts that include their dependencies directly in the file using a special comment syntax. This makes them perfect for self-contained tasks that don't require complex project setups. Learn more about UV scripts in the [UV documentation](https://docs.astral.sh/uv/guides/scripts/).

docs/source/en/guides/jobs.md

Lines changed: 220 additions & 0 deletions
New file (`@@ -0,0 +1,220 @@`):

<!--⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->

# Run and manage Jobs

The Hugging Face Hub provides compute for AI and data workflows via Jobs.
A Job runs on Hugging Face infrastructure and is defined by a command to run (e.g. a Python command), a Docker image from Hugging Face Spaces or Docker Hub, and a hardware flavor (CPU, GPU, TPU). This guide will show you how to interact with Jobs on the Hub, especially:

- Run a job.
- Check job status.
- Select the hardware.
- Configure environment variables and secrets.
- Run UV scripts.

If you want to run and manage a job on the Hub, your machine must be logged in. If you are not, please refer to [this section](../quick-start#authentication). In the rest of this guide, we will assume that your machine is logged in.

## Run a Job

Run compute Jobs defined with a command and a Docker image on Hugging Face infrastructure (including GPUs and TPUs).

You can only manage Jobs that you own (under your username namespace) or from organizations in which you have write permissions. This feature is pay-as-you-go: you only pay for the seconds you use.

[`run_job`] lets you run any command on Hugging Face's infrastructure:

```python
# Directly run Python code
>>> from huggingface_hub import run_job
>>> run_job(
...     image="python:3.12",
...     command=["python", "-c", "print('Hello from the cloud!')"],
... )

# Use GPUs without any setup
>>> run_job(
...     image="pytorch/pytorch:2.6.0-cuda12.4-cudnn9-devel",
...     command=["python", "-c", "import torch; print(torch.cuda.get_device_name())"],
...     flavor="a10g-small",
... )

# Run in an organization account
>>> run_job(
...     image="python:3.12",
...     command=["python", "-c", "print('Running in an org account')"],
...     namespace="my-org-name",
... )

# Run from Hugging Face Spaces
>>> run_job(
...     image="hf.co/spaces/lhoestq/duckdb",
...     command=["duckdb", "-c", "select 'hello world'"],
... )

# Run a Python script with `uv` (experimental)
>>> from huggingface_hub import run_uv_job
>>> run_uv_job("my_script.py")
```

<Tip>

Use [huggingface-cli jobs](./cli#huggingface-cli-jobs) to run jobs from the command line.

</Tip>

[`run_job`] returns a [`JobInfo`] with the URL of the Job on Hugging Face, where you can see the Job status and the logs. Save the Job ID from [`JobInfo`] to manage the job:

```python
>>> from huggingface_hub import run_job
>>> job = run_job(
...     image="python:3.12",
...     command=["python", "-c", "print('Hello from the cloud!')"],
... )
>>> job.url
https://huggingface.co/jobs/lhoestq/687f911eaea852de79c4a50a
>>> job.id
687f911eaea852de79c4a50a
```

Jobs run in the background. The next section shows how to use [`inspect_job`] to check a job's status and [`fetch_job_logs`] to view the logs.

## Check Job status

```python
# List your jobs
>>> from huggingface_hub import list_jobs
>>> jobs = list_jobs()
>>> jobs[0]
JobInfo(id='687f911eaea852de79c4a50a', created_at=datetime.datetime(2025, 7, 22, 13, 24, 46, 909000, tzinfo=datetime.timezone.utc), docker_image='python:3.12', space_id=None, command=['python', '-c', "print('Hello from the cloud!')"], arguments=[], environment={}, secrets={}, flavor='cpu-basic', status=JobStatus(stage='COMPLETED', message=None), owner=JobOwner(id='5e9ecfc04957053f60648a3e', name='lhoestq'), endpoint='https://huggingface.co', url='https://huggingface.co/jobs/lhoestq/687f911eaea852de79c4a50a')

# List your running jobs
>>> running_jobs = [job for job in list_jobs() if job.status.stage == "RUNNING"]

# Inspect the status of a job
>>> from huggingface_hub import inspect_job
>>> inspect_job(job_id=job_id)
JobInfo(id='687f911eaea852de79c4a50a', created_at=datetime.datetime(2025, 7, 22, 13, 24, 46, 909000, tzinfo=datetime.timezone.utc), docker_image='python:3.12', space_id=None, command=['python', '-c', "print('Hello from the cloud!')"], arguments=[], environment={}, secrets={}, flavor='cpu-basic', status=JobStatus(stage='COMPLETED', message=None), owner=JobOwner(id='5e9ecfc04957053f60648a3e', name='lhoestq'), endpoint='https://huggingface.co', url='https://huggingface.co/jobs/lhoestq/687f911eaea852de79c4a50a')

# View logs from a job
>>> from huggingface_hub import fetch_job_logs
>>> for log in fetch_job_logs(job_id=job_id):
...     print(log)
Hello from the cloud!

# Cancel a job
>>> from huggingface_hub import cancel_job
>>> cancel_job(job_id=job_id)
```

Check the status of multiple jobs to know when they're all finished using a loop and [`inspect_job`]:

```python
# Run multiple jobs in parallel and wait for them to complete
>>> import time
>>> from huggingface_hub import inspect_job, run_job
>>> jobs = [run_job(image=image, command=command) for command in commands]
>>> for job in jobs:
...     while inspect_job(job_id=job.id).status.stage not in ("COMPLETED", "ERROR"):
...         time.sleep(10)
```
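The polling loop above can be factored into a reusable helper. This is a sketch under stated assumptions: `wait_for_jobs` and `get_stage` are hypothetical names (not part of huggingface_hub), and `COMPLETED`/`ERROR` are the terminal stages used in the loop above:

```python
import time
from typing import Callable, Dict, Iterable

# Terminal stages, as used in the polling loop above.
TERMINAL_STAGES = {"COMPLETED", "ERROR"}

def wait_for_jobs(
    job_ids: Iterable[str],
    get_stage: Callable[[str], str],
    poll_seconds: float = 10.0,
) -> Dict[str, str]:
    """Poll each job until it reaches a terminal stage; return {job_id: final_stage}."""
    final_stages: Dict[str, str] = {}
    for job_id in job_ids:
        stage = get_stage(job_id)
        while stage not in TERMINAL_STAGES:
            time.sleep(poll_seconds)  # avoid hammering the API
            stage = get_stage(job_id)
        final_stages[job_id] = stage
    return final_stages
```

With huggingface_hub this could be called as `wait_for_jobs([j.id for j in jobs], lambda jid: inspect_job(job_id=jid).status.stage)`; injecting `get_stage` as a callable also makes the helper easy to test without network access.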
## Select the hardware

There are numerous cases where running Jobs on GPUs is useful:

- **Model Training**: Fine-tune or train models on GPUs (T4, A10G, A100) without managing infrastructure
- **Synthetic Data Generation**: Generate large-scale datasets using LLMs on powerful hardware
- **Data Processing**: Process massive datasets with high-CPU configurations for parallel workloads
- **Batch Inference**: Run offline inference on thousands of samples using optimized GPU setups
- **Experiments & Benchmarks**: Run ML experiments on consistent hardware for reproducible results
- **Development & Debugging**: Test GPU code without a local CUDA setup

Run jobs on GPUs or TPUs with the `flavor` argument. For example, to run a PyTorch job on an A10G GPU:

```python
# Use an A10G GPU to check PyTorch CUDA
>>> from huggingface_hub import run_job
>>> run_job(
...     image="pytorch/pytorch:2.6.0-cuda12.4-cudnn9-devel",
...     command=["python", "-c", "import torch; print(f'This code ran with the following GPU: {torch.cuda.get_device_name()}')"],
...     flavor="a10g-small",
... )
```

Running this will show the following output:

```bash
This code ran with the following GPU: NVIDIA A10G
```

Use this to run a fine-tuning script like [trl/scripts/sft.py](https://github.com/huggingface/trl/blob/main/trl/scripts/sft.py) with UV:

```python
>>> from huggingface_hub import run_uv_job
>>> run_uv_job(
...     "sft.py",
...     script_args=["--model_name_or_path", "Qwen/Qwen2-0.5B", ...],
...     dependencies=["trl"],
...     env={"HF_TOKEN": ...},
...     flavor="a10g-small",
... )
```

Available `flavor` options:

- CPU: `cpu-basic`, `cpu-upgrade`
- GPU: `t4-small`, `t4-medium`, `l4x1`, `l4x4`, `a10g-small`, `a10g-large`, `a10g-largex2`, `a10g-largex4`, `a100-large`
- TPU: `v5e-1x1`, `v5e-2x2`, `v5e-2x4`

(updated in 07/2025 from the Hugging Face [suggested_hardware docs](https://huggingface.co/docs/hub/en/spaces-config-reference))

That's it! You're now running code on Hugging Face's infrastructure.

## Pass Environment Variables and Secrets

You can pass environment variables to your job using `env` and `secrets`:

```python
# Pass environment variables
>>> from huggingface_hub import run_job
>>> run_job(
...     image="python:3.12",
...     command=["python", "-c", "import os; print(os.environ['FOO'], os.environ['BAR'])"],
...     env={"FOO": "foo", "BAR": "bar"},
... )
```

```python
# Pass secrets - they will be encrypted server side
>>> from huggingface_hub import run_job
>>> run_job(
...     image="python:3.12",
...     command=["python", "-c", "import os; print(os.environ['MY_SECRET'])"],
...     secrets={"MY_SECRET": "psswrd"},
... )
```
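The CLI accepts `--env-file`, and with the Python API the same effect can be had by building the `env` (or `secrets`) dict from a `.env` file yourself. This is a minimal sketch, assuming simple `KEY=VALUE` lines; `parse_env_file` is a hypothetical helper, not part of huggingface_hub:

```python
from pathlib import Path
from typing import Dict

def parse_env_file(path: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines into a dict, skipping blanks and # comments."""
    env: Dict[str, str] = {}
    for raw_line in Path(path).read_text().splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env
```

The result can then be passed as `run_job(..., env=parse_env_file(".env"))`, or as `secrets=` when the file holds secrets. Note this sketch does not handle quoting or multi-line values; a dotenv library would be needed for those.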
### UV Scripts (Experimental)

Run UV scripts (Python scripts with inline dependencies) on HF infrastructure:

```python
# Run a UV script (creates a temporary repo)
>>> from huggingface_hub import run_uv_job
>>> run_uv_job("my_script.py")

# Run with a GPU
>>> run_uv_job("ml_training.py", flavor="t4-small")

# Run with dependencies
>>> run_uv_job("inference.py", dependencies=["transformers", "torch"])

# Run a script directly from a URL
>>> run_uv_job("https://huggingface.co/datasets/username/scripts/resolve/main/example.py")
```

UV scripts are Python scripts that include their dependencies directly in the file using a special comment syntax. This makes them perfect for self-contained tasks that don't require complex project setups. Learn more about UV scripts in the [UV documentation](https://docs.astral.sh/uv/guides/scripts/).
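As a concrete illustration, here is what such an inline-dependency script can look like. The `# /// script` block is uv's inline-metadata syntax (PEP 723); the script itself is a hypothetical example that uses only the standard library, so its dependency list is empty (a real script would list packages such as `["pandas"]`):

```python
# /// script
# requires-python = ">=3.9"
# dependencies = []
# ///
"""Hypothetical self-contained UV script: prints a small JSON report."""
import json
import platform

def build_report() -> str:
    # Collect a bit of runtime info so the job produces visible output in the logs.
    return json.dumps({"python_version": platform.python_version()})

if __name__ == "__main__":
    print(build_report())
```

Saved as `my_script.py`, this could be launched with `run_uv_job("my_script.py")` as shown above; uv reads the metadata block and installs any listed dependencies before running the script.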

docs/source/en/guides/overview.md

Lines changed: 9 additions & 0 deletions
```diff
@@ -127,5 +127,14 @@
   </p>
 </a>
+<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg"
+  href="./jobs">
+  <div class="w-full text-center bg-gradient-to-br from-indigo-400 to-indigo-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">
+    Jobs
+  </div><p class="text-gray-700">
+    How to run and manage compute Jobs on Hugging Face infrastructure and select the hardware?
+  </p>
+</a>
 </div>
 </div>
```
