<!--⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# Run and manage Jobs

The Hugging Face Hub provides compute for AI and data workflows via Jobs.
A job runs on Hugging Face infrastructure and is defined with a command to run (e.g. a Python command), a Docker image from Hugging Face Spaces or Docker Hub, and a hardware flavor (CPU, GPU, TPU). This guide will show you how to interact with Jobs on the Hub, in particular:

- Run a job.
- Check job status.
- Select the hardware.
- Configure environment variables and secrets.
- Run UV scripts.

If you want to run and manage a job on the Hub, your machine must be logged in. If you are not, please refer to
[this section](../quick-start#authentication). In the rest of this guide, we will assume that your machine is logged in.

## Run a Job

Run compute Jobs defined with a command and a Docker image on Hugging Face infrastructure (including GPUs and TPUs).

You can only manage Jobs that you own (under your username namespace) or from organizations in which you have write permissions.
This feature is pay-as-you-go: you only pay for the seconds you use.

[`run_job`] lets you run any command on Hugging Face's infrastructure:

```python
# Directly run Python code
>>> from huggingface_hub import run_job
>>> run_job(
...     image="python:3.12",
...     command=["python", "-c", "print('Hello from the cloud!')"],
... )

# Use GPUs without any setup
>>> run_job(
...     image="pytorch/pytorch:2.6.0-cuda12.4-cudnn9-devel",
...     command=["python", "-c", "import torch; print(torch.cuda.get_device_name())"],
...     flavor="a10g-small",
... )

# Run in an organization account
>>> run_job(
...     image="python:3.12",
...     command=["python", "-c", "print('Running in an org account')"],
...     namespace="my-org-name",
... )

# Run from Hugging Face Spaces
>>> run_job(
...     image="hf.co/spaces/lhoestq/duckdb",
...     command=["duckdb", "-c", "select 'hello world'"],
... )

# Run a Python script with `uv` (experimental)
>>> from huggingface_hub import run_uv_job
>>> run_uv_job("my_script.py")
```

<Tip>

Use [huggingface-cli jobs](./cli#huggingface-cli-jobs) to run jobs from the command line.

</Tip>

[`run_job`] returns a [`JobInfo`], which includes the URL of the Job on Hugging Face, where you can see the Job status and the logs.
Save the Job ID from [`JobInfo`] to manage the job:

```python
>>> from huggingface_hub import run_job
>>> job = run_job(
...     image="python:3.12",
...     command=["python", "-c", "print('Hello from the cloud!')"]
... )
>>> job.url
https://huggingface.co/jobs/lhoestq/687f911eaea852de79c4a50a
>>> job.id
687f911eaea852de79c4a50a
```

Jobs run in the background. The next section walks you through [`inspect_job`] to check a job's status and [`fetch_job_logs`] to view the logs.

## Check Job status

```python
# List your jobs
>>> from huggingface_hub import list_jobs
>>> jobs = list_jobs()
>>> jobs[0]
JobInfo(id='687f911eaea852de79c4a50a', created_at=datetime.datetime(2025, 7, 22, 13, 24, 46, 909000, tzinfo=datetime.timezone.utc), docker_image='python:3.12', space_id=None, command=['python', '-c', "print('Hello from the cloud!')"], arguments=[], environment={}, secrets={}, flavor='cpu-basic', status=JobStatus(stage='COMPLETED', message=None), owner=JobOwner(id='5e9ecfc04957053f60648a3e', name='lhoestq'), endpoint='https://huggingface.co', url='https://huggingface.co/jobs/lhoestq/687f911eaea852de79c4a50a')

# List your running jobs
>>> running_jobs = [job for job in list_jobs() if job.status.stage == "RUNNING"]

# Inspect the status of a job
>>> from huggingface_hub import inspect_job
>>> inspect_job(job_id=job_id)
JobInfo(id='687f911eaea852de79c4a50a', created_at=datetime.datetime(2025, 7, 22, 13, 24, 46, 909000, tzinfo=datetime.timezone.utc), docker_image='python:3.12', space_id=None, command=['python', '-c', "print('Hello from the cloud!')"], arguments=[], environment={}, secrets={}, flavor='cpu-basic', status=JobStatus(stage='COMPLETED', message=None), owner=JobOwner(id='5e9ecfc04957053f60648a3e', name='lhoestq'), endpoint='https://huggingface.co', url='https://huggingface.co/jobs/lhoestq/687f911eaea852de79c4a50a')

# View logs from a job
>>> from huggingface_hub import fetch_job_logs
>>> for log in fetch_job_logs(job_id=job_id):
...     print(log)
Hello from the cloud!

# Cancel a job
>>> from huggingface_hub import cancel_job
>>> cancel_job(job_id=job_id)
```

Check the status of multiple jobs in a loop with [`inspect_job`] to know when they're all finished:

```python
# Run multiple jobs in parallel and wait for their completion
>>> import time
>>> from huggingface_hub import inspect_job, run_job
>>> jobs = [run_job(image=image, command=command) for command in commands]  # with your own `image` and list of `commands`
>>> for job in jobs:
...     while inspect_job(job_id=job.id).status.stage not in ("COMPLETED", "ERROR"):
...         time.sleep(10)
```
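
In practice you may also want a timeout around this polling loop. The helper below is a hypothetical sketch, not part of `huggingface_hub`: the status check is injected as a callable so the example runs on its own, but in a real Job you would pass something like `lambda: inspect_job(job_id=job.id).status.stage`.

```python
import time

def wait_for_completion(get_stage, poll_interval=10.0, timeout=3600.0):
    """Poll `get_stage()` until it returns a terminal stage or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        stage = get_stage()
        if stage in ("COMPLETED", "ERROR"):
            return stage
        time.sleep(poll_interval)
    raise TimeoutError("Job did not reach a terminal stage in time")

# Demo with a canned status sequence standing in for `inspect_job` calls
stages = iter(["RUNNING", "RUNNING", "COMPLETED"])
print(wait_for_completion(lambda: next(stages), poll_interval=0.01))  # → COMPLETED
```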

## Select the hardware

Running Jobs on GPUs is useful in many cases:

- **Model Training**: Fine-tune or train models on GPUs (T4, A10G, A100) without managing infrastructure
- **Synthetic Data Generation**: Generate large-scale datasets using LLMs on powerful hardware
- **Data Processing**: Process massive datasets with high-CPU configurations for parallel workloads
- **Batch Inference**: Run offline inference on thousands of samples using optimized GPU setups
- **Experiments & Benchmarks**: Run ML experiments on consistent hardware for reproducible results
- **Development & Debugging**: Test GPU code without local CUDA setup

Run jobs on GPUs or TPUs with the `flavor` argument. For example, to run a PyTorch job on an A10G GPU:

```python
# Use an A10G GPU to check PyTorch CUDA
>>> from huggingface_hub import run_job
>>> run_job(
...     image="pytorch/pytorch:2.6.0-cuda12.4-cudnn9-devel",
...     command=["python", "-c", "import torch; print(f'This code ran with the following GPU: {torch.cuda.get_device_name()}')"],
...     flavor="a10g-small",
... )
```

Running this shows the following output:

```bash
This code ran with the following GPU: NVIDIA A10G
```

Use this to run a fine-tuning script like [trl/scripts/sft.py](https://github.com/huggingface/trl/blob/main/trl/scripts/sft.py) with UV:

```python
>>> from huggingface_hub import run_uv_job
>>> run_uv_job(
...     "sft.py",
...     script_args=["--model_name_or_path", "Qwen/Qwen2-0.5B", ...],
...     dependencies=["trl"],
...     env={"HF_TOKEN": ...},
...     flavor="a10g-small",
... )
```

Available `flavor` options:

- CPU: `cpu-basic`, `cpu-upgrade`
- GPU: `t4-small`, `t4-medium`, `l4x1`, `l4x4`, `a10g-small`, `a10g-large`, `a10g-largex2`, `a10g-largex4`, `a100-large`
- TPU: `v5e-1x1`, `v5e-2x2`, `v5e-2x4`

(updated in 07/2025 from the Hugging Face [suggested_hardware docs](https://huggingface.co/docs/hub/en/spaces-config-reference))

That's it! You're now running code on Hugging Face's infrastructure.

## Pass environment variables and secrets

You can pass environment variables to your job using `env` and `secrets`:

```python
# Pass environment variables
>>> from huggingface_hub import run_job
>>> run_job(
...     image="python:3.12",
...     command=["python", "-c", "import os; print(os.environ['FOO'], os.environ['BAR'])"],
...     env={"FOO": "foo", "BAR": "bar"},
... )
```
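
Inside the job, entries from `env` (and `secrets`) show up as ordinary environment variables, so the standard `os.environ` API reads them; `os.environ.get` with a default is the usual way to handle optional variables. A minimal local sketch (the variable is set by hand here to stand in for what the Job runtime provides):

```python
import os

# Stand-in for a variable the Job runtime would set from `env`
os.environ["FOO"] = "foo"

print(os.environ["FOO"])             # required: raises KeyError if missing
print(os.environ.get("BAR", "n/a"))  # optional: falls back to a default
```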

```python
# Pass secrets - they will be encrypted server-side
>>> from huggingface_hub import run_job
>>> run_job(
...     image="python:3.12",
...     command=["python", "-c", "import os; print(os.environ['MY_SECRET'])"],
...     secrets={"MY_SECRET": "psswrd"},
... )
```

## UV Scripts (Experimental)

Run UV scripts (Python scripts with inline dependencies) on HF infrastructure:

```python
# Run a UV script (creates a temporary repo)
>>> from huggingface_hub import run_uv_job
>>> run_uv_job("my_script.py")

# Run with a GPU
>>> run_uv_job("ml_training.py", flavor="t4-small")

# Run with dependencies
>>> run_uv_job("inference.py", dependencies=["transformers", "torch"])

# Run a script directly from a URL
>>> run_uv_job("https://huggingface.co/datasets/username/scripts/resolve/main/example.py")
```

UV scripts are Python scripts that include their dependencies directly in the file using a special comment syntax. This makes them perfect for self-contained tasks that don't require complex project setups. Learn more about UV scripts in the [UV documentation](https://docs.astral.sh/uv/guides/scripts/).
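
As an illustration of that syntax, a minimal UV script declares its requirements in a `# /// script` comment block at the top of the file. The dependency list below is empty because the body only uses the standard library; a real script would list PyPI packages there:

```python
# /// script
# requires-python = ">=3.10"
# dependencies = []
# ///
# A real script would list PyPI packages, e.g. dependencies = ["transformers"]
import statistics

values = [0.5, 1.5, 2.5, 3.5]
print(f"mean={statistics.mean(values)}")  # → mean=2.0
```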