[Jobs] Add scheduled jobs api #3306
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
syntax-wise, lgtm
Thanks! I did a first round of review with a couple of comments, mostly nits.
Quick UX question: what's the advantage of adding a dedicated scheduled subcommand vs allowing scheduling directly on hf jobs run with a --schedule flag, e.g.:
hf jobs run --schedule @hourly python:3.12 python -c 'print("This runs every hour!")'
Not a strong opinion on that, but I think this gives one consistent way to use and think about hf jobs: call hf jobs run for both immediate and recurring runs, and just toggle --schedule when you want it scheduled.
repo_type="dataset", | ||
).oid | ||
|
||
script_url = f"https://huggingface.co/datasets/{repo_id}/resolve/{commit_hash}/{filename}" |
script_url = f"https://huggingface.co/datasets/{repo_id}/resolve/{commit_hash}/{filename}" | |
script_url = f"{self.endpoint}/datasets/{repo_id}/resolve/{commit_hash}/{filename}" |
).oid

script_url = f"https://huggingface.co/datasets/{repo_id}/resolve/{commit_hash}/{filename}"
repo_url = f"https://huggingface.co/datasets/{repo_id}"
repo_url = f"https://huggingface.co/datasets/{repo_id}" | |
repo_url = f"{self.endpoint}/datasets/{repo_id}" |
inspect_scheduled_job = api.inspect_scheduled_job
delete_scheduled_job = api.delete_scheduled_job
suspend_scheduled_job = api.suspend_scheduled_job
resume_scheduled_job = api.suspend_scheduled_job
Suggested change:
- resume_scheduled_job = api.suspend_scheduled_job
+ resume_scheduled_job = api.resume_scheduled_job
repo_id = _repo
if "/" not in repo_id:
    repo_id = f"{namespace}/{repo_id}"
repo_id = _repo
The last repo_id = _repo undoes the namespace prefixing.
Suggested change:
  repo_id = _repo
  if "/" not in repo_id:
      repo_id = f"{namespace}/{repo_id}"
- repo_id = _repo
Refer to: https://huggingface.co/docs/huggingface_hub/quick-start#authentication.
"""
if namespace is None:
    namespace = whoami(token=token)["name"]
Suggested change:
- namespace = whoami(token=token)["name"]
+ namespace = self.whoami(token=token)["name"]
last_job = kwargs.get("lastJob") or kwargs.get("last_job") | ||
self.last_job = LastJobInfo(**last_job) if last_job else None | ||
next_job_run_at = kwargs.get("nextJobRunAt") or kwargs.get("next_job_run_at") | ||
self.next_job_run_at = parse_datetime(str(next_job_run_at)) |
Suggested change:
- self.next_job_run_at = parse_datetime(str(next_job_run_at))
+ self.next_job_run_at = parse_datetime(str(next_job_run_at)) if next_job_run_at else None
f"{self.endpoint}/api/scheduled-jobs/{namespace}/{scheduled_job_id}", | ||
headers=self._build_hf_headers(token=token), | ||
) | ||
response.raise_for_status() |
Suggested change:
- response.raise_for_status()
+ hf_raise_for_status(response)
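For reference, hf_raise_for_status is the error-handling helper exposed in huggingface_hub.utils. A minimal usage sketch (the URL below is just an illustrative public endpoint):

```python
from huggingface_hub.utils import get_session, hf_raise_for_status

# hf_raise_for_status wraps response.raise_for_status() and raises an
# HfHubHTTPError enriched with server-side details (request id, error message).
response = get_session().get("https://huggingface.co/api/models?limit=1")
hf_raise_for_status(response)
```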
>>> hf jobs scheduled run "*/5 * * * *" python:3.12 python -c 'print("This runs every 5 minutes!")'

# Schedule with GPU
>>> hf jobs schefuled run @hourly --flavor a10g-small pytorch/pytorch:2.6.0-cuda12.4-cudnn9-devel \
Suggested change:
- >>> hf jobs schefuled run @hourly --flavor a10g-small pytorch/pytorch:2.6.0-cuda12.4-cudnn9-devel \
+ >>> hf jobs scheduled run @hourly --flavor a10g-small pytorch/pytorch:2.6.0-cuda12.4-cudnn9-devel \
One of "@annually", "@yearly", "@monthly", "@weekly", "@daily", "@hourly", or a | ||
CRON schedule expression (e.g., '0 9 * * 1' for 9 AM every Monday). | ||
suspend (`bool` or `None`): | ||
Whether the the scheduled job is suspended (paused). |
Suggested change:
- Whether the the scheduled job is suspended (paused).
+ Whether the scheduled job is suspended (paused).
@@ -10453,6 +10453,525 @@ def run_uv_job(
    token=token,
)

def schedule_job(
personal preference here, "create" is explicit and pairs naturally with list, inspect and delete
Suggested change:
- def schedule_job(
+ def create_scheduled_job(
(x-linking moon PR https://github.com/huggingface-internal/moon-landing/pull/14655)
Thanks for working on this @lhoestq!
I find it a bit sad to have 2 different routes to create/update Jobs and Scheduled Jobs, as it looks to me to be pretty much the same API with additional parameters. Let's keep it as it is since we are mimicking the server API, but at least let's factorize some of the logic to generate the payloads, as there is quite some custom logic (especially uv scripts and timeoutSeconds).
suspend: bool = False,
concurrency: bool = False,
Suggested change:
- suspend: bool = False,
- concurrency: bool = False,
+ suspend: Optional[bool] = None,
+ concurrency: Optional[bool] = None,
Better to default to None and pass a value in the body only if a non-None value is given. This way we let the server handle the default value.
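A minimal sketch of the pattern being suggested, assuming the request body is assembled in a dict (the helper name and payload keys below are illustrative, not from the PR):

```python
from typing import Any, Dict, Optional


def build_schedule_payload(
    schedule: str,
    job_spec: Dict[str, Any],
    suspend: Optional[bool] = None,
    concurrency: Optional[bool] = None,
) -> Dict[str, Any]:
    # Only include the optional flags when the caller set them explicitly,
    # so the server-side defaults apply otherwise.
    payload: Dict[str, Any] = {"schedule": schedule, "jobSpec": job_spec}
    if suspend is not None:
        payload["suspend"] = suspend
    if concurrency is not None:
        payload["concurrency"] = concurrency
    return payload
```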
suspend (`bool`, *optional*):
    If True, the scheduled Job is suspended (paused).
Does that mean one can create a cron job and pause it immediately?
yep exactly
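To illustrate the behavior discussed here, a sketch assuming the Python API keeps the method and parameter names from this thread (schedule_job, suspend, resume_scheduled_job; the exact signatures may differ in the merged version):

```python
from huggingface_hub import HfApi

api = HfApi()

# Create the scheduled job in a paused state: no runs are triggered until it
# is resumed. Method/parameter names follow the PR discussion and may change.
scheduled = api.schedule_job(
    image="python:3.12",
    command=["python", "-c", "print('hello')"],
    schedule="@hourly",
    suspend=True,
)

# Resume it later (assumes the returned object exposes an `id` attribute).
api.resume_scheduled_job(scheduled.id)
```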
If True, the scheduled Job is suspended (paused).

concurrency (`bool`, *optional*):
    If True, multiple instances of this Job can run concurrently.
Suggested change:
- If True, multiple instances of this Job can run concurrently.
+ If True, multiple instances of this Job can run concurrently. Defaults to False.
flavor = SpaceHardware.CPU_BASIC

# prepare job spec to send to HF Jobs API
job_spec: Dict[str, Any] = {
It would be good to have a centralized _create_job_spec. I feel that it's a 1:1 duplicate of the run_job implementation.
(same comment for the create uv schedule one)
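A rough sketch of the kind of shared helper being suggested; the name _create_job_spec comes from the comment above, while the parameters and payload keys (other than flavor and timeoutSeconds, which appear in this PR) are assumptions:

```python
from typing import Any, Dict, List, Optional


def _create_job_spec(
    *,
    image: str,
    command: List[str],
    env: Optional[Dict[str, str]] = None,
    secrets: Optional[Dict[str, str]] = None,
    flavor: Optional[str] = None,
    timeout: Optional[float] = None,
) -> Dict[str, Any]:
    # Shared payload builder so run_job / schedule_job (and their uv variants)
    # do not each duplicate this logic. Keys here are illustrative.
    job_spec: Dict[str, Any] = {
        "dockerImage": image,
        "command": command,
        "environment": env or {},
        "secrets": secrets or {},
        "flavor": flavor or "cpu-basic",  # mirrors SpaceHardware.CPU_BASIC
    }
    if timeout is not None:
        job_spec["timeoutSeconds"] = int(timeout)
    return job_spec
```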
get_session().delete(
    f"{self.endpoint}/api/scheduled-jobs/{namespace}/{scheduled_job_id}",
    headers=self._build_hf_headers(token=token),
).raise_for_status()
Suggested change:
- get_session().delete(
-     f"{self.endpoint}/api/scheduled-jobs/{namespace}/{scheduled_job_id}",
-     headers=self._build_hf_headers(token=token),
- ).raise_for_status()
+ response = get_session().delete(
+     f"{self.endpoint}/api/scheduled-jobs/{namespace}/{scheduled_job_id}",
+     headers=self._build_hf_headers(token=token),
+ )
+ hf_raise_for_status(response)
Well, I originally had ... Here is another suggestion using ...
notes:
no strong opinion on the ...
I'm okay to ship like this! Let's keep ...
I went with this syntax, let me know what you think:
and
TODO: