
lhoestq
Member

@lhoestq lhoestq commented Aug 14, 2025

I went with this syntax, let me know what you think:

hf jobs scheduled run @hourly ubuntu echo hello world
hf jobs scheduled run "0 * * * *" ubuntu echo hello world
hf jobs scheduled ps -a
hf jobs scheduled inspect <id>
hf jobs scheduled delete <id>
hf jobs scheduled suspend <id>
hf jobs scheduled resume <id>
hf jobs scheduled uv run @weekly train.py

and

schedule_job(image="ubuntu", command=["echo", "hello world"], schedule="@hourly")
schedule_job(image="ubuntu", command=["echo", "hello world"], schedule="0 * * * *")
list_scheduled_jobs()
inspect_scheduled_job(scheduled_job_id)
delete_scheduled_job(scheduled_job_id)
suspend_scheduled_job(scheduled_job_id)
resume_scheduled_job(scheduled_job_id)
schedule_uv_job("train.py", schedule="@weekly")

TODO:

  • tests
  • docs

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@lhoestq lhoestq marked this pull request as ready for review August 20, 2025 13:41
@lhoestq lhoestq requested a review from Wauplin August 20, 2025 13:41
Member

@julien-c julien-c left a comment

syntax-wise, lgtm

Contributor

@hanouticelina hanouticelina left a comment

Thanks! I did a first round of review with a couple of comments, mostly nits.
Quick UX question: what's the advantage of adding a dedicated scheduled subcommand vs allowing scheduling directly on hf jobs run with a --schedule flag, e.g:

hf jobs run --schedule @hourly python:3.12 python -c 'print("This runs every hour!")'

Not a strong opinion on that, but I think this gives one consistent way to use and think about hf jobs: call hf jobs run for both immediate and recurring runs, and just pass --schedule when you want it scheduled.

repo_type="dataset",
).oid

script_url = f"https://huggingface.co/datasets/{repo_id}/resolve/{commit_hash}/{filename}"
Contributor

Suggested change
script_url = f"https://huggingface.co/datasets/{repo_id}/resolve/{commit_hash}/{filename}"
script_url = f"{self.endpoint}/datasets/{repo_id}/resolve/{commit_hash}/{filename}"

).oid

script_url = f"https://huggingface.co/datasets/{repo_id}/resolve/{commit_hash}/{filename}"
repo_url = f"https://huggingface.co/datasets/{repo_id}"
Contributor

Suggested change
repo_url = f"https://huggingface.co/datasets/{repo_id}"
repo_url = f"{self.endpoint}/datasets/{repo_id}"
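Both suggestions build the URLs from self.endpoint rather than a hardcoded https://huggingface.co, so script and repo links point at whichever Hub endpoint the client was configured with. A minimal sketch of that URL construction as a standalone function; the name build_script_urls is hypothetical (in the PR these f-strings live inline), and the trailing-slash handling is an extra convenience not present in the suggestion:

```python
from typing import Tuple


def build_script_urls(endpoint: str, repo_id: str, commit_hash: str, filename: str) -> Tuple[str, str]:
    """Return (script_url, repo_url) for a script uploaded to a dataset repo.

    Hypothetical helper mirroring the suggested f-strings: the base URL comes
    from the configured endpoint instead of being hardcoded.
    """
    base = endpoint.rstrip("/")  # tolerate a trailing slash in the endpoint
    script_url = f"{base}/datasets/{repo_id}/resolve/{commit_hash}/{filename}"
    repo_url = f"{base}/datasets/{repo_id}"
    return script_url, repo_url


# Example with a made-up endpoint and commit hash.
script_url, repo_url = build_script_urls("https://hub.example.com/", "alice/uv-scripts", "abc123", "train.py")
print(script_url)
print(repo_url)
```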

inspect_scheduled_job = api.inspect_scheduled_job
delete_scheduled_job = api.delete_scheduled_job
suspend_scheduled_job = api.suspend_scheduled_job
resume_scheduled_job = api.suspend_scheduled_job
Contributor

Suggested change
resume_scheduled_job = api.suspend_scheduled_job
resume_scheduled_job = api.resume_scheduled_job

Comment on lines +10874 to +10877
repo_id = _repo
if "/" not in repo_id:
repo_id = f"{namespace}/{repo_id}"
repo_id = _repo
Contributor

The last repo_id = _repo line undoes the namespace prefixing.

Suggested change
repo_id = _repo
if "/" not in repo_id:
repo_id = f"{namespace}/{repo_id}"
repo_id = _repo
repo_id = _repo
if "/" not in repo_id:
repo_id = f"{namespace}/{repo_id}"
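With the trailing repo_id = _repo removed, the resolution becomes a small pure function: a bare name gets the namespace prefix, while an explicit owner/name is left alone. A sketch using the hypothetical name resolve_repo_id (the PR does this inline, with _repo and namespace coming from the enclosing method):

```python
def resolve_repo_id(repo: str, namespace: str) -> str:
    """Prefix a bare repo name with `namespace`; keep "owner/name" unchanged.

    Hypothetical helper: `repo` is read once, before the prefixing, so
    nothing overwrites the prefixed value afterwards.
    """
    repo_id = repo
    if "/" not in repo_id:
        repo_id = f"{namespace}/{repo_id}"
    return repo_id


print(resolve_repo_id("uv-scripts", "alice"))      # alice/uv-scripts
print(resolve_repo_id("bob/uv-scripts", "alice"))  # bob/uv-scripts
```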

Refer to: https://huggingface.co/docs/huggingface_hub/quick-start#authentication.
"""
if namespace is None:
namespace = whoami(token=token)["name"]
Contributor

Suggested change
namespace = whoami(token=token)["name"]
namespace = self.whoami(token=token)["name"]

last_job = kwargs.get("lastJob") or kwargs.get("last_job")
self.last_job = LastJobInfo(**last_job) if last_job else None
next_job_run_at = kwargs.get("nextJobRunAt") or kwargs.get("next_job_run_at")
self.next_job_run_at = parse_datetime(str(next_job_run_at))
Contributor

Suggested change
self.next_job_run_at = parse_datetime(str(next_job_run_at))
self.next_job_run_at = parse_datetime(str(next_job_run_at)) if next_job_run_at else None
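The guard matters because str(None) is the string "None", which a datetime parser will reject, so deserializing a scheduled job without a nextJobRunAt field would raise instead of yielding None. A stdlib-only sketch of the same pattern; parse_optional_datetime is a hypothetical stand-in for huggingface_hub's parse_datetime and uses datetime.fromisoformat instead:

```python
from datetime import datetime
from typing import Any, Optional


def parse_optional_datetime(value: Any) -> Optional[datetime]:
    """Parse an ISO 8601 timestamp, or return None when the field is missing.

    Without the `if not value` guard, None would become the string "None"
    and the parser would raise ValueError.
    """
    if not value:
        return None
    text = str(value)
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11,
    # so rewrite it as an explicit UTC offset first.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    return datetime.fromisoformat(text)


# Same camelCase/snake_case fallback as in the PR, on made-up payloads.
for payload in ({"nextJobRunAt": "2025-09-01T11:00:00Z"}, {"lastJob": None}):
    next_job_run_at = payload.get("nextJobRunAt") or payload.get("next_job_run_at")
    print(parse_optional_datetime(next_job_run_at))
```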

f"{self.endpoint}/api/scheduled-jobs/{namespace}/{scheduled_job_id}",
headers=self._build_hf_headers(token=token),
)
response.raise_for_status()
Contributor

Suggested change
response.raise_for_status()
hf_raise_for_status(response)

>>> hf jobs scheduled run "*/5 * * * *" python:3.12 python -c 'print("This runs every 5 minutes!")'

# Schedule with GPU
>>> hf jobs schefuled run @hourly --flavor a10g-small pytorch/pytorch:2.6.0-cuda12.4-cudnn9-devel \
Contributor

Suggested change
>>> hf jobs schefuled run @hourly --flavor a10g-small pytorch/pytorch:2.6.0-cuda12.4-cudnn9-devel \
>>> hf jobs scheduled run @hourly --flavor a10g-small pytorch/pytorch:2.6.0-cuda12.4-cudnn9-devel \

One of "@annually", "@yearly", "@monthly", "@weekly", "@daily", "@hourly", or a
CRON schedule expression (e.g., '0 9 * * 1' for 9 AM every Monday).
suspend (`bool` or `None`):
Whether the the scheduled job is suspended (paused).
Contributor

Suggested change
Whether the the scheduled job is suspended (paused).
Whether the scheduled job is suspended (paused).
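The docstring accepts either a named alias or a five-field cron expression. The aliases match the usual cron shorthands, consistent with the PR description showing @hourly and "0 * * * *" side by side. A minimal client-side normalizer, assuming the Jobs backend expands the aliases the same way standard cron does; normalize_schedule and SCHEDULE_ALIASES are hypothetical names, and the check validates only the shape of each field, not its numeric range:

```python
import re

# Standard cron shorthands and the five-field expressions they abbreviate.
# Assumption: the Jobs backend interprets them the same way.
SCHEDULE_ALIASES = {
    "@annually": "0 0 1 1 *",
    "@yearly": "0 0 1 1 *",
    "@monthly": "0 0 1 * *",
    "@weekly": "0 0 * * 0",
    "@daily": "0 0 * * *",
    "@hourly": "0 * * * *",
}

# Rough per-field shape: "*", numbers, ranges, comma lists, optional "/step".
# Named values such as "MON" are not accepted by this sketch.
_CRON_FIELD = re.compile(r"^(?:\*|\d+(?:-\d+)?)(?:/\d+)?(?:,(?:\*|\d+(?:-\d+)?)(?:/\d+)?)*$")


def normalize_schedule(schedule: str) -> str:
    """Return a five-field cron expression, expanding the @-aliases."""
    schedule = schedule.strip()
    if schedule in SCHEDULE_ALIASES:
        return SCHEDULE_ALIASES[schedule]
    fields = schedule.split()
    if len(fields) != 5 or not all(_CRON_FIELD.match(field) for field in fields):
        raise ValueError(f"Invalid schedule: {schedule!r}")
    return " ".join(fields)


print(normalize_schedule("@hourly"))
print(normalize_schedule("0 9 * * 1"))  # 9 AM every Monday, from the docstring
```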

@@ -10453,6 +10453,525 @@ def run_uv_job(
token=token,
)

def schedule_job(
Contributor

personal preference here, "create" is explicit and pairs naturally with list, inspect and delete

Suggested change
def schedule_job(
def create_scheduled_job(

@Wauplin
Contributor

Wauplin commented Aug 28, 2025

Contributor

@Wauplin Wauplin left a comment

Thanks for working on this @lhoestq!

I find it a bit sad to have 2 different routes to create/update Jobs and Schedule Jobs as it looks to me to be pretty much the same API with additional parameters. Let's keep it as is since we are mimicking the server API, but let's at least factor out the logic that generates the payloads, as there is quite a bit of custom logic there (especially for uv scripts and timeoutSeconds).

Comment on lines +10462 to +10463
suspend: bool = False,
concurrency: bool = False,
Contributor

Suggested change
suspend: bool = False,
concurrency: bool = False,
suspend: Optional[bool] = None,
concurrency: Optional[bool] = None,

Better to default to None and include the value in the body only if a non-None value is passed. This way, we let the server handle the default value.
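In practice this means the request body carries only the options the caller actually set: None means "let the server decide", while an explicit False still reaches the server. A minimal sketch, assuming suspend and concurrency travel as top-level body fields under those names; with_optional_fields is a hypothetical helper:

```python
from typing import Any, Dict


def with_optional_fields(body: Dict[str, Any], **options: Any) -> Dict[str, Any]:
    """Return a copy of `body` extended with every option that is not None.

    None means "not specified, let the server decide"; an explicit False is
    kept because it is a real choice.
    """
    return {**body, **{key: value for key, value in options.items() if value is not None}}


# Defaults left as None: neither key is sent.
print(with_optional_fields({"schedule": "@hourly"}, suspend=None, concurrency=None))
# {'schedule': '@hourly'}

# Created already paused, with concurrency explicitly disabled.
print(with_optional_fields({"schedule": "@hourly"}, suspend=True, concurrency=False))
# {'schedule': '@hourly', 'suspend': True, 'concurrency': False}
```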

Comment on lines +10487 to +10488
suspend (`bool`, *optional*):
If True, the scheduled Job is suspended (paused).
Contributor

Does that mean one can create a cron job and pause it immediately?

Member Author

yep exactly

If True, the scheduled Job is suspended (paused).

concurrency (`bool`, *optional*):
If True, multiple instances of this Job can run concurrently.
Contributor

Suggested change
If True, multiple instances of this Job can run concurrently.
If True, multiple instances of this Job can run concurrently. Defaults to False.

flavor = SpaceHardware.CPU_BASIC

# prepare job spec to send to HF Jobs API
job_spec: Dict[str, Any] = {
Contributor

It would be good to have a centralized _create_job_spec; this looks like a 1:1 duplicate of the run_job implementation.

Contributor

(same comment for the create uv schedule one)

Comment on lines +10686 to +10689
get_session().delete(
f"{self.endpoint}/api/scheduled-jobs/{namespace}/{scheduled_job_id}",
headers=self._build_hf_headers(token=token),
).raise_for_status()
Contributor

Suggested change
get_session().delete(
f"{self.endpoint}/api/scheduled-jobs/{namespace}/{scheduled_job_id}",
headers=self._build_hf_headers(token=token),
).raise_for_status()
response = get_session().delete(
f"{self.endpoint}/api/scheduled-jobs/{namespace}/{scheduled_job_id}",
headers=self._build_hf_headers(token=token),
)
hf_raise_for_status(response)

@lhoestq
Member Author

lhoestq commented Sep 1, 2025

> Quick UX question: what's the advantage of adding a dedicated scheduled subcommand vs allowing scheduling directly on hf jobs run with a --schedule flag, e.g:

> I find it a bit sad to have 2 different routes to create/update Jobs and Schedule Jobs as it looks to me to be pretty much the same API with additional parameters.

Well, hf jobs run has --detach, which doesn't make sense for scheduled jobs. Still, I'm not against hf jobs run --schedule ... since it feels natural anyway.
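If scheduling were folded into hf jobs run, the --detach conflict could be enforced by the argument parser itself. A toy argparse sketch of the proposed form; this is not the actual hf CLI code, and the returned strings are placeholders for the real API calls:

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Toy parser for the proposed `hf jobs run --schedule ...` form."""
    parser = argparse.ArgumentParser(prog="hf jobs run")
    mode = parser.add_mutually_exclusive_group()
    # --detach only makes sense for a one-off run, so combining it with
    # --schedule is rejected by argparse (exit status 2).
    mode.add_argument("--detach", action="store_true", help="return immediately")
    mode.add_argument("--schedule", help='e.g. "@hourly" or "0 * * * *"')
    parser.add_argument("image")
    # Everything after the image (including flags such as python -c) is the command.
    parser.add_argument("command", nargs=argparse.REMAINDER)
    return parser


def dispatch(argv: List[str]) -> str:
    """Describe the call that would be made (placeholder for the API call)."""
    args = build_parser().parse_args(argv)
    command = " ".join(args.command)
    if args.schedule is not None:
        return f"create scheduled job [{args.schedule}]: {args.image} {command}"
    suffix = " (detached)" if args.detach else ""
    return f"run job{suffix}: {args.image} {command}"


print(dispatch(["--schedule", "@hourly", "ubuntu", "echo", "hello", "world"]))
print(dispatch(["--detach", "ubuntu", "echo", "hello", "world"]))
```

The same group would make `--schedule @hourly --detach` fail with a usage error instead of silently ignoring one of the flags.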

I originally added the scheduled subcommand for the delete/suspend/resume commands, which only exist for scheduled jobs, and also because inspect/ps don't return the same format for jobs and scheduled jobs.

Here is another suggestion that uses --schedule and drops the subcommand. Note that delete/suspend/resume are only for scheduled jobs, and ps/inspect also include scheduled jobs:

hf jobs run --schedule @hourly ubuntu echo hello world
hf jobs run --schedule "0 * * * *" ubuntu echo hello world
hf jobs ps
hf jobs inspect <id>
hf jobs delete <id>
hf jobs suspend <id>
hf jobs resume <id>
hf jobs uv run --schedule @weekly train.py

notes:

  • for inspect I can auto-infer if it's a job or a scheduled job
  • for ps, I was thinking of merging the lists of jobs and scheduled jobs (instead of having two distinct lists) and using the STATUS column for the status of the last job in the case of a schedule. Let me know what you think!
ID                       IMAGE  COMMAND  CREATED             STATUS    SCHEDULE
------------------------ ------ -------- ------------------- --------- -----------
68b577d0c3308bfc7c640e72 ubuntu echo hey 2025-09-01 10:38:19 READY     @hourly
68b5777f37a21629b868b005 ubuntu echo hey 2025-09-01 10:38:24 RUNNING   * * * * *
68a6ee24716242753cc8591e ubuntu echo hey 2025-09-01 10:38:37 COMPLETED */5 * * * *
68b57787c3308bfc7c640e71 ubuntu echo hey 2025-09-01 10:45:59 RUNNING
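The merged ps view can be sketched as concatenating both lists, sorting by creation time, and rendering one table in which a scheduled job's STATUS holds its last run's status and SCHEDULE is blank for one-off jobs. JobRow, merge_rows and render_table are hypothetical names, and the sample rows are copied from the table above:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class JobRow:
    id: str
    image: str
    command: str
    created: str  # "YYYY-MM-DD HH:MM:SS", so string order is chronological
    status: str   # for a scheduled job: status of its last run
    schedule: Optional[str] = None  # None for one-off jobs


def merge_rows(jobs: List[JobRow], scheduled: List[JobRow]) -> List[JobRow]:
    """Combine one-off and scheduled jobs into a single list, oldest first."""
    return sorted(jobs + scheduled, key=lambda row: row.created)


def render_table(rows: List[JobRow]) -> str:
    headers = ["ID", "IMAGE", "COMMAND", "CREATED", "STATUS", "SCHEDULE"]
    cells = [headers] + [
        [r.id, r.image, r.command, r.created, r.status, r.schedule or ""] for r in rows
    ]
    widths = [max(len(row[i]) for row in cells) for i in range(len(headers))]
    lines = [" ".join(c.ljust(w) for c, w in zip(row, widths)).rstrip() for row in cells]
    lines.insert(1, " ".join("-" * w for w in widths))  # dashed rule under the header
    return "\n".join(lines)


jobs = [JobRow("68b57787c3308bfc7c640e71", "ubuntu", "echo hey", "2025-09-01 10:45:59", "RUNNING")]
scheduled = [
    JobRow("68b577d0c3308bfc7c640e72", "ubuntu", "echo hey", "2025-09-01 10:38:19", "READY", "@hourly"),
    JobRow("68a6ee24716242753cc8591e", "ubuntu", "echo hey", "2025-09-01 10:38:37", "COMPLETED", "*/5 * * * *"),
]
print(render_table(merge_rows(jobs, scheduled)))
```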

@julien-c
Member

julien-c commented Sep 1, 2025

no strong opinion on the --schedule flag thing. I'm not 100% sure this will be hugely used anyways so maybe we ship like this?

@hanouticelina
Contributor

i'm okay to ship like this! let's keep hf jobs scheduled
