[Jobs] Add scheduled jobs api #3306
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
syntax-wise, lgtm
Thanks! I did a first round of review with a couple of comments, mostly nits.
Quick UX question: what's the advantage of adding a dedicated scheduled subcommand vs allowing scheduling directly on hf jobs run with a --schedule flag, e.g.:
hf jobs run --schedule @hourly python:3.12 python -c 'print("This runs every hour!")'
Not a strong opinion on that, but I think this gives one consistent way to use and think about hf jobs: call hf jobs run for both immediate and recurring runs, and just toggle --schedule when you want it scheduled.
repo_type="dataset", | ||
).oid | ||
|
||
script_url = f"https://huggingface.co/datasets/{repo_id}/resolve/{commit_hash}/{filename}" |
script_url = f"https://huggingface.co/datasets/{repo_id}/resolve/{commit_hash}/{filename}" | |
script_url = f"{self.endpoint}/datasets/{repo_id}/resolve/{commit_hash}/{filename}" |
).oid

script_url = f"https://huggingface.co/datasets/{repo_id}/resolve/{commit_hash}/{filename}"
repo_url = f"https://huggingface.co/datasets/{repo_id}"
repo_url = f"https://huggingface.co/datasets/{repo_id}" | |
repo_url = f"{self.endpoint}/datasets/{repo_id}" |
inspect_scheduled_job = api.inspect_scheduled_job
delete_scheduled_job = api.delete_scheduled_job
suspend_scheduled_job = api.suspend_scheduled_job
resume_scheduled_job = api.suspend_scheduled_job
Suggested change:
- resume_scheduled_job = api.suspend_scheduled_job
+ resume_scheduled_job = api.resume_scheduled_job
repo_id = _repo
if "/" not in repo_id:
    repo_id = f"{namespace}/{repo_id}"
repo_id = _repo
The last repo_id = _repo undoes the namespace prefixing.
Suggested change:
  repo_id = _repo
  if "/" not in repo_id:
      repo_id = f"{namespace}/{repo_id}"
- repo_id = _repo
Refer to: https://huggingface.co/docs/huggingface_hub/quick-start#authentication.
"""
if namespace is None:
    namespace = whoami(token=token)["name"]
Suggested change:
- namespace = whoami(token=token)["name"]
+ namespace = self.whoami(token=token)["name"]
last_job = kwargs.get("lastJob") or kwargs.get("last_job") | ||
self.last_job = LastJobInfo(**last_job) if last_job else None | ||
next_job_run_at = kwargs.get("nextJobRunAt") or kwargs.get("next_job_run_at") | ||
self.next_job_run_at = parse_datetime(str(next_job_run_at)) |
Suggested change:
- self.next_job_run_at = parse_datetime(str(next_job_run_at))
+ self.next_job_run_at = parse_datetime(str(next_job_run_at)) if next_job_run_at else None
f"{self.endpoint}/api/scheduled-jobs/{namespace}/{scheduled_job_id}", | ||
headers=self._build_hf_headers(token=token), | ||
) | ||
response.raise_for_status() |
Suggested change:
- response.raise_for_status()
+ hf_raise_for_status(response)
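For reference, hf_raise_for_status is the error-handling helper exposed in huggingface_hub.utils. A minimal usage sketch (the URL below is just an illustrative public endpoint):

```python
from huggingface_hub.utils import get_session, hf_raise_for_status

# hf_raise_for_status wraps response.raise_for_status() and raises an
# HfHubHTTPError enriched with server-side details (request id, error message).
response = get_session().get("https://huggingface.co/api/models?limit=1")
hf_raise_for_status(response)
```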
>>> hf jobs scheduled run "*/5 * * * *" python:3.12 python -c 'print("This runs every 5 minutes!")'

# Schedule with GPU
>>> hf jobs schefuled run @hourly --flavor a10g-small pytorch/pytorch:2.6.0-cuda12.4-cudnn9-devel \
Suggested change:
- >>> hf jobs schefuled run @hourly --flavor a10g-small pytorch/pytorch:2.6.0-cuda12.4-cudnn9-devel \
+ >>> hf jobs scheduled run @hourly --flavor a10g-small pytorch/pytorch:2.6.0-cuda12.4-cudnn9-devel \
One of "@annually", "@yearly", "@monthly", "@weekly", "@daily", "@hourly", or a | ||
CRON schedule expression (e.g., '0 9 * * 1' for 9 AM every Monday). | ||
suspend (`bool` or `None`): | ||
Whether the the scheduled job is suspended (paused). |
Suggested change:
- Whether the the scheduled job is suspended (paused).
+ Whether the scheduled job is suspended (paused).
@@ -10453,6 +10453,525 @@ def run_uv_job(
    token=token,
)

def schedule_job(
personal preference here, "create" is explicit and pairs naturally with list, inspect and delete
Suggested change:
- def schedule_job(
+ def create_scheduled_job(
(x-linking moon PR https://github.com/huggingface-internal/moon-landing/pull/14655)
Thanks for working on this @lhoestq!
I find it a bit sad to have 2 different routes to create/update Jobs and Scheduled Jobs, as it looks to me to be pretty much the same API with additional parameters. Let's keep it as it is since we are mimicking the server API, but at least let's factorize some of the logic to generate the payloads, as there is quite some custom logic (especially uv scripts and timeoutSeconds).
suspend: bool = False,
concurrency: bool = False,
Suggested change:
- suspend: bool = False,
- concurrency: bool = False,
+ suspend: Optional[bool] = None,
+ concurrency: Optional[bool] = None,
Better to default to None and pass a value in the body only if a non-None value is given. This way we let the server handle the default value.
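A minimal sketch of the pattern being suggested, assuming the request body is assembled in a dict (the helper name and payload keys below are illustrative, not from the PR):

```python
from typing import Any, Dict, Optional


def build_schedule_payload(
    schedule: str,
    job_spec: Dict[str, Any],
    suspend: Optional[bool] = None,
    concurrency: Optional[bool] = None,
) -> Dict[str, Any]:
    # Only include the optional flags when the caller set them explicitly,
    # so the server-side defaults apply otherwise.
    payload: Dict[str, Any] = {"schedule": schedule, "jobSpec": job_spec}
    if suspend is not None:
        payload["suspend"] = suspend
    if concurrency is not None:
        payload["concurrency"] = concurrency
    return payload
```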
suspend (`bool`, *optional*):
    If True, the scheduled Job is suspended (paused).
Does that mean one can create a cron job and pause it immediately?
yep exactly
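To illustrate the behavior discussed here, a sketch assuming the Python API keeps the method and parameter names from this thread (schedule_job, suspend, resume_scheduled_job; the exact signatures may differ in the merged version):

```python
from huggingface_hub import HfApi

api = HfApi()

# Create the scheduled job in a paused state: no runs are triggered until it
# is resumed. Method/parameter names follow the PR discussion and may change.
scheduled = api.schedule_job(
    image="python:3.12",
    command=["python", "-c", "print('hello')"],
    schedule="@hourly",
    suspend=True,
)

# Resume it later (assumes the returned object exposes an `id` attribute).
api.resume_scheduled_job(scheduled.id)
```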
If True, the scheduled Job is suspended (paused).

concurrency (`bool`, *optional*):
    If True, multiple instances of this Job can run concurrently.
Suggested change:
- If True, multiple instances of this Job can run concurrently.
+ If True, multiple instances of this Job can run concurrently. Defaults to False.
flavor = SpaceHardware.CPU_BASIC

# prepare job spec to send to HF Jobs API
job_spec: Dict[str, Any] = {
It would be good to have a centralized _create_job_spec. I feel that it's a 1:1 duplicate of the run_job implementation.
(same comment for the create uv schedule one)
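A rough sketch of the kind of shared helper being suggested; the name _create_job_spec comes from the comment above, while the parameters and payload keys (other than flavor and timeoutSeconds, which appear in this PR) are assumptions:

```python
from typing import Any, Dict, List, Optional


def _create_job_spec(
    *,
    image: str,
    command: List[str],
    env: Optional[Dict[str, str]] = None,
    secrets: Optional[Dict[str, str]] = None,
    flavor: Optional[str] = None,
    timeout: Optional[float] = None,
) -> Dict[str, Any]:
    # Shared payload builder so run_job / schedule_job (and their uv variants)
    # do not each duplicate this logic. Keys here are illustrative.
    job_spec: Dict[str, Any] = {
        "dockerImage": image,
        "command": command,
        "environment": env or {},
        "secrets": secrets or {},
        "flavor": flavor or "cpu-basic",  # mirrors SpaceHardware.CPU_BASIC
    }
    if timeout is not None:
        job_spec["timeoutSeconds"] = int(timeout)
    return job_spec
```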
get_session().delete(
    f"{self.endpoint}/api/scheduled-jobs/{namespace}/{scheduled_job_id}",
    headers=self._build_hf_headers(token=token),
).raise_for_status()
Suggested change:
- get_session().delete(
-     f"{self.endpoint}/api/scheduled-jobs/{namespace}/{scheduled_job_id}",
-     headers=self._build_hf_headers(token=token),
- ).raise_for_status()
+ response = get_session().delete(
+     f"{self.endpoint}/api/scheduled-jobs/{namespace}/{scheduled_job_id}",
+     headers=self._build_hf_headers(token=token),
+ )
+ hf_raise_for_status(response)
Well, I originally had ... Here is another suggestion using ...
notes:
no strong opinion on the ...
I'm okay to ship like this! Let's keep ...
I went with this syntax, let me know what you think:
and
TODO: