
Kubernetes: fail fast if job pod was not scheduled #3874

Open
un-def wants to merge 1 commit into master from issue_3871_kubernetes_fail_fast_if_pod_is_unscheduled



@un-def (Collaborator) commented on May 12, 2026

After a job pod is created, wait up to the scheduling timeout (10 seconds) and fail with `ComputeError` if, within that time, the pod has not been scheduled or has already finished (most likely failed).
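The check described above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: `PodSnapshot` and `wait_until_scheduled` are hypothetical names, `ComputeError` is a local stand-in for the real exception class, and the iterable of snapshots is assumed to stop yielding once the scheduling timeout expires (for example, because the underlying watch was opened with a 10-second timeout).

```python
from dataclasses import dataclass
from typing import Iterable, Optional


class ComputeError(Exception):
    """Local stand-in for the ComputeError named in the description."""


@dataclass
class PodSnapshot:
    """Hypothetical, simplified view of a pod taken from one watch event."""

    phase: str  # "Pending", "Running", "Succeeded", or "Failed"
    node_name: Optional[str]  # set once the scheduler has bound the pod to a node


def wait_until_scheduled(snapshots: Iterable[PodSnapshot]) -> str:
    """Return the node name once the pod is scheduled, else raise ComputeError.

    Assumption: `snapshots` ends when the scheduling timeout expires.
    """
    for pod in snapshots:
        # A pod that has already finished cannot run the job.
        if pod.phase in ("Succeeded", "Failed"):
            raise ComputeError(f"pod finished prematurely: {pod.phase}")
        # A bound node means the scheduler has placed the pod.
        if pod.node_name is not None:
            return pod.node_name
    # The stream ended without the pod being scheduled.
    raise ComputeError("pod was not scheduled within the timeout")
```

Feeding the function a finite iterable keeps the timeout handling in the event source, so the decision logic can be exercised without a cluster.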

A new `watch` permission for `pods` in the namespace is required.
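For illustration, the extra permission could correspond to an RBAC rule like the one below, written here as a Python dict in the shape of a Kubernetes `Role` rule. `POD_WATCH_RULE` and `rule_allows` are hypothetical names, and how the rule is merged into an existing Role is an assumption; other verbs the backend may need are not shown.

```python
from typing import Dict, List

# Hypothetical rule to add to the Role bound to the backend's credentials.
POD_WATCH_RULE: Dict[str, List[str]] = {
    "apiGroups": [""],  # pods belong to the core API group
    "resources": ["pods"],
    "verbs": ["watch"],
}


def rule_allows(
    rule: Dict[str, List[str]], resource: str, verb: str, api_group: str = ""
) -> bool:
    """Return True if the rule grants `verb` on `resource` in `api_group`."""
    return (
        api_group in rule["apiGroups"]
        and resource in rule["resources"]
        and verb in rule["verbs"]
    )
```

A helper like this only checks the manifest locally; whether the cluster actually grants the permission still depends on the Role being bound in the target namespace.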

In addition, `run_job()` and `terminate_instance()` were refactored to clean up objects on failure.
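One common way to get this kind of cleanup-on-failure behavior is `contextlib.ExitStack`. The sketch below is an assumption about the general pattern, not the refactoring done in this PR; `create_all_or_nothing` and the `create`/`delete` callables are hypothetical.

```python
from contextlib import ExitStack
from typing import Callable, Sequence


def create_all_or_nothing(
    names: Sequence[str],
    create: Callable[[str], None],
    delete: Callable[[str], None],
) -> None:
    """Create objects in order; if any step fails, delete those already created."""
    with ExitStack() as cleanup:
        for name in names:
            create(name)
            # Register the matching delete only after the create succeeded.
            cleanup.callback(delete, name)
        # Every create succeeded: detach the callbacks so nothing is deleted.
        cleanup.pop_all()
```

If `create` raises, the stack unwinds and calls the registered deletes in reverse creation order, then re-raises the original exception.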

Part-of: #3871

@un-def requested a review from jvstme on May 12, 2026, 09:36
