Description
What happened?
Bug Description
In KubernetesBackend.wait_for_job_status():

    if polling_interval > timeout:
        raise ValueError(
            f"Polling interval {polling_interval} must be less than timeout: {timeout}"
        )

When polling_interval == timeout, this guard passes. But:

    round(timeout / polling_interval)  # round(10/10) = 1

The job is polled exactly once with no retry. The error message says "must be less than", but the code allows equal values, which is a contradiction.
Steps to Reproduce
    client = TrainerClient()
    client.wait_for_job_status("my-job", timeout=10, polling_interval=10)
    # Passes validation but only polls once: silent wrong behavior

Expected Behavior
ValueError raised when polling_interval >= timeout.
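The reproduction above needs a live cluster. To check the boundary in isolation, the sketch below copies the guard and the round() expression from the description into a hypothetical helper, current_poll_count, which is not part of the SDK. It assumes the number of poll attempts is exactly round(timeout / polling_interval), as described above.

```python
def current_poll_count(timeout: int, polling_interval: int) -> int:
    """Hypothetical helper mirroring the guard quoted in this issue (not SDK code)."""
    if polling_interval > timeout:
        raise ValueError(
            f"Polling interval {polling_interval} must be less than timeout: {timeout}"
        )
    # Assumption: the polling loop runs round(timeout / polling_interval) times.
    return round(timeout / polling_interval)


if __name__ == "__main__":
    # The last pair is the boundary case: it passes the guard instead of raising.
    for timeout, interval in [(10, 2), (10, 5), (10, 10)]:
        print(f"timeout={timeout} interval={interval} "
              f"polls={current_poll_count(timeout, interval)}")
```

With timeout=10 and polling_interval=10 the helper returns 1 rather than raising, which is the single-poll behavior reported here.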
Proposed Fix
Change > to >= in the validation guard. One-line fix.
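A minimal sketch of the proposed guard, written as a standalone function so it can be run without a cluster. The name validate_polling_args is hypothetical; in the SDK the check sits inline in KubernetesBackend.wait_for_job_status(), and the rest of that method is not reproduced here.

```python
def validate_polling_args(timeout: int, polling_interval: int) -> None:
    """Hypothetical stand-in for the inline guard, using >= as proposed."""
    if polling_interval >= timeout:
        raise ValueError(
            f"Polling interval {polling_interval} must be less than timeout: {timeout}"
        )


if __name__ == "__main__":
    validate_polling_args(timeout=10, polling_interval=2)  # accepted
    try:
        validate_polling_args(timeout=10, polling_interval=10)
    except ValueError as err:
        # The equal case is now rejected, matching the error message.
        print(f"rejected: {err}")
```

With this change the error message and the check agree: only polling_interval strictly less than timeout is accepted.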
What did you expect to happen?
ValueError raised when polling_interval >= timeout, matching the documented constraint "must be less than timeout".
Environment
Kubernetes version:
$ kubectl version
Kubeflow Trainer version:
$ kubectl get pods -n kubeflow -l app.kubernetes.io/name=trainer -o jsonpath="{.items[*].spec.containers[*].image}"
Kubeflow Python SDK version:
$ pip show kubeflow
Impacted by this bug?
Give it a 👍 We prioritize the issues with most 👍