Open
Description
What you would like to be added?
Support activeDeadlineSeconds in the Kubeflow SDK Trainer client API.
This capability is defined in KEP-2899 for Kubeflow Trainer:
https://github.com/XploY04/trainer/blob/1f3629c7713010b3abbbbe6c1aa170491f2421b7/docs/proposals/2899-resource-timeouts/README.md
Kubeflow Trainer supports activeDeadlineSeconds on TrainJob as of v2.2, and the SDK should expose this field directly so users can configure it through TrainerClient.
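As a rough illustration of the request, here is a minimal sketch of how an SDK-side option might map onto the TrainJob resource. The helper `build_trainjob_manifest` and its `active_deadline_seconds` parameter are hypothetical names and not part of the current Kubeflow SDK. The placement of the field under `spec`, the `trainer.kubeflow.org/v1alpha1` API version, and the `runtimeRef` shape are assumptions to check against the KEP.

```python
from typing import Any, Optional


def build_trainjob_manifest(
    name: str,
    runtime_name: str,
    active_deadline_seconds: Optional[int] = None,
) -> dict[str, Any]:
    """Hypothetical helper (not an SDK API): build a TrainJob manifest dict."""
    # Reject non-positive deadlines early instead of relying on server-side validation.
    if active_deadline_seconds is not None and active_deadline_seconds <= 0:
        raise ValueError("active_deadline_seconds must be a positive integer")

    spec: dict[str, Any] = {"runtimeRef": {"name": runtime_name}}
    if active_deadline_seconds is not None:
        # Assumed placement: spec.activeDeadlineSeconds on the TrainJob.
        spec["activeDeadlineSeconds"] = active_deadline_seconds

    return {
        "apiVersion": "trainer.kubeflow.org/v1alpha1",  # assumed API version
        "kind": "TrainJob",
        "metadata": {"name": name},
        "spec": spec,
    }


# Example: a TrainJob that should be terminated after one hour.
manifest = build_trainjob_manifest("demo-job", "torch-distributed", 3600)
```

Leaving the field out of the manifest when no deadline is given keeps existing TrainJobs unchanged, so the option would be purely additive for current users.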
Why is this needed?
This allows automatic termination of TrainJobs when the configured deadline is reached, including cleanup of the underlying JobSet. It helps prevent runaway workloads and improves CPU/GPU utilization and overall cluster efficiency.
Love this feature?
Give it a 👍. We prioritize the features with the most 👍.