
Add support for active_deadline_seconds to Kubeflow SDK #403

@XploY04

What you would like to be added?

Support activeDeadlineSeconds in the Kubeflow SDK Trainer client API.
This capability is defined in KEP-2899 for Kubeflow Trainer:
https://github.com/XploY04/trainer/blob/1f3629c7713010b3abbbbe6c1aa170491f2421b7/docs/proposals/2899-resource-timeouts/README.md
Kubeflow Trainer supports activeDeadlineSeconds on TrainJob as of v2.2, and the SDK should expose this field directly so users can configure it through TrainerClient.
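
To make the request concrete, here is a minimal sketch of how a deadline could flow from a Python argument into the TrainJob manifest. It is not the SDK's implementation: `build_trainjob_manifest` is a hypothetical helper, and the `trainer.kubeflow.org/v1alpha1` API version, the placement of `activeDeadlineSeconds` directly under `spec`, and the positive-integer check (borrowed from Kubernetes Job semantics) are all assumptions. The actual TrainerClient parameter name and location are still to be decided.

```python
from typing import Any, Optional


def build_trainjob_manifest(
    name: str,
    runtime_name: str,
    active_deadline_seconds: Optional[int] = None,
) -> dict[str, Any]:
    """Build a TrainJob manifest as a plain dict (hypothetical helper, not SDK code)."""
    if active_deadline_seconds is not None:
        # Reject bools explicitly, since bool is a subclass of int in Python.
        if isinstance(active_deadline_seconds, bool) or not isinstance(
            active_deadline_seconds, int
        ):
            raise TypeError("active_deadline_seconds must be an int")
        # Assumption: like a Kubernetes Job, the deadline must be positive.
        if active_deadline_seconds <= 0:
            raise ValueError("active_deadline_seconds must be a positive integer")

    spec: dict[str, Any] = {"runtimeRef": {"name": runtime_name}}
    # Only emit the field when the user set it, so existing behavior is unchanged.
    if active_deadline_seconds is not None:
        # Assumption: the field sits directly under spec.
        spec["activeDeadlineSeconds"] = active_deadline_seconds

    return {
        "apiVersion": "trainer.kubeflow.org/v1alpha1",  # assumed API version
        "kind": "TrainJob",
        "metadata": {"name": name},
        "spec": spec,
    }


# Example: ask for the job to be terminated after one hour.
# The job and runtime names here are illustrative.
manifest = build_trainjob_manifest(
    "fine-tune-demo", "torch-distributed", active_deadline_seconds=3600
)
```

Leaving the field out when no deadline is given keeps the manifest identical to what is generated today, so the option would be purely additive.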

Why is this needed?

This allows automatic termination of TrainJobs when the configured deadline is reached, including cleanup of the underlying JobSet. It helps prevent runaway workloads and improves CPU/GPU utilization and overall cluster efficiency.

Love this feature?

Give it a 👍. We prioritize the features with the most 👍.
