Description
What you would like to be added?
Feature Request: Enhanced TrainJob Status Conditions
Currently, TrainJob only exposes three status conditions:
Suspended - when the TrainJob is suspended
Complete - when the TrainJob has completed successfully
Failed - when the TrainJob has failed
Proposed additions:
Add Running status condition when the underlying JobSet/Jobs are actively executing
Add Pending status condition when the TrainJob is created but not yet running (e.g., waiting for resources, scheduling, etc.)
Implementation details:
Extend the status condition constants in pkg/apis/trainer/v1alpha1/trainjob_types.go
Update the controller logic in pkg/controller/trainjob_controller.go to detect and set these intermediate states
Modify the TerminalCondition method in runtime plugins to also report non-terminal states
Update the +kubebuilder:printcolumn annotation to show these states in kubectl get trainjob output
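To make the proposal concrete, here is a minimal sketch in Go of how the controller might map the state of the underlying JobSet/Jobs to a single condition type. It is not the existing controller code. The `Pending` and `Running` constants are the proposed additions, and the `jobCounts` struct and `deriveCondition` function are hypothetical names invented for illustration; a real implementation would read the child objects' status instead of a hand-built struct.

```go
package main

import "fmt"

// Condition types. Suspended, Complete and Failed are the three listed in
// this issue; Pending and Running are the proposed additions.
const (
	TrainJobSuspended = "Suspended"
	TrainJobComplete  = "Complete"
	TrainJobFailed    = "Failed"
	TrainJobPending   = "Pending" // proposed, does not exist yet
	TrainJobRunning   = "Running" // proposed, does not exist yet
)

// jobCounts is a hypothetical, simplified view of the underlying JobSet/Jobs.
// A real controller would derive these values from the child objects' status.
type jobCounts struct {
	Suspended bool // the TrainJob is suspended
	Expected  int  // number of Jobs that must succeed
	Active    int  // Jobs with at least one running Pod
	Succeeded int  // Jobs that finished successfully
	Failed    int  // Jobs that failed
}

// deriveCondition picks exactly one condition type for the given counts.
// Terminal and suspended states take priority over intermediate ones.
func deriveCondition(c jobCounts) string {
	switch {
	case c.Suspended:
		return TrainJobSuspended
	case c.Failed > 0:
		return TrainJobFailed
	case c.Expected > 0 && c.Succeeded >= c.Expected:
		return TrainJobComplete
	case c.Active > 0:
		return TrainJobRunning
	default:
		// Created, but nothing is executing yet (e.g. waiting for resources).
		return TrainJobPending
	}
}

func main() {
	fmt.Println(deriveCondition(jobCounts{Expected: 2}))               // Pending
	fmt.Println(deriveCondition(jobCounts{Expected: 2, Active: 2}))    // Running
	fmt.Println(deriveCondition(jobCounts{Expected: 2, Succeeded: 2})) // Complete
}
```

Treating `Pending` as the fallback means a newly created TrainJob always has some state to report, which is the gap this request is about.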
Example of desired behavior:
    $ kubectl get trainjob
    NAME          STATE      AGE
    my-trainjob   Pending    30s
    my-trainjob   Running    2m
    my-trainjob   Complete   10m
Why is this needed?
User Experience and Operational Visibility:
Poor UX for monitoring: Currently, when a TrainJob is created, users see no status condition until it completes or fails. This creates confusion about whether the job is actually running or stuck.
Debugging difficulties: Without intermediate states, it's hard to distinguish between:
A job that's waiting for resources (should be Pending)
A job that's actively training (should be Running)
A job that's stuck due to configuration issues
Inconsistent with Kubernetes patterns: Most Kubernetes resources (Pods, Jobs, Deployments) expose intermediate states. TrainJob's current design breaks user expectations.
This enhancement would align TrainJob with standard Kubernetes resource patterns and significantly improve the user experience for ML practitioners using Kubeflow Trainer.
Love this feature?
Give it a 👍. We prioritize the features with the most 👍.