feat: wait for pod to be running when follow=True in get_job_logs #183
This PR implements the behavior requested in issue #182.
Previously, trainer.get_job_logs(job_id, follow=True) exited immediately if the pod did not yet exist or was still pending. This made it difficult for users to follow logs immediately after submitting a job, because pods are usually created asynchronously.
What this PR adds
When follow=True, the backend now waits for the pod to be created and to leave the Pending state.
Added a simple polling loop with:
timeout: 120 seconds
poll interval: 2 seconds
Preserves old behavior for follow=False, returning immediately if no pod exists.
No API changes, fully backward compatible.
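The waiting behavior described above can be sketched roughly as follows. This is an illustrative sketch, not the code in this PR: `wait_for_pod_started` and the `get_pod_phase` callback are hypothetical names, and only the 120-second timeout, the 2-second poll interval, and the "pod exists and has left Pending" condition come from the description. The `sleep` and `clock` parameters are added here only so the loop can be exercised without real delays.

```python
import time
from typing import Callable, Optional

POD_WAIT_TIMEOUT_SECONDS = 120  # timeout stated in this PR
POD_POLL_INTERVAL_SECONDS = 2   # poll interval stated in this PR


def wait_for_pod_started(
    get_pod_phase: Callable[[], Optional[str]],
    timeout: float = POD_WAIT_TIMEOUT_SECONDS,
    interval: float = POD_POLL_INTERVAL_SECONDS,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[str]:
    """Poll until the pod exists and has left Pending, or the timeout expires.

    get_pod_phase is a hypothetical callback that returns the pod phase
    (for example "Pending" or "Running"), or None if no pod exists yet.
    Returns the first non-Pending phase seen, or None on timeout.
    """
    deadline = clock() + timeout
    while True:
        phase = get_pod_phase()
        # Stop waiting as soon as a pod exists and is no longer Pending.
        if phase is not None and phase != "Pending":
            return phase
        # Give up once the deadline has passed.
        if clock() >= deadline:
            return None
        sleep(interval)
```

In this sketch, a caller with `follow=True` would run the wait first and start streaming logs only if a phase is returned; with `follow=False` the wait would be skipped entirely, which matches the preserved behavior described above.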
Why this is needed
Users commonly want to follow logs right after submitting a TrainingJob.
With the previous behavior, they needed to implement custom waiting logic.
This PR aligns the trainer experience with typical Kubernetes log-following behavior.
Testing
All existing tests pass (162 passed).
No breaking changes.
Manual tests were also run locally.
Fixes #182.