
[Bug] [RayJob] JobStatus cannot transition to a terminal status when the head pod restarts #3903

@dushulin

Description


Search before asking

  • I searched the issues and found no similar issues.

KubeRay Component

ray-operator

What happened + What you expected to happen

  • Case: the RayJob submitter pod transitioned to 'Complete' status, but the RayJob CR is still 'Running', so the head and worker pods keep running.
    Pods: [screenshots of the pod list]

  • Reason: I found that the head pod container had been restarted, so the job data in GCS may have been cleared.

  • Expected: the RayJob CR status should transition to 'Failed', and the RayCluster should be cleaned up.
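A minimal sketch of the transition expected above, assuming a reconciler that can observe two things in one pass: whether the submitter Job has reached 'Complete', and whether the head's GCS (via the dashboard) still has a record of the job. All names here (`JobObservation`, `nextJobStatus`, `isTerminal`, the status constants) are hypothetical and simplified; they are not KubeRay's actual types or functions.

```go
package main

import "fmt"

// JobStatus is a hypothetical, simplified job status (not KubeRay's real type).
type JobStatus string

const (
	JobStatusPending   JobStatus = "PENDING"
	JobStatusRunning   JobStatus = "RUNNING"
	JobStatusSucceeded JobStatus = "SUCCEEDED"
	JobStatusFailed    JobStatus = "FAILED"
)

// JobObservation is a hypothetical snapshot of what a reconciler sees in one pass.
type JobObservation struct {
	SubmitterCompleted bool      // the submitter Kubernetes Job reached 'Complete'
	JobFoundInGCS      bool      // the head still has a record of the job
	ReportedStatus     JobStatus // status reported by the head; meaningful only if JobFoundInGCS
}

// nextJobStatus picks the status the RayJob should move to.
func nextJobStatus(obs JobObservation) JobStatus {
	if obs.JobFoundInGCS {
		// The head still knows the job: trust what it reports.
		return obs.ReportedStatus
	}
	if obs.SubmitterCompleted {
		// The submitter finished, but the job record is gone (for example, the head
		// restarted and GCS data was lost). Fail instead of staying non-terminal forever.
		return JobStatusFailed
	}
	// The submitter is still running and the job is not visible yet: keep waiting.
	return JobStatusPending
}

// isTerminal reports whether no further transitions are expected; this is the
// point at which a reconciler could clean up the RayCluster.
func isTerminal(s JobStatus) bool {
	return s == JobStatusSucceeded || s == JobStatusFailed
}

func main() {
	// The scenario from this issue: submitter Complete, job record lost after a head restart.
	obs := JobObservation{SubmitterCompleted: true, JobFoundInGCS: false}
	status := nextJobStatus(obs)
	fmt.Println(status, isTerminal(status)) // FAILED true
}
```

The design choice here is to treat "submitter finished but the job is unknown to the head" as a terminal failure rather than a reason to keep polling. A real fix might instead allow a short grace period before failing, since the dashboard can be briefly unreachable while the head pod restarts.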

Reproduction script

Delete the head pod while the RayJob is running.

Anything else

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!
