[RAY] timeout mechanism for job length or long queues #45

@ibm-peach-fish

Description

Mirrored from the private internal issue tracker by @chakrn.

When we use caikit-ray-backend to submit a new job, that job can run indefinitely because no timeout is applied. We need a configurable timeout value, and the job should be cancelled once it exceeds that time.

After a quick discussion with Dean, this can probably be done by replacing the blocking `ray.get()` with `ray.wait()` (which should have been the case anyway), then polling for status and killing the job once the configured time has elapsed.
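A minimal sketch of the polling approach described above, assuming the job is represented by a Ray `ObjectRef`. The names `get_with_timeout`, `JobTimeoutError`, `poll_interval_s`, and the `ray_api` parameter are hypothetical and not taken from caikit-ray-backend; `ray_api` exists only so the helper can be exercised with a stub instead of a live cluster.

```python
import time


class JobTimeoutError(Exception):
    """Raised when a job exceeds its time budget (hypothetical name)."""


def get_with_timeout(ref, timeout_s, poll_interval_s=1.0, force=False, ray_api=None):
    """Poll `ref` with ray.wait() and cancel it after `timeout_s` seconds.

    Assumption: `ref` is an ObjectRef returned by a Ray task.
    """
    if ray_api is None:
        import ray as ray_api  # imported lazily so a stub can be injected

    deadline = time.monotonic() + timeout_s
    while True:
        remaining = max(0.0, deadline - time.monotonic())
        # ray.wait() returns (ready, not_ready) and blocks for at most `timeout`.
        ready, _ = ray_api.wait(
            [ref], num_returns=1, timeout=min(poll_interval_s, remaining)
        )
        if ready:
            # The result is available, so this get() returns without waiting.
            return ray_api.get(ready[0])
        if time.monotonic() >= deadline:
            # force=True kills the worker process instead of interrupting the task.
            ray_api.cancel(ref, force=force)
            raise JobTimeoutError(f"job exceeded {timeout_s}s and was cancelled")
```

With a hypothetical remote function `train`, a call might look like `result = get_with_timeout(train.remote(cfg), timeout_s=3600)`. Polling in short intervals, rather than issuing a single long wait, leaves room to report status between polls; whether `force=True` is appropriate depends on how the backend launches its jobs.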

Metadata

Labels

enhancement: New feature or request

Projects

Status

Ready for Review
