Support for vLLM sleep and wakeup endpoints in simulation mode #218

**What would you like to be added**:
Support for the vLLM sleep and wake-up endpoints (https://docs.vllm.ai/en/latest/features/sleep_mode.html#usage). Today these endpoints are stable and are enabled in dev mode with `VLLM_SERVER_DEV_MODE=1`.

**Why is this needed**:
We run inference servers as part of our solution and want to put them to sleep when they are inactive and have no incoming requests. Waking up a sleeping vLLM instance is much quicker than starting a new one, which avoids a cold start.
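
For illustration, the sleep-when-idle, wake-on-demand cycle described above could be driven by a small client like the sketch below. It assumes the endpoint paths from the linked vLLM documentation (`POST /sleep?level=...`, `POST /wake_up`, `GET /is_sleeping`) and a JSON body of the form `{"is_sleeping": true}`; the helper names and the base URL are hypothetical, and the simulator may end up exposing these differently.

```python
import json
import urllib.parse
import urllib.request


def endpoint_url(base_url, path, **params):
    """Build a URL for an endpoint, appending query parameters if any are given."""
    query = urllib.parse.urlencode(params)
    url = f"{base_url.rstrip('/')}/{path}"
    return f"{url}?{query}" if query else url


def sleep(base_url, level=1):
    """Ask the server to go to sleep (assumed: POST /sleep?level=<n>)."""
    req = urllib.request.Request(
        endpoint_url(base_url, "sleep", level=level), method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def wake_up(base_url):
    """Ask the server to wake up (assumed: POST /wake_up)."""
    req = urllib.request.Request(endpoint_url(base_url, "wake_up"), method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.status


def is_sleeping(base_url):
    """Query the sleep state (assumed: GET /is_sleeping returning JSON)."""
    with urllib.request.urlopen(endpoint_url(base_url, "is_sleeping")) as resp:
        return bool(json.load(resp)["is_sleeping"])
```

With a server started under `VLLM_SERVER_DEV_MODE=1`, a caller would invoke `sleep("http://localhost:8000")` when traffic stops, and `wake_up("http://localhost:8000")` when the next request arrives, using `is_sleeping(...)` to confirm the state. The host and port here are placeholders.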
