Support for vLLM sleep and wakeup endpoints in simulation mode #218

@aavarghese

Description

What would you like to be added:
Support for the vLLM sleep and wake-up endpoints (https://docs.vllm.ai/en/latest/features/sleep_mode.html#usage). In vLLM today these endpoints are stable and are enabled in dev mode (VLLM_SERVER_DEV_MODE=1).
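
To make the request concrete, here is a minimal sketch of what simulated endpoints could look like. It is written in Python with the standard library only and is not the simulator's implementation. The paths `/sleep`, `/wake_up`, and `/is_sleeping` follow the linked vLLM documentation; the `{"is_sleeping": ...}` response body, the empty 200 replies, and the names `SleepState`, `make_handler`, `start_sim_server`, and `call` are assumptions made for illustration.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import urlparse
from urllib.request import Request, urlopen


class SleepState:
    """Thread-safe flag recording whether the simulated server is asleep."""

    def __init__(self):
        self._lock = threading.Lock()
        self._sleeping = False

    def set(self, value):
        with self._lock:
            self._sleeping = value

    def get(self):
        with self._lock:
            return self._sleeping


def make_handler(state):
    """Build a request handler class bound to one SleepState (hypothetical helper)."""

    class Handler(BaseHTTPRequestHandler):
        def _reply(self, status, payload=None):
            body = json.dumps(payload).encode() if payload is not None else b""
            self.send_response(status)
            if body:
                self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def do_POST(self):
            # Query parameters such as ?level=1 are accepted but ignored here:
            # a simulator has no real GPU memory to offload.
            path = urlparse(self.path).path
            if path == "/sleep":
                state.set(True)
                self._reply(200)
            elif path == "/wake_up":
                state.set(False)
                self._reply(200)
            else:
                self._reply(404, {"error": "not found"})

        def do_GET(self):
            if urlparse(self.path).path == "/is_sleeping":
                # Assumed response shape.
                self._reply(200, {"is_sleeping": state.get()})
            else:
                self._reply(404, {"error": "not found"})

        def log_message(self, *args):
            pass  # keep the sketch quiet

    return Handler


def start_sim_server(port=0):
    """Start the fake server on 127.0.0.1; port 0 picks a free port."""
    state = SleepState()
    server = ThreadingHTTPServer(("127.0.0.1", port), make_handler(state))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, state


def call(base_url, method, path):
    """Send one request and return (status, parsed JSON body or None)."""
    with urlopen(Request(base_url + path, method=method)) as resp:
        raw = resp.read()
        return resp.status, (json.loads(raw) if raw else None)
```

A caller would start the server with `start_sim_server()`, then issue `call(base, "POST", "/sleep?level=1")`, `call(base, "GET", "/is_sleeping")`, and `call(base, "POST", "/wake_up")`. Whether the simulator should also reject or delay inference requests while asleep is left open here.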

Why is this needed:
We run inference servers as part of our solution and want to put them to sleep when they are inactive and have no incoming requests. Waking up a sleeping vLLM instance is much quicker than starting a new one, so it avoids the cold start.
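
The idle-then-sleep behaviour described above can be sketched as a small controller. This is only an illustration of the use case under stated assumptions: `IdleSleepController`, its method names, and the timeout value are hypothetical, and `sleep_fn` / `wake_fn` stand in for whatever actually calls the sleep and wake-up endpoints. The sketch is single-threaded and does no locking.

```python
import time


class IdleSleepController:
    """Puts a server to sleep after a quiet period and wakes it on demand."""

    def __init__(self, idle_timeout_s, sleep_fn, wake_fn, clock=time.monotonic):
        self.idle_timeout_s = idle_timeout_s
        self._sleep_fn = sleep_fn      # e.g. a function that POSTs to /sleep
        self._wake_fn = wake_fn        # e.g. a function that POSTs to /wake_up
        self._clock = clock            # injectable so the logic can be tested
        self._last_activity = clock()
        self._in_flight = 0
        self.sleeping = False

    def on_request_start(self):
        # Wake the server before serving if it was put to sleep earlier.
        if self.sleeping:
            self._wake_fn()
            self.sleeping = False
        self._in_flight += 1
        self._last_activity = self._clock()

    def on_request_end(self):
        self._in_flight -= 1
        self._last_activity = self._clock()

    def tick(self):
        """Call periodically; sleeps the server once it has been idle long enough."""
        idle_for = self._clock() - self._last_activity
        if (not self.sleeping and self._in_flight == 0
                and idle_for >= self.idle_timeout_s):
            self._sleep_fn()
            self.sleeping = True
        return self.sleeping
```

Against the simulator, the same controller could be exercised end to end without a GPU, which is the main reason simulated sleep and wake-up endpoints would help.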
