What would you like to be added:
Support for the vLLM sleep and wake-up endpoints (https://docs.vllm.ai/en/latest/features/sleep_mode.html#usage). The feature is stable today, but the endpoints are only enabled when the server runs in dev mode (VLLM_SERVER_DEV_MODE=1).
Why is this needed:
We run inference servers as part of our solution and want to put them to sleep when they are inactive and have no incoming requests. Waking up a sleeping vLLM instance is much quicker than starting a new one, so this avoids the cold start.
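
For illustration, the sleep and wake-up cycle described above could be driven from a small client like the sketch below. It assumes the endpoint paths `/sleep` (with a `level` query parameter), `/wake_up`, and `/is_sleeping` as described in the linked vLLM documentation, and that the server was started with sleep mode enabled and VLLM_SERVER_DEV_MODE=1. The helper names `build_request` and `send`, and the base URL, are hypothetical and not part of vLLM.

```python
import urllib.request

# Assumed endpoint paths, taken from the linked vLLM sleep mode docs.
# Check them against the vLLM version you deploy.
_ENDPOINTS = {
    "sleep": ("POST", "/sleep"),
    "wake_up": ("POST", "/wake_up"),
    "is_sleeping": ("GET", "/is_sleeping"),
}


def build_request(base_url: str, action: str, level: int = 1) -> urllib.request.Request:
    """Build (but do not send) a request for one sleep-mode endpoint.

    `level` is only used for the "sleep" action.
    """
    if action not in _ENDPOINTS:
        raise ValueError(f"unknown action: {action!r}")
    method, path = _ENDPOINTS[action]
    url = base_url.rstrip("/") + path
    if action == "sleep":
        url += f"?level={level}"
    return urllib.request.Request(url, method=method)


def send(req: urllib.request.Request, timeout: float = 30.0) -> str:
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode()


if __name__ == "__main__":
    base = "http://localhost:8000"  # hypothetical server address
    send(build_request(base, "sleep", level=1))   # put the idle server to sleep
    print(send(build_request(base, "is_sleeping")))
    send(build_request(base, "wake_up"))          # wake it before serving traffic
```

In our use case, whichever component tracks request activity would call the sleep endpoint after an idle period and the wake-up endpoint when a new request arrives.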