
Feature Request: Multi-Model Serving & Deployment Management #39

@ibndias

Description


Feature Request

Problem / Use Case

Currently, vLLM Playground manages a single vLLM server instance at a time. The architecture connects one Web UI to one vLLM container, and the CLI (start/stop/status) operates on a single server. While the v0.1.5 remote server feature allows connecting to one external vLLM instance, there is no support for managing multiple models or server instances simultaneously.

For production and development workflows, it is common to need multiple models running concurrently - for example, a coding assistant alongside a general-purpose chat model, or two versions of the same model for A/B testing.
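Today, running a second model means launching and tracking each server by hand outside the playground. A rough illustration using vLLM's standard `vllm serve` CLI (model names and ports are arbitrary examples, not anything vLLM Playground manages today):

```shell
# Serve a coding model on GPU 0, port 8000 (illustrative model choice)
CUDA_VISIBLE_DEVICES=0 vllm serve Qwen/Qwen2.5-Coder-7B-Instruct --port 8000 &

# Serve a general-purpose chat model on GPU 1, port 8001
CUDA_VISIBLE_DEVICES=1 vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8001 &
```

The feature proposed below would absorb this manual bookkeeping (ports, GPU pinning, process lifecycle) into the playground itself.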

Proposed Feature

Support for managing multiple vLLM server instances simultaneously, including:

  • Multi-instance dashboard - Start, stop, and monitor multiple vLLM servers from a single UI, each serving a different model on different ports/GPUs
  • Model registry / catalog - A central view of available models with one-click deploy, showing which are currently running and their resource usage
  • Per-model configuration - Independent configuration (GPU assignment, quantization, context length, etc.) for each running instance
  • Dynamic model switching - Ability to route chat sessions to different running models without restarting the playground
  • Resource-aware scheduling - Visibility into GPU/CPU/memory utilization to help decide which models can be co-located
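One possible shape for the registry and routing pieces above is a small in-process map from model names to the OpenAI-compatible endpoints of running instances. This is a minimal sketch under assumed names (`ModelRegistry`, `Instance`, `endpoint_for` are all hypothetical, not existing vLLM Playground APIs):

```python
from dataclasses import dataclass


@dataclass
class Instance:
    """One running vLLM server: the model it serves, its endpoint, its GPU."""
    model: str
    base_url: str
    gpu: int


class ModelRegistry:
    """Hypothetical central catalog of running vLLM instances."""

    def __init__(self):
        self._instances: dict[str, Instance] = {}

    def register(self, model: str, base_url: str, gpu: int) -> None:
        # Record a newly started instance so the UI can list and route to it.
        self._instances[model] = Instance(model, base_url, gpu)

    def unregister(self, model: str) -> None:
        # Remove an instance when its server is stopped.
        self._instances.pop(model, None)

    def endpoint_for(self, model: str) -> str:
        # Dynamic model switching: route a chat session to whichever
        # instance serves the requested model, without restarting anything.
        if model not in self._instances:
            raise KeyError(f"no running instance serves {model!r}")
        return self._instances[model].base_url

    def running(self) -> list[str]:
        # Dashboard view: which models are currently up.
        return sorted(self._instances)


# Example: two instances on different ports/GPUs, as in the proposal.
registry = ModelRegistry()
registry.register("coder", "http://localhost:8000/v1", gpu=0)
registry.register("chat", "http://localhost:8001/v1", gpu=1)
```

A chat session would then call `registry.endpoint_for("coder")` to pick its base URL per request, which is what lets sessions switch models without a playground restart.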

Why This Matters

As teams scale their use of local LLM serving, the ability to orchestrate multiple models from a single interface becomes essential. This would complement the existing OpenShift/K8s deployment support and make vLLM Playground a more complete solution for both individual developers and enterprise teams.

Environment

  • vLLM Playground v0.1.5
  • Feature request (not a bug)
