vLLM Playground

A modern web interface for managing and interacting with vLLM servers (github.com/vllm-project/vllm). Supports GPU and CPU modes, with special optimizations for macOS Apple Silicon and enterprise deployment on OpenShift/Kubernetes.

🆕 vLLM-Omni Multimodal Generation

vLLM-Omni Audio Generation

Generate images, edit photos, create speech, and produce music, all with vLLM-Omni integration.

✨ Claude Code Integration

vLLM Playground Claude Code

Run Claude Code with open-source models served by vLLM: your private, local coding assistant.

✨ Agentic-Ready with MCP Support

vLLM Playground MCP Integration

MCP (Model Context Protocol) integration enables models to use external tools with human-in-the-loop approval.

πŸ–ΌοΈ VLM (Vision Language Model)

VLM Support

Upload images and chat with vision models like Qwen2.5-VL, LLaVA, and more.

🆕 What's New in v0.1.5

  • 🌐 Remote vLLM Server - Connect to any remote vLLM instance via URL + API key
  • πŸ–ΌοΈ VLM Support - Image upload and multimodal chat with vision models
  • ✨ Markdown Rendering - Rich formatting for assistant messages (bold, lists, code blocks)
  • πŸ”§ Bug Fixes - GuideLLM hang, Claude Code remote mode, structured outputs for vLLM v0.12+

See Changelog for full details.


🚀 Quick Start

# Install from PyPI
pip install vllm-playground

# Pre-download container image (~10GB for GPU)
vllm-playground pull

# Start the playground
vllm-playground

Open http://localhost:7860 and click "Start Server". That's it! 🎉
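Once started, the managed vLLM server speaks the standard OpenAI-compatible API. A minimal sketch of querying it from Python using only the standard library; the port 8000 base URL (vLLM's default, distinct from the playground UI on 7860) and the Llama-3.1-8B-Instruct model name are assumptions for illustration:

```python
import json
import urllib.request

# The playground UI runs on :7860; the vLLM server it manages exposes an
# OpenAI-compatible API, assumed here to be on vLLM's default port :8000.
VLLM_BASE_URL = "http://localhost:8000/v1"

def build_chat_request(prompt: str,
                       model: str = "meta-llama/Llama-3.1-8B-Instruct") -> dict:
    """Build an OpenAI-compatible chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }

def chat(prompt: str) -> str:
    """POST the payload to the running vLLM server and return the reply text."""
    req = urllib.request.Request(
        f"{VLLM_BASE_URL}/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# reply = chat("Say hello in one sentence.")  # requires the server to be running
```

The same payload shape works against a remote vLLM instance: point VLLM_BASE_URL at the remote host and add an `Authorization: Bearer <api-key>` header.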

CLI Options

vllm-playground pull                # Pre-download GPU image (NVIDIA)
vllm-playground pull --nvidia       # Pre-download NVIDIA GPU image
vllm-playground pull --amd          # Pre-download AMD ROCm image
vllm-playground pull --tpu          # Pre-download Google TPU image
vllm-playground pull --cpu          # Pre-download CPU image
vllm-playground pull --all          # Pre-download all images
vllm-playground --port 8080         # Custom port
vllm-playground stop                # Stop running instance
vllm-playground status              # Check status

✨ Key Features

Feature                  Description
🌐 Remote Server         Connect to any remote vLLM instance via URL + API key
🖼️ VLM Support           Upload images and chat with vision models (Qwen2.5-VL, LLaVA)
🤖 Claude Code           Use open-source models as a Claude Code backend via vLLM
💬 Modern Chat UI        Markdown-rendered chat with streaming responses
🔧 Tool Calling          Function calling with Llama, Mistral, Qwen, and more
🔗 MCP Integration       Connect to MCP servers for agentic capabilities
🏗️ Structured Outputs    Constrain responses to JSON Schema, Regex, or Grammar
🐳 Container Mode        Zero-setup vLLM via automatic container management
☸️ OpenShift/K8s         Enterprise deployment with dynamic pod creation
📊 Benchmarking          GuideLLM integration for load testing
📚 Recipes               One-click configs from vLLM community recipes
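Structured outputs constrain the model's reply to a schema you supply in the request. A hedged sketch of one common request shape: the person schema is made up for illustration, and the `guided_json` extra field is the extension older vLLM releases accept (newer releases moved toward `response_format`-style structured outputs, so check your vLLM version):

```python
import json

# Hypothetical JSON Schema used only for illustration.
person_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}

def build_structured_request(prompt: str, schema: dict) -> dict:
    """Build a chat request whose reply is constrained to the given schema."""
    return {
        "model": "meta-llama/Llama-3.1-8B-Instruct",
        "messages": [{"role": "user", "content": prompt}],
        # vLLM-specific guided-decoding extension field (name varies by version):
        "guided_json": schema,
    }

payload = build_structured_request("Extract the person: Ada Lovelace, 36.", person_schema)
print(json.dumps(payload["guided_json"], indent=2))
```

With this constraint active, the server's decoder only emits tokens that keep the reply a valid instance of the schema.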

📦 Installation Options

Method              Command                                 Best For
PyPI                pip install vllm-playground             Most users
With Benchmarking   pip install vllm-playground[benchmark]  Load testing
From Source         git clone + python run.py               Development
OpenShift/K8s       ./openshift/deploy.sh                   Enterprise

📖 See Installation Guide for detailed instructions.


🔧 Configuration

Tool Calling

Enable in Server Configuration before starting:

  1. Check "Enable Tool Calling"
  2. Select parser (or "Auto-detect")
  3. Start server
  4. Define tools in the 🔧 toolbar panel

Supported Models:

  • Llama 3.x (llama3_json)
  • Mistral (mistral)
  • Qwen (hermes)
  • Hermes (hermes)
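Tools are declared in the OpenAI function-calling format that these parsers consume: each tool is a JSON function schema sent alongside the messages. A sketch of the request shape, where `get_weather` is a hypothetical tool used only for illustration:

```python
# Hypothetical tool definition in the OpenAI function-calling format.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

def build_tool_request(prompt: str) -> dict:
    """Build a chat request that lets the model decide whether to call a tool."""
    return {
        "model": "meta-llama/Llama-3.1-8B-Instruct",
        "messages": [{"role": "user", "content": prompt}],
        "tools": [weather_tool],
        "tool_choice": "auto",  # the model may answer directly or emit a tool call
    }
```

When the model chooses to call the tool, the response carries a `tool_calls` entry with the function name and JSON arguments; the client runs the tool and sends the result back in a follow-up `role: "tool"` message.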

Claude Code Integration

Use vLLM to serve open-source models as a backend for Claude Code:

  1. Go to Claude Code in the sidebar
  2. Start vLLM with a recommended model (see tips on the page)
  3. The embedded terminal connects automatically

Requirements:

  • vLLM v0.12.0+ (for Anthropic Messages API)
  • Model with native 65K+ context and tool calling support
  • ttyd installed for web terminal

Recommended Model for most GPUs:

meta-llama/Llama-3.1-8B-Instruct
--max-model-len 65536 --enable-auto-tool-choice --tool-call-parser llama3_json

Note: This integration demonstrates using vLLM as a backend for Claude Code. Claude Code is a separate product from Anthropic; users must install it independently and comply with Anthropic's Commercial Terms of Service. vLLM Playground provides only the terminal interface.

MCP Servers

Connect to external tools via Model Context Protocol:

  1. Go to MCP Servers in the sidebar
  2. Add a server (presets available: Filesystem, Git, Fetch, Time)
  3. Connect and enable in chat panel

⚠️ MCP requires Python 3.10+
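Under the hood, stdio MCP servers are just subprocesses launched with a command and arguments. A hedged sketch of launch specs mirroring the presets above; the exact field names the playground stores are an assumption, though the command lines follow the reference MCP servers' documented invocations:

```python
# Hypothetical launch specs for stdio MCP servers (command + args shape is
# the common MCP client convention; field names here are illustrative).
MCP_PRESETS = {
    "filesystem": {
        "command": "npx",
        "args": ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"],
    },
    "git": {"command": "uvx", "args": ["mcp-server-git"]},
    "fetch": {"command": "uvx", "args": ["mcp-server-fetch"]},
    "time": {"command": "uvx", "args": ["mcp-server-time"]},
}

def launch_argv(name: str) -> list[str]:
    """Return the argv an MCP client would spawn for a preset server."""
    spec = MCP_PRESETS[name]
    return [spec["command"], *spec["args"]]
```

The client spawns the argv, then speaks JSON-RPC over the child's stdin/stdout to list and invoke the server's tools, with the playground inserting its human-in-the-loop approval step before each invocation.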

CPU Mode (macOS)

Edit config/vllm_cpu.env:

export VLLM_CPU_KVCACHE_SPACE=40       # KV cache size in GiB
export VLLM_CPU_OMP_THREADS_BIND=auto  # let vLLM choose OpenMP thread binding

Metal GPU Support (macOS Apple Silicon)

vLLM Playground supports Apple Silicon GPU acceleration:

  1. Install vllm-metal following official instructions
  2. Configure playground to use Metal:
    • Run Mode: Subprocess
    • Compute Mode: Metal
    • Venv Path: ~/.venv-vllm-metal (or your installation path)

See macOS Metal Guide for details.

Custom vLLM Installations

Use specific vLLM versions or custom builds:

  1. Install vLLM in a virtual environment
  2. Configure playground:
    • Run Mode: Subprocess
    • Venv Path: /path/to/your/venv

See Custom venv Guide for details.


📖 Documentation

Getting Started

Features

Deployment

Reference

Releases

  • Changelog - Version history and changes
  • v0.1.5 - Remote server, VLM vision support, markdown rendering
  • v0.1.4 - vLLM-Omni multimodal, Studio UI
  • v0.1.3 - Multi-accelerators, Claude Code, vLLM-Metal
  • v0.1.2 - ModelScope integration, i18n improvements
  • v0.1.1 - MCP integration, runtime detection
  • v0.1.0 - First release, modern UI, tool calling

πŸ—οΈ Architecture

┌──────────────────┐
│   User Browser   │
└────────┬─────────┘
         │ http://localhost:7860
         ↓
┌──────────────────┐
│   Web UI (Host)  │  ← FastAPI + JavaScript
└────────┬─────────┘
         │
    ┌────┴────┐
    ↓         ↓
┌─────────┐ ┌────────┐
│  vLLM   │ │  MCP   │  ← Containers / External Servers
│Container│ │Servers │
└─────────┘ └────────┘

📖 See Architecture Overview for details.


🆘 Quick Troubleshooting

Issue                  Solution
Port in use            vllm-playground stop
Container won't start  podman logs vllm-service
Tool calling fails     Restart with "Enable Tool Calling" checked
Image pull errors      vllm-playground pull --all

📖 See Troubleshooting Guide for more.


🔗 Related Projects


πŸ“ License

Apache 2.0 License. See the LICENSE file for details.

🤝 Contributing

Contributions welcome! Please see CONTRIBUTING.md for setup instructions and guidelines.


Made with ❤️ for the vLLM community
