vLLM Playground

A modern web interface for managing and interacting with vLLM servers (github.com/vllm-project/vllm). Supports GPU and CPU modes, with special optimizations for macOS Apple Silicon and enterprise deployment on OpenShift/Kubernetes.

🆕 vLLM-Omni Multimodal Generation

vLLM-Omni Audio Generation

Generate images, edit photos, create speech, and produce music, all with vLLM-Omni integration.

✨ Claude Code Integration

vLLM Playground Claude Code

Run Claude Code with open-source models served by vLLM: your private, local coding assistant.

✨ Agentic-Ready with MCP Support

vLLM Playground MCP Integration

MCP (Model Context Protocol) integration enables models to use external tools with human-in-the-loop approval.

πŸ–ΌοΈ VLM (Vision Language Model)

VLM Support

Upload images and chat with vision models like Qwen2.5-VL, LLaVA, and more.

🆕 What's New in v0.1.5

  • 🌐 Remote vLLM Server - Connect to any remote vLLM instance via URL + API key
  • πŸ–ΌοΈ VLM Support - Image upload and multimodal chat with vision models
  • ✨ Markdown Rendering - Rich formatting for assistant messages (bold, lists, code blocks)
  • πŸ”§ Bug Fixes - GuideLLM hang, Claude Code remote mode, structured outputs for vLLM v0.12+

See Changelog for full details.


🚀 Quick Start

# Install from PyPI
pip install vllm-playground

# Pre-download container image (~10GB for GPU)
vllm-playground pull

# Start the playground
vllm-playground

Open http://localhost:7860 and click "Start Server". That's it! 🎉
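Once started, the managed vLLM server speaks the standard OpenAI-compatible API. A minimal sketch of querying it from Python using only the standard library; the port 8000 base URL (vLLM's default, distinct from the playground UI on 7860) and the Llama-3.1-8B-Instruct model name are assumptions for illustration:

```python
import json
import urllib.request

# The playground UI runs on :7860; the vLLM server it manages exposes an
# OpenAI-compatible API, assumed here to be on vLLM's default port :8000.
VLLM_BASE_URL = "http://localhost:8000/v1"

def build_chat_request(prompt: str,
                       model: str = "meta-llama/Llama-3.1-8B-Instruct") -> dict:
    """Build an OpenAI-compatible chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }

def chat(prompt: str) -> str:
    """POST the payload to the running vLLM server and return the reply text."""
    req = urllib.request.Request(
        f"{VLLM_BASE_URL}/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# reply = chat("Say hello in one sentence.")  # requires the server to be running
```

The same payload shape works against a remote vLLM instance: point VLLM_BASE_URL at the remote host and add an `Authorization: Bearer <api-key>` header.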

CLI Options

vllm-playground pull                # Pre-download GPU image (NVIDIA)
vllm-playground pull --nvidia       # Pre-download NVIDIA GPU image
vllm-playground pull --amd          # Pre-download AMD ROCm image
vllm-playground pull --tpu          # Pre-download Google TPU image
vllm-playground pull --cpu          # Pre-download CPU image
vllm-playground pull --all          # Pre-download all images
vllm-playground --port 8080         # Custom port
vllm-playground stop                # Stop running instance
vllm-playground status              # Check status

✨ Key Features

Feature                  Description
🌐 Remote Server         Connect to any remote vLLM instance via URL + API key
🖼️ VLM Support           Upload images and chat with vision models (Qwen2.5-VL, LLaVA)
🤖 Claude Code           Use open-source models as a Claude Code backend via vLLM
💬 Modern Chat UI        Markdown-rendered chat with streaming responses
🔧 Tool Calling          Function calling with Llama, Mistral, Qwen, and more
🔗 MCP Integration       Connect to MCP servers for agentic capabilities
🏗️ Structured Outputs    Constrain responses to JSON Schema, Regex, or Grammar
🐳 Container Mode        Zero-setup vLLM via automatic container management
☸️ OpenShift/K8s         Enterprise deployment with dynamic pod creation
📊 Benchmarking          GuideLLM integration for load testing
📚 Recipes               One-click configs from vLLM community recipes
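Structured outputs constrain the model's reply to a schema you supply in the request. A hedged sketch of one common request shape: the person schema is made up for illustration, and the `guided_json` extra field is the extension older vLLM releases accept (newer releases moved toward `response_format`-style structured outputs, so check your vLLM version):

```python
import json

# Hypothetical JSON Schema used only for illustration.
person_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}

def build_structured_request(prompt: str, schema: dict) -> dict:
    """Build a chat request whose reply is constrained to the given schema."""
    return {
        "model": "meta-llama/Llama-3.1-8B-Instruct",
        "messages": [{"role": "user", "content": prompt}],
        # vLLM-specific guided-decoding extension field (name varies by version):
        "guided_json": schema,
    }

payload = build_structured_request("Extract the person: Ada Lovelace, 36.", person_schema)
print(json.dumps(payload["guided_json"], indent=2))
```

With this constraint active, the server's decoder only emits tokens that keep the reply a valid instance of the schema.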

📦 Installation Options

Method              Command                                 Best For
PyPI                pip install vllm-playground             Most users
With Benchmarking   pip install vllm-playground[benchmark]  Load testing
From Source         git clone + python run.py               Development
OpenShift/K8s       ./openshift/deploy.sh                   Enterprise

📖 See Installation Guide for detailed instructions.


🔧 Configuration

Tool Calling

Enable in Server Configuration before starting:

  1. Check "Enable Tool Calling"
  2. Select parser (or "Auto-detect")
  3. Start server
  4. Define tools in the 🔧 toolbar panel

Supported Models:

  • Llama 3.x (llama3_json)
  • Mistral (mistral)
  • Qwen (hermes)
  • Hermes (hermes)
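Tools are declared in the OpenAI function-calling format that these parsers consume: each tool is a JSON function schema sent alongside the messages. A sketch of the request shape, where `get_weather` is a hypothetical tool used only for illustration:

```python
# Hypothetical tool definition in the OpenAI function-calling format.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

def build_tool_request(prompt: str) -> dict:
    """Build a chat request that lets the model decide whether to call a tool."""
    return {
        "model": "meta-llama/Llama-3.1-8B-Instruct",
        "messages": [{"role": "user", "content": prompt}],
        "tools": [weather_tool],
        "tool_choice": "auto",  # the model may answer directly or emit a tool call
    }
```

When the model chooses to call the tool, the response carries a `tool_calls` entry with the function name and JSON arguments; the client runs the tool and sends the result back in a follow-up `role: "tool"` message.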

Claude Code Integration

Use vLLM to serve open-source models as a backend for Claude Code:

  1. Go to Claude Code in the sidebar
  2. Start vLLM with a recommended model (see tips on the page)
  3. The embedded terminal connects automatically

Requirements:

  • vLLM v0.12.0+ (for Anthropic Messages API)
  • Model with native 65K+ context and tool calling support
  • ttyd installed for web terminal

Recommended Model for most GPUs:

meta-llama/Llama-3.1-8B-Instruct
--max-model-len 65536 --enable-auto-tool-choice --tool-call-parser llama3_json

Note: This integration demonstrates using vLLM as a backend for Claude Code. Claude Code is a separate product from Anthropic; users must install it independently and comply with Anthropic's Commercial Terms of Service. vLLM Playground provides only the terminal interface.

MCP Servers

Connect to external tools via Model Context Protocol:

  1. Go to MCP Servers in the sidebar
  2. Add a server (presets available: Filesystem, Git, Fetch, Time)
  3. Connect and enable in chat panel

⚠️ MCP requires Python 3.10+
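Under the hood, stdio MCP servers are just subprocesses launched with a command and arguments. A hedged sketch of launch specs mirroring the presets above; the exact field names the playground stores are an assumption, though the command lines follow the reference MCP servers' documented invocations:

```python
# Hypothetical launch specs for stdio MCP servers (command + args shape is
# the common MCP client convention; field names here are illustrative).
MCP_PRESETS = {
    "filesystem": {
        "command": "npx",
        "args": ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"],
    },
    "git": {"command": "uvx", "args": ["mcp-server-git"]},
    "fetch": {"command": "uvx", "args": ["mcp-server-fetch"]},
    "time": {"command": "uvx", "args": ["mcp-server-time"]},
}

def launch_argv(name: str) -> list[str]:
    """Return the argv an MCP client would spawn for a preset server."""
    spec = MCP_PRESETS[name]
    return [spec["command"], *spec["args"]]
```

The client spawns the argv, then speaks JSON-RPC over the child's stdin/stdout to list and invoke the server's tools, with the playground inserting its human-in-the-loop approval step before each invocation.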

CPU Mode (macOS)

Edit config/vllm_cpu.env:

export VLLM_CPU_KVCACHE_SPACE=40       # KV cache size in GiB
export VLLM_CPU_OMP_THREADS_BIND=auto  # let vLLM choose OpenMP thread binding

Metal GPU Support (macOS Apple Silicon)

vLLM Playground supports Apple Silicon GPU acceleration:

  1. Install vllm-metal following official instructions
  2. Configure playground to use Metal:
    • Run Mode: Subprocess
    • Compute Mode: Metal
    • Venv Path: ~/.venv-vllm-metal (or your installation path)

See macOS Metal Guide for details.

Custom vLLM Installations

Use specific vLLM versions or custom builds:

  1. Install vLLM in a virtual environment
  2. Configure playground:
    • Run Mode: Subprocess
    • Venv Path: /path/to/your/venv

See Custom venv Guide for details.


📖 Documentation

Getting Started

Features

Deployment

Reference

Releases

  • Changelog - Version history and changes
  • v0.1.5 - Remote server, VLM vision support, markdown rendering
  • v0.1.4 - vLLM-Omni multimodal, Studio UI
  • v0.1.3 - Multi-accelerators, Claude Code, vLLM-Metal
  • v0.1.2 - ModelScope integration, i18n improvements
  • v0.1.1 - MCP integration, runtime detection
  • v0.1.0 - First release, modern UI, tool calling

πŸ—οΈ Architecture

┌──────────────────┐
│   User Browser   │
└────────┬─────────┘
         │ http://localhost:7860
         ↓
┌──────────────────┐
│   Web UI (Host)  │  ← FastAPI + JavaScript
└────────┬─────────┘
         │
    ┌────┴────┐
    ↓         ↓
┌─────────┐ ┌────────┐
│  vLLM   │ │  MCP   │  ← Containers / External Servers
│Container│ │Servers │
└─────────┘ └────────┘

📖 See Architecture Overview for details.


🆘 Quick Troubleshooting

Issue                  Solution
Port in use            vllm-playground stop
Container won't start  podman logs vllm-service
Tool calling fails     Restart with "Enable Tool Calling" checked
Image pull errors      vllm-playground pull --all

📖 See Troubleshooting Guide for more.


🔗 Related Projects


πŸ“ License

Apache 2.0 License. See the LICENSE file for details.

🤝 Contributing

Contributions welcome! Please see CONTRIBUTING.md for setup instructions and guidelines.


Made with ❤️ for the vLLM community
