
🎯 vLLM Playground - Feature Overview

✨ What You Get

🖥️ Modern Web Interface

A polished, dark-themed UI featuring:

  • Responsive Design: Works on desktop, tablet, and mobile
  • Real-time Updates: WebSocket-powered live logs
  • Smooth Animations: Polished user experience
  • Intuitive Layout: Three-panel design for easy navigation

⚙️ Complete Server Management

  • One-Click Server Control: Start/stop vLLM servers instantly
  • Full Configuration: All vLLM parameters accessible
  • Status Monitoring: Real-time server status and uptime
  • Multiple Models: Easy switching between different models
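
Under the hood, "one-click" start means the backend launches the vLLM OpenAI-compatible server as a child process. A minimal sketch of that pattern (the function name `start_vllm` and the exact flag set are illustrative, not the actual app.py API):

```python
# Sketch: launch a vLLM OpenAI-compatible server as a managed subprocess.
# `start_vllm` is illustrative; the real app.py may structure this differently.
import subprocess

def start_vllm(model: str, port: int = 8000, gpu_mem: float = 0.8) -> subprocess.Popen:
    cmd = [
        "python", "-m", "vllm.entrypoints.openai.api_server",
        "--model", model,
        "--port", str(port),
        "--gpu-memory-utilization", str(gpu_mem),
    ]
    # Merge stderr into stdout so a single pipe feeds the live log stream.
    return subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )

proc = start_vllm("facebook/opt-125m")
```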

💬 Interactive Chat Interface

  • Test Your Models: Chat directly with your vLLM server
  • Conversation History: Maintains context across messages
  • Adjustable Parameters: Temperature and max tokens sliders
  • Beautiful Message UI: Clear distinction between user/assistant messages

📋 Live Log Viewer

  • Real-time Streaming: See logs as they happen
  • Color-Coded: Different colors for info/warning/error
  • Auto-scroll: Option to follow newest logs
  • Searchable: Easy to find specific log entries

🏗️ Architecture

┌─────────────────────────────────────────────────────────┐
│                      Browser UI                          │
│  ┌─────────────┬──────────────┬──────────────┐         │
│  │ Config      │   Chat       │   Logs       │         │
│  │ Panel       │   Interface  │   Viewer     │         │
│  └─────────────┴──────────────┴──────────────┘         │
└────────────────────┬────────────────────────────────────┘
                     │ WebSocket + REST API
┌────────────────────▼────────────────────────────────────┐
│              FastAPI Backend (app.py)                    │
│  ┌──────────────────────────────────────────────────┐  │
│  │  • Server Management                             │  │
│  │  • Process Control                               │  │
│  │  • Log Broadcasting                              │  │
│  │  • Chat Proxy                                    │  │
│  └──────────────────────────────────────────────────┘  │
└────────────────────┬────────────────────────────────────┘
                     │ subprocess
┌────────────────────▼────────────────────────────────────┐
│              vLLM Server Process                         │
│  (OpenAI-compatible API on port 8000)                   │
└──────────────────────────────────────────────────────────┘
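
A minimal sketch of the backend shape this diagram implies: a FastAPI app that accepts WebSocket connections on /ws/logs and fans captured log lines out to every connected browser. Only the endpoint path comes from the API table below; the `clients` registry and `broadcast` helper are assumptions for illustration:

```python
# Sketch of the log-broadcasting core implied by the diagram above; the
# `clients` set and `broadcast` helper are illustrative, not the actual app.py.
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()
clients: set[WebSocket] = set()

@app.websocket("/ws/logs")
async def ws_logs(ws: WebSocket):
    await ws.accept()
    clients.add(ws)
    try:
        while True:
            await ws.receive_text()  # keep the connection open
    except WebSocketDisconnect:
        clients.discard(ws)

async def broadcast(line: str) -> None:
    # Push one captured vLLM log line to every connected browser.
    for ws in list(clients):
        try:
            await ws.send_text(line)
        except Exception:
            clients.discard(ws)
```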

📊 File Structure

webui/
├── 📄 app.py                   # FastAPI backend server
├── 🌐 index.html               # Main UI interface
├── 🚀 run.py                   # Launcher script
├── 📜 start.sh                 # Quick start bash script
├── 📦 requirements.txt         # Python dependencies
├── 📖 README.md                # Full documentation
├── 📝 QUICKSTART.md            # Quick reference guide
├── ⚙️ example_configs.json     # Example configurations
├── 🙈 .gitignore               # Git ignore rules
└── 📁 static/
    ├── css/
    │   └── style.css           # Modern dark theme
    └── js/
        └── app.js              # Frontend logic

🎨 UI Features in Detail

Configuration Panel (Left)

  • Model Selection Dropdown: Popular models pre-loaded
  • Custom Model Input: Support for any HuggingFace model
  • Server Settings: Host, port, tensor parallelism
  • GPU Configuration: Memory utilization slider
  • Data Type Selection: auto/float16/bfloat16/float32
  • Advanced Options: Trust remote code, prefix caching
  • Start/Stop Buttons: Clear visual state

Chat Interface (Center)

  • Chat History Display: Scrollable conversation view
  • Message Input: Multi-line textarea; press Ctrl+Enter to send
  • Generation Parameters:
    • Temperature slider (0.0 - 2.0)
    • Max tokens slider (1 - 4096)
  • Clear Chat Button: Start fresh conversations
  • Status Indicators: Shows when server is ready
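
Both sliders map directly onto fields of the OpenAI-compatible chat API that the vLLM server exposes on port 8000. A minimal client-side equivalent (the model name is a placeholder for whatever the server was started with):

```python
# Sketch: one chat turn against vLLM's OpenAI-compatible endpoint.
import requests

history = [{"role": "user", "content": "Explain KV caching in one sentence."}]

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "facebook/opt-125m",  # placeholder: use the model the server loaded
        "messages": history,           # sending the full history preserves context
        "temperature": 0.7,            # slider range: 0.0 - 2.0
        "max_tokens": 256,             # slider range: 1 - 4096
    },
    timeout=120,
)
reply = resp.json()["choices"][0]["message"]["content"]
history.append({"role": "assistant", "content": reply})
```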

Log Viewer (Right)

  • Real-time Updates: WebSocket streaming
  • Color-Coded Logs:
    • 🔵 Blue: Information
    • 🟡 Yellow: Warnings
    • 🔴 Red: Errors
    • 🟢 Green: Success
  • Auto-scroll Toggle: Follow or stay in place
  • Clear Logs Button: Clean up the view
  • Timestamps: Every log entry is timestamped
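
The same stream the browser renders can be consumed by any WebSocket client. A sketch using the third-party `websockets` package (the UI's host and port here are assumptions; adjust to wherever run.py serves it):

```python
# Sketch: tail the playground's log stream outside the browser.
# Requires `pip install websockets`; the URL below is an assumed default.
import asyncio
import websockets

async def tail_logs() -> None:
    async with websockets.connect("ws://localhost:8080/ws/logs") as ws:
        async for line in ws:
            print(line)

asyncio.run(tail_logs())
```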

🔌 API Endpoints

Method  Endpoint       Purpose
GET     /              Serve the main UI
GET     /api/status    Get server status
POST    /api/start     Start the vLLM server
POST    /api/stop      Stop the vLLM server
POST    /api/chat      Send a chat message
GET     /api/models    List common models
WS      /ws/logs       Stream logs over WebSocket
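
These REST endpoints can drive the playground without the UI at all. A hedged example (the payload fields are guesses inferred from the configuration panel, not a documented schema, and the base URL assumes a local default):

```python
# Sketch: scripting the playground's REST API. Payload fields are assumptions
# based on the configuration panel, not a documented schema.
import requests

BASE = "http://localhost:8080"  # assumed: wherever run.py serves the UI

requests.post(f"{BASE}/api/start", json={
    "model": "facebook/opt-125m",
    "gpu_memory_utilization": 0.8,
})
print(requests.get(f"{BASE}/api/status").json())
requests.post(f"{BASE}/api/stop")
```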

🎯 Use Cases

1. Development & Testing

  • Quickly spin up models for testing
  • Test different configurations
  • Debug issues with live logs
  • Prototype chat applications

2. Model Evaluation

  • Compare different models easily
  • Test with various parameters
  • Evaluate response quality
  • Benchmark performance

3. Demos & Presentations

  • Clean, professional interface
  • Easy to show to stakeholders
  • Real-time interaction
  • No command line needed

4. Learning & Experimentation

  • Learn how vLLM works
  • Experiment with settings
  • See the effects of parameters
  • Understand model behavior

🔒 Security Notes

⚠️ Important: This WebUI is designed for local development and testing.

For production use, consider:

  • Adding authentication
  • Using HTTPS
  • Limiting network access
  • Validating all inputs
  • Rate limiting
  • Resource quotas
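
As one concrete example of "adding authentication", a shared-secret header check can be attached to the FastAPI routes with a dependency. This is a sketch of the pattern under assumed names (`PLAYGROUND_API_KEY`, the route shown), not production-grade auth:

```python
# Sketch: minimal shared-secret auth for the FastAPI backend.
import os
from fastapi import Depends, FastAPI, Header, HTTPException

app = FastAPI()
API_KEY = os.environ.get("PLAYGROUND_API_KEY", "")

def require_key(x_api_key: str = Header(default="")) -> None:
    # FastAPI maps this parameter to the X-API-Key request header.
    if not API_KEY or x_api_key != API_KEY:
        raise HTTPException(status_code=401, detail="invalid or missing API key")

@app.post("/api/stop", dependencies=[Depends(require_key)])
async def stop_server():
    ...  # the protected handler body goes here
```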

🚀 Performance Tips

  1. First Run: Model weights are downloaded on first load, which can be slow
  2. GPU Memory: Start at 70-80% utilization and raise it only as needed
  3. Tensor Parallelism: Use it for models larger than ~13B parameters
  4. Prefix Caching: Enable it for workloads with repeated prompt prefixes
  5. Log Stats: Disable them for cleaner logs in production
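
Tips 2-5 correspond directly to vLLM server flags. These are real vLLM options; the list mirrors what a subprocess launcher would build:

```python
# How the tips above map onto actual vLLM server flags.
cmd = [
    "python", "-m", "vllm.entrypoints.openai.api_server",
    "--model", "meta-llama/Llama-2-13b-hf",  # example of a larger model
    "--gpu-memory-utilization", "0.8",       # tip 2: start around 80%
    "--tensor-parallel-size", "2",           # tip 3: shard across 2 GPUs
    "--enable-prefix-caching",               # tip 4: reuse cached prompt prefixes
    "--disable-log-stats",                   # tip 5: quieter production logs
]
```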

🎓 Learning Resources
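
  • vLLM Documentation: https://docs.vllm.ai
  • vLLM GitHub Repository: https://github.com/vllm-project/vllm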

🤝 Contributing Ideas

Want to extend the WebUI? Consider adding:

  • Model temperature presets
  • Save/load configurations
  • Export chat history
  • Multiple chat sessions
  • System prompt configuration
  • Token counter
  • Response time metrics
  • GPU utilization charts
  • Model comparison mode
  • API key management

📈 Roadmap

Phase 1 ✅ (Current)

  • Basic server management
  • Chat interface
  • Log streaming
  • Configuration panel

Phase 2 (Future)

  • Streaming responses
  • Multiple sessions
  • Configuration presets
  • Enhanced metrics

Phase 3 (Future)

  • User authentication
  • Multi-user support
  • Advanced monitoring
  • Performance dashboards

Built with ❤️ for the vLLM community