GuideLLM v0.3.0
Overview
A major release (in scope, though not in the semantic-versioning sense) introducing the GuideLLM web UI, containerized benchmarking, dataset preprocessing, and significant workflow improvements. This release also transitions the project from the Neural Magic organization into the vLLM project ecosystem while expanding benchmarking capabilities and improving the developer experience.
To get started, install with:
pip install guidellm==0.3.0
Or from source with:
pip install git+https://github.com/vllm-project/[email protected]
What's New
- GuideLLM Web UI: Complete frontend interface with interactive charts and data visualization for benchmark results
- Dataset Preprocessing: New preprocess command to filter datasets by token distribution and save to local files or Hugging Face Hub
- Containerized Benchmarking: Docker support with configurable environment variables for streamlined deployment
- Benchmark Scenarios: Support for file-based benchmark configuration with Pydantic validation
- HTML Report Generation: Static HTML reports with embedded visualization data
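As an illustration of the new scenario support, a benchmark scenario can be expressed as a file and validated before the run starts. The sketch below uses a stdlib dataclass as a stand-in for the Pydantic model guidellm actually uses, and the field names (`target`, `rate_type`, `max_seconds`) are assumptions for illustration only:

```python
import json
from dataclasses import dataclass


# Hypothetical scenario schema -- a stdlib stand-in for the
# Pydantic-validated model that backs file-based scenarios.
@dataclass
class Scenario:
    target: str
    rate_type: str
    max_seconds: int

    def __post_init__(self):
        # Minimal check, analogous to the validation Pydantic enforces.
        if self.max_seconds <= 0:
            raise ValueError("max_seconds must be positive")


raw = json.loads(
    '{"target": "http://localhost:8000", "rate_type": "sweep", "max_seconds": 60}'
)
scenario = Scenario(**raw)
print(scenario.rate_type)
```

Validating the file up front means a malformed scenario fails fast with a clear error instead of partway through a long benchmark run.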
What's Changed
- Project Migration: Transitioned from neuralmagic to vllm-project GitHub organization with updated links and branding
- Improved Scheduling: Unified RPS and concurrent scheduler paths for better multi-turn conversation support
- Enhanced OpenAI Backend: Added support for custom headers, SSL verification control, query parameters, and request body modifications
- Development Workflow: Streamlined CI/CD with unified test execution, pre-commit improvements, and artifact management
- Synthetic Data Generator: Added prefix caching controls and unique prompt generation
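To illustrate the kind of request customization the backend changes enable, the sketch below merges user-supplied headers and extra query parameters into an outgoing request. The helper and its parameter names are hypothetical, not guidellm's actual API; the real options are exposed on the OpenAI backend and the CLI:

```python
from typing import Optional
from urllib.parse import urlencode


def build_request(url: str,
                  headers: Optional[dict] = None,
                  extra_query: Optional[dict] = None) -> tuple:
    """Attach custom headers and extra query parameters to a request.

    Hypothetical helper sketching the new backend options; guidellm's
    OpenAIHTTPBackend wires these in from CLI options and configuration.
    """
    final_headers = {"Content-Type": "application/json", **(headers or {})}
    if extra_query:
        url = f"{url}?{urlencode(extra_query)}"
    return url, final_headers


url, hdrs = build_request(
    "http://localhost:8000/v1/completions",
    headers={"Authorization": "Bearer token"},
    extra_query={"api-version": "2024-01-01"},
)
print(url)
```

Extra query parameters are handy for gateways that require, e.g., an API version on every call, while custom headers cover auth schemes the default client does not know about.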
What's Fixed
- Metric Calculation: Fixed double-counting issues in token calculations and concurrency change events
- Event Loop Errors: Resolved "Event loop is closed" errors in HTTP client connection pooling
- Token Counting: Fixed max token limits in synthetic data generator and first decode token counting
- Display Issues: Corrected metric units display and Firefox compatibility for web UI
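The event-loop fix addresses a common asyncio pitfall: touching a resource bound to an event loop after that loop has closed. A minimal reproduction of the error class (not guidellm's actual code) is:

```python
import asyncio

# Reproduce the class of error fixed in the HTTP client pooling:
# using an event loop (or a client pool bound to it) after it closed.
loop = asyncio.new_event_loop()
loop.close()

message = ""
try:
    loop.run_until_complete(asyncio.sleep(0))
except RuntimeError as err:
    message = str(err)
print(message)  # Event loop is closed
```

The usual remedy, and the shape of the fix here, is to create loop-bound resources such as pooled HTTP connections inside the running loop rather than reusing them across loops.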
Compatibility Notes
- Python: 3.9–3.13
- OS: Linux and macOS
- Dependencies: Updated to the latest Pydantic; pinned Click to retain Python 3.9 support
- Breaking: Removed several UI workflow components and husky pre-commit hooks
- Breaking: Updated project URLs from the neuralmagic to the vllm-project organization
New Contributors
- @chewong made their first contribution in #168
- @dagrayvid made their first contribution in #173
- @TomerG711 made their first contribution in #162
- @wangchen615 made their first contribution in #123
- @kyolebu made their first contribution in #207
- @rymc made their first contribution in #223
- @jaredoconnell made their first contribution in #185
- @natoscott made their first contribution in #231
- @kdelee made their first contribution in #230
- @Harshith-umesh made their first contribution in #240
- @tjandy98 made their first contribution in #256
- @tukwila made their first contribution in #302
Changelog
Major Features
- #169: Implement complete GuideLLM UI with interactive charts and Redux state management
- #162: Add dataset preprocessing command with Hugging Face integration
- #123: Add containerized benchmarking support with Docker configuration
- #99: Add support for benchmark scenarios with Pydantic validation
- #218: Implement HTML output generation with embedded data
Infrastructure & Workflows
- #233: Unify RPS and concurrent scheduler paths for improved performance
- #215: Complete UI build pipeline and GitHub Pages workflows
- #231: Migrate project from the neuralmagic to the vllm-project organization
- #190: Add container build jobs to all workflows
Backend Improvements
- #230: Add CLI options for custom headers and SSL verification
- #146: Allow extra query parameters for OpenAI server requests
- #184: Add remove_from_body parameter to OpenAIHTTPBackend
- #183: Add prefix caching controls to synthetic dataset generator
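The remove_from_body parameter (#184) strips named keys from the outgoing request payload before it is sent. A hypothetical sketch of that behavior, standing in for the logic inside OpenAIHTTPBackend:

```python
def apply_remove_from_body(body: dict, remove_from_body: tuple = ()) -> dict:
    """Drop the named keys from a request body before sending.

    Hypothetical stand-in for the OpenAIHTTPBackend behavior; useful
    when a target server rejects unknown or unwanted request fields.
    """
    excluded = set(remove_from_body)
    return {k: v for k, v in body.items() if k not in excluded}


body = {"model": "llama", "prompt": "hi", "logit_bias": {}}
cleaned = apply_remove_from_body(body, remove_from_body=("logit_bias",))
print(sorted(cleaned))
```

This complements the extra-query-parameter support: one adds fields some servers require, the other removes fields some servers refuse.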
Bug Fixes & Quality
- #266: Fix metric accumulation errors at extreme concurrency changes
- #188: Fix "Event loop is closed" error in HTTP client pooling
- #173: Fix double counting of tokens and warmup percentage calculation
- #170: Fix max token limits in synthetic data generator