Overview
This release marks a significant milestone with a full architectural refactor of the GuideLLM codebase to improve extensibility, performance, and maintainability. Key highlights include multimodal benchmarking support (vision and audio), a new mock server for testing, and comprehensive updates to output generation and statistics gathering. Additionally, the minimum supported Python version has been bumped to 3.10 to leverage modern language features.
To get started, install with:
pip install guidellm==0.4.0
Or from source with:
pip install git+https://github.com/vllm-project/[email protected]
What's New
- Multimodal Support: Added comprehensive support for vision and audio workloads, including audio transcription and translation benchmarking.
- Full Refactor: Complete restructuring of core packages (backends, scheduler, benchmark, data) to support high-rate load generation and easier extensibility.
- Mock Server: Introduced a built-in mock server package to facilitate testing and development without requiring a live LLM backend.
- E2E Testing: Added a new End-to-End (E2E) testing workflow with a dedicated vLLM simulator.
What's Changed
- Python Requirement: Minimum supported Python version bumped to 3.10 (previously 3.9).
- CLI Arguments:
  - Renamed --rate-type to --profile for clarity.
  - Added support for dashed arguments (e.g., --max-seconds alongside max_seconds).
  - The --rates argument now supports comma-separated lists for easier sweeping.
- Container: Updated Docker container to include ffmpeg-free and other utilities for multimodal support.
- Data Pipelines: Reworked data pipelines to support complex multimodal datasets and better error propagation for HuggingFace loading.
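The two argument-handling changes listed above can be illustrated with a small, self-contained sketch. This is not GuideLLM's implementation: `normalize_arg_name` and `parse_rates` are hypothetical helpers that only show the idea of treating dashed and underscored names as equivalent and of splitting a comma-separated rate list into numbers.

```python
def normalize_arg_name(name: str) -> str:
    """Map a dashed or underscored option name to one canonical key.

    Hypothetical helper: '--max-seconds', 'max-seconds' and 'max_seconds'
    all become 'max_seconds'.
    """
    return name.lstrip("-").replace("-", "_")


def parse_rates(value: str) -> list[float]:
    """Split a comma-separated string such as '1,2.5,10' into floats.

    Empty items (for example from a trailing comma) are ignored.
    """
    return [float(item) for item in value.split(",") if item.strip()]


if __name__ == "__main__":
    print(normalize_arg_name("--max-seconds"))  # max_seconds
    print(parse_rates("1,2.5,10"))  # [1.0, 2.5, 10.0]
```

Normalizing names up front means the rest of a program only ever sees one spelling, which is one plausible way to accept both forms without duplicating option definitions.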
What's Fixed
- Synthetic Data: Fixed an issue where synthetic text datasets would lose randomness across benchmarks in the same session.
- CSV Generation: Resolved failures in CSV output generation during benchmarks.
- Asyncio Stability: Fixed various asyncio and timezone-related issues in tests and schedulers.
- Type Safety: Extensive type fixes and improvements across the codebase, particularly in the scheduler and utils packages.
Compatibility Notes
- Python: 3.10 – 3.13
- OS: Linux and macOS
- Dependencies:
  - Added torchcodec
  - Removed librosa, pydub, soundfile
  - Development workflow now uses pdm and tox-pdm
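Before upgrading, it can help to confirm that the interpreter falls inside the supported Python range listed above. The following is a minimal sketch; `is_supported_python` and the two constants are hypothetical names chosen for illustration and are not part of GuideLLM.

```python
import sys

# Supported range taken from the compatibility notes: Python 3.10 through 3.13.
MIN_SUPPORTED = (3, 10)
MAX_SUPPORTED = (3, 13)


def is_supported_python(version: tuple[int, int] = sys.version_info[:2]) -> bool:
    """Return True if the (major, minor) version lies within the supported range."""
    return MIN_SUPPORTED <= version <= MAX_SUPPORTED


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:2])
    status = "supported" if is_supported_python() else "outside the supported range"
    print(f"Python {current} is {status}")
```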
New Contributors
- @shijinye made their first contribution in PR #327
- @git-jxj made their first contribution in PR #435
- @AlonKellner-RedHat made their first contribution in PR #440
Changelog
Refactor & Core Architecture
- PR #351: Full refactor of GuideLLM
- PR #354: Scheduler package updates, rewrites, and tests expansion
- PR #355: Backend package updates, rewrites, and tests expansion
- PR #356: Benchmark package updates and rewrites
- PR #357: Mock server package creation
- PR #364: Core reintroduction of changes from main
Multimodal & Data
- PR #384: Data pipelines rework and multimodal support
- PR #419: Split multimodal group into vision and audio
- PR #411: Replace librosa, pydub, and soundfile with torchcodec
- PR #412: Fixes for constant rate and audio flows
- PR #463: Ensure synthetic text datasets remain random across benchmarks
Features & Enhancements
- PR #378: Complete CSV output
- PR #432: Better scenario from-file support
- PR #441: Support dashed arguments for benchmark args
- PR #433: Switch --rates CLI arg to handle a comma separated list of values
- PR #382: Advanced Prefix Cache Controls
Infrastructure & Quality
- PR #397: Bump minimum python version to 3.10
- PR #440: Basic E2E tests
- PR #420: Adapt container for new optional requirements
- PR #415: Add tox command to update lock file
- PR #442: Updates and Fixes for benchmark outputs, schemas, and stats calculations