Commit 47088f9

[GuideLLM Refactor] entrypoints and working state (base to create PRs off of til merged into refactor base) (#358)
## **Summary**

Refactor of the GuideLLM command-line interface, streamlining the benchmark command structure while adding new mock server functionality and performance optimization features and adding in any missing fixes in other PRs to stabilize the refactor to a working state.

## **Details**

- **CLI Interface Overhaul**:
  - Removed legacy `-scenario` option in favor of direct parameter specification
  - Reorganized CLI options with clear grouping (Backend, Data, Output, Aggregators, Constraints)
  - Added parameter aliases for backward compatibility (e.g., `-rate-type` → `-profile`)
  - Simplified option defaults by removing scenario-based defaults
  - Added comprehensive docstrings and help text for all commands and options
- **New Mock Server Command**:
  - Added `guidellm mock-server` command with full OpenAI/vLLM API compatibility
  - Configurable latency characteristics (request latency, TTFT, ITL, output tokens)
  - Support for both streaming and non-streaming endpoints
  - Comprehensive server configuration options (host, port, workers, model name)
- **Performance Optimization Features**:
  - Added new `perf` optional dependency group with `orjson`, `msgpack`, `msgspec`, `uvloop`
  - Integrated uvloop for enhanced async performance when available
  - Optimized event loop policy selection based on availability
- **Internal Architecture Improvements**:
  - Updated import paths (`guidellm.backend` → `guidellm.backends`, `guidellm.scheduler.strategy` → `guidellm.scheduler`)
  - Replaced scenario-based benchmarking with direct `benchmark_generative_text` function calls
  - Enhanced error handling and parameter validation
  - Simplified logging format for better readability
- **Enhanced Output and Configuration**:
  - Added support for multiple output formats with `-output-formats` option
  - Improved output path handling for files vs directories
  - Added new constraint options (`-max-errors`, `-max-error-rate`, `-max-global-error-rate`)
  - Enhanced warmup/cooldown specification with flexible numeric/percentage options
- **Code Quality Improvements**:
  - Comprehensive type annotations throughout the codebase
  - Detailed docstrings following Google/NumPy style conventions
  - Consistent parameter naming and organization
  - Removed deprecated version option from main CLI group

## **Test Plan**

- Tests for entrypoints to be added later

## **Related Issues**

- Part of the larger scheduler refactor initiative

---

- [x] "I certify that all code in this PR is my own, except as noted below."

## **Use of AI**

- [x] Includes AI-assisted code completion
- [x] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes `## WRITTEN BY AI ##`)
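The mock server's latency settings (TTFT, ITL, output tokens) can be pictured with a small sketch. This is not the code from this commit: `stream_tokens`, `collect`, and their parameter names are hypothetical, and the sketch only assumes that a streaming response waits for the time to first token before the first chunk and for the inter-token latency before each later chunk.

```python
import asyncio
import time
from collections.abc import AsyncIterator


async def stream_tokens(
    output_tokens: int,
    ttft_ms: float,
    itl_ms: float,
) -> AsyncIterator[str]:
    """Yield placeholder tokens with simulated TTFT and inter-token latency."""
    for index in range(output_tokens):
        # The first token waits for TTFT; every later token waits for ITL.
        delay_ms = ttft_ms if index == 0 else itl_ms
        await asyncio.sleep(delay_ms / 1000.0)
        yield f"token_{index}"


async def collect(
    output_tokens: int, ttft_ms: float, itl_ms: float
) -> tuple[list[str], float]:
    """Drain the stream and return the tokens plus total elapsed seconds."""
    start = time.perf_counter()
    tokens = [tok async for tok in stream_tokens(output_tokens, ttft_ms, itl_ms)]
    return tokens, time.perf_counter() - start


if __name__ == "__main__":
    tokens, elapsed = asyncio.run(collect(output_tokens=5, ttft_ms=50.0, itl_ms=10.0))
    print(tokens, f"{elapsed:.3f}s")
```

Under these assumptions the total streaming time is roughly `ttft_ms + (output_tokens - 1) * itl_ms`, which is the kind of predictable baseline a mock server gives a benchmark run.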
2 parents 46a2c1e + 447101b commit 47088f9

22 files changed: +1458 -1948 lines changed

pyproject.toml

Lines changed: 6 additions & 0 deletions
@@ -67,6 +67,12 @@ dependencies = [
 ]
 
 [project.optional-dependencies]
+perf = [
+    "orjson",
+    "msgpack",
+    "msgspec",
+    "uvloop",
+]
 recommended = [
     "tiktoken>=0.11.0", # For OpenAI tokenizer
     "blobfile>=3.1.0", # For OpenAI tokenizer
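Because the `perf` group is optional, calling code has to tolerate any of these packages being absent. The following is a minimal sketch of that pattern, not the implementation in this commit: the helper names `available_perf_packages`, `dumps`, and `select_event_loop_policy` are hypothetical, and only the package names come from the hunk above.

```python
import asyncio
import importlib.util
import json
from typing import Any

# Package names taken from the `perf` extra in pyproject.toml.
PERF_PACKAGES = ("orjson", "msgpack", "msgspec", "uvloop")


def available_perf_packages() -> list[str]:
    """Return the perf packages that are importable in this environment."""
    return [
        name for name in PERF_PACKAGES if importlib.util.find_spec(name) is not None
    ]


def dumps(obj: Any) -> bytes:
    """Serialize to JSON bytes, preferring orjson when it is installed."""
    try:
        import orjson
    except ImportError:
        # Fall back to the standard library when the extra is not installed.
        return json.dumps(obj).encode("utf-8")
    return orjson.dumps(obj)


def select_event_loop_policy() -> str:
    """Install uvloop's event loop policy if available; report which is used."""
    try:
        import uvloop
    except ImportError:
        return "asyncio"
    # Note: asyncio policy APIs are deprecated in recent Python releases;
    # this call is kept only to mirror the "policy selection" wording above.
    asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
    return "uvloop"


if __name__ == "__main__":
    print("perf packages found:", available_perf_packages())
    print("event loop:", select_event_loop_policy())
    print(dumps({"status": "ok"}))
```

With the extra installed (for example via `pip install "guidellm[perf]"`), the faster paths are picked up automatically; without it, the same code runs on the standard library.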
