Commit 47088f9
[GuideLLM Refactor] entrypoints and working state (base to create PRs off of til merged into refactor base) (#358)
## **Summary**
Refactors the GuideLLM command-line interface: streamlines the benchmark
command structure, adds a mock server command and optional performance
optimizations, and folds in fixes still missing from other PRs so that
the refactor reaches a working state.
## **Details**
- **CLI Interface Overhaul**:
  - Removed the legacy `--scenario` option in favor of direct parameter specification
  - Reorganized CLI options into clear groups (Backend, Data, Output, Aggregators, Constraints)
  - Added parameter aliases for backward compatibility (e.g., `--rate-type` → `--profile`)
  - Simplified option defaults by removing scenario-based defaults
  - Added comprehensive docstrings and help text for all commands and options
- **New Mock Server Command**:
  - Added `guidellm mock-server` command with full OpenAI/vLLM API compatibility
  - Configurable latency characteristics (request latency, TTFT, ITL, output tokens)
  - Support for both streaming and non-streaming endpoints
  - Comprehensive server configuration options (host, port, workers, model name)
- **Performance Optimization Features**:
  - Added new `perf` optional dependency group with `orjson`, `msgpack`, `msgspec`, `uvloop`
  - Integrated `uvloop` for enhanced async performance when available
  - Optimized event loop policy selection based on availability
- **Internal Architecture Improvements**:
  - Updated import paths (`guidellm.backend` → `guidellm.backends`, `guidellm.scheduler.strategy` → `guidellm.scheduler`)
  - Replaced scenario-based benchmarking with direct `benchmark_generative_text` function calls
  - Enhanced error handling and parameter validation
  - Simplified logging format for better readability
- **Enhanced Output and Configuration**:
  - Added support for multiple output formats with the `--output-formats` option
  - Improved output path handling for files vs directories
  - Added new constraint options (`--max-errors`, `--max-error-rate`, `--max-global-error-rate`)
  - Enhanced warmup/cooldown specification with flexible numeric/percentage options
- **Code Quality Improvements**:
  - Comprehensive type annotations throughout the codebase
  - Detailed docstrings following Google/NumPy style conventions
  - Consistent parameter naming and organization
  - Removed deprecated version option from main CLI group
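
The backward-compatible alias listed under CLI Interface Overhaul (the legacy rate-type spelling mapping to the new profile option) can be illustrated with a small sketch. It uses the standard-library `argparse` purely for illustration and is not GuideLLM's actual CLI code; the parser name, the help text, and the value `sweep` are assumptions, and the long options are written in the usual double-dash form.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser: one option reachable under a new and a legacy name."""
    parser = argparse.ArgumentParser(prog="benchmark-sketch")
    # Both spellings store into the same destination, so downstream code
    # only ever reads `args.profile`. `--rate-type` is the legacy alias.
    parser.add_argument(
        "--profile",
        "--rate-type",
        dest="profile",
        default=None,
        help="Benchmark profile (legacy alias: --rate-type).",
    )
    return parser


# Old and new invocations resolve to the same parsed value.
new_style = build_parser().parse_args(["--profile", "sweep"])
old_style = build_parser().parse_args(["--rate-type", "sweep"])
print(new_style.profile, old_style.profile)
```

Registering both names on a single option keeps one source of truth for defaults and help text, which is usually simpler than translating the legacy flag after parsing.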
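
To make the mock server's latency settings concrete, here is a minimal sketch of how a time to first token (TTFT), an inter-token latency (ITL), and an output token count could shape a simulated stream. The function names, parameter names, and millisecond units are hypothetical and chosen for illustration; the real `guidellm mock-server` implementation may differ.

```python
import asyncio
import time
from typing import AsyncIterator, List, Tuple


async def mock_token_stream(
    ttft_ms: float, itl_ms: float, output_tokens: int
) -> AsyncIterator[str]:
    """Yield placeholder tokens with simulated latency (hypothetical helper)."""
    # Wait once before the first token to simulate time to first token.
    await asyncio.sleep(ttft_ms / 1000)
    for index in range(output_tokens):
        if index > 0:
            # Wait between subsequent tokens to simulate inter-token latency.
            await asyncio.sleep(itl_ms / 1000)
        yield f"tok{index}"


async def collect(
    ttft_ms: float, itl_ms: float, output_tokens: int
) -> List[Tuple[str, float]]:
    """Record each token together with its arrival time in seconds."""
    start = time.perf_counter()
    arrivals = []
    async for token in mock_token_stream(ttft_ms, itl_ms, output_tokens):
        arrivals.append((token, time.perf_counter() - start))
    return arrivals


arrivals = asyncio.run(collect(ttft_ms=20, itl_ms=5, output_tokens=4))
for token, seconds in arrivals:
    print(f"{token} arrived after {seconds * 1000:.1f} ms")
```

Under these assumptions the total streaming time is roughly TTFT plus ITL times (output tokens minus one), which is the relationship a benchmark client would be expected to measure against such a server.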
## **Test Plan**
- Tests for entrypoints to be added later
## **Related Issues**
- Part of the larger scheduler refactor initiative
---
- [x] "I certify that all code in this PR is my own, except as noted
below."
## **Use of AI**
- [x] Includes AI-assisted code completion
- [x] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a
docstring that includes `## WRITTEN BY AI ##`)

22 files changed, +1458 -1948 lines changed