Commit 18e95be

[GuideLLM Refactor] benchmark package updates and rewrites (#356)
## **Summary**

Introduces a comprehensive refactor of the benchmarking system, replacing the previous architecture with a more flexible and extensible design. The changes include new aggregation protocols, enhanced benchmark objects with comprehensive metrics, and improved progress tracking capabilities. This refactor enables better separation of concerns, more granular metric collection, and improved real-time monitoring of benchmark execution.

## **Details**

- **New Aggregation System**: Replaced `BenchmarkAggregator` with protocol-based `Aggregator` and `CompilableAggregator` interfaces, enabling composable metric collection and compilation
- **Enhanced Benchmark Objects**: Refactored benchmark data models in `objects.py` with comprehensive metrics including timing distributions, token statistics, and performance measurements
- **Improved Benchmarker**: Redesigned `Benchmarker` class to coordinate request scheduling, data aggregation, and result compilation with thread-safe singleton pattern
- **Flexible Output System**: Added pluggable output formatters supporting console, CSV, HTML, and JSON formats with configurable file paths
- **Advanced Progress Tracking**: Implemented composite progress handlers with real-time console display showing detailed metrics, timing information, and progress bars
- **Profile System Enhancements**: Enhanced profile configurations with better strategy generation, constraint management, and completion tracking
- **Comprehensive Entrypoints**: Redesigned `benchmark_generative_text` function with improved configuration options, validation, and error handling

### Key Components Added:

- `SchedulerStatsAggregator`: Collects scheduler timing and performance metrics
- `GenerativeRequestsAggregator`: Compiles complete generative benchmark results with warmup/cooldown filtering
- `GenerativeStatsProgressAggregator`: Tracks real-time generation metrics during execution
- `BenchmarkerProgressGroup`: Composite progress handler for multiple tracking instances
- `GenerativeBenchmarkerOutput`: Pluggable output system with multiple format support

### Breaking Changes:

- Removed `BenchmarkAggregator` and `GenerativeBenchmarkAggregator` classes
- Restructured benchmark object hierarchy and field names
- Modified `Benchmarker.run()` method signature and return type
- Updated progress tracking interfaces and event handling

## **Test Plan**

- Tests to be added in a subsequent PR

## **Related Issues**

- Part of the larger scheduler refactor initiative

---

- [x] "I certify that all code in this PR is my own, except as noted below."

## **Use of AI**

- [x] Includes AI-assisted code completion
- [x] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes `## WRITTEN BY AI ##`)
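
For readers unfamiliar with protocol-based interfaces, the following is a minimal sketch of how an `Aggregator` / `CompilableAggregator` split could look using `typing.Protocol`. Only the two protocol names come from this commit. The method signatures, the `LatencyAggregator` class, and the `run_aggregators` helper are assumptions made for illustration and may not match the actual definitions in `aggregator.py`.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class Aggregator(Protocol):
    """Hypothetical shape: fold one update into a mutable state dict."""

    def __call__(self, state: dict[str, Any], update: dict[str, Any]) -> None: ...


@runtime_checkable
class CompilableAggregator(Aggregator, Protocol):
    """Hypothetical shape: an aggregator that can also produce a final result."""

    def compile(self, state: dict[str, Any]) -> dict[str, Any]: ...


class LatencyAggregator:
    """Illustrative aggregator; satisfies both protocols structurally."""

    def __call__(self, state: dict[str, Any], update: dict[str, Any]) -> None:
        state.setdefault("latencies", []).append(update["latency"])

    def compile(self, state: dict[str, Any]) -> dict[str, Any]:
        latencies = state.get("latencies", [])
        mean = sum(latencies) / len(latencies) if latencies else 0.0
        return {"count": len(latencies), "mean_latency": mean}


def run_aggregators(
    aggregators: dict[str, Aggregator], updates: list[dict[str, Any]]
) -> dict[str, dict[str, Any]]:
    """Feed every update to every aggregator, then compile those that support it."""
    states: dict[str, dict[str, Any]] = {name: {} for name in aggregators}
    for update in updates:
        for name, aggregator in aggregators.items():
            aggregator(states[name], update)
    return {
        name: aggregator.compile(states[name])
        for name, aggregator in aggregators.items()
        if isinstance(aggregator, CompilableAggregator)
    }


results = run_aggregators(
    {"latency": LatencyAggregator()},
    [{"latency": 1.0}, {"latency": 3.0}],
)
```

Because the protocols are structural, `LatencyAggregator` does not need to inherit from either one. This is the property that makes aggregators composable: any object with the right methods can be added to the mapping.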
2 parents 8f48cf9 + 61736f5 commit 18e95be

10 files changed: +4077 -3548 lines changed

src/guidellm/benchmark/__init__.py

Lines changed: 42 additions & 31 deletions
@@ -1,19 +1,31 @@
-from .aggregator import AggregatorT, BenchmarkAggregator, GenerativeBenchmarkAggregator
-from .benchmark import (
+from .aggregator import (
+    Aggregator,
+    AggregatorState,
+    CompilableAggregator,
+    GenerativeRequestsAggregator,
+    GenerativeStatsProgressAggregator,
+    InjectExtrasAggregator,
+    SchedulerStatsAggregator,
+    SerializableAggregator,
+)
+from .benchmarker import Benchmarker
+from .entrypoints import benchmark_generative_text, reimport_benchmarks_report
+from .objects import (
     Benchmark,
-    BenchmarkArgs,
     BenchmarkMetrics,
-    BenchmarkRunStats,
+    BenchmarkSchedulerStats,
     BenchmarkT,
     GenerativeBenchmark,
+    GenerativeBenchmarksReport,
     GenerativeMetrics,
-    GenerativeTextErrorStats,
-    GenerativeTextResponseStats,
-    StatusBreakdown,
+    GenerativeRequestStats,
+)
+from .output import (
+    GenerativeBenchmarkerConsole,
+    GenerativeBenchmarkerCSV,
+    GenerativeBenchmarkerHTML,
+    GenerativeBenchmarkerOutput,
 )
-from .benchmarker import Benchmarker, BenchmarkerResult, GenerativeBenchmarker
-from .entrypoints import benchmark_generative_text, reimport_benchmarks_report
-from .output import GenerativeBenchmarksConsole, GenerativeBenchmarksReport
 from .profile import (
     AsyncProfile,
     ConcurrentProfile,
@@ -22,46 +34,45 @@
     SweepProfile,
     SynchronousProfile,
     ThroughputProfile,
-    create_profile,
 )
 from .progress import (
-    BenchmarkerProgressDisplay,
-    BenchmarkerTaskProgressState,
-    GenerativeTextBenchmarkerProgressDisplay,
-    GenerativeTextBenchmarkerTaskProgressState,
+    BenchmarkerProgress,
+    BenchmarkerProgressGroup,
+    GenerativeConsoleBenchmarkerProgress,
 )
 
 __all__ = [
-    "AggregatorT",
+    "Aggregator",
+    "AggregatorState",
     "AsyncProfile",
     "Benchmark",
-    "BenchmarkAggregator",
-    "BenchmarkArgs",
     "BenchmarkMetrics",
-    "BenchmarkRunStats",
+    "BenchmarkSchedulerStats",
     "BenchmarkT",
     "Benchmarker",
-    "BenchmarkerProgressDisplay",
-    "BenchmarkerResult",
-    "BenchmarkerTaskProgressState",
+    "BenchmarkerProgress",
+    "BenchmarkerProgressGroup",
+    "CompilableAggregator",
     "ConcurrentProfile",
     "GenerativeBenchmark",
-    "GenerativeBenchmarkAggregator",
-    "GenerativeBenchmarker",
-    "GenerativeBenchmarksConsole",
+    "GenerativeBenchmarkerCSV",
+    "GenerativeBenchmarkerConsole",
+    "GenerativeBenchmarkerHTML",
+    "GenerativeBenchmarkerOutput",
     "GenerativeBenchmarksReport",
+    "GenerativeConsoleBenchmarkerProgress",
     "GenerativeMetrics",
-    "GenerativeTextBenchmarkerProgressDisplay",
-    "GenerativeTextBenchmarkerTaskProgressState",
-    "GenerativeTextErrorStats",
-    "GenerativeTextResponseStats",
+    "GenerativeRequestStats",
+    "GenerativeRequestsAggregator",
+    "GenerativeStatsProgressAggregator",
+    "InjectExtrasAggregator",
     "Profile",
     "ProfileType",
-    "StatusBreakdown",
+    "SchedulerStatsAggregator",
+    "SerializableAggregator",
     "SweepProfile",
     "SynchronousProfile",
     "ThroughputProfile",
     "benchmark_generative_text",
-    "create_profile",
     "reimport_benchmarks_report",
 ]
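
Since this change drops sixteen names from the package's `__all__`, downstream code that imports them from `guidellm.benchmark` will likely need updating. The sketch below is one possible way to locate such imports with the standard-library `ast` module. The `REMOVED_NAMES` set is copied from the removed `__all__` entries in the diff above. The `find_removed_imports` helper is hypothetical and not part of GuideLLM. Some of these names may still exist in submodules, which this diff does not show.

```python
import ast

# Names removed from guidellm.benchmark.__all__ in this commit
# (copied from the "-" lines of the __all__ hunk).
REMOVED_NAMES = frozenset(
    {
        "AggregatorT",
        "BenchmarkAggregator",
        "BenchmarkArgs",
        "BenchmarkRunStats",
        "BenchmarkerProgressDisplay",
        "BenchmarkerResult",
        "BenchmarkerTaskProgressState",
        "GenerativeBenchmarkAggregator",
        "GenerativeBenchmarker",
        "GenerativeBenchmarksConsole",
        "GenerativeTextBenchmarkerProgressDisplay",
        "GenerativeTextBenchmarkerTaskProgressState",
        "GenerativeTextErrorStats",
        "GenerativeTextResponseStats",
        "StatusBreakdown",
        "create_profile",
    }
)


def find_removed_imports(
    source: str, package: str = "guidellm.benchmark"
) -> list[tuple[int, str]]:
    """Return sorted (line number, name) pairs for package-level imports of removed names.

    Hypothetical helper: only inspects `from <package> import ...` statements,
    not attribute access such as `guidellm.benchmark.create_profile`.
    """
    hits: list[tuple[int, str]] = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.ImportFrom)
            and node.level == 0
            and node.module == package
        ):
            for alias in node.names:
                if alias.name in REMOVED_NAMES:
                    hits.append((node.lineno, alias.name))
    return sorted(hits)


sample = (
    "from guidellm.benchmark import Benchmarker, BenchmarkAggregator\n"
    "from guidellm.benchmark import create_profile\n"
)
found = find_removed_imports(sample)
```

Running the helper over each file in a downstream project gives a list of import sites to review. Which replacement to use for each name is not determined by this diff and has to be checked against the new modules.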
