[GuideLLM Refactor] benchmark package updates and rewrites #356

markurtz · 2025-09-19T11:45:08Z

Summary

This PR refactors the benchmarking system, replacing the previous architecture with a more flexible, extensible design. The changes include new aggregation protocols, benchmark objects with richer metrics, and improved progress tracking. Together they provide better separation of concerns, more granular metric collection, and real-time monitoring of benchmark execution.

Details

New Aggregation System: Replaced BenchmarkAggregator with protocol-based Aggregator and CompilableAggregator interfaces, enabling composable metric collection and compilation
Enhanced Benchmark Objects: Refactored benchmark data models in objects.py with comprehensive metrics including timing distributions, token statistics, and performance measurements
Improved Benchmarker: Redesigned the Benchmarker class to coordinate request scheduling, data aggregation, and result compilation, using a thread-safe singleton pattern
Flexible Output System: Added pluggable output formatters supporting console, CSV, HTML, and JSON formats with configurable file paths
Advanced Progress Tracking: Implemented composite progress handlers with real-time console display showing detailed metrics, timing information, and progress bars
Profile System Enhancements: Enhanced profile configurations with better strategy generation, constraint management, and completion tracking
Comprehensive Entrypoints: Redesigned benchmark_generative_text function with improved configuration options, validation, and error handling
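
To make the protocol-based aggregation idea concrete, here is a minimal sketch of how composable aggregators could be expressed with `typing.Protocol`. The method signatures, the `RequestCounter` and `LatencyTracker` classes, and the `run_aggregators` helper are illustrative assumptions for this example only; they are not GuideLLM's actual interfaces.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class Aggregator(Protocol):
    """Hypothetical protocol: fold one update into the aggregator's state."""

    def __call__(self, state: dict[str, Any], update: dict[str, Any]) -> None: ...


@runtime_checkable
class CompilableAggregator(Aggregator, Protocol):
    """Hypothetical protocol: an aggregator that can also emit final results."""

    def compile(self, state: dict[str, Any]) -> dict[str, Any]: ...


class RequestCounter:
    """Counts successful and errored updates, then compiles an error rate."""

    def __call__(self, state: dict[str, Any], update: dict[str, Any]) -> None:
        key = "errored" if update.get("error") else "successful"
        state[key] = state.get(key, 0) + 1

    def compile(self, state: dict[str, Any]) -> dict[str, Any]:
        errored = state.get("errored", 0)
        total = state.get("successful", 0) + errored
        rate = errored / total if total else 0.0
        return {"total_requests": total, "error_rate": rate}


class LatencyTracker:
    """Collects latencies only; it has no compile step."""

    def __call__(self, state: dict[str, Any], update: dict[str, Any]) -> None:
        state.setdefault("latencies", []).append(update["latency"])


def run_aggregators(
    aggregators: dict[str, Aggregator], updates: list[dict[str, Any]]
) -> dict[str, Any]:
    """Feed every update to every aggregator, then compile where supported."""
    # Each aggregator gets its own state dict so they stay independent.
    states: dict[str, dict[str, Any]] = {name: {} for name in aggregators}
    for update in updates:
        for name, aggregator in aggregators.items():
            aggregator(states[name], update)

    compiled: dict[str, Any] = {}
    for name, aggregator in aggregators.items():
        # Structural check: only aggregators exposing compile() contribute.
        if isinstance(aggregator, CompilableAggregator):
            compiled.update(aggregator.compile(states[name]))
    return compiled
```

Because the protocols are structural, neither example class inherits from them; any object with matching methods can be plugged in, which is one way the composability described above could be achieved.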

Key Components Added:

SchedulerStatsAggregator: Collects scheduler timing and performance metrics
GenerativeRequestsAggregator: Compiles complete generative benchmark results with warmup/cooldown filtering
GenerativeStatsProgressAggregator: Tracks real-time generation metrics during execution
BenchmarkerProgressGroup: Composite progress handler for multiple tracking instances
GenerativeBenchmarkerOutput: Pluggable output system with multiple format support
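
The warmup/cooldown filtering mentioned for GenerativeRequestsAggregator can be illustrated with a small sketch. It assumes warmup and cooldown are given as fractions of the total run duration and that a request counts only if it ran entirely inside the remaining window; both are assumptions made for illustration, and `RequestRecord` and `filter_warmup_cooldown` are hypothetical names rather than GuideLLM code.

```python
from dataclasses import dataclass


@dataclass
class RequestRecord:
    """Hypothetical record of one completed request."""

    start: float  # seconds since the benchmark started
    end: float  # seconds since the benchmark started


def filter_warmup_cooldown(
    records: list[RequestRecord],
    run_duration: float,
    warmup: float = 0.1,
    cooldown: float = 0.1,
) -> list[RequestRecord]:
    """Keep only requests that ran entirely inside the steady-state window."""
    if warmup < 0 or cooldown < 0 or warmup + cooldown >= 1:
        raise ValueError("warmup and cooldown must be >= 0 and sum to less than 1")

    window_start = run_duration * warmup
    window_end = run_duration * (1 - cooldown)
    return [
        record
        for record in records
        if record.start >= window_start and record.end <= window_end
    ]
```

For a 100-second run with `warmup=0.1` and `cooldown=0.2`, only requests that start at or after 10 s and finish by 80 s would contribute to the compiled metrics under these assumptions.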

Breaking Changes:

Removed BenchmarkAggregator and GenerativeBenchmarkAggregator classes
Restructured benchmark object hierarchy and field names
Modified Benchmarker.run() method signature and return type
Updated progress tracking interfaces and event handling

Test Plan

Tests to be added in a subsequent PR

Related Issues

Part of the larger scheduler refactor initiative

"I certify that all code in this PR is my own, except as noted below."

Use of AI

Includes AI-assisted code completion
Includes code generated by an AI application
Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes ## WRITTEN BY AI ##)

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Signed-off-by: Mark Kurtz <[email protected]>

Copilot

Pull Request Overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.

Comments suppressed due to low confidence (1)

src/guidellm/benchmark/progress.py:1

Return type annotation includes SynchronousProfile and ThroughputProfile but the method returns strategy instances, not profile instances. These should be SynchronousStrategy and ThroughputStrategy.
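
The kind of mismatch described in this comment can be shown with a small sketch. The classes below are stand-ins written for this example, and `ExampleProfile.next_strategy` is a hypothetical method name; the point is only that the annotation should name the strategy types the method actually returns.

```python
from typing import Optional, Union


class SynchronousStrategy:
    """Stand-in for a strategy type (not the GuideLLM class)."""


class ThroughputStrategy:
    """Stand-in for a strategy type (not the GuideLLM class)."""


class ExampleProfile:
    """Hypothetical profile that hands out strategy instances in order."""

    def __init__(self) -> None:
        self._order = [SynchronousStrategy, ThroughputStrategy]

    # The annotation lists strategy types because strategy instances are
    # returned; annotating with profile types here would be misleading.
    def next_strategy(
        self, index: int
    ) -> Optional[Union[SynchronousStrategy, ThroughputStrategy]]:
        if index >= len(self._order):
            return None
        return self._order[index]()
```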


src/guidellm/benchmark/progress.py

src/guidellm/benchmark/output.py

src/guidellm/benchmark/aggregator.py


markurtz requested review from DaltheCow, sjmonson, markVaykhansky, jaredoconnell, AlonKellner-RedHat and Copilot September 19, 2025 11:45

markurtz changed the base branch from main to features/refactor/backend September 19, 2025 11:45

Copilot AI reviewed Sep 19, 2025

markurtz force-pushed the features/refactor/backend branch from 6b3331f to a88605e Compare September 19, 2025 12:14

markurtz added 2 commits September 19, 2025 12:18

Add in benchmark package refactor

7829fb8

Signed-off-by: Mark Kurtz <[email protected]>

fixes and rebase

4834767

Signed-off-by: Mark Kurtz <[email protected]>

markurtz force-pushed the features/refactor/benchmarker branch from 2515465 to 4834767 Compare September 19, 2025 12:20

markurtz requested a review from Copilot September 19, 2025 12:21

Copilot AI reviewed Sep 19, 2025

src/guidellm/benchmark/progress.py (outdated, resolved)

src/guidellm/benchmark/output.py (resolved)

src/guidellm/benchmark/output.py (resolved)

src/guidellm/benchmark/aggregator.py (outdated, resolved)

fixes from copilot review

61736f5

Signed-off-by: Mark Kurtz <[email protected]>

sjmonson approved these changes Sep 23, 2025

sjmonson added a commit that referenced this pull request Sep 23, 2025

Merge remote-tracking branch 'origin/features/refactor/benchmarker' i…

aa9ae47

…nto features/refactor/base-draft [GuideLLM Refactor] benchmark package updates and rewrites #356

sjmonson added a commit that referenced this pull request Sep 25, 2025

[GuideLLM Refactor] benchmark package updates and rewrites \#356\n\nf…

7a45473

…eatures/refactor/benchmarker

Base automatically changed from features/refactor/backend to features/refactor/base September 29, 2025 14:19

markurtz merged commit 18e95be into features/refactor/base Sep 29, 2025
12 of 17 checks passed

markurtz deleted the features/refactor/benchmarker branch September 29, 2025 14:19

markurtz added this to the v0.4.0 milestone Oct 1, 2025
