markurtz

Summary

This PR refactors the benchmarking system, replacing the previous architecture with a more flexible and extensible design. The changes include new aggregation protocols, benchmark objects that carry more detailed metrics, and improved progress tracking. Together they give a cleaner separation of concerns, more granular metric collection, and better real-time monitoring of benchmark execution.

Details

  • New Aggregation System: Replaced BenchmarkAggregator with protocol-based Aggregator and CompilableAggregator interfaces, enabling composable metric collection and compilation
  • Enhanced Benchmark Objects: Refactored benchmark data models in objects.py with comprehensive metrics including timing distributions, token statistics, and performance measurements
  • Improved Benchmarker: Redesigned the Benchmarker class to coordinate request scheduling, data aggregation, and result compilation, using a thread-safe singleton pattern
  • Flexible Output System: Added pluggable output formatters supporting console, CSV, HTML, and JSON formats with configurable file paths
  • Advanced Progress Tracking: Implemented composite progress handlers with real-time console display showing detailed metrics, timing information, and progress bars
  • Profile System Enhancements: Enhanced profile configurations with better strategy generation, constraint management, and completion tracking
  • Comprehensive Entrypoints: Redesigned benchmark_generative_text function with improved configuration options, validation, and error handling
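
As a rough illustration of the protocol-based aggregation idea above, here is a minimal sketch. Only the names Aggregator and CompilableAggregator come from this PR; the method signatures, the shared state dictionary, RequestCounter, and run_aggregators are assumptions made for the example and may not match the actual interfaces.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class Aggregator(Protocol):
    """Assumed shape: fold one update into a shared state dict."""

    def __call__(self, state: dict[str, Any], update: dict[str, Any]) -> None: ...


@runtime_checkable
class CompilableAggregator(Aggregator, Protocol):
    """Assumed shape: an aggregator that can also emit final results."""

    def compile(self, state: dict[str, Any]) -> dict[str, Any]: ...


class RequestCounter:
    """Hypothetical aggregator (not from the PR) tracking count and latency."""

    def __call__(self, state, update):
        state["completed"] = state.get("completed", 0) + 1
        state["total_latency"] = state.get("total_latency", 0.0) + update["latency"]

    def compile(self, state):
        count = state.get("completed", 0)
        mean = state["total_latency"] / count if count else 0.0
        return {"completed": count, "mean_latency": mean}


def run_aggregators(aggregators, updates):
    """Feed each update to every aggregator, then compile those that support it."""
    state: dict[str, Any] = {}
    for update in updates:
        for agg in aggregators:
            agg(state, update)
    results: dict[str, Any] = {}
    for agg in aggregators:
        # Structural check: no inheritance from a base class is required.
        if isinstance(agg, CompilableAggregator):
            results.update(agg.compile(state))
    return results
```

Because the protocols are structural, RequestCounter satisfies CompilableAggregator without subclassing it, which is one way such a design could keep aggregators composable.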

Key Components Added:

  • SchedulerStatsAggregator: Collects scheduler timing and performance metrics
  • GenerativeRequestsAggregator: Compiles complete generative benchmark results with warmup/cooldown filtering
  • GenerativeStatsProgressAggregator: Tracks real-time generation metrics during execution
  • BenchmarkerProgressGroup: Composite progress handler for multiple tracking instances
  • GenerativeBenchmarkerOutput: Pluggable output system with multiple format support
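
The warmup/cooldown filtering mentioned for GenerativeRequestsAggregator could look roughly like the sketch below. The PR description does not say whether warmup and cooldown are given as durations, request counts, or fractions; this example assumes durations in seconds, and RequestRecord and filter_warmup_cooldown are hypothetical names, not part of the PR.

```python
from dataclasses import dataclass


@dataclass
class RequestRecord:
    """Hypothetical record; the real request objects likely carry more fields."""

    start: float  # request start time, in seconds
    end: float    # request end time, in seconds


def filter_warmup_cooldown(records, warmup=0.0, cooldown=0.0):
    """Keep requests that fall entirely inside the measurement window.

    Assumption: the window runs from (earliest start + warmup)
    to (latest end - cooldown), both in seconds.
    """
    if not records:
        return []
    run_start = min(r.start for r in records)
    run_end = max(r.end for r in records)
    window_start = run_start + warmup
    window_end = run_end - cooldown
    return [
        r for r in records
        if r.start >= window_start and r.end <= window_end
    ]
```

With warmup and cooldown both left at 0.0, every record is kept, so the filter is a no-op unless explicitly configured.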

Breaking Changes:

  • Removed BenchmarkAggregator and GenerativeBenchmarkAggregator classes
  • Restructured benchmark object hierarchy and field names
  • Modified Benchmarker.run() method signature and return type
  • Updated progress tracking interfaces and event handling

Test Plan

  • Tests to be added in a subsequent PR

Related Issues

  • Part of the larger scheduler refactor initiative

  • "I certify that all code in this PR is my own, except as noted below."

Use of AI

  • Includes AI-assisted code completion
  • Includes code generated by an AI application
  • Includes AI-generated tests (NOTE: AI-written tests should have a docstring that includes ## WRITTEN BY AI ##)
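
For reference, a test marked according to the docstring convention in the note above might look like this sketch. The helper mean_latency and the test itself are hypothetical and not part of this PR; only the marker text is taken from the note.

```python
def mean_latency(latencies):
    """Hypothetical helper under test; not part of the PR."""
    return sum(latencies) / len(latencies) if latencies else 0.0


def test_mean_latency_handles_empty_input():
    """
    ## WRITTEN BY AI ##
    An empty latency list should yield 0.0 rather than raise ZeroDivisionError.
    """
    assert mean_latency([]) == 0.0
    assert mean_latency([1.0, 3.0]) == 2.0
```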

@markurtz markurtz changed the base branch from main to features/refactor/backend September 19, 2025 11:45
@Copilot Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@markurtz markurtz force-pushed the features/refactor/backend branch from 6b3331f to a88605e Compare September 19, 2025 12:14
@markurtz markurtz force-pushed the features/refactor/benchmarker branch from 2515465 to 4834767 Compare September 19, 2025 12:20
@markurtz markurtz requested a review from Copilot September 19, 2025 12:21
@Copilot Copilot AI left a comment

Pull Request Overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.

Comments suppressed due to low confidence (1)

src/guidellm/benchmark/progress.py:1

  • The return type annotation lists SynchronousProfile and ThroughputProfile, but the method returns strategy instances, not profile instances. The annotation should use SynchronousStrategy and ThroughputStrategy instead.

Signed-off-by: Mark Kurtz <[email protected]>
sjmonson added a commit that referenced this pull request Sep 23, 2025
…nto features/refactor/base-draft

[GuideLLM Refactor] benchmark package updates and rewrites #356
sjmonson added a commit that referenced this pull request Sep 25, 2025
Base automatically changed from features/refactor/backend to features/refactor/base September 29, 2025 14:19
@markurtz markurtz merged commit 18e95be into features/refactor/base Sep 29, 2025
12 of 17 checks passed
@markurtz markurtz deleted the features/refactor/benchmarker branch September 29, 2025 14:19
@markurtz markurtz added this to the v0.4.0 milestone Oct 1, 2025
2 participants