Fix errors with metric accumulation #266
Conversation
Signed-off-by: Samuel Monson <[email protected]>

Force-pushed f211e30 to 1ea30d4
Pull Request Overview
This PR fixes two metric calculation issues in the benchmark statistics system: a double-counting bug in concurrency calculations when events are merged due to epsilon tolerance, and incorrect token-per-second calculations that excluded the first decode token.
- Fixed concurrency metric accumulation logic to prevent double-counting when events are merged
- Corrected token-per-second calculations to include the first decode token by adding 1 to prompt token counts
- Added comprehensive regression and edge case tests for the concurrency calculation fixes
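The merge logic behind the concurrency fix can be illustrated with a short sketch. This is a hypothetical reconstruction of the idea only; the function name, epsilon value, and data shapes below are assumptions for illustration and are not taken from src/guidellm/objects/statistics.py.

```python
# Hypothetical sketch of epsilon-merged concurrency accumulation; names are
# illustrative and do not come from guidellm's statistics module.
def concurrency_timeline(requests, epsilon=1e-6):
    """Return (timestamp, active_requests) pairs from (start, end) request pairs.

    Events that land within ``epsilon`` of each other are merged into one
    timestamp by summing their +1/-1 deltas, so the running concurrency is
    updated once per merged group instead of being double-counted.
    """
    # +1 event at each request start, -1 event at each request end
    events = sorted(
        [(start, +1) for start, _ in requests]
        + [(end, -1) for _, end in requests]
    )

    timeline = []
    active = 0
    i = 0
    while i < len(events):
        time, delta = events[i]
        i += 1
        # Fold every event within epsilon of this timestamp into a single delta.
        while i < len(events) and events[i][0] - time <= epsilon:
            delta += events[i][1]
            i += 1
        active += delta
        timeline.append((time, active))
    return timeline
```

For example, `concurrency_timeline([(0.0, 1.0), (0.0, 2.0), (1.0, 3.0)])` merges the two starts at t=0.0 into a single +2 delta and yields `[(0.0, 2), (1.0, 2), (2.0, 1), (3.0, 0)]`, rather than re-adding the running concurrency for each merged event.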
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/guidellm/objects/statistics.py | Restructured concurrency event processing to fix double-counting in merged events |
| src/guidellm/benchmark/benchmark.py | Added +1 to prompt tokens to include the first decode token in total token calculations |
| tests/unit/objects/test_statistics.py | Added regression tests for concurrency double-counting and epsilon edge cases |
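The benchmark-side change can be summarized with a similar hypothetical sketch. The names below (`token_events`, `first_token_time`, and so on) are assumptions for illustration and are not copied from src/guidellm/benchmark/benchmark.py; the point is only where the first decode token is attributed.

```python
# Illustrative sketch of the +1 attribution idea, not guidellm's actual code.
def token_events(prompt_tokens, output_tokens, first_token_time, end_time):
    """Yield (timestamp, token_count) pairs for one request's total-token stats.

    By the first-token timestamp the server has processed the whole prompt and
    emitted one decode token, so prompt_tokens + 1 tokens are attributed there;
    the remaining output_tokens - 1 decode tokens are attributed at completion.
    Omitting the +1 silently drops the first decode token from the totals.
    """
    yield (first_token_time, prompt_tokens + 1)
    if output_tokens > 1:
        yield (end_time, output_tokens - 1)
```

Total tokens per second then follows from summing these counts over the benchmark duration.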
Signed-off-by: Samuel Monson <[email protected]>
One minor update, otherwise looks good
Summary
Fixes an issue in metric calculation that caused incorrect statistics at extreme changes in concurrency, and an issue where the first decode token was not counted in total tokens per second.
Details
- [x] Fixed issue where merged concurrency change events would double-count concurrency
- [x] Ensure first decode token is counted when calculating total tokens per second
Test Plan
- Run unit tests: `tox -e test-unit -- -m "regression and sanity"`
Use of AI
- [x] "I certify that all code in this PR is my own, except as noted below."
- [x] Includes AI-assisted code completion
- [ ] Includes code generated by an AI application
- [x] Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes `## WRITTEN BY AI ##`)