sjmonson
Collaborator

@sjmonson commented Aug 8, 2025

Summary

Fixes two issues in metric calculation: one that caused incorrect statistics at extreme changes in concurrency, and one where the first decode token was not counted in total tokens per second.

Details

  • Fixed an issue where merged concurrency change events would double-count concurrency
  • Ensured the first decode token is counted when calculating total tokens per second
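
The concurrency fix can be illustrated with a minimal sketch, assuming concurrency is derived from (start, end) request intervals that are turned into +1/-1 change events, and that events closer together than an epsilon are merged into a single event. The helpers `merge_concurrency_events` and `time_weighted_concurrency` and the `epsilon` default are hypothetical and are not the code in src/guidellm/objects/statistics.py; the point is only that a merged event's combined delta is applied exactly once.

```python
def merge_concurrency_events(
    requests: list[tuple[float, float]], epsilon: float = 1e-6
) -> list[tuple[float, int]]:
    """Hypothetical helper (not guidellm's API): convert (start, end) request
    intervals into sorted (time, delta) concurrency-change events, merging
    events within `epsilon` seconds of the current group's first timestamp."""
    raw: list[tuple[float, int]] = []
    for start, end in requests:
        raw.append((start, 1))  # request starts: concurrency +1
        raw.append((end, -1))  # request ends: concurrency -1
    raw.sort()

    merged: list[list] = []
    for time, delta in raw:
        if merged and time - merged[-1][0] <= epsilon:
            # Fold this delta into the existing event exactly once; applying
            # it again when the merged event is processed would double-count.
            merged[-1][1] += delta
        else:
            merged.append([time, delta])
    return [(time, delta) for time, delta in merged]


def time_weighted_concurrency(
    requests: list[tuple[float, float]], epsilon: float = 1e-6
) -> float:
    """Hypothetical helper: average concurrency, weighting each level by how
    long it was held between consecutive merged events."""
    events = merge_concurrency_events(requests, epsilon)
    if len(events) < 2:
        return 0.0
    concurrency = 0
    weighted_sum = 0.0
    # Apply each merged delta once, then hold that level until the next event.
    for (time, delta), (next_time, _) in zip(events, events[1:]):
        concurrency += delta
        weighted_sum += concurrency * (next_time - time)
    duration = events[-1][0] - events[0][0]
    return weighted_sum / duration if duration > 0 else 0.0


print(time_weighted_concurrency([(0.0, 2.0), (0.0, 2.0)]))  # 2.0
```

For two fully overlapping requests, the merged start event carries a delta of +2 and the result is 2.0. Re-applying each original +1 on top of the merged total would inflate the concurrency level, which is the kind of error that becomes visible when many requests start or finish at nearly the same instant.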

Test Plan

  • Run unit tests: `tox -e test-unit -- -m "regression and sanity"`

  • [x] "I certify that all code in this PR is my own, except as noted below."

Use of AI

  • [x] Includes AI-assisted code completion
  • [ ] Includes code generated by an AI application
  • [x] Includes AI-generated tests (NOTE: AI-written tests should have a docstring that includes `## WRITTEN BY AI ##`)

@sjmonson changed the title "Fix a Couple Errors With Metric Accumulation" to "Fix errors with metric accumulation" Aug 8, 2025
@sjmonson force-pushed the fix/metric_cumulation branch from f211e30 to 1ea30d4 August 11, 2025 20:55
@sjmonson marked this pull request as ready for review August 11, 2025 21:06
@sjmonson requested review from Copilot and markurtz August 11, 2025 21:06
Contributor

@Copilot left a comment

Pull Request Overview

This PR fixes two metric calculation issues in the benchmark statistics system: a double-counting bug in concurrency calculations when events are merged due to epsilon tolerance, and incorrect token-per-second calculations that excluded the first decode token.

  • Fixed concurrency metric accumulation logic to prevent double-counting when events are merged
  • Corrected token-per-second calculations to include the first decode token by adding 1 to prompt token counts
  • Added comprehensive regression and edge case tests for the concurrency calculation fixes
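
The token-count fix can be sketched the same way. Assuming each request contributes its `prompt_tokens` at the moment its first token arrives, the first decode token arrives at that same moment, so crediting only `prompt_tokens` there drops one token per request. The `RequestTokens` record, the helper names, and the even spacing of the remaining decode tokens are hypothetical simplifications, not guidellm's implementation.

```python
from dataclasses import dataclass


@dataclass
class RequestTokens:
    """Hypothetical per-request record (not guidellm's actual type)."""

    first_token_time: float
    end_time: float
    prompt_tokens: int
    output_tokens: int


def token_events(req: RequestTokens) -> list[tuple[float, int]]:
    """Hypothetical helper: (time, token_count) events for one request."""
    # At the first token, credit the whole prompt plus the first decode token
    # (the "+1"), unless the request produced no output at all.
    first = 1 if req.output_tokens > 0 else 0
    events = [(req.first_token_time, req.prompt_tokens + first)]
    remaining = req.output_tokens - first
    if remaining > 0:
        # Assumption: spread the remaining decode tokens evenly over the
        # window between the first token and the end of the request.
        step = (req.end_time - req.first_token_time) / remaining
        for i in range(1, remaining + 1):
            events.append((req.first_token_time + i * step, 1))
    return events


def total_tokens_per_second(
    requests: list[RequestTokens], start: float, end: float
) -> float:
    """Hypothetical helper: prompt + output tokens credited in [start, end], per second."""
    total = sum(
        count
        for req in requests
        for time, count in token_events(req)
        if start <= time <= end
    )
    return total / (end - start)


req = RequestTokens(first_token_time=1.0, end_time=3.0, prompt_tokens=100, output_tokens=5)
print(total_tokens_per_second([req], 0.0, 3.0))  # 35.0
```

For a request with 100 prompt tokens and 5 output tokens, the events sum to 105 tokens, matching prompt plus output; crediting only the 100 prompt tokens at the first-token time would leave the total at 104.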

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

  • src/guidellm/objects/statistics.py: restructured concurrency event processing to fix double-counting in merged events
  • src/guidellm/benchmark/benchmark.py: added +1 to prompt tokens to include the first decode token in total token calculations
  • tests/unit/objects/test_statistics.py: added regression tests for concurrency double-counting and epsilon edge cases

@sjmonson added this to the v0.3.0 milestone Aug 13, 2025
Signed-off-by: Samuel Monson <[email protected]>
@sjmonson requested a review from jaredoconnell August 19, 2025 14:13
Collaborator

@markurtz left a comment

One minor update, otherwise looks good

@sjmonson merged commit 3e274d3 into main Aug 20, 2025
17 of 18 checks passed
@sjmonson deleted the fix/metric_cumulation branch August 20, 2025 15:32
sjmonson added a commit that referenced this pull request Oct 8, 2025
Fixes two issues in metric calculation: one that caused incorrect
statistics at extreme changes in concurrency, and one where the first
decode token was not counted in total tokens per second.

- [x] Fixed an issue where merged concurrency change events would
double-count concurrency
- [x] Ensure the first decode token is counted when calculating total
tokens per second

- Run unit tests: `tox -e test-unit -- -m "regression and sanity"`

---

- [x] "I certify that all code in this PR is my own, except as noted
below."

- [x] Includes AI-assisted code completion
- [ ] Includes code generated by an AI application
- [x] Includes AI-generated tests (NOTE: AI written tests should have a
docstring that includes `## WRITTEN BY AI ##`)

---------

Signed-off-by: Samuel Monson <[email protected]>
sjmonson added a commit that referenced this pull request Oct 9, 2025
sjmonson added a commit that referenced this pull request Oct 10, 2025

3 participants