Conversation

@irar2
Collaborator

@irar2 irar2 commented Oct 26, 2025

The test sends a large number of requests simultaneously, with max-num-seqs set to 1000, time-to-first-token set to 2000 ms, and a standard deviation of 600 ms.
The test checks the running-requests metric three times:

  1. Shortly after the requests are sent. Here we expect exactly 1000 running requests, since no running request should have finished yet.
  2. 2.5 seconds later. Most of the first 1000 requests should have finished by then, but some may still be finishing and being replaced by new requests. The number of running requests can therefore be either 1000 or 999, the latter when one request has finished but has not yet been replaced.
  3. One second after that. As in the second check, we expect either 1000 or 999 running requests.
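
The three checks above can be sketched roughly as follows. This is an illustrative Python sketch, not the test code from this PR: the metric name `vllm:num_requests_running`, the helper names, and the `CHECKPOINTS` table are assumptions made for the example. It parses a gauge value from Prometheus-style text and accepts it only if it falls within the allowed shortfall below max-num-seqs.

```python
import re

# Assumed metric name; the description only says "number of running requests metric".
RUNNING_METRIC = "vllm:num_requests_running"

MAX_NUM_SEQS = 1000

# (checkpoint label, allowed shortfall below MAX_NUM_SEQS) -- hypothetical table
CHECKPOINTS = [
    ("shortly after sending", 0),  # exactly 1000 expected
    ("2.5 s later", 1),            # 1000 or 999
    ("1 s after that", 1),         # 1000 or 999
]


def parse_running_requests(metrics_text: str, metric: str = RUNNING_METRIC) -> int:
    """Return the gauge value for `metric` from Prometheus text exposition output."""
    pattern = re.compile(
        r"^" + re.escape(metric) + r"(?:\{[^}]*\})?\s+(\S+)\s*$",
        re.MULTILINE,
    )
    match = pattern.search(metrics_text)
    if match is None:
        raise ValueError(f"metric {metric!r} not found")
    # Gauge values are floats in the exposition format; the count itself is integral.
    return int(float(match.group(1)))


def running_count_ok(value: int, max_num_seqs: int, tolerance: int) -> bool:
    """True if value is at most `tolerance` below max_num_seqs and never above it."""
    return max_num_seqs - tolerance <= value <= max_num_seqs
```

Expressing the second and third checks as a shortfall of one, rather than as an exact value, is what lets the test tolerate the brief window in which a finished request has not yet been replaced.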

Closes #228

@mayabar
Collaborator

mayabar commented Oct 27, 2025

/lgtm
/approve

@github-actions github-actions bot added the lgtm label Oct 27, 2025
@github-actions github-actions bot merged commit d60782a into llm-d:main Oct 27, 2025
4 checks passed
@irar2 irar2 deleted the test branch October 29, 2025 10:33

Development

Successfully merging this pull request may close these issues.

Fix occasional queue test failure
