Conversation

@mayabar mayabar commented Aug 25, 2025

  • Added logic to randomly select the response length based on a pre-defined histogram when the request includes the max_tokens property
  • This initial version applies the same histogram regardless of the specific max_tokens value
  • Future improvements will introduce dynamic buckets, adapting the histogram to different max_tokens values.

fix #167

…sed on a histogram - initial implementation

Signed-off-by: Maya Barnea <[email protected]>
@mayabar mayabar requested review from irar2 and shmuelk August 25, 2025 11:10
…t be randomly selected; instead it will stop when response length is maxTokens, otherwise - stop

- fix utils_tests

Signed-off-by: Maya Barnea <[email protected]>
@mayabar mayabar requested a review from shmuelk August 26, 2025 08:01

@shmuelk shmuelk left a comment

I suggest adding more comments to the function getResponseLengthByHistogram

@mayabar mayabar requested a review from shmuelk August 27, 2025 14:42
Signed-off-by: Maya Barnea <[email protected]>

shmuelk commented Aug 28, 2025

/lgtm

/approve

@github-actions github-actions bot added the lgtm label Aug 28, 2025
@mayabar mayabar merged commit b98882a into llm-d:main Aug 28, 2025
4 checks passed
@mayabar mayabar deleted the max_tokens branch August 28, 2025 12:01
Development

Successfully merging this pull request may close these issues.

Enhance calculation of tokens number in response based on request's max_tokens parameter - simple logic

2 participants