Description
What would you like to be added:
When generating tokens, respect the ignore_eos parameter.
Why is this needed:
llm-d-inference-sim/pkg/common/utils.go, line 176 at commit 639b40e:

maxTokens := int(*maxCompletionTokens)
Currently, when generating tokens, max_tokens (or max_completion_tokens) is used to cap the length of the generated output, but the ignore_eos parameter is not taken into account.
When evaluating model performance on production request datasets, it's crucial to force models to generate a precise number of tokens for a fair comparison. Setting ignore_eos to true ensures that the output will be exactly max_tokens (or max_completion_tokens) long, mimicking the behavior of real-world inference services more accurately.
Therefore, we need to modify the token generation logic to properly handle the ignore_eos parameter. This will allow for consistent and reproducible evaluations by ensuring the number of output tokens is equal to the specified maximum length when ignore_eos is true.
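A minimal sketch of the intended stop condition, assuming a simplified model of the simulator's generation loop: the function name shouldStop and its signature are hypothetical, not the repo's actual API, but they illustrate how ignore_eos would interact with the max_tokens cap.

```go
package main

import "fmt"

// shouldStop reports whether token generation should end.
// Hypothetical sketch: generation stops at the max_tokens /
// max_completion_tokens cap, or at an EOS token — unless ignore_eos
// is set, in which case EOS is ignored and generation always runs
// to exactly maxTokens tokens.
func shouldStop(generated, maxTokens int, sawEOS, ignoreEOS bool) bool {
	if generated >= maxTokens {
		return true // hard cap from max_tokens / max_completion_tokens
	}
	if sawEOS && !ignoreEOS {
		return true // natural stop at EOS only when ignore_eos is false
	}
	return false
}

func main() {
	// With ignore_eos=true, an EOS at token 5 of 10 does not stop generation.
	fmt.Println(shouldStop(5, 10, true, true)) // false
	// Without ignore_eos, the same EOS stops generation early.
	fmt.Println(shouldStop(5, 10, true, false)) // true
	// The maximum-length cap always applies, even with ignore_eos.
	fmt.Println(shouldStop(10, 10, false, true)) // true
}
```

With this check in place, setting ignore_eos to true makes the output length deterministic (exactly maxTokens tokens), which is what reproducible benchmarking requires.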