Respect ignore_eos in sim #186

@pancak3

What would you like to be added:
When generating tokens, the simulator should respect the ignore_eos parameter.

Why is this needed:

maxTokens := int(*maxCompletionTokens)

Currently, when generating tokens, max_tokens or max_completion_tokens is used to validate the length of the generated output. However, the ignore_eos parameter is not taken into account.

When evaluating model performance on production request datasets, it's crucial to force models to generate a precise number of tokens for a fair comparison. Setting ignore_eos to true ensures that the output is exactly max_tokens (or max_completion_tokens) tokens long, which mimics the behavior of real-world inference services more accurately.

Therefore, we need to modify the token generation logic to properly handle the ignore_eos parameter. This will allow for consistent and reproducible evaluations by ensuring the number of output tokens is equal to the specified maximum length when ignore_eos is true.
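
A minimal sketch of the requested behavior, not the simulator's actual code: the function name `planCompletion` and the `naturalLen` parameter (the length the simulator would otherwise pick on its own) are hypothetical, and the "stop" / "length" finish reasons are assumed to follow the OpenAI-style API convention.

```go
package main

import "fmt"

// planCompletion decides how many tokens a simulated response should contain
// and which finish reason to report.
//
// Hypothetical parameters (not taken from the simulator's code):
//   - naturalLen: the length the simulator would choose without any limit
//   - maxTokens:  max_tokens / max_completion_tokens; <= 0 means "not set"
//   - ignoreEOS:  the request's ignore_eos flag
func planCompletion(naturalLen, maxTokens int, ignoreEOS bool) (int, string) {
	if maxTokens <= 0 {
		// No limit was given, so there is no target length to force.
		// This sketch falls back to the natural length.
		return naturalLen, "stop"
	}
	if ignoreEOS {
		// ignore_eos: never stop early, always fill up to the limit.
		return maxTokens, "length"
	}
	if naturalLen > maxTokens {
		// The natural response is longer than allowed: truncate.
		return maxTokens, "length"
	}
	// The response ends on its own before hitting the limit.
	return naturalLen, "stop"
}

func main() {
	n, reason := planCompletion(5, 20, false)
	fmt.Println(n, reason)

	n, reason = planCompletion(5, 20, true)
	fmt.Println(n, reason)
}
```

With ignore_eos set to true, the second call yields exactly max_tokens tokens regardless of the natural length, which is the property needed for reproducible benchmarks.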
