Labels
internal: filed by core contributor or associate
Description
The client-side tokenization in guidellm fails to account for the extra tokens added by the server's chat prompt template. There are two possible workarounds:
- Enable usage metrics in each request and let the server tell us how many prompt tokens there are.
- Use the `/completions` endpoint rather than `/chat/completions`, as the chat template is not applied on the `/completions` endpoint.
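A rough sketch of the first workaround, assuming an OpenAI-compatible server that honors `stream_options.include_usage` on streaming requests and returns a `usage` object with a `prompt_tokens` field. The helper names (`build_chat_payload`, `server_prompt_tokens`, `template_overhead`) are hypothetical and are not part of guidellm; they only illustrate preferring the server-reported count over the client-side one.

```python
from typing import Any, Optional


def build_chat_payload(
    model: str, messages: list[dict[str, str]], stream: bool = True
) -> dict[str, Any]:
    """Build a /chat/completions request body that asks for usage metrics.

    Assumption: the server follows the OpenAI-compatible convention where
    streaming responses only include usage when stream_options requests it.
    """
    payload: dict[str, Any] = {"model": model, "messages": messages, "stream": stream}
    if stream:
        payload["stream_options"] = {"include_usage": True}
    return payload


def server_prompt_tokens(response: dict[str, Any]) -> Optional[int]:
    """Return the server-reported prompt token count, or None if absent.

    For streaming, `response` would be the chunk that carries usage;
    other chunks typically have usage set to null.
    """
    usage = response.get("usage") or {}
    return usage.get("prompt_tokens")


def template_overhead(client_count: int, response: dict[str, Any]) -> Optional[int]:
    """Tokens the server counted beyond the client-side estimate."""
    server_count = server_prompt_tokens(response)
    if server_count is None:
        return None
    return server_count - client_count
```

When the server reports usage, that value can replace the client-side count in the benchmark results; the difference returned by `template_overhead` is roughly what the chat template contributes.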