feat: Make tokenizer `add_special_tokens` option configurable #54

njhill · 2024-03-11T19:28:09Z

In particular so that it can be disabled for chat/instruct models where an explicit template is used that already includes these tokens. For example the leading <s> token added by llama and mixtral tokenizers.

This allows it to be configured globally. We could potentially support per-request in future.

In particular so that it can be disabled for chat/instruct models where an explicit template is used that already includes these tokens. (for example the leading <s> token added by llama and mixtral tokenizers) Signed-off-by: Nick Hill <[email protected]>

RHOAI 2.8.4 granite attention

njhill force-pushed the omit_special_tokens branch from 7c1ff02 to 2afbf1c Compare March 11, 2024 21:47

njhill force-pushed the omit_special_tokens branch from 2afbf1c to 7f074ac Compare March 21, 2024 17:26

Xaenalt pushed a commit to Xaenalt/text-generation-inference that referenced this pull request Aug 14, 2024

Merge pull request IBM#54 from Xaenalt/rhoai-2.8.4-granite-attention

ab2c4e8

RHOAI 2.8.4 granite attention

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: Make tokenizer `add_special_tokens` option configurable #54

feat: Make tokenizer `add_special_tokens` option configurable #54

Uh oh!

njhill commented Mar 11, 2024

Uh oh!

Uh oh!

feat: Make tokenizer add_special_tokens option configurable #54

Are you sure you want to change the base?

feat: Make tokenizer add_special_tokens option configurable #54

Uh oh!

Conversation

njhill commented Mar 11, 2024

Uh oh!

Uh oh!

feat: Make tokenizer `add_special_tokens` option configurable #54

feat: Make tokenizer `add_special_tokens` option configurable #54