bartowski1182 (Contributor) commented Mar 20, 2025

tokenizer.added_tokens_decoder builds and returns a fresh dict on every access, which is relatively slow (~0.04s per call on average). This results in massive slowdowns when a model has a huge number of added tokens:

https://github.com/huggingface/transformers/blob/9be4728af8bec48073ae841881d7f4e2ac3521d1/src/transformers/tokenization_utils_fast.py#L264

Typically this slowdown is imperceptible, but with a model like ByteCraft, which has 100,000 added tokens, the property is accessed twice per token: 0.04s × 2 × 100,000 = 8,000 seconds of extra processing time: https://huggingface.co/SamsungSAILMontreal/ByteCraft/blob/main/added_tokens.json

This fix removes the slowdown entirely by calling added_tokens_decoder only once at the start and reusing the result (the initial tokenizer load is still slow at ~2 minutes, but that's at least workable).
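The change amounts to hoisting the expensive property access out of the per-token loop. A minimal sketch of the pattern, with an illustrative stand-in class rather than the actual convert-script code (names like `SlowTokenizer`, `collect_slow`, and `collect_fast` are hypothetical):

```python
# Sketch of the caching pattern behind the fix. SlowTokenizer mimics how
# tokenizer.added_tokens_decoder in transformers rebuilds a dict on every
# access; the class and function names are illustrative, not the real code.

class SlowTokenizer:
    """Stand-in whose added_tokens_decoder returns a fresh dict per access."""
    def __init__(self, n: int):
        self._raw = {i: f"<tok_{i}>" for i in range(n)}

    @property
    def added_tokens_decoder(self):
        # Fresh dict each call -> O(n) work per access
        return dict(self._raw)

def collect_slow(tokenizer, ids):
    # Before the fix: the property is accessed once per token id,
    # so the whole loop is O(n^2) in the number of added tokens.
    return [tokenizer.added_tokens_decoder[i] for i in ids]

def collect_fast(tokenizer, ids):
    # After the fix: build the dict once, then do O(1) lookups.
    decoder = tokenizer.added_tokens_decoder
    return [decoder[i] for i in ids]
```

Both functions return the same result; only the number of times the property is evaluated changes, which is what turns ~8,000 seconds into a single up-front call.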


github-actions bot added the `python` (python script changes) label Mar 20, 2025
ggerganov merged commit 732b5fb into ggml-org:master Mar 20, 2025
5 checks passed
Ivy233 pushed a commit to Ivy233/llama.cpp that referenced this pull request Mar 23, 2025