System Info
"@huggingface/transformers": "3.4.2" (transformers.js)
Environment/Platform
- Website/web-app
- Browser extension
- Server-side (e.g., Node.js, Deno, Bun)
- Desktop app (e.g., Electron)
- Other (e.g., VSCode extension)
Description
For more details, see this comment on the related issue:
huggingface/tokenizers#1680 (comment)
Reproduction
Create a PreTrainedTokenizer via AutoTokenizer.from_pretrained, then call encode on the PreTrainedTokenizer instance. Encoding uses BPE internally, and the BPE implementation caches the result for every token it processes (see the snippet below and the reproduction sketch that follows it):
/**
 * Apply Byte-Pair-Encoding (BPE) to a given token. Efficient heap-based priority
 * queue implementation adapted from https://github.com/belladoreai/llama-tokenizer-js.
 * @param {string} token The token to encode.
 * @returns {string[]} The BPE encoded tokens.
 */
bpe(token) {
    const cached = this.cache.get(token);
    if (cached !== undefined) {
        return cached;
    }

    // ... BPE merge loop computes `result` ...

    // Save the result to the cache
    this.cache.set(token, result);
    return result;
}
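
A minimal reproduction sketch (assuming a Node.js environment with @huggingface/transformers 3.4.2; the model id Xenova/gpt2 is only an example, any BPE-based tokenizer should show the same behaviour):

```js
import { AutoTokenizer } from '@huggingface/transformers';

// Any BPE-based tokenizer works here; 'Xenova/gpt2' is just an example model id.
const tokenizer = await AutoTokenizer.from_pretrained('Xenova/gpt2');

// Encode a stream of strings that keep producing previously unseen tokens.
for (let i = 0; i < 100_000; ++i) {
    tokenizer.encode(`user-${i}-${Math.random()}`);
}

// Every unique token that passes through bpe() stays in the cache for the
// lifetime of the tokenizer instance, so memory keeps growing.
// (Assumes the internal cache is reachable as `tokenizer.model.cache`,
// i.e. the Map used in the snippet above.)
console.log(tokenizer.model.cache.size);
```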
For more details, please see:
huggingface/tokenizers#1680 (comment)
daulet/tokenizers#35
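
In the meantime, a possible stopgap for long-running processes is to clear the internal cache manually. This is not an official API; the sketch below reaches into the private `tokenizer.model.cache` Map from the snippet above and may break between versions:

```js
// Stopgap sketch: bound memory by clearing the internal BPE cache once it gets large.
// `tokenizer.model.cache` is the private Map shown above, not a public API.
const MAX_BPE_CACHE_ENTRIES = 10_000;
if (tokenizer.model.cache?.size > MAX_BPE_CACHE_ENTRIES) {
    tokenizer.model.cache.clear();
}
```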