-
Notifications
You must be signed in to change notification settings - Fork 229
Open
Description
When running the Google Colab notebook, it looks like there is some error when loading the Mixtral Instruct Tokenizer:
[/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_fast.py](https://localhost:8080/#) in __init__(self, *args, **kwargs)
109 elif fast_tokenizer_file is not None and not from_slow:
110 # We have a serialization from tokenizers which let us directly build the backend
--> 111 fast_tokenizer = TokenizerFast.from_file(fast_tokenizer_file)
112 elif slow_tokenizer is not None:
113 # We need to convert a slow tokenizer to build the backend
Exception: data did not match any variant of untagged enum PyPreTokenizerTypeWrapper at line 40 column 3
This appears to be a bug with the transformers and tokenizer versions (see: huggingface/transformers#31789), so the requirements.txt probably need to be updated. But i haven't been able to fix it properly. I changed the tokenizer to the base Mixtral model, but it's not the proper solution.
Metadata
Metadata
Assignees
Labels
No labels