Mixtral Instruct tokenizer from Colab notebook doesn't work. #38

@jmuntaner-smd

Description

When running the Google Colab notebook, it looks like there is some error when loading the Mixtral Instruct Tokenizer:

```
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_fast.py in __init__(self, *args, **kwargs)
    109         elif fast_tokenizer_file is not None and not from_slow:
    110             # We have a serialization from tokenizers which let us directly build the backend
--> 111             fast_tokenizer = TokenizerFast.from_file(fast_tokenizer_file)
    112         elif slow_tokenizer is not None:
    113             # We need to convert a slow tokenizer to build the backend

Exception: data did not match any variant of untagged enum PyPreTokenizerTypeWrapper at line 40 column 3
```

This appears to be a version mismatch between transformers and tokenizers (see huggingface/transformers#31789), so requirements.txt probably needs to be updated, but I haven't been able to fix it properly. As a workaround I switched to the base Mixtral model's tokenizer, but that isn't a proper solution.
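For reference, this "untagged enum" error typically occurs when the model's tokenizer.json was serialized by a newer tokenizers release than the one installed, so the fast backend can't parse it. A possible requirements.txt bump along those lines is sketched below; the exact version pins are assumptions I haven't verified against this notebook, not confirmed minimums:

```
# Hypothetical requirements.txt update -- versions are assumptions, not verified minimums
transformers>=4.42.0
tokenizers>=0.19.0
```

Upgrading in the Colab runtime itself (`pip install -U transformers tokenizers`, then restarting the runtime) might be enough to confirm whether a version bump fixes the load.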
