Mixtral Instruct tokenizer from Colab notebook doesn't work. #38

@jmuntaner-smd

Description

When running the Google Colab notebook, it looks like there is some error when loading the Mixtral Instruct Tokenizer:

```
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_fast.py in __init__(self, *args, **kwargs)
    109         elif fast_tokenizer_file is not None and not from_slow:
    110             # We have a serialization from tokenizers which let us directly build the backend
--> 111             fast_tokenizer = TokenizerFast.from_file(fast_tokenizer_file)
    112         elif slow_tokenizer is not None:
    113             # We need to convert a slow tokenizer to build the backend

Exception: data did not match any variant of untagged enum PyPreTokenizerTypeWrapper at line 40 column 3
```

This appears to be a version mismatch between transformers and tokenizers (see huggingface/transformers#31789), so requirements.txt probably needs to be updated, but I haven't been able to fix it properly. As a workaround I switched to the base Mixtral model's tokenizer, but that isn't a proper solution.
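For reference, this "untagged enum" error typically occurs when the model's tokenizer.json was serialized by a newer tokenizers release than the one installed, so the fast backend can't parse it. A possible requirements.txt bump along those lines is sketched below; the exact version pins are assumptions I haven't verified against this notebook, not confirmed minimums:

```
# Hypothetical requirements.txt update -- versions are assumptions, not verified minimums
transformers>=4.42.0
tokenizers>=0.19.0
```

Upgrading in the Colab runtime itself (`pip install -U transformers tokenizers`, then restarting the runtime) might be enough to confirm whether a version bump fixes the load.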
