Mixtral Instruct tokenizer from Colab notebook doesn't work. #38

jmuntaner-smd opened this issue Jul 8, 2024 · 2 comments

jmuntaner-smd commented Jul 8, 2024

When running the Google Colab notebook, an error is raised while loading the Mixtral Instruct tokenizer:

```
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_fast.py in __init__(self, *args, **kwargs)
    109         elif fast_tokenizer_file is not None and not from_slow:
    110             # We have a serialization from tokenizers which let us directly build the backend
--> 111             fast_tokenizer = TokenizerFast.from_file(fast_tokenizer_file)
    112         elif slow_tokenizer is not None:
    113             # We need to convert a slow tokenizer to build the backend

Exception: data did not match any variant of untagged enum PyPreTokenizerTypeWrapper at line 40 column 3
```

This appears to be a bug related to the transformers and tokenizers versions (see: huggingface/transformers#31789), so requirements.txt probably needs to be updated, but I haven't been able to fix it properly. I changed the tokenizer to the base Mixtral model, but that's not a proper solution.
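
If the root cause is the same as in the linked transformers issue, the fix is probably just bumping the pinned tokenizer stack. A minimal sketch of what that might look like; the version numbers below are my assumption, not values taken from this repo's requirements.txt:

```python
# Assumed fix: upgrade the tokenizer stack so it can parse the re-serialized
# tokenizer.json. Version pins are illustrative, not from requirements.txt:
#   pip install --upgrade "transformers>=4.42" "tokenizers>=0.19"

from transformers import AutoTokenizer

# With a new enough `tokenizers`, the Instruct tokenizer should load again.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x7B-Instruct-v0.1")
```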


kaushikacharya commented Jul 11, 2024

> I changed the tokenizer to the base Mixtral model, but it's not the proper solution.

What is the tokenizer version that you are using? I am also facing a similar issue.

The issue seems to be due to recent commits in
https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1/commits/main
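
If those commits are indeed the cause, one possible workaround is to pin the tokenizer to an earlier revision of the repo. A sketch only; the revision string below is a placeholder, not an actual commit hash:

```python
from transformers import AutoTokenizer

# Possible workaround: load the tokenizer from a repo revision that predates the
# recent commits. "COMMIT_SHA_BEFORE_CHANGE" is a placeholder; pick a real hash
# from https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1/commits/main
tokenizer = AutoTokenizer.from_pretrained(
    "mistralai/Mixtral-8x7B-Instruct-v0.1",
    revision="COMMIT_SHA_BEFORE_CHANGE",
)
```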

jmuntaner-smd (Author) commented

I just changed the Google Colab line to this: `tokenizer = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x7B-v0.1")`
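
For reference, a minimal sketch of that workaround in context, assuming the rest of the notebook still loads the Instruct checkpoint (the base repo's tokenizer config may not carry the Instruct chat template, which is presumably why it's not a proper solution):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Workaround: tokenizer from the base repo (whose tokenizer.json still parses),
# model weights from the Instruct repo as before.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x7B-v0.1")
model = AutoModelForCausalLM.from_pretrained("mistralai/Mixtral-8x7B-Instruct-v0.1")
```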
