Question regarding TASTE Tokenizer and checkpoints #5

Thanks for the great work on this project.

I'm trying to train my own model. As I understand it, the design of the proposed TASTE Tokenizer means I shouldn't need to retrain TASTE even if I switch the backbone LLM from LLaMA3.2 to another model.

Therefore, I'd like to start directly from STAGE 2 (beginning with vector quantization), using the existing TASTE Tokenizer.

However, when I try to use the TASTE checkpoint downloaded via STAGE1_TRAIN/storage/download_checkpoints.py in STAGE 2, I get an error because its configuration is not in a format compatible with the Hugging Face Hub.
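In case it helps narrow this down, here is a rough check of whether a downloaded checkpoint directory has the layout that Hugging Face transformers' `from_pretrained()` generally expects. This is only my assumption about what STAGE 2 needs (a `config.json` containing a `model_type` key, plus `model.safetensors` or `pytorch_model.bin` weights, possibly sharded); the helper `hf_layout_problems` is hypothetical and not part of the repo.

```python
import json
from pathlib import Path

# Weight file names transformers looks for by default (single-file and sharded).
# Assumption: STAGE 2 loads the TASTE checkpoint with Hugging Face transformers.
WEIGHT_FILES = (
    "model.safetensors",
    "model.safetensors.index.json",
    "pytorch_model.bin",
    "pytorch_model.bin.index.json",
)


def hf_layout_problems(ckpt_dir: str) -> list[str]:
    """Return layout problems found; an empty list means it looks Hub-compatible."""
    path = Path(ckpt_dir)
    if not path.is_dir():
        return [f"{ckpt_dir} is not a directory"]

    problems = []
    config_path = path / "config.json"
    if not config_path.is_file():
        problems.append("missing config.json")
    else:
        try:
            config = json.loads(config_path.read_text())
        except json.JSONDecodeError:
            problems.append("config.json is not valid JSON")
        else:
            # AutoConfig typically relies on model_type to pick the model class.
            if "model_type" not in config:
                problems.append("config.json has no 'model_type' key")

    # At least one recognised weight file must be present.
    if not any((path / name).is_file() for name in WEIGHT_FILES):
        problems.append("no model.safetensors / pytorch_model.bin weights found")
    return problems
```

For example, `print(hf_layout_problems("path/to/taste_checkpoint"))` (placeholder path) lists what is missing from the downloaded directory.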

Could you upload a trained TASTE tokenizer separately, in the correct format? If not, is there another way to work around this issue? Alternatively, is a usable TASTE tokenizer already included in either https://huggingface.co/datasets/MediaTek-Research/TASTE-Dump or https://huggingface.co/MediaTek-Research/Llama-1B-TASTE-V0?


Thanks
