Question about tokenizer training details

Hello! Thanks for your great work! I have a question: you say that the tokenizer training process requires a system with more than 2TB of RAM and takes approximately 12 hours for each. But when I reproduced it (by running python ./utils/tokenizer.py --dataset eth --model bpe --metric pixel), it took only a few minutes to get the result. Are any other steps needed? Thank you!