Why not just use a byte-level BPE tokenizer? #251
Answered
by
zh-zheng
silverriver
asked this question in
Q&A
-
As the title says: why not just use a byte-level BPE tokenizer like GPT2's? It can handle all Unicode characters.
Answered by
zh-zheng
Nov 30, 2022
Replies: 1 comment
-
ant already has a Chinese vocabulary. Since ant-plus wants to continue training while adding English support, extending the existing vocabulary is the only practical option; otherwise the model would have to be retrained from scratch.
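The vocabulary-extension scheme described above can be sketched as follows. This is a minimal illustration, not ant's actual tokenizer: the token strings, vocabulary size, embedding dimension, and initialization are all made-up assumptions. The key invariant is that existing token IDs are never reassigned, so the pretrained embedding rows remain valid and only the newly appended rows need training.

```python
# Minimal sketch of extending an existing vocabulary (illustrative only;
# the tokens, sizes, and init scheme are assumptions, not ant's actual setup).
import random

def extend_vocab(vocab, new_tokens):
    """Append new tokens at fresh IDs, leaving all existing IDs untouched."""
    extended = dict(vocab)
    next_id = max(vocab.values()) + 1
    for tok in new_tokens:
        if tok not in extended:
            extended[tok] = next_id
            next_id += 1
    return extended

def extend_embeddings(embeddings, new_size, dim):
    """Keep pretrained rows as-is; append randomly initialized rows for new tokens."""
    new_rows = [[random.gauss(0.0, 0.02) for _ in range(dim)]
                for _ in range(new_size - len(embeddings))]
    return embeddings + new_rows

# Toy "existing Chinese vocabulary" with fixed IDs and pretrained embeddings.
vocab = {"我": 0, "你": 1, "好": 2}
embeddings = [[0.1] * 4 for _ in vocab]  # pretrained rows, dim=4

new_vocab = extend_vocab(vocab, ["hello", "world"])
embeddings = extend_embeddings(embeddings, len(new_vocab), 4)

assert new_vocab["我"] == 0        # old IDs unchanged, old embeddings stay valid
assert new_vocab["hello"] == 3     # new tokens appended at fresh IDs
assert len(embeddings) == len(new_vocab)
```

In a real continued-training setup the same idea applies at the embedding-matrix level: the pretrained rows are copied over and only rows for the new (here, English) tokens are freshly initialized, which is what makes continuing training feasible without starting over.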
Answer selected by
zh-zheng