Why is padded_vocab_size different from the vocabulary size? #491
-
padded_vocab_size=65024. If I want to extend the vocabulary myself (adding around 500 special tokens), what changes do I need to make?
Answered by zRzRzRzRzRzRzR, Nov 30, 2023
-
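padded_vocab_size is usually larger than the actual vocabulary size. The padding improves computational efficiency (a vocabulary dimension rounded up to a convenient multiple maps better onto hardware kernels and splits evenly when the embedding is sharded across devices), makes better use of the hardware, and leaves room for possible vocabulary extensions in the future.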
Answer selected by zRzRzRzRzRzRzR
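Extending the vocabulary yourself isn't really practical: changing the vocabulary also means changing the embedding layer, along with the model's parameters and their initialization, which basically amounts to retraining the model.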
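To make the "you also have to change the embedding" point concrete, here is a minimal pure-Python sketch of growing an embedding matrix by a few rows. It assumes the embedding is a plain `[vocab, dim]` table and initializes the new rows to the mean of the existing ones, which is one common heuristic and not necessarily what ChatGLM's authors would do; `resize_embedding` and the toy values are hypothetical. If the model's output projection is not tied to the input embedding, it would need the same treatment. (In Hugging Face `transformers`, `PreTrainedModel.resize_token_embeddings` performs the resize step for models that support it.)

```python
def resize_embedding(weight, new_rows):
    """Return a copy of `weight` (a list of rows) with `new_rows` extra rows.

    Original rows are copied unchanged; each new row is set to the
    column-wise mean of the existing rows (one common heuristic).
    """
    if not weight:
        raise ValueError("embedding matrix must be non-empty")
    n, dim = len(weight), len(weight[0])
    mean_row = [sum(row[j] for row in weight) / n for j in range(dim)]
    resized = [list(row) for row in weight]
    resized.extend(list(mean_row) for _ in range(new_rows))
    return resized


# Toy 4-token vocabulary with 3-dim embeddings (hypothetical values).
old = [[1.0, 0.0, 2.0],
       [3.0, 2.0, 0.0],
       [0.0, 4.0, 2.0],
       [0.0, 2.0, 0.0]]
new = resize_embedding(old, new_rows=2)
print(len(new))  # 6
print(new[4])    # [1.0, 2.0, 1.0]
```

The resize itself is cheap; the expensive part is that the new rows carry no learned meaning until the model is trained on data that uses the new tokens, which is why the reply compares this to retraining.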