Hello,
The 175B GPT-3 was trained on roughly 300B tokens. Our 200G of cleaned data currently amounts to about 50B tokens, which should be sufficient in order of magnitude for training a 10B model.
Also, 200G is only the initial data volume; we will keep adding more as new data becomes available.
Thanks for your question!
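As a rough sanity check on the arithmetic above, here is a minimal Python sketch. The 4-bytes-per-token figure is an assumption back-solved from the stated 200G ≈ 50B tokens (it is not a value confirmed in this thread), and the tokens-per-parameter ratio is just the comparison the answer reasons about, with GPT-3's 300B tokens / 175B parameters as the reference point.

```python
# Back-of-the-envelope check of the numbers in the answer above.
# ASSUMPTION: ~4 bytes per token, inferred from 200 GB ≈ 50B tokens.

GB = 10**9

def estimate_tokens(corpus_bytes: float, bytes_per_token: float = 4.0) -> float:
    """Rough token count implied by a raw corpus size."""
    return corpus_bytes / bytes_per_token

corpus_tokens = estimate_tokens(200 * GB)   # ~50B tokens
gpt3_tokens, gpt3_params = 300e9, 175e9     # GPT-3 training budget (reference)
model_params = 10e9                         # the 10B model discussed here

print(f"corpus tokens:            {corpus_tokens / 1e9:.0f}B")
print(f"GPT-3 tokens/parameter:   {gpt3_tokens / gpt3_params:.2f}")   # ~1.71
print(f"this run tokens/parameter:{corpus_tokens / model_params:.2f}") # ~5.00
```

Under these assumptions the 10B model would see about 5 tokens per parameter, versus roughly 1.7 for GPT-3, which is consistent with the claim that 50B tokens is enough in order of magnitude.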

Answer selected by zh-zheng