How to prepare dataset for pretraining after https://github.com/NVIDIA-NeMo/NeMo/pull/14192 #15029
Unanswered
ooooo-create
asked this question in
Q&A
Replies: 0 comments
I see the documentation for custom pretraining datasets at https://docs.nvidia.com/nemo-framework/user-guide/latest/data/pretrain_data.html. It uses scripts/nlp_language_modeling/preprocess_data_for_megatron.py, which was removed in #14192. Are there other ways to prepare the dataset now? Thanks!