What is the best place to tokenize text data to avoid deadlocks? #12165
Unanswered
celsofranssa asked this question in Lightning Trainer API: Trainer, LightningModule, LightningDataModule
Replies: 1 comment
-
Are you using any distributed setting (ddp, ddp_spawn, etc.)? If yes, you can simply ignore this warning: `prepare_data` is called from within a process, but only on global rank / local rank zero, so it won't run into a deadlock situation.
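To illustrate the split the answer describes, here is a minimal sketch of the recommended pattern: one-time, disk-writing work (tokenization) goes in `prepare_data`, which Lightning calls only on rank zero, while per-process state assignment goes in `setup`, which runs on every rank. The class below only mimics a `LightningDataModule`'s hooks (the class name, whitespace tokenization, and cache path are illustrative assumptions, not the asker's actual code), so it runs without `pytorch_lightning` installed.

```python
import json
import os
import tempfile


class TokenizingDataModule:
    """Sketch of a DataModule that tokenizes once on disk, then loads per rank."""

    def __init__(self, raw_texts, cache_dir):
        self.raw_texts = raw_texts
        self.cache_path = os.path.join(cache_dir, "tokenized.json")
        self.dataset = None

    def prepare_data(self):
        # Called once, on global rank zero only: tokenize and write to disk.
        # Do NOT assign state to self here -- the other ranks never run this
        # method, so any attribute set here would be missing on them.
        tokenized = [text.lower().split() for text in self.raw_texts]
        with open(self.cache_path, "w") as f:
            json.dump(tokenized, f)

    def setup(self, stage=None):
        # Called on every rank: load the cached tokenized data and assign
        # the state each process needs.
        with open(self.cache_path) as f:
            self.dataset = json.load(f)


cache_dir = tempfile.mkdtemp()
dm = TokenizingDataModule(["Hello World", "Avoid deadlocks"], cache_dir)
dm.prepare_data()  # in Lightning, only rank zero executes this
dm.setup("fit")    # every rank loads the cached result
print(dm.dataset[0])  # -> ['hello', 'world']
```

Because every rank reads the same on-disk cache produced by rank zero, no rank blocks waiting on work another rank never started, which is how this layout avoids the deadlock the warning refers to.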
-
Below is my implementation of the `DataModule`: the tokenization is done entirely in the `prepare_data` method. However, I get the following warning. So what would be the best place to tokenize the dataset and avoid the issues reported in the warning?