iterations/gpu don't scale when using custom sampler #3716
-
Hi! I'm currently using PyTorch's WeightedRandomSampler for my multi-class skewed dataset, and I've set "use_ddp_sampler" to False. How can I speed up my training procedure using a custom sampler? Edit: Trainer settings:
-
WeightedRandomSampler is not a distributed sampler, right?
-
@bartmch I think @rohitgr7 is right here. The base class for both DistributedSampler and WeightedRandomSampler is Sampler, but it's DistributedSampler that implements the DDP capabilities.
-
Thanks for your replies! Two silly questions: 1) How can I implement a sampler in Lightning that works with DDP and takes care of my unbalanced dataset? 2) The weighted sampler is simply giving a sampling probability to each sample, right? Why doesn't it simply scale to multiple GPUs? Is it not implemented (yet), or am I understanding it wrong?
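On the second question: yes, `WeightedRandomSampler` draws each index with probability proportional to its weight, and it has no notion of process rank. A minimal single-process sketch (the labels here are made up for illustration: 90 samples of class 0, 10 of class 1):

```python
import torch
from torch.utils.data import WeightedRandomSampler

# Hypothetical skewed labels: class 0 heavily outnumbers class 1.
labels = torch.tensor([0] * 90 + [1] * 10)

# Per-class weight = inverse class frequency, so the rare class is drawn more often.
class_counts = torch.bincount(labels).float()   # tensor([90., 10.])
class_weights = 1.0 / class_counts

# Per-sample weight: each sample inherits its class's weight.
sample_weights = class_weights[labels]

# Draw len(labels) indices with replacement; classes come out roughly balanced.
sampler = WeightedRandomSampler(sample_weights, num_samples=len(labels), replacement=True)
```

Note there is nothing rank-aware here: under DDP every process would draw its own full-size sample from the whole dataset, which is why iterations don't scale.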
-
All you need is a custom sampler here + set `replace_sampler_ddp=False` in `Trainer`.
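For completeness, the wiring described in this thread might look like the sketch below. `my_dataset`, `my_model`, and `sample_weights` are placeholders, and `replace_sampler_ddp` / `accelerator="ddp"` are the Trainer arguments from Lightning versions current at the time of this thread:

```python
import pytorch_lightning as pl
from torch.utils.data import DataLoader, WeightedRandomSampler

# sample_weights: one weight per sample, e.g. inverse class frequency (placeholder).
sampler = WeightedRandomSampler(sample_weights, num_samples=len(sample_weights))

# Attach the custom sampler to the DataLoader yourself...
loader = DataLoader(my_dataset, batch_size=32, sampler=sampler)

# ...and tell Lightning not to swap it out for a DistributedSampler under DDP.
trainer = pl.Trainer(gpus=2, accelerator="ddp", replace_sampler_ddp=False)
trainer.fit(my_model, loader)
```

With this setup Lightning keeps your sampler as-is; making it shard work across ranks is then your responsibility.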
-
@bartmch I hope the above suggestion helps.
-
Hey @rohitgr7 apologies for the delay - I'll try the solution this week. |
-
Hi! Just found this discussion. This seems like a critical feature to implement directly in Lightning! I'm working on my own custom implementation for the moment.
All you need is a custom sampler here + set `replace_sampler_ddp=False` in `Trainer`. `WeightedRandomSampler` just uses the `weights`, which you need to define, to sample the batch from the dataset, similar to what boosting algorithms do while sampling. In your use case, you need some kind of `DistributedBalancedSampler` that can do either oversampling or undersampling. There are some discussions here which might partially solve your use-case in the future, but for now neither Lightning nor PyTorch has this yet. But you can check some custom samplers here that might help.
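A minimal sketch of what such a sampler could look like (the class name, constructor parameters, and sharding scheme are all assumptions for illustration, not a Lightning or PyTorch API): seed every rank identically, draw one weighted index sequence, then give each rank a disjoint slice of it.

```python
import torch
from torch.utils.data import Sampler

class DistributedWeightedSampler(Sampler):
    """Sketch: weighted sampling + per-rank sharding, so each DDP process
    draws a class-balanced, non-overlapping subset of indices."""

    def __init__(self, weights, num_replicas, rank, num_samples=None, seed=0):
        self.weights = torch.as_tensor(weights, dtype=torch.double)
        self.num_replicas = num_replicas
        self.rank = rank
        # For simplicity, assume total is divisible by num_replicas.
        self.total = num_samples or len(self.weights)
        self.seed = seed
        self.epoch = 0

    def set_epoch(self, epoch):
        # Call once per epoch (as with DistributedSampler) to reshuffle.
        self.epoch = epoch

    def __iter__(self):
        g = torch.Generator()
        g.manual_seed(self.seed + self.epoch)  # identical seed on every rank
        # All ranks draw the same weighted index sequence...
        indices = torch.multinomial(self.weights, self.total,
                                    replacement=True, generator=g)
        # ...then each rank keeps a disjoint, strided slice of it.
        return iter(indices[self.rank::self.num_replicas].tolist())

    def __len__(self):
        return self.total // self.num_replicas

# Usage on a hypothetical 2-GPU setup with 90 common / 10 rare samples:
weights = [0.1] * 90 + [0.9] * 10
s0 = DistributedWeightedSampler(weights, num_replicas=2, rank=0)
s1 = DistributedWeightedSampler(weights, num_replicas=2, rank=1)
```

Because both ranks seed the same generator, they materialize the same draw and simply keep different positions of it, so no sample index position is processed twice in one epoch.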