iterations/gpu don't scale when using custom sampler #3716
-
Hi! I'm currently using PyTorch's WeightedRandomSampler for my multi-class skewed dataset, and I've set "use_ddp_sampler" to False. How can I speed up my training procedure using a custom sampler? Edit: Trainer settings:
-
WeightedRandomSampler is not a distributed sampler, right?
-
@bartmch I think @rohitgr7 is right here. The base class for both DistributedSampler and WeightedRandomSampler is Sampler, but it's DistributedSampler that implements the DDP capabilities.
-
Thanks for your replies! Two silly questions: 1) How can I implement a sampler in Lightning that works with DDP and takes care of my unbalanced dataset? 2) The weighted sampler is simply giving a sampling probability to each sample, right? Why doesn't it simply scale to multiple GPUs? Is it not implemented (yet), or am I understanding it wrong?
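On the second question: yes, `WeightedRandomSampler` draws each index with probability proportional to its weight, and it has no notion of process rank. A minimal single-process sketch (the labels here are made up for illustration: 90 samples of class 0, 10 of class 1):

```python
import torch
from torch.utils.data import WeightedRandomSampler

# Hypothetical skewed labels: class 0 heavily outnumbers class 1.
labels = torch.tensor([0] * 90 + [1] * 10)

# Per-class weight = inverse class frequency, so the rare class is drawn more often.
class_counts = torch.bincount(labels).float()   # tensor([90., 10.])
class_weights = 1.0 / class_counts

# Per-sample weight: each sample inherits its class's weight.
sample_weights = class_weights[labels]

# Draw len(labels) indices with replacement; classes come out roughly balanced.
sampler = WeightedRandomSampler(sample_weights, num_samples=len(labels), replacement=True)
```

Note there is nothing rank-aware here: under DDP every process would draw its own full-size sample from the whole dataset, which is why iterations don't scale.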
-
All you need is a custom sampler here + set `replace_sampler_ddp=False` in `Trainer`.
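For completeness, the wiring described in this thread might look like the sketch below. `my_dataset`, `my_model`, and `sample_weights` are placeholders, and `replace_sampler_ddp` / `accelerator="ddp"` are the Trainer arguments from Lightning versions current at the time of this thread:

```python
import pytorch_lightning as pl
from torch.utils.data import DataLoader, WeightedRandomSampler

# sample_weights: one weight per sample, e.g. inverse class frequency (placeholder).
sampler = WeightedRandomSampler(sample_weights, num_samples=len(sample_weights))

# Attach the custom sampler to the DataLoader yourself...
loader = DataLoader(my_dataset, batch_size=32, sampler=sampler)

# ...and tell Lightning not to swap it out for a DistributedSampler under DDP.
trainer = pl.Trainer(gpus=2, accelerator="ddp", replace_sampler_ddp=False)
trainer.fit(my_model, loader)
```

With this setup Lightning keeps your sampler as-is; making it shard work across ranks is then your responsibility.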
-
@bartmch I hope the above suggestion helps.
-
Hey @rohitgr7 apologies for the delay - I'll try the solution this week. |
-
Hi! Just found this discussion. This seems like a critical feature to implement directly in Lightning! I'm working on my own custom implementation for the moment.
All you need is a custom sampler here + set `replace_sampler_ddp=False` in `Trainer`. `WeightedRandomSampler` just uses the `weights`, which you need to define, to sample the batch from the dataset, similar to what boosting algorithms do while sampling. In your use case, you need some kind of `DistributedBalancedSampler` that can do either oversampling or undersampling. There are some discussions here which might partially solve your use-case in the future, but for now neither Lightning nor PyTorch has this yet. But you can check some custom samplers here that might help.
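A minimal sketch of what such a sampler could look like (the class name, constructor parameters, and sharding scheme are all assumptions for illustration, not a Lightning or PyTorch API): seed every rank identically, draw one weighted index sequence, then give each rank a disjoint slice of it.

```python
import torch
from torch.utils.data import Sampler

class DistributedWeightedSampler(Sampler):
    """Sketch: weighted sampling + per-rank sharding, so each DDP process
    draws a class-balanced, non-overlapping subset of indices."""

    def __init__(self, weights, num_replicas, rank, num_samples=None, seed=0):
        self.weights = torch.as_tensor(weights, dtype=torch.double)
        self.num_replicas = num_replicas
        self.rank = rank
        # For simplicity, assume total is divisible by num_replicas.
        self.total = num_samples or len(self.weights)
        self.seed = seed
        self.epoch = 0

    def set_epoch(self, epoch):
        # Call once per epoch (as with DistributedSampler) to reshuffle.
        self.epoch = epoch

    def __iter__(self):
        g = torch.Generator()
        g.manual_seed(self.seed + self.epoch)  # identical seed on every rank
        # All ranks draw the same weighted index sequence...
        indices = torch.multinomial(self.weights, self.total,
                                    replacement=True, generator=g)
        # ...then each rank keeps a disjoint, strided slice of it.
        return iter(indices[self.rank::self.num_replicas].tolist())

    def __len__(self):
        return self.total // self.num_replicas

# Usage on a hypothetical 2-GPU setup with 90 common / 10 rare samples:
weights = [0.1] * 90 + [0.9] * 10
s0 = DistributedWeightedSampler(weights, num_replicas=2, rank=0)
s1 = DistributedWeightedSampler(weights, num_replicas=2, rank=1)
```

Because both ranks seed the same generator, they materialize the same draw and simply keep different positions of it, so no sample index position is processed twice in one epoch.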