How exactly does replace_sampler_ddp work in lightning source code? #9057
Unanswered
wangchu1
asked this question in Lightning Trainer API: Trainer, LightningModule, LightningDataModule
Replies: 1 comment 2 replies
-
Dear @fate3439, This is actually done here: https://github.com/PyTorchLightning/pytorch-lightning/blob/e1442d247e0e4967dd2772bdcf5166226c974f89/pytorch_lightning/trainer/data_loading.py#L307. Here is the flow: FitLoop -> reset_train_dataloader -> call model.train_dataloader or datamodule.train_dataloader, then apply modifications to the DataLoader, such as injecting the distributed sampler into it. Best,
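To make that concrete, here is a minimal sketch of what that sampler injection roughly looks like. The helper name `add_distributed_sampler` and the exact set of attributes copied over are illustrative assumptions, not Lightning's actual internals:

```python
import torch.distributed as dist
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

def add_distributed_sampler(loader: DataLoader, shuffle: bool) -> DataLoader:
    """Hypothetical sketch of the replace_sampler_ddp=True behaviour:
    rebuild the user's DataLoader around a DistributedSampler.
    Assumes torch.distributed has already been initialised."""
    sampler = DistributedSampler(
        loader.dataset,
        num_replicas=dist.get_world_size(),
        rank=dist.get_rank(),
        shuffle=shuffle,  # shuffled for train, unshuffled for val/test
    )
    # sampler and shuffle are mutually exclusive in DataLoader,
    # so shuffling is delegated entirely to the sampler here.
    return DataLoader(
        loader.dataset,
        batch_size=loader.batch_size,
        sampler=sampler,
        num_workers=loader.num_workers,
        collate_fn=loader.collate_fn,
        pin_memory=loader.pin_memory,
        drop_last=loader.drop_last,
    )
```

Roughly speaking, reset_train_dataloader performs this kind of re-instantiation for the training loader with shuffling enabled, while the val/test loaders are rebuilt without shuffling.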
-
By searching for replace_sampler_ddp with GitHub code search, I can pinpoint the related function at the following link:
https://github.com/PyTorchLightning/pytorch-lightning/blob/e1442d247e0e4967dd2772bdcf5166226c974f89/pytorch_lightning/trainer/data_loading.py#L122
Now, my question is: when and where does the above auto_add_sampler function get called to assign distributed loaders to the trainer? I'm particularly interested in where the code base sets the shuffle flag for train_loader and val_loader. Suppose I call:
trainer.fit(model, train_dataloader, val_dataloader)
where train_dataloader and val_dataloader are PyTorch DataLoaders wrapping a map-style dataset. When we don't use a LightningDataModule, will the trainer code still handle automatically adding the DDP sampler, and use shuffling for the train loader but not for the val loader?
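For concreteness, here is a minimal self-contained version of the setup I mean; the model and datasets are toy placeholders, and I'm assuming the default replace_sampler_ddp=True:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl

class MyModel(pl.LightningModule):
    # Minimal placeholder model, just to make the example self-contained.
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return nn.functional.cross_entropy(self.layer(x), y)

    def validation_step(self, batch, batch_idx):
        x, y = batch
        self.log("val_loss", nn.functional.cross_entropy(self.layer(x), y))

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

# Plain map-style datasets wrapped in plain PyTorch DataLoaders, no DataModule.
train_ds = TensorDataset(torch.randn(1000, 32), torch.randint(0, 2, (1000,)))
val_ds = TensorDataset(torch.randn(200, 32), torch.randint(0, 2, (200,)))
train_dataloader = DataLoader(train_ds, batch_size=64, shuffle=True)
val_dataloader = DataLoader(val_ds, batch_size=64)

# With DDP and replace_sampler_ddp=True (the default), I expect the trainer to
# swap in DistributedSamplers: shuffled for train, unshuffled for validation.
trainer = pl.Trainer(gpus=2, accelerator="ddp", replace_sampler_ddp=True)
trainer.fit(MyModel(), train_dataloader, val_dataloader)
```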
Thanks for any help!