
How to use Apex DistributedDataParallel with Lightning? #10922


Here is a quick draft of what you could try:

from apex.parallel import DistributedDataParallel
from pytorch_lightning.overrides.base import unwrap_lightning_module
from pytorch_lightning.plugins.training_type import DDPPlugin
from torch.nn import Module


class ApexDDPPlugin(DDPPlugin):

    def _setup_model(self, model: Module) -> DistributedDataParallel:
        # Wrap with apex's DDP instead of torch.nn.parallel.DistributedDataParallel.
        # Unlike the torch version, apex's wrapper takes no device_ids argument,
        # so determine_ddp_device_ids() is not used here.
        return DistributedDataParallel(module=model, **self._ddp_kwargs)

    @property
    def lightning_module(self):
        # self._model holds the apex wrapper; unwrap its .module attribute to
        # get back to the original LightningModule.
        return unwrap_lightning_module(self._model.module)

Note that apex's DistributedDataParallel does not appear to accept a device_ids argument the way the torch version does, so the wrapper above omits it; the model is expected to already be on the correct GPU before wrapping.
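If you want to confirm which keyword arguments your installed apex version actually accepts, you can print the constructor signature (a quick sanity check, assuming apex is installed):

import inspect
from apex.parallel import DistributedDataParallel

# Show the keyword arguments apex's DDP constructor accepts.
print(inspect.signature(DistributedDataParallel.__init__))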

Use it in the trainer:

from pytorch_lightning import Trainer

trainer = Trainer(gpus=2, strategy=ApexDDPPlugin(), precision=...)
trainer.fit(model)
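For context, here is a minimal end-to-end sketch, assuming apex is installed and two GPUs are available; LitModel and its random dataset are hypothetical stand-ins for your own LightningModule and data:

import torch
from pytorch_lightning import LightningModule, Trainer
from torch.utils.data import DataLoader, TensorDataset


class LitModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return torch.nn.functional.mse_loss(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)


dataset = TensorDataset(torch.randn(64, 32), torch.randn(64, 2))
model = LitModel()
trainer = Trainer(gpus=2, strategy=ApexDDPPlugin(), max_epochs=1)
trainer.fit(model, DataLoader(dataset, batch_size=8))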
