Split one model's different parts on different gpus #7162
Replies: 8 comments 3 replies
-
PyTorch Lightning has support for sequential model parallelism: wrap the sub-modules in an `nn.Sequential` and tell the `RPCSequentialPlugin` how many layers to place on each GPU via `balance`:

```python
self.model = nn.Sequential(Bert(), Linear(10, 20))  # __init__()
...
self.model(x)  # forward()
...
plugin = RPCSequentialPlugin(balance=[1, 1])
trainer = Trainer(plugins=[plugin])
```
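For context, here is a more complete sketch of how those pieces fit together. This is only a sketch: `Bert` is a stand-in module, the import path, `accelerator` setting, and exact Trainer arguments for `RPCSequentialPlugin` vary by Lightning version (it also needs fairscale installed), and the plugin was later deprecated, as discussed below.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import pytorch_lightning as pl
from pytorch_lightning.plugins import RPCSequentialPlugin  # import path may differ by version


class Bert(nn.Module):
    """Stand-in for a large pretrained backbone (placeholder, as in the snippet above)."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(128, 768)

    def forward(self, x):
        return self.encoder(x)


class SplitModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        # a flat nn.Sequential lets the plugin cut the model into per-GPU stages
        self.model = nn.Sequential(Bert(), nn.Linear(768, 20))

    def forward(self, x):
        return self.model(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return F.cross_entropy(self(x), y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters())


# balance=[1, 1]: first Sequential stage on GPU 0, second stage on GPU 1 (assumed 2-GPU machine)
trainer = pl.Trainer(gpus=2, accelerator="ddp", plugins=[RPCSequentialPlugin(balance=[1, 1])])
trainer.fit(SplitModel())
```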
-
Hey @dalek-who, I wouldn't recommend using the RPCSequentialPlugin, as it is being deprecated. Instead, you can use the DeepSpeed integration: https://pytorch-lightning.readthedocs.io/en/stable/advanced/multi_gpu.html?highlight=deepspeed#deepspeed. We managed to scale crazy large models with it. It can also be used on only 1 GPU with CPU offloading. Give it a try and give us feedback. Best,
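For a concrete starting point, here is a minimal sketch of enabling the DeepSpeed integration from the Trainer. The `'deepspeed_stage_3'` alias is the one used later in this thread; the single-GPU offload alias and `precision=16` are assumptions that may differ between Lightning versions.

```python
import pytorch_lightning as pl

# Multi-GPU: ZeRO Stage 3 shards parameters, gradients and optimizer state across GPUs.
trainer = pl.Trainer(gpus=4, precision=16, plugins="deepspeed_stage_3")

# Single GPU: the same integration, with states offloaded to CPU memory to fit a larger model.
trainer = pl.Trainer(gpus=1, precision=16, plugins="deepspeed_stage_3_offload")
```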
-
@tchaton Can you provide a simple example?
-
Oh, I wasn't aware of the deprecation. Sorry about that.
-
Hey guys :) Regarding the deprecation of the RPCSequentialPlugin: DeepSpeed Stage 3 offers the same practice, and we already have it within Lightning. A minimal example of how all this can work can be found here: https://github.com/SeanNaren/minGPT/tree/stage3

Regarding a layer (in this case the classifier) being too large for a single GPU, you can instantiate it inside `configure_sharded_model` so it is sharded across GPUs as soon as it is created. We are planning a refresh of the documentation to make it easier to find these tidbits, as things have become a bit complex in the ecosystem.

For a small example:

```python
import torch.nn as nn
import pytorch_lightning as pl


class MyLargeModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        # a large backbone like bert
        self.bert = Bert()

    def configure_sharded_model(self):
        # a very very large classifier layer with 6 million classes
        # is now sharded instantly onto all GPUs using DeepSpeed Stage 3
        self.classifier = nn.Linear(768, 6_000_000)

    def forward(self, x):
        emb = self.bert(x)
        score = self.classifier(emb)
        return score


model = MyLargeModel()
trainer = pl.Trainer(
    gpus=4,
    plugins='deepspeed_stage_3'
)
trainer.fit(model)
```

DeepSpeed Stage 3 shards the model across all GPUs, but a layer created inside `configure_sharded_model` is sharded as soon as it is instantiated, so the full classifier never has to fit on a single GPU.
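If you need settings beyond what the `'deepspeed_stage_3'` alias provides, the plugin can also be passed as an object. This is a sketch assuming the Lightning 1.3-era `DeepSpeedPlugin` arguments (`stage`, `cpu_offload`), which were renamed in later releases, so check the version you have installed.

```python
import pytorch_lightning as pl
from pytorch_lightning.plugins import DeepSpeedPlugin

trainer = pl.Trainer(
    gpus=4,
    precision=16,
    # stage=3 shards parameters as well; cpu_offload moves optimizer state to CPU memory
    plugins=DeepSpeedPlugin(stage=3, cpu_offload=True),
)
```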
-
@SeanNaren which torch and pytorch-lightning version should I use?
-
Dear @dalek-who, You should use PyTorch Lightning 1.3.0rc1 and the latest PyTorch. Best,
-
@tchaton I use pl-1.3.0rc1 and torch-1.8.1. Here are some problems with this solution:

```
File "/home/projects/long_tail_link/link_main.py", line 479, in main
trainer.test(model=pl_module, verbose=False)
File "/home/anaconda3/envs/conda-long-tail-link/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 956, in test
results = self.fit(model)
File "/home/anaconda3/envs/conda-long-tail-link/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 485, in fit
self.pre_dispatch()
File "/home/anaconda3/envs/conda-long-tail-link/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 512, in pre_dispatch
self.accelerator.pre_dispatch(self)
File "/home/anaconda3/envs/conda-long-tail-link/lib/python3.6/site-packages/pytorch_lightning/accelerators/accelerator.py", line 105, in pre_dispatch
self.training_type_plugin.pre_dispatch()
File "/home/anaconda3/envs/conda-long-tail-link/lib/python3.6/site-packages/pytorch_lightning/plugins/training_type/deepspeed.py", line 234, in pre_dispatch
self.init_deepspeed()
File "/home/anaconda3/envs/conda-long-tail-link/lib/python3.6/site-packages/pytorch_lightning/plugins/training_type/deepspeed.py", line 239, in init_deepspeed
self._format_config()
File "/home/anaconda3/envs/conda-long-tail-link/lib/python3.6/site-packages/pytorch_lightning/plugins/training_type/deepspeed.py", line 395, in _format_config
self._format_batch_size_and_grad_accum_config()
File "/home/anaconda3/envs/conda-long-tail-link/lib/python3.6/site-packages/pytorch_lightning/plugins/training_type/deepspeed.py", line 407, in _format_batch_size_and_grad_accum_config
batch_size = self.lightning_module.train_dataloader().batch_sampler.batch_size
AttributeError: 'NoneType' object has no attribute 'batch_sampler'
```
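The traceback shows the DeepSpeed plugin trying to read the training batch size from `train_dataloader()` even though only `trainer.test(...)` is being run. As a possible workaround (an assumption on my part, not something verified in this thread), you could pass an explicit DeepSpeed config so the plugin does not have to infer the batch size; `train_micro_batch_size_per_gpu` is a standard DeepSpeed config key.

```python
import pytorch_lightning as pl
from pytorch_lightning.plugins import DeepSpeedPlugin

deepspeed_config = {
    "zero_optimization": {"stage": 3},
    "train_micro_batch_size_per_gpu": 8,  # set this to your actual batch size
}

# pl_module is the LightningModule from the failing script above (defined elsewhere)
trainer = pl.Trainer(gpus=4, precision=16, plugins=DeepSpeedPlugin(config=deepspeed_config))
trainer.test(model=pl_module, verbose=False)
```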
-
🚀 Feature
Motivation
In my case, I have a simplified large model in which `self.classifier` is so large that it must live on another GPU. However, if I simply set `gpus=2` in `pl.Trainer`, it copies the whole model onto both GPUs (and both raise `CUDA out of memory`) rather than splitting it across the two GPUs.
Pitch
An easy way to manually split one model across different devices, like the tutorial above.
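For illustration, a minimal sketch of the kind of manual split being asked for here, using plain PyTorch device placement; `Backbone`, the layer sizes, and the device ids are assumptions for the example, not the issue's original code.

```python
import torch
import torch.nn as nn


class Backbone(nn.Module):
    """Stand-in for a large encoder such as BERT (hypothetical sizes)."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(128, 768)

    def forward(self, x):
        return self.encoder(x)


class SplitModel(nn.Module):
    def __init__(self):
        super().__init__()
        # place the backbone on the first GPU and the huge classifier on the second
        self.backbone = Backbone().to("cuda:0")
        self.classifier = nn.Linear(768, 6_000_000).to("cuda:1")

    def forward(self, x):
        emb = self.backbone(x.to("cuda:0"))
        # move activations between devices by hand
        return self.classifier(emb.to("cuda:1"))


model = SplitModel()
scores = model(torch.randn(4, 128))  # output lives on cuda:1
```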
Alternatives
Additional context