predict with multiple GPUs doesn't aggregate the predictions even with on_predict_end or on_predict_batch_end #7852
-
I train a model with 2 GPUs. To see the issue, simply create a model like this:

```python
class Model(pl.LightningModule):
    ...
    def predict_step(self, batch, batch_idx, dataloader_idx=None):
        y = self(batch)
        return {"predict": y}
```

then run it with any multi-GPU trainer:

```python
m = Model(...)
trainer = pl.Trainer(gpus=2, accelerator="ddp")
predictions = trainer.predict(model=m, datamodule=dm)
```

but we get two separate sets of predictions (one per process) instead of a single aggregated result, even with `on_predict_end` or `on_predict_batch_end`.
-
You can either sync them by writing them both to disk (we have a `PredictionWriter` for that), or you can use `self.all_gather` inside your `predict_step` to sync across GPUs (note that this can lead to GPU OOM!).
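
A minimal sketch of the disk-writing option, assuming the callback meant here is `BasePredictionWriter` from `pytorch_lightning.callbacks`; the output directory and file naming are illustrative:

```python
import os
import torch
from pytorch_lightning.callbacks import BasePredictionWriter


class CustomWriter(BasePredictionWriter):
    def __init__(self, output_dir, write_interval="epoch"):
        super().__init__(write_interval)
        self.output_dir = output_dir

    def write_on_epoch_end(self, trainer, pl_module, predictions, batch_indices):
        # each rank writes its own shard of predictions to disk;
        # merge the per-rank files afterwards (e.g. on rank 0 or in a separate script)
        torch.save(
            predictions,
            os.path.join(self.output_dir, f"predictions_rank_{trainer.global_rank}.pt"),
        )
```

Pass it to the trainer, e.g. `pl.Trainer(gpus=2, accelerator="ddp", callbacks=[CustomWriter("/some/dir")])`, and load the per-rank files once prediction has finished.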
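
And a minimal sketch of the `self.all_gather` option inside `predict_step`, keeping the `{"predict": ...}` return format from the snippet above (gathering keeps every rank's predictions in GPU memory, hence the OOM warning):

```python
def predict_step(self, batch, batch_idx, dataloader_idx=None):
    y = self(batch)
    # gather this batch's predictions from every process;
    # the result gains a leading dimension of size world_size
    gathered = self.all_gather(y)
    return {"predict": gathered}
```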