Serialize model predictions when running on multiple GPUs #8699
Unanswered
aleSuglia
asked this question in
DDP / multi-GPU / multi-node
Hi there,

Looking at the way `Trainer.predict` works, it seems that the model stores the predictions internally, with the idea of serialising them at the end of the prediction loop. In my case, however, storing all the predictions is infeasible (they would occupy too much memory). I've therefore implemented a Callback that writes intermediate results to disk after each batch and is then supposed to collect and group them together at the end of the entire prediction loop.

At the moment, I've achieved the first part by implementing `on_predict_batch_end()` to store the intermediate results on disk. I then implemented `on_predict_epoch_end()` to make sure the results are grouped back together. However, I have the impression that `on_predict_epoch_end()` runs whenever a specific GPU finishes and is not synchronised across GPUs. How can I hook this function to the end of the prediction loop across all GPUs? Is adding `@rank_zero_only` a correct solution for this scenario?