move_metrics_to_cpu does not work #7828
-
I used torchmetrics.AveragePrecision as the metric in my PL module. This metric takes a lot of GPU memory because it saves all the data in a buffer.
Replies: 1 comment 1 reply
-
I think move_metrics_to_cpu does not move the state of the metric modules but just the logged values. cc @tchaton for confirmation.
@SkafteNicki @tchaton should we maybe extend this? I see that this can be misleading.
@MeteorsHub The reason we do it that way is mainly to avoid having the same thing in memory twice. But I think, especially for metrics like AP, the only option would be to manually move the metric to the CPU (since it will be moved to the GPU together with the model). This will introduce some synchronization points, though, and might slow down your training!