Accessing all batches at the end of epoch in callback #12999
Unanswered
kevjn asked this question in Lightning Trainer API: Trainer, LightningModule, LightningDataModule
Hi, I have been using PyTorch Lightning for several months now and have had a great experience overall. I have defined several callbacks that need access to all the batch inputs in the `on_train_epoch_end` callback. I have solved this by overriding the `on_train_batch_end` and `on_train_epoch_end` hooks in each `Callback` class (contrived example):
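Roughly like this (a simplified sketch of the pattern, assuming `batch` is a single tensor and the epoch-end hook only needs the concatenated result):

```python
import torch
from pytorch_lightning import Callback


class MyCallback(Callback):
    def __init__(self):
        self.batches = []

    def on_train_batch_end(self, trainer, pl_module, outputs, batch, batch_idx):
        # cache every batch seen during the epoch
        self.batches.append(batch.detach().cpu())

    def on_train_epoch_end(self, trainer, pl_module):
        # concatenate the cached batches and post-process them
        all_batches = torch.cat(self.batches)
        ...  # e.g. compute statistics or log a figure from all_batches
        self.batches.clear()
```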
However, with a large number of callbacks I believe this has a sub-optimal memory footprint, since the concatenation allocates new memory in each callback. I would rather do the concatenation once and reference the same array in all of my callbacks.

The documentation for `on_train_epoch_end` (https://pytorch-lightning.readthedocs.io/en/stable/extensions/callbacks.html#on-train-epoch-end) states that to access all batch outputs at the end of the epoch you can either 1) implement `training_epoch_end` in the `LightningModule` and access the outputs via the module, or 2) cache data across the train batch hooks inside the callback implementation and post-process it in this hook. I believe my implementation above is using alternative 2), but how do I use alternative 1)?
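Alternative 1) presumably means something like the sketch below inside the LightningModule (`MyModel` is just a placeholder), but then I still do not see how my callbacks would access the data:

```python
import pytorch_lightning as pl


class MyModel(pl.LightningModule):
    def training_step(self, batch, batch_idx):
        loss = ...  # compute the loss as usual
        # whatever is returned here is collected by Lightning ...
        return {"loss": loss, "batch": batch}

    def training_epoch_end(self, outputs):
        # ... and handed back as a list, one entry per training_step call
        all_batches = [out["batch"] for out in outputs]
```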
I thought there would be some property on the LightningModule that gives access to all the batches in the training loop, but looking at the source code of PyTorch Lightning 1.6.2 inside `fit_loop::on_advance_end`, it looks like the outputs of each `training_step` are only accessible in the `training_epoch_end` method of the LightningModule, because the memory is freed before the callback hooks are called.

What is the recommended way of going about this? I have thought of doing the following:
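Namely, letting one callback do the concatenation and stash the result on `pl_module`, so every other callback reads the same tensor. A sketch of the idea (the names `BatchCollector`, `ConsumerCallback` and `all_train_batches` are just placeholders, and the collector has to be registered before the consumers so the attribute exists when their hook runs):

```python
import torch
from pytorch_lightning import Callback


class BatchCollector(Callback):
    """Caches batches once per epoch and exposes them on the LightningModule."""

    def __init__(self):
        self.batches = []

    def on_train_batch_end(self, trainer, pl_module, outputs, batch, batch_idx):
        self.batches.append(batch.detach().cpu())

    def on_train_epoch_end(self, trainer, pl_module):
        # concatenate once and park the result on the module
        pl_module.all_train_batches = torch.cat(self.batches)
        self.batches.clear()


class ConsumerCallback(Callback):
    def on_train_epoch_end(self, trainer, pl_module):
        # read the shared tensor instead of re-concatenating
        all_batches = pl_module.all_train_batches
        ...  # e.g. plot or compute metrics from all_batches
```

wired up as `Trainer(callbacks=[BatchCollector(), ConsumerCallback(), ...])` so the collector hook runs first.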
The above code snippets assume that `batch` is a single array, but I have run into the same dilemma when trying to visualize targets and predictions of the model in isolation from the `test_step` and `test_epoch_end` methods and their corresponding callback hooks. What is the recommended way of sharing memory across multiple callbacks? Is using the `pl_module` as a proxy for accessing shared memory considered bad practice? I can't really think of any other way to do it.