Accumulate features and process in training_step_end across multiple training steps. #10593
Unanswered
jipson7 asked this question in Lightning Trainer API: Trainer, LightningModule, LightningDataModule
Replies: 1 comment 1 reply
-
Dear @jipson7, I am wondering why you can't implement this directly within your LightningModule?

    from pytorch_lightning import LightningModule

    class Model(LightningModule):
        def __init__(self):
            super().__init__()
            self.train_batches = []

        def training_step(self, batch, batch_idx):
            if batch_idx > 0 and batch_idx % 10 == 0:
                # do something with all the accumulated batches;
                # compute_loss is a placeholder for your own loss over them
                loss = self.compute_loss(self.train_batches)
                self.train_batches.clear()
                return loss
            else:
                self.train_batches.append(batch)
                return None  # returning None skips the optimization step
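If the underlying goal (as in the thread title) is a larger effective batch for an NCE-style loss under ddp, one possible extension of the idea above is to accumulate features rather than raw batches and gather them across processes with self.all_gather before computing the loss. The following is only a rough sketch, not a tested recipe; encoder, loss_fn, and the batch layout are assumptions standing in for your own code.

    import torch
    from pytorch_lightning import LightningModule

    class ContrastiveModel(LightningModule):
        # encoder and loss_fn are hypothetical stand-ins for your own
        # feature extractor and NCE-style loss
        def __init__(self, encoder, loss_fn, accumulate_steps=10):
            super().__init__()
            self.encoder = encoder
            self.loss_fn = loss_fn
            self.accumulate_steps = accumulate_steps
            self.feature_buffer = []

        def training_step(self, batch, batch_idx):
            images, _ = batch
            self.feature_buffer.append(self.encoder(images))

            if (batch_idx + 1) % self.accumulate_steps != 0:
                return None  # returning None skips the optimizer step

            # concatenate the locally accumulated features, then gather
            # across ddp processes; sync_grads=True keeps the autograd graph
            # so the loss can backpropagate into the local features
            local_features = torch.cat(self.feature_buffer, dim=0)
            self.feature_buffer.clear()
            all_features = self.all_gather(local_features, sync_grads=True)

            # under ddp, all_gather returns shape (world_size, n_local, dim)
            all_features = all_features.flatten(0, 1)
            return self.loss_fn(all_features)

Note that buffering features together with their autograd graphs across several steps can use a lot of memory; if gradients through the older features are not needed, they could be detached before being appended to the buffer.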
-
Hi there,
I am processing images and performing an NCELoss-type calculation on the features, where it is desirable to have a bigger effective batch size. Currently I'm processing the features using dp so that in training_step_end I can calculate the loss across features from all GPUs. However, this gets hard to manage when I have a distributed cluster where nodes have different numbers of GPUs, and it is also a bit slower than ddp.
What I'd like is something similar to accumulate_grad_batches, but rather than accumulating gradients I'd like to accumulate the feature output from training_step, and then run training_step_end once every N training_steps. The benefit is that I can leverage ddp and still get a large effective batch size, along with more features to calculate the NCELoss with.
Any suggestions are appreciated, thanks in advance.
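For reference, the dp-based pattern described above would look roughly like this; extract_features and nce_loss are hypothetical placeholders, not methods from any particular project.

    from pytorch_lightning import LightningModule

    class DPModel(LightningModule):
        def training_step(self, batch, batch_idx):
            images, _ = batch
            # with the dp strategy, each GPU runs this on its slice of the batch
            return {"features": self.extract_features(images)}

        def training_step_end(self, step_outputs):
            # dp gathers the per-GPU outputs onto the root device, so this sees
            # features from the full batch across all GPUs on the node
            features = step_outputs["features"]
            return self.nce_loss(features)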