Separate Trainer for train() and test()?
#12906
Python: 3.9.12

The docs for evaluation say:
Similarly I get a warning during runtime:
I would like to use DDP for training on 4 GPUs. Does this mean I have to create an entirely new Trainer just for val/test?

My train Trainer:

    trainer = pl.Trainer(
        max_epochs=self.max_epochs,
        logger=logger,
        num_nodes=self.num_nodes,
        # use 1 process if on cpu
        devices=self.num_gpus if self.num_gpus else 1,
        accelerator="gpu" if self.num_gpus else "cpu",
        strategy=DDPPlugin(find_unused_parameters=False)
        if self.num_gpus > 1
        else None,
        enable_checkpointing=False,
        callbacks=callbacks,
        profiler="simple",  # or "advanced" which is more granular
        fast_dev_run=self.fast_dev_run,  # For debugging
    )
    trainer.fit(model, datamodule)

If I then want to run testing, would I do something like:

    test_trainer = pl.Trainer(
        max_epochs=self.max_epochs,
        logger=logger,
        num_nodes=self.num_nodes,
        # use 1 device for test
        devices=1,
        accelerator="gpu" if self.num_gpus else "cpu",
        strategy=DDPPlugin(find_unused_parameters=False)
        if self.num_gpus > 1
        else None,
        enable_checkpointing=False,
        callbacks=callbacks,
        profiler="simple",  # or "advanced" which is more granular
        fast_dev_run=self.fast_dev_run,  # For debugging
    )
    test_trainer.test(model, test_dataloader)

Am I understanding correctly? I don't think it's possible to set the number of devices dynamically since there's no setter defined for the |
Replies: 1 comment 2 replies
Yes, if you care about the accuracy of the test metrics and don't want to include extra samples, this is the only way. And not just a separate Trainer: if you are using DDP, you should create a separate script altogether. Otherwise the whole script, including the trainer.test call, will be launched on each device during the trainer.fit call.
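To make the "extra samples" point concrete, here is a rough pure-Python sketch of how a distributed sampler can end up evaluating some samples twice. It assumes the padding-then-striding scheme that torch.utils.data.DistributedSampler uses with shuffle=False and drop_last=False; the helper name shard_indices and the sizes (10 samples, 4 processes) are made up for illustration and are not part of any library.

```python
import math


def shard_indices(dataset_len, world_size, rank):
    """Hypothetical helper: indices one rank would see under pad-and-stride sharding."""
    indices = list(range(dataset_len))
    # Every rank must get the same number of samples, so round up.
    per_rank = math.ceil(dataset_len / world_size)
    total = per_rank * world_size
    padding = total - dataset_len
    if padding > 0:
        # Pad by repeating samples from the start of the dataset.
        repeats = math.ceil(padding / dataset_len)
        indices += (indices * repeats)[:padding]
    # Each rank takes every world_size-th index, starting at its own rank.
    return indices[rank:total:world_size]


DATASET_LEN = 10  # assumed test set size
WORLD_SIZE = 4    # assumed number of DDP processes

shards = [shard_indices(DATASET_LEN, WORLD_SIZE, r) for r in range(WORLD_SIZE)]
evaluated = [i for shard in shards for i in shard]
duplicates = len(evaluated) - len(set(evaluated))

for rank, shard in enumerate(shards):
    print(f"rank {rank}: {shard}")
print(f"samples evaluated: {len(evaluated)}, duplicated: {duplicates}")
```

With these numbers, ranks 2 and 3 each pick up one repeated sample (indices 0 and 1), so 12 samples are scored for a 10-sample test set and any averaged metric is slightly skewed. With a single device there is nothing to pad, which is why a one-device Trainer in its own script gives exact test metrics.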