Teardown in ClusterEnvironment #10255
Unanswered
four4fish asked this question in Lightning Trainer API: Trainer, LightningModule, LightningDataModule
Replies: 0 comments
Teardown behavior in ClusterEnvironment is not consistent. Is this by design?
Some environments have teardown logic that deletes os.environ flags, while others don't. Yet in the TrainingTypePlugin (TTP), the environment setup logic is applied to every cluster_environment.
In DDP:
https://github.com/PyTorchLightning/pytorch-lightning/blob/master/pytorch_lightning/plugins/training_type/ddp.py#L193-L202
In ClusterEnvironment:
https://github.com/PyTorchLightning/pytorch-lightning/blob/master/pytorch_lightning/plugins/environments/cluster_environment.py#L72-L74
In LightningEnvironment:
https://github.com/PyTorchLightning/pytorch-lightning/blob/master/pytorch_lightning/plugins/environments/lightning_environment.py#L80-L82
None of the other environments override the teardown logic, as sketched below.
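For reference, the two linked teardown implementations look roughly like this (a paraphrase of the linked code, so docstrings and surrounding details may differ):

```python
import os


class ClusterEnvironment:
    def teardown(self) -> None:
        """Clean up any state set after execution finishes."""
        # Base class: teardown is effectively a no-op.
        pass


class LightningEnvironment(ClusterEnvironment):
    def teardown(self) -> None:
        # Only WORLD_SIZE is removed; everything else exported during
        # environment setup stays in os.environ.
        if "WORLD_SIZE" in os.environ:
            del os.environ["WORLD_SIZE"]
```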
Only the WORLD_SIZE environment variable is deleted, but not MASTER_PORT, MASTER_ADDR, rank, global rank, etc.
For example, several os.environ variables are set in DDP:
https://github.com/PyTorchLightning/pytorch-lightning/blob/master/pytorch_lightning/plugins/training_type/ddp.py#L197-L202
yet LightningEnvironment's teardown removes only WORLD_SIZE:
https://github.com/PyTorchLightning/pytorch-lightning/blob/master/pytorch_lightning/plugins/environments/lightning_environment.py#L80-L82
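If the intent is for teardown to undo everything that setup exported, a minimal sketch could look like the following. The class name `ConsistentEnvironment` and the exact variable list are my assumptions (mirroring variables commonly set on the DDP path), not an existing API:

```python
import os

from pytorch_lightning.plugins.environments import LightningEnvironment


class ConsistentEnvironment(LightningEnvironment):
    # Assumed list of variables exported during DDP setup; adjust to
    # whatever the linked ddp.py lines actually set.
    _EXPORTED_VARS = ("MASTER_ADDR", "MASTER_PORT", "WORLD_SIZE", "NODE_RANK", "LOCAL_RANK")

    def teardown(self) -> None:
        for var in self._EXPORTED_VARS:
            # pop with a default is a no-op when the variable is unset
            os.environ.pop(var, None)
```

Such an environment could then be passed to the Trainer (e.g. via `Trainer(plugins=[ConsistentEnvironment()])`) so that all exported variables are cleaned up after the run.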