
Commit d266754

yifuwang (Yifu Wang) authored and committed
Ensure the existence of DDPPlugin._sync_dir in reconciliate_processes (#8939)
Co-authored-by: Yifu Wang <yifuwang@[email protected]>
1 parent 7f999a2 commit d266754

2 files changed: +8 -0 lines changed

CHANGELOG.md

Lines changed: 2 additions & 0 deletions
@@ -22,6 +22,8 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 
 ## [1.4.1] - 2021-08-03
 
+- Ensure the existence of `DDPPlugin._sync_dir` in `reconciliate_processes` ([#8939](https://github.com/PyTorchLightning/pytorch-lightning/pull/8939))
+
 - Restore original loaders if replaced by entrypoint ([#8885](https://github.com/PyTorchLightning/pytorch-lightning/pull/8885))
 
 - Fixed `trainer.fit_loop.split_idx` always returning `None` ([#8601](https://github.com/PyTorchLightning/pytorch-lightning/pull/8601))

pytorch_lightning/plugins/training_type/ddp.py

Lines changed: 6 additions & 0 deletions
@@ -19,6 +19,7 @@
 import sys
 import tempfile
 import time
+from pathlib import Path
 from time import sleep
 from typing import Any, Dict, List, Optional, Union
 
@@ -435,6 +436,11 @@ def reconciliate_processes(self, trace: str):
 
         sync_dir = self._sync_dir
 
+        # The cluster may be configured to periodically purge the `/tmp`
+        # directory, in which case `sync_dir` may not exist anymore at this
+        # point. Idempotently create it to ensure its existence.
+        Path(sync_dir).mkdir(parents=True, exist_ok=True)
+
         # save a file locally.
         torch.save(True, os.path.join(sync_dir, f"{self.global_rank}.pl"))
 
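The idempotent-creation pattern added in this commit can be tried in isolation. The following is a minimal sketch, not the plugin's actual code: `write_rank_marker` is a hypothetical helper, and a plain text write stands in for `torch.save` so the example needs only the standard library. It simulates a purged temporary directory and shows that `Path.mkdir(parents=True, exist_ok=True)` recreates it, and that calling it again is harmless.

```python
import shutil
import tempfile
from pathlib import Path


def write_rank_marker(sync_dir: str, rank: int) -> Path:
    """Hypothetical helper: recreate sync_dir if needed, then write a per-rank marker."""
    directory = Path(sync_dir)
    # exist_ok=True makes this a no-op when the directory already exists;
    # parents=True also recreates any missing intermediate directories.
    directory.mkdir(parents=True, exist_ok=True)
    marker = directory / f"{rank}.pl"
    marker.write_text("1")  # stand-in for torch.save(True, ...)
    return marker


# Simulate the directory being purged between its creation and its use.
sync_dir = tempfile.mkdtemp()
shutil.rmtree(sync_dir)

marker = write_rank_marker(sync_dir, rank=0)  # recreates the directory
second = write_rank_marker(sync_dir, rank=0)  # directory exists; mkdir is a no-op
```

Without the `mkdir` call, the write in this sketch would raise `FileNotFoundError` once the directory has been removed, which is the situation the commit's comment describes for a periodically purged `/tmp`.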