Commit b09e96e
Fix ModelCheckpoint file_exists OOM in DDP (#21380)
* Fix ModelCheckpoint.file_exists OOM in DDP
* Document ModelCheckpoint.file_exists DDP memory fix
* Update src/lightning/pytorch/callbacks/model_checkpoint.py
---------
Co-authored-by: Justus Schock <[email protected]>
1 parent ef489f2 commit b09e96e
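
The commit message names the symptom (an out-of-memory error in `ModelCheckpoint.file_exists` under DDP) but not the mechanism of the fix. As a rough illustration only, and not the actual patch, the sketch below simulates one common way to keep such a check cheap in a multi-process job: only rank 0 touches the filesystem, and all ranks agree on a single boolean through an "any"-style reduction. Every name here (`local_decision`, `reduce_any`, `file_exists_on_all_ranks`) is hypothetical, the ranks are simulated in one process, and no Lightning or `torch.distributed` API is called.

```python
import os
import tempfile
from typing import List


def local_decision(filepath: str, is_global_zero: bool) -> bool:
    """Hypothetical per-rank check: only global rank 0 touches the filesystem."""
    return os.path.exists(filepath) if is_global_zero else False


def reduce_any(decisions: List[bool]) -> bool:
    """Stand-in for a collective 'any' reduction over one flag per rank."""
    return any(decisions)


def file_exists_on_all_ranks(filepath: str, world_size: int) -> List[bool]:
    """Simulate every rank running the check; all ranks end with the same answer."""
    decisions = [local_decision(filepath, rank == 0) for rank in range(world_size)]
    agreed = reduce_any(decisions)
    return [agreed] * world_size


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        ckpt = os.path.join(tmp, "example.ckpt")  # hypothetical checkpoint path
        print(file_exists_on_all_ranks(ckpt, world_size=4))
        open(ckpt, "w").close()  # create an empty file
        print(file_exists_on_all_ranks(ckpt, world_size=4))
```

In this simulation the ranks share one flag each, so the amount of data exchanged does not depend on the checkpoint or on any Python object being serialized. Whether the real change takes this route is an assumption.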
File tree
3 files changed: +33 -3 lines changed
- src/lightning/pytorch
  - callbacks
- tests/tests_pytorch/checkpointing
Diff (line contents not preserved; hunk ranges only)
File 1: @@ -82,6 +82,9 @@ 3 additions (new lines 85-87), 0 deletions
File 2: @@ -997,10 +997,12 @@ 5 additions (new lines 1000, 1002-1005), 3 deletions (original lines 1000, 1002-1003)
File 3: @@ -121,3 +121,28 @@ 25 additions (new lines 124-148), 0 deletions