Commit f2c5f5b
Clear reference to training loss at the end of train step (#9336)
Without clearing this reference, the loss tensor stays live through the next training
step. This can be a problem for memory intensive models that produce very deep backward
graphs such as neural ODEs. For these models, keeping the backward graph of the previous
loss in memory can lead to OOM errors in the next training step even though the step might
have succeeded if we had cleared (and thus GC'd) the previous backward graph.
Co-authored-by: tchaton <[email protected]>
Co-authored-by: Carlos Mocholi <[email protected]>
Co-authored-by: Adrian Wälchli <[email protected]>1 parent a5ad966 commit f2c5f5b
File tree
2 files changed
+22
-1
lines changed- pytorch_lightning/trainer/connectors/logger_connector
- tests/trainer/loops
2 files changed
+22
-1
lines changedLines changed: 9 additions & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
201 | 201 | | |
202 | 202 | | |
203 | 203 | | |
204 | | - | |
| 204 | + | |
| 205 | + | |
| 206 | + | |
| 207 | + | |
| 208 | + | |
| 209 | + | |
205 | 210 | | |
206 | 211 | | |
207 | 212 | | |
| 213 | + | |
| 214 | + | |
| 215 | + | |
208 | 216 | | |
209 | 217 | | |
210 | 218 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
190 | 190 | | |
191 | 191 | | |
192 | 192 | | |
| 193 | + | |
| 194 | + | |
| 195 | + | |
| 196 | + | |
| 197 | + | |
| 198 | + | |
| 199 | + | |
| 200 | + | |
| 201 | + | |
| 202 | + | |
| 203 | + | |
| 204 | + | |
| 205 | + | |
0 commit comments