How do I debug a memory leak in the training loop? #11756
Unanswered · NightMachinery asked this question in Q&A
Replies: 1 comment, 5 replies
There are known memory leaks in JAX under the CUDA/cuDNN versions that Colab currently ships. There isn't much we can do about them except wait for Colab to update, so that may or may not be your problem. It's not going to be possible for us to debug this without a complete, self-contained, runnable example. JAX device objects have a …
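The reply above is cut off in this copy. For context on inspecting what JAX still holds on the device, recent JAX releases expose `jax.live_arrays()`, which lists every array the runtime keeps alive; anything that survives past the end of a training step is a candidate for an accidental reference pinning device memory. A minimal sketch (assuming a reasonably recent JAX version):

```python
import jax
import jax.numpy as jnp

# Create an array on the default device (CPU or GPU, whichever JAX picked).
x = jnp.ones((1024, 1024))

# jax.live_arrays() lists every jax.Array the runtime still holds alive.
# Arrays that show up here after a training step has finished are
# candidates for accidental references keeping device memory pinned.
live = jax.live_arrays()
print(f"{len(live)} live arrays")
for a in live:
    print(a.shape, a.dtype, a.nbytes)
```

Deleting the Python references (e.g. `del x`) and re-running the scan should make the corresponding buffers disappear from the list; if they do not, something else is still holding them.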
How do I debug a memory leak in the training loop?
I am using JAX on Colab with GPU.
I have tried stopping training and then running the following, but it doesn't show any large global variables:
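The snippet the author ran is not preserved in this copy. A generic scan of a notebook's globals might look like the sketch below; note that `sys.getsizeof` is shallow and sees only host-side Python objects, not GPU memory, which is often exactly where a JAX leak lives, so a clean result from this scan does not rule out a leak:

```python
import sys

def biggest_globals(namespace, top=10):
    """Return the `top` largest (name, shallow_size_bytes) entries in a namespace.

    sys.getsizeof is shallow: it does not follow references and does not
    account for device (GPU) memory, so device-side leaks can hide from it.
    """
    sizes = []
    for name, obj in namespace.items():
        try:
            sizes.append((name, sys.getsizeof(obj)))
        except TypeError:  # a few exotic objects refuse getsizeof
            continue
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:top]

# In a notebook you would call: biggest_globals(globals())
```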
I have attached a copy of the whole notebook I run. You need to run the sections under the headings Bootstrap and Fanfic classification. Memory usage keeps growing as training progresses until the machine crashes.

And here is my training loop:
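The training-loop code did not survive in this copy. Independent of the Colab CUDA issue mentioned in the reply, one common source of growth in JAX training loops (not necessarily what happened in this notebook) is appending per-step device arrays, such as the loss, to a Python list: every element keeps its device buffer alive. Pulling the scalar to the host first lets each step's buffer be freed. A hedged sketch with a stand-in `step` function:

```python
import jax.numpy as jnp

def step(params, x):
    # Hypothetical loss; stands in for the notebook's real training step.
    return jnp.mean((params * x) ** 2)

params = jnp.ones(8)
x = jnp.arange(8.0)

losses = []
for _ in range(3):
    loss = step(params, x)
    # Leaky version: losses.append(loss) retains one device array per step.
    # Safer: convert to a host-side Python float so the buffer can be freed.
    losses.append(float(loss))

print(losses)
```

The same applies to any per-step metric or checkpoint kept in host-side containers: convert with `float(...)` or `jax.device_get(...)` before storing it.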