Gradient Checkpointing in LLMs #794
zslittlehelper started this conversation in General
In Stable Diffusion, gradient checkpointing is almost taken for granted, and is often a must. For LLMs, though, I'm not sure whether it's a concept that simply isn't mentioned often, or one that is so ingrained that no one bothers to mention it. It was recently discussed on Twitter (https://x.com/prateeky2806/status/1717807126041534921), which made me curious how things stand in Axolotl.

Replies: 1 comment

For future reference, it is possible to enable gradient checkpointing to save VRAM.
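As a rough illustration of what enabling it looks like, here is a minimal sketch assuming a Hugging Face Transformers-based setup (which Axolotl builds on); the model name and Trainer arguments are illustrative only, not Axolotl's own API:

```python
# Minimal sketch: enabling gradient checkpointing with Hugging Face Transformers.
# The model name and training arguments below are placeholders for illustration.
from transformers import AutoModelForCausalLM, TrainingArguments

model = AutoModelForCausalLM.from_pretrained("gpt2")

# Recompute activations during the backward pass instead of storing them,
# trading extra forward compute for lower VRAM usage.
model.gradient_checkpointing_enable()

# Alternatively, let the Trainer toggle it via its arguments:
training_args = TrainingArguments(
    output_dir="out",
    gradient_checkpointing=True,
)
```

In Axolotl itself, this is typically exposed as a `gradient_checkpointing: true` option in the YAML config, but check the current docs to confirm.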