Modifying the gradient checkpointing method to further reduce training GPU memory #648
Closed
ninghongbo123
started this conversation in
Bad Case
Replies: 1 comment
Could you ask in #253? I didn't quite follow this.
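1. The current gradient checkpointing checkpoints every layer, i.e. 28 checkpoints.
2. To reduce GPU memory further, I changed it to checkpoint only 14 of those layers, but memory usage went up and training now hits OOM. Why? In theory this should reduce memory further, shouldn't it?
3. Perhaps I modified it the wrong way, or my understanding is wrong?
Any guidance would be much appreciated.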
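For reference, here is a rough back-of-the-envelope sketch of why this can happen. It rests on assumptions not stated in the post. A layer that is not wrapped in a checkpoint keeps all of its internal activations for the backward pass, while a checkpointed layer keeps only its input and recomputes the rest. The cost figures (`inner_units=10`, `boundary_units=1`) and both function names are hypothetical, and the result is a crude upper bound, not a measurement. Under this model, checkpointing only 14 of 28 layers stores more, not less. What would reduce stored activations is grouping two consecutive layers into one checkpointed segment, at the cost of a larger recomputation peak.

```python
import math


def peak_units_per_layer(num_layers, checkpointed, inner_units, boundary_units=1):
    """Crude upper bound on activation memory (arbitrary units).

    checkpointed: set of layer indices that are individually checkpointed.
    A checkpointed layer stores only its input (boundary_units).
    A non-checkpointed layer stores all internal activations (inner_units).
    """
    stored = 0
    for i in range(num_layers):
        stored += boundary_units if i in checkpointed else inner_units
    # During backward, one checkpointed layer at a time is recomputed,
    # which temporarily materialises its internal activations.
    recompute = inner_units if checkpointed else 0
    return stored + recompute


def peak_units_grouped(num_layers, layers_per_segment, inner_units, boundary_units=1):
    """Same model, but every segment of consecutive layers is one checkpoint."""
    segments = math.ceil(num_layers / layers_per_segment)
    # One stored input per segment, plus recomputing a whole segment at once.
    return segments * boundary_units + layers_per_segment * inner_units


# Hypothetical cost figures, chosen only for illustration.
N, INNER = 28, 10

every_layer = peak_units_per_layer(N, set(range(N)), INNER)        # 38
half_layers = peak_units_per_layer(N, set(range(0, N, 2)), INNER)  # 164
no_ckpt = peak_units_per_layer(N, set(), INNER)                    # 280
two_per_segment = peak_units_grouped(N, 2, INNER)                  # 34

print(every_layer, half_layers, no_ckpt, two_per_segment)
```

If the modification simply skipped the checkpoint wrapper on every other layer, it corresponds to the `half_layers` case, which would be consistent with the OOM. Whether grouping actually helps depends on the real ratio between a layer's internal activations and its input size.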