Replies: 3 comments 1 reply
-
@guangyey, I'm not sure whether it is a bug. It is suspicious. We need to trace the memory footprint to understand the logic.
-
In summary, during the fine-tuning process XPU reserved about 8 GB more VRAM than CUDA, which makes it impossible to port the workload to BMG-12GB. This might not be a bug, so I am posting it here for discussion.
Steps to reproduce:
What I have tried:
I registered a hook during backward; a sketch of this kind of instrumentation is shown below.
I also printed the memory stats between steps.
The log shows ~8 GB already allocated even before backward reaches the first operator.
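A minimal sketch of this kind of instrumentation, assuming a recent PyTorch XPU build where torch.xpu.memory_allocated() / torch.xpu.memory_reserved() mirror the torch.cuda API (the exact hook form and names here are illustrative, not my literal code):

```python
import torch
import torch.nn as nn

def log_xpu_memory(tag: str) -> None:
    # Mirrors the torch.cuda memory API on the XPU backend.
    alloc = torch.xpu.memory_allocated() / 1024**2
    reserved = torch.xpu.memory_reserved() / 1024**2
    print(f"[{tag}] allocated={alloc:.1f} MiB  reserved={reserved:.1f} MiB")

def attach_backward_hooks(model: nn.Module) -> None:
    # Log the allocator state every time a module's backward runs, so the
    # first module that sees the extra ~8 GB stands out in the log.
    def make_hook(name: str):
        def hook(module, grad_input, grad_output):
            log_xpu_memory(f"backward:{name}")
        return hook

    for name, module in model.named_modules():
        module.register_full_backward_hook(make_hook(name))

# In the training loop (model, batch, compute_loss are placeholders):
#   attach_backward_hooks(model)
#   log_xpu_memory("before forward")
#   loss = compute_loss(model, batch)
#   log_xpu_memory("before backward")
#   loss.backward()
#   log_xpu_memory("after backward")
```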
I modified the XPU allocator to record every memory allocation:
Indeed, the extra ~8 GB of allocations is recorded, but the sizes do not look reasonable.
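The allocator patch itself is not shown here. As a lighter-weight cross-check that does not require rebuilding PyTorch, one could diff torch.xpu.memory_stats() snapshots between points in the training step and print any counter that jumps by hundreds of MiB. This is only a sketch and assumes torch.xpu.memory_stats() returns the same flat counter dictionary as torch.cuda.memory_stats():

```python
import torch

_prev_stats: dict = {}

def diff_xpu_stats(tag: str, min_delta: int = 256 * 1024**2) -> None:
    # Print any allocator counter that moved by more than min_delta bytes since
    # the previous call; a single jump of ~4006 or ~4008 MiB should stand out.
    global _prev_stats
    stats = torch.xpu.memory_stats()
    for key, value in stats.items():
        if not isinstance(value, int):
            continue
        delta = value - _prev_stats.get(key, 0)
        if abs(delta) >= min_delta:
            print(f"[{tag}] {key}: {delta / 1024**2:+.1f} MiB -> {value / 1024**2:.1f} MiB")
    _prev_stats = dict(stats)

# e.g. diff_xpu_stats("after forward"); diff_xpu_stats("after backward"); ...
```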
Note that 4202692608 is 4008*1024**2 and 4200595456 is 4006*1024**2, and no tensor in the model has a shape that is a divisor or multiple of 4006 or 4008.
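A quick check of that arithmetic; the two blocks together come to roughly 7.8 GiB, about the size of the unexplained gap:

```python
# Verify the byte sizes quoted above.
MiB = 1024**2
assert 4008 * MiB == 4202692608
assert 4006 * MiB == 4200595456
total_gib = (4202692608 + 4200595456) / 1024**3
print(f"two suspicious blocks together: {total_gib:.2f} GiB")  # ~7.83 GiB
```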
You can see that no extra memory is reserved before backward reaches the first layer.