[NVBUG: 5612606] Clear GPU cache for large models layer quantization during export (#497)
## What does this PR do?
**Type of change:** Bug fix
**Overview:** For large models such as Llama 4 Maverick, converting the stacked weights to fp8 during export can hit an out-of-memory (OOM) error on the GPU. This change clears the GPU cache between layer quantizations to fix that.
---------
Signed-off-by: Chenjie Luo <[email protected]>