You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
[#4674][bugfix] AutoDeploy Fix memory leak in fuse_moe (#7844)
Delete the unstacked weights immediately to save GPU memory, cleanup occurs automatically after the transformation, but for large models we'll run out of memory during the transformation itself.
Signed-off-by: Gal Hubara Agam <[email protected]>
0 commit comments