Commit ca0dcf8
authored
Reduce Peak WAN inference VRAM usage - part II (Comfy-Org#10062)
* flux: math: Use _addcmul to avoid expensive VRAM intermediate
The rope process can be the VRAM peak and this intermediate
for the addition result before releasing the original can OOM.
addcmul_ it.
* wan: Delete the self attention before cross attention
This saves VRAM when the cross attention and FFN are in play as the
VRAM peak.1 parent d4176c4 commit ca0dcf8
2 files changed
+5
-1
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
37 | 37 | | |
38 | 38 | | |
39 | 39 | | |
40 | | - | |
| 40 | + | |
| 41 | + | |
| 42 | + | |
| 43 | + | |
41 | 44 | | |
42 | 45 | | |
43 | 46 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
237 | 237 | | |
238 | 238 | | |
239 | 239 | | |
| 240 | + | |
240 | 241 | | |
241 | 242 | | |
242 | 243 | | |
| |||
0 commit comments