Commit 519c941
authored
Prs/lora reservations (reduce massive Lora reservations especially on Flux2) (#11069)
* mp: only count the offload cost of math once
This was previously bundling the combined weight storage and computation
cost
* ops: put all post async transfer compute on the main stream
Some models have massive weights that need either complex
dequantization or lora patching. Don't do these patchings on the offload
stream, instead do them on the main stream to syncrhonize the
potentially large vram spikes for these compute processes. This avoids
having to assume a worst case scenario of multiple offload streams
all spiking VRAM is parallel with whatever the main stream is doing.1 parent 861817d commit 519c941
2 files changed
+24
-19
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
704 | 704 | | |
705 | 705 | | |
706 | 706 | | |
707 | | - | |
| 707 | + | |
708 | 708 | | |
709 | 709 | | |
710 | 710 | | |
| |||
883 | 883 | | |
884 | 884 | | |
885 | 885 | | |
886 | | - | |
| 886 | + | |
887 | 887 | | |
888 | 888 | | |
889 | 889 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
111 | 111 | | |
112 | 112 | | |
113 | 113 | | |
114 | | - | |
115 | | - | |
116 | | - | |
117 | | - | |
| 114 | + | |
| 115 | + | |
| 116 | + | |
| 117 | + | |
| 118 | + | |
| 119 | + | |
| 120 | + | |
| 121 | + | |
118 | 122 | | |
119 | 123 | | |
120 | | - | |
121 | | - | |
122 | | - | |
123 | | - | |
124 | | - | |
125 | | - | |
| 124 | + | |
| 125 | + | |
| 126 | + | |
| 127 | + | |
| 128 | + | |
126 | 129 | | |
127 | | - | |
128 | 130 | | |
129 | | - | |
| 131 | + | |
130 | 132 | | |
131 | 133 | | |
132 | 134 | | |
| |||
135 | 137 | | |
136 | 138 | | |
137 | 139 | | |
138 | | - | |
139 | | - | |
| 140 | + | |
| 141 | + | |
| 142 | + | |
| 143 | + | |
| 144 | + | |
140 | 145 | | |
141 | | - | |
| 146 | + | |
142 | 147 | | |
143 | | - | |
144 | | - | |
| 148 | + | |
| 149 | + | |
145 | 150 | | |
146 | 151 | | |
147 | 152 | | |
| |||
0 commit comments