Commit 0d556a7
[Sequential Pipeline] only cache unique offloaded values (#2366)
Updated by @brian-dellabetta
SUMMARY:
The SequentialPipeline offloads subgraph outputs as part of normal
usage. Occasionally, several of these outputs are duplicate kwargs that
point to the same memory location on the onloaded device. When
offloading was enabled, there was no check for whether a tensor had
already been offloaded, so duplicates were offloaded again. This can
hugely increase memory requirements for some models, as reported in #2363.
This PR
- [x] adds an offload map to IntermediatesCache so that tensors are not
redundantly offloaded
- [x] wraps the map in an override so that `torch.equal` is used rather
than `torch.eq` (which is what `==` calls). `torch.eq` returns an
elementwise boolean tensor, so using its result as a single truth value
raises an error when the tensors have more than one element. This
override, which should only be used when the tensors are immutable (as
they are here), lets us keep the original hashing function and an `O(1)`
lookup. Other approaches we tried either increased runtime or required an
`O(N)` lookup.
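A pure-Python sketch of why the `==` override matters, under stated assumptions. It assumes the offload map behaves like a `weakref.WeakKeyDictionary`, whose weak-reference keys compare their referents with `==` when hashes match; the PR text does not confirm this. `FakeTensor`, `AmbiguousResult`, and `override_eq` are hypothetical stand-ins for a tensor, the elementwise output of `torch.eq`, and `OverrideEqMode`, so the example runs without PyTorch. The PR's `OverrideEqMode` operates on real tensors, and its internals are not reproduced here.

```python
import contextlib
from typing import Iterator
from weakref import WeakKeyDictionary


class AmbiguousResult:
    """Stand-in for an elementwise boolean tensor such as torch.eq's output."""

    def __init__(self, flags: list) -> None:
        self.flags = flags

    def __bool__(self) -> bool:
        # PyTorch similarly refuses to reduce a multi-element result to one bool.
        raise RuntimeError("truth value of a multi-element result is ambiguous")


class FakeTensor:
    """Hypothetical tensor stand-in: identity-based hash, elementwise ==."""

    _single_bool_eq = False  # toggled by override_eq()

    def __init__(self, values) -> None:
        self.values = tuple(values)

    def __hash__(self) -> int:
        return id(self)  # torch.Tensor also hashes by identity

    def __eq__(self, other):
        if not isinstance(other, FakeTensor):
            return NotImplemented
        if FakeTensor._single_bool_eq:
            # One bool for the whole comparison, like torch.equal(a, b).
            return self.values == other.values
        # Elementwise comparison, like torch.eq(a, b) or a == b.
        return AmbiguousResult([x == y for x, y in zip(self.values, other.values)])


@contextlib.contextmanager
def override_eq() -> Iterator[None]:
    """Hypothetical analogue of OverrideEqMode: == yields a single bool inside."""
    previous = FakeTensor._single_bool_eq
    FakeTensor._single_bool_eq = True
    try:
        yield
    finally:
        FakeTensor._single_bool_eq = previous


offload_map = WeakKeyDictionary()  # onloaded value -> offloaded copy
x = FakeTensor([1.0, 2.0])
offload_map[x] = "cpu copy of x"

try:
    found = x in offload_map  # the weak references compare referents via x == x
except RuntimeError as err:
    print("without override:", err)

with override_eq():
    print(x in offload_map)  # True
```

The hash stays identity-based throughout, so lookups remain `O(1)`; only the comparison used to confirm a hash match changes.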
Resolves #2363
TEST PLAN:
- [x] Unit test added for `OverrideEqMode`
- [x] Script from #2363 runs with ~81GB CPU RAM used after first-layer
propagation, rising to ~88GB by layer 11/49 and staying consistently
below 89GB by layer 25/49. On current main, this script would hit ~750GB
CPU RAM usage during first-layer propagation.
---------
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Signed-off-by: Brian Dellabetta <bdellabe@redhat.com>
Signed-off-by: Brian Dellabetta <brian-dellabetta@users.noreply.github.com>
Co-authored-by: Brian Dellabetta <brian-dellabetta@users.noreply.github.com>
Co-authored-by: Brian Dellabetta <bdellabe@redhat.com>

1 parent 556b503
2 files changed (+67, -3 lines) in `src/llmcompressor/pipelines` and `tests/llmcompressor/pipelines`