Commit 457af3e
Avishek Goswami
Add optional prefetch to intermediates cache; enable for AWQ when offloading
- IntermediatesCache.iter_prefetch() overlaps onload of next batch with
  consumption of current batch via a background thread
- AWQ _run_samples uses iter_prefetch when offload_device is set to
  overlap CPU->device transfer with module forward passes
- Add test_iter_prefetch_matches_iter to verify prefetch yields same results as iter
Signed-off-by: Avishek Goswami <avishek.goswami@ibm.com>
1 parent a33d4ff commit 457af3e
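
The overlap described in the commit message can be illustrated with a minimal, generic sketch. This is not the code from this commit: `prefetching_iter` and its `onload` argument are hypothetical names, with `onload` standing in for whatever moves a batch from the offload device to the execution device. A single background worker loads batch i+1 while the caller consumes batch i.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, Sequence, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def prefetching_iter(items: Sequence[T], onload: Callable[[T], U]) -> Iterator[U]:
    """Yield onload(item) in order, loading the next item in a background thread.

    Hypothetical illustration only; `onload` is an assumed stand-in for the
    CPU->device transfer mentioned in the commit message.
    """
    if not items:
        return
    # One worker means at most one batch is in flight ahead of the consumer.
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(onload, items[0])
        for upcoming in items[1:]:
            current = pending.result()               # wait for the batch requested earlier
            pending = pool.submit(onload, upcoming)  # start loading the next batch
            yield current                            # caller works while the next batch loads
        yield pending.result()                       # last batch has nothing left to prefetch


if __name__ == "__main__":
    batches = [[1, 2], [3, 4], [5, 6]]
    # Prefetching should only change timing, never the values or their order.
    plain = [sum(b) for b in batches]
    prefetched = list(prefetching_iter(batches, sum))
    print(plain == prefetched)
```

Limiting the lookahead to one batch keeps the extra device memory bounded to a single batch, which matters when the cache is being offloaded precisely because memory is tight. The equality check mirrors the intent of test_iter_prefetch_matches_iter: prefetching must yield the same results as plain iteration.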
File tree
3 files changed, +12 -3 lines changed
- src/llmcompressor
  - modifiers/awq
  - pipelines
- tests/llmcompressor/pipelines