Commit 98e7f22
authored
enable skipping of SW attention layers when using FP8 KV cache (vllm-project#33695)
Signed-off-by: Jonas Kuebler <kuebj@amazon.com>1 parent b111f8a commit 98e7f22
File tree
4 files changed
+58
-0
lines changed- tests/quantization
- vllm
- config
- engine
- model_executor/layers/attention
4 files changed
+58
-0
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
466 | 466 | | |
467 | 467 | | |
468 | 468 | | |
| 469 | + | |
| 470 | + | |
| 471 | + | |
| 472 | + | |
| 473 | + | |
| 474 | + | |
| 475 | + | |
| 476 | + | |
| 477 | + | |
| 478 | + | |
| 479 | + | |
| 480 | + | |
| 481 | + | |
| 482 | + | |
| 483 | + | |
| 484 | + | |
| 485 | + | |
| 486 | + | |
| 487 | + | |
| 488 | + | |
| 489 | + | |
| 490 | + | |
| 491 | + | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
87 | 87 | | |
88 | 88 | | |
89 | 89 | | |
| 90 | + | |
| 91 | + | |
| 92 | + | |
90 | 93 | | |
91 | 94 | | |
92 | 95 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
597 | 597 | | |
598 | 598 | | |
599 | 599 | | |
| 600 | + | |
| 601 | + | |
| 602 | + | |
600 | 603 | | |
601 | 604 | | |
602 | 605 | | |
| |||
1003 | 1006 | | |
1004 | 1007 | | |
1005 | 1008 | | |
| 1009 | + | |
| 1010 | + | |
| 1011 | + | |
1006 | 1012 | | |
1007 | 1013 | | |
1008 | 1014 | | |
| |||
1578 | 1584 | | |
1579 | 1585 | | |
1580 | 1586 | | |
| 1587 | + | |
1581 | 1588 | | |
1582 | 1589 | | |
1583 | 1590 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
240 | 240 | | |
241 | 241 | | |
242 | 242 | | |
| 243 | + | |
| 244 | + | |
| 245 | + | |
| 246 | + | |
| 247 | + | |
| 248 | + | |
| 249 | + | |
| 250 | + | |
| 251 | + | |
| 252 | + | |
| 253 | + | |
| 254 | + | |
| 255 | + | |
| 256 | + | |
| 257 | + | |
| 258 | + | |
| 259 | + | |
| 260 | + | |
| 261 | + | |
| 262 | + | |
| 263 | + | |
| 264 | + | |
| 265 | + | |
| 266 | + | |
| 267 | + | |
243 | 268 | | |
244 | 269 | | |
245 | 270 | | |
| |||
0 commit comments