Commit f8ac20a
[Whisper] Enable CUDA graph support for encoder-decoder models
Replace manual BMM cross-attention with RadixAttention to enable CUDA
graph capture/replay for the Whisper decode path. The encoder KV cache
is now stored in the standard KV pool via the attention backend's
encoder_out_cache_loc mechanism.
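The storage change can be illustrated with a toy, pure-Python model. `ToyKVPool`, `attend`, and `cross_attend_from_pool` are hypothetical names invented for this sketch, not SGLang APIs, and a real pool holds per-layer, per-head GPU tensors rather than Python lists. The assumption is only what the message states: encoder K/V are written once into pool slots listed in `encoder_out_cache_loc`, and each decode step reads them back by index (the role RadixAttention plays when called with `k=None, v=None`). The result should match attending over the encoder outputs directly.

```python
import math
from typing import List

Vector = List[float]


class ToyKVPool:
    """Hypothetical flat KV pool: one key vector and one value vector per slot."""

    def __init__(self, num_slots: int, head_dim: int) -> None:
        self.k: List[Vector] = [[0.0] * head_dim for _ in range(num_slots)]
        self.v: List[Vector] = [[0.0] * head_dim for _ in range(num_slots)]
        self.next_free = 0

    def alloc(self, n: int) -> List[int]:
        # Bump allocator: hand out the next n free slot indices.
        if self.next_free + n > len(self.k):
            raise MemoryError("KV pool exhausted")
        locs = list(range(self.next_free, self.next_free + n))
        self.next_free += n
        return locs

    def write(self, locs: List[int], keys: List[Vector], values: List[Vector]) -> None:
        # Copy each K/V vector into its assigned slot.
        for loc, k, v in zip(locs, keys, values):
            self.k[loc] = list(k)
            self.v[loc] = list(v)


def attend(q: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Single-query scaled dot-product attention with a max-shifted softmax."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [sum(a * b for a, b in zip(q, k)) * scale for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    out = [0.0] * len(values[0])
    for w, v in zip(weights, values):
        for i, x in enumerate(v):
            out[i] += (w / total) * x
    return out


def cross_attend_from_pool(pool: ToyKVPool, q: Vector, encoder_out_cache_loc: List[int]) -> Vector:
    # Decode step: no new K/V are supplied; the encoder K/V are gathered
    # from the slots reserved for them when the request was prefilled.
    keys = [pool.k[loc] for loc in encoder_out_cache_loc]
    values = [pool.v[loc] for loc in encoder_out_cache_loc]
    return attend(q, keys, values)


# Prefill: reserve slots for 3 encoder positions and write their K/V once.
pool = ToyKVPool(num_slots=8, head_dim=2)
pool.alloc(2)  # pretend two slots are already held by other tokens
enc_k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
enc_v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
encoder_out_cache_loc = pool.alloc(3)
pool.write(encoder_out_cache_loc, enc_k, enc_v)

# Decode: every step reuses the same slot indices.
decode_queries = [[0.5, -0.5], [1.0, 2.0]]
from_pool = [cross_attend_from_pool(pool, q, encoder_out_cache_loc) for q in decode_queries]
direct = [attend(q, enc_k, enc_v) for q in decode_queries]
```

In this toy, decode touches the encoder KV only through a fixed list of slot indices, so every step does the same kind of indexed read regardless of the request. That is the general property a captured CUDA graph depends on, and the commit message credits moving cross-attention onto the standard pool with making capture/replay possible.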
Key changes:
- Cross-attention uses RadixAttention with k=None, v=None during decode
  to read cached encoder KV from the pool
- pad_input_ids prepends dummy encoder tokens and sets num_image_tokens
  so prepare_encoder_info_extend allocates encoder KV cache locations
- Auto-select flashinfer backend for encoder-decoder models
- Auto-disable radix cache to avoid prefix matching conflicts
- Set encoder_len_fill_value to actual encoder length during CUDA graph
  capture so cross-attention kernels are properly recorded
- Fix cross-attention seq_lens_cpu in FlashInfer decode updater: use
  encoder_lens instead of decoder seq_lens to prevent
  global_override_indptr_cpu from overriding the correct KV length
- Add encoder_out_cache_loc support in trtllm_mha backend
- Clamp decoder position_ids to max_target_positions
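Three of these changes are index-level bookkeeping that can be sketched in isolation. The helpers here are hypothetical: `pad_input_ids` is a stand-in with an invented signature, not SGLang's function, and `kv_indptr` and `clamp_positions` are illustrative names. `kv_indptr` builds the CSR-style offset array that batched attention kernels such as FlashInfer's consume; the point of the seq_lens fix is that cross-attention offsets must come from `encoder_lens`, since each request's cross-attention KV is the encoder output, not the decoder history. Clamping to `max_target_positions - 1` is an assumption about how the bound is applied (a table with `max_target_positions` rows has no row at that index).

```python
from itertools import accumulate
from typing import List, Tuple


def pad_input_ids(input_ids: List[int], encoder_len: int, pad_id: int) -> Tuple[List[int], int]:
    # Hypothetical stand-in: prepend one placeholder per encoder position so a
    # scheduler that allocates one KV slot per input token also reserves slots
    # for the encoder KV. The returned count plays the role the commit message
    # assigns to num_image_tokens.
    return [pad_id] * encoder_len + list(input_ids), encoder_len


def kv_indptr(kv_lens: List[int]) -> List[int]:
    # CSR-style offsets: request i owns KV entries [indptr[i], indptr[i + 1]).
    return [0] + list(accumulate(kv_lens))


def clamp_positions(positions: List[int], max_target_positions: int) -> List[int]:
    # Assumption: the learned position table has max_target_positions rows,
    # so the largest valid index is max_target_positions - 1.
    return [min(p, max_target_positions - 1) for p in positions]


# Two requests in one decode batch (illustrative numbers).
decoder_seq_lens = [4, 9]    # self-attention KV length per request
encoder_lens = [1500, 1500]  # cross-attention KV length per request

self_attn_indptr = kv_indptr(decoder_seq_lens)  # [0, 4, 13]
cross_attn_indptr = kv_indptr(encoder_lens)     # [0, 1500, 3000]
```

Building `cross_attn_indptr` from `decoder_seq_lens` instead would tell the kernel that the first request has only 4 encoder entries, which is the kind of wrong KV length the FlashInfer fix guards against.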
Benchmark (earnings22, 511 samples, concurrency=1):
WER: 12.77% (identical with/without CUDA graph)
Throughput: 3.26 req/s (+36% vs 2.40 without CUDA graph)
Avg latency: 0.297s (-27% vs 0.406s)

1 parent 32a85ef
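The relative changes quoted in the benchmark follow directly from the raw numbers; a quick check (`pct_change` is just a local helper for this sketch):

```python
def pct_change(new: float, old: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


throughput_gain = pct_change(3.26, 2.40)   # req/s, with vs. without CUDA graph
latency_change = pct_change(0.297, 0.406)  # seconds, with vs. without CUDA graph

print(f"throughput: {throughput_gain:+.1f}%")  # throughput: +35.8%
print(f"avg latency: {latency_change:+.1f}%")  # avg latency: -26.8%
```

Rounded to whole percentages these give the +36% and -27% stated above.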
7 files changed (+233, -103 lines):
- python/sglang/srt
  - layers/attention
  - model_executor
  - models
- test/manual