Commit ae1bb2c
llama : add high-throughput mode (llama/14363)
* kv-cache : prepare K/V buffers for separation
ggml-ci
* batched-bench : fix oob write
ggml-ci
* llama : add "virtual sequences"
ggml-ci
* llama : use "stream" vs "virtual sequence"
ggml-ci
* graph : fix stream splitting when KV cache is not used
ggml-ci
* kv-cache : add multi-stream save/load support
ggml-ci
* llama : add "--attn-streams" flag
ggml-ci
* kv-cache : fix handling when find_slot fails
ggml-ci
* kv-cache : restore find_slot impl
ggml-ci
* kv-cache : add comments
* kv-cache : add bounds checks for sequence id
ggml-ci
* cont : add n_seq_max to batch allocr
ggml-ci
* kv-cache : perform stream copies lazily after llama_synchronize
ggml-ci
* kv-cache : avoid throwing exceptions across the C boundary
ggml-ci
* CUDA: 4D FlashAttention support (llama/14628)
* CUDA: 4D FlashAttention support
* CUDA: fix WMMA FA kernel
* llama : rename attn_streams -> kv_unified
ggml-ci
* common : rename kv_split -> kv_unified
ggml-ci
---------
Co-authored-by: Johannes Gäßler <[email protected]>
1 parent 9cc645f commit ae1bb2c
8 files changed, +141 -100 lines changed, in ggml/src/ggml-cuda