Commit fffeb4c
ssjia
[ET-VK][ez] Ensure that attn_weight buffers do not exceed GPU buffer numel limit
Pull Request resolved: #15651
Title says it all!
To give a concrete example, Llama3.2-1B-Instruct will have attn weights of size `{1, 32, max_seq_len, max_context_len}`. Usually `max_seq_len == max_context_len`, and if `max_context_len = 2048`, then the attention weight tensors will have sizes `{1, 32, 2048, 2048}`, which contain 134217728 (2^27) elements. The `maxStorageBufferRange` for Adreno 750 is also 134217728 (2^27), so using a context length of 2048 will produce incorrect results on Adreno 750.
In practice, it is unlikely that the prompt sequence length will be equal to the context length, so the solution is to adjust the `max_seq_len` dim of the attention weight tensors downward to ensure that the GPU buffer numel limit is not hit.
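The adjustment described above can be sketched as follows. This is a minimal illustration, not the actual ExecuTorch C++ code; the function name `clamp_attn_seq_len` and the argument names are hypothetical, and only the shape arithmetic from the commit message is assumed.

```python
def clamp_attn_seq_len(num_heads: int, max_seq_len: int, max_context_len: int,
                       max_buffer_numel: int = 1 << 27) -> int:
    """Shrink max_seq_len so that an attn_weight tensor of shape
    {1, num_heads, max_seq_len, max_context_len} stays strictly below
    the GPU storage buffer numel limit (2^27 on Adreno 750)."""
    per_seq_elem = num_heads * max_context_len      # elements per unit of seq_len
    largest_ok = (max_buffer_numel - 1) // per_seq_elem  # largest seq_len under the cap
    return min(max_seq_len, largest_ok)

# Llama3.2-1B-Instruct with max_context_len = 2048: 32 * 2048 * 2048 = 2^27
# elements hits the Adreno 750 limit exactly, so the seq_len dim is trimmed.
print(clamp_attn_seq_len(32, 2048, 2048))  # -> 2047
print(clamp_attn_seq_len(32, 1024, 2048))  # -> 1024 (already under the limit)
```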
ghstack-source-id: 321555042
@exported-using-ghexport
Differential Revision: [D86443407](https://our.internmc.facebook.com/intern/diff/D86443407/)
File tree: 2 files changed (+32, -7 lines)
- backends/vulkan/runtime/graph
  - ops/impl