Commit 764be74
kvcache: Group shift operations into batches
Currently, when we need to shift the cache, we issue a single
RoPE operation over the entire cache (per layer). In
some cases, this can create a compute graph that is larger than
the forward pass since the forward pass is working in batches.
Since we don't consider shifting in our memory estimates, it's
possible for this to cause a crash if we run out of memory.
By limiting the size of the RoPE calls to batch size chunks, we
ensure that the shift will never exceed the size of the forward
pass, since the forward pass will also contain a RoPE of the same
size. This does not have a significant impact on performance since
RoPE is a math operation whose cost is mostly proportional to the size
of its inputs.
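
The chunking idea described above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the function name `shift_chunks` and the parameters `cell_count` and `batch_size` are hypothetical, and the sketch only computes which ranges of cache cells each RoPE call would cover, assuming the shift is split into consecutive chunks of at most one batch.

```python
def shift_chunks(cell_count: int, batch_size: int) -> list[tuple[int, int]]:
    """Split cell_count cache cells into (start, length) chunks.

    Hypothetical helper: each chunk stands for one RoPE call, and no
    chunk is longer than batch_size, so a single shift call never
    covers more cells than one forward-pass batch.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    chunks = []
    for start in range(0, cell_count, batch_size):
        # The last chunk may be shorter than batch_size.
        chunks.append((start, min(batch_size, cell_count - start)))
    return chunks


# Example: 1000 cells to shift with a batch size of 512.
for start, length in shift_chunks(1000, 512):
    print(f"RoPE over cells [{start}, {start + length})")
```

Under these assumptions, the number of RoPE calls grows with the cache size, but the size of any single call is bounded by the batch size, which is the property the commit message relies on.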
In theory, defrag could have the same issue since it also creates a
compute graph outside of the forward pass. However, since it consists
only of copies, it does not require any working space.
1 parent b72e5ad, commit 764be74
1 file changed: +36 -29 lines changed