Commit 0001ec0
llamafile : ppc64le GEMV forwarding for FP32. (llama/12594)
This patch enables the use of MMA when one of the matrix
dimensions (i.e. either M or N) is 1. This is useful
for token generation, where N < 2.
The concept of 'GEMV forwarding' is used: when one of the
matrices has a single row or column, its elements are
broadcast instead of being prepacked by the packing
routine.
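
The broadcasting idea can be sketched in portable C++. This is only an illustration under stated assumptions, not the code from the patch: the function name `gemv_broadcast`, the row-major layout, and the 4-lane width are hypothetical, and plain arrays stand in for the POWER10 vector registers and MMA accumulators. Each element of the single-column operand is replicated across the lanes, so no packing pass over that operand is needed.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the patch): computes y = A * x, where
// A is m x k in row-major order and x is a single column of length k.
std::vector<float> gemv_broadcast(const std::vector<float>& A,
                                  const std::vector<float>& x,
                                  std::size_t m, std::size_t k) {
    assert(A.size() == m * k);
    assert(x.size() == k);

    constexpr std::size_t kLanes = 4;  // assumed lane count, for illustration
    std::vector<float> y(m, 0.0f);

    // Process up to kLanes rows of A per tile.
    for (std::size_t i0 = 0; i0 < m; i0 += kLanes) {
        const std::size_t rows = std::min(kLanes, m - i0);
        std::array<float, kLanes> acc{};  // zero-initialized accumulator

        for (std::size_t l = 0; l < k; ++l) {
            // Broadcast: replicate x[l] across all lanes instead of
            // copying x into a packed buffer first.
            std::array<float, kLanes> b;
            b.fill(x[l]);

            for (std::size_t r = 0; r < rows; ++r) {
                acc[r] += A[(i0 + r) * k + l] * b[r];
            }
        }

        // Write the finished tile back to the output vector.
        for (std::size_t r = 0; r < rows; ++r) {
            y[i0 + r] = acc[r];
        }
    }
    return y;
}
```

For a 2 x 3 matrix with rows (1, 2, 3) and (4, 5, 6) and x = (1, 1, 1), the function returns (6, 15). In a real MMA kernel the broadcast would be a vector splat feeding an outer-product instruction; the scalar loop here only mirrors the data flow.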
This change results in a 5% to 15% improvement in total
speed (i.e. all tokens / total time) across various batch
sizes, compared with the corresponding dot product
implementation.
The patch was tested with FP32 models of Meta-Llama-3-8B,
Mistral-7B, and Llama-2-7B-chat-hf on an IBM POWER10 machine.
Signed-off-by: Amrita H S <[email protected]>
1 parent 5bad2e5 commit 0001ec0
1 file changed, +16 -2 lines changed
Hunks: original lines 2680-2692 (new 2680-2704) and original lines 2790-2797 (new 2802-2811).