Commit 7d1d1c1
llamafile : ppc64le GEMV forwarding for FP32.
This patch enables the use of MMA (Matrix-Multiply Assist) when
one of the dimensions of the matrix (i.e., either M or N) is 1.
This is useful for token generation, where N < 2.
The patch uses 'GEMV forwarding': when one of the matrices
has a single row or column, its elements are broadcast
instead of being prepacked by the packing routine.
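A minimal, portable sketch of the dispatch idea, assuming row-major FP32 matrices. The names `vec4`, `splat`, `fma4`, `gemv_forward`, `gemm_packed` and `matmul` are hypothetical, and a plain 4-lane array stands in for a VSX vector register. This is not the patch's code and uses no MMA intrinsics; it only shows that when B has a single column, each `x[k]` can be broadcast across lanes and multiplied against four rows of A, so the packing copy is skipped.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 4-lane vector standing in for a VSX "vector float" register.
using vec4 = std::array<float, 4>;

// Broadcast one scalar into all four lanes (like a vector splat).
static vec4 splat(float v) { return {v, v, v, v}; }

// acc += a * b, lane by lane.
static void fma4(vec4 &acc, const vec4 &a, const vec4 &b) {
    for (std::size_t l = 0; l < 4; ++l) acc[l] += a[l] * b[l];
}

// GEMV forwarding path: y = A * x, A row-major M x K, x of length K.
// x is never packed: each x[k] is splatted and multiplied with the k-th
// elements of four consecutive rows of A.
static std::vector<float> gemv_forward(const std::vector<float> &A,
                                       const std::vector<float> &x,
                                       std::size_t M, std::size_t K) {
    std::vector<float> y(M, 0.0f);
    std::size_t i = 0;
    for (; i + 4 <= M; i += 4) {
        vec4 acc{0.0f, 0.0f, 0.0f, 0.0f};
        for (std::size_t k = 0; k < K; ++k) {
            const vec4 a{A[(i + 0) * K + k], A[(i + 1) * K + k],
                         A[(i + 2) * K + k], A[(i + 3) * K + k]};
            fma4(acc, a, splat(x[k]));
        }
        for (std::size_t l = 0; l < 4; ++l) y[i + l] = acc[l];
    }
    for (; i < M; ++i) {  // leftover rows when M is not a multiple of 4
        float sum = 0.0f;
        for (std::size_t k = 0; k < K; ++k) sum += A[i * K + k] * x[k];
        y[i] = sum;
    }
    return y;
}

// General path: first "pack" B (row-major K x N) into a transposed copy so
// each column is contiguous, then compute C = A * B (row-major M x N).
static std::vector<float> gemm_packed(const std::vector<float> &A,
                                      const std::vector<float> &B,
                                      std::size_t M, std::size_t N,
                                      std::size_t K) {
    std::vector<float> Bt(N * K);
    for (std::size_t k = 0; k < K; ++k)
        for (std::size_t j = 0; j < N; ++j) Bt[j * K + k] = B[k * N + j];
    std::vector<float> C(M * N, 0.0f);
    for (std::size_t i = 0; i < M; ++i)
        for (std::size_t j = 0; j < N; ++j) {
            float sum = 0.0f;
            for (std::size_t k = 0; k < K; ++k) sum += A[i * K + k] * Bt[j * K + k];
            C[i * N + j] = sum;
        }
    return C;
}

// Dispatch: when B has a single column (N == 1, as in token generation),
// forward to the GEMV path instead of packing.
std::vector<float> matmul(const std::vector<float> &A,
                          const std::vector<float> &B,
                          std::size_t M, std::size_t N, std::size_t K) {
    assert(A.size() == M * K && B.size() == K * N);
    if (N == 1) return gemv_forward(A, B, M, K);
    return gemm_packed(A, B, M, N, K);
}
```

Both paths produce the same result; the only difference is whether B is copied before the multiply. With a single column there is no reuse for the packed copy to pay for, which is the rationale the commit message gives for broadcasting instead.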
This change results in a 5% to 15% improvement in total
speed (i.e., all tokens / total time) across various batch
sizes, compared with the corresponding dot-product
implementation.
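Written out, the speed metric and the reported comparison are as follows; the symbols $s$, $n$ and $t$ are introduced here only for illustration:

```latex
s = \frac{n_{\text{all tokens}}}{t_{\text{total}}}, \qquad
\frac{s_{\text{GEMV forwarding}} - s_{\text{dot product}}}{s_{\text{dot product}}} \in [0.05,\ 0.15]
```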
The patch was tested with FP32 models of Meta-Llama-3-8B,
Mistral-7B, and Llama-2-7B-chat-hf on an IBM POWER10 machine.
Signed-off-by: Amrita H S <[email protected]>
1 parent d7cfe1f
1 file changed, 11 lines added, 2 lines removed.
@@ -2261,13 +2261,20 @@: lines 2264 and 2266 replaced; new lines 2271 to 2277 added
@@ -2371,8 +2378,10 @@: new lines 2381 and 2384 added