Commit e013b4d
llamafile : ppc64le GEMV forwarding for FP32.
This patch enables the use of MMA when one of the matrix
dimensions (i.e., either M or N) is 1. This is useful for
token generation, where N < 2.
With 'GEMV forwarding', when one of the matrices has a
single row/column, its elements are broadcast instead of
being prepacked by the packing routine.
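The dispatch described above can be sketched in portable C++. This is only an illustration of the idea, not the code from this patch: the real kernel uses POWER10 MMA builtins, while the names `gemm_packed`, `gemv_forward`, and `sgemm_sketch`, as well as the storage layout stated in the first comment, are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Layout assumed for this sketch (not taken from the patch):
// C(i, j) = sum over l of A[lda*i + l] * B[ldb*j + l], stored at C[ldc*j + i].

// General path: repack A and B into contiguous panels, then multiply.
void gemm_packed(const float* A, std::size_t lda, const float* B, std::size_t ldb,
                 float* C, std::size_t ldc,
                 std::size_t m, std::size_t n, std::size_t k) {
    std::vector<float> Ap(k * m), Bp(k * n);
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t l = 0; l < k; ++l) Ap[l * m + i] = A[lda * i + l];
    for (std::size_t j = 0; j < n; ++j)
        for (std::size_t l = 0; l < k; ++l) Bp[l * n + j] = B[ldb * j + l];
    for (std::size_t j = 0; j < n; ++j)
        for (std::size_t i = 0; i < m; ++i) {
            float sum = 0.0f;
            for (std::size_t l = 0; l < k; ++l) sum += Ap[l * m + i] * Bp[l * n + j];
            C[ldc * j + i] = sum;
        }
}

// Forwarding path: no packing buffers. Each element of the single
// row/column `vec` is broadcast and multiplied against `mat` in place.
void gemv_forward(const float* mat, std::size_t ld, const float* vec,
                  float* out, std::size_t out_stride,
                  std::size_t rows, std::size_t k) {
    for (std::size_t r = 0; r < rows; ++r) out[out_stride * r] = 0.0f;
    for (std::size_t l = 0; l < k; ++l) {
        const float splat = vec[l];  // the broadcast element
        for (std::size_t r = 0; r < rows; ++r)
            out[out_stride * r] += mat[ld * r + l] * splat;
    }
}

// Dispatcher: forward to the GEMV path when M or N is 1.
void sgemm_sketch(const float* A, std::size_t lda, const float* B, std::size_t ldb,
                  float* C, std::size_t ldc,
                  std::size_t m, std::size_t n, std::size_t k) {
    assert(A != nullptr && B != nullptr && C != nullptr);
    if (n == 1) {  // B is a single column: broadcast its elements
        gemv_forward(A, lda, B, C, 1, m, k);
        return;
    }
    if (m == 1) {  // A is a single row: swap the operand roles
        gemv_forward(B, ldb, A, C, ldc, n, k);
        return;
    }
    gemm_packed(A, lda, B, ldb, C, ldc, m, n, k);
}
```

In this sketch the forwarding path skips both the allocation and the copy done by the packing step, which is where a saving of this kind would come from when N is 1 during token generation.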
This change results in a 5% - 15% improvement in total
speed (i.e., all tokens / total time) across various batch
sizes, compared with the corresponding dot product
implementation.
The patch was tested with FP32 models of Meta-Llama-3-8B,
Mistral-7B, and Llama-2-7B-chat-hf on an IBM POWER10 machine.
Signed-off-by: Amrita H S <[email protected]>
1 parent d7cfe1f commit e013b4d
1 file changed: +16 -2 lines changed
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
2261 | 2261 | | |
2262 | 2262 | | |
2263 | 2263 | | |
2264 | | - | |
| 2264 | + | |
| 2265 | + | |
| 2266 | + | |
| 2267 | + | |
| 2268 | + | |
| 2269 | + | |
2265 | 2270 | | |
2266 | | - | |
| 2271 | + | |
2267 | 2272 | | |
2268 | 2273 | | |
2269 | 2274 | | |
2270 | 2275 | | |
| 2276 | + | |
| 2277 | + | |
| 2278 | + | |
| 2279 | + | |
| 2280 | + | |
| 2281 | + | |
| 2282 | + | |
2271 | 2283 | | |
2272 | 2284 | | |
2273 | 2285 | | |
| |||
2371 | 2383 | | |
2372 | 2384 | | |
2373 | 2385 | | |
| 2386 | + | |
2374 | 2387 | | |
2375 | 2388 | | |
| 2389 | + | |
2376 | 2390 | | |
2377 | 2391 | | |
2378 | 2392 | | |
| |||