Commit e892134
authored
ggml : optimize llamafile cpu matrix multiplication for ppc64le (ggml-org#10156)
This change upstreams llamafile's cpu matrix
multiplication kernels for ppc64le using MMA
builtins for FP32 datatype.
This change results in a consistent 90%
improvement in input processing time, and 20%
to 80% improvement in output processing time,
across various batch sizes.
The patch is tested with Meta-Lllama-3-8B,
Mistral-7B, Llama-2-7B-chat-hf models on a
IBM POWER10 machine.
Signed-off-by: Amrita H S <[email protected]>1 parent 8fc393f commit e892134
2 files changed
+615
-2
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1265 | 1265 | | |
1266 | 1266 | | |
1267 | 1267 | | |
1268 | | - | |
1269 | | - | |
| 1268 | + | |
| 1269 | + | |
| 1270 | + | |
| 1271 | + | |
| 1272 | + | |
| 1273 | + | |
| 1274 | + | |
1270 | 1275 | | |
1271 | 1276 | | |
1272 | 1277 | | |
| |||
0 commit comments