Commit a80ff18
authored
ggml-cpu : fix leftover handling in ggml_vec_scale_f32 for SVE (ggml-org#16443)
This commit updates the leftover handling in ggml_vec_scale_f32.
The motivation for this is that the code currently incorrectly assumes
there would be fewer than ggml_f32_epr leftover elements. However,
since the main loop processes 2*ggml_f32_epr elements per iteration
, there can be up to (2*ggml_f32_epr - 1) leftover elements.
The original single-pass leftover code could only process ggml_f32_epr
elements, leaving some elements unscaled.
Example scenario with 256-bit SVE:
```
ggml_f32_epr = 8 (elements per register)
ggml_f32_step = 16 (two registers per iteration)
n = 25
np = 16
leftovers = 9 elements (16-24)
Original : processes only elements 16-23, misses element 24
This commit : loop processes elements 16-23, then element 24
```
Refs: https://github.com/ggml-org/llama.cpp/actions/runs/18070620247/job/514198556301 parent 1d49ca3 commit a80ff18
1 file changed
+4
-4
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
654 | 654 | | |
655 | 655 | | |
656 | 656 | | |
657 | | - | |
658 | | - | |
659 | | - | |
| 657 | + | |
| 658 | + | |
| 659 | + | |
660 | 660 | | |
661 | | - | |
| 661 | + | |
662 | 662 | | |
663 | 663 | | |
664 | 664 | | |
| |||
0 commit comments