Possible fix: use ne0..ne3 (dst dims) instead of ne00..ne03 in ggml_compute_forward_dup_f16 #15626
I might have found a small mistake in ggml/src/ggml-cpu/ops.cpp, in the non-contiguous path of ggml_compute_forward_dup_f16 when dst->type == GGML_TYPE_F16.
After each element is copied, the carry logic that advances the destination indices compares against ne00/ne01/ne02/ne03 (source dims). Since i10..i13 index into dst, I think it should compare against ne0/ne1/ne2/ne3 (destination dims) instead. The adjacent F32 branch seems to use ne0..ne3.
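To illustrate why the choice of dims matters, here is a minimal standalone sketch of the carry pattern. It is not the ggml code: `Dims`, `advance_index`, and `visited_offsets` are hypothetical names, and the destination is modeled as contiguous with strides counted in elements rather than bytes, purely to keep the example small.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a 4-D shape (ne0..ne3 style), not a ggml type.
using Dims = std::array<int64_t, 4>;

// Advance a 4-D index like an odometer: increment the innermost component and
// carry into the next one whenever it reaches the given limit.
static void advance_index(Dims & idx, const Dims & limits) {
    for (int d = 0; d < 4; ++d) {
        if (++idx[d] != limits[d]) {
            return;      // no carry needed
        }
        idx[d] = 0;      // wrap and carry into the next dimension
    }
}

// Return the element offsets written into a destination of shape dst_dims when
// the carry compares against carry_limits. Assumption: dst is contiguous.
static std::vector<int64_t> visited_offsets(const Dims & dst_dims,
                                            const Dims & carry_limits,
                                            int64_t n_elements) {
    const Dims stride = {
        1,
        dst_dims[0],
        dst_dims[0] * dst_dims[1],
        dst_dims[0] * dst_dims[1] * dst_dims[2],
    };

    std::vector<int64_t> offsets;
    Dims idx = {0, 0, 0, 0};
    for (int64_t i = 0; i < n_elements; ++i) {
        offsets.push_back(idx[0] * stride[0] + idx[1] * stride[1] +
                          idx[2] * stride[2] + idx[3] * stride[3]);
        advance_index(idx, carry_limits);
    }
    return offsets;
}
```

With a source shape of {2, 3, 1, 1} and a destination shape of {3, 2, 1, 1} (same element count), carrying against the destination dims visits offsets 0 through 5 in order. Carrying against the source dims visits 0, 1, 3, 4, 6, 7, which skips two destination elements and steps past the end of the six-element buffer. When the source and destination shapes are identical, both choices behave the same.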
Apologies if I’m missing context. If this is intentional, feel free to close.