Skip to content

Conversation

@ngxson
Copy link
Collaborator

@ngxson ngxson commented Sep 4, 2025

Motivation:

  • When working with kyutai-mimi, at some points, I realized that I cannot do further calculation from the output of ggml_argmax or ggml_top_k due to missing an op to convert i32 --> f32
  • Allowing Llama 4 attn_scale to be calculated on cgraph (just pointing this out, but probably we don't need to change this code)
  • This discussion: model : add GroveMoE support #15510 (comment) (cc @CISC)
  • Maybe useful for future models

Note: casting from f32 --> i32 will discard the fractional part

Planned to implement it on these backends:

  • CPU
  • Metal
  • CUDA
  • Vulkan

test-backend-ops:

  CPY(type_src=f32,type_dst=i32,ne=[256,2,3,4],permute_src=[0,0,0,0],permute_dst=[0,0,0,0]): OK
  CPY(type_src=f32,type_dst=i32,ne=[256,2,3,4],permute_src=[1,0,2,3],permute_dst=[0,0,0,0]): OK
  CPY(type_src=i32,type_dst=f32,ne=[256,2,3,4],permute_src=[0,0,0,0],permute_dst=[0,0,0,0]): OK
  CPY(type_src=i32,type_dst=f32,ne=[256,2,3,4],permute_src=[1,0,2,3],permute_dst=[0,0,0,0]): OK
  11841/11841 tests passed
  Backend Metal: OK
  CPY(type_src=f32,type_dst=i32,ne=[256,2,3,4],permute_src=[0,0,0,0],permute_dst=[0,0,0,0]): OK
  CPY(type_src=f32,type_dst=i32,ne=[256,2,3,4],permute_src=[1,0,2,3],permute_dst=[0,0,0,0]): OK
  CPY(type_src=i32,type_dst=f32,ne=[256,2,3,4],permute_src=[0,0,0,0],permute_dst=[0,0,0,0]): OK
  CPY(type_src=i32,type_dst=f32,ne=[256,2,3,4],permute_src=[1,0,2,3],permute_dst=[0,0,0,0]): OK
  11841/11841 tests passed
  Backend CUDA0: OK
  CPY(type_src=f32,type_dst=i32,ne=[256,2,3,4],permute_src=[0,0,0,0],permute_dst=[0,0,0,0]): OK
  CPY(type_src=f32,type_dst=i32,ne=[256,2,3,4],permute_src=[1,0,2,3],permute_dst=[0,0,0,0]): OK
  CPY(type_src=i32,type_dst=f32,ne=[256,2,3,4],permute_src=[0,0,0,0],permute_dst=[0,0,0,0]): OK
  CPY(type_src=i32,type_dst=f32,ne=[256,2,3,4],permute_src=[1,0,2,3],permute_dst=[0,0,0,0]): OK
  11841/11841 tests passed
  Backend Vulkan0: OK

@github-actions github-actions bot added testing Everything test related Nvidia GPU Issues specific to Nvidia GPUs ggml changes relating to the ggml tensor library for machine learning Apple Metal https://en.wikipedia.org/wiki/Metal_(API) labels Sep 4, 2025
@github-actions github-actions bot added the Vulkan Issues specific to the Vulkan backend label Sep 4, 2025
@ngxson ngxson marked this pull request as ready for review September 4, 2025 16:29
@ngxson ngxson requested a review from 0cc4m as a code owner September 4, 2025 16:29
@ngxson ngxson requested review from CISC, ggerganov and slaren September 4, 2025 16:29
@slaren
Copy link
Member

slaren commented Sep 4, 2025

Note: casting from f32 --> i32 will be equivalent to floor()

For C/C++, the behavior of float to int cast is to discard the fractional part, truncating the value towards zero. For negative values, this is not the same as floor().

@ngxson
Copy link
Collaborator Author

ngxson commented Sep 4, 2025

@slaren thanks, I've updated the test and comment to reflect this. according to the test, the behavior is currently the same on all backends

Copy link
Collaborator

@CISC CISC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A test that actually verifies that the cast produces the intended values (ie, 1.5->1, 1->1.0. etc) would be nice I guess.

@ngxson
Copy link
Collaborator Author

ngxson commented Sep 5, 2025

Yes that would be nice, and also be useful for many other ops. It can also act as examples for how to use certain ops. However, we need to adapt the code of test-backend-ops to support this, which can be quite complicated.

@CISC
Copy link
Collaborator

CISC commented Sep 5, 2025

Yes that would be nice, and also be useful for many other ops. It can also act as examples for how to use certain ops. However, we need to adapt the code of test-backend-ops to support this, which can be quite complicated.

Yeah, figured as much, another thing on the collective consciousness might-TODO-list. :)

@ggerganov
Copy link
Member

Yes that would be nice, and also be useful for many other ops. It can also act as examples for how to use certain ops. However, we need to adapt the code of test-backend-ops to support this, which can be quite complicated.

I think we can implement this by setting the suitable values in initialize_tensors. For example, setting setting values of -0.5 will cover that all backends are truncating towards zero.

@ngxson
Copy link
Collaborator Author

ngxson commented Sep 5, 2025

I think we can implement this by setting the suitable values in initialize_tensors. For example, setting setting values of -0.5 will cover that all backends are truncating towards zero.

Yes that's kinda what I'm doing, I set the range to [-150.0, 150.0] (basically copy the same code from test_gelu). So, we should randomly have negative values, which confirms the consistent behavior on all backends.

Copy link
Collaborator

@0cc4m 0cc4m left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Vulkan change looks good,

Comment on lines +779 to +796
} else if (dst->type == GGML_TYPE_I32) {
size_t id = 0;
int32_t * dst_ptr = (int32_t *) dst->data;

for (int i03 = 0; i03 < ne03; i03++) {
for (int i02 = 0; i02 < ne02; i02++) {
id += ne00 * ir0;
for (int i01 = ir0; i01 < ir1; i01++) {
for (int i00 = 0; i00 < ne00; i00++) {
const float * src0_ptr = (float *) ((char *) src0->data + i00*nb00 + i01*nb01 + i02*nb02 + i03*nb03);

dst_ptr[id] = *src0_ptr;
id++;
}
}
id += ne00 * (ne01 - ir1);
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we merge this into the F32 branch above?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes and indeed I also want to migrate some of these codes into template function. WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, refactoring this code is welcome.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll merge this PR as-is and will open another PR to refactor this code

@ngxson ngxson merged commit 9fcb29f into ggml-org:master Sep 8, 2025
48 checks passed
njsyw1997 pushed a commit to aizip/llama.cpp that referenced this pull request Sep 10, 2025
* ggml: allow casting between f32 and i32

* fix cuda

* add vulkan

* fix CPU non-cont

* add non-cont test case

* add note

* extend test number range

* correct note

* add cont version for vulkan
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Apple Metal https://en.wikipedia.org/wiki/Metal_(API) ggml changes relating to the ggml tensor library for machine learning Nvidia GPU Issues specific to Nvidia GPUs testing Everything test related Vulkan Issues specific to the Vulkan backend

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants