@yeahdongcn yeahdongcn commented Aug 11, 2025

Make sure to read the contributing guidelines before submitting a PR

Ref: #15131

Testing Done

ToT (tip of tree, without this fix):

root@xiaodongye-s80:/ws# ./build/bin/test-backend-ops  | grep FAIL
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 MUSA devices:
  Device 0: MTT S80, compute capability 2.1, VMM: yes
[MUL_MAT_ID] NMSE = 0.473073968 > 0.000500000   MUL_MAT_ID(type_a=f32,type_b=f32,n_mats=4,n_used=1,b=0,m=512,n=32,k=256): FAIL
[MUL_MAT_ID] NMSE = 0.602911949 > 0.000500000   MUL_MAT_ID(type_a=f32,type_b=f32,n_mats=4,n_used=1,b=1,m=512,n=32,k=256): FAIL
[MUL_MAT_ID] NMSE = 0.801315053 > 0.000500000   MUL_MAT_ID(type_a=f32,type_b=f32,n_mats=4,n_used=2,b=0,m=512,n=32,k=256): FAIL
[MUL_MAT_ID] NMSE = 2.501388765 > 0.000500000   MUL_MAT_ID(type_a=f32,type_b=f32,n_mats=4,n_used=2,b=1,m=512,n=32,k=256): FAIL
[MUL_MAT_ID] NMSE = 1.142949490 > 0.000500000   MUL_MAT_ID(type_a=f32,type_b=f32,n_mats=8,n_used=1,b=0,m=512,n=129,k=256): FAIL
[MUL_MAT_ID] NMSE = 0.820664836 > 0.000500000   MUL_MAT_ID(type_a=f32,type_b=f32,n_mats=8,n_used=1,b=1,m=512,n=129,k=256): FAIL
[MUL_MAT_ID] NMSE = 0.730463410 > 0.000500000   MUL_MAT_ID(type_a=f32,type_b=f32,n_mats=8,n_used=2,b=0,m=512,n=32,k=256): FAIL
[MUL_MAT_ID] NMSE = 1.249068448 > 0.000500000   MUL_MAT_ID(type_a=f32,type_b=f32,n_mats=8,n_used=2,b=1,m=512,n=32,k=256): FAIL
[MUL_MAT_ID] NMSE = 1.097372514 > 0.000500000   MUL_MAT_ID(type_a=f32,type_b=f32,n_mats=8,n_used=4,b=0,m=512,n=32,k=256): FAIL
[MUL_MAT_ID] NMSE = 0.734338886 > 0.000500000   MUL_MAT_ID(type_a=f32,type_b=f32,n_mats=8,n_used=4,b=1,m=512,n=32,k=256): FAIL
  Backend MUSA0: FAIL
FAIL

With this fix:

root@xiaodongye-s80:/ws# ./build/bin/test-backend-ops | grep FAIL
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 MUSA devices:
  Device 0: MTT S80, compute capability 2.1, VMM: yes
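
For readers unfamiliar with the NMSE figures in the failing lines above: each one is compared against a 0.0005 threshold, and values near or above 1 mean the backend output bears little resemblance to the reference. The following is a minimal sketch of a normalized mean squared error, assuming the error is normalized by the energy of the reference result. The helper name `nmse` and its signature are hypothetical and are not taken from this PR or from test-backend-ops.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (an illustration, not code from this PR):
// normalized mean squared error of a backend result `out` against a
// reference result `ref`.
// Assumes both vectors have the same length and `ref` is not all zeros.
double nmse(const std::vector<float> & ref, const std::vector<float> & out) {
    double err    = 0.0; // sum of squared differences
    double energy = 0.0; // sum of squared reference values
    for (std::size_t i = 0; i < ref.size(); ++i) {
        const double d = (double) ref[i] - (double) out[i];
        err    += d * d;
        energy += (double) ref[i] * (double) ref[i];
    }
    return err / energy;
}
```

Under this definition an exact match gives 0, and a result is flagged when the value exceeds the threshold, as in the `NMSE = 0.473073968 > 0.000500000` line.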

@github-actions github-actions bot added the labels "Nvidia GPU" (Issues specific to Nvidia GPUs) and "ggml" (changes relating to the ggml tensor library for machine learning) on Aug 11, 2025

@JohannesGaessler JohannesGaessler left a comment


GitHub doesn't let me make the correct suggestion, but please apply the same fix to cp_async_available, from which I copied the logic (the defect doesn't seem to manifest as a bug there).

Signed-off-by: Xiaodong Ye <[email protected]>
@yeahdongcn yeahdongcn merged commit 25ff6f7 into ggml-org:master Aug 12, 2025
43 checks passed