@jeffbolznv
Collaborator

I've seen intermittent failures in these tests, both locally and in CI. They occur when precision/rounding differences cause an index to be off by one, and the tolerance isn't high enough to allow for even one such rounding error. This change estimates what the NMSE would be if a single value is rounded differently and uses that estimate as the maximum allowed error. I've run many thousands of iterations with this error bound, and they all pass.

Here are the values it's computing in the existing test cases:

err_estimate 0.000001272
  SET_ROWS(type=q8_0,type_idx=i32,ne=[256,5,1,3],nr23=[1,1],r=1,v=0): OK
err_estimate 0.000081380
  SET_ROWS(type=q4_0,type_idx=i64,ne=[256,5,1,3],nr23=[1,1],r=1,v=0): OK
err_estimate 0.000005813
  SET_ROWS(type=q4_0,type_idx=i64,ne=[256,11,1,1],nr23=[2,3],r=7,v=0): OK
err_estimate 0.000054253
  SET_ROWS(type=q4_0,type_idx=i64,ne=[96,3,1,1],nr23=[2,3],r=2,v=0): OK
err_estimate 0.000081380
  SET_ROWS(type=q4_0,type_idx=i64,ne=[256,5,1,3],nr23=[1,1],r=1,v=1): OK
err_estimate 0.000005813
  SET_ROWS(type=q4_0,type_idx=i64,ne=[256,11,1,1],nr23=[2,3],r=7,v=1): OK
err_estimate 0.000054253
  SET_ROWS(type=q4_0,type_idx=i64,ne=[96,3,1,1],nr23=[2,3],r=2,v=1): OK
err_estimate 0.000011626
  SET_ROWS(type=q4_0,type_idx=i64,ne=[256,5,7,3],nr23=[1,1],r=1,v=0): OK
err_estimate 0.000000830
  SET_ROWS(type=q4_0,type_idx=i64,ne=[256,11,1,7],nr23=[2,3],r=7,v=0): OK
err_estimate 0.000007750
  SET_ROWS(type=q4_0,type_idx=i64,ne=[96,3,7,1],nr23=[2,3],r=2,v=0): OK
err_estimate 0.000011626
  SET_ROWS(type=q4_0,type_idx=i64,ne=[256,5,7,3],nr23=[1,1],r=1,v=1): OK
err_estimate 0.000000830
  SET_ROWS(type=q4_0,type_idx=i64,ne=[256,11,1,7],nr23=[2,3],r=7,v=1): OK
err_estimate 0.000007750
  SET_ROWS(type=q4_0,type_idx=i64,ne=[96,3,7,1],nr23=[2,3],r=2,v=1): OK
err_estimate 0.000081380
  SET_ROWS(type=q4_1,type_idx=i64,ne=[256,5,1,3],nr23=[1,1],r=1,v=0): OK
err_estimate 0.000005813
  SET_ROWS(type=q4_1,type_idx=i64,ne=[256,11,1,1],nr23=[2,3],r=7,v=0): OK
err_estimate 0.000054253
  SET_ROWS(type=q4_1,type_idx=i64,ne=[96,3,1,1],nr23=[2,3],r=2,v=0): OK
err_estimate 0.000081380
  SET_ROWS(type=q4_1,type_idx=i64,ne=[256,5,1,3],nr23=[1,1],r=1,v=1): OK
err_estimate 0.000005813
  SET_ROWS(type=q4_1,type_idx=i64,ne=[256,11,1,1],nr23=[2,3],r=7,v=1): OK
err_estimate 0.000054253
  SET_ROWS(type=q4_1,type_idx=i64,ne=[96,3,1,1],nr23=[2,3],r=2,v=1): OK
err_estimate 0.000011626
  SET_ROWS(type=q4_1,type_idx=i64,ne=[256,5,7,3],nr23=[1,1],r=1,v=0): OK
err_estimate 0.000000830
  SET_ROWS(type=q4_1,type_idx=i64,ne=[256,11,1,7],nr23=[2,3],r=7,v=0): OK
err_estimate 0.000007750
  SET_ROWS(type=q4_1,type_idx=i64,ne=[96,3,7,1],nr23=[2,3],r=2,v=0): OK
err_estimate 0.000011626
  SET_ROWS(type=q4_1,type_idx=i64,ne=[256,5,7,3],nr23=[1,1],r=1,v=1): OK
err_estimate 0.000000830
  SET_ROWS(type=q4_1,type_idx=i64,ne=[256,11,1,7],nr23=[2,3],r=7,v=1): OK
err_estimate 0.000007750
  SET_ROWS(type=q4_1,type_idx=i64,ne=[96,3,7,1],nr23=[2,3],r=2,v=1): OK
err_estimate 0.000020345
  SET_ROWS(type=q5_0,type_idx=i64,ne=[256,5,1,3],nr23=[1,1],r=1,v=0): OK
err_estimate 0.000001453
  SET_ROWS(type=q5_0,type_idx=i64,ne=[256,11,1,1],nr23=[2,3],r=7,v=0): OK
err_estimate 0.000013563
  SET_ROWS(type=q5_0,type_idx=i64,ne=[96,3,1,1],nr23=[2,3],r=2,v=0): OK
err_estimate 0.000020345
  SET_ROWS(type=q5_0,type_idx=i64,ne=[256,5,1,3],nr23=[1,1],r=1,v=1): OK
err_estimate 0.000001453
  SET_ROWS(type=q5_0,type_idx=i64,ne=[256,11,1,1],nr23=[2,3],r=7,v=1): OK
err_estimate 0.000013563
  SET_ROWS(type=q5_0,type_idx=i64,ne=[96,3,1,1],nr23=[2,3],r=2,v=1): OK
err_estimate 0.000002906
  SET_ROWS(type=q5_0,type_idx=i64,ne=[256,5,7,3],nr23=[1,1],r=1,v=0): OK
err_estimate 0.000000208
  SET_ROWS(type=q5_0,type_idx=i64,ne=[256,11,1,7],nr23=[2,3],r=7,v=0): OK
err_estimate 0.000001938
  SET_ROWS(type=q5_0,type_idx=i64,ne=[96,3,7,1],nr23=[2,3],r=2,v=0): OK
err_estimate 0.000002906
  SET_ROWS(type=q5_0,type_idx=i64,ne=[256,5,7,3],nr23=[1,1],r=1,v=1): OK
err_estimate 0.000000208
  SET_ROWS(type=q5_0,type_idx=i64,ne=[256,11,1,7],nr23=[2,3],r=7,v=1): OK
err_estimate 0.000001938
  SET_ROWS(type=q5_0,type_idx=i64,ne=[96,3,7,1],nr23=[2,3],r=2,v=1): OK
err_estimate 0.000020345
  SET_ROWS(type=q5_1,type_idx=i64,ne=[256,5,1,3],nr23=[1,1],r=1,v=0): OK
err_estimate 0.000001453
  SET_ROWS(type=q5_1,type_idx=i64,ne=[256,11,1,1],nr23=[2,3],r=7,v=0): OK
err_estimate 0.000013563
  SET_ROWS(type=q5_1,type_idx=i64,ne=[96,3,1,1],nr23=[2,3],r=2,v=0): OK
err_estimate 0.000020345
  SET_ROWS(type=q5_1,type_idx=i64,ne=[256,5,1,3],nr23=[1,1],r=1,v=1): OK
err_estimate 0.000001453
  SET_ROWS(type=q5_1,type_idx=i64,ne=[256,11,1,1],nr23=[2,3],r=7,v=1): OK
err_estimate 0.000013563
  SET_ROWS(type=q5_1,type_idx=i64,ne=[96,3,1,1],nr23=[2,3],r=2,v=1): OK
err_estimate 0.000002906
  SET_ROWS(type=q5_1,type_idx=i64,ne=[256,5,7,3],nr23=[1,1],r=1,v=0): OK
err_estimate 0.000000208
  SET_ROWS(type=q5_1,type_idx=i64,ne=[256,11,1,7],nr23=[2,3],r=7,v=0): OK
err_estimate 0.000001938
  SET_ROWS(type=q5_1,type_idx=i64,ne=[96,3,7,1],nr23=[2,3],r=2,v=0): OK
err_estimate 0.000002906
  SET_ROWS(type=q5_1,type_idx=i64,ne=[256,5,7,3],nr23=[1,1],r=1,v=1): OK
err_estimate 0.000000208
  SET_ROWS(type=q5_1,type_idx=i64,ne=[256,11,1,7],nr23=[2,3],r=7,v=1): OK
err_estimate 0.000001938
  SET_ROWS(type=q5_1,type_idx=i64,ne=[96,3,7,1],nr23=[2,3],r=2,v=1): OK
err_estimate 0.000001272
  SET_ROWS(type=q8_0,type_idx=i64,ne=[256,5,1,3],nr23=[1,1],r=1,v=0): OK
err_estimate 0.000000091
  SET_ROWS(type=q8_0,type_idx=i64,ne=[256,11,1,1],nr23=[2,3],r=7,v=0): OK
err_estimate 0.000000848
  SET_ROWS(type=q8_0,type_idx=i64,ne=[96,3,1,1],nr23=[2,3],r=2,v=0): OK
err_estimate 0.000001272
  SET_ROWS(type=q8_0,type_idx=i64,ne=[256,5,1,3],nr23=[1,1],r=1,v=1): OK
err_estimate 0.000000091
  SET_ROWS(type=q8_0,type_idx=i64,ne=[256,11,1,1],nr23=[2,3],r=7,v=1): OK
err_estimate 0.000000848
  SET_ROWS(type=q8_0,type_idx=i64,ne=[96,3,1,1],nr23=[2,3],r=2,v=1): OK
err_estimate 0.000000182
  SET_ROWS(type=q8_0,type_idx=i64,ne=[256,5,7,3],nr23=[1,1],r=1,v=0): OK
err_estimate 0.000000013
  SET_ROWS(type=q8_0,type_idx=i64,ne=[256,11,1,7],nr23=[2,3],r=7,v=0): OK
err_estimate 0.000000121
  SET_ROWS(type=q8_0,type_idx=i64,ne=[96,3,7,1],nr23=[2,3],r=2,v=0): OK
err_estimate 0.000000182
  SET_ROWS(type=q8_0,type_idx=i64,ne=[256,5,7,3],nr23=[1,1],r=1,v=1): OK
err_estimate 0.000000013
  SET_ROWS(type=q8_0,type_idx=i64,ne=[256,11,1,7],nr23=[2,3],r=7,v=1): OK
err_estimate 0.000000121
  SET_ROWS(type=q8_0,type_idx=i64,ne=[96,3,7,1],nr23=[2,3],r=2,v=1): OK
err_estimate 0.000081380
  SET_ROWS(type=iq4_nl,type_idx=i64,ne=[256,5,1,3],nr23=[1,1],r=1,v=0): OK
err_estimate 0.000005813
  SET_ROWS(type=iq4_nl,type_idx=i64,ne=[256,11,1,1],nr23=[2,3],r=7,v=0): OK
err_estimate 0.000054253
  SET_ROWS(type=iq4_nl,type_idx=i64,ne=[96,3,1,1],nr23=[2,3],r=2,v=0): OK
err_estimate 0.000081380
  SET_ROWS(type=iq4_nl,type_idx=i64,ne=[256,5,1,3],nr23=[1,1],r=1,v=1): OK
err_estimate 0.000005813
  SET_ROWS(type=iq4_nl,type_idx=i64,ne=[256,11,1,1],nr23=[2,3],r=7,v=1): OK
err_estimate 0.000054253
  SET_ROWS(type=iq4_nl,type_idx=i64,ne=[96,3,1,1],nr23=[2,3],r=2,v=1): OK
err_estimate 0.000011626
  SET_ROWS(type=iq4_nl,type_idx=i64,ne=[256,5,7,3],nr23=[1,1],r=1,v=0): OK
err_estimate 0.000000830
  SET_ROWS(type=iq4_nl,type_idx=i64,ne=[256,11,1,7],nr23=[2,3],r=7,v=0): OK
err_estimate 0.000007750
  SET_ROWS(type=iq4_nl,type_idx=i64,ne=[96,3,7,1],nr23=[2,3],r=2,v=0): OK
err_estimate 0.000011626
  SET_ROWS(type=iq4_nl,type_idx=i64,ne=[256,5,7,3],nr23=[1,1],r=1,v=1): OK
err_estimate 0.000000830
  SET_ROWS(type=iq4_nl,type_idx=i64,ne=[256,11,1,7],nr23=[2,3],r=7,v=1): OK
err_estimate 0.000007750

@jeffbolznv jeffbolznv requested a review from slaren September 28, 2025 01:34
@github-actions github-actions bot added the testing Everything test related label Sep 28, 2025
Member

@slaren slaren left a comment

Similar logic could possibly be applied to test_cpy, since it has the same issue.

@jeffbolznv
Collaborator Author

Thanks, I hadn't seen test_cpy failing, but once I ran it in a loop it was easy to reproduce. I've added similar logic there.

@jeffbolznv jeffbolznv merged commit a74a0d6 into ggml-org:master Sep 30, 2025
65 of 67 checks passed
yael-works pushed a commit to yael-works/llama.cpp that referenced this pull request Oct 15, 2025
…ounding differences (ggml-org#16295)

* tests: override test_set_rows::max_nmse_err to allow for occasional rounding differences

* apply similar error bounds to test_cpy
pwilkin pushed a commit to pwilkin/llama.cpp that referenced this pull request Oct 23, 2025
…ounding differences (ggml-org#16295)

* tests: override test_set_rows::max_nmse_err to allow for occasional rounding differences

* apply similar error bounds to test_cpy