Conversation
db5a46b to
a0b1b1f
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
3b1d2ed to
e0b748c
Compare
|
Very odd failure here on just windows, the matrix |
|
Any idea what might have changed here @benjeffery? I.e., did some of the library updates change the input to the test? Nonetheless, seems like a bug has been exposed here, which we should try and characterise? |
|
I'm guessing it is scipy/numpy related - will do some digging. Maybe something related to default types in the CPython API, or something fiddly like that. |
|
So it's not unexpected that S would look different on Windows vs Unix here if there's random numbers involved - what seems most likely to me is that random numbers have changed and Windows has picked out some very unlikely edge case that happens to cause problems. If that's the case, we can skipIf that test to get the build running, and file an issue for follow-up |
c009b6c to
c01ecd0
Compare
|
Ok, digging complete, there is no randomess. On Linux, the largest eigenvalue is exactly 1, which clearly meets the tolerance of 1e-8. On Windows this value is 0.99885! Likely due to very different underlying BLAS libraries where numerical error is accumulating on Windows. I've had a look around for a fix here and these are the options:
I went with option 2 in this PR. Would like an expert review though as I am stretching ancient neurons from long ago here. |
e9f75f8 to
3810cd0
Compare
|
Also: Maybe error or warn on failure to converge? |
|
@andrewkern I believe you wrote this code initially, so some review would be appreciated. |
|
wow this is weird! I think the eigenvalue solver I originally implemented can be unstable for really large matrices, but I don't think we are in that space here. The power iteration you implemented should also work fine and be more stable for large matrices, but IIRC it is slower. Just catching up -- you downgraded the CI test reqs to a different scipy version and this started throwing an error? looking over the reqs we aren't pinning a specific numpy version.... I'm guessing what's happening is the downgraded scipy version is using a different BLAS backend on Windows. any chance you could get the output from |
|
also is there a reason we don't have numpy pinned in the CI reqs? |
3810cd0 to
7f3e1e4
Compare
|
The change in this test is upgrading scipy 1.11.3 -> 1.13.1. (There are multiple CI's one got downgraded by mistake so thanks for the spot!) We don't pin dependencies that come as a result of other dependencies - this is a trade off of stability vs detecting breakage. Here is the np.show_configs: Windows: |
|
BTW Even if we can pin CI to a version so this doesn't happen, we can't pin numpy in the distributed package so it needs a fix. |
|
I'd be curious if adding @petrelharp might have strong feelings about using power iteration to find the stationary distribution versus solving the eigensystem |
|
Power iteration is a great way to go. Nice work. I don't think we want to be pinning the linear algebra backend, since then if it changes and causes a bug we find out about it? |
msprime/mutations.py
Outdated
| hi=None, | ||
| root_distribution=None, | ||
| ): | ||
| print(np.show_config()) |
|
Hm, actually, I see the solve is in production, not testing code, so thinking harder about it: I'm not worried about it being slower because we only have to do this once per I never saw the error, though - how big was the matrix? And, what was the second eigenvalue? We should throw a error on lack of convergence; it's possible that the transition matrix is periodic, I suppose. |
|
Maybe we could spin this into a specific issue and follow-up PR? We can pytest skip the test on Windows to get the major stuff through and then the actual problem will come up. My interpretation is that this change as exposed a bug that we need to fix. |
7f3e1e4 to
725c912
Compare
|
Ok, reverted code and skipped the test. Issue filed at #2349. |

No description provided.