
Conversation

@ev-br (Member) commented Nov 13, 2024

Just poking around while looking at #228
Not sure what the right way to generate a random array with hypothesis is, and I'm not sure we can assume that numpy is available. (On CI it should be, via array-api-strict.)

@asmeurer (Member) commented:

It depends on what exactly you need in terms of "random". For the most part, you should think of the native hypothesis arrays strategy as generating a "random" array. It's not truly random in the sense of being evenly distributed across some probability distribution. Instead, hypothesis tries to generate examples that it thinks will be "interesting". This means it generates corner cases (like an empty array or an array of all zeros), but it will also generate arrays that are effectively random. The easiest way to get an idea of how it works is to run one of the tests with the flags --hypothesis-verbosity=verbose -s. This will make hypothesis output what inputs it is generating.
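A minimal sketch of what this looks like in practice (not taken from the test suite, which uses the equivalent `hypothesis.extra.array_api` strategies rather than NumPy's):

```python
# Hypothetical example: the arrays strategy drives a property-based test.
# Hypothesis supplies corner cases (all zeros, NaNs, infinities) as well
# as effectively random fills.
import numpy as np
from hypothesis import given
from hypothesis.extra.numpy import arrays

@given(arrays(dtype=np.float64, shape=(3, 3)))
def test_shape_is_preserved(a):
    # a trivial property that holds for every generated array
    assert a.shape == (3, 3)

test_shape_is_preserved()
```

Running this under pytest with `--hypothesis-verbosity=verbose -s` prints each generated input as described above.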

The only case where you shouldn't just use arrays as-is is if the distribution of values it generates isn't sufficient to hit interesting examples for the given test. Often this means you just need to take the input array and modify it somehow. This could be the case for some of the linear algebra inputs, but generally wouldn't be elsewhere. The main reason it can happen is that hypothesis generally doesn't fill an array with independent random values. Instead, it generates a random mask and assigns a single given value to that mask, which means most arrays generated by hypothesis have relatively few unique values. This is done for efficiency, because such arrays are faster to generate. But it can lead, for instance, to generating matrices that are almost always low rank.
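As a hypothetical illustration: if the few-unique-values fill is a problem for a particular test (say, because it tends to produce low-rank matrices), passing `unique=True` makes hypothesis draw every element independently:

```python
# Sketch: force hypothesis to generate arrays with all-distinct elements
# instead of its default mask-and-fill pattern.
import numpy as np
from hypothesis import given
from hypothesis.extra.numpy import arrays
from hypothesis.strategies import floats

@given(arrays(np.float64, (3, 3),
              elements=floats(-10, 10, allow_nan=False),
              unique=True))
def test_all_elements_distinct(a):
    # with unique=True, no value repeats anywhere in the array
    assert len(np.unique(a)) == a.size

test_all_elements_distinct()
```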

However, you should understand the advantages of using hypothesis to generate inputs. Hypothesis doesn't just "generate random examples". It does a lot of things on top of this which you would lose if you just used np.random. Firstly, as I mentioned, it tries to generate "interesting" examples: corner cases that tend to be implemented incorrectly, but that might appear from np.random with very low or zero probability. Secondly, when an error is found, hypothesis will shrink the failing example, meaning it searches for an input that is as small as possible while still triggering the same error. So instead of getting a failure with a 100 x 100 matrix, you might get one with a 2 x 2 matrix. Lastly, hypothesis remembers previous failures in its example database. This is mostly a usability feature, but it makes reproducing previous failures much easier, while still generating new random examples each time (which would not happen if you set a fixed seed).

I have a set of slides that go over hypothesis at https://speakerdeck.com/asmeurer/testing-with-hypothesis. I'm happy to go over this with you if you like, or answer any questions you might have.

Regarding the question about NumPy, the test suite should avoid using NumPy. We don't want the tests to depend on NumPy because the standard should not depend on NumPy; rather, it is the other way around. In this case there's no need for NumPy anyway, because generating "random" inputs is already handled by hypothesis. And on a more technical point, the standard does not require np.asarray(any_array_library_array) to work (and indeed, it won't work with CuPy).

@ev-br (Member, Author) commented Nov 14, 2024

Thanks for the slide deck, it's really helpful!

Regarding this PR specifically, it seems we could just as well reject it. For what is actually tested, it does not matter much whether the inputs are identity matrices or something else; the hassle of generating a "random" orthogonal matrix does not seem worth it.

@asmeurer (Member) commented:

Yes, the patch here is not the right one, so we can close this.

We do need to generate better positive definite matrices. Right now, the only positive definite matrices that are tested are identity matrices, which isn't great. But the approach to take is to treat the output of the hypothesis arrays strategy as a "random" array and construct a positive definite matrix from it.
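A sketch of that construction (NumPy is used here only for illustration; the real suite would use the array module under test): for any real matrix A, A @ A.T is positive semidefinite, and adding a small multiple of the identity pushes all eigenvalues strictly above zero.

```python
# Hypothetical helper: turn an arbitrary square "random" array (e.g. one
# drawn by hypothesis) into a positive definite matrix.
import numpy as np

def make_positive_definite(a, eps=1e-3):
    # a @ a.T is positive semidefinite; eps * I makes it strictly positive definite
    return a @ a.T + eps * np.eye(a.shape[0])

m = make_positive_definite(np.array([[1.0, 2.0], [0.0, 1.0]]))
np.linalg.cholesky(m)  # succeeds only for positive definite input
```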

That's also not to say that this is particularly high priority. This is only used by one test, test_cholesky, which covers a not particularly complicated function, in the sense that most libraries are conformant. I'd say the biggest problem with only generating identity matrices is that the check for the upper keyword argument doesn't actually test anything, because the Cholesky factor of an identity matrix is by definition both upper and lower triangular. But there are also much bigger holes in test coverage, including many that should be much easier to fix than this one.

@asmeurer closed this Nov 18, 2024