Below is an exemplary matrix from the model for which the decomposition yields different signs for the eigenvectors depending on the backend.

```python
from jax import numpy as jnp

info_prop_inv = jnp.array([[ 0.86263333, -0.25096436, -0.25096437, -0.15021721,
-0.004183 , -0.00377802, -0.00377802, -0.00337444],
[-0.25096436, 0.86263333, -0.15021721, -0.25096437,
-0.00377802, -0.004183 , -0.00337444, -0.00377802],
[-0.25096437, -0.15021721, 0.86263333, -0.25096436,
-0.00377802, -0.00337444, -0.004183 , -0.00377802],
[-0.15021721, -0.25096437, -0.25096436, 0.86263333,
-0.00337444, -0.00377802, -0.00377802, -0.004183 ],
[-0.004183 , -0.00377802, -0.00377802, -0.00337444,
0.8614207 , -0.25057809, -0.25057809, -0.15001035],
[-0.00377802, -0.004183 , -0.00337444, -0.00377802,
-0.25057809, 0.8614207 , -0.15001035, -0.25057809],
[-0.00377802, -0.00337444, -0.004183 , -0.00377802,
-0.25057809, -0.15001035, 0.8614207 , -0.25057809],
[-0.00337444, -0.00377802, -0.00377802, -0.004183 ,
-0.15001035, -0.25057809, -0.25057809, 0.8614207 ]])
v, w = jnp.linalg.eigh(info_prop_inv)  # eigh returns (eigenvalues, eigenvectors)
```

On the CPU, the eigenvectors have the following signs:
On the GPU, the eigenvectors sometimes have a different sign:
Background

Part of my model in JAX relies on having access to the eigenvectors of matrices. These matrices are constant within the model and fully determined by a simple function. They are intricately involved in applying the model, and the parameters of the model depend on the sign of the eigenvectors. When saving the model, I do not store the resulting eigenvectors of the matrix; instead I only save the few parameters of the function with which I can regenerate the matrix and thus its eigenvectors. However, since the eigenvectors differ depending on the backend, parameters learned with the code compiled for the GPU are incompatible with the code compiled for the CPU.
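For completeness, here is a minimal sketch (not from the original post) of how the two backends could be compared within a single process. It assumes JAX can see both a CPU and a GPU device and reuses `info_prop_inv` from above:

```python
import jax
from jax import numpy as jnp

# Pick one device per backend; jax.devices("gpu") raises if no GPU backend is installed.
cpu = jax.devices("cpu")[0]
gpu = jax.devices("gpu")[0]

# Committing the input to a device makes eigh run on that device's backend.
w_cpu, v_cpu = jnp.linalg.eigh(jax.device_put(info_prop_inv, cpu))
w_gpu, v_gpu = jnp.linalg.eigh(jax.device_put(info_prop_inv, gpu))

# Relative sign of each eigenvector column: +1 where the backends agree, -1 where flipped.
rel_sign = jnp.sign(jnp.sum(v_cpu * jax.device_put(v_gpu, cpu), axis=0))
print(rel_sign)
```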
Replies: 1 comment 5 replies
Unfortunately, I don't think it's possible for JAX to guarantee deterministic eigendecompositions, even on CPU. JAX's `eigh` relies on LAPACK's `syevd` and on GPU implementations by Nvidia and AMD: https://github.com/google/jax/blob/53318a2a7a644e5ed1ac657f408e31eeb1fe5a0d/jax/_src/lax/linalg.py#L562-L576

Even if there are no degeneracies, eigenvectors are only uniquely defined up to a global phase. Each platform has the freedom to pick its own way of calculating eigenvectors, in whichever manner is most convenient. You will even find that different LAPACK distributions on CPU (e.g., Intel MKL vs OpenBLAS) may calculate eigenvectors differently.

You certainly could pick your own convention for normalizing eigenvectors across platforms, but my main suggestion is to look for ways to reformulate your model so that it is invariant to this global phase. In my experience, depending on the sign of an eigenvector is a sign that something is wrong in the model, because the model is then no longer a mathematically well-defined function.
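As a concrete illustration of the normalization idea, here is a minimal sketch (the helper name `eigh_fixed_sign` is made up for this example) of one possible convention: flip each eigenvector so that its largest-magnitude component is positive. Applied after `eigh` on any backend, this makes the result deterministic as long as the eigenvalues are non-degenerate:

```python
from jax import numpy as jnp

def eigh_fixed_sign(a):
    # Standard eigendecomposition; eigenvectors are the columns of v.
    w, v = jnp.linalg.eigh(a)
    # Row index of the largest-magnitude entry in each eigenvector (column).
    idx = jnp.argmax(jnp.abs(v), axis=0)
    # Sign of that entry per column; flip columns so the largest entry is positive.
    signs = jnp.sign(v[idx, jnp.arange(v.shape[-1])])
    return w, v * signs

w, v = eigh_fixed_sign(info_prop_inv)
```

The alternative suggested above, making the model invariant to the phase, could for example mean working with outer products such as `jnp.outer(v[:, k], v[:, k])` instead of the raw eigenvectors, since those are unchanged when an eigenvector is multiplied by -1.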