
WIP: add adjoint for eigen #327

Closed

baggepinnen wants to merge 1 commit into FluxML:master from baggepinnen:eigen

Conversation

@baggepinnen
Contributor

baggepinnen commented Sep 12, 2019

Works for eigenvalues but there seems to be an issue with the eigenvectors I can't figure out.

I'm also a bit unsure how to best test the results; the current approach of using magnitude etc. might not be ideal.

Reference: https://people.maths.ox.ac.uk/gilesm/files/NA-08-01.pdf

    (inv(V)'*Diagonal(Δe)*V', )
elseif Δe === nothing
    F = [i==j ? 0 : inv(e[j] - e[i]) for i=1:n, j=1:n]
    (inv(V)'*(F .* (V'ΔV))*V', )
Member

Maybe this is more GPU-friendly:

F = inv.(e .- e')
F[diagind(F)] .= 0
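
Putting the diff's two branches together with this broadcasting form gives a minimal standalone sketch of the rule from the Giles note (the function name eigen_pullback and the packaging are illustrative, not the PR's actual code):

using LinearAlgebra

# Reverse-mode rule for A = V * Diagonal(e) * inv(V), following Giles (2008):
#   Ā = inv(V)' * (Diagonal(Δe) + F .* (V' * ΔV)) * V'
# where F[i, j] = 1 / (e[j] - e[i]) off the diagonal and 0 on it.
function eigen_pullback(e, V, Δe, ΔV)
    F = inv.(transpose(e) .- e)  # F[i, j] = 1 / (e[j] - e[i]); Inf on the diagonal
    F[diagind(F)] .= 0           # zero out the singular diagonal, as suggested above
    inv(V)' * (Diagonal(Δe) + F .* (V' * ΔV)) * V'
end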

Contributor Author

I was struggling to find the solution you hinted at with diagind, thanks!

Member

I wonder whether it is safe to divide by zero on the GPU, or whether it would throw an error.

@mateuszbaran

Nice! For a test, you could maybe check it against my implementation of forward mode AD of eigen? It's here: JuliaManifolds/Manifolds.jl#27 (comment) (based on the same paper).
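
One way to run such a cross-check without depending on another AD package is central finite differences on a scalar output (a sketch; the helper fd_gradient is hypothetical and assumes a real matrix input and real scalar output):

using LinearAlgebra

# Central finite differences of a scalar function of a real matrix,
# for comparison against the reverse-mode result.
function fd_gradient(f, A; h = 1e-6)
    G = zero(A)
    for i in eachindex(A)
        Ap = copy(A); Ap[i] += h
        Am = copy(A); Am[i] -= h
        G[i] = (f(Ap) - f(Am)) / (2h)
    end
    return G
end

# e.g. for a gauge-invariant output:
# fd_gradient(A -> sum(abs2, eigen(A).values), rand(3, 3))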

@willtebbutt
Member

Is there anything in particular holding this up? From the looks of the test failures, there's still more work required -- is it known what the source of the issue is?

@baggepinnen
Contributor Author

The issue is that I don't understand what's wrong with the eigenvectors; I believe Zygote might be following a different convention than the reference I implemented from.

@sethaxen
Contributor

I know this isn't working yet, but another thing I'd like to see is a test that it computes the adjoint correctly for complex matrices. Also, currently the test throws away the imaginary parts of the eigenvalues and eigenvectors, which isn't really testing the full adjoint.

@antoine-levitt
Contributor

I haven't looked at this properly, but note that eigenvectors are only defined up to a phase (and degenerate eigenvectors up to a rotation). So things like sum(real.(e.vectors)) are not defined and should not be differentiated, which might be your problem here, @baggepinnen. The only valid outputs are things that are gauge-invariant (do not depend on the phase), like X[:,1]' B X[:,1] where B is another matrix, or, in the case of degenerate - or almost degenerate - eigenvectors, a sum of such terms over the degenerate eigenvectors. That ensures that the 1/0 terms in the adjoint get compensated by another 0. It might be useful to introduce a check that the V'ΔV entries corresponding to almost-degenerate eigenvectors are almost zero, and bypass the division there.
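
A small illustration of the gauge point (a sketch, not code from this thread; A, B, and the phase are arbitrary):

using LinearAlgebra

A = Symmetric(rand(4, 4))
B = rand(4, 4)
x = eigen(A).vectors[:, 1]
y = cis(0.7) * x                # an equally valid eigenvector, different phase

sum(real.(x)) ≈ sum(real.(y))   # false in general: phase-dependent
x' * B * x ≈ y' * B * y         # true: gauge-invariant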

@sethaxen
Contributor

sethaxen commented Dec 2, 2019

@antoine-levitt I ran into this as well while working on #355. I recall thinking that could be the issue here, adapting those tests for this PR, and still getting errors for the eigenvectors, but I could be misremembering. For what it's worth, there are a few papers that cover adjoints of the eigendecomposition and have a more complicated form than the one covered in Giles (e.g. this paper). I also seem to recall trying one of these implementations and having no luck, but someone should try again.

The moral of the story is when I go down a rabbit hole and fail I should probably immediately summarize in a comment for posterity. 🤷‍♂

@antoine-levitt
Contributor

It's not a question of more complicated expressions; it's that "taking the eigenvectors of a matrix" is not, in general, a differentiable operation.

  • eigenvectors are not uniquely defined; only eigenspaces are. This shows up, e.g., as an arbitrariness in the imaginary part of the derivative of the eigenvectors (for real symmetric problems), which is conventionally fixed to zero.
  • even eigenspaces are not well-defined at eigenvalue degeneracies (think of the identity matrix). This shows up as a division by zero in the derivative.

The big thing here is that just because the derivative of the eigenvectors is not defined does not mean the derivative of the total computation isn't (because well-behaved computations do not care about the phase, or about the rotation in the case of degenerate eigenvectors). But this is tricky to express in AD.

This is for Hermitian (or normal) problems. For non-Hermitian problems you have all of the above, plus, as a bonus, the eigenvector matrix might not be invertible.
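
To make the degeneracy point concrete (a sketch, not code from the PR):

using LinearAlgebra

# For the identity matrix every eigenvalue is 1, so all the 1/(e[j] - e[i])
# factors in the adjoint blow up; zeroing the diagonal doesn't rescue the
# off-diagonal entries.
e = eigen(Symmetric(Matrix(1.0I, 3, 3))).values  # [1.0, 1.0, 1.0]
F = inv.(transpose(e) .- e)                      # every entry is Inf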

@MikeInnes
Member

> So things like sum(real.(e.vectors)) are not defined and should not be differentiated

I think there's a conceptual gap here that often comes up in AD: it's usually good enough to think of Julia code as being equivalent to the mathematical operation it names. But they are not actually the same thing; they just behave closely enough that you can substitute one for the other in most of the cases you're interested in.

In this case, yes, "the sum of the real parts of the eigenvectors" may not be well defined, but the Julia program sum(real.(e.vectors)) is perfectly well defined (if only by the code that implements it) and can be differentiated.

Although this sounds messy, it does work: if you implement the derivative of the Julia program, then in cases where it's well-behaved -- i.e. where it gets a mathematically sensible result -- you'll get a mathematically sensible derivative too. That's pretty much how we deal with complex differentiation, too.
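
A small illustration of the program/math distinction (a sketch, assuming LAPACK's deterministic normalization of eigenvectors):

using LinearAlgebra

A = Symmetric([2.0 1.0; 1.0 3.0])
eigen(A).vectors == eigen(A).vectors   # true: the program picks one fixed representative
sum(real.(eigen(A).vectors))           # a definite number for this input, ill-defined as math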

@antoine-levitt
Contributor

I agree with what you said, but with two caveats:

  • if the thing is not well defined, the code will make a "best effort" to get something sensible (e.g. pick the sign/phase of the eigenvectors). That's fine, but it will produce discontinuities at random points.
  • even if the thing is well defined, dividing by small numbers is tricky. Imagine your output is Tr(X'BX), where X is the matrix of the first two eigenvectors associated with the smallest eigenvalues of a (symmetric) matrix A (this is typical in a number of applications). If the two smallest eigenvalues are well separated from the rest of the spectrum, this is a well-defined (differentiable) output with respect to A and B. But if the two eigenvalues are close together, then the adjoint divides by their difference. This is theoretically OK because the numerator also contains this difference, but in floating-point arithmetic we lose precision (see the sketch below). I'm not sure there's a clean way out of this, but it has to be kept in mind (and possibly documented somewhere).
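
To put a number on the second caveat (a sketch with a hand-picked gap δ):

using LinearAlgebra

# With an eigenvalue gap δ, the adjoint's F matrix has entries of size 1/δ,
# so rounding error in the numerator (itself O(δ) for gauge-invariant
# outputs) is amplified by roughly -log10(δ) decimal digits.
δ = 1e-12
A = Symmetric(Matrix(Diagonal([1.0, 1.0 + δ, 2.0])))
e = eigen(A).values
F = inv.(transpose(e) .- e)
F[diagind(F)] .= 0
maximum(abs, F)   # ≈ 1e12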

@CarloLucibello
Member

This PR doesn't seem to be needed anymore:

julia> A = rand(3,3)
3×3 Matrix{Float64}:
 0.806359  0.116632  0.252335
 0.2195    0.654797  0.480057
 0.456541  0.729541  0.126571

julia> B = A * A'
3×3 Matrix{Float64}:
 0.72749   0.374502  0.485162
 0.374502  0.707395  0.638673
 0.485162  0.638673  0.756679

julia> gradient(x-> sum(abs.(eigen(x).values)), B)[1]
3×3 Matrix{Float64}:
 1.0  0.0  -2.22045e-16
 0.0  1.0   2.22045e-16
 0.0  0.0   1.0

julia> gradient(x-> sum(abs.(eigen(x).values)), A)[1]
3×3 Matrix{Float64}:
  0.930532  -0.179063   0.385769
 -0.190663   0.508539   1.05879
  0.259142   0.667975  -0.439072
