
WIP: add adjoint for eigen #327

Closed

baggepinnen wants to merge 1 commit into FluxML:master from baggepinnen:eigen

Conversation

@baggepinnen
Contributor

baggepinnen commented Sep 12, 2019

Works for eigenvalues but there seems to be an issue with the eigenvectors I can't figure out.

I'm also a bit unsure how to best test the results; the current approach of using magnitude etc. might not be ideal.

Reference: https://people.maths.ox.ac.uk/gilesm/files/NA-08-01.pdf

    (inv(V)'*Diagonal(Δe)*V', )
elseif Δe === nothing
    F = [i==j ? 0 : inv(e[j] - e[i]) for i=1:n, j=1:n]
    (inv(V)'*(F .* (V'ΔV))*V', )
Member

Maybe this is more GPU-friendly:

F = inv.(e .- e')
F[diagind(F)] .= 0
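
Putting the diff's two branches together with this broadcasting form gives a minimal standalone sketch of the rule from the Giles note (the function name eigen_pullback and the packaging are illustrative, not the PR's actual code):

using LinearAlgebra

# Reverse-mode rule for A = V * Diagonal(e) * inv(V), following Giles (2008):
#   Ā = inv(V)' * (Diagonal(Δe) + F .* (V' * ΔV)) * V'
# where F[i, j] = 1 / (e[j] - e[i]) off the diagonal and 0 on it.
function eigen_pullback(e, V, Δe, ΔV)
    F = inv.(transpose(e) .- e)  # F[i, j] = 1 / (e[j] - e[i]); Inf on the diagonal
    F[diagind(F)] .= 0           # zero out the singular diagonal, as suggested above
    inv(V)' * (Diagonal(Δe) + F .* (V' * ΔV)) * V'
end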

Contributor Author

I was struggling to find the solution you hinted at with diagind, thanks!

Member

I wonder whether it is safe to divide by zero on the GPU, or whether it would throw an error.

@mateuszbaran

Nice! For a test, you could maybe check it against my implementation of forward mode AD of eigen? It's here: JuliaManifolds/Manifolds.jl#27 (comment) (based on the same paper).
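
One way to run such a cross-check without depending on another AD package is central finite differences on a scalar output (a sketch; the helper fd_gradient is hypothetical and assumes a real matrix input and real scalar output):

using LinearAlgebra

# Central finite differences of a scalar function of a real matrix,
# for comparison against the reverse-mode result.
function fd_gradient(f, A; h = 1e-6)
    G = zero(A)
    for i in eachindex(A)
        Ap = copy(A); Ap[i] += h
        Am = copy(A); Am[i] -= h
        G[i] = (f(Ap) - f(Am)) / (2h)
    end
    return G
end

# e.g. for a gauge-invariant output:
# fd_gradient(A -> sum(abs2, eigen(A).values), rand(3, 3))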

@willtebbutt
Member

Is there anything in particular holding this up? From the looks of the test failures, there's still more work required -- is it known what the source of the issue is?

@baggepinnen
Contributor Author

The issue is that I don't understand what's wrong with the eigenvectors; I believe Zygote might be following a different convention than the reference I implemented from.

@sethaxen
Contributor

I know this isn't working yet, but another thing I'd like to see is a test that it computes the adjoint correctly for complex matrices. Also, currently the test throws away the imaginary parts of the eigenvalues and eigenvectors, which isn't really testing the full adjoint.

@antoine-levitt
Contributor

I haven't looked at this properly, but note that eigenvectors are only defined up to a phase (and degenerate eigenvectors up to a rotation). So things like sum(real.(e.vectors)) are not defined and should not be differentiated, which might be your problem here, @baggepinnen. The only valid outputs are things that are gauge-invariant (do not depend on the phase), like X[:,1]' B X[:,1] where B is another matrix, or, in the case of degenerate - or almost degenerate - eigenvectors, a sum of such terms over the degenerate eigenvectors. That ensures that the 1/0 terms in the adjoint get compensated by another 0. It might be useful to introduce a check that the V'ΔV entries corresponding to almost-degenerate eigenvectors are almost zero, and bypass the division there.
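
A small illustration of the gauge point (a sketch, not code from this thread; A, B, and the phase are arbitrary):

using LinearAlgebra

A = Symmetric(rand(4, 4))
B = rand(4, 4)
x = eigen(A).vectors[:, 1]
y = cis(0.7) * x                # an equally valid eigenvector, different phase

sum(real.(x)) ≈ sum(real.(y))   # false in general: phase-dependent
x' * B * x ≈ y' * B * y         # true: gauge-invariant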

@sethaxen
Contributor

sethaxen commented Dec 2, 2019

@antoine-levitt I ran into this as well while working on #355. I recall thinking that could be the issue here, adapting those tests for this PR, and still getting errors for the eigenvectors, but I could be misremembering. For what it's worth, there are a few papers that cover adjoints of the eigendecomposition and have a more complicated form than the one covered in Giles (e.g. this paper). I also seem to recall trying one of these implementations and having no luck, but someone should try again.

The moral of the story is when I go down a rabbit hole and fail I should probably immediately summarize in a comment for posterity. 🤷‍♂

@antoine-levitt
Contributor

It's not a question of more complicated expressions; it's that "taking the eigenvectors of a matrix" is not, in general, a differentiable operation.

  • eigenvectors are not uniquely defined; only eigenspaces are. This shows up, e.g., as an arbitrariness in the imaginary part of the derivative of the eigenvectors (for real symmetric problems), which is conventionally fixed to zero.
  • even eigenspaces are not well-defined at eigenvalue degeneracies (think of the identity matrix). This shows up as a division by zero in the derivative.

The big thing here is that just because the derivative of the eigenvectors is not defined does not mean the derivative of the total computation isn't (because well-behaved computations do not care about the phase, or about the rotation in the case of degenerate eigenvectors). But this is tricky to express in AD.

This is for Hermitian (or normal) problems. For non-Hermitian problems you have all of the above, plus, as a bonus, the eigenvector matrix might not be invertible.
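
To make the degeneracy point concrete (a sketch, not code from the PR):

using LinearAlgebra

# For the identity matrix every eigenvalue is 1, so all the 1/(e[j] - e[i])
# factors in the adjoint blow up; zeroing the diagonal doesn't rescue the
# off-diagonal entries.
e = eigen(Symmetric(Matrix(1.0I, 3, 3))).values  # [1.0, 1.0, 1.0]
F = inv.(transpose(e) .- e)                      # every entry is Inf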

@MikeInnes
Member

> So things like sum(real.(e.vectors)) are not defined and should not be differentiated

I think there's a conceptual gap here that often comes up in AD: it's usually good enough to think of Julia code as being equivalent to the mathematical operation it names. But they are not actually the same thing; they just behave closely enough that you can substitute one for the other in most of the cases you're interested in.

In this case, yes, "the sum of the real parts of the eigenvectors" may not be well defined, but the Julia program sum(real.(e.vectors)) is perfectly well defined (if only by the code that implements it) and can be differentiated.

Although this sounds messy, it does work: if you implement the derivative of the Julia program, then in cases where it's well-behaved -- i.e. where it gets a mathematically sensible result -- you'll get a mathematically sensible derivative too. That's pretty much how we deal with complex differentiation, too.
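
A small illustration of the program/math distinction (a sketch, assuming LAPACK's deterministic normalization of eigenvectors):

using LinearAlgebra

A = Symmetric([2.0 1.0; 1.0 3.0])
eigen(A).vectors == eigen(A).vectors   # true: the program picks one fixed representative
sum(real.(eigen(A).vectors))           # a definite number for this input, ill-defined as math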

@antoine-levitt
Contributor

I agree with what you said, but with two caveats:

  • if the thing is not well defined, the code will make a "best effort" to get something sensible (e.g. pick the sign/phase of the eigenvectors). That's fine, but it will produce discontinuities at random points.
  • even if the thing is well defined, dividing by small numbers is tricky. Imagine your output is Tr(X'BX), where X is the matrix of the first two eigenvectors associated with the smallest eigenvalues of a (symmetric) matrix A (this is typical in a number of applications). If the two smallest eigenvalues are well separated from the rest of the spectrum, this is a well-defined (differentiable) output with respect to A and B. But if the two eigenvalues are close together, then the adjoint divides by their difference. This is theoretically OK because the numerator also contains this difference, but in floating-point arithmetic we lose precision (see the sketch below). I'm not sure there's a clean way out of this, but it has to be kept in mind (and possibly documented somewhere).
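
To put a number on the second caveat (a sketch with a hand-picked gap δ):

using LinearAlgebra

# With an eigenvalue gap δ, the adjoint's F matrix has entries of size 1/δ,
# so rounding error in the numerator (itself O(δ) for gauge-invariant
# outputs) is amplified by roughly -log10(δ) decimal digits.
δ = 1e-12
A = Symmetric(Matrix(Diagonal([1.0, 1.0 + δ, 2.0])))
e = eigen(A).values
F = inv.(transpose(e) .- e)
F[diagind(F)] .= 0
maximum(abs, F)   # ≈ 1e12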

@CarloLucibello
Member

This PR doesn't seem to be needed anymore:

julia> A = rand(3,3)
3×3 Matrix{Float64}:
 0.806359  0.116632  0.252335
 0.2195    0.654797  0.480057
 0.456541  0.729541  0.126571

julia> B = A * A'
3×3 Matrix{Float64}:
 0.72749   0.374502  0.485162
 0.374502  0.707395  0.638673
 0.485162  0.638673  0.756679

julia> gradient(x-> sum(abs.(eigen(x).values)), B)[1]
3×3 Matrix{Float64}:
 1.0  0.0  -2.22045e-16
 0.0  1.0   2.22045e-16
 0.0  0.0   1.0

julia> gradient(x-> sum(abs.(eigen(x).values)), A)[1]
3×3 Matrix{Float64}:
  0.930532  -0.179063   0.385769
 -0.190663   0.508539   1.05879
  0.259142   0.667975  -0.439072
