Conversation
```julia
(inv(V)'*Diagonal(Δe)*V', )
elseif Δe === nothing
  F = [i==j ? 0 : inv(e[j] - e[i]) for i=1:n, j=1:n]
  (inv(V)'*(F .* (V'ΔV))*V', )
```
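As a side note, the eigenvalue part of this rule can be checked in isolation against functions whose gradients are known in closed form. A minimal sketch (not the PR's code; the test matrix is made up, chosen triangular so its eigenvalues are real and distinct):

```julia
using LinearAlgebra

# Giles (NA-08-01) reverse-mode rule when f depends on A only through its
# eigenvalues e:  Ā = inv(V)' * Diagonal(Δe) * V'
# where A = V * Diagonal(e) * inv(V) and Δe is the cotangent of e.

# Upper-triangular test matrix: eigenvalues 1, 2, 3 are real and distinct.
A = [3.0 1.0 0.5; 0.0 2.0 0.3; 0.0 0.0 1.0]
e, V = eigen(A)

# f(A) = sum(e) = tr(A): here Δe = ones(n) and the true gradient is I.
Ā1 = inv(V)' * Diagonal(ones(3)) * V'
@assert Ā1 ≈ Matrix(I, 3, 3)

# f(A) = sum(e.^2) = tr(A^2): here Δe = 2e and the true gradient is 2A'.
Ā2 = inv(V)' * Diagonal(2 .* e) * V'
@assert Ā2 ≈ 2 * A'
```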
Maybe this is more GPU-friendly:
```julia
F = inv.(e .- e')
F[diagind(F)] .= 0
```
I was struggling to find the solution you hinted at with diagind, thanks!
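For the record, the two constructions can be checked against each other on made-up eigenvalues. One detail worth noting: the comprehension has `F[i,j] = inv(e[j] - e[i])`, which corresponds to `e' .- e`; the `e .- e'` variant from the comment gives the transpose (equivalently, the negation) of that matrix.

```julia
using LinearAlgebra

e = [1.0, 2.5, 4.0]       # made-up distinct eigenvalues
n = length(e)

# Comprehension form from the diff
F1 = [i == j ? 0 : inv(e[j] - e[i]) for i in 1:n, j in 1:n]

# Broadcast form: pairwise differences, reciprocated elementwise,
# then the diagonal Inf entries are overwritten with zero.
F2 = inv.(e' .- e)
F2[diagind(F2)] .= 0

@assert F1 == F2
@assert F2 == -F2'        # F is antisymmetric, as expected
```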
I wonder if it is safe to divide by zero on the GPU, or whether it could throw an error.
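On the CPU at least, Julia floats follow IEEE 754, so division of a nonzero float by zero yields ±Inf (and 0/0 yields NaN) rather than throwing; the `diagind` overwrite then repairs the Inf diagonal. Whether a given GPU backend preserves exactly these semantics is an assumption worth verifying per backend. A small sketch of the CPU behaviour:

```julia
using LinearAlgebra

# IEEE 754 semantics: no exception is thrown on division by zero.
@assert inv(0.0) == Inf
@assert isnan(0.0 / 0.0)

e = [1.0, 2.0, 4.0]
F = inv.(e .- e')
@assert all(isinf, F[diagind(F)])   # diagonal differences are exactly zero
F[diagind(F)] .= 0
@assert all(isfinite, F)            # repaired: every entry is now finite
```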
---
Nice! For a test, you could maybe check it against my implementation of forward mode AD of
---
Is there anything in particular holding this up? From the looks of the test failures, there's still more work required -- is it known what the source of the issue is?
---
The issue is that I don't understand what's wrong with the eigenvectors; I believe Zygote might be following a different convention than the reference I implemented from.
---
I know this isn't working yet, but another thing I'd like to see is a test that it computes the adjoint correctly for complex matrices. Also, currently the test throws away the imaginary parts of the eigenvalues and eigenvectors, which isn't really testing the full adjoint.
---
I haven't looked at this properly, but note that eigenvectors are only defined up to a phase (and degenerate eigenvectors up to a rotation). So things like
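To make the phase ambiguity concrete (a small sketch with a made-up symmetric matrix): flipping the sign of any eigenvector column, or multiplying by any unit phase in the complex case, gives an equally valid eigendecomposition, so different libraries can legitimately return different `V`.

```julia
using LinearAlgebra

A = Symmetric([2.0 1.0 0.0; 1.0 3.0 0.5; 0.0 0.5 1.0])
e, V = eigen(A)
@assert A * V ≈ V * Diagonal(e)

S = Diagonal([-1.0, 1.0, -1.0])     # arbitrary per-column sign flips
W = V * S
@assert A * W ≈ W * Diagonal(e)     # still a valid set of eigenvectors

# Phase-invariant quantities are unaffected by the choice:
@assert W * W' ≈ V * V'             # the spectral projector coincides
```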
---
@antoine-levitt I ran into this as well while working on #355. I recall thinking that could be the issue here and adapting those tests for this PR, and it still errored for the eigenvectors, but I could be misremembering. For what it's worth, there are a few papers that cover adjoints of the eigendecomposition and have a more complicated form than the one covered in Giles (e.g. this paper). I also seem to recall trying one of these implementations and having no luck, but someone should try again. The moral of the story is that when I go down a rabbit hole and fail, I should immediately summarize in a comment for posterity. 🤷♂
---
It's not a question of more complicated expressions; it's that "taking the eigenvectors of a matrix" is not, in general, a differentiable operation.

The big thing here is that just because the derivative of the eigenvectors is not defined does not mean the derivative of the total computation isn't (because well-behaved computations do not care about the phase, or about the rotation in the case of degenerate eigenvectors). But this is tricky to express in AD. This is for Hermitian (or normal) problems. For non-Hermitian problems you have the above problems, plus, as a bonus, the eigenvector matrix might not be invertible.
I think there's a conceptual gap here that often comes up in AD: it's usually good enough to think of Julia code as being equivalent to the mathematical operation it names. But they are actually not the same thing; they just behave closely enough that you can substitute one for the other in most of the cases you're interested in. In this case, yes, "the sum of the real parts of the eigenvectors" may not be well defined, but the Julia program that computes it is.

Although this sounds messy, it does work: if you implement the derivative of the Julia program, then in the cases where it's well behaved -- i.e. gets a mathematically sensible result -- you'll get a mathematically sensible derivative too. That's pretty much how we deal with complex differentiation, too.
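The "program vs. mathematical operation" distinction can be seen directly (a small sketch with a made-up matrix): `eigen` commits to one specific sign/phase choice, so as a Julia program it is a perfectly deterministic function even where the underlying math has a gauge freedom, and differentiating the program means differentiating along the branch it actually takes.

```julia
using LinearAlgebra

A = Symmetric([2.0 1.0; 1.0 3.0])
V1 = eigen(A).vectors
V2 = eigen(A).vectors
@assert V1 == V2    # the program makes the same (bitwise) choice every call,
                    # even though mathematically any column sign is valid
```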
---
I agree with what you said, but two caveats
---
This PR doesn't seem to be needed anymore:

```julia
julia> A = rand(3,3)
3×3 Matrix{Float64}:
 0.806359  0.116632  0.252335
 0.2195    0.654797  0.480057
 0.456541  0.729541  0.126571

julia> B = A * A'
3×3 Matrix{Float64}:
 0.72749   0.374502  0.485162
 0.374502  0.707395  0.638673
 0.485162  0.638673  0.756679

julia> gradient(x -> sum(abs.(eigen(x).values)), B)[1]
3×3 Matrix{Float64}:
 1.0  0.0  -2.22045e-16
 0.0  1.0   2.22045e-16
 0.0  0.0   1.0

julia> gradient(x -> sum(abs.(eigen(x).values)), A)[1]
3×3 Matrix{Float64}:
  0.930532  -0.179063   0.385769
 -0.190663   0.508539   1.05879
  0.259142   0.667975  -0.439072
```
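The symmetric case above can be cross-checked without any AD at all: for a positive-definite matrix, `sum(abs.(eigvals(B)))` equals `tr(B)`, whose entrywise gradient is the identity, which is exactly what the REPL session shows. A sketch using central finite differences (the matrix here is made up, not the random `B` from the session):

```julia
using LinearAlgebra

f(X) = sum(abs, eigvals(X))

# A made-up symmetric positive-definite matrix, so f(B) == tr(B) locally.
B = [0.727 0.375 0.485; 0.375 0.707 0.639; 0.485 0.639 0.757]
@assert isposdef(B)

# Central finite difference on each entry should recover ≈ I.
h = 1e-6
G = zeros(3, 3)
for i in 1:3, j in 1:3
    E = zeros(3, 3)
    E[i, j] = h
    G[i, j] = (f(B + E) - f(B - E)) / (2h)
end
@assert isapprox(G, Matrix(I, 3, 3); atol = 1e-6)
```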
Works for the eigenvalues, but there seems to be an issue with the eigenvectors that I can't figure out.
I'm also a bit unsure how best to test the results; the current approach of using the magnitude etc. might not be ideal.
Reference: https://people.maths.ox.ac.uk/gilesm/files/NA-08-01.pdf
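One hedged idea for testing eigenvector results despite the phase freedom discussed above: never compare eigenvector entries directly; compare phase-invariant quantities such as the rank-1 projectors `v*v'` instead. A sketch with a made-up complex Hermitian matrix (assumes distinct eigenvalues):

```julia
using LinearAlgebra

A = Hermitian([2.0 1.0+0.5im; 1.0-0.5im 3.0])
e, V = eigen(A)

for k in 1:2
    v = V[:, k]
    w = cis(1.234) .* v          # same eigenvector, different phase
    @assert !(w ≈ v)             # naive entrywise comparison fails
    @assert w * w' ≈ v * v'      # projector comparison is phase-invariant
end
```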