Skip to content

Conversation

DhairyaLGandhi
Copy link
Member

@DhairyaLGandhi DhairyaLGandhi commented Sep 29, 2021

This PR revives working with RecursiveArrayTools (ref SciML/SciMLSensitivity.jl#493), and as discussed with @ChrisRackauckas , AD systems shouldn't be projecting internally, so we can reinstate RAT/ DiffEqSensitivity.

It is not known ahead of time where in the call stack a custom adjoint may be used or a custom type, we should allow for custom adjoints to manipulate code as they see fit.

Another change is that we allow complex gradients. These work most of the time, and the times they don't can be considered for bug fixes. I have a some methods which do technically do the same thing as _project in the accumulation stage rather than adding copies with every invocation of an adjoint.

@mcabbott
Copy link
Member

The conclusion of SciML/SciMLSensitivity.jl#493 is that RecursiveArrayTools needs a 3-line update to be fully compatible with ChainRulesCore 1.0.

If there are other issues, it would be helpful to describe and try to isolate them.

we allow complex gradients

Yes, Zygote still allows complex gradients. If you think something is wrong with them, you'll need to provide an example. I think you're proposing to re-break #342 etc, which isn't a great idea.

@DhairyaLGandhi
Copy link
Member Author

#342 has a pretty complete discussion about why the gradients are expected and correct, and with projection, it is possible to project it into the space of numbers the users want, which can be different for different cases. If the gradients are complex, the gradients are complex.

Beyond compat with RAT, it shouldn't be the AD that forces projection, but something either the compiler determines via promotion/ conversion or the user opts in via calling project explicitly.

The discussion in SciML/SciMLSensitivity.jl#493 also suggests that Zygote should be compatible with the broadcasting mechanism as RAT defines it since it satisfies the Base definition, even if by using fallbacks, which seems reasonable especially considering we were able to AD through it without special handling.

@ChrisRackauckas
Copy link
Member

The conclusion of SciML/SciMLSensitivity.jl#493 is that RecursiveArrayTools needs a 3-line update to be fully compatible with ChainRulesCore 1.0.

What 3 lines? I'll just add them.

The discussion in SciML/SciMLSensitivity.jl#493 also suggests that Zygote should be compatible with the broadcasting mechanism as RAT defines it since it satisfies the Base definition, even if by using fallbacks, which seems reasonable especially considering we were able to AD through it without special handling.

I thought the current issue was that broadcasting VectorOfArray produces an Array. This is the default behavior of AbstractArray objects though, so that's what I find confusing. If you define an AbstractArray object and just define indexing, x .+ x will give an Array, which is why I would've expected Zygote to handle that.

@mcabbott
Copy link
Member

Maybe we can discuss RAT on an issue there? The "3 lines" I mean are here: SciML/SciMLSensitivity.jl#493 (comment) .

The reason projection wants to be involved in broadcasting is that [1,2,3]' .+ 4 isa Matrix, but in reverse, we want to get back to an Adjoint vector. Likewise Diagonal([1,2,3]) .+ 4. To get this right it has to know some properties of x beyond its size. But the RAT examples linked involved iteration not broadcasting. Anyway, again, happy to discuss but perhaps this isn't the place.

@oxinabox
Copy link
Member

oxinabox commented Sep 29, 2021

This PR seems like it does at least 3 different things.
We probably should not try to talk about any of them here.
Because the conversation will go no where.

It should be broken up into several thing so each can be reviewed.

@ChrisRackauckas and @mcabbott should finish talking about ProjectTo for RecursiveArrayTools in SciML/SciMLSensitivity.jl#493

Beyond compat with RAT, it shouldn't be the AD that forces projection, but something either the compiler determines via promotion/ conversion or the user opts in via calling project explicitly.

@DhairyaLGandhi , @mcabbott and I should talk about this in a video call.
This is a pretty bold claim and goes against what a fair few people spent a whole lot of time thinking about.
Especially when you go back to the lead in discussions with worked out the complex number behavour (including but not limited to https://discourse.julialang.org/t/taking-complex-autodiff-seriously-in-chainrules/39317)
which was by people a fair bit more informed than me.
Could be true, could be false, could be we are talking about different things.
Almost certainly there are different understandings about what different things are doing and why.

But trying to talk this out in writing this thread is going to leads to a lot of time spent to reach a resolution that can be reached in a 30 minute call.

@ChrisRackauckas
Copy link
Member

Let's jump on a call in 5 minutes to flesh this out? I think it would be good to get some answers.

@CarloLucibello
Copy link
Member

closing as stale

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants