Skip to content

Reusable and NonReusable capability #592

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

chengchingwen
Copy link

addressed in #591

cc: @oxinabox, @mzgubic

@codecov-commenter
Copy link

codecov-commenter commented Oct 23, 2022

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.16%. Comparing base (9c8fcd2) to head (7f47c2e).
⚠️ Report is 155 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #592   +/-   ##
=======================================
  Coverage   93.16%   93.16%           
=======================================
  Files          15       15           
  Lines         907      907           
=======================================
  Hits          845      845           
  Misses         62       62           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@oxinabox
Copy link
Member

Before merging I want to see a PR on Zygote where this is hooked into the code for gradient vs jacobian.
So we know this isn't a feature that won't be used and will just go stale

@chengchingwen
Copy link
Author

Still working on that. I tried to hook that into Zygote with the idea of injecting the information into Context. However, it break some of the test of Zygote and I haven't figure out what was the cause. I would mark this PR as ready if I get things to work in Zygote.

p.s. if anyone have time and know how to make it work, feel free to take over.

@ToucheSir
Copy link
Contributor

ToucheSir commented Oct 28, 2022

Happy to help workshop the Zygote PR. I can't think of too many places where we'd be able to use it without adding new rules, however, given that the rules which would use it are likely to use mutation and there aren't many second-order rules for those rules.

@chengchingwen
Copy link
Author

I can't think of too many places where we'd be able to use it without adding new rules, however, given that the rules which would use it are likely to use mutation and there aren't many second-order rules for those rules.

We should be able to use it to add new optimized rules. Ideally the change shouldn't affect all the old rule and we could decide whether to add an optimized rule with mutation. But if we are adding those optimized rules, we also need to get the info about whether the pullback is execute inside an AD context to avoid breaking higher-order AD. Do we have a interface for second-order rules? It seems Zygote is using jacobian with gradient to compute second-order derivative and we can't specify a rule for second-order directly?

@ToucheSir
Copy link
Contributor

Do we have a interface for second-order rules?

I don't think so. This is something I'd really like to see in CRC, but it's not clear to me what a higher-order rules interface would look like. Diffractor appears to have something for this (usage example here?), but I don't quite understand how it works.

It seems Zygote is using jacobian with gradient to compute second-order derivative and we can't specify a rule for second-order directly?

AIUI Zygote only needs jacobian if you're calculating the full hessian. Otherwise nesting gradient should be enough. To your point though, there isn't any way to define a second order rule.

On possible ideas, I've been reading through https://github.com/pytorch/pytorch/blob/master/tools/autograd/derivatives.yaml and their model looks the most similar to FluxML/NNlib.jl#434. As noted on that PR though, such a model might not work for all ChainRules-compatible ADs. Another way that works today would be to manually split out an augmented_primal and callable PullbackMemory like in https://juliadiff.org/ChainRulesCore.jl/stable/design/changing_the_primal.html, then define rules for each. This would require a decent amount more work and code, but it should work today with any AD.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants