feat(autogram): Remove batched optimizations #470
Conversation
* Remove FunctionalJacobianComputer.
* Remove args and kwargs from the interfaces of JacobianComputer, GramianComputer and JacobianAccumulator, because they were only needed for the functional interface.
* Remove kwargs from the interface of Hook and stop registering it with with_kwargs=True (args are mandatory though, so rename them as _).
* Change JacobianComputer to compute generalized jacobians (shape [m0, ..., mk, n]) and change GramianComputer to compute optional generalized gramians (shape [m0, ..., mk, mk, ..., m0]); see the sketch after this list.
* Change engine.compute_gramian to always do one vmap level per dimension of the output, without caring about the batch_dim.
* Remove all reshapes and movedims in engine.compute_gramian: we no longer need reshape since the gramian is directly a generalized gramian, and we no longer need movedim since we vmap over all dimensions the same way, without having to put the non-batched dim in front. Merge compute_gramian and _compute_square_gramian.
* Use a DiagonalSparseTensor as the initial jac_output of compute_gramian.
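As a concrete reference for the shapes above, here is a minimal sketch (an illustration under my own assumptions, not the torchjd implementation; generalized_gramian is a hypothetical helper) of how a generalized gramian of shape [m0, ..., mk, mk, ..., m0] relates to a generalized jacobian of shape [m0, ..., mk, n]: contract the parameter dimension against itself, then reverse the trailing output dimensions.

```python
import torch

def generalized_gramian(jacobian: torch.Tensor) -> torch.Tensor:
    """Contract a generalized jacobian of shape [m0, ..., mk, n] into a
    generalized gramian of shape [m0, ..., mk, mk, ..., m0]."""
    output_ndim = jacobian.ndim - 1
    param_dim = jacobian.ndim - 1
    # Sum over the parameter dimension of both copies of the jacobian;
    # the result has shape [m0, ..., mk, m0, ..., mk].
    gramian = torch.tensordot(jacobian, jacobian, dims=([param_dim], [param_dim]))
    # Reverse the trailing output dimensions to get [m0, ..., mk, mk, ..., m0].
    trailing = list(range(output_ndim, 2 * output_ndim))[::-1]
    return gramian.permute(*range(output_ndim), *trailing)

jac = torch.randn(3, 2, 5)  # output of shape [3, 2], 5 parameters
gram = generalized_gramian(jac)
assert gram.shape == (3, 2, 2, 3)
```

The engine described in the list reaches the same result without materializing the full jacobian, by applying one vmap level per output dimension to the backward pass.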
Codecov Report: ✅ All modified and coverable lines are covered by tests.
This PR is a prerequisite to being able to use DiagonalSparseTensors. It greatly simplifies the engine, making all the necessary changes so that the optimization is now entirely about what type of tensor we give as jac_output. So in a future PR (after #466 is merged), we will be able to simply change jac_output = _make_initial_jac_output(output) to jac_output = DiagonalSparseTensor(...) and to remove [...]. In fact, it even works if we cherry-pick this into #466 and use a DiagonalSparseTensor as jac_output, but it's densified super quickly, so it's not really using sparsity.
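For context, a dense initial jac_output for an output of shape [m0, ..., mk] is simply an identity over the flattened output, reshaped to [m0, ..., mk, m0, ..., mk]; a DiagonalSparseTensor would represent the same operator without materializing the zeros. The sketch below is a hypothetical illustration (the helper name and body are my assumptions, not the actual _make_initial_jac_output):

```python
import torch

# Hypothetical dense equivalent of the initial jac_output (illustrative only):
# an identity over the flattened output, reshaped to output.shape twice.
def make_dense_initial_jac_output(output: torch.Tensor) -> torch.Tensor:
    eye = torch.eye(output.numel(), dtype=output.dtype, device=output.device)
    return eye.reshape(*output.shape, *output.shape)

out = torch.randn(3, 2)
jac_output = make_dense_initial_jac_output(out)
assert jac_output.shape == (3, 2, 3, 2)
# Each slice jac_output[i, j] is d(out[i, j]) / d(out), i.e. a one-hot tensor.
assert jac_output[1, 0, 1, 0] == 1.0 and jac_output[1, 0].sum() == 1.0
```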
Well, bravo Nils, the code has disappeared, but that's fine. (LGTM after a few discussions)
Basically, the idea of this PR is to remove the batched optimization, because this optimization should instead happen internally, by backpropagating a diagonal sparse jacobian (see the sketch below).
Compared to main, this simplifies a lot of things. The batched optimization was done in FunctionalJacobianComputer, but required a different usage compared to AutogradJacobianComputer, which made the engine require special cases based on the batch_dim, which in turn required the user to provide the batch_dim. I think all of this can be dropped.
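To illustrate the diagonal-sparse-jacobian idea, here is a toy example (my own illustration, not torchjd code): when samples are processed independently, e.g. by an elementwise function, the dense jacobian is diagonal, so a representation that stores only the diagonal carries the same information that the batched optimization previously exploited via batch_dim.

```python
import torch

# Toy illustration (not torchjd code): for an elementwise function applied to a
# batch, d(y_i)/d(x_j) = 0 whenever i != j, so the full jacobian is diagonal.
x = torch.randn(4)
jac = torch.func.jacrev(torch.sin)(x)   # dense jacobian, shape [4, 4]
diag = torch.diagonal(jac)              # the only nonzero entries, shape [4]

assert torch.allclose(jac, torch.diag(diag))
# A diagonal sparse representation would backpropagate `diag` alone instead of
# the full [4, 4] matrix, recovering the effect of the batched optimization.
```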