Use pack and unpack in minimize and root
#1806
base: main
Conversation
The scan bug happens here, because shape information is being destroyed somewhere. If I comment out that check, all tests pass.
Force-pushed from 2fd5ae0 to be39ef6
Regarding the notebook error reported in #1586, the notebook runs now but with rewrite warnings. The specific rewrite has to do with squeeze, but it is arising because of the use of … We potentially ought to rewrite to scalar_minimize in that case, but scipy.minimize handles this case gracefully and we should too.
Force-pushed from d080317 to 8f4fed6
Pull request overview
This PR enhances optimize.minimize and optimize.root to accept multiple input variables of arbitrary shapes, addressing issues #1550, #1465, and #1586. The implementation uses pack and unpack operations to handle multiple variables by flattening them into a single vector for scipy optimization, then reshaping results back to their original forms.
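Conceptually, the packing step ravels each variable and concatenates the results, and unpacking reverses it. A minimal NumPy sketch of the idea (the helpers below are illustrative, not the actual pytensor `pack`/`unpack`):

```python
import numpy as np

def pack(variables):
    """Flatten a list of arrays into one vector, remembering their shapes."""
    shapes = [v.shape for v in variables]
    flat = np.concatenate([v.ravel() for v in variables])
    return flat, shapes

def unpack(flat, shapes):
    """Split a flat vector back into arrays with the original shapes."""
    sizes = [int(np.prod(s)) for s in shapes]
    chunks = np.split(flat, np.cumsum(sizes)[:-1])
    return [c.reshape(s) for c, s in zip(chunks, shapes)]

x0 = np.zeros((3, 4))
y0 = np.zeros(5)
flat, shapes = pack([x0, y0])   # a (17,) vector to hand to scipy
x1, y1 = unpack(flat, shapes)   # back to shapes (3, 4) and (5,)
```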
Key Changes
- Added `pack`/`unpack` support to `minimize` and `root` for handling multiple variables of different shapes
- Implemented `L_op` (gradient) methods for `JoinDims` and `SplitDims` ops to support autodiff through pack/unpack operations
- Refactored `implict_optimization_grads` to use packed variables internally, simplifying jacobian construction
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| pytensor/tensor/optimize.py | Added _maybe_pack_input_variables_and_rewrite_objective function; updated minimize and root signatures to accept sequences; refactored gradient computation to use packed variables |
| pytensor/tensor/reshape.py | Added L_op implementations for JoinDims and SplitDims; added connection_pattern to SplitDims; improved unpack to handle single-element lists without splitting |
| tests/tensor/test_optimize.py | Added comprehensive tests for multiple minimands, MVN logp gradient regression, multiple root inputs, and vectorized root gradients |
| tests/tensor/test_reshape.py | Added gradient verification tests for join_dims and split_dims; changed rewrite pass from "specialize" to "canonicalize" |
Force-pushed from 66cf591 to 3e8a3f3
I'm not sure about moving the complexity of handling multiple inputs to our op helpers. Is it that hard to ask users to use pack/unpack themselves? This also makes the PR harder to review, since you're doing a new feature and a bugfix together, and the changes are by no means trivial. In doing so, you're also moving quite far away from the scipy API, so it will be less obvious how these work.
I want this API as the front end. We're already far from the scipy API -- we don't take jac or hess functions, and we don't take args. The user-defined single vector case is still valid, so this is a strict upgrade with 100% backwards compat. I was quite careful to keep the commits split by change; is it difficult to review commit by commit? I am willing to split them into separate PRs if you insist.
It's not hard to review commit by commit, but I have less trust that the bugfix commit fixed the bug, and not the API change logic, for instance.
Sure, I can split it out then. I can also address #1466 in a single bugfix PR, then circle back to this.
On further consideration, the other two bugs are directly addressed by this PR. I split the gradients out, since those are different. Both #1550 and #1586 are reporting the same bug: root/minimize currently fail when computing gradients with respect to args with ndim > 2. This PR will handle that natively by using pack/unpack. Specifically, this line assumes that the return from jacobian is always <2d, which isn't the case in general. An intermediate PR would have to ravel all the args and do book-keeping on their shapes, which is exactly what pack/unpack are for. So I don't see anything to split out, except the L_ops, which I already did.
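A toy sketch (not code from this PR) of why the <2d assumption breaks once an arg has more than two dimensions:

```python
import pytensor.tensor as pt
from pytensor.gradient import grad, jacobian

theta = pt.tensor3("theta")   # an "arg" with ndim > 2, shape (a, b, c)
x = pt.vector("x")            # the variable being optimized, shape (n,)
objective = ((x[:, None, None, None] - theta) ** 2).sum()

dobj_dx = grad(objective, x)  # shape (n,)
# Differentiating again with respect to theta yields a 4d tensor of shape
# (n, a, b, c), not a 2d jacobian matrix; raveling the args (the book-keeping
# that pack/unpack performs) is what turns it back into a matrix.
d2obj_dx_dtheta = jacobian(dobj_dx, theta)
```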
Force-pushed from 3e8a3f3 to 468fd98
I got rid of the eager graph_rewrite, which fixed the scan bug I was hitting. As a result, I had to implement some logic to filter non-differentiable args, which was a bug you had previously hit.
Sounds like nice progress. If you're just splitting the lop, don't bother, those are simple enough |
Too late. You're already tagged too. |
You better not have used reshape |
Force-pushed from 3facb1c to 83dfea1
> Shape of unpacked inputs (and indices) and most cases of integers will

Were you actually seeing integers used in unpack showing up as connected? That would mean the connection_pattern helper isn't working as expected. You shouldn't need an extra manual filter.

Edit: That was indeed the problem, see next comment.
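For reference, a generic sketch of the connection_pattern / L_op pairing being discussed, for a hypothetical split-like op (this is not SplitDims' actual implementation; make_node and perform are omitted):

```python
from pytensor.gradient import DisconnectedType
from pytensor.graph.op import Op


class SplitLikeOp(Op):
    """Hypothetical op: input 0 is data, the remaining inputs are sizes."""

    def connection_pattern(self, node):
        # One row per input, one column per output: the data input feeds every
        # output, while the integer size inputs carry no gradient information.
        n_out = len(node.outputs)
        return [[True] * n_out] + [[False] * n_out for _ in node.inputs[1:]]

    def L_op(self, inputs, outputs, output_grads):
        x, *sizes = inputs
        grad_x = x.zeros_like()  # placeholder; the real pullback goes here
        return [grad_x] + [DisconnectedType()() for _ in sizes]
```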
Okay, so the confusion for me comes from the Split Op not considering the gradient disconnected wrt the split_size argument, which is puzzling (fixed in #1828). I'm opening an issue to discuss this: #1827. The bigger issue is that …
@jessegrabowski This is a summary of my current understanding of what Minimize.L_op should do:
This approach would rule out the shape variables used in Split regardless of #1828, as they would still be linked to … Does that make sense?
(Lines 2547 to 2555 in 79a4bc1)
(Lines 3086 to 3110 in 79a4bc1)
Add broadcastable check before squeezing in `split_dims`
Force-pushed from 53eac5c to 2d4074b
Tests are passing with #1806. I ended up having to use …
You shouldn't have to? As long as you are not returning disconnected grads that weren't disconnected in the connection_pattern, I think that's fine. You may be replacing NullTypes with Disconnected accidentally? And at most it should just result in a warning from PyTensor.
```diff
 )
-with pytest.raises(NullTypeGradError):
+with pytest.raises(DisconnectedInputError):
```
Is this why io_connection_pattern didn't suffice? It seems you converted NullType grads to Disconnected, beyond what would be implied by the connection pattern. It doesn't bother me too much, but could mean you can simplify that Op method to not call grad
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, see my comment below. My only other thought is that there's a bug upstream where the indexing grads are DisconnectedType but they should be NullType
ricardoV94 left a comment:
This looks great, just some minor questions / outdated comments
pytensor/tensor/optimize.py (Outdated)
```python
    [atleast_2d(df_dx), df_dtheta], replace=replace
)
if arg_grad is None:
    final_grads.append(DisconnectedType()())
```
My understanding is that if the original grad was NullType, you should return that, unless it would also be disconnected even if it wasn't Null
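Stated as code, the rule described above might look like this sketch (the helper name and arguments are illustrative, not the PR's code):

```python
from pytensor.gradient import DisconnectedType

def choose_outer_grad(inner_grad, is_connected):
    """Map an inner-graph gradient to the outer Op's L_op return value.

    `is_connected` is the connection-pattern entry for this input.
    """
    if not is_connected:
        # Disconnected wins: even a NullType inner grad is irrelevant when the
        # input cannot influence the output at all.
        return DisconnectedType()()
    # Otherwise pass the inner grad through unchanged, so NullType (undefined)
    # gradients are reported as undefined rather than silently disconnected.
    return inner_grad
```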
Not a biggie, but maybe that's why you needed the call to grad in the connection_pattern method.
My code here is incorrect, but it's not the reason why I needed to call grad in the connection_pattern. For example, in the test_optimize_multiple_minimands test case, the (outer) args have the following types:
[(ExpandDims{axis=0}.0, TensorType(int8, shape=(1,))),
(Subtensor{start:stop}.0, TensorType(int64, shape=(1,))),
(Prod{axes=None}.0, TensorType(int64, shape=())),
(Prod{axes=None}.0, TensorType(int64, shape=())),
(Prod{axes=None}.0, TensorType(int64, shape=())),
(input 7, TensorType(float64, shape=(100, 5))),
(Subtensor{start:stop}.0, TensorType(int64, shape=(0,))),
(input 6, TensorType(float64, shape=(100,))),
(Subtensor{start:stop}.0, TensorType(int64, shape=(0,))),
(input 5, TensorType(float64, shape=(100,))),
(Subtensor{start:stop}.0, TensorType(int64, shape=(0,))),
(input 4, TensorType(float64, shape=(100,))),
(input 8, TensorType(float64, shape=(100,)))]
Here is the connection pattern generated by io_connection_pattern:
[[True, False], [True, False], [True, False], [True, False], [True, False], [True, False], [True, False], [True, False], [True, False], [True, False], [True, False], [True, False], [True, False], [True, False]]
And here are the gradients of the inner function:
[Squeeze{axis=0}.0, <DisconnectedType>, <DisconnectedType>, <DisconnectedType>, <DisconnectedType>, Reshape{2}.0, <DisconnectedType>, Squeeze{axis=0}.0, <DisconnectedType>, Squeeze{axis=0}.0, <DisconnectedType>, Squeeze{axis=0}.0, Squeeze{axis=0}.0]
Let me take a look in the debugger
Description
Allows `optimize.minimize` and `optimize.root` to be called with multiple inputs of arbitrary shapes, as in the sketch below.
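A minimal sketch of the kind of call this enables; the variable names and the exact return structure are assumptions for illustration, not taken from the PR's tests:

```python
import pytensor.tensor as pt
from pytensor.tensor.optimize import minimize

mu = pt.vector("mu")        # shape (k,)
sigma = pt.matrix("sigma")  # shape (k, k) -- a second input with a different shape
data = pt.matrix("data")

objective = ((data - mu) ** 2).sum() + (sigma**2).sum()

# Multiple variables of different shapes are passed as a sequence; internally
# they are packed into one flat vector for scipy and unpacked afterwards.
solutions, success = minimize(objective, [mu, sigma])
mu_star, sigma_star = solutions  # assumed: returned in the original (k,) and (k, k) shapes
```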
Internally, `pack` and `unpack` are used to convert the problem to 1d and return results in the same shape as the inputs. `pack` and `unpack` are also used in the gradients, to simplify the construction of the jacobian with respect to arguments to the optimization function. We should consider simply using pack/unpack in the `jacobian` function itself, and add an option to get back either the unpacked form (what we currently give back -- the columns of the jacobian matrix) or the packed form (a single matrix).

Tests are failing because of a bug in `scan`; I'm going to have to beg @ricardoV94 to help me understand how to fix that.
This PR also adds `L_op` implementations for `SplitDims` and `JoinDims`. I found this was easier than constantly rewriting the graph to try to remove these ops. Their pullbacks are also `SplitDims` and `JoinDims`, so in the end the gradients will be rewritten into `Reshape` as well, so I don't see any harm.
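As an aside, the point that the pullback of a reshape-like op is another reshape-like op can be seen in a tiny, illustrative-only example:

```python
import pytensor.tensor as pt
from pytensor.gradient import grad

x = pt.tensor3("x")              # shape (a, b, c)
y = x.reshape((x.shape[0], -1))  # join the last two dims, shape (a, b*c)
cost = (y**2).sum()

# The gradient flows back through the reshape and comes out in x's shape,
# i.e. the pullback is just the inverse joining/splitting of dimensions.
g = grad(cost, x)                # shape (a, b, c)
```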
Related Issue
- `pytensor.optimize.root` #1465
- BUG: Example in notebook for `optimize` no longer works #1586

Checklist
Type of change