
Conversation

@jessegrabowski
Member

@jessegrabowski jessegrabowski commented Jan 1, 2026

Description

Allows optimize.minimize and optimize.root to be called with multiple inputs of arbitrary shapes, as in:

    x0, x1, x2 = pt.dvectors("x0", "x1", "x2")
    x3 = pt.dmatrix("x3")
    b0, b1, b2 = pt.dscalars("b0", "b1", "b2")
    b3 = pt.dvector("b3")

    y = pt.dvector("y")

    y_hat = x0 * b0 + x1 * b1 + x2 * b2 + x3 @ b3
    objective = ((y - y_hat) ** 2).sum()

    minimized_x, success = minimize(
        objective,
        [b0, b1, b2, b3],
        jac=True,
        hess=True,
        method="Newton-CG",
        use_vectorized_jac=True,
    )

Internally, pack and unpack are used to convert the problem to a single 1d vector and to return results in the same shapes as the inputs.
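
As a rough illustration (not the PR's actual code), the round trip can be thought of as raveling every input, joining into one 1d vector for scipy, then splitting and reshaping the result back; pack/unpack generalize this bookkeeping:

    import pytensor.tensor as pt

    b_vec = pt.dvector("b_vec")   # e.g. shape (3,)
    b_mat = pt.dmatrix("b_mat")   # e.g. shape (2, 2)

    # "pack": ravel each variable and join everything into a single 1d vector
    flat = pt.concatenate([b_vec.ravel(), b_mat.ravel()])

    # "unpack": split the optimizer's flat result and restore the original shapes
    flat_vec, flat_mat = pt.split(flat, [b_vec.size, b_mat.size], 2)
    b_vec_out = flat_vec.reshape(b_vec.shape)
    b_mat_out = flat_mat.reshape(b_mat.shape)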

pack and unpack are also used in the gradients, to simplify construction of the jacobian with respect to the arguments of the optimization function. We should consider using pack/unpack in the jacobian function itself and adding an option to get back either the unpacked form (what we currently return -- the columns of the jacobian matrix) or the packed form (a single matrix).
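
A hypothetical helper along those lines (illustrative only; packed_jacobian is not an existing API) could look like:

    import pytensor.tensor as pt
    from pytensor.gradient import jacobian

    def packed_jacobian(expr, wrt_list):
        """Jacobian of a 1d expression wrt several args, packed into one matrix.

        Each block of columns is d(expr)/d(arg), with the arg raveled.
        """
        blocks = [
            jacobian(expr, arg).reshape((expr.shape[0], -1)) for arg in wrt_list
        ]
        return pt.concatenate(blocks, axis=1)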

Tests are failing because of a bug in scan; I'm going to have to beg @ricardoV94 to help me understand how to fix it.

This PR also adds L_op implementations for SplitDims and JoinDims. I found this was easier than constantly rewriting the graph to try to remove these ops. Their pullbacks are also SplitDims and JoinDims, so in the end the gradients will be rewritten into Reshape as well; I don't see any harm.
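
The join/split duality can be seen with ordinary pytensor ops (a sketch using concatenate rather than JoinDims itself): the pullback of a join is a split of the output cotangent into pieces shaped like the inputs, and vice versa.

    import pytensor.tensor as pt
    from pytensor.gradient import Lop

    a = pt.dvector("a")
    b = pt.dvector("b")
    joined = pt.concatenate([a, b])   # forward: join the pieces
    g_out = pt.dvector("g_out")       # cotangent on the joined vector

    # The pullback slices g_out back into pieces with the shapes of a and b,
    # i.e. a split of the output gradient.
    g_a, g_b = Lop(joined, [a, b], g_out)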

Related Issue

Checklist

Type of change

  • New feature / enhancement
  • Bug fix
  • Documentation
  • Maintenance
  • Other (please specify):

@jessegrabowski jessegrabowski added the bug, enhancement, feature request, and SciPy compatibility labels on Jan 1, 2026
@jessegrabowski
Member Author

The scan bug happens here, because shape information is being destroyed somewhere. If I comment out that check, all tests pass.

@jessegrabowski jessegrabowski force-pushed the optimize-use-pack branch 2 times, most recently from 2fd5ae0 to be39ef6 on January 1, 2026 at 04:12
@jessegrabowski
Member Author

jessegrabowski commented Jan 1, 2026

Regarding the notebook error reported in #1586, the notebook runs now, but with rewrite warnings. The specific rewrite has to do with squeeze, but it arises because of the use of vectorize_graph on the gradients of root.

We potentially ought to rewrite to scalar_minimize in that case; scipy.minimize handles this case gracefully, and we should too.

Copilot AI left a comment

Pull request overview

This PR enhances optimize.minimize and optimize.root to accept multiple input variables of arbitrary shapes, addressing issues #1550, #1465, and #1586. The implementation uses pack and unpack operations to handle multiple variables by flattening them into a single vector for scipy optimization, then reshaping results back to their original forms.

Key Changes

  • Added pack/unpack support to minimize and root for handling multiple variables of different shapes
  • Implemented L_op (gradient) methods for JoinDims and SplitDims ops to support autodiff through pack/unpack operations
  • Refactored implict_optimization_grads to use packed variables internally, simplifying jacobian construction

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 9 comments.

  • pytensor/tensor/optimize.py: Added the _maybe_pack_input_variables_and_rewrite_objective function; updated minimize and root signatures to accept sequences; refactored gradient computation to use packed variables
  • pytensor/tensor/reshape.py: Added L_op implementations for JoinDims and SplitDims; added connection_pattern to SplitDims; improved unpack to handle single-element lists without splitting
  • tests/tensor/test_optimize.py: Added comprehensive tests for multiple minimands, MVN logp gradient regression, multiple root inputs, and vectorized root gradients
  • tests/tensor/test_reshape.py: Added gradient verification tests for join_dims and split_dims; changed the rewrite pass from "specialize" to "canonicalize"

@jessegrabowski jessegrabowski force-pushed the optimize-use-pack branch 2 times, most recently from 66cf591 to 3e8a3f3 on January 1, 2026 at 21:07
@ricardoV94
Member

ricardoV94 commented Jan 2, 2026

I'm not sure about moving the complexity of handling multiple inputs into our op helpers. Is it that hard to ask users to use pack/unpack themselves? This also makes the PR harder to review, since you're doing a new feature and a bugfix together, and the changes are by no means trivial.

In doing so, you're also moving quite far from the scipy API, so it will be less obvious how these work.

@jessegrabowski
Member Author

I want this API as the front end. We're already far from the scipy API -- we don't take jac or hess functions, and we don't take args. The user-defined single-vector case is still valid, so this is a strict upgrade with 100% backwards compatibility.

I was quite careful to keep the commits split by change; is it difficult to review commit by commit? I am willing to split them into separate PRs if you insist.

@ricardoV94
Member

It's not hard to review commit by commit, but I have less trust that it was the bugfix commit that fixed the bug, rather than, for instance, the API change logic.

@jessegrabowski
Member Author

Sure, I can split it out then. I can also address #1466 in a single bugfix PR and then circle back to this.

@jessegrabowski
Member Author

On further consideration, the other two bugs are directly addressed by this PR. I split the gradients out, since those are different.

Both #1550 and #1586 report the same bug: root/minimize currently fail when computing gradients with respect to args with ndim > 2. This PR handles that natively by using pack/unpack. Specifically, this line assumes that the return from jacobian is always at most 2d, which isn't the case in general.

An intermediate PR would have to ravel all the args and do book-keeping on their shapes, which is exactly what pack/unpack are for. So I don't see anything to split out, except the L_ops, which I already did.
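
For illustration, a small standalone example (assumed variables, not code from this PR) of why the output of jacobian is not at most 2d in general: for a vector-valued expression and a matrix argument, the jacobian already has three dimensions.

    import pytensor.tensor as pt
    from pytensor.gradient import jacobian

    x = pt.dvector("x")   # shape (n,)
    A = pt.dmatrix("A")   # shape (n, m)
    y = x @ A             # vector output, shape (m,)

    # jacobian stacks d(y_i)/dA, so the result has ndim = y.ndim + A.ndim = 3.
    # Any atleast_2d-style assumption about its shape breaks here.
    J = jacobian(y, A)
    assert J.ndim == 3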

@jessegrabowski
Member Author

I got rid of the eager graph_rewrite, which fixed the scan bug I was hitting.

As a result, I had to implement some logic to filter non-differentiable args, which was a bug you had previously hit. The disconnected_inputs='raise' case works now, so I adjusted the test.

@ricardoV94
Member

Sounds like nice progress. If you're just splitting the lop, don't bother, those are simple enough

@jessegrabowski
Member Author

Sounds like nice progress. If you're just splitting the lop, don't bother, those are simple enough

Too late. You're already tagged too.

@ricardoV94
Member

Sounds like nice progress. If you're just splitting the lop, don't bother, those are simple enough

Too late. You're already tagged too.

You better not have used reshape

@ricardoV94
Member

ricardoV94 commented Jan 6, 2026

Shapes of unpacked inputs (and indices), and most integer cases, will correctly show up as disconnected. I'm just saying that not every single integer input is of that nature.

Were you actually seeing integers used in unpack show up as connected? That would mean the connection_pattern helper isn't working as expected. You shouldn't need an extra manual filter.

Edit: that was indeed the problem; see the next comment.

@ricardoV94
Member

ricardoV94 commented Jan 6, 2026

Okay, so my confusion comes from the Split Op not considering the gradient disconnected with respect to the split_size argument, which is puzzling (fixed in #1828). I'm opening an issue to discuss this: #1827.

The bigger issue is that io_connection_pattern doesn't give us all we want. It doesn't inform about the null gradients that are still "connected", whatever those mean (if they mean anything). See my revised proposal in the next comment.

@ricardoV94
Member

ricardoV94 commented Jan 6, 2026

@jessegrabowski This is a summary of my current understanding of what Minimize.L_op should do:

  1. Implement connection_pattern, because pytensor.grad will be annoyed if you return disconnected_grad without telling it in advance, but don't rely on it in L_op because it's not sufficient.
  2. Call gradient in the inner graph, asking it to return null types, and keep track of which gradients are disconnected_grad/undefined_grad/not_implemented_grad. These can be removed from the internal jacobian, and the L_op should just remember which kind each was, to preserve the meaning of disconnected / undefined / not_implemented at the end. I think this is pretty much what Scan is doing in the snippets below (a condensed sketch follows them).
  3. If by any chance there's a non-numerical input for which you get a non-null gradient back, still exclude it from the internal jacobian and return grad_not_implemented for that input. Supposedly it is differentiable, but you just don't know how to handle it alongside regular TensorVariables. Think of SparseVariables (if they aren't supported yet), or a newer type you haven't seen that is also differentiable.
  4. Do not worry about whether the remaining numerical inputs are integers or floats; the gradient should make sense as per the current PyTensor API. We can debate that (I don't care much about it), but I don't feel it has to be done in this specific PR, and for now we should remain consistent.
  5. If you still see variables that you feel shouldn't be differentiated with respect to, that is more a discussion for Confusion between grad_undefined / grad_disconnected #1827 or Fix issues with split and split_dims #1828, not Minimize specifically.

This approach would rule out the shape variables used in Split regardless of #1828, as they would still be linked to grad_undefined.

Does that make sense?

pytensor/pytensor/scan/op.py

Lines 2547 to 2555 in 79a4bc1

    grads = grad(
        cost=None,
        known_grads=known_grads,
        wrt=wrt,
        consider_constant=wrt,
        disconnected_inputs="ignore",
        return_disconnected="None",
        null_gradients="return",
    )

pytensor/pytensor/scan/op.py

Lines 3086 to 3110 in 79a4bc1

    if t == "connected":
        # If the forward scan is in as_while mode, we need to pad
        # the gradients, so that they match the size of the input
        # sequences.
        if info.as_while:
            n_zeros = inputs[0] - n_steps
            shp = (n_zeros,)
            if x.ndim > 1:
                shp = shp + tuple(x.shape[i] for i in range(1, x.ndim))
            z = pt.zeros(shp, dtype=x.dtype)
            x = pt.concatenate([x[::-1], z], axis=0)
            gradients.append(x)
        else:
            gradients.append(x[::-1])
    elif t == "disconnected":
        gradients.append(DisconnectedType()())
    elif t == "through_untraced":
        gradients.append(
            grad_undefined(
                self, p + 1, inputs[p + 1], "Depends on a untraced variable"
            )
        )
    else:
        # t contains the "why_null" string of a NullType
        gradients.append(NullType(t)())
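
As a condensed illustration of points 2-3 above (names like inner_cost and inner_args are assumptions, not code from this PR), the classification could look roughly like:

    from pytensor.gradient import DisconnectedType, NullType, grad

    # inner_cost: the scalar objective inside the inner graph (assumed)
    # inner_args: the inner-graph inputs of the Op (assumed)
    inner_grads = grad(
        inner_cost,
        wrt=inner_args,
        disconnected_inputs="ignore",
        return_disconnected="disconnected",
        null_gradients="return",
    )

    numerical, placeholders = [], {}
    for i, g in enumerate(inner_grads):
        if isinstance(g.type, (DisconnectedType, NullType)):
            # Exclude from the internal jacobian, but remember the marker so
            # L_op can return the same disconnected/undefined/not-implemented
            # gradient for this input at the end.
            placeholders[i] = g
        else:
            numerical.append((i, g))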

@jessegrabowski
Member Author

Tests are passing with #1806

I ended up having to use grad in the connection_pattern, because io_connection_pattern wasn't cutting it. Maybe you had something else in mind, but I hope not.

@ricardoV94
Member

ricardoV94 commented Jan 9, 2026

I ended up having to use grad in the connection_pattern, because io_connection_pattern wasn't cutting it. Maybe you had something else in mind, but I hope not.

You shouldn't have to? As long as you are not returning disconnected gradients that were not in the connection_pattern, I think that's fine. You may be replacing NullTypes with disconnected accidentally?

And at most it should just result in a warning from PyTensor.

    )

    -with pytest.raises(NullTypeGradError):
    +with pytest.raises(DisconnectedInputError):

Member

Is this why io_connection_pattern didn't suffice? It seems you converted NullType grads to Disconnected, beyond what would be implied by the connection pattern. It doesn't bother me too much, but it could mean you can simplify that Op method so it doesn't call grad.

Member Author

@jessegrabowski jessegrabowski Jan 9, 2026

No, see my comment below. My only other thought is that there's a bug upstream where the indexing grads are DisconnectedType but they should be NullType.

Member

@ricardoV94 ricardoV94 left a comment

This looks great, just some minor questions / outdated comments

    [atleast_2d(df_dx), df_dtheta], replace=replace
    )
    if arg_grad is None:
        final_grads.append(DisconnectedType()())

Member

My understanding is that if the original grad was NullType, you should return that, unless it would also be disconnected even if it weren't Null.

Member

Not a biggie, but maybe that's why you needed the call to grad in the connection_pattern method.

Member Author

My code here is incorrect, but it's not the reason why I needed to call grad in the connection_pattern. For example, in the test_optimize_multiple_minimands test case, the (outer) args have the following types:

    [(ExpandDims{axis=0}.0, TensorType(int8, shape=(1,))),
     (Subtensor{start:stop}.0, TensorType(int64, shape=(1,))),
     (Prod{axes=None}.0, TensorType(int64, shape=())),
     (Prod{axes=None}.0, TensorType(int64, shape=())),
     (Prod{axes=None}.0, TensorType(int64, shape=())),
     (input 7, TensorType(float64, shape=(100, 5))),
     (Subtensor{start:stop}.0, TensorType(int64, shape=(0,))),
     (input 6, TensorType(float64, shape=(100,))),
     (Subtensor{start:stop}.0, TensorType(int64, shape=(0,))),
     (input 5, TensorType(float64, shape=(100,))),
     (Subtensor{start:stop}.0, TensorType(int64, shape=(0,))),
     (input 4, TensorType(float64, shape=(100,))),
     (input 8, TensorType(float64, shape=(100,)))]

Here is the connection pattern generated by io_connection_pattern:

[[True, False], [True, False], [True, False], [True, False], [True, False], [True, False], [True, False], [True, False], [True, False], [True, False], [True, False], [True, False], [True, False], [True, False]]

And here are the gradients of the inner function:

[Squeeze{axis=0}.0, <DisconnectedType>, <DisconnectedType>, <DisconnectedType>, <DisconnectedType>, Reshape{2}.0, <DisconnectedType>, Squeeze{axis=0}.0, <DisconnectedType>, Squeeze{axis=0}.0, <DisconnectedType>, Squeeze{axis=0}.0, Squeeze{axis=0}.0]

Member

Let me take a look in the debugger


Development

Successfully merging this pull request may close these issues: Gradient of MinimizeOp fails with certain parameter shapes.