
Commit b01733e

yaugenst-flex and txdai authored
fix(pytorch): Correct gradient for array-valued functions in wrapper (#2608)
* fix(pytorch): Correct gradient for array-valued functions in wrapper

  The `to_torch` wrapper, which connects `autograd` functions to PyTorch's autograd system, failed to compute correct gradients for functions that returned multi-element arrays. The root cause was in the `_Wrapper.backward` method:

  1. The vector-Jacobian product function (`vjp`) was called with an array of ones instead of the true upstream gradient (`grad_output`).
  2. The result was then incorrectly multiplied by `grad_output` again.

  This worked by coincidence for scalar outputs, where the upstream gradient is often `1.0`, but produced incorrect gradients for array outputs.

  This commit corrects the implementation by passing the NumPy-converted `grad_output` directly to the `vjp` function and removing the subsequent redundant multiplication. The wrapper now correctly supports differentiation through functions that return tensors of any shape.

* test(pytorch): Add test for array-valued function gradients

---------

Co-authored-by: Tianxiang Dai <[email protected]>
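To see the failure mode concretely, here is a minimal sketch (not part of the commit; the function and shapes are illustrative) using only autograd's public `make_vjp` API. For an array-valued `f`, `vjp(g)` contracts the upstream gradient `g` with the Jacobian, so seeding it with ones sums the wrong entries; the old follow-up multiply by `grad_output` then also broadcasts to the output shape rather than the input shape:

```python
# Minimal sketch (not from the repo) of the bug described above,
# using only autograd's public make_vjp API; names are illustrative.
import autograd.numpy as anp
from autograd import make_vjp

def f(x):
    return anp.stack([x, 2.0 * x])  # array-valued: (n,) -> (2, n)

x = anp.array([1.0, 2.0, 3.0])
vjp, ans = make_vjp(f)(x)

g = anp.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])  # upstream gradient, shape (2, 3)

print(vjp(g))                   # correct: g contracted with J -> [1. 0. 2.]
print(vjp(anp.ones_like(ans)))  # what the old code fed in     -> [3. 3. 3.]
```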
1 parent 3a467ab · commit b01733e

File tree

- CHANGELOG.md
- tests/test_plugins/pytorch/test_wrapper.py
- tidy3d/plugins/pytorch/wrapper.py

3 files changed: +49 −9 lines

CHANGELOG.md

Lines changed: 7 additions & 6 deletions
```diff
@@ -11,16 +11,17 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Add support for `np.unwrap` in `tidy3d.plugins.autograd`.
 - Add Nunley variant to germanium material library based on Nunley et al. 2016 data.
 
-### Fixed
-- Arrow lengths are now scaled consistently in the X and Y directions, and their lengths no longer exceed the height of the plot window.
-- Bug in `PlaneWave` defined with a negative `angle_theta` which would lead to wrong injection.
-- Plots of objects defined by shape intersection logic will no longer display thin line artifacts.
-
 ### Changed
 - Switched to an analytical gradient calculation for spatially-varying pole-residue models (`CustomPoleResidue`).
-- Significantly improved performance of the `tidy3d.plugins.autograd.grey_dilation` morphological operation and its gradient calculation. The new implementation is orders of magnitude faster, especially for large arrays and kernel sizes.
 - `GaussianBeam` and `AstigmaticGaussianBeam` default `num_freqs` reset to 1 (it was set to 3 in v2.8.0) and a warning is issued for a broadband, angled beam for which `num_freqs` may not be sufficiently large.
 - Set the maximum `num_freqs` to 20 for all broadband sources (we have been warning about the introduction of this hard limit for a while).
+- Significantly improved performance of the `tidy3d.plugins.autograd.grey_dilation` morphological operation and its gradient calculation. The new implementation is orders of magnitude faster, especially for large arrays and kernel sizes.
+
+### Fixed
+- Arrow lengths are now scaled consistently in the X and Y directions, and their lengths no longer exceed the height of the plot window.
+- Bug in `PlaneWave` defined with a negative `angle_theta` which would lead to wrong injection.
+- Plots of objects defined by shape intersection logic will no longer display thin line artifacts.
+- Fixed incorrect gradient computation in PyTorch plugin (`to_torch`) for functions returning multi-element arrays.
 
 ## [2.9.0rc1] - 2025-06-10
 
```

tests/test_plugins/pytorch/test_wrapper.py

Lines changed: 39 additions & 0 deletions
```diff
@@ -44,3 +44,42 @@ def f_np(x, y):
     expected_grad = elementwise_grad(f_np, argnum=[0, 1])(x_np, y_np)
 
     assert_allclose(grad, expected_grad)
+
+
+def test_to_torch_array_valued_function(rng):
+    """Test that gradients are computed correctly for functions returning arrays with different shapes than input."""
+    x_np = rng.uniform(-1, 1, (2, 2)).astype("f4")
+    x_torch = torch.tensor(x_np, requires_grad=True)
+
+    # define a function that returns a different shape than input
+    # this function maps (2,2) -> (2,3)
+    def f_np(x):
+        return anp.stack([x.sum(axis=1), x.mean(axis=1) * 2, x[:, 0] * x[:, 1]], axis=1)
+
+    f_torch = to_torch(f_np)
+
+    output = f_torch(x_torch)
+    assert output.shape == (2, 3)
+
+    # create upstream gradient (simulating backprop from a loss)
+    grad_output = torch.ones_like(output)  # shape (2, 3)
+
+    output.backward(grad_output)
+
+    h = 1e-5
+    expected_grad = anp.zeros_like(x_np)
+
+    for i in range(x_np.shape[0]):
+        for j in range(x_np.shape[1]):
+            x_plus = x_np.copy()
+            x_plus[i, j] += h
+            x_minus = x_np.copy()
+            x_minus[i, j] -= h
+
+            f_plus = f_np(x_plus)
+            f_minus = f_np(x_minus)
+
+            expected_grad[i, j] = anp.sum((f_plus - f_minus) / (2 * h))
+
+    computed_grad = x_torch.grad.numpy()
+    assert_allclose(computed_grad, expected_grad, rtol=1e-3, atol=1e-3)
```
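For context, here is a hedged usage sketch (not from the test file) of the fixed wrapper in ordinary PyTorch code, where a downstream loss makes `grad_output` non-trivial. The import path mirrors this commit's file layout (`tidy3d/plugins/pytorch/wrapper.py`); the function itself is illustrative:

```python
# Hedged usage sketch: composing a to_torch-wrapped autograd function with a
# real PyTorch loss, so backward sends a grad_output that is not all ones.
import autograd.numpy as anp
import torch
from tidy3d.plugins.pytorch.wrapper import to_torch  # adjust if re-exported elsewhere

@to_torch
def features(x):
    return anp.stack([anp.sin(x), x**2])  # array-valued: (n,) -> (2, n)

x = torch.tensor([0.1, 0.2, 0.3], requires_grad=True)
loss = features(x).square().sum()  # upstream gradient is 2 * features(x)
loss.backward()
print(x.grad)  # correct only with the fixed vjp(grad_output) call
```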

tidy3d/plugins/pytorch/wrapper.py

Lines changed: 3 additions & 3 deletions
```diff
@@ -4,7 +4,6 @@
 
 import torch
 from autograd import make_vjp
-from autograd.extend import vspace
 
 
 def to_torch(fun):
@@ -79,10 +78,11 @@ def forward(ctx, *args):
 
         @staticmethod
        def backward(ctx, grad_output):
-            _grads = ctx.vjp(vspace(grad_output.detach().cpu().numpy()).ones())
+            numpy_grad_output = grad_output.detach().cpu().numpy()
+            _grads = ctx.vjp(numpy_grad_output)
             grads = [None] * ctx.num_args
             for idx, grad in zip(ctx.grad_argnums, _grads):
-                grads[idx] = torch.as_tensor(grad, device=ctx.device) * grad_output
+                grads[idx] = torch.as_tensor(grad, device=ctx.device)
             return tuple(grads)
 
     def apply(*args, **kwargs):
```
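The contract the fix relies on: `make_vjp(fun)(x)` returns `(vjp, ans)`, where `vjp(g)` evaluates the vector-Jacobian product of the upstream gradient `g` with the Jacobian of `fun` at `x` in a single reverse pass. So `backward` only needs to hand the true upstream gradient to `vjp` and convert the result back to a tensor. Below is a self-contained sketch of that pattern as a standalone `torch.autograd.Function` (illustrative, not the plugin's actual `_Wrapper` class):

```python
# Self-contained sketch of the corrected backward pattern: pass the true
# upstream gradient to autograd's vjp, with no extra multiplication after.
import autograd.numpy as anp
import torch
from autograd import make_vjp

class _Sketch(torch.autograd.Function):
    """Standalone illustration; the real plugin wraps arbitrary functions."""

    @staticmethod
    def forward(ctx, x):
        fun = lambda a: anp.stack([a, a**2])  # array-valued: (n,) -> (2, n)
        vjp, ans = make_vjp(fun)(x.detach().cpu().numpy())
        ctx.vjp, ctx.device = vjp, x.device
        return torch.as_tensor(ans, device=x.device)

    @staticmethod
    def backward(ctx, grad_output):
        # The fix: feed the actual upstream gradient into the VJP.
        grad = ctx.vjp(grad_output.detach().cpu().numpy())
        return torch.as_tensor(grad, device=ctx.device)

x = torch.tensor([1.0, 2.0], requires_grad=True)
_Sketch.apply(x).sum().backward()
print(x.grad)  # tensor([3., 5.]) == 1 + 2*x
```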