
Solves RuntimeError in Torch optimization, and NaN in scaled torch optimization#485

Merged
manuelFragata merged 1 commit into master from bugfixes/optimization_related_torch on Feb 21, 2026
Conversation

@manuelFragata
Collaborator

Issue #483 is solved with the following logic:
1. Fix TorchBaseOptimizer and parameter space consistency

Files: base.py in torch optimizers

torch optimizers now work consistently in scaled parameter space, matching the scipy optimizers:

  • Init: var.variable.get_value() → var.value (scaled)
  • Update: var.variable.update_value(param) → var.update(param) (inverse-scales internally)
  • Bounds: var.bounds (unchanged — already scaled)

This ensures _apply_bounds() clamps parameters in the same space they live in, preventing catastrophic value corruption.
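As a rough illustration of the convention (ScaledVariable and its attributes are hypothetical stand-ins, not the actual optiland API), keeping init, clamp, and update all in scaled space looks like:

```python
import torch

# Hypothetical sketch of the scaled-space convention; ScaledVariable is an
# illustrative stand-in, not the real optiland variable class.
class ScaledVariable:
    """Holds a physical value v; the optimizer sees s = (v - offset) / scale."""

    def __init__(self, value, scale=10.0, offset=0.0, bounds=(-1.0, 1.0)):
        self._physical = value
        self.scale, self.offset = scale, offset
        self.bounds = bounds  # already expressed in scaled space

    @property
    def value(self):
        # scaled view: what the optimizer should initialize from
        return (self._physical - self.offset) / self.scale

    def update(self, scaled):
        # inverse-scales internally, mirroring var.update(param)
        self._physical = scaled * self.scale + self.offset


var = ScaledVariable(value=5.0)                   # physical 5.0 -> scaled 0.5
param = torch.tensor(var.value, requires_grad=True)

# _apply_bounds must clamp in the SAME (scaled) space the parameter lives in
lo, hi = var.bounds
with torch.no_grad():
    param.clamp_(lo, hi)

var.update(param.item())
print(var._physical)  # 5.0: 0.5 lies inside the scaled bounds, so no corruption
```

If the bounds were physical instead, say (2.0, 8.0), clamping the scaled 0.5 would snap it to 2.0, which inverse-scales to a physical 20.0 instead of 5.0: exactly the kind of value corruption the space-consistency fix prevents.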


Issue #484 is solved with the following logic:

  1. Material n / k caching causes RuntimeError under torch + grad mode

When using the torch backend with gradient tracking enabled, calling .backward() more than once during optimization raises:
RuntimeError: Trying to backward through the graph a second time
(or directly access saved tensors after they have already been freed).

Why this happens: When BaseMaterial.n() computes a refractive index under torch, the result is a tensor connected to a computation graph. If that tensor is cached and reused in a later forward pass, the new .backward() call tries to traverse the old (already freed) graph — causing the RuntimeError.
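A minimal standalone repro of that failure mode (toy code for illustration, not optiland's actual caching):

```python
import torch

# Toy repro (not optiland code): cache a tensor that belongs to
# iteration 1's graph, then reuse it in iteration 2.
w = torch.tensor(2.0, requires_grad=True)  # stand-in for an optimization variable
cache = {}

def cached_n():
    if "n" not in cache:
        cache["n"] = w ** 2  # tensor attached to the current graph
    return cache["n"]

loss1 = cached_n() * 3.0
loss1.backward()             # frees the graph behind cache["n"]

err_type = None
try:
    loss2 = cached_n() * 3.0
    loss2.backward()         # tries to traverse the already-freed graph
except RuntimeError as err:
    err_type = type(err).__name__

print(err_type)  # RuntimeError
```

Calling .detach() on the cached tensor before storing it, as this PR does for constants, severs the link to the freed graph so the second backward never reaches it.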

The previous fix I applied privately bypassed the cache entirely whenever be.grad_mode.requires_grad is True:

if be.get_backend() == "torch" and be.grad_mode.requires_grad:
    return self._calculate_n(wavelength, **kwargs)  # skip cache

This solved the RuntimeError but introduced two problems:

  • Performance: every call recomputes the refractive index from scratch, even for identical wavelengths. During optimization, the same materials are queried thousands of times, so this is a significant penalty.
  • Broke the caching test: test_caching[backend=torch] failed because the cache was never populated.

The fix in this PR uses a more robust approach: before caching, check what kind of result we got:

  • If the result requires grad (i.e., the index itself is an optimization variable like IdealMaterial(n=nn.Parameter(...))): skip cache, return fresh — gradient flow must be preserved.

  • If the result does not require grad (i.e., it's a constant like Material("N-BK7")): detach and cache — .detach() severs the link to the old computation graph so it can't cause the RuntimeError, and caching avoids redundant recomputation.

This gives us correct gradient behavior, no RuntimeError, and full caching performance for the common case where materials are constants.
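The decision rule can be sketched like this (a simplified stand-in for the logic in BaseMaterial.n(); the class and names here are illustrative, not the actual implementation in optiland/materials/base.py):

```python
import torch

# Simplified sketch of the caching rule; MaterialCacheSketch is illustrative,
# not the real BaseMaterial.
class MaterialCacheSketch:
    def __init__(self, n_fn):
        self._n_fn = n_fn   # callable computing the refractive index
        self._cache = {}

    def n(self, wavelength):
        if wavelength in self._cache:
            return self._cache[wavelength]
        result = self._n_fn(wavelength)
        if torch.is_tensor(result) and result.requires_grad:
            # the index itself is an optimization variable:
            # skip the cache so gradient flow is preserved
            return result
        if torch.is_tensor(result):
            # constant material: detach severs the old graph, safe to cache
            result = result.detach()
        self._cache[wavelength] = result
        return result


# constant material: cached after the first call
const = MaterialCacheSketch(lambda wl: torch.tensor(1.5168))
first, second = const.n(0.55), const.n(0.55)
print(first is second)  # True: served from the cache

# trainable index: computed fresh on every call, grad intact
n_var = torch.nn.Parameter(torch.tensor(1.5))
train = MaterialCacheSketch(lambda wl: n_var * 1.0)
print(train.n(0.55).requires_grad)  # True
print(0.55 in train._cache)         # False: never cached
```

The branch order matters: the requires_grad check must run before caching, otherwise a trainable index would be detached and its gradient silently lost.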

Fixes #483 and #484

@codecov

codecov bot commented Feb 21, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


@@            Coverage Diff             @@
##           master     #485      +/-   ##
==========================================
+ Coverage   93.16%   93.18%   +0.01%     
==========================================
  Files         304      304              
  Lines       17951    17963      +12     
==========================================
+ Hits        16724    16738      +14     
+ Misses       1227     1225       -2     
| Files with missing lines | Coverage Δ |
|---|---|
| optiland/materials/base.py | 97.26% <100.00%> (+0.53%) ⬆️ |
| optiland/optimization/optimizer/torch/base.py | 87.09% <100.00%> (+3.22%) ⬆️ |

manuelFragata merged commit a431fac into master on Feb 21, 2026 (14 checks passed)
manuelFragata deleted the bugfixes/optimization_related_torch branch February 21, 2026 16:27
@HarrisonKramer
Owner

Thanks for applying these fixes! Nice catches.

Kramer


Development

Successfully merging this pull request may close these issues:
  • Torch optimizers produce NaN loss with constrained variables

2 participants