Conversation

@PierreQuinton
Contributor

No description provided.

@PierreQuinton PierreQuinton added feat New feature or request package: autogram labels Oct 15, 2025
@ValerianRey
Contributor

ValerianRey commented Oct 15, 2025

This is big for architectures with very large linear layers, like AlexNet.
For AlexNet, on cuda, with batch_dim=0, this leads to:

  • Double the max batch size (from batch_size=19 to batch_size=38 on my GPU). Sadly, this is still very far from the max batch size of 1268 of SGD with autograd. Do you think there might be a theoretical way to bridge this gap even more? EDIT: I have no idea why, but re-running the same tests (or maybe I made a mistake before) yields completely different results. The max batch size is now 18 for main and 468 for this PR - much, much closer to the 1268 of autograd. EDIT2: the big memory improvement (and a small speed improvement) comes from just installing opt_einsum (without even changing the code): uv pip install opt_einsum.
  • A 2x to 4x speedup (depending on the batch size) of the whole autogram_forward_backward function (so this includes not only the gramian computation of the linear layers, but also that of all other layers, plus the forward passes).

For other architectures, the differences are not very noticeable, though. But this is very promising. Let's fully focus on this direction IMO.

"""

G_b = torch.einsum("ik,jk->ij", dY1, dY2)
G_W = torch.einsum("ik,il,jl,jk->ij", dY1, X, X, dY2)
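
For context, a hedged reading of what these two contractions compute, assuming dY1 and dY2 hold per-sample gradients with respect to the output of a Linear layer $y = Wx + b$ and X holds the corresponding inputs:

$$G_b[i,j] = \sum_k dY_1[i,k]\, dY_2[j,k], \qquad G_W[i,j] = \sum_{k,l} dY_1[i,k]\, X[i,l]\, dY_2[j,k]\, X[j,l],$$

i.e. $G_W[i,j]$ is the Frobenius inner product between the per-sample weight gradients $dY_1[i] \otimes X[i]$ and $dY_2[j] \otimes X[j]$, computed without ever materializing those outer products.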

This can be replaced by:

G_W = oe.contract("ik,il,jl,jk->ij", dY1, X, X, dY2, optimize="optimal", backend="torch")

with import opt_einsum as oe, but it seems to give exactly the same runtime and memory usage.


Actually, whenever opt_einsum is installed, the optimized contraction is already used even without changing the line:

G_W = torch.einsum("ik,il,jl,jk->ij", dY1, X, X, dY2)

We could still switch to the explicit opt_einsum call, maybe just to make the dependency explicit.

Contributor Author

@PierreQuinton PierreQuinton Oct 16, 2025


Whatever you prefer. I prefer not having to pass the two additional parameters; for me, what is important here is

  1. It is an einsum
  2. It is fast

But the second criterion is more of a "how" than a "what", so we don't really need to state it. For this reason I would lean slightly towards torch.einsum. The downside is that a user could set opt_einsum's global settings to a non-optimized strategy, thereby making it slow, but I guess that is the user's responsibility.
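
For reference, a small sketch of that global knob (torch.backends.opt_einsum is the relevant PyTorch setting; the snippet is only illustrative):

import torch

# opt_einsum integration is controlled globally in PyTorch:
if torch.backends.opt_einsum.is_available():        # True iff opt_einsum is installed
    torch.backends.opt_einsum.enabled = True         # let torch.einsum optimize the contraction path
    torch.backends.opt_einsum.strategy = "optimal"   # "auto" (default), "greedy" or "optimal"

# With this, the call site can stay exactly as it is in the PR:
# G_W = torch.einsum("ik,il,jl,jk->ij", dY1, X, X, dY2)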


def _compute_gramian(self, dY1: Tensor, dY2: Tensor, X: Tensor) -> Tensor:
    """
    X is a matrix of shape [k, n] and dY1, dY2 are matrices of shape [k, m].

There's actually no guarantee that X, dY1 and dY2 are matrices.

From the documentation of nn.Linear:

[screenshot of the nn.Linear shape documentation: input of shape (*, H_in), output of shape (*, H_out), where * means any number of dimensions, including none]

In particular, when there is no batch dim, I think the * dimension could be empty, and in transformers, the * dimension is (batch_size, seq_length), which is why transformers fail with this PR.
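
For illustration, a quick check of that shape behaviour (any number of leading dims, including none):

import torch
from torch import nn

linear = nn.Linear(8, 4)
print(linear(torch.randn(8)).shape)        # torch.Size([4])        -> no batch dim at all
print(linear(torch.randn(2, 8)).shape)     # torch.Size([2, 4])
print(linear(torch.randn(2, 5, 8)).shape)  # torch.Size([2, 5, 4])  -> the (batch, seq_length) case of transformers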

Contributor

@ValerianRey ValerianRey Oct 15, 2025


I made it work on Transformers with that:

if dY1.ndim == 1:
    G_b = torch.einsum("k,k->", dY1, dY2)
    G_W = torch.einsum("k,l,l,k->", dY1, X, X, dY2)
elif dY1.ndim == 2:
    G_b = torch.einsum("ak,ik->ai", dY1, dY2)
    G_W = torch.einsum("ak,al,il,ik->ai", dY1, X, X, dY2)
elif dY1.ndim == 3:  # Typical in transformers
    G_b = torch.einsum("abk,ijk->ai", dY1, dY2)
    G_W = torch.einsum("abk,abl,ijl,ijk->ai", dY1, X, X, dY2)
else:
    raise ValueError("Higher dimensions not supported. Open an issue if needed.")

Not elegant at all, but it seems to work. Maybe there's a clean way to write this that works for any number of dimensions, without the ifs. Also, please review the equations: I basically found them by trial and error until the tests passed.
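
One possible way to avoid the ifs, as a sketch only (the helper name is hypothetical; it assumes a batch dim at index 0 and the feature dim last, so the non-batched ndim == 1 case would still need its own branch): flatten all intermediate dims into one and contract once.

import torch
from torch import Tensor


def _compute_gramian_flattened(dY1: Tensor, dY2: Tensor, X: Tensor) -> tuple[Tensor, Tensor]:
    # dY1, dY2 are assumed to have shape (B, *, m) and X shape (B, *, n).
    # Flattening the * dims lets one pair of contractions cover ndim == 2, 3, 4, ...
    b = dY1.shape[0]
    dY1_ = dY1.reshape(b, -1, dY1.shape[-1])
    dY2_ = dY2.reshape(b, -1, dY2.shape[-1])
    X_ = X.reshape(b, -1, X.shape[-1])
    G_b = torch.einsum("ask,btk->ab", dY1_, dY2_)
    G_W = torch.einsum("ask,asl,btl,btk->ab", dY1_, X_, X_, dY2_)
    return G_b, G_W

For ndim == 2 the flattened dim has size 1, so this reduces to the "ak,al,il,ik->ai" case above; for ndim == 3 it matches the "abk,abl,ijl,ijk->ai" case.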

Contributor Author

@PierreQuinton PierreQuinton Oct 16, 2025


Well, it needs to be at least matrices (ndim >= 2), since we know it's a batched scenario. We could in principle also add the non-batched scenario, but I'm not sure it would be faster than the classical Jacobian-based GramianComputer.

Regarding "I did them basically with trial and error until the tests passed": I wish I could have done that ^^

@ValerianRey
Contributor

ValerianRey commented Oct 15, 2025

Also pretty big for Transformers (with the change I suggested to handle higher-order tensors).

Times for forward + backward on WithTransformerLarge with BS=256, A=Mean on cuda:0: reduced from 3.13 sec (main) to 2.20 sec (this PR).

Memory usage is however increased: the max batch size went from 273 (main) to 256 (this PR).

@ValerianRey
Contributor

ValerianRey commented Oct 15, 2025

This seems to break NoFreeParam (tiny errors) and ModuleReuse (large errors). I need to investigate that.

For ModuleReuse, my guess is that it simply doesn't consider cross terms anymore, so it's expected that it fails the test.

@PierreQuinton
Contributor Author

PierreQuinton commented Oct 16, 2025

Of interest: https://optimized-einsum.readthedocs.io/en/stable/reusing_paths.html
If we explore which contraction path is optimal and it turns out to essentially always be the same, then we may want to hard-code that contraction ourselves. It would be very helpful to know the optimal contraction order.
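
A sketch of how to inspect and reuse the path with opt_einsum (the shapes below are just an example):

import opt_einsum as oe
import torch

dY1, dY2 = torch.randn(64, 100), torch.randn(64, 100)
X = torch.randn(64, 50)

# Report the contraction order chosen for the weight-gramian einsum.
path, info = oe.contract_path("ik,il,jl,jk->ij", dY1, X, X, dY2, optimize="optimal")
print(path)   # pairwise contraction order, e.g. [(0, 1), (0, 2), (0, 1)]
print(info)   # FLOP count and intermediate sizes for that path

# If the order is essentially always the same, build the expression once and reuse it.
expr = oe.contract_expression(
    "ik,il,jl,jk->ij", dY1.shape, X.shape, X.shape, dY2.shape, optimize="optimal"
)
G_W = expr(dY1, X, X, dY2, backend="torch")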

@PierreQuinton PierreQuinton changed the title feat(autogram): Add LinearBasedGramianComputer. feat(autogram): Add ModuleBasedGramianComputer. Oct 17, 2025
@ValerianRey
Contributor

@PierreQuinton I found a way to compute the gramian with autograd, with no cross terms from module reuse / inter-module param reuse: 30fdc00. Basically, the idea is to have a module pre-hook that clones each parameter before it is used, and a module post-hook that restores the original params. This way, each module usage corresponds to a different clone, and you can compute a gradient with respect to each clone. The implementation is a context manager, so it's quite clean IMO (a rough sketch of the idea follows the list of limitations below). Current limitations:

  • Does not work on WithMultiheadAttention, WithTransformer and WithFreeParam, because they all involve some indirect parameters for the hooked module. Need to investigate and fix that (it's probably doable).
  • Still counts cross-terms from intra-module parameter reuse: we'd need a node-based algo (rather than module-based) to fix that. But since autogram is still module based, it doesn't matter yet.
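
A rough sketch of the clone-then-restore idea (the names are hypothetical and the actual context manager in 30fdc00 may differ in the details):

from contextlib import contextmanager

import torch
from torch import nn


@contextmanager
def clone_params(module: nn.Module, clones_per_usage: list):
    """On every forward of `module`, swap its direct parameters for fresh clones
    and restore the originals afterwards, so each usage gets its own copies."""
    originals = dict(module.named_parameters(recurse=False))

    def pre_hook(mod, args):
        clones = {}
        for name, param in originals.items():
            clone = param.clone()      # still differentiable w.r.t. the original param
            delattr(mod, name)         # unregister the nn.Parameter
            setattr(mod, name, clone)  # plain tensor attribute used by this forward only
            clones[name] = clone
        clones_per_usage.append(clones)

    def post_hook(mod, args, output):
        for name, param in originals.items():
            delattr(mod, name)
            setattr(mod, name, param)  # re-register the original nn.Parameter

    handles = [module.register_forward_pre_hook(pre_hook),
               module.register_forward_hook(post_hook)]
    try:
        yield
    finally:
        for handle in handles:
            handle.remove()


# Example: two usages of the same Linear get two independent weight clones,
# so the two gradients below contain no cross terms between usages.
linear = nn.Linear(4, 4)
usages = []
with clone_params(linear, usages):
    y = linear(linear(torch.randn(3, 4))).sum()
grad_first_usage, grad_second_usage = torch.autograd.grad(y, [u["weight"] for u in usages])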

* Add ModuleFactory and use it to instantiate models in tests
* Add get_in_out_shapes and use it to obtain input and output shapes in tests
* Move conftest.py from tests.unit to tests
* Separate DEVICE creation from conftest.py to device.py
* Add pytest_make_parametrize_id
* Add support for RNN, BatchNorm2d and InstanceNorm2d in get_in_out_shapes
* Remove WithRNN, WithBatchNorm and WithModuleTrackingRunningStats - use simple factories instead
* Revert removal of WithRNN (part of aff0abc)
* Fix output of WithRNN to not include the hidden state
* Extract rng forking into contexts.py
* Make _forward_pass do rng forking
* Make _forward_pass take reduction parameter
* Make forward_pass public
* Use forward_pass in test_engine.py, stop reseeding (it's now done by forward_pass)
* Make zipping strict in make_mse_loss_fn
* Stop requiring params in autograd_gramian_forward_backward
* Improve parameter order of autogram_forward_backward
* Rename some variables
* Factorize input and target creation into make_inputs_and_targets
* Reorder some code
* Add CloneParams context to consider each parameter usage on a per-module-usage basis.
* Add _get_losses_and_params_with_cross_terms, _get_losses_and_params_without_cross_terms, and _get_losses_and_params to select between both.
@ValerianRey ValerianRey force-pushed the linear-gramian-computer branch from 1d134b4 to 7a95b96 on October 20, 2025 at 15:59.