Givens orthogonal layer #57
base: main
Conversation
I somehow broke @kevinchern's tests, what the hell...
def test_store_config(self):
    with self.subTest("Simple case"):

        class MyModel(torch.nn.Module):
Remove formatting changes. Is this "black" formatting?
Yes. I have it enabled by default in my VS Code.
@VolodyaCO which tests? I'm seeing …
I forgot to update my tests to float64 precision. Now that I've done it, it's weird that all of the currently failing tests are failing on:
    File "/Users/distiller/project/tests/test_nn.py", line 144, in test_LinearBlock
        self.assertTrue(model_probably_good(model, (din,), (dout,)))
Ahhhhhh. OK, Theo also flagged this at #50. It's a poorly-written test... you can ignore it.
Returns:
    list[list[tuple[int, int]]]: Blocks of edges for parallel Givens rotations.
Note:
Better as a note directive: https://www.sphinx-doc.org/en/master/usage/restructuredtext/directives.html#directive-note
Where should I put this? in the release notes? or in the docstring itself?
Simply change the Note: to

.. note::
    Lorem ipsum...

which would render a note box if we generate docs with Sphinx.
I have done this now
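For concreteness, here is roughly how the directive could sit in the docstring (a sketch stitched together from the signature and return description quoted elsewhere in this review; the note text itself is placeholder wording):

```python
def _get_blocks_edges(n: int) -> list[list[tuple[int, int]]]:
    """Uses the circle method for round-robin pairing to create blocks of edges
    for parallel Givens rotations.

    Returns:
        list[list[tuple[int, int]]]: Blocks of edges for parallel Givens rotations.

    .. note::
        For odd ``n``, a dummy dimension is added internally and any pair involving
        it is dropped, so every block contains only valid dimension pairs.
    """
```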
angles (torch.Tensor): A ((n - 1) * n // 2,) shaped tensor containing all rotations
    between pairs of dimensions.
blocks (torch.Tensor): A (n-1, n//2, 2) shaped tensor containing the indices that
    specify rotations between pairs of dimensions. Each of the n-1 blocks contains n//2
    pairs of independent rotations.
Code formatting?
Suggested change:
angles (torch.Tensor): A ``((n - 1) * n // 2,)`` shaped tensor containing all rotations
    between pairs of dimensions.
blocks (torch.Tensor): A ``(n - 1, n // 2, 2)`` shaped tensor containing the indices that
    specify rotations between pairs of dimensions. Each of the n-1 blocks contains n // 2
    pairs of independent rotations.
Done, thanks.
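To make those shapes concrete, a small illustration for n = 4 (a sketch only; the exact pairing order depends on the round-robin schedule the implementation produces):

```python
import torch

n = 4
# All (n - 1) * n // 2 = 6 rotation angles, stored contiguously block by block.
angles = torch.randn((n - 1) * n // 2)

# n - 1 = 3 blocks, each holding n // 2 = 2 independent (commuting) rotations,
# identified by the pair of dimensions (i, j) each rotation acts on.
# This ordering is one valid round-robin schedule, not necessarily the PR's.
blocks = torch.tensor(
    [
        [[0, 3], [1, 2]],
        [[0, 2], [3, 1]],
        [[0, 1], [2, 3]],
    ]
)

assert angles.shape == ((n - 1) * n // 2,)
assert blocks.shape == (n - 1, n // 2, 2)
```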
| """ | ||
| # Blocks is of shape (n_blocks, n/2, 2) containing indices for angles | ||
| # Within each block, each Givens rotation is commuting, so we can apply them in parallel | ||
| U = torch.eye(n, device=angles.device, dtype=angles.dtype) |
Slight preference to keep variables lower-case.
I changed this in the main GivensRotationLayer class. In the other code, I kept the capital letters just so that if someone is reading the algorithm in the paper alongside the code, each part of the algorithm is more easily understood.
I also favour variable names in lower case with upper case reserved for constants, primarily because it is a widely adopted convention. I agree having a 1-1 correspondence between paper notation and implementation is important for readability, but I think making exceptions paper-by-paper can become messy. I suggest noting the correspondence between variable names and paper notation in the docstring.
--
I think I snuck in some upper case variable names in the codebase... should track those down at some point 😅
Playing devil's advocate against myself here: sometimes descriptive variable names are unnecessarily verbose and unhelpful in describing the algorithm.
🤷‍♀️
I think that for some algorithms, readers are more used to certain notations, for example, that U is orthogonal (actually orthonormal). I would make a vote for picking a convention 😆
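If we go with the look-up-table idea, it could be as light as a couple of docstring lines; a purely illustrative sketch (the function name and wording here are made up):

```python
import torch


def _apply_givens_blocks(angles: torch.Tensor, blocks: torch.Tensor, n: int) -> torch.Tensor:
    """Builds the orthogonal matrix from blocked Givens rotations.

    Notation (paper -> code): U -> u (the rotation matrix), theta -> angles, B -> blocks.
    """
    ...
```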
angles, blocks, Ufwd_saved = ctx.saved_tensors
Ufwd = Ufwd_saved.clone()
M = grad_output.t()  # dL/dU, i.e., grad_output is of shape (n, n)
n = M.size(1)
block_size = n // 2
A = torch.zeros((block_size, n), device=angles.device, dtype=angles.dtype)
Same here re lowercase for Ufwd, M, and A. Avoids incorrect colour highlighting in themes.
Hmmm, I didn't read this about the incorrect colour highlighting before I made my previous comment. I still think that it is easier to read the algorithm alongside the code if the use of lower/upper case match. For example, lower case m is usually used for an integer variable, not a tensor.
    return U

@staticmethod
def backward(ctx, grad_output: torch.Tensor):
Missing return type hint.
I added the type hint as well as a longer explanation of what this return is.
U = self._create_rotation_matrix()
rotated_x = einsum(x, U, "... i, o i -> ... o")
if self.bias is not None:
    rotated_x = rotated_x + self.bias
Suggested change:
    rotated_x += self.bias
Done.
from einops import einsum


class NaiveGivensRotationLayer(nn.Module):
I'm not very keen on having a full on separate implementation here just to compare with/test the GivensRotationLayer. If this NaiveGivensRotationLayer is useful, should it be part of the package instead?
We discussed this in our one on one but, just for the record, there is no difference between the NaiveGivensRotationLayer and the GivensRotationLayer in the forward or backward passes. The naïve implementation is there to make sure that the forward and backward passes indeed match. The GivensRotationLayer should always be used because it has a substantially better runtime complexity. Thus, the naïve implementation is not useful—other than for a sanity check.
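For the record, the naïve construction is conceptually just "apply every Givens rotation one at a time". A sketch of that idea (assumed names and an assumed sign convention, not the PR's NaiveGivensRotationLayer):

```python
import torch
import torch.nn as nn


class NaiveGivensRotations(nn.Module):
    """Builds the orthogonal matrix by applying all n*(n-1)//2 rotations sequentially."""

    def __init__(self, n: int):
        super().__init__()
        self.n = n
        self.pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
        self.angles = nn.Parameter(torch.randn(len(self.pairs)))

    def _rotation_matrix(self) -> torch.Tensor:
        u = torch.eye(self.n, dtype=self.angles.dtype, device=self.angles.device)
        for theta, (i, j) in zip(self.angles, self.pairs):
            g = torch.eye(self.n, dtype=self.angles.dtype, device=self.angles.device)
            c, s = torch.cos(theta), torch.sin(theta)
            # One Givens rotation acting on the (i, j) plane (sign convention assumed).
            g[i, i], g[j, j] = c, c
            g[i, j], g[j, i] = -s, s
            u = g @ u
        return u

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Same contraction as the fast layer: rotate the last dimension of x.
        # Autograd handles the backward pass for this sketch.
        return x @ self._rotation_matrix().t()
```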
tests/test_nn.py (outdated)
@parameterized.expand([(n, bias) for n in [4, 5, 6, 9, 10] for bias in [True, False]])
def test_forward_agreement(self, n, bias):
These tests do seem a bit too complex. Better to try and test more minimal aspects of the class, if possible. I'd much rather have separate integration-like tests that assert that the model behaves as expected, while keeping these as strictly small-scale, isolated unit tests.
I added some tests for invalid inputs too. These forward and backward tests check that the correct output is produced when compared to the naïve implementation. The model_probably_good test is done as a unit test.
I added other unit tests where I test incorrect inputs as well. In ML models, the forward and backward passes should be what one expects them to be, and this module gives the opportunity to test this correctly. I do agree that we should separate other tests that (at least) I wrote, which have to do with training a model to see if the intended final trained state is what is expected. However, the tests I present in this PR are not the result of training but explicit comparisons with the naïve approach; I don't know if we could regard those as integration tests.
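For what it's worth, the forward-agreement check itself stays small; roughly something like this (a sketch that assumes an import path and that both layers expose a compatible `angles` parameter — not the PR's actual test):

```python
import unittest

import torch

# from the package under review (path assumed):
# from orthogonal import GivensRotationLayer, NaiveGivensRotationLayer


class TestForwardAgreement(unittest.TestCase):
    def test_forward_agreement(self, n: int = 4, bias: bool = False):
        torch.manual_seed(0)
        fast = GivensRotationLayer(n, bias=bias)
        naive = NaiveGivensRotationLayer(n, bias=bias)
        with torch.no_grad():
            # Assumes the two layers order their angle parameters the same way.
            naive.angles.copy_(fast.angles)
        x = torch.randn(8, n)
        torch.testing.assert_close(fast(x), naive(x))
```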
Force-pushed from f18b476 to 46062e0.
After a bit of git wrangling, I was able to clean my whole mess of merge commits 😆.
anahitamansouri left a comment
This is a nice PR, Vlad. It took me a while to go over the paper and this PR :) The only thing left is the tests that are failing. Thanks for the great work.
self.n = n
self.n_angles = n * (n - 1) // 2
self.angles = nn.Parameter(torch.randn(self.n_angles))
blocks_edges = _get_blocks_edges(n)
You could directly return a torch.LongTensor from get_blocks_edges to avoid the conversion.
I made _get_blocks_edges a private function, so it shouldn't make a difference whether I convert the list to a tensor in the orthogonal module or in the function itself.
Force-pushed from 46062e0 to 7f76571.
kevinchern left a comment
Did a quick pass to provide some feedback before taking some time to take a deep dive into the paper.
def _get_blocks_edges(n: int) -> list[list[tuple[int, int]]]:
    """Uses the circle method for Round Robin pairing to create blocks of edges for parallel Givens
| """Uses the circle method for Round Robin pairing to create blocks of edges for parallel Givens | |
| """Uses the circle method for round-robin pairing to create blocks of edges for parallel Givens |
(and other occurrences)
Should _get_blocks_edges be a method of GivensRotation instead? The orthogonal module is general, while this function is a helper bespoke to GivensRotation.
cc @thisac
Maybe... though what would be the attribute of GivensRotation used in _get_blocks_edges? n only?
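Whichever home it ends up in, the helper stays small. For reference, a sketch of the standard circle-method construction (an illustration of the technique with assumed details, not necessarily the PR's exact code):

```python
def _get_blocks_edges(n: int) -> list[list[tuple[int, int]]]:
    """Round-robin (circle method) pairing: n - 1 blocks of n // 2 disjoint pairs each."""
    is_odd = n % 2 != 0
    if is_odd:
        n += 1  # dummy dimension; pairs touching it are dropped below

    sequence = list(range(n))
    blocks = []
    for _ in range(n - 1):
        pairs = [(sequence[i], sequence[n - 1 - i]) for i in range(n // 2)]
        if is_odd:
            # Drop pairs that involve the dummy index n - 1.
            pairs = [(i, j) for i, j in pairs if n - 1 not in (i, j)]
        blocks.append(pairs)
        # Keep the first element fixed and rotate the rest by one position.
        sequence = [sequence[0]] + [sequence[-1]] + sequence[1:-1]
    return blocks
```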
    return grad_theta, None, None


class GivensRotationLayer(nn.Module):
Can we rename it to GivensRotation (parallel to nn.Linear)?
if n % 2 != 0:
    n += 1  # Add a dummy dimension for odd n
    is_odd = True
else:
    is_odd = False
Could be cleaner like this 😛
Suggested change:
odd = n % 2 != 0
if odd:
    n += 1
or
odd = n % 2 != 0
n += odd
but this is less obvious... (edit: not a big fan of the n += odd notation 😆)
It is cleaner! (the first suggestion, not the n+=odd 😆 )
    ignored.
    """
    if n % 2 != 0:
        n += 1  # Add a dummy dimension for odd n
Rule-of-thumb for comments: explain the "why" or motivation as opposed to "what" (which is clear in this context)
for _ in range(n - 1):
    pairs = circle_method(sequence)
    if is_odd:
        # Remove pairs involving the dummy dimension:
Suggested change:
# Remove pairs involving the dummy dimension
@staticmethod
def backward(ctx, grad_output: torch.Tensor) -> tuple[torch.Tensor, None, None]:
    """Computes the VJP needed for backward propagation.
| """Computes the VJP needed for backward propagation. | |
| """Computes the vector-Jacobian product needed for backward propagation. |
idx_block = torch.arange(block_size, device=angles.device)
for b, block in enumerate(blocks):
    # angles is of shape (n_angles,) containing all angles for contiguous blocks.
    angles_in_block = angles[idx_block + b * blocks.size(1)]  # shape (n/2,)
Suggested change:
angles_in_block = angles[idx_block + b * block_size]  # shape (n/2,)
If I understand correctly, blocks.size(1) will be block_size
Ah yes, while writing the algorithm I thought blocks could have different sizes if n is odd, but that is not true. All blocks have the same block size.
c = torch.cos(angles_in_block)
s = torch.sin(angles_in_block)
i_idx = block[:, 0]
j_idx = block[:, 1]
r_i = c.unsqueeze(0) * U[:, i_idx] + s.unsqueeze(0) * U[:, j_idx]
r_j = -s.unsqueeze(0) * U[:, i_idx] + c.unsqueeze(0) * U[:, j_idx]
Unsqueeze once at the beginning:
Suggested change:
c = torch.cos(angles_in_block).unsqueeze(0)
s = torch.sin(angles_in_block).unsqueeze(0)
i_idx = block[:, 0]
j_idx = block[:, 1]
r_i = c * U[:, i_idx] + s * U[:, j_idx]
r_j = -s * U[:, i_idx] + c * U[:, j_idx]
U = torch.eye(n, device=angles.device, dtype=angles.dtype)
block_size = n // 2
idx_block = torch.arange(block_size, device=angles.device)
for b, block in enumerate(blocks):
If we commit to using paper variable names here, we should be consistent and use, e.g., B instead of blocks.
If that's the case, I'd prefer to be a little more wasteful and have B = blocks to keep the input argument blocks instead of B. This inconsistency makes me lean towards named variables more (with a look-up table in the docstring).
r_i = c.unsqueeze(0) * U[:, i_idx] + s.unsqueeze(0) * U[:, j_idx]
r_j = -s.unsqueeze(0) * U[:, i_idx] + c.unsqueeze(0) * U[:, j_idx]
Are r_i and r_j backwards?
I think it should be:
- $\cos$ and $-\sin$ for i, and
- $\sin$ and $+\cos$ for j.
Not sure if this has a significant impact on the validity of the method. If it does, then the tests should be revised first to see why this error was not detected.
Yes... well... in the paper the rotation matrices were written the other way around, I think. I did the math separately and this way everything is consistent.
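For readers following along, the two sign conventions under discussion are transposes of each other, so what matters is picking one and applying it consistently in the forward and backward passes:

$$
\begin{pmatrix} r_i \\ r_j \end{pmatrix}
= \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}
\begin{pmatrix} u_i \\ u_j \end{pmatrix}
\qquad \text{vs.} \qquad
\begin{pmatrix} r_i \\ r_j \end{pmatrix}
= \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}
\begin{pmatrix} u_i \\ u_j \end{pmatrix}.
$$

The code quoted above uses the first form (plus sign on r_i), while the suggestion corresponds to the second.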
This PR adds an orthogonal layer given by Givens rotations, using the parallel algorithm described by Firas in https://arxiv.org/abs/2106.00003, which gives a forward complexity of O(n) and backward complexity of O(n log(n)), even though there are O(n^2) rotations.
This PR is still in draft. I wrote it for even n. Some more unit tests probably need to be written, but I am quite lazy (I will do it after all the math is checked for odd n).
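For anyone trying the branch, usage would presumably look something like the following (a sketch: the import path, the `bias` keyword, and the `angles` attribute are assumptions based on the snippets reviewed above):

```python
import torch

# from the package under review (path assumed):
# from orthogonal import GivensRotationLayer

torch.set_default_dtype(torch.float64)  # the tests in this PR run at float64 precision

n = 6
layer = GivensRotationLayer(n, bias=True)  # n * (n - 1) // 2 = 15 trainable angles plus a bias

x = torch.randn(32, n)
y = layer(x)  # rotated (and shifted) input, same shape as x

# Without the bias, the layer is orthogonal, so it preserves norms.
layer_no_bias = GivensRotationLayer(n, bias=False)
assert torch.allclose(layer_no_bias(x).norm(dim=-1), x.norm(dim=-1))

# Gradients flow to the rotation angles through the custom backward pass.
y.sum().backward()
print(layer.angles.grad.shape)  # torch.Size([15])
```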