Normalized Riemannian optimizer #36

mkhona-nvidia · 2025-10-02T23:47:08Z

Added an optimizer for normalized weights, i.e. weights whose columns or rows have unit norm. The set of such matrices is known as the oblique manifold, and this optimizer performs Riemannian descent on it (see An Introduction to Optimization on Smooth Manifolds by Nicolas Boumal, https://www.nicolasboumal.net/book/, for details).
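For readers unfamiliar with the idea, here is a minimal sketch of one Riemannian gradient-descent step on the oblique manifold, assuming the usual definition in which every column has unit Euclidean norm. It is written in plain Python rather than PyTorch so that it stands alone, and it is not the code from this PR: the function names `normalize` and `riemannian_sgd_step` are hypothetical, and the matrix is represented as a list of columns for simplicity.

```python
import math


def normalize(v):
    """Scale a vector to unit Euclidean norm (assumes v is non-zero)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def riemannian_sgd_step(columns, grads, lr):
    """One descent step for a matrix stored as a list of unit-norm columns.

    columns: list of unit-norm vectors (the current weights)
    grads:   list of Euclidean gradients, one per column
    lr:      step size
    """
    new_columns = []
    for w, g in zip(columns, grads):
        # Project the Euclidean gradient onto the tangent space at w by
        # removing its component along w (the radial direction).
        radial = sum(wi * gi for wi, gi in zip(w, g))
        rgrad = [gi - radial * wi for wi, gi in zip(w, g)]
        # Take the step, then retract back onto the unit sphere by
        # renormalizing the column.
        stepped = [wi - lr * ri for wi, ri in zip(w, rgrad)]
        new_columns.append(normalize(stepped))
    return new_columns
```

A gradient that points purely along a column leaves that column unchanged, since its tangential component is zero; only the part of the gradient orthogonal to the column moves the weights.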

copy-pr-bot · 2025-10-02T23:47:12Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

skyw

Haven't finished reviewing all the tests. Will do another pass when the requested changes are done.

emerging_optimizers/riemannian_optimizers/normalized_optimizer.py

tests/test_normalized_optimizer.py

skyw

A couple of leftovers still need fixing, the parameter for example.

I didn't check the convergence test very carefully. Also, the L1 tests haven't been set up in CI yet, so please post a local run result in the chat before final approval.

tests/test_normalized_optimizer.py

tests/normalized_optimizer_convergence_test.py

mkhona-nvidia · 2025-10-04T00:00:39Z

> A couple of leftovers still need fixing, the parameter for example.
>
> I didn't check the convergence test very carefully. Also, the L1 tests haven't been set up in CI yet, so please post a local run result in the chat before final approval.

Here are the results of a local run of the convergence test:

Running tests under Python 3.12.8: /Users/mkhona/miniconda3/envs/pytorch_env/bin/python
[ RUN ] NormalizedOptimizerConvergenceTest.test_oblique_adam_convergence
[ OK ] NormalizedOptimizerConvergenceTest.test_oblique_adam_convergence
[ RUN ] NormalizedOptimizerConvergenceTest.test_oblique_sgd_convergence
[ OK ] NormalizedOptimizerConvergenceTest.test_oblique_sgd_convergence
[ RUN ] NormalizedOptimizerConvergenceTest.test_optimizer_modes_convergence_adam_col
Final accuracy: 89.2
[ OK ] NormalizedOptimizerConvergenceTest.test_optimizer_modes_convergence_adam_col
[ RUN ] NormalizedOptimizerConvergenceTest.test_optimizer_modes_convergence_adam_row
Final accuracy: 80.4
[ OK ] NormalizedOptimizerConvergenceTest.test_optimizer_modes_convergence_adam_row
[ RUN ] NormalizedOptimizerConvergenceTest.test_optimizer_modes_convergence_sgd_col
Final accuracy: 74.8
[ OK ] NormalizedOptimizerConvergenceTest.test_optimizer_modes_convergence_sgd_col
[ RUN ] NormalizedOptimizerConvergenceTest.test_optimizer_modes_convergence_sgd_row
Final accuracy: 100.0
[ OK ] NormalizedOptimizerConvergenceTest.test_optimizer_modes_convergence_sgd_row

mkhona-nvidia · 2025-10-06T16:44:23Z

/ok to test 25c7835

mkhona-nvidia · 2025-10-06T21:03:22Z

/ok to test f87162d

Signed-off-by: mikail <[email protected]>

mkhona-nvidia · 2025-10-07T18:39:06Z

/ok to test 6eab498

Signed-off-by: mikail <[email protected]>

skyw

Please rerun the tests after addressing the last few comments.

skyw · 2025-10-07T19:05:59Z

tests/test_normalized_optimizer.py

+        # Set seed for CUDA if available
+        if torch.cuda.is_available():
+            torch.cuda.manual_seed_all(1234)
+        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


Don't auto-select the device, especially in tests. If the test is accidentally assigned to a machine without a GPU, it can end up never being tested on a GPU. The same applies to the other test cases.
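One way to follow this advice is to make the device an explicit, required choice and to fail loudly when the requested device is missing, rather than silently falling back to CPU. The sketch below illustrates that pattern with the standard library only. The `--device` flag name and the helpers `parse_device` and `resolve_device` are hypothetical and not taken from this PR; in a real test, `cuda_available` would come from `torch.cuda.is_available()`.

```python
import argparse


def parse_device(argv):
    """Read the device from an explicit command-line flag (no default)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--device", choices=["cpu", "cuda"], required=True)
    # parse_known_args lets a test runner's own flags pass through untouched.
    args, _ = parser.parse_known_args(argv)
    return args.device


def resolve_device(requested, cuda_available):
    """Return the requested device, or raise if it cannot be honored.

    cuda_available stands in for torch.cuda.is_available() so that this
    sketch runs without PyTorch installed.
    """
    if requested not in ("cpu", "cuda"):
        raise ValueError(f"unknown device: {requested!r}")
    if requested == "cuda" and not cuda_available:
        # Fail instead of falling back, so a GPU test scheduled on a
        # CPU-only machine is reported as broken rather than passing on CPU.
        raise RuntimeError("CUDA was requested but is not available")
    return requested
```

With this pattern, a GPU run that lands on a CPU-only machine produces an error instead of a passing CPU run, so the missing GPU coverage is visible in CI.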

Signed-off-by: mikail <[email protected]>

mkhona-nvidia · 2025-10-07T19:15:36Z

/ok to test f97ef77

* added normalized optimizers and fixed docstrings and formatting Signed-off-by: mikail <[email protected]>

mkhona-nvidia changed the title ~~Mkhona/normalized opt~~ Normalized Riemannian optimizer Oct 3, 2025

mkhona-nvidia requested a review from skyw October 3, 2025 17:00

skyw requested changes Oct 3, 2025


mkhona-nvidia force-pushed the mkhona/normalized_opt branch from 2f8543f to dd2194d Compare October 3, 2025 22:38

skyw requested changes Oct 3, 2025


mkhona-nvidia force-pushed the mkhona/normalized_opt branch from b9a92ce to 25c7835 Compare October 4, 2025 00:04

copy-pr-bot bot temporarily deployed to test October 6, 2025 16:44 Inactive

copy-pr-bot bot temporarily deployed to nemo-ci October 6, 2025 16:44 Inactive

copy-pr-bot bot temporarily deployed to nemo-ci October 6, 2025 16:53 Inactive

copy-pr-bot bot had a problem deploying to nemo-ci October 6, 2025 16:53 Failure

mkhona-nvidia force-pushed the mkhona/normalized_opt branch from 25c7835 to a9f16b0 Compare October 6, 2025 20:59

copy-pr-bot bot temporarily deployed to test October 6, 2025 21:03 Inactive

copy-pr-bot bot temporarily deployed to nemo-ci October 6, 2025 21:07 Inactive

copy-pr-bot bot temporarily deployed to nemo-ci October 6, 2025 21:08 Inactive

copy-pr-bot bot had a problem deploying to nemo-ci October 6, 2025 21:08 Failure

copy-pr-bot bot had a problem deploying to nemo-ci October 6, 2025 21:42 Failure

copy-pr-bot bot had a problem deploying to nemo-ci October 6, 2025 21:59 Failure

copy-pr-bot bot had a problem deploying to nemo-ci October 7, 2025 01:14 Failure

copy-pr-bot bot temporarily deployed to test October 7, 2025 03:46 Inactive

copy-pr-bot bot temporarily deployed to nemo-ci October 7, 2025 03:46 Inactive

copy-pr-bot bot temporarily deployed to nemo-ci October 7, 2025 03:48 Inactive

copy-pr-bot bot had a problem deploying to nemo-ci October 7, 2025 03:48 Failure

mkhona-nvidia added 4 commits October 7, 2025 11:38

added normalized optimizers and fixed docstrings and formatting

0a68eb3

Signed-off-by: mikail <[email protected]>

added tests for normalized optimizer

81d2a64

Signed-off-by: mikail <[email protected]>

separated out riemannian grad as a private function

b32a71e

Signed-off-by: mikail <[email protected]>

added a simple convergence test

b13a403

Signed-off-by: mikail <[email protected]>

mkhona-nvidia added 13 commits October 7, 2025 11:38

added test to L1

0170de8

Signed-off-by: mikail <[email protected]>

added tests to L0

134e08b

Signed-off-by: mikail <[email protected]>

changed docstring

3ae3147

Signed-off-by: mikail <[email protected]>

changed to one torch.add for memory pressure

f130aa6

Signed-off-by: mikail <[email protected]>

added missing type hints

6859351

Signed-off-by: mikail <[email protected]>

added missing types for optimizer args

017cc39

Signed-off-by: mikail <[email protected]>

added some more missing type hints

1adfebb

Signed-off-by: mikail <[email protected]>

cleaned up test cases

ed90ff6

Signed-off-by: mikail <[email protected]>

fixed testing as per PR

2b4cb4e

Signed-off-by: mikail <[email protected]>

fixed inplace init

5bc9a49

Signed-off-by: mikail <[email protected]>

fixed formatting

07c52bd

Signed-off-by: mikail <[email protected]>

added missing device movement

344689f

Signed-off-by: mikail <[email protected]>

moved model to device appropriately

6eab498

Signed-off-by: mikail <[email protected]>

mkhona-nvidia force-pushed the mkhona/normalized_opt branch from c4db61a to 6eab498 Compare October 7, 2025 18:38

copy-pr-bot bot temporarily deployed to test October 7, 2025 18:39 Inactive

added edm2 paper in docstring

5d7dcf2

Signed-off-by: mikail <[email protected]>

skyw requested changes Oct 7, 2025


mkhona-nvidia added 2 commits October 7, 2025 12:10

added explicit device flag to control device

b8a9bdc

Signed-off-by: mikail <[email protected]>

added flags to test

f97ef77

Signed-off-by: mikail <[email protected]>

copy-pr-bot bot temporarily deployed to test October 7, 2025 19:15 Inactive

copy-pr-bot bot temporarily deployed to nemo-ci October 7, 2025 19:21 Inactive

copy-pr-bot bot temporarily deployed to nemo-ci October 7, 2025 19:30 Inactive

skyw approved these changes Oct 7, 2025


skyw merged commit fb1add8 into NVIDIA-NeMo:main Oct 7, 2025
12 checks passed

mkhona-nvidia added a commit to mkhona-nvidia/Emerging-Optimizers that referenced this pull request Oct 7, 2025

Add normalized Riemannian optimizer (NVIDIA-NeMo#36)

907d4f5

* added normalized optimizers and fixed docstrings and formatting Signed-off-by: mikail <[email protected]>

mkhona-nvidia added a commit to mkhona-nvidia/Emerging-Optimizers that referenced this pull request Oct 7, 2025

Add normalized Riemannian optimizer (NVIDIA-NeMo#36)

de8f3f8

* added normalized optimizers and fixed docstrings and formatting Signed-off-by: mikail <[email protected]>

mkhona-nvidia deleted the mkhona/normalized_opt branch October 15, 2025 02:58

2 participants