@sbhavani
Contributor

No description provided.

@copy-pr-bot

copy-pr-bot bot commented Sep 30, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

README.md Outdated

### Why They Matter

Shampoo optimizers have demonstrated significant practical impact in large-scale language model training. Most notably, they were used to train the **Kimi K2 model** ([arXiv:2507.20534](https://arxiv.org/abs/2507.20534)), showcasing their effectiveness at scale. These optimizers can:
Contributor

Kimi used Muon, and it is debatable whether Muon counts as a Shampoo optimizer.

Contributor

We should probably just change this to "emerging optimizers" so that we don't exclude any future ones.

Contributor Author

Updated to mention emerging optimizers

README.md Outdated

### Optimizers Included

This project focuses on the following Shampoo class optimizers:
Contributor

More will come.

@sbhavani sbhavani requested review from mkhona-nvidia and skyw October 2, 2025 14:37
@skyw
Contributor

skyw commented Oct 6, 2025

@sbhavani could you address the CI failure?

@snowmanwwg plz also review.

@skyw skyw requested review from snowmanwwg and removed request for mkhona-nvidia October 6, 2025 15:10
Signed-off-by: Santosh Bhavani <sbhavani@nvidia.com>
…matrix-based preconditioning'

Signed-off-by: Santosh Bhavani <sbhavani@nvidia.com>
@skyw skyw added the docs-only With great power comes great responsibility. label Oct 8, 2025
@skyw
Contributor

skyw commented Oct 8, 2025

/ok to test e5a78dd

@chtruong814
Contributor

/ok to test 0408134

@skyw skyw merged commit 3cc549f into NVIDIA-NeMo:main Oct 8, 2025
12 checks passed
pablo-garay pushed a commit that referenced this pull request Oct 10, 2025
* Update README with background, usage examples

Signed-off-by: Santosh Bhavani <sbhavani@nvidia.com>
Signed-off-by: Pablo Garay <pagaray@nvidia.com>