@mkhona-nvidia
Contributor

@mkhona-nvidia mkhona-nvidia commented Oct 24, 2025

Independent weight decay: https://arxiv.org/abs/2510.19093

Enabled by use_independent_wd, which defaults to False so that prior runs are not affected.
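For readers unfamiliar with the terminology, here is a minimal sketch of how the three weight decay modes are commonly distinguished. It assumes the usual meaning of the terms: coupled decay is added to the gradient, decoupled decay shrinks the parameter by lr * weight_decay, and independent decay shrinks it by weight_decay alone, without the learning rate. The function name sgd_step_with_weight_decay, the plain-SGD update, and the precedence of the independent flag over the decoupled flag are assumptions made for illustration, not the code in this PR.

```python
def sgd_step_with_weight_decay(
    param: float,
    grad: float,
    lr: float,
    weight_decay: float,
    use_decoupled_wd: bool = True,
    use_independent_wd: bool = False,
) -> float:
    """Hypothetical helper: one plain-SGD step under three weight decay modes."""
    if use_independent_wd:
        # Independent: the decay factor does not include the learning rate.
        param = param * (1.0 - weight_decay)
    elif use_decoupled_wd:
        # Decoupled (AdamW-style): the decay factor is scaled by the learning rate.
        param = param * (1.0 - lr * weight_decay)
    else:
        # Coupled (L2-style): the decay term is folded into the gradient.
        grad = grad + weight_decay * param
    return param - lr * grad
```

With lr=0.1, weight_decay=0.01, and a zero gradient, a parameter of 1.0 shrinks to roughly 0.999 in decoupled mode and roughly 0.99 in independent mode. For plain SGD the coupled and decoupled modes coincide; they differ once the gradient passes through momentum or another transformation before the update.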

Signed-off-by: mikail <[email protected]>
@copy-pr-bot

copy-pr-bot bot commented Oct 24, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@mkhona-nvidia mkhona-nvidia self-assigned this Oct 24, 2025
@mkhona-nvidia mkhona-nvidia requested a review from skyw October 24, 2025 17:31
use_nesterov: bool = False,
weight_decay: float = 0.01,
use_decoupled_weight_decay: bool = True,
use_independent_weight_decay: bool = False,
Contributor

Change: The variable names are getting too long. Let's make them "use_decoupled_wd" and "use_independent_wd".

I don't think there are ambiguities around wd being short for weight decay.

Contributor Author

Changed the names inside optim to use_decoupled_wd and use_independent_wd. The init call still uses the longer names, so as not to break the Megatron wrappers.
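A minimal sketch of the pattern described in this reply: the constructor keeps the long keyword names so existing callers are unaffected, while the attributes stored on the object use the short names. The class SketchOptimizer is a hypothetical stand-in, not the class touched by this PR.

```python
class SketchOptimizer:
    """Hypothetical stand-in used only to illustrate the naming pattern."""

    def __init__(
        self,
        weight_decay: float = 0.01,
        use_decoupled_weight_decay: bool = True,
        use_independent_weight_decay: bool = False,
    ) -> None:
        self.weight_decay = weight_decay
        # Long keyword names at the call site, short names internally.
        self.use_decoupled_wd = use_decoupled_weight_decay
        self.use_independent_wd = use_independent_weight_decay


# Existing callers keep passing the long keyword name.
opt = SketchOptimizer(use_independent_weight_decay=True)
```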

Contributor

Megatron can update; there will be a lot of updates on dev anyway.
Nothing will break, since the dependency is on a specific commit, not the head of the branch.

skyw previously approved these changes Oct 24, 2025
@skyw
Contributor

skyw commented Oct 24, 2025

/ok to test 50f613b

@mkhona-nvidia
Contributor Author

/ok to test 417bb0f

@skyw skyw merged commit 19d8201 into NVIDIA-NeMo:main Oct 24, 2025
14 checks passed