
feat: support fsdp2 muon optimizer #1486

Open
RangiLyu wants to merge 2 commits into InternLM:main from RangiLyu:fsdp2-muon

RangiLyu (Contributor) commented Feb 6, 2026

[image]

nil0x9 (Contributor) commented Feb 9, 2026

IMHO, it might be better to introduce a mechanism for separating fused params. For example, MoE projections in XTuner are implemented with GroupedLinear, where a single 2D param is actually a bundle of several linear weights. In that case, it may be more beneficial to split the param tensor and treat each piece as a separate projection in the NS (Newton-Schulz) iteration.
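
To illustrate the suggestion above, here is a minimal, dependency-free sketch of why splitting a fused param matters for the NS step. It is not the PR's code and not XTuner's API: `split_fused` and `newton_schulz` are hypothetical helpers, the fused layout (experts stacked along the row dimension) is an assumption, and the classical cubic Newton-Schulz iteration is swapped in for Muon's tuned variant because it converges to the exact orthogonal factor and is easy to check.

```python
import math


def matmul(a, b):
    """Plain list-of-rows matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def newton_schulz(mat, steps=30):
    """Approximate the orthogonal polar factor of a full-rank `mat`.

    Uses the classical cubic iteration X <- 1.5*X - 0.5*(X X^T) X,
    which is a stand-in here, not Muon's own coefficients.
    """
    norm = math.sqrt(sum(v * v for row in mat for v in row)) or 1.0
    x = [[v / norm for v in row] for row in mat]  # singular values are now <= 1
    for _ in range(steps):
        xxt_x = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * xv - 0.5 * gv for xv, gv in zip(xr, gr)]
             for xr, gr in zip(x, xxt_x)]
    return x


def split_fused(weight, num_groups):
    """Split a fused (num_groups * out, in) matrix into per-group (out, in) blocks.

    Assumes the groups are stacked along the row dimension.
    """
    rows = len(weight) // num_groups
    return [weight[i * rows:(i + 1) * rows] for i in range(num_groups)]


# Two hypothetical 2x3 expert projections fused into one 4x3 param.
fused = [
    [3.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 2.0],
    [0.0, 4.0, 0.0],
]

per_expert = [newton_schulz(block) for block in split_fused(fused, 2)]
whole = newton_schulz(fused)

print("expert 0, row 1 (split):", per_expert[0][1])
print("same row (fused):       ", whole[1])
```

With the split, each expert block is orthogonalized on its own, so every row of every block ends up with unit norm. When the fused tensor is treated as one matrix, rows from different experts that share an input direction compete with each other, and the same row comes out much smaller. Whether that difference helps training is the open question raised in the comment.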

