[dev] feat(moe): Support packed sequence for gated delta net (GDN)#2644
yuzhongw-nvidia wants to merge 2 commits into NVIDIA:dev
2575c6d to 4f8888d
4f8888d to 9ccf5a4
73d512d to ae8806c
/ok to test ae8806c
83a8607 to 545a2a5
/ok to test 545a2a5
545a2a5 to befbcd2
Also need to update this line https://github.com/yuzhongw-nvidia/Megatron-LM/blob/befbcd2a10b60deb4edbd0f758275b26a6df83c7/megatron/core/ssm/gated_delta_net.py#L741 to
Thanks. Resolved.
58fdd22 to e8ed23c
/ok to test 80d2d1c
Signed-off-by: yuzhongw <yuzhongw@nvidia.com>
Co-authored-by: kunlunl <kunlunl@nvidia.com>
80d2d1c to e94395d
yaox12 left a comment:
LGTM. Please resolve the conflicts.
Thanks! There are some E2E numerical issues reported by @xiaoyao0115, and we are working on them. Will resolve the conflicts after that.
Tried commit e94395d to fine-tune qwen3.5-4B with sequence packing and saw grad_norm explode, leading to a final accuracy of 0.
What does this PR do?
Support packed sequence for gated delta net (GDN).
PR for main: #2645
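For readers unfamiliar with packing, here is a minimal reference sketch of what packed-sequence support means for a recurrent layer such as GDN: several sequences are concatenated along the token dimension, and the recurrent state has to be reset at every sequence boundary so that nothing leaks from one sequence into the next. This is not the implementation in this PR. The function names (`cu_seqlens_from_lengths`, `gated_delta_step`, `gdn_reference`), the `cu_seqlens` boundary convention, and the single-head pure-Python layout are assumptions made for illustration. The update used is the gated delta rule as it is commonly written, S_t = alpha_t * S_{t-1} (I - beta_t k_t k_t^T) + beta_t v_t k_t^T, with output o_t = S_t q_t.

```python
from itertools import accumulate


def cu_seqlens_from_lengths(lengths):
    """Cumulative boundaries [0, l0, l0+l1, ...] for sequences packed back to back."""
    return [0] + list(accumulate(lengths))


def gated_delta_step(S, q, k, v, alpha, beta):
    """One gated delta rule step on state S (d_v rows, d_k columns).

    S_t = alpha * S_{t-1} (I - beta k k^T) + beta v k^T,  o_t = S_t q
    """
    new_S = []
    for row, v_i in zip(S, v):
        # (S k)_i, used to apply S (I - beta k k^T) = S - beta (S k) k^T
        row_dot_k = sum(s * k_j for s, k_j in zip(row, k))
        new_S.append(
            [alpha * (s - beta * row_dot_k * k_j) + beta * v_i * k_j
             for s, k_j in zip(row, k)]
        )
    out = [sum(s * q_j for s, q_j in zip(row, q)) for row in new_S]
    return new_S, out


def gdn_reference(q, k, v, alpha, beta, cu_seqlens=None):
    """Token-by-token reference over packed inputs of shape [total_tokens][dim].

    With cu_seqlens given, the state is re-initialised to zero at the start of
    each packed sequence. Without it, all tokens are treated as one sequence.
    """
    total = len(q)
    d_k, d_v = len(k[0]), len(v[0])
    if cu_seqlens is None:
        cu_seqlens = [0, total]
    outputs = []
    for start, end in zip(cu_seqlens[:-1], cu_seqlens[1:]):
        S = [[0.0] * d_k for _ in range(d_v)]  # fresh state per sequence
        for t in range(start, end):
            S, o = gated_delta_step(S, q[t], k[t], v[t], alpha[t], beta[t])
            outputs.append(o)
    return outputs
```

Running packed input through `gdn_reference` with boundaries should give the same outputs as running each sequence on its own, while omitting `cu_seqlens` lets state from the first sequence leak into the second. A comparison of this kind against an unpacked run is one way to sanity-check a fused packed kernel.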
Contribution process
flowchart LR
    A[Pre-checks] --> B[PR Tests]
    subgraph Code Review/Approval
        C1[Expert Review] --> C2[Final Review]
    end
    B --> C1
    C2 --> D[Merge]
Pre-checks
Core 0.8)
Code review
The following process is enforced via the CODEOWNERS file for changes into megatron/core. For changes outside of megatron/core, it is up to the PR author whether or not to tag the Final Reviewer team.
For MRs into `main` branch
(Step 1): Add PR label Expert Review
(Step 2): Collect the expert reviewers' reviews
Expert Review label when your PR is ready for review.
Final Review might get declined if these requirements are not fulfilled.
(Step 3): Final Review
Final Review label
(Optional Step 4): Cherry-pick into release branch
If this PR also needs to be merged into core_r* release branches, after this PR has been merged, select Cherry-pick to open a new PR into the release branch.
For MRs into `dev` branch
The proposed review process for `dev` branch is under active discussion.
MRs are mergeable after one approval by either eharper@nvidia.com or zijiey@nvidia.com.
Merging your PR
Any member of core-adlr and core-nemo will be able to merge your PR.