
Update hf2mcore_deepseek_v3_moe.py #495

Open
lmc8133 wants to merge 1 commit into alibaba:main from lmc8133:convert_hf_2_mcore

Conversation


lmc8133 commented on Mar 7, 2025

When expert-tensor-parallel-size=1 is set, the routed-expert weights are saved repeatedly, once per tp_rank. This consumes roughly 1.3 TB × ${TP} of disk space, which is far too much.

when setting expert-tensor-parallel-size=1, the weights of routed experts will be saved repeatedly for each tp_rank
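
For context, a minimal sketch of the kind of guard this change describes, assuming a conversion loop that writes one shard per tensor-parallel rank. All names here (state_dicts, save_dense_fn, save_expert_fn, the ".mlp.experts." key pattern) are illustrative assumptions, not the actual code in hf2mcore_deepseek_v3_moe.py:

```python
def save_mcore_shards(state_dicts, tp_size, etp_size, save_dense_fn, save_expert_fn):
    """Write per-rank shards, skipping duplicate routed-expert weights.

    state_dicts: hypothetical mapping tp_rank -> model state dict.
    """
    for tp_rank in range(tp_size):
        sd = state_dicts[tp_rank]
        # Split the shard into routed-expert weights and everything else
        # (the ".mlp.experts." key pattern is an assumed naming convention).
        experts = {k: v for k, v in sd.items() if ".mlp.experts." in k}
        dense = {k: v for k, v in sd.items() if ".mlp.experts." not in k}

        # Dense/attention weights are genuinely sharded over tp_rank,
        # so every rank must write its own copy.
        save_dense_fn(dense, tp_rank)

        # With expert-tensor-parallel-size=1 the routed-expert weights are
        # replicated across tp ranks; writing them for every rank multiplies
        # disk usage by tp_size (~1.3 TB per copy for DeepSeek-V3).
        if etp_size == 1 and tp_rank != 0:
            continue
        save_expert_fn(experts, tp_rank)
```

Skipping the replicated shards keeps the checkpoint at the size of a single expert copy plus the genuinely sharded dense weights, instead of growing linearly with TP.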
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

