@@ -64,13 +64,13 @@ pip install --upgrade nvidia-nccl-cu12

``` bash
cd /mnt/data
- mkdir qwen-ckpts
- cd qwen-ckpts
+ mkdir -p ckpts/huggingface
+ cd ckpts/huggingface
modelscope download --model Qwen/Qwen3-Next-80B-A3B-Instruct --local_dir Qwen3-Next-80B-A3B-Instruct
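+ # Untested alternative, assuming huggingface-cli is available: download from the
+ # Hugging Face Hub instead of ModelScope.
+ # huggingface-cli download Qwen/Qwen3-Next-80B-A3B-Instruct --local-dir Qwen3-Next-80B-A3B-Instruct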

cd /mnt/data
- mkdir qwen-datasets
- cd qwen-datasets
+ mkdir datasets
+ cd datasets
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/qwen-datasets/mmap_qwen3_datasets_text_document.bin
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/qwen-datasets/mmap_qwen3_datasets_text_document.idx

@@ -82,12 +82,33 @@ wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models
## Qwen3-Next Model Training Workflow (Simplified)
You can copy the simplified version directly into the DLC command field, modify it there, and launch training. The simplified version groups the parameters into three categories: MODEL_ARGS, TRAINING_ARGS, and INFRA_ARGS.
``` bash
- bash run_mcore_qwen3_lite.sh \
+ bash run_mcore_qwen3_lite.sh
```
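+ 
+ As a rough orientation, the grouping inside the lite script looks like the following minimal sketch; every flag value and the entry-point name here are illustrative placeholders rather than the script's real contents, so check `run_mcore_qwen3_lite.sh` itself before editing:
+ 
+ ``` bash
+ # Hypothetical sketch of the three parameter groups (placeholder values only).
+ MODEL_ARGS="--num-layers 48 --hidden-size 2048"                        # model architecture
+ TRAINING_ARGS="--lr 1e-5 --global-batch-size 32 --bf16"                # optimizer and schedule
+ INFRA_ARGS="--save /workspace/output --tensor-model-parallel-size 2"   # paths and parallelism
+ # pretrain_qwen.py is a placeholder entry point for illustration only.
+ torchrun --nproc_per_node 8 pretrain_qwen.py ${MODEL_ARGS} ${TRAINING_ARGS} ${INFRA_ARGS}
+ ```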

## Qwen3-Next Model Training Workflow (Standard)
### Model Format Conversion
- TBD
+ To convert the weights, the script takes the following positional arguments:
+ ```
+ MODEL_SIZE=$1   # model size; A3B
+ LOAD_DIR=$2     # path to the source weights
+ SAVE_DIR=$3     # path to the target weights
+ MG2HF=$4        # conversion direction; options: true (mcore2hf), false (hf2mcore)
+ USE_CUDA=$5     # whether to run the conversion on GPU; recommended: true
+ PR=$6           # conversion precision; options: fp32, bf16, fp16
+ HF_DIR=$7       # path to the HF weights (required for mcore2hf)
+ ```
+ For example, the following script converts the checkpoint to the MCore format:
+ 
+ ``` bash
+ cd /workspace/Pai-Megatron-Patch/toolkits/distributed_checkpoints_convertor
+ bash scripts/qwen3_next/run_8xH20.sh \
+ A3B \
+ /mnt/data/ckpts/huggingface/Qwen3-Next-80B-A3B-Instruct \
+ /mnt/data/ckpts/mcore/Qwen3-Next-80B-A3B-Instruct-to-mcore \
+ false \
+ true \
+ bf16
+ ```
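+ 
+ Conversely, to export a trained MCore checkpoint back to the HF format, set the MG2HF flag to `true` and pass the original HF checkpoint directory as the seventh argument, as described in the parameter list above; the target path below is only an illustrative choice:
+ 
+ ``` bash
+ cd /workspace/Pai-Megatron-Patch/toolkits/distributed_checkpoints_convertor
+ bash scripts/qwen3_next/run_8xH20.sh \
+ A3B \
+ /mnt/data/ckpts/mcore/Qwen3-Next-80B-A3B-Instruct-to-mcore \
+ /mnt/data/ckpts/huggingface/Qwen3-Next-80B-A3B-Instruct-to-hf \
+ true \
+ true \
+ bf16 \
+ /mnt/data/ckpts/huggingface/Qwen3-Next-80B-A3B-Instruct
+ ```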

### Pretraining and Instruction Fine-tuning
In Qwen3-Next, pretraining and fine-tuning have been consolidated into the `run_mcore_qwen3.sh` script; the meaning of its parameters differs between the two use cases.
@@ -152,17 +173,17 @@ false \
none \
false \
100000 \
- /mnt/data/qwen-datasets/mmap_qwen3_datasets_text_document \
- /mnt/data/qwen-datasets/mmap_qwen3_datasets_text_document \
- /mnt/data/qwen-ckpts/Qwen3-Next-80B-A3B-Instruct \
+ /mnt/data/datasets/mmap_qwen3_datasets_text_document \
+ /mnt/data/datasets/mmap_qwen3_datasets_text_document \
+ /mnt/data/ckpts/mcore/Qwen3-Next-80B-A3B-Instruct-to-mcore \
1000000000 \
10000 \
/workspace/output_mcore_qwen3_next_continue_pretrain
```

#### Instruction Fine-tuning Example
To build an idxmap-format dataset for fine-tuning, see [this link](https://github.com/alibaba/Pai-Megatron-Patch/tree/main/toolkits/sft_data_preprocessing).
- Once the fine-tuning dataset is ready, set the SFT switch to `true` to run instruction fine-tuning.
+ Once the fine-tuning dataset is ready, set the SFT switch to `true` to run instruction fine-tuning. Note that the sequence length of the prepared SFT dataset must match the sequence length used in actual training.

``` bash
cd /workspace/Pai-Megatron-Patch/examples/qwen3_next
@@ -188,9 +209,9 @@ true \
none \
false \
100000 \
- /mnt/data/qwen-datasets/mmap_qwen3_datasets_text_document \
- /mnt/data/qwen-datasets/mmap_qwen3_datasets_text_document \
- /mnt/data/qwen-ckpts/Qwen3-Next-80B-A3B-Instruct \
+ /mnt/data/datasets/mmap_qwen3_datasets_sft_text_document \
+ /mnt/data/datasets/mmap_qwen3_datasets_sft_text_document \
+ /mnt/data/ckpts/mcore/Qwen3-Next-80B-A3B-Instruct-to-mcore \
10000 \
100 \
/workspace/output_mcore_qwen3_next_finetune
@@ -221,9 +242,9 @@ true \
none \
false \
100000 \
- /mnt/data/qwen-datasets/alpaca_zh-train-general.json \
- /mnt/data/qwen-datasets/alpaca_zh-valid-general.json \
- /mnt/data/qwen-ckpts/Qwen3-Next-80B-A3B-Instruct \
+ /mnt/data/datasets/alpaca_zh-train-general.json \
+ /mnt/data/datasets/alpaca_zh-valid-general.json \
+ /mnt/data/ckpts/mcore/Qwen3-Next-80B-A3B-Instruct-to-mcore \
10000 \
100 \
/workspace/output_mcore_qwen3_next_finetune