Commit 8e6cbb0 (1 parent: 6b0c239)

update qwen3 next readme (#678)

* add submodule 250908
* gated softmax attention
* end2end run test (currently replace DeltaNet with MambaMixer)
* checkout backend to correct commit id
* add convertor (no mamba layer)
* test hf2mg convert (skip mixer cvt)
* add conversion of Qwen3-Next
* fix qgkv layout
* fix conversion of Qwen3-Next
* fix m2h conversion
* fix some issues
* update shell script
* a faster model parallel option
* update readme

File tree: 2 files changed (+38, -17 lines)

examples/qwen3_next/README.md

Lines changed: 37 additions & 16 deletions
````diff
@@ -64,13 +64,13 @@ pip install --upgrade nvidia-nccl-cu12
 
 ```bash
 cd /mnt/data
-mkdir qwen-ckpts
-cd qwen-ckpts
+mkdir -p ckpts/huggingface
+cd ckpts/huggingface
 modelscope download --model Qwen/Qwen3-Next-80B-A3B-Instruct --local_dir Qwen3-Next-80B-A3B-Instruct
 
 cd /mnt/data
-mkdir qwen-datasets
-cd qwen-datasets
+mkdir datasets
+cd datasets
 wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/qwen-datasets/mmap_qwen3_datasets_text_document.bin
 wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/qwen-datasets/mmap_qwen3_datasets_text_document.idx
````

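The download step above fetches the dataset as an idxmap pair (a `.bin` data file plus a `.idx` index sharing one prefix); both halves must be present before training. A minimal sketch (the helper name is invented for illustration; the prefix path comes from the diff) to verify this after the `wget` step:

```shell
# Sketch: an idxmap dataset is a .bin/.idx pair sharing one prefix;
# report any missing half and return non-zero so a launcher can abort.
check_idxmap() {
  local prefix="$1" missing=0
  for ext in bin idx; do
    if [ ! -f "${prefix}.${ext}" ]; then
      echo "missing: ${prefix}.${ext}"
      missing=1
    fi
  done
  if [ "$missing" -eq 0 ]; then
    echo "idxmap dataset complete: ${prefix}"
  fi
  return "$missing"
}

# Prefix from the diff above (note: no .bin/.idx extension):
check_idxmap /mnt/data/datasets/mmap_qwen3_datasets_text_document || echo "re-run the wget step"
```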
````diff
@@ -82,12 +82,33 @@ wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models
 ## Qwen3-Next model training workflow (lite version)
 You can copy the lite version directly into the DLC command field, modify it, and launch training. The lite version groups the parameters into three categories: MODEL_ARGS, TRAINING_ARGS, and INFRA_ARGS.
 ```bash
-bash run_mcore_qwen3_lite.sh \
+bash run_mcore_qwen3_lite.sh
 ```
 
 ## Qwen3-Next model training workflow (standard version)
 ### Model format conversion
-TBD
+To convert the weights, pass the following list of arguments:
+```
+MODEL_SIZE=$1  # model size, e.g. A3B
+LOAD_DIR=$2    # path to the source weights
+SAVE_DIR=$3    # path to the target weights
+MG2HF=$4       # conversion direction; options: true, false
+USE_CUDA=$5    # whether to convert on GPU; recommended: true
+PR=$6          # conversion precision; options: fp32, bf16, fp16
+HF_DIR=$7      # path to the HF weights (required for mcore2hf)
+```
+For example, use the script below to convert the checkpoint to MCore format:
+
+```bash
+cd /workspace/Pai-Megatron-Patch/toolkits/distributed_checkpoints_convertor
+bash scripts/qwen3_next/run_8xH20.sh \
+A3B \
+/mnt/data/ckpts/huggingface/Qwen3-Next-80B-A3B-Instruct \
+/mnt/data/ckpts/mcore/Qwen3-Next-80B-A3B-Instruct-to-mcore \
+false \
+true \
+bf16
+```
 
 ### Pretraining and instruction fine-tuning
 In Qwen3-Next, we have merged pretraining and fine-tuning into the `run_mcore_qwen3.sh` script; the meaning of each parameter differs between the two use cases.
````
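The converter above takes its seven arguments purely by position, which is easy to mis-order. A hypothetical wrapper (the function and variable names are inventions; only the script path and argument order come from the README) that binds the positions to named variables and validates them before the call:

```shell
# Hypothetical wrapper (not part of the repo): name the converter's seven
# positional arguments and sanity-check them before invoking the script.
convert_qwen3_next() {
  local model_size="$1" load_dir="$2" save_dir="$3" mg2hf="$4" use_cuda="$5" pr="$6" hf_dir="${7:-}"

  case "$mg2hf" in true|false) ;; *) echo "MG2HF must be 'true' or 'false'" >&2; return 1 ;; esac
  case "$pr" in fp32|bf16|fp16) ;; *) echo "PR must be fp32, bf16 or fp16" >&2; return 1 ;; esac
  if [ "$mg2hf" = "true" ] && [ -z "$hf_dir" ]; then
    echo "HF_DIR is required when converting mcore -> hf" >&2
    return 1
  fi

  local args=("$model_size" "$load_dir" "$save_dir" "$mg2hf" "$use_cuda" "$pr")
  if [ -n "$hf_dir" ]; then
    args+=("$hf_dir")
  fi
  bash scripts/qwen3_next/run_8xH20.sh "${args[@]}"
}
```

Called as `convert_qwen3_next A3B /mnt/data/ckpts/huggingface/Qwen3-Next-80B-A3B-Instruct /mnt/data/ckpts/mcore/Qwen3-Next-80B-A3B-Instruct-to-mcore false true bf16`, mirroring the hf-to-mcore example above.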
````diff
@@ -152,17 +173,17 @@ false \
 none \
 false \
 100000 \
-/mnt/data/qwen-datasets/mmap_qwen3_datasets_text_document \
-/mnt//data/qwen-datasets/mmap_qwen3_datasets_text_document \
-/mnt/data/qwen-ckpts/Qwen3-Next-80B-A3B-Instruct \
+/mnt/data/datasets/mmap_qwen3_datasets_text_document \
+/mnt/data/datasets/mmap_qwen3_datasets_text_document \
+/mnt/data/ckpts/mcore/Qwen3-Next-80B-A3B-Instruct-to-mcore \
 1000000000 \
 10000 \
 /workspace/output_mcore_qwen3_next_continue_pretrain
 ```
 
 #### Instruction fine-tuning example
 To build an idxmap dataset for fine-tuning, see [this link](https://github.com/alibaba/Pai-Megatron-Patch/tree/main/toolkits/sft_data_preprocessing)
-Once the fine-tuning dataset is ready, set the SFT switch to `true` to run instruction fine-tuning.
+Once the fine-tuning dataset is ready, set the SFT switch to `true` to run instruction fine-tuning. Note that the sequence length of the prepared SFT dataset must match the one actually used in training.
 
 ```bash
 cd /workspace/Pai-Megatron-Patch/examples/qwen3_next
````
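The sequence-length caveat added in this hunk can be made mechanical. A hypothetical guard (the one-line metadata file and helper name are inventions for illustration, not part of the repo) that records the length the SFT dataset was built with and compares it against the training setting before launch:

```shell
# Hypothetical guard: compare the seq_len recorded at dataset-build time
# (here, a one-line metadata file) with the training SEQ_LEN, and refuse
# to proceed on a mismatch.
assert_seq_len() {
  local meta_file="$1" train_seq_len="$2" built_seq_len
  built_seq_len=$(cat "$meta_file") || return 1
  if [ "$built_seq_len" != "$train_seq_len" ]; then
    echo "SFT dataset built with seq_len=${built_seq_len}, training uses ${train_seq_len}" >&2
    return 1
  fi
  echo "seq_len consistent: ${train_seq_len}"
}

# Demo with a throwaway metadata file:
meta=$(mktemp)
echo 2048 > "$meta"
assert_seq_len "$meta" 2048   # prints: seq_len consistent: 2048
```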
```diff
@@ -188,9 +209,9 @@ true \
 none \
 false \
 100000 \
-/mnt/data/qwen-datasets/mmap_qwen3_datasets_text_document \
-/mnt//data/qwen-datasets/mmap_qwen3_datasets_text_document \
-/mnt/data/qwen-ckpts/Qwen3-Next-80B-A3B-Instruct \
+/mnt/data/datasets/mmap_qwen3_datasets_sft_text_document \
+/mnt/data/datasets/mmap_qwen3_datasets_sft_text_document \
+/mnt/data/ckpts/mcore/Qwen3-Next-80B-A3B-Instruct-to-mcore \
 10000 \
 100 \
 /workspace/output_mcore_qwen3_next_finetune
```
```diff
@@ -221,9 +242,9 @@ true \
 none \
 false \
 100000 \
-/mnt/data/qwen-datasets/alpaca_zh-train-general.json \
-/mnt/data/qwen-datasets/alpaca_zh-valid-general.json \
-/mnt/data/qwen-ckpts/Qwen3-Next-80B-A3B-Instruct \
+/mnt/data/datasets/alpaca_zh-train-general.json \
+/mnt/data/datasets/alpaca_zh-valid-general.json \
+/mnt/data/ckpts/mcore/Qwen3-Next-80B-A3B-Instruct-to-mcore \
 10000 \
 100 \
 /workspace/output_mcore_qwen3_next_finetune
```

examples/qwen3_next/run_mcore_qwen3_lite.sh

Lines changed: 1 addition & 1 deletion
```diff
@@ -24,7 +24,7 @@ SEQ_LEN=2048
 SFT=false
 DATA_PATH=/mnt/data/datasets/mmap_qwen3_datasets_text_document
 PRETRAIN_CHECKPOINT_PATH=/mnt/data/ckpts/huggingface/Qwen3-Next-80B-A3B-Instruct
-TENSORBOARD_DIR=/mnt/data/jerry.lp/tensorboard/test_qwen3_next_pretrain
+TENSORBOARD_DIR=/mnt/data/tensorboard/test_qwen3_next_pretrain
 mkdir -p ${TENSORBOARD_DIR}
 TRAIN_TOKENS=10000000
 WARMUP_TOKENS=10000
```
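The variable block above hard-codes its paths. One common pattern (a sketch, assuming the script keeps these same variable names) is to give each variable an environment override via `${VAR:-default}`, so DLC users can retarget paths without editing the script:

```shell
# Sketch: same defaults as the diff above, but overridable from the
# environment (e.g. export DATA_PATH=... before launching to retarget).
SEQ_LEN=${SEQ_LEN:-2048}
SFT=${SFT:-false}
DATA_PATH=${DATA_PATH:-/mnt/data/datasets/mmap_qwen3_datasets_text_document}
PRETRAIN_CHECKPOINT_PATH=${PRETRAIN_CHECKPOINT_PATH:-/mnt/data/ckpts/huggingface/Qwen3-Next-80B-A3B-Instruct}
TENSORBOARD_DIR=${TENSORBOARD_DIR:-/mnt/data/tensorboard/test_qwen3_next_pretrain}
echo "tensorboard logs -> ${TENSORBOARD_DIR}"   # the real script runs: mkdir -p ${TENSORBOARD_DIR}
```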
