# Best Practice for Qwen3 MoE Models with Pai-Megatron-Patch

## Table of Contents
   * [Installation](#installation)
   * [Dataset & Model Download](#pretraining-dataset-and-model-download)
   * [Megatron-Core Training Workflow](#megatron-core-training-workflow)
   * [Checkpoint Format Conversion](#megatron-core-checkpoint-format-conversion)
   * [Continued Pretraining](#pretraining-example)
   * [Instruction Fine-tuning](#instruction-fine-tuning-example)
   * [Downstream Task Evaluation](#downstream-task-evaluation)
   * [Checkpoint Conversion for Evaluation](#checkpoint-conversion-for-evaluation)
   * [Running the Evaluation Toolkit](#running-the-evaluation-toolkit)


## Installation

On the Alibaba Cloud PAI platform, use the dedicated image address: `dsw-registry.cn-wulanchabu.cr.aliyuncs.com/pai/pai-megatron-patch:25.04`

Clone Pai-Megatron-Patch by running:
```bash
git clone --recurse-submodules https://github.com/alibaba/Pai-Megatron-Patch.git
cd Pai-Megatron-Patch
```

Qwen3-MoE now supports FlashAttention-3 for faster computation, but FA3 only runs on Hopper-architecture GPUs. To use FA3 on H-series cards, install it inside the DSW container with the commands below and save the image:
```bash
pip install "git+https://github.com/Dao-AILab/flash-attention.git#egg=flashattn-hopper&subdirectory=hopper"
python_path=`python -c "import site; print(site.getsitepackages()[0])"`
mkdir -p $python_path/flashattn_hopper
wget -P $python_path/flashattn_hopper https://raw.githubusercontent.com/Dao-AILab/flash-attention/main/hopper/flash_attn_interface.py
```

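FA3 requires a Hopper GPU (compute capability 9.x, e.g. H20/H100/H800). The helper below is a hypothetical sketch of that check, not part of Pai-Megatron-Patch; in practice the capability would come from `torch.cuda.get_device_capability()`:

```python
# Hypothetical helper: decide whether FlashAttention-3 can be used, based on
# the CUDA compute capability (major, minor) reported by the GPU.
# Hopper cards (H20/H100/H800) report compute capability 9.x.
def is_hopper(capability):
    major, _minor = capability
    return major == 9

# With PyTorch available, the capability would come from:
#   capability = torch.cuda.get_device_capability()
print(is_hopper((9, 0)))  # Hopper (e.g. H20): FA3 supported
print(is_hopper((8, 0)))  # Ampere (e.g. A100): fall back to FA2
```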
## Pretraining Dataset and Model Download

```bash
cd /mnt
mkdir qwen-ckpts
cd qwen-ckpts
git clone https://www.modelscope.cn/Qwen/Qwen3-30B-A3B.git

cd /mnt
mkdir qwen-datasets
cd qwen-datasets
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/qwen-datasets/mmap_qwen3_datasets_text_document.bin
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/qwen-datasets/mmap_qwen3_datasets_text_document.idx

wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/datasets/alpaca_zh-train-general.json
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/datasets/alpaca_zh-valid-general.json
```

## Megatron-Core Training Workflow
### Megatron-Core Checkpoint Format Conversion
Qwen3 has been upgraded to train with `torch_dist`-format weights. The checkpoint conversion script takes the following arguments:
```
MODEL_SIZE=$1 # Model size: 0.6B, 1.7B, 4B, 8B, 14B, 32B, A3B, A22B
LOAD_DIR=$2 # Path to the source weights
SAVE_DIR=$3 # Path to the target weights
MG2HF=$4 # Conversion direction (mcore to HF); options: true, false
USE_CUDA=$5 # Whether to convert on GPU; recommended: true
PR=$6 # Conversion precision; options: fp32, bf16, fp16
HF_DIR=$7 # Path to the HF weights (required for mcore2hf)
```
For example, the following script converts a checkpoint to the MCore format:

```bash
cd /workspace/Pai-Megatron-Patch/toolkits/distributed_checkpoints_convertor
bash scripts/qwen3/run_8xH20.sh \
A3B \
/mnt/qwen-ckpts/Qwen3-30B-A3B \
/mnt/qwen-ckpts/Qwen3-30B-A3B-to-mcore \
false \
true \
bf16
```

If you need a customized conversion script, see the distributed checkpoint convertor toolkit.

### Megatron-Core Pretraining and Instruction Fine-tuning
For Qwen3 MoE, pretraining and fine-tuning are both integrated into the `run_mcore_qwen3.sh` script; the meaning of some arguments differs between the two use cases.

#### Unified Description of the Pretraining & Fine-tuning Command
The script takes the following arguments:
```bash
ENV=$1 # Runtime environment: dsw for single-node training, dlc for multi-node training
MODEL_SIZE=$2 # Model size: 0.6B, 1.7B, 4B, 8B, 14B, 32B, A3B, A22B
BATCH_SIZE=$3 # Micro batch size (samples per iteration within one data-parallel rank)
GLOBAL_BATCH_SIZE=$4 # Global batch size (total samples per iteration across all data-parallel ranks)
LR=$5 # Learning rate
MIN_LR=$6 # Minimum learning rate
SEQ_LEN=$7 # Sequence length
PAD_LEN=$8 # Padding length
PR=${9} # Training precision: fp16, bf16, fp8
TP=${10} # Tensor parallel size
PP=${11} # Pipeline parallel size
CP=${12} # Context parallel size
ETP=${13} # Expert tensor parallel size
EP=${14} # Expert parallel size
SP=${15} # Whether to use sequence parallelism: true, false
DO=${16} # Whether to use Megatron's ZeRO-1 memory-saving optimizer: true, false
FL=${17} # Whether to prefer Flash Attention: true, false
SFT=${18} # Whether to run fine-tuning instead of pretraining: true, false
AC=${19} # Activation checkpointing mode: sel, full, offload, false
OPTIMIZER_OFFLOAD=${20} # Optimizer offloading: false, or a decimal in 0~1 giving the offload ratio
SAVE_INTERVAL=${21} # Checkpoint saving interval
DATASET_PATH=${22} # Path to the training dataset
VALID_DATASET_PATH=${23} # Path to the validation dataset
PRETRAIN_CHECKPOINT_PATH=${24} # Path to the pretrained model
TRAIN_TOKENS_OR_ITERS=${25} # Number of training tokens or iterations
WARMUP_TOKENS_OR_ITERS=${26} # Number of warmup tokens or iterations
OUTPUT_BASEPATH=${27} # Path for training output and logs
```
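The batch-size and parallelism arguments are coupled: in Megatron the data-parallel size is `world_size / (TP * PP * CP)`, and `GLOBAL_BATCH_SIZE` must be divisible by `BATCH_SIZE * DP`, with the quotient being the number of gradient-accumulation micro-steps per iteration. The helper below is an illustrative sketch of this arithmetic, not part of the script:

```python
# Sketch: how data-parallel size and gradient-accumulation steps follow
# from the script arguments (standard Megatron relationship).
def grad_accum_steps(world_size, tp, pp, cp, micro_batch, global_batch):
    model_ranks = tp * pp * cp
    assert world_size % model_ranks == 0
    dp = world_size // model_ranks              # data-parallel replicas
    samples_per_step = micro_batch * dp         # samples per fwd/bwd micro-step
    assert global_batch % samples_per_step == 0
    return global_batch // samples_per_step     # micro-steps per iteration

# Values from the A3B example on one 8-GPU node: TP=4, PP=2, CP=1, MBS=1, GBS=8
print(grad_accum_steps(world_size=8, tp=4, pp=2, cp=1, micro_batch=1, global_batch=8))  # → 8
```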

#### Pretraining Example
Use the following command to launch continued pretraining of Qwen3.
Note: when `AC=offload` or `AC=full`, you can set the `MP_AC_LAYERS` environment variable to control how many TransformerLayers are checkpointed or offloaded (default: `1`).

```bash
cd /workspace/Pai-Megatron-Patch/examples/qwen3
sh run_mcore_qwen3.sh \
dlc \
A3B \
1 \
8 \
1e-5 \
1e-6 \
128 \
128 \
bf16 \
4 \
2 \
1 \
1 \
4 \
true \
true \
true \
false \
sel \
false \
100000 \
/mnt/qwen-datasets/mmap_qwen3_datasets_text_document \
/mnt/qwen-datasets/mmap_qwen3_datasets_text_document \
/mnt/qwen-ckpts/Qwen3-30B-A3B-to-mcore \
10000 \
100 \
/mnt/logs/output_mcore_qwen3_pretrain
```

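`TRAIN_TOKENS_OR_ITERS` can express the training budget in tokens. Assuming token-based scheduling, each iteration consumes `GLOBAL_BATCH_SIZE * SEQ_LEN` tokens, so the iteration count follows directly; the function below is an illustrative sketch of that conversion, not code from the script:

```python
# Sketch (assumption): converting a token budget into training iterations.
# One iteration consumes GLOBAL_BATCH_SIZE * SEQ_LEN tokens.
def tokens_to_iters(train_tokens, global_batch, seq_len):
    tokens_per_iter = global_batch * seq_len
    return train_tokens // tokens_per_iter

# E.g. a 100M-token budget with GBS=8 and SEQ_LEN=128:
print(tokens_to_iters(100_000_000, 8, 128))  # → 97656
```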
#### Instruction Fine-tuning Example
To build an idxmap-format dataset for fine-tuning, see [this guide](https://github.com/alibaba/Pai-Megatron-Patch/tree/main/toolkits/sft_data_preprocessing).
Once the fine-tuning dataset is ready, set the SFT switch to `true` to run instruction fine-tuning.

```bash
cd /workspace/Pai-Megatron-Patch/examples/qwen3
sh run_mcore_qwen3.sh \
dlc \
A3B \
1 \
8 \
1e-5 \
1e-6 \
128 \
128 \
bf16 \
4 \
2 \
1 \
1 \
4 \
true \
true \
true \
true \
sel \
false \
100000 \
/mnt/qwen-datasets/path_to_your_dataset \
/mnt/qwen-datasets/path_to_your_dataset \
/path/to/pretraining/checkpoint \
10000 \
100 \
/workspace/output_mcore_qwen3_finetune
```
By setting the MP_DATASET_TYPE environment variable, the script can also run instruction fine-tuning on a JSON-format dataset:
```bash
export MP_DATASET_TYPE="raw"
cd /workspace/Pai-Megatron-Patch/examples/qwen3
sh run_mcore_qwen3.sh \
dlc \
A3B \
1 \
8 \
1e-5 \
1e-6 \
128 \
128 \
bf16 \
4 \
2 \
1 \
1 \
4 \
true \
true \
true \
true \
sel \
false \
100000 \
/mnt/qwen-datasets/alpaca_zh-train-general.json \
/mnt/qwen-datasets/alpaca_zh-valid-general.json \
/mnt/qwen-ckpts/Qwen3-30B-A3B-to-mcore \
10000 \
100 \
/workspace/output_mcore_qwen3_finetune
```

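The exact schema of `alpaca_zh-train-general.json` is not shown here; raw-mode SFT datasets of this kind commonly use Alpaca-style records, so the field names below are an assumption for illustration only:

```python
import json

# Hypothetical record in Alpaca-style instruction format; the actual field
# names used by alpaca_zh-train-general.json may differ.
record = {
    "instruction": "Translate the following sentence into English.",
    "input": "今天天气很好。",
    "output": "The weather is nice today.",
}

# Serialize and read back one record (ensure_ascii=False keeps Chinese text readable).
line = json.dumps(record, ensure_ascii=False)
print(json.loads(line)["output"])  # → The weather is nice today.
```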
## Downstream Task Evaluation

### Checkpoint Conversion for Evaluation
To run inference-based evaluation, convert the Megatron-Core checkpoint saved after training/fine-tuning back to the HuggingFace format.

```bash
cd /workspace/Pai-Megatron-Patch/toolkits/distributed_checkpoints_convertor
bash scripts/qwen3/run_8xH20.sh \
A3B \
/mnt/qwen-ckpts/Qwen3-30B-A3B-to-mcore \
/mnt/qwen-ckpts/Qwen3-30B-A3B-mcore-to-hf \
true \
true \
bf16 \
/mnt/qwen-ckpts/Qwen3-30B-A3B
```

### Running the Evaluation Toolkit
Download the evaluation data:
```bash
# In container
cd /workspace

wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/evaluation-datasets/evaluate.tgz
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/evaluation-datasets/cmmlu.tgz
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/evaluation-datasets/ceval.tgz

tar -xvzf cmmlu.tgz
tar -xvzf ceval.tgz
tar -xvzf evaluate.tgz
```
Run the following command to evaluate the converted model:
```bash
cd /workspace/Pai-Megatron-Patch/LM-Evaluation-Harness-240310
accelerate launch --main_process_port 29051 -m lm_eval \
--model hf \
--model_args pretrained=/mnt/qwen-ckpts/Qwen3-30B-A3B-mcore-to-hf,trust_remote_code=True \
--tasks cmmlu,ceval-valid \
--batch_size 16
```
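lm_eval reports one accuracy per task; a common way to summarize the run is a macro average across tasks. The numbers below are placeholders, not actual Qwen3-30B-A3B scores:

```python
# Hypothetical per-task accuracies as lm_eval would report them;
# these values are illustrative placeholders only.
results = {"cmmlu": 0.82, "ceval-valid": 0.85}

# Macro average: unweighted mean over tasks.
macro_avg = sum(results.values()) / len(results)
print(round(macro_avg, 3))  # → 0.835
```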