
Commit 2c3e3c9: add Qwen-3 (#578)

1 parent 164f377 commit 2c3e3c9

File tree

17 files changed: +2727 −75 lines changed

README.md

Lines changed: 2 additions & 0 deletions

@@ -3,6 +3,7 @@
 |             | Megatron-LM-Dense | Megatron-Core-Dense | Megatron-Core-MoE | MegaBlocks-MoE |
 |:------------|:-----------------:|:-------------------:|:-----------------:|:--------------:|
+| Qwen3 | N/A | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen3/README.md#Megatron-Core模型训练流程) | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen3/README.md#Megatron-Core模型训练流程) | N/A |
 | QwQ | N/A | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwq/README.md#Megatron-Core模型训练流程) | N/A | N/A |
 | Qwen2.5-VL | N/A | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen2_5_vl/README.md#Megatron-Core模型训练流程) | N/A | N/A |
 | Moonlight | N/A | N/A | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/moonlight/README.md#Megatron-Core-MoE模型训练流程) | N/A |
@@ -25,6 +26,7 @@ English | [简体中文](./README_zh-CN.md)
 Pai-Megatron-Patch (https://github.com/alibaba/Pai-Megatron-Patch) is a deep learning training toolkit that lets developers train and evaluate LLMs & VLMs easily with the Megatron framework. As LLMs continue to develop, model structures and scales evolve rapidly. Although these models can be built conveniently with the Transformers or DeepSpeed training frameworks, training efficiency is comparatively low, and the problem becomes even more severe once the model scale exceeds 10 billion parameters. The primary objective of Pai-Megatron-Patch is to use GPU compute effectively for LLMs, enabling convenient training of commonly used LLMs with all the acceleration techniques provided by Megatron-LM.

 What's New:
+- **Support all Qwen3 training with torch_dist checkpoints** [🔥🔥 2025.04.29]
 - **[Experimental] Support distributed checkpoint conversion for large LLMs** [🔥🔥 2025.04.16]
 - **Upgrade DeepSeek-V3 SFT with a fully Mcore implementation.** [🔥🔥 2025.03.31]
 - **Support training QwQ using Megatron-Core.** [🔥🔥 2025.03.27]

README_zh-CN.md

Lines changed: 2 additions & 0 deletions

@@ -2,6 +2,7 @@
 |             | Megatron-LM-Dense | Megatron-Core-Dense | Megatron-Core-MoE | MegaBlocks-MoE |
 |:------------|:-----------------:|:-------------------:|:-----------------:|:--------------:|
+| Qwen3 | N/A | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen3/README.md#Megatron-Core模型训练流程) | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen3/README.md#Megatron-Core模型训练流程) | N/A |
 | QwQ | N/A | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwq/README.md#Megatron-Core模型训练流程) | N/A | N/A |
 | Qwen2.5-VL | N/A | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen2_5_vl/README.md#Megatron-Core模型训练流程) | N/A | N/A |
 | Moonlight | N/A | N/A | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/moonlight/README.md#Megatron-Core-MoE模型训练流程) | N/A |
@@ -45,6 +46,7 @@ Pai-Megatron-Patch bridges open-source LLMs and the Megatron training acceleration engine
 - [Alibaba Cloud PAI wins double championships in FewCLUE few-shot learning with large models](https://developer.aliyun.com/article/788081?spm=a2c6h.12873639.article-detail.17.11c5383cHpFZks&tlog=yuekan_8)

 What's New:
+- **Support training and fine-tuning the full Qwen3 series with the torch_dist checkpoint format** [🔥🔥 2025.04.29]
 - **[Experimental] Distributed MG/HF checkpoint conversion for very large models** [🔥🔥 2025.04.16]
 - **Upgrade and refine the DeepSeek-V3 training and fine-tuning workflow** [🔥🔥 2025.03.31]
 - **Support training QwQ with the Megatron-Core framework** [🔥🔥 2025.03.27]

examples/qwen3/README.md

Lines changed: 257 additions & 0 deletions
@@ -0,0 +1,257 @@
# Best Practices for Qwen3 MoE Models with Pai-Megatron-Patch

## Table of Contents
* [Installation](#installation)
* [Dataset and Model Download](#dataset-and-model-download)
* [Megatron-Core Training Workflow](#megatron-core-training-workflow)
* [Checkpoint Format Conversion](#megatron-core-checkpoint-format-conversion)
* [Continued Pretraining](#pretraining-example)
* [Instruction Fine-Tuning](#instruction-fine-tuning-example)
* [Downstream Task Evaluation](#downstream-task-evaluation)
* [Checkpoint Conversion for Evaluation](#checkpoint-conversion-for-evaluation)
* [Running the Evaluation Toolkit](#running-the-evaluation-toolkit)
## Installation

On the Alibaba Cloud PAI platform, use the dedicated image: `dsw-registry.cn-wulanchabu.cr.aliyuncs.com/pai/pai-megatron-patch:25.04`

Clone Pai-Megatron-Patch:
```bash
git clone --recurse-submodules https://github.com/alibaba/Pai-Megatron-Patch.git
cd Pai-Megatron-Patch
```
Qwen3-MoE now supports FlashAttention-3 acceleration, which is only available on Hopper-architecture GPUs. To use FA3 on H-series cards, run the following inside the DSW container and save the image:
```bash
pip install "git+https://github.com/Dao-AILab/flash-attention.git#egg=flashattn-hopper&subdirectory=hopper"
python_path=`python -c "import site; print(site.getsitepackages()[0])"`
mkdir -p $python_path/flashattn_hopper
wget -P $python_path/flashattn_hopper https://raw.githubusercontent.com/Dao-AILab/flash-attention/main/hopper/flash_attn_interface.py
```
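After installing, you can sanity-check that the interface file landed where the `wget` above put it (a minimal sketch, not part of the repo):

```python
# Illustrative post-install check: the commands above copy
# flash_attn_interface.py into a flashattn_hopper directory under
# site-packages; verify the file landed where expected.
import os
import site

pkg_dir = os.path.join(site.getsitepackages()[0], "flashattn_hopper")
interface = os.path.join(pkg_dir, "flash_attn_interface.py")
print("FA3 interface present:", os.path.isfile(interface))
```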
## Dataset and Model Download

```bash
cd /mnt
mkdir qwen-ckpts
cd qwen-ckpts
git clone https://www.modelscope.cn/Qwen/Qwen3-30B-A3B.git

# Download the datasets into /mnt/qwen-datasets, the path the training
# commands below expect.
cd /mnt
mkdir qwen-datasets
cd qwen-datasets
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/qwen-datasets/mmap_qwen3_datasets_text_document.bin
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/qwen-datasets/mmap_qwen3_datasets_text_document.idx

wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/datasets/alpaca_zh-train-general.json
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/datasets/alpaca_zh-valid-general.json
```
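Megatron's idxmap (mmap) datasets ship as a `.bin`/`.idx` pair sharing one prefix, and the training commands below take that prefix without any extension. A small sketch (hypothetical helper, not from the repo) for checking a prefix before launching a long job:

```python
import os
import tempfile

def mmap_dataset_ready(prefix: str) -> bool:
    """True when both halves of a Megatron mmap dataset pair exist."""
    return all(os.path.isfile(prefix + ext) for ext in (".bin", ".idx"))

# Demo against a throwaway directory; the real prefix in this guide is
# /mnt/qwen-datasets/mmap_qwen3_datasets_text_document.
with tempfile.TemporaryDirectory() as d:
    prefix = os.path.join(d, "mmap_qwen3_datasets_text_document")
    open(prefix + ".bin", "w").close()
    print(mmap_dataset_ready(prefix))  # False: .idx is still missing
    open(prefix + ".idx", "w").close()
    print(mmap_dataset_ready(prefix))  # True: both files present
```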
## Megatron-Core Training Workflow

### Megatron-Core Checkpoint Format Conversion

Qwen3 training has been upgraded to the `torch_dist` checkpoint format. The conversion script takes the following parameters:
```
MODEL_SIZE=$1  # model size: 0.6B, 1.7B, 4B, 8B, 14B, 32B, A3B, A22B
LOAD_DIR=$2    # path to the source checkpoint
SAVE_DIR=$3    # path to save the converted checkpoint
MG2HF=$4       # conversion direction: true (mcore to hf), false (hf to mcore)
USE_CUDA=$5    # whether to convert on GPU (recommended: true)
PR=$6          # conversion precision: fp32, bf16, fp16
HF_DIR=$7      # path to the HF checkpoint (required for mcore2hf)
```
For example, convert the downloaded checkpoint to the MCore format with the script below.
```bash
cd /workspace/Pai-Megatron-Patch/toolkits/distributed_checkpoints_convertor
bash scripts/qwen3/run_8xH20.sh \
A3B \
/mnt/qwen-ckpts/Qwen3-30B-A3B \
/mnt/qwen-ckpts/Qwen3-30B-A3B-to-mcore \
false \
true \
bf16
```

If you need a customized conversion script, see the distributed checkpoint convertor toolkit.
### Megatron-Core Pretraining and Instruction Fine-Tuning

For Qwen3 MoE, pretraining and fine-tuning are unified in the `run_mcore_qwen3.sh` script; the meaning of some parameters differs between the two use cases.

#### Unified Pretraining & Fine-Tuning Command

The script takes the following parameters:
```bash
ENV=$1                         # runtime environment: dsw for single-node training, dlc for multi-node training
MODEL_SIZE=$2                  # model size: 0.6B, 1.7B, 4B, 8B, 14B, 32B, A3B, A22B
BATCH_SIZE=$3                  # samples per data-parallel rank per iteration (micro batch size)
GLOBAL_BATCH_SIZE=$4           # total samples per iteration across all data-parallel ranks
LR=$5                          # learning rate
MIN_LR=$6                      # minimum learning rate
SEQ_LEN=$7                     # sequence length
PAD_LEN=$8                     # padding length
PR=${9}                        # training precision: fp16, bf16, fp8
TP=${10}                       # tensor parallel size
PP=${11}                       # pipeline parallel size
CP=${12}                       # context parallel size
ETP=${13}                      # expert tensor parallel size
EP=${14}                       # expert parallel size
SP=${15}                       # enable sequence parallelism: true, false
DO=${16}                       # enable Megatron's ZeRO-1 memory-saving distributed optimizer: true, false
FL=${17}                       # prefer Flash Attention: true, false
SFT=${18}                      # run supervised fine-tuning: true, false
AC=${19}                       # activation checkpointing mode: sel, full, offload, false
OPTIMIZER_OFFLOAD=${20}        # optimizer offload: false, or a decimal in 0~1 giving the offload fraction
SAVE_INTERVAL=${21}            # checkpoint save interval
DATASET_PATH=${22}             # training dataset path
VALID_DATASET_PATH=${23}       # validation dataset path
PRETRAIN_CHECKPOINT_PATH=${24} # pretrained model path
TRAIN_TOKENS_OR_ITERS=${25}    # number of training tokens or iterations
WARMUP_TOKENS_OR_ITERS=${26}   # number of warmup tokens or iterations
OUTPUT_BASEPATH=${27}          # output path for training logs and checkpoints
```
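Since the launcher takes 27 positional arguments, it is easy to drop or transpose one. As an illustration only (this helper is not part of Pai-Megatron-Patch), the argument order above can be captured in a dict-to-list builder that refuses to emit an incomplete command line:

```python
# Hypothetical helper: assemble the 27 positional arguments for
# run_mcore_qwen3.sh from named settings, so no argument is silently skipped.
ARG_ORDER = [
    "ENV", "MODEL_SIZE", "BATCH_SIZE", "GLOBAL_BATCH_SIZE", "LR", "MIN_LR",
    "SEQ_LEN", "PAD_LEN", "PR", "TP", "PP", "CP", "ETP", "EP", "SP", "DO",
    "FL", "SFT", "AC", "OPTIMIZER_OFFLOAD", "SAVE_INTERVAL", "DATASET_PATH",
    "VALID_DATASET_PATH", "PRETRAIN_CHECKPOINT_PATH",
    "TRAIN_TOKENS_OR_ITERS", "WARMUP_TOKENS_OR_ITERS", "OUTPUT_BASEPATH",
]

def build_args(settings):
    missing = [name for name in ARG_ORDER if name not in settings]
    if missing:
        raise ValueError(f"missing arguments: {missing}")
    # Shell booleans are lowercase true/false.
    fmt = lambda v: str(v).lower() if isinstance(v, bool) else str(v)
    return [fmt(settings[name]) for name in ARG_ORDER]

# Settings of the pretraining example below:
args = build_args(dict(
    ENV="dlc", MODEL_SIZE="A3B", BATCH_SIZE=1, GLOBAL_BATCH_SIZE=8,
    LR="1e-5", MIN_LR="1e-6", SEQ_LEN=128, PAD_LEN=128, PR="bf16",
    TP=4, PP=2, CP=1, ETP=1, EP=4, SP=True, DO=True, FL=True, SFT=False,
    AC="sel", OPTIMIZER_OFFLOAD=False, SAVE_INTERVAL=100000,
    DATASET_PATH="/mnt/qwen-datasets/mmap_qwen3_datasets_text_document",
    VALID_DATASET_PATH="/mnt/qwen-datasets/mmap_qwen3_datasets_text_document",
    PRETRAIN_CHECKPOINT_PATH="/mnt/qwen-ckpts/Qwen3-30B-A3B-to-mcore",
    TRAIN_TOKENS_OR_ITERS=10000, WARMUP_TOKENS_OR_ITERS=100,
    OUTPUT_BASEPATH="/mnt/logs/output_mcore_qwen3_pretrain",
))
print(" ".join(args[:4]))  # dlc A3B 1 8
```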
#### Pretraining Example

Use the following command to launch continued pretraining of Qwen3.
Note: when `AC` is `offload` or `full`, the `MP_AC_LAYERS` environment variable controls how many TransformerLayers are checkpointed or offloaded at a time (default: `1`).

```bash
cd /workspace/Pai-Megatron-Patch/examples/qwen3
sh run_mcore_qwen3.sh \
dlc \
A3B \
1 \
8 \
1e-5 \
1e-6 \
128 \
128 \
bf16 \
4 \
2 \
1 \
1 \
4 \
true \
true \
true \
false \
sel \
false \
100000 \
/mnt/qwen-datasets/mmap_qwen3_datasets_text_document \
/mnt/qwen-datasets/mmap_qwen3_datasets_text_document \
/mnt/qwen-ckpts/Qwen3-30B-A3B-to-mcore \
10000 \
100 \
/mnt/logs/output_mcore_qwen3_pretrain
```
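The parallel sizes in this example follow the usual Megatron-LM arithmetic: TP × PP × CP must divide the world size, and the global batch size must be divisible by micro-batch × data-parallel size. A quick consistency check (standard arithmetic, not a script from the repo):

```python
# Illustrative consistency check for Megatron parallelism settings.
def check_parallelism(world_size, tp, pp, cp, micro_batch, global_batch):
    assert world_size % (tp * pp * cp) == 0, "TP*PP*CP must divide world size"
    dp = world_size // (tp * pp * cp)  # data-parallel size
    assert global_batch % (micro_batch * dp) == 0, \
        "global batch must be divisible by micro-batch * DP"
    # Returns (data-parallel size, gradient-accumulation steps).
    return dp, global_batch // (micro_batch * dp)

# The example above on one 8-GPU node: TP=4, PP=2, CP=1, MBS=1, GBS=8.
print(check_parallelism(8, 4, 2, 1, 1, 8))  # (1, 8)
```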
#### Instruction Fine-Tuning Example

To build an idxmap-format dataset for fine-tuning, see the [sft_data_preprocessing toolkit](https://github.com/alibaba/Pai-Megatron-Patch/tree/main/toolkits/sft_data_preprocessing).
Once the fine-tuning dataset is ready, set the SFT switch to `true` to run instruction fine-tuning:

```bash
cd /workspace/Pai-Megatron-Patch/examples/qwen3
sh run_mcore_qwen3.sh \
dlc \
A3B \
1 \
8 \
1e-5 \
1e-6 \
128 \
128 \
bf16 \
4 \
2 \
1 \
1 \
4 \
true \
true \
true \
true \
sel \
false \
100000 \
/mnt/qwen-datasets/path_to_your_dataset \
/mnt/qwen-datasets/path_to_your_dataset \
/path/to/pretraining/checkpoint \
10000 \
100 \
/workspace/output_mcore_qwen3_finetune
```
By setting the `MP_DATASET_TYPE` environment variable, the script can also run instruction fine-tuning on a JSON-format dataset:
```bash
export MP_DATASET_TYPE="raw"
cd /workspace/Pai-Megatron-Patch/examples/qwen3
sh run_mcore_qwen3.sh \
dlc \
A3B \
1 \
8 \
1e-5 \
1e-6 \
128 \
128 \
bf16 \
4 \
2 \
1 \
1 \
4 \
true \
true \
true \
true \
sel \
false \
100000 \
/mnt/qwen-datasets/alpaca_zh-train-general.json \
/mnt/qwen-datasets/alpaca_zh-valid-general.json \
/mnt/qwen-ckpts/Qwen3-30B-A3B-to-mcore \
10000 \
100 \
/workspace/output_mcore_qwen3_finetune
```
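The exact raw-JSON schema expected with `MP_DATASET_TYPE="raw"` is defined by the repo's data pipeline; assuming an Alpaca-style array of records (as files like `alpaca_zh-train-general.json` conventionally are), a quick peek before training might look like:

```python
# Hypothetical sanity check: inspect a raw JSON SFT dataset before launching
# a job. We only assume the file holds a JSON array of records; the field
# names below are illustrative, not the pipeline's required schema.
import json
import tempfile

def peek(path, n=2):
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    print(f"{len(records)} records; first has keys {sorted(records[0])}")
    return records[:n]

# Demo with a throwaway Alpaca-style file:
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump([{"instruction": "翻译成英文", "input": "你好", "output": "Hello"}], f)
head = peek(f.name)
```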
## Downstream Task Evaluation

### Checkpoint Conversion for Evaluation

To run inference-based evaluation, convert the Megatron-Core checkpoint saved after training/fine-tuning back to the HuggingFace format:

```bash
cd /workspace/Pai-Megatron-Patch/toolkits/distributed_checkpoints_convertor
bash scripts/qwen3/run_8xH20.sh \
A3B \
/mnt/qwen-ckpts/Qwen3-30B-A3B-to-mcore \
/mnt/qwen-ckpts/Qwen3-30B-A3B-mcore-to-hf \
true \
true \
bf16 \
/mnt/qwen-ckpts/Qwen3-30B-A3B
```
### Running the Evaluation Toolkit

Download the evaluation data:
```bash
# In container
cd /workspace

wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/evaluation-datasets/evaluate.tgz
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/evaluation-datasets/cmmlu.tgz
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/evaluation-datasets/ceval.tgz

tar -xvzf cmmlu.tgz
tar -xvzf ceval.tgz
tar -xvzf evaluate.tgz
```
Run the following command to evaluate the converted model:
```bash
cd /workspace/Pai-Megatron-Patch/LM-Evaluation-Harness-240310
accelerate launch --main_process_port 29051 -m lm_eval \
--model hf \
--model_args pretrained=/mnt/qwen-ckpts/Qwen3-30B-A3B-mcore-to-hf,trust_remote_code=True \
--tasks cmmlu,ceval-valid \
--batch_size 16
```
