diff --git a/.gitmodules b/.gitmodules
Lines changed: 2 additions & 2 deletions
diff --git a/Megatron-LM-250314 b/Megatron-LM-250314
Lines changed: 0 additions & 1 deletion
diff --git a/Megatron-LM-250328 b/Megatron-LM-250328
diff --git a/README.md b/README.md
Lines changed: 2 additions & 1 deletion
diff --git a/README_zh-CN.md b/README_zh-CN.md
Lines changed: 2 additions & 1 deletion
diff --git a/examples/deepseek_v3/README.md b/examples/deepseek_v3/README.md
Lines changed: 1 addition & 1 deletion
@@ -25,6 +25,6 @@
 [submodule "Megatron-LM-250217"]
 	path = Megatron-LM-250217
 	url = https://github.com/NVIDIA/Megatron-LM.git
-[submodule "Megatron-LM-250314"]
-	path = Megatron-LM-250314
+[submodule "Megatron-LM-250328"]
+	path = Megatron-LM-250328
 	url = https://github.com/NVIDIA/Megatron-LM.git
@@ -14,7 +14,7 @@
 | LLama3      | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/llama3/README.md#Megatron-LM-Dense模型训练流程)  |  [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/llama3/README.md#Megatron-Core-Dense模型训练流程)  |                                                           N/A                                                            |                                                        N/A                                                        |
 | LLama2      | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/llama2/README.md#Megatron-LM-Dense模型训练流程)  |  [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/llama2/README.md#Megatron-Core-Dense模型训练流程)  |                                                           N/A                                                            |                                                        N/A                                                        |
 | Mistral     | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/mistral/README.md#Megatron-LM-Dense模型训练流程) |    [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/mistral/README.md#Megatron-Core模型训练流程)     |     [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/mistral/README.md#Megatron-Core模型训练流程)     |                                                        N/A                                                        |
-| Qwen2       |                                                         N/A                                                          |     [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen2/README.md#Megatron-Core模型训练流程)      |      [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen2/README.md#Megatron-Core模型训练流程)      |                                                        N/A                                                        |
+| Qwen2       |                                                         N/A                                                          |     [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen2/README.md#Megatron-Core模型训练流程)      |      [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen2/README_moe.md#Megatron-Core模型训练流程)      |                                                        N/A                                                        |
 | Qwen1.5     | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen1_5/README.md#Megatron-LM-Dense模型训练流程) | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen1_5/README.md#Megatron-Core-Dense模型训练流程)  |   [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen1_5/README.md#Megatron-Core-MoE模型训练流程)   | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen1_5/README.md#MegaBlocks-MoE模型训练流程) |
 | DeepSeek-V2 |                                                         N/A                                                          |                                                           N/A                                                           | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/deepseek_v2/README.md#Megatron-Core-MoE模型训练流程) |                                                        N/A                                                        |
 
@@ -25,6 +25,7 @@ English | [简体中文](./README_zh-CN.md)
 Pai-Megatron-Patch (https://github.com/alibaba/Pai-Megatron-Patch) is a deep learning training toolkit built for developers to train and predict LLMs & VLMs by using Megatron framework easily. With the continuous development of LLMs, the model structure and scale are rapidly evolving. Although these models can be conveniently manufactured using Transformers or DeepSpeed training framework, the training efficiency is comparably low. This phenomenon becomes even severer when the model scale exceeds 10 billion. The primary objective of Pai-Megatron-Patch is to effectively utilize the computational power of GPUs for LLM. This tool allows convenient training of commonly used LLM with all the accelerating techniques provided by Megatron-LM.
 
 What's New:
+- **Upgrade DeepSeek-V3 SFT with fully Mcore implementation.** [🔥🔥 2025.03.31]
 - **Support training QwQ by using Megatron-Core.** [🔥🔥 2025.03.27]
 - **Support training Qwen2.5-VL by using Megatron-Core.** [🔥🔥 2025.03.21]
 - **Support training Moonlight-16B-A3B from Moonshot AI KIMI by using Megatron-Core.** [🔥🔥 2025.03.14]
 
@@ -13,7 +13,7 @@
 | LLama3      | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/llama3/README.md#Megatron-LM-Dense模型训练流程)  |  [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/llama3/README.md#Megatron-Core-Dense模型训练流程)  |                                                           N/A                                                            |                                                        N/A                                                        |
 | LLama2      | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/llama2/README.md#Megatron-LM-Dense模型训练流程)  |  [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/llama2/README.md#Megatron-Core-Dense模型训练流程)  |                                                           N/A                                                            |                                                        N/A                                                        |
 | Mistral     | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/mistral/README.md#Megatron-LM-Dense模型训练流程) |    [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/mistral/README.md#Megatron-Core模型训练流程)     |     [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/mistral/README.md#Megatron-Core模型训练流程)     |                                                        N/A                                                        |
-| Qwen2       |                                                         N/A                                                          |     [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen2/README.md#Megatron-Core模型训练流程)      |      [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen2/README.md#Megatron-Core模型训练流程)      |                                                        N/A                                                        |
+| Qwen2       |                                                         N/A                                                          |     [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen2/README.md#Megatron-Core模型训练流程)      |      [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen2/README_moe.md#Megatron-Core模型训练流程)      |                                                        N/A                                                        |
 | Qwen1.5     | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen1_5/README.md#Megatron-LM-Dense模型训练流程) | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen1_5/README.md#Megatron-Core-Dense模型训练流程)  |   [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen1_5/README.md#Megatron-Core-MoE模型训练流程)   | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen1_5/README.md#MegaBlocks-MoE模型训练流程) |
 | DeepSeek-V2 |                                                         N/A                                                          |                                                           N/A                                                           | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/deepseek_v2/README.md#Megatron-Core-MoE模型训练流程) |                                                        N/A                                                        |
 
@@ -45,6 +45,7 @@ Pai-Megatron-Patch是各类开源大模型和Megatron训练加速引擎之间的
 - [阿里云PAI获得FewCLUE基于大模型的小样本学习双料冠军](https://developer.aliyun.com/article/788081?spm=a2c6h.12873639.article-detail.17.11c5383cHpFZks&tlog=yuekan_8)
 
 新功能：
+- **升级完善DeepSeek-V3训练微调流程** [🔥🔥 2025.03.31]
 - **支持用Megatron-Core框架训练QwQ模型** [🔥🔥 2025.03.27]
 - **支持用Megatron-Core框架训练Qwen2.5-VL模型** [🔥🔥 2025.03.21]
 - **支持用Megatron-Core框架训练来自月之暗面KIMI的Moonlight-16B-A3B模型** [🔥🔥 2025.03.14]
 
@@ -14,7 +14,7 @@
 
 ## 安装
 
-请在阿里云人工智能平台PAI产品中填写专属镜像地址： `dsw-registry.cn-wulanchabu.cr.aliyuncs.com/pai/pai-megatron-patch:25.01` 
+请在阿里云人工智能平台PAI产品中填写专属镜像地址： `dsw-registry.cn-wulanchabu.cr.aliyuncs.com/pai/pai-megatron-patch:25.02` 
 
 运行下列代码克隆Pai-Megatron-Patch
 ```bash