Skip to content

Commit 2c544a1

Browse files
authored
upgrade dskv3 (#540)
1 parent 4471f7c commit 2c544a1

30 files changed

+726
-3578
lines changed

.gitmodules

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -25,6 +25,6 @@
2525
[submodule "Megatron-LM-250217"]
2626
path = Megatron-LM-250217
2727
url = https://github.com/NVIDIA/Megatron-LM.git
28-
[submodule "Megatron-LM-250314"]
29-
path = Megatron-LM-250314
28+
[submodule "Megatron-LM-250328"]
29+
path = Megatron-LM-250328
3030
url = https://github.com/NVIDIA/Megatron-LM.git

Megatron-LM-250314

Lines changed: 0 additions & 1 deletion
This file was deleted.

Megatron-LM-250328

Submodule Megatron-LM-250328 added at 6ba97dd

README.md

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -14,7 +14,7 @@
1414
| LLama3 | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/llama3/README.md#Megatron-LM-Dense模型训练流程) | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/llama3/README.md#Megatron-Core-Dense模型训练流程) | N/A | N/A |
1515
| LLama2 | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/llama2/README.md#Megatron-LM-Dense模型训练流程) | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/llama2/README.md#Megatron-Core-Dense模型训练流程) | N/A | N/A |
1616
| Mistral | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/mistral/README.md#Megatron-LM-Dense模型训练流程) | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/mistral/README.md#Megatron-Core模型训练流程) | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/mistral/README.md#Megatron-Core模型训练流程) | N/A |
17-
| Qwen2 | N/A | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen2/README.md#Megatron-Core模型训练流程) | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen2/README.md#Megatron-Core模型训练流程) | N/A |
17+
| Qwen2 | N/A | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen2/README.md#Megatron-Core模型训练流程) | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen2/README_moe.md#Megatron-Core模型训练流程) | N/A |
1818
| Qwen1.5 | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen1_5/README.md#Megatron-LM-Dense模型训练流程) | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen1_5/README.md#Megatron-Core-Dense模型训练流程) | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen1_5/README.md#Megatron-Core-MoE模型训练流程) | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen1_5/README.md#MegaBlocks-MoE模型训练流程) |
1919
| DeepSeek-V2 | N/A | N/A | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/deepseek_v2/README.md#Megatron-Core-MoE模型训练流程) | N/A |
2020

@@ -25,6 +25,7 @@ English | [简体中文](./README_zh-CN.md)
2525
Pai-Megatron-Patch (https://github.com/alibaba/Pai-Megatron-Patch) is a deep learning training toolkit built for developers to train and predict LLMs & VLMs by using Megatron framework easily. With the continuous development of LLMs, the model structure and scale are rapidly evolving. Although these models can be conveniently manufactured using Transformers or DeepSpeed training framework, the training efficiency is comparably low. This phenomenon becomes even severer when the model scale exceeds 10 billion. The primary objective of Pai-Megatron-Patch is to effectively utilize the computational power of GPUs for LLM. This tool allows convenient training of commonly used LLM with all the accelerating techniques provided by Megatron-LM.
2626

2727
What's New:
28+
- **Upgrade DeepSeek-V3 SFT with fully Mcore implementation.** [🔥🔥 2025.03.31]
2829
- **Support training QwQ by using Megatron-Core.** [🔥🔥 2025.03.27]
2930
- **Support training Qwen2.5-VL by using Megatron-Core.** [🔥🔥 2025.03.21]
3031
- **Support training Moonlight-16B-A3B from Moonshot AI KIMI by using Megatron-Core.** [🔥🔥 2025.03.14]

README_zh-CN.md

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -13,7 +13,7 @@
1313
| LLama3 | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/llama3/README.md#Megatron-LM-Dense模型训练流程) | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/llama3/README.md#Megatron-Core-Dense模型训练流程) | N/A | N/A |
1414
| LLama2 | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/llama2/README.md#Megatron-LM-Dense模型训练流程) | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/llama2/README.md#Megatron-Core-Dense模型训练流程) | N/A | N/A |
1515
| Mistral | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/mistral/README.md#Megatron-LM-Dense模型训练流程) | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/mistral/README.md#Megatron-Core模型训练流程) | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/mistral/README.md#Megatron-Core模型训练流程) | N/A |
16-
| Qwen2 | N/A | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen2/README.md#Megatron-Core模型训练流程) | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen2/README.md#Megatron-Core模型训练流程) | N/A |
16+
| Qwen2 | N/A | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen2/README.md#Megatron-Core模型训练流程) | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen2/README_moe.md#Megatron-Core模型训练流程) | N/A |
1717
| Qwen1.5 | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen1_5/README.md#Megatron-LM-Dense模型训练流程) | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen1_5/README.md#Megatron-Core-Dense模型训练流程) | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen1_5/README.md#Megatron-Core-MoE模型训练流程) | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen1_5/README.md#MegaBlocks-MoE模型训练流程) |
1818
| DeepSeek-V2 | N/A | N/A | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/deepseek_v2/README.md#Megatron-Core-MoE模型训练流程) | N/A |
1919

@@ -45,6 +45,7 @@ Pai-Megatron-Patch是各类开源大模型和Megatron训练加速引擎之间的
4545
- [阿里云PAI获得FewCLUE基于大模型的小样本学习双料冠军](https://developer.aliyun.com/article/788081?spm=a2c6h.12873639.article-detail.17.11c5383cHpFZks&tlog=yuekan_8)
4646

4747
新功能:
48+
- **升级完善DeepSeek-V3训练微调流程** [🔥🔥 2025.03.31]
4849
- **支持用Megatron-Core框架训练QwQ模型** [🔥🔥 2025.03.27]
4950
- **支持用Megatron-Core框架训练Qwen2.5-VL模型** [🔥🔥 2025.03.21]
5051
- **支持用Megatron-Core框架训练来自月之暗面KIMI的Moonlight-16B-A3B模型** [🔥🔥 2025.03.14]

examples/deepseek_v3/README.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -14,7 +14,7 @@
1414

1515
## 安装
1616

17-
请在阿里云人工智能平台PAI产品中填写专属镜像地址: `dsw-registry.cn-wulanchabu.cr.aliyuncs.com/pai/pai-megatron-patch:25.01`
17+
请在阿里云人工智能平台PAI产品中填写专属镜像地址: `dsw-registry.cn-wulanchabu.cr.aliyuncs.com/pai/pai-megatron-patch:25.02`
1818

1919
运行下列代码克隆Pai-Megatron-Patch
2020
```bash

0 commit comments

Comments
 (0)