Support all Mcore Models Context Parallel Reinforcement Training via ChatLearn (#659)

jerryli1981 · web-flow · commit 16d121f8f8e2 · 2025-09-03T11:01:40.000+08:00
* Support all Mcore Models Context Parallel Reinforcement Training via ChatLearn

diff --git a/README.md b/README.md
@@ -3,28 +3,22 @@
 
 |             |                                                      Megatron-Core                                                       |                                                                                        ChatLearn                                                                                        |    verl     |
 |:------------|:------------------------------------------------------------------------------------------------------------------------:|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:-----------:|
-| Qwen3       |      [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen3/README.md#Megatron-Core模型训练流程)      |                                             [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen3/README_grpo.md)                                             | Coming Soon |
-| QwQ         |       [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwq/README.md#Megatron-Core模型训练流程)       |                                                                                           N/A                                                                                           |     N/A     |
-| Qwen2.5-VL  |   [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen2_5_vl/README.md#Megatron-Core模型训练流程)    |                                                                                           N/A                                                                                           |     N/A     |
-| Moonlight   |  [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/moonlight/README.md#Megatron-Core-MoE模型训练流程)  |                                           [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/moonlight/README_grpo.md)                                           | Coming Soon |
-| DeepSeek-V3 | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/deepseek_v3/README.md#Megatron-Core-MoE模型训练流程) |                                         [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/deepseek_v3/README_grpo.md)                                                                                     | Coming Soon |
-| Qwen2-VL    |    [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen2_vl/README.md#Megatron-Core模型训练流程)     |                                                                                           N/A                                                                                           |     N/A     |
-| LLaVA       |   [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/llava_mcore/README.md#Megatron-Core模型训练流程)   |                                                                                           N/A                                                                                           |     N/A     |
-| Qwen2.5     |  [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen2_5/README.md#Megatron-Core-Dense模型训练流程)  |                                                                                           N/A                                                                                           |     N/A     |  
-| LLama3.1    | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/llama3_1/README.md#Megatron-Core-Dense模型训练流程)  |                                                                                           N/A                                                                                           |     N/A     |
-| LLama3      |  [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/llama3/README.md#Megatron-Core-Dense模型训练流程)   |                                                                                           N/A                                                                                           |     N/A     |
-| LLama2      |  [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/llama2/README.md#Megatron-Core-Dense模型训练流程)   |                                                                                           N/A                                                                                           |     N/A     |
-| Mistral     |     [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/mistral/README.md#Megatron-Core模型训练流程)     |                                                                                           N/A                                                                                           |     N/A     |
-| Qwen2       |      [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen2/README.md#Megatron-Core模型训练流程)      |                                                                                           N/A                                                                                           |     N/A     |
-| Qwen1.5     |  [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen1_5/README.md#Megatron-Core-Dense模型训练流程)  |                                                                                           N/A                                                                                           |     N/A     |
-| DeepSeek-V2 | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/deepseek_v2/README.md#Megatron-Core-MoE模型训练流程) |                                                                                           N/A                                                                                           |     N/A     |
+|Qwen3       |[ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen3/README.md)|[ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen3/README_grpo.md) | Coming Soon |
+|QwQ         |[ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwq/README.md)| N/A | N/A |
+|Qwen2.5-VL  |[ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen2_5_vl/README.md)| N/A | N/A |
+|Moonlight   |[ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/moonlight/README.md)|[ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/moonlight/README_grpo.md)| N/A |
+|DeepSeek-V3 |[ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/deepseek_v3/README.md)| N/A | N/A |
+|DeepSeek-R1 | N/A |[ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/deepseek_v3/README_grpo.md)| Coming Soon |
+
+
 
 ## Introduction
 English | [简体中文](./README_zh-CN.md)
 
 Pai-Megatron-Patch (https://github.com/alibaba/Pai-Megatron-Patch) is a deep learning training toolkit that lets developers easily train LLMs and VLMs and run inference on them with the Megatron framework. As LLMs continue to develop, model architectures and scales are evolving rapidly. These models can be built conveniently with the Transformers or DeepSpeed training frameworks, but training efficiency is comparatively low, and the problem becomes more severe once a model exceeds 10 billion parameters. The primary objective of Pai-Megatron-Patch is to make effective use of GPU compute for LLMs. It lets commonly used LLMs be trained conveniently with all the acceleration techniques provided by Megatron-LM.
 
 What's New:
+- **Support all Mcore Models Context Parallel Reinforcement Training via ChatLearn** [🔥🔥 2025.08.31]
 - **Support all Mcore Models GSPO Reinforcement Training via ChatLearn** [🔥🔥 2025.08.11]
 - **Support DeepSeek-V3-671B GRPO Reinforcement Training using Megatron-Core and ChatLearn** [🔥🔥 2025.07.31]
 - **Support Qwen3-235B GRPO Reinforcement Training using Megatron-Core and ChatLearn** [🔥🔥 2025.07.20]
diff --git a/README_zh-CN.md b/README_zh-CN.md
@@ -1,22 +1,16 @@
 ## Quick Start
 
+
 |             |                                                      Megatron-Core                                                       |                                                                                        ChatLearn                                                                                        |    verl     |
 |:------------|:------------------------------------------------------------------------------------------------------------------------:|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:-----------:|
-| Qwen3       |      [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen3/README.md#Megatron-Core模型训练流程)      |                                             [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen3/README_grpo.md)                                             | Coming Soon |
-| QwQ         |       [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwq/README.md#Megatron-Core模型训练流程)       |                                                                                           N/A                                                                                           |     N/A     |
-| Qwen2.5-VL  |   [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen2_5_vl/README.md#Megatron-Core模型训练流程)    |                                                                                           N/A                                                                                           |     N/A     |
-| Moonlight   |  [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/moonlight/README.md#Megatron-Core-MoE模型训练流程)  |                                           [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/moonlight/README_grpo.md)                                           | Coming Soon |
-| DeepSeek-V3 | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/deepseek_v3/README.md#Megatron-Core-MoE模型训练流程) |                                         [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/deepseek_v3/README_grpo.md)                                                                                     | Coming Soon |
-| Qwen2-VL    |    [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen2_vl/README.md#Megatron-Core模型训练流程)     |                                                                                           N/A                                                                                           |     N/A     |
-| LLaVA       |   [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/llava_mcore/README.md#Megatron-Core模型训练流程)   |                                                                                           N/A                                                                                           |     N/A     |
-| Qwen2.5     |  [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen2_5/README.md#Megatron-Core-Dense模型训练流程)  |                                                                                           N/A                                                                                           |     N/A     |  
-| LLama3.1    | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/llama3_1/README.md#Megatron-Core-Dense模型训练流程)  |                                                                                           N/A                                                                                           |     N/A     |
-| LLama3      |  [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/llama3/README.md#Megatron-Core-Dense模型训练流程)   |                                                                                           N/A                                                                                           |     N/A     |
-| LLama2      |  [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/llama2/README.md#Megatron-Core-Dense模型训练流程)   |                                                                                           N/A                                                                                           |     N/A     |
-| Mistral     |     [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/mistral/README.md#Megatron-Core模型训练流程)     |                                                                                           N/A                                                                                           |     N/A     |
-| Qwen2       |      [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen2/README.md#Megatron-Core模型训练流程)      |                                                                                           N/A                                                                                           |     N/A     |
-| Qwen1.5     |  [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen1_5/README.md#Megatron-Core-Dense模型训练流程)  |                                                                                           N/A                                                                                           |     N/A     |
-| DeepSeek-V2 | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/deepseek_v2/README.md#Megatron-Core-MoE模型训练流程) |                                                                                           N/A                                                                                           |     N/A     |
+|Qwen3       |[ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen3/README.md)|[ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen3/README_grpo.md) | Coming Soon |
+|QwQ         |[ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwq/README.md)| N/A | N/A |
+|Qwen2.5-VL  |[ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen2_5_vl/README.md)| N/A | N/A |
+|Moonlight   |[ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/moonlight/README.md)|[ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/moonlight/README_grpo.md)| N/A |
+|DeepSeek-V3 |[ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/deepseek_v3/README.md)| N/A | N/A |
+|DeepSeek-R1 | N/A |[ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/deepseek_v3/README_grpo.md)| Coming Soon |
+
+
 
 ## What Is Pai-Megatron-Patch
 [English](./README.md) | 简体中文
@@ -46,6 +40,7 @@ Pai-Megatron-Patch is the … between the various open-source large models and the Megatron training acceleration engine
 - [Alibaba Cloud PAI wins double championship in FewCLUE few-shot learning based on large models](https://developer.aliyun.com/article/788081?spm=a2c6h.12873639.article-detail.17.11c5383cHpFZks&tlog=yuekan_8)
 
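 What's New:
+- **Support Context Parallel reinforcement learning training with reduced GPU memory for all Mcore models via ChatLearn** [🔥🔥 2025.08.31]
 - **Support GSPO reinforcement learning training for all Mcore models via ChatLearn** [🔥🔥 2025.08.11]
 - **Support GRPO reinforcement learning training for DeepSeek-V3-671B using Mcore+ChatLearn** [🔥🔥 2025.07.31]
 - **Support GRPO reinforcement learning training for Qwen3-235B using Mcore+ChatLearn** [🔥🔥 2025.07.20]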
diff --git a/backends/rl/ChatLearn b/backends/rl/ChatLearn
@@ -1 +1 @@
-Subproject commit 40967cf5d54abecbdb7d93619fd84566a29f01e5
+Subproject commit 45d023ca71d7812097288d6161be4c74ad51159b