Commit 57745af

fix: incorrect paths in document and bugs in dynamic scheduling (RLinf#303)
Signed-off-by: Lin-xs <1833080950@qq.com>
1 parent 3945e2d commit 57745af

File tree

20 files changed: +36 -114 lines changed

.github/CODEOWNERS

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
 /docker @andylin-hao

 /examples/embodiment @guozhen1997
-/examples/math @guozhen1997 @andylin-hao @Lin-xs
+/examples/reasoning @guozhen1997 @andylin-hao @Lin-xs

 /ray_utils @Lin-xs
 /requirements @andylin-hao

docs/source-en/rst_source/examples/reasoning.rst

Lines changed: 3 additions & 3 deletions
@@ -102,8 +102,8 @@ Before launching, check the configuration file. Key fields include:

 Recommended configurations can be found in:

-- ``examples/math/config/qwen2.5-1.5b-grpo-megatron.yaml``
-- ``examples/math/config/qwen2.5-7b-grpo-megatron.yaml``
+- ``examples/reasoning/config/math/qwen2.5-1.5b-grpo-megatron.yaml``
+- ``examples/reasoning/config/math/qwen2.5-7b-grpo-megatron.yaml``

 **3. Launch Command**

@@ -118,7 +118,7 @@ Run the following commands to start the Ray cluster and begin training:
     if [ "$RANK" -eq 0 ]; then
         bash check_ray.sh 128; # set to number of accelerators/GPUs in the cluster
         cd /path_to_RLinf;
-        bash examples/math/qwen2.5/run_main_math_grpo_megatron.sh grpo-1.5b-megatron # change config file
+        bash examples/reasoning/run_main_grpo_math.sh qwen2.5-1.5b-grpo-megatron # change config file
     else
         if [ "$RANK" -eq 1 ]; then
             sleep 3m

docs/source-en/rst_source/start/distribute.rst

Lines changed: 5 additions & 5 deletions
@@ -84,7 +84,7 @@ Edit the sample YAML:

 .. code-block:: yaml

-    # examples/math/config/qwen2.5-1.5b-grpo-megatron.yaml
+    # examples/reasoning/config/math/qwen2.5-1.5b-grpo-megatron.yaml
     cluster:
       num_nodes: 4 # adapt to your cluster
       component_placement:

@@ -94,19 +94,19 @@ Launch from the head node:

 .. code-block:: bash

-    bash examples/math/run_main_math_grpo_megatron.sh \
+    bash examples/reasoning/run_main_grpo_math.sh \
         qwen2.5-1.5b-grpo-megatron


 Disaggregated
 ^^^^^^^^^^^^^^^^^^

 Different stages receive disjoint GPU ranges,
-allowing fine-grained pipeliningng. Edit the pipeline YAML:
+allowing fine-grained pipelining. Edit the pipeline YAML:

 .. code-block:: yaml

-    # examples/math/config/qwen2.5-1.5b-grpo-megatron-pipeline.yaml
+    # examples/reasoning/config/math/qwen2.5-1.5b-grpo-megatron-pipeline.yaml
     cluster:
       num_nodes: 4
       component_placement:

@@ -122,5 +122,5 @@ Start the job:

 .. code-block:: bash

-    bash examples/math/run_main_math_pipeline_grpo_megatron.sh \
+    bash examples/reasoning/run_main_grpo_math.sh \
         qwen2.5-1.5b-grpo-megatron-pipeline

docs/source-en/rst_source/start/llm.rst

Lines changed: 2 additions & 2 deletions
@@ -49,7 +49,7 @@ Launch Training
 For user convenience, our configuration file is set up to run with a single GPU by default.
 However, if you have multiple GPUs and wish to accelerate the quickstart process,
 we highly recommend updating the following configuration option in
-``./examples/math/config/qwen2.5-1.5b-single-gpu.yaml``:
+``./examples/reasoning/config/math/qwen2.5-1.5b-single-gpu.yaml``:
 ``cluster.component_placement``.


@@ -75,7 +75,7 @@ After these modifications, launch the following script to start training!

 .. code-block:: bash

-    bash examples/math/run_main_math_grpo_megatron.sh qwen2.5-1.5b-single-gpu
+    bash examples/reasoning/run_main_grpo_math.sh qwen2.5-1.5b-single-gpu

 **Step 3: View the results:**


docs/source-en/rst_source/tutorials/advance/version.rst

Lines changed: 2 additions & 42 deletions
@@ -6,7 +6,7 @@ reinforcement-learning pipeline. For the current release **SGLang and vLLM** is

 .. note::

-   RLinf is compatible with **SGLang 0.4.4 → 0.4.9**, **vLLM 0.8.5 → 0.8.5.post1**.
+   RLinf is compatible with **SGLang 0.4.4 → 0.5.2**, **vLLM 0.8.5 → 0.8.5.post1**.
    No manual patching is required – the framework detects the installed
    version and loads the matching shim automatically.

@@ -35,7 +35,7 @@ Install via pip
    pip install sglang==0.4.8

    # Latest supported
-   pip install sglang==0.4.9
+   pip install sglang==0.5.2

    # Install vLLM
    pip install vllm==0.8.5

@@ -106,43 +106,3 @@ Install from Source
    cuda_graph_max_bs: 128 # the maximum batch size for cuda graph. If the batch size is larger than this, cuda graph will not be used.

    ...
-
-
-Internal Version Routing
-------------------------
-
-Directory layout::
-
-    rlinf/hybrid_engines/sglang/
-    ├── __init__.py               # Version detection and routing
-    ├── sglang_worker.py          # Main worker implementation
-    ├── sglang_0_4_4/             # SGLang 0.4.4 specific implementation
-    │   ├── __init__.py
-    │   ├── io_struct.py          # I/O structures for 0.4.4
-    │   ├── sgl_engine.py         # Engine implementation for 0.4.4
-    │   ├── sgl_scheduler.py      # Scheduler for 0.4.4
-    │   └── tokenizer_manager.py  # Tokenizer management for 0.4.4
-    └── sglang_0_4_x/             # Future version implementations
-        └── ...
-
-The loader in ``__init__.py`` resolves the installed package:
-
-.. code-block:: python
-
-    from importlib.metadata import PackageNotFoundError, version
-
-    def get_version(pkg):
-        try:
-            return version(pkg)
-        except PackageNotFoundError:
-            return None
-
-    package_name = "sglang"
-    package_version = get_version(package_name)
-
-    if package_version == "0.4.4":
-        sglang_version = "0.4.4"
-        from .sglang_0_4_4 import io_struct
-        from .sglang_0_4_4.sgl_engine import Engine
-    else:
-        raise ValueError(f"sglang version {package_version} not supported")

docs/source-en/rst_source/tutorials/extend/new_model_megatron.rst

Lines changed: 1 addition & 1 deletion
@@ -468,7 +468,7 @@ Below is an example YAML configuration file for the qwen2.5 model family.

 After adapting your new model, you can refer to this YAML configuration file and make appropriate modifications.

-**File:** ``examples/math/config/qwen2.5-1.5b-grpo-megatron.yaml``
+**File:** ``examples/reasoning/config/math/qwen2.5-1.5b-grpo-megatron.yaml``

 Set Megatron parameters used by RLinf.


docs/source-en/rst_source/tutorials/mode/collocated.rst

Lines changed: 1 addition & 1 deletion
@@ -59,4 +59,4 @@ Given the above placement configuration, users can use proper `ComponentPlacemen
     )

 `ModelParallelComponentPlacement` supports two types of placement: collocated and disaggregated. More importantly, it deals with rank arrangement that allows efficient model weight update from training to rollout. It parses the configuration and generates placements for different components. The generated placement is then enforced during worker launching.
-Refer to `Math RL training python script <https://github.com/RLinf/RLinf/blob/main/examples/math/main_math.py>`_ for the complete code.
+Refer to `Math RL training python script <https://github.com/RLinf/RLinf/blob/main/examples/reasoning/main_grpo.py>`_ for the complete code.
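The parse-then-enforce idea in collocated.rst (the YAML placement block is parsed into per-component GPU assignments before workers launch) can be illustrated with a toy example. `parse_placement` and the "start-end" range strings are hypothetical stand-ins, not the actual behavior of `ModelParallelComponentPlacement`.

```python
def parse_placement(component_placement: dict[str, str]) -> dict[str, list[int]]:
    """Expand "start-end" GPU range strings into explicit rank lists.

    A bare rank like "3" is treated as the one-element range 3-3.
    """
    placements: dict[str, list[int]] = {}
    for component, spec in component_placement.items():
        start, _, end = spec.partition("-")
        end = end or start
        placements[component] = list(range(int(start), int(end) + 1))
    return placements


# Disaggregated-style layout: rollout and actor get disjoint GPU ranges.
layout = parse_placement({"rollout": "0-3", "actor": "4-7"})
```

The useful property to check after parsing is exactly what the docs describe: collocated components share ranks, disaggregated ones must not overlap.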

docs/source-en/rst_source/tutorials/mode/disaggregated.rst

Lines changed: 1 addition & 1 deletion
@@ -35,4 +35,4 @@ Currently, whether the execution is pipelined is decided by the underlying code

 **ComponentPlacement programming**

-As described in :doc:`collocated`, the placement configuration in the yaml file can be parsed by `ComponentPlacement` and enforced on workers. Refer to `Math RL training with pipelining <https://github.com/RLinf/RLinf/blob/main/examples/math/main_math_pipeline.py>`_ for the complete code.
+As described in :doc:`collocated`, the placement configuration in the yaml file can be parsed by `ComponentPlacement` and enforced on workers. Refer to `Math RL training with pipelining <https://github.com/RLinf/RLinf/blob/main/examples/reasoning/main_grpo.py>`_ for the complete code.

docs/source-en/rst_source/tutorials/scheduler/auto-placement.rst

Lines changed: 1 addition & 1 deletion
@@ -45,7 +45,7 @@ Use the provided shell script to run the auto placement tool:

 .. code-block:: bash

-    cd examples/math
+    cd examples/reasoning
     ./run_placement_autotune.sh [config_name]

 Where ``config_name`` is the name of your configuration file.

docs/source-en/rst_source/tutorials/user/flow.rst

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@ For example:
 - Configs for training a **VLA** agent in embodied tasks live under
   ``examples/embodiment/config``.
 - Configs for training an **LLM** on math reasoning live under
-  ``examples/math/config``.
+  ``examples/reasoning/config/math``.

 As a starting point, we recommend getting familiar with the YAML structure of
 these examples, then iterating toward your custom task. Key options include

0 commit comments