.. docs/source-en/rst_source/start/llm-eval.rst

Evaluation 2: Reasoner Scenario
===============================

Introduction
------------

We provide an integrated evaluation toolkit for long chain-of-thought (CoT) mathematical reasoning tasks.
The `toolkit <https://github.com/RLinf/LLMEvalKit>`_ includes both code and datasets,
making it convenient for researchers to evaluate trained large language models on mathematical reasoning.

**Acknowledgements:** This evaluation toolkit is adapted from the `Qwen2.5-Math <https://github.com/QwenLM/Qwen2.5-Math>`_ project.

Environment Setup
-----------------

First, clone the repository:

.. code-block:: bash

   git clone https://github.com/RLinf/LLMEvalKit.git

Install dependencies:

.. code-block:: bash

   ...

If you are using our Docker image, you only need to additionally install:

.. code-block:: bash

   pip install timeout-decorator

Quick Start
-----------------

Model Conversion
^^^^^^^^^^^^^^^^^^^^^^^^^^^

During training, models are saved in Megatron format. You can use the conversion scripts located at ``RLinf/toolkits/ckpt_convertor/`` to convert them to Huggingface format.

You have two ways to use the scripts:

**Method 1: Edit the script files**

Manually open ``mg2hf_7b.sh`` or ``mg2hf_1.5b.sh``, and set the following variables to your desired paths.
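As an illustrative sketch only (the variable names below are hypothetical; consult the actual scripts for the exact ones), the edit typically amounts to pointing an input path at the Megatron checkpoint and an output path at the desired Huggingface location:

.. code-block:: bash

   # Hypothetical variable names -- check mg2hf_7b.sh / mg2hf_1.5b.sh for the real ones.
   MEGATRON_CKPT_DIR=/path/to/megatron/checkpoint   # Megatron-format input checkpoint
   HF_SAVE_DIR=/path/to/output/hf_model             # Huggingface-format output directory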

.. code-block:: bash

   # ...
   # for aime24 and aime25, use PROMPT_TYPE="r1-distilled-qwen";
   # ...

To evaluate the model on a single dataset, use the following command:

.. code-block:: bash

   ... \
       --use_vllm \
       --save_outputs

For **batch evaluation**, you can run the ``main_eval.sh`` script. This script will sequentially evaluate the model on the AIME24, AIME25, and GPQA-diamond datasets.
You can specify ``CUDA_VISIBLE_DEVICES`` in the script for more flexible GPU management.
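For example, to restrict the batch run to the first four GPUs, you could set the variable before invoking the script (a sketch; the invocation assumes you are in the directory containing ``main_eval.sh``):

.. code-block:: bash

   # Make only GPUs 0-3 visible to the evaluation processes:
   export CUDA_VISIBLE_DEVICES=0,1,2,3
   # bash main_eval.sh   # then launch the batch evaluation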

Evaluation Results
------------------------------

Results will be printed in the terminal and saved in ``OUTPUT_DIR``. Batch evaluation defaults to saving in the ``LLMEvalKit/evaluation/outputs`` directory.

Stored outputs include:

2. Complete model outputs (``xx.jsonl``): includes complete reasoning process and prediction results

Metadata example:

.. code-block:: javascript

   {
     ...
     "time_use_in_minite":"62:06"
   }

The field ``acc`` represents the **average accuracy across all sampled responses**, which is the main evaluation metric.

Model output example:

.. code-block:: javascript

   {
     ...
     "score": [true] // whether the extracted answers are correct
   }
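The relationship between the per-sample ``score`` lists and the aggregate ``acc`` metric can be sketched in a few lines of Python (the aggregation shown is an assumption for illustration, not the toolkit's exact code):

.. code-block:: python

   import json

   def average_accuracy(jsonl_lines):
       """Mean correctness over all sampled responses across all problems."""
       scores = []
       for line in jsonl_lines:
           record = json.loads(line)
           # each record's "score" holds one boolean per sampled response
           scores.extend(bool(s) for s in record["score"])
       return sum(scores) / len(scores) if scores else 0.0

   # two problems, two sampled responses each: 3 of 4 correct
   records = [
       json.dumps({"score": [True, True]}),
       json.dumps({"score": [True, False]}),
   ]
   print(average_accuracy(records))  # 0.75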

Supported Datasets
------------------------------

The toolkit currently supports the following evaluation datasets:

.. list-table:: Supported Datasets
   :header-rows: 1

   * - Dataset
     - Description
   * - ``aime24``
     - Problems from **AIME 2024** (American Invitational Mathematics Examination), focusing on high-school Olympiad-level mathematical reasoning.
   * - ``aime25``
     - Problems from **AIME 2025**, same format as AIME24 but with a different test set.
   * - ``gpqa_diamond``
     - The most challenging subset (Diamond split) of **GPQA (Graduate-level Google-Proof Q&A)**,
       containing cross-disciplinary problems (e.g., mathematics, physics, computer science) that require deep reasoning capabilities rather than memorization.

You can set it to **0-1**, **0-3** or **0-7** to use 2/4/8 GPUs depending on your available resources.
Refer to :doc:`../tutorials/user/yaml` for a more detailed explanation of the placement configuration.

Before running the script, please modify the ``./examples/reasoning/config/math/qwen2.5-1.5b-single-gpu.yaml`` file
according to your model and dataset download paths.

Specifically, set the model configuration to the path where the ``DeepSeek-R1-Distill-Qwen-1.5B`` checkpoint is located, and set the data configuration to the path where the ``AReaL-boba-106k.jsonl`` dataset is located:

- ``rollout.model.model_path``
- ``data.train_data_paths``
- ``data.val_data_paths``
- ``actor.tokenizer.tokenizer_model``

**Step 3: Launch training**

After completing the above modifications, run the following script to launch training: