Commit 30cbc31

docs(quickstart): modified quickStart documentation (RLinf#430)
* docs modified

Signed-off-by: liyanghao <liyanghao@infini-ai.com>
1 parent ce7d13f commit 30cbc31

File tree

8 files changed: +227, -187 lines


docs/source-en/rst_source/start/installation.rst

Lines changed: 1 addition & 1 deletion
@@ -79,7 +79,7 @@ We provide two official Docker images optimized for different backend configurations

 - **Math reasoning with Megatron + SGLang/vLLM**:

-  - ``rlinf/rlinf:math-rlinf0.1-torch2.5.1-sglang0.4.4-vllm0.7.1-megatron0.11.0-te2.1`` (used for enhancing LLM reasoning on MATH tasks)
+  - ``rlinf/rlinf:math-rlinf0.1-torch2.6.0-sglang0.4.6-vllm0.8.5-megatron0.13.0-te2.1`` (used for enhancing LLM reasoning on MATH tasks)

 - **Embodied with FSDP + Huggingface**:

docs/source-en/rst_source/start/llm-eval.rst

Lines changed: 47 additions & 48 deletions
@@ -3,10 +3,12 @@ Evaluation 2: Reasoner Scenario

 Introduction
 ------------
-We provide an integrated evaluation toolkit for long chain-of-thought (CoT) mathematical reasoning.
-The `toolkit <https://github.com/RLinf/LLMEvalKit>`_ includes both code and datasets, allowing researchers to benchmark trained LLMs on math-related reasoning tasks.

-**Acknowledgements:** This evaluation toolkit is adapted from `Qwen2.5-Math <https://github.com/QwenLM/Qwen2.5-Math>`_.
+We provide an integrated evaluation toolkit for long chain-of-thought (CoT) mathematical reasoning tasks.
+The `toolkit <https://github.com/RLinf/LLMEvalKit>`_ includes both code and datasets,
+making it convenient for researchers to evaluate trained large language models on mathematical reasoning.
+
+**Acknowledgements:** This evaluation toolkit is adapted from the `Qwen2.5-Math <https://github.com/QwenLM/Qwen2.5-Math>`_ project.

 Environment Setup
 -----------------
@@ -16,7 +18,7 @@ First, clone the repository:

    git clone https://github.com/RLinf/LLMEvalKit.git

-To use the package, install the required dependencies:
+Install dependencies:

 .. code-block:: bash
@@ -30,27 +32,25 @@ If you are using our Docker image, you only need to additionally install:

    pip install timeout-decorator

 Quick Start
------------
+-----------------

-Step 1: Convert Checkpoints
+Model Conversion
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
+During training, models are saved in Megatron format. You can use the conversion scripts located at ``RLinf/toolkits/ckpt_convertor/`` to convert them to Huggingface format.

-Checkpoints saved during model training are in Megatron format. To facilitate evaluation, you can convert them to Huggingface format using the provided conversion scripts located in ``toolkits/ckpt_convertor/``.
-
-You have two options for using the scripts:
+You have two ways to use the scripts:

-**Method 1: Edit the Script Files**
+**Method 1: Edit the script files**

-Manually open either ``mg2hf_7b.sh`` or ``mg2hf_1.5b.sh`` and set the following variables to your desired locations.
+Manually open ``mg2hf_7b.sh`` or ``mg2hf_1.5b.sh``, and set the following variables to your desired paths.

-1. ``CKPT_PATH_MG``: Megatron checkpoint path, e.g., ``results/run_name/checkpoints/global_step_xx/actor/``;
-2. ``CKPT_PATH_HF``: Huggingface target path, any place you like;
-3. ``CKPT_PATH_ORIGINAL_HF``: the path to the base model checkpoint, e.g., ``/path/to/DeepSeek-R1-Distill-Qwen-1.5B``.
+1. ``CKPT_PATH_MG`` (Megatron checkpoint path, e.g., ``results/run_name/checkpoints/global_step_xx/actor/``),
+2. ``CKPT_PATH_HF`` (Huggingface target path, any path), and
+3. ``CKPT_PATH_ORIGINAL_HF`` (base model checkpoint used for initializing training, e.g., ``/path/to/DeepSeek-R1-Distill-Qwen-1.5B``)

-**Method 2: Command-Line Arguments**
-
-A more flexible approach is to pass the paths directly as command-line arguments.
+**Method 2: Command-line arguments**

+A more flexible approach is to pass paths directly through command-line arguments.

 .. code-block:: bash

    # For 1.5B models
@@ -59,24 +59,20 @@ A more flexible approach is to pass the paths directly as command-line arguments

    # For 7B models
    bash mg2hf_7b.sh /path/to/megatron_checkpoint /target/path/to/huggingface_checkpoint /path/to/base_model_checkpoint

-Step 2: Run Evaluation
+Run Evaluation Script
 ^^^^^^^^^^^^^^^^^^^^^^

-Once your checkpoints are converted, you can run evaluations.
-
-**Single Dataset Evaluation**
-
-To evaluate the model on a single dataset, use the following command. Make sure to replace the placeholder paths and variables with your own.
+If you want to run evaluation on a **single dataset**, you can execute the following command:

 .. code-block:: bash

-   MODEL_NAME_OR_PATH=/model/path  # replace with your model path
+   MODEL_NAME_OR_PATH=/model/path  # Replace with your model path
    OUTPUT_DIR=${MODEL_NAME_OR_PATH}/math_eval
    SPLIT="test"
    NUM_TEST_SAMPLE=-1
    export CUDA_VISIBLE_DEVICES="0"
-   DATA_NAME="aime24"  # options: aime24, aime25, gpqa_diamond
+   DATA_NAME="aime24"  # Options include: aime24, aime25, gpqa_diamond
    PROMPT_TYPE="r1-distilled-qwen"
    # NOTE:
    # for aime24 and aime25, use PROMPT_TYPE="r1-distilled-qwen";
@@ -93,25 +89,25 @@ To evaluate the model on a single dataset, use the following command. Make sure

       --use_vllm \
       --save_outputs

-**Batch Evaluation**
-
-For an automated batch evaluation on multiple datasets, use the ``main_eval.sh`` script. This will sequentially evaluate the model on the AIME24, AIME25, and GPQA-diamond datasets.
+For **batch evaluation**, you can run the ``main_eval.sh`` script. This script will sequentially evaluate the model on the AIME24, AIME25, and GPQA-diamond datasets.

 .. code-block:: bash

-   bash main_eval.sh /path/to/model_checkpoint
+   bash LLMEvalKit/evaluation/main_eval.sh /path/to/model_checkpoint
+
+You can specify ``CUDA_VISIBLE_DEVICES`` in the script for more flexible GPU management.

-Note: you can manually change ``CUDA_VISIBLE_DEVICES`` within the ``main_eval.sh`` script to manage GPU usage flexibly.
+Evaluation Results
+------------------------------

-Results
--------
-The results are printed to the console and stored in ``OUTPUT_DIR``.
-Stored outputs include:
+Results will be printed in the terminal and saved in ``OUTPUT_DIR``. Batch evaluation defaults to saving in the ``LLMEvalKit/evaluation/outputs`` directory.
+The results include:

-1. Metadata (``xx_metrics.json``): summary statistics.
-2. Full model outputs (``xx.jsonl``): complete reasoning traces and predictions.
+1. Metadata (``xx_metrics.json``): statistical summary
+2. Complete model outputs (``xx.jsonl``): includes complete reasoning process and prediction results

-Example Metadata:
+Metadata example:

 .. code-block:: javascript
@@ -125,9 +121,9 @@ Example Metadata:

       "time_use_in_minite": "62:06"
    }

-``acc`` reports the **average accuracy across all sampled responses**, which serves as the main evaluation metric.
+The field ``acc`` represents the **average accuracy across all sampled responses**, which is the main evaluation metric.

-Example Model Output:
+Model output example:

 .. code-block:: javascript
@@ -144,8 +140,9 @@ Example Model Output:

       "score": [true]  // whether the extracted answers are correct
    }

-Datasets
---------
+Supported Datasets
+------------------------------
+
 The toolkit currently supports the following evaluation datasets:

 .. list-table:: Supported Datasets
@@ -155,17 +152,19 @@ The toolkit currently supports the following evaluation datasets:

    * - Dataset
      - Description
    * - ``aime24``
-     - Problems from the **American Invitational Mathematics Examination (AIME) 2024**, focusing on high-school Olympiad-level mathematics reasoning.
+     - Problems from **AIME 2024** (American Invitational Mathematics Examination), focusing on high-school Olympiad-level mathematical reasoning.
    * - ``aime25``
-     - Problems from the **AIME 2025**, same format as AIME24 but with different test set.
+     - Problems from **AIME 2025**, same format as AIME24 but with a different test set.
    * - ``gpqa_diamond``
-     - A subset of **GPQA (Graduate-level Google-Proof Q&A)** with the most challenging questions (Diamond split). Covers multi-disciplinary topics (e.g., mathematics, physics, computer science) requiring deep reasoning beyond memorization.
+     - The most challenging subset (Diamond split) of **GPQA (Graduate-level Google-Proof Q&A)**,
+       containing cross-disciplinary problems (e.g., mathematics, physics, computer science) that require deep reasoning capabilities rather than memorization.
+
+Parameter Configuration
+------------------------------

-Configuration
--------------
-The main configurable parameters are:
+The main configurable parameters are as follows:

-.. list-table:: Configuration Parameters
+.. list-table:: Configuration Parameter Description
    :header-rows: 1
    :widths: 20 80
Lines changed: 39 additions & 36 deletions
@@ -1,18 +1,18 @@
 Quickstart 2: GRPO Training of LLMs on MATH
 ==============================================

-This quick-start walks you through training
-`DeepSeek-R1-Distill-Qwen-1.5B <https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B>`_
-on the
+This quick-start tutorial will guide you through training the
+`DeepSeek-R1-Distill-Qwen-1.5B <https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B>`_ model
+on the math reasoning dataset
 `AReaL-boba <https://huggingface.co/datasets/inclusionAI/AReaL-boba-Data>`_
-math-reasoning dataset with **RLinf**.
-For maximum simplicity, you can run the following scripts within a single GPU.
+using **RLinf**.
+
+To simplify the process, you can directly run the following scripts on a single GPU to complete the training.

 Dataset Introduction
 --------------------

-*AReaL-boba* covers a broad spectrum of mathematical and logical
-problems. A example is shown below.
+*AReaL-boba* covers a variety of mathematical and logical reasoning problems. Below is an example:

 .. code-block:: text
@@ -30,9 +30,9 @@ problems. A example is shown below.

    [ "\\boxed{e}" ]

 Launch Training
------------------
+--------------------

-**Step 1: Download the model and the datasets:**
+**Step 1: Download the model and dataset**

 .. code-block:: bash
@@ -44,57 +44,60 @@ Launch Training

    hf download inclusionAI/AReaL-boba-Data --repo-type=dataset \
       --local-dir /path/to/dataset/boba

-**Step 2: Execute the provided launch script:**
-
-For user convenience, our configuration file is set up to run with a single GPU by default.
-However, if you have multiple GPUs and wish to accelerate the quickstart process,
-we highly recommend updating the following configuration option in
-``./examples/reasoning/config/math/qwen2.5-1.5b-single-gpu.yaml``:
-``cluster.component_placement``.
+**Step 2: Modify the configuration file**

-You can set it to **0-1**, **0-3** or **0-7** to use 2/4/8 GPUs depending on your available resources.
-Refer to :doc:`../tutorials/user/yaml` for a more detailed explanation of the placement configuration.

-.. code-block:: yaml
+Before running the script, please modify the ``./examples/reasoning/config/math/qwen2.5-1.5b-single-gpu.yaml`` file
+according to your model and dataset download paths.

-   cluster:
-      num_nodes: 1
-      component_placement:
-         actor,rollout: 0
+Specifically, set the model configuration to the path where the ``DeepSeek-R1-Distill-Qwen-1.5B`` checkpoint is located, and set the data configuration to the path where the ``AReaL-boba-106k.jsonl`` dataset is located.

-Finally, before running the script, you need to modify the corresponding configuration options in the YAML file according to the download paths of the model and dataset. Specifically, update:
-
-- ``rollout.model.model_path``
+- ``rollout.model.model_path``
 - ``data.train_data_paths``
 - ``data.val_data_paths``
 - ``actor.tokenizer.tokenizer_model``

-After these modifications, launch the following script to start training!
+**Step 3: Launch training**

+After completing the above modifications, run the following script to launch training:

 .. code-block:: bash

    bash examples/reasoning/run_main_grpo_math.sh qwen2.5-1.5b-single-gpu

-**Step 3: View the results:**
-
-* Final checkpoints & metrics: ``../results``
+View Training Results
+--------------------------------

-* TensorBoard summaries: ``../results/grpo-1.5b/tensorboard/``
-  Launch with:
+- Final model and metrics files are located at: ``../results``
+- TensorBoard logs are located at: ``../results/grpo-1.5b/tensorboard/``
+  Launch as follows:

 .. code-block:: bash

-   tensorboard --logdir ../results/grpo-1.5b/tensorboard/ --port 6006
+   tensorboard --host 0.0.0.0 --logdir ../results/grpo-1.5b/tensorboard/

+After opening TensorBoard, you will see the following interface:
+Recommended key metrics to focus on include:

-Open TensorBoard, and you should see an interface similar to the one below.
-Key metrics to pay attention to include
-``rollout/response_length`` and ``rollout/reward_scores``.
+- ``rollout/response_length``
+- ``rollout/reward_scores``

 .. raw:: html

    <img src="https://github.com/RLinf/misc/raw/main/pic/math-quickstart-metric.jpg" width="800"/>

+.. note::
+   For user convenience, the configuration file we provide supports single GPU training by default.
+   If you have multiple GPUs and wish to speed up the training process,
+   we recommend that you modify the parameter ``cluster.component_placement`` in the configuration file.
+
+   You can set this item to **0-1**, **0-3** or **0-7** to use 2/4/8 GPUs depending on your actual resources.
+   See :doc:`../tutorials/user/yaml` for more detailed instructions on Placement configuration.
+
+   .. code-block:: yaml
+
+      cluster:
+         num_nodes: 1
+         component_placement:
+            actor,rollout,reward: 0-3
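The quickstart's Step 2 asks the user to point four YAML keys at local model and dataset paths. A pre-launch sanity check can catch typos in those paths before training starts. This is a minimal sketch, not part of RLinf: it takes the config as a nested dict (load the YAML with e.g. PyYAML first), and the dotted key names are the four listed in the diff above.

```python
import os

# The four keys the quickstart says to update, as dotted paths.
PATH_KEYS = [
    "rollout.model.model_path",
    "data.train_data_paths",
    "data.val_data_paths",
    "actor.tokenizer.tokenizer_model",
]

def dig(cfg, dotted):
    """Walk a nested dict by a dotted key path; return None if any level is missing."""
    node = cfg
    for part in dotted.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node

def missing_paths(cfg):
    """Return (key, value) pairs whose configured path does not exist on disk.
    List-valued entries (e.g. multiple data paths) are checked element-wise."""
    problems = []
    for key in PATH_KEYS:
        value = dig(cfg, key)
        values = value if isinstance(value, list) else [value]
        for v in values:
            if not isinstance(v, str) or not os.path.exists(v):
                problems.append((key, v))
    return problems
```

An empty return value means all four paths resolve; anything else lists exactly which keys still point at placeholder or mistyped locations.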
