@@ -69,31 +69,6 @@ state-of-the-art 7B and 32B models for mathematical reasoning. Check out our
 
 </details>
 
-## 🚀 Getting Started
-
-Our training scripts automatically download the required dataset (openai/gsm8k) and
-model (Qwen/Qwen2-1.5B-Instruct). To run on a single node:
-
-```bash
-python3 -m areal.launcher.local \
-  examples/math/gsm8k_grpo.py \
-  --config examples/math/gsm8k_grpo.yaml
-```
-
-To run on a Ray cluster with 2 nodes and 8 GPUs per node (remember to update paths in
-the YAML file to point to your shared storage):
-
-```bash
-python3 -m areal.launcher.ray \
-  examples/math/gsm8k_grpo.py \
-  --config examples/math/gsm8k_grpo.yaml \
-  cluster.n_nodes=2 \
-  cluster.n_gpus_per_node=8
-```
-
-For comprehensive setup instructions, see
-[our quickstart guide](https://inclusionai.github.io/AReaL/tutorial/quickstart.html).
-
 ## 📚 Examples
 
 | Task | Description | Performance |
@@ -108,15 +83,19 @@ For comprehensive setup instructions, see
 
 ## 🔧 Support Matrix
 
-### Algorithms
+### 🧠 Algorithms
 
-- **[GRPO](docs/algorithms/grpo.md)**
-- **[DAPO](docs/algorithms/dapo.md)**
-- **[LitePPO](docs/algorithms/litePPO.md)**
-- **[DrGRPO](docs/algorithms/dr.GRPO.md)**
-- **[RLHF Reward Modeling](examples/alignment/)**
-- **[PPO](examples/math/gsm8k_ppo.py)**
-- **[SFT](examples/math/gsm8k_sft.py)**
+| Algorithm | Documentation | Paper | Configuration |
+| --- | --- | --- | --- |
+| **GRPO** | [📖 Docs](docs/algorithms/grpo.md) | [📄 Paper](https://arxiv.org/pdf/2402.03300) | [🔗 GSM8K Example](examples/math/gsm8k_grpo.yaml) |
+| **PPO** | - | [📄 Paper](https://arxiv.org/pdf/2203.02155) | [🔗 GSM8K Example](examples/math/gsm8k_ppo.yaml) |
+| **DAPO** | [📖 Docs](docs/algorithms/dapo.md) | [📄 Paper](https://arxiv.org/abs/2503.14476) | [🔗 GSM8K Example](examples/experimental/dapo/gsm8k_dapo.py) |
+| **LitePPO** | [📖 Docs](docs/algorithms/litePPO.md) | [📄 Paper](https://arxiv.org/abs/2508.08221) | - |
+| **Dr.GRPO** | [📖 Docs](docs/algorithms/dr.GRPO.md) | [📄 Paper](https://arxiv.org/abs/2503.20783) | - |
+| **REINFORCE++** | - | [📄 Paper](https://arxiv.org/pdf/2501.03262) | [🔗 GSM8K Example](examples/math/gsm8k_reinforce.yaml) |
+| **RLOO** | [📖 Docs](docs/algorithms/rloo.md) | [📄 Paper](https://arxiv.org/pdf/2402.14740v1) | [🔗 GSM8K Example](examples/math/gsm8k_rloo.yaml) |
+| **RLHF Reward Modeling** | - | - | [🔗 RLHF Example](examples/alignment/) |
+| **SFT** | - | - | [🔗 GSM8K Example](examples/math/gsm8k_sft.py) |
 
 ### Models
 
@@ -142,16 +121,39 @@ For comprehensive setup instructions, see
 | **vLLM** | ✅ | ❓ | ❓ | ❓ | ❓ |
 | **SGLang** | ✅ | ❌ | ❌ | ✅ | ✅ |
 
-## 📖 Resources
+## 🚀 Getting Started
 
-- [Documentation](https://inclusionai.github.io/AReaL/)
-- [CLI Configurations](https://inclusionai.github.io/AReaL/cli_reference.html)
-- [Contributing](https://inclusionai.github.io/AReaL/contrib.html)
+Our training scripts automatically download the required dataset (openai/gsm8k) and
+model (Qwen/Qwen2-1.5B-Instruct). To run on a single node:
+
+```bash
+python3 -m areal.launcher.local \
+  examples/math/gsm8k_grpo.py \
+  --config examples/math/gsm8k_grpo.yaml
+```
+
+To run on a Ray cluster with 2 nodes and 8 GPUs per node (remember to update paths in
+the YAML file to point to your shared storage):
+
+```bash
+python3 -m areal.launcher.ray \
+  examples/math/gsm8k_grpo.py \
+  --config examples/math/gsm8k_grpo.yaml \
+  cluster.n_nodes=2 \
+  cluster.n_gpus_per_node=8
+```
+
+For comprehensive setup instructions, see
+[our quickstart guide](https://inclusionai.github.io/AReaL/tutorial/quickstart.html).
 
-### Quickstart
+## 📖 Resources
 
 - [Installation](https://inclusionai.github.io/AReaL/tutorial/installation.html)
 - [Quickstart](https://inclusionai.github.io/AReaL/tutorial/quickstart.html)
+- [CLI Configurations](https://inclusionai.github.io/AReaL/cli_reference.html)
+- [Debugging Best Practices](https://inclusionai.github.io/AReaL/best_practices/debugging.html)
+- [Handling OOM Issues](https://inclusionai.github.io/AReaL/best_practices/handling_oom.html)
+- [Contributing](https://inclusionai.github.io/AReaL/contrib.html)
 
 ### Code Walkthrough
 
@@ -163,11 +165,6 @@ For comprehensive setup instructions, see
 - [Customize Agentic/RLVR rollout workflows with AReaL-lite](https://inclusionai.github.io/AReaL/customization/agent.html)
 - [Customize algorithms with AReaL-lite](https://inclusionai.github.io/AReaL/customization/algorithm.html)
 
-### Advanced Usage
-
-- [Debugging Best Practices](https://inclusionai.github.io/AReaL/best_practices/debugging.html)
-- [Handling OOM Issues](https://inclusionai.github.io/AReaL/best_practices/handling_oom.html)
-
 ## 🗺️ Future Roadmap
 
 - [2025 Q3 Roadmap](https://github.com/inclusionAI/AReaL/issues/257)