
Commit ffc6cac

[Doc] add a new page with example list from the dataset perspective (#434)
1 parent dd5136a commit ffc6cac

4 files changed: +96 -0 lines changed

docs/sphinx_doc/source/index.rst

Lines changed: 1 addition & 0 deletions
@@ -44,6 +44,7 @@ Welcome to Trinity-RFT's documentation!
    tutorial/example_dpo.md
    tutorial/example_megatron.md
    tutorial/example_data_functionalities.md
+   tutorial/example_dataset_perspective.md

 .. toctree::
    :maxdepth: 2
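
For context, a minimal sketch of how the affected toctree reads after this change (the same entry is added to the Chinese index further below). The page entries are taken from the hunk above; the directive line and its :maxdepth: option are assumptions, since the actual directive sits above the changed lines and is not shown in this diff:

   .. toctree::
      :maxdepth: 2

      tutorial/example_dpo.md
      tutorial/example_megatron.md
      tutorial/example_data_functionalities.md
      tutorial/example_dataset_perspective.md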
docs/sphinx_doc/source/tutorial/example_dataset_perspective.md

Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
# Example Summary

> From the Dataset Perspective

This guide lists the examples from the dataset perspective, so you can easily see which datasets the examples cover.

| Dataset | Algorithm | Use Case | References |
| --- | --- | --- | --- |
| [openai/gsm8k](https://huggingface.co/datasets/openai/gsm8k) | GRPO | Regular RFT | [example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k), [doc](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_reasoning_basic.html) |
| | GRPO | Asynchronous training | [example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/async_gsm8k), [doc](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_async_mode.html) |
| | Multi-Step GRPO | AgentScope ReAct agent training | [example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/agentscope_react), [doc](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_react.html) |
| | AsymRE | Regular RFT | [example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/asymre_gsm8k) |
| | CISPO | Regular RFT | [example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/cispo_gsm8k) |
| | GRPO | Training with prioritized tasks | [example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k_task_pipeline), [doc](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_data_functionalities.html#example-data-processor-for-task-pipeline) |
| | GRPO | Training with reward reshaping on experiences | [example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k_experience_pipeline), [doc](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_data_functionalities.html#example-data-processor-for-experience-pipeline) |
| | GRPO | Training with RULER (Relative Universal LLM-Elicited Rewards) | [example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k_ruler) |
| | GRPO | Training a policy model as its own reward model | [example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k_trainable_ruler) |
| | GRPO | Training using LoRA | [example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_lora_gsm8k) |
| | OPMD | Off-policy RFT | [example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/opmd_gsm8k), [doc](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_reasoning_advanced.html) |
| | REC | Training with group-relative reinforce variants | [example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/rec_gsm8k) |
| | sPPO | Training with sPPO algorithm | [example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/sppo_gsm8k) |
| | TOPR | Tapered off-policy RFT | [example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/topr_gsm8k) |
| Math category tasks | GRPO | Training with rewards from RM-Gallery | [example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_math) |
| | AsymRE | Regular RFT | [example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/asymre_math) |
| | MIX | Training with "expert" data generated by a more advanced LLM | [example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/mix_math), [doc](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_mix_algo.html) |
| [ALFWorld](https://github.com/alfworld/alfworld) | GRPO | Concatenated multi-turn RFT | [example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_alfworld), [doc](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_multi_turn.html) |
| | Multi-Step GRPO | General multi-step RFT | [example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_alfworld_general_multi_step), [doc](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_step_wise.html) |
| [SciWorld](https://github.com/allenai/ScienceWorld) | GRPO | Concatenated multi-turn RFT | [example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_sciworld) |
| [WebShop](https://github.com/princeton-nlp/WebShop) | GRPO | Concatenated multi-turn RFT | [example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_webshop), [doc](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_multi_turn.html) |
| [callanwu/WebWalkerQA](https://huggingface.co/datasets/callanwu/WebWalkerQA) | Multi-Step GRPO | Multi-turn web search agent training | [example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/agentscope_websearch) |
| [corbt/enron-emails](https://huggingface.co/datasets/corbt/enron-emails) | Multi-Step GRPO | Multi-turn email search agent training | [example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_email_search), [doc](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_search_email.html) |
| [open-r1/DAPO-Math-17k-Processed](https://huggingface.co/datasets/open-r1/DAPO-Math-17k-Processed) | GRPO | Regular RFT | [example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/dapo_math) |
| [LLM360/guru-RL-92k](https://huggingface.co/datasets/LLM360/guru-RL-92k) | GRPO | Training with Bayesian online task selection | [example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/bots) |
| [Frozen Lake](https://gymnasium.farama.org/environments/toy_text/frozen_lake/) | GRPO | Concatenated multi-turn RFT | [example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_frozen_lake) |
| [anisha2102/RaR-Medicine](https://huggingface.co/datasets/anisha2102/RaR-Medicine) | GRPO | Training with rewards from LLM judge and rubrics for a non-verifiable medicine QA task | [example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_rubric_as_reward) |
| [Team-ACE/ToolACE](https://huggingface.co/datasets/Team-ACE/ToolACE) | GRPO | Regular RFT for tool calling | [example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_toolcall) |
| [hiyouga/geometry3k](https://huggingface.co/datasets/hiyouga/geometry3k) | GRPO | Regular RFT for VLM | [example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_vlm) |
| | MIX | Training with "expert" data generated by a more advanced LLM | [example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/mix_vlm) |
| [datajuicer/RealMedConv](https://huggingface.co/datasets/datajuicer/RealMedConv) | GRPO | Regular RFT for learning to ask in a proactive way | [example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/learn_to_ask) |
| [datajuicer/Trinity-ToolAce-RL-split](https://huggingface.co/datasets/datajuicer/Trinity-ToolAce-RL-split) | CHORD | Training with dynamic SFT + RL integration | [example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/mix_chord) |
| [datajuicer/Trinity-ToolAce-SFT-split](https://huggingface.co/datasets/datajuicer/Trinity-ToolAce-SFT-split) | CHORD | Training with dynamic SFT + RL integration | [example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/mix_chord) |
| [Jiayi-Pan/Countdown-Tasks-3to4](https://huggingface.co/datasets/Jiayi-Pan/Countdown-Tasks-3to4) | PPO | Training based on the critic model | [example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/ppo_countdown) |
| | PPO | Training with Megatron-LM as the backend | [example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/ppo_countdown_megatron) |
| | PPO | Training with experience replay | [example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/ppo_countdown_exp_replay) |
| [open-r1/Mixture-of-Thoughts](https://huggingface.co/datasets/open-r1/Mixture-of-Thoughts) | SFT | Regular SFT | [example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/sft_mot), [doc](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_dpo.html#configuration-for-sft) |
| [HumanLLMs/Human-Like-DPO-Dataset](https://huggingface.co/datasets/HumanLLMs/Human-Like-DPO-Dataset) | DPO | Training based on prepared human preferences | [example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/dpo_humanlike), [doc](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_dpo.html) |
| toy dataset | DPO | Training based on human-in-the-loop real-time preference annotation | [example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/dpo_human_in_the_loop), [doc](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_data_functionalities.html#example-human-in-the-loop) |

docs/sphinx_doc/source_zh/index.rst

Lines changed: 1 addition & 0 deletions
@@ -42,6 +42,7 @@
    tutorial/example_dpo.md
    tutorial/example_megatron.md
    tutorial/example_data_functionalities.md
+   tutorial/example_dataset_perspective.md

 .. toctree::
    :maxdepth: 2
