4 files changed: +6 -1 lines changed
@@ -130,6 +130,7 @@ We list some algorithms supported by Trinity-RFT in the following table. For mor
 | AsymRE [[Paper](https://arxiv.org/pdf/2506.20520)] | [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/asymre_gsm8k)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/advantage_fn/asymre_advantage.py)] | `algorithm_type: asymre` |
 | CISPO [[Paper](https://arxiv.org/pdf/2506.13585)] | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/cispo_policy_loss.py)] | `algorithm_type: cispo` |
 | SAPO [[Paper](https://arxiv.org/pdf/2511.20347)] | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/sapo_policy_loss.py)] | `algorithm_type: sapo` |
+| On-Policy Distillation [[Blog](https://qwenlm.github.io/blog/tinker/)] [[Paper](https://arxiv.org/pdf/2402.10038)] | [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/on_policy_distill)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/common/workflows/on_policy_distill_workflow.py)] | `algorithm_type: on_policy_distill` |
@@ -142,7 +143,7 @@ We list some algorithms supported by Trinity-RFT in the following table. For mor
   - [Step 2: prepare dataset and model](#step-2-prepare-dataset-and-model)
   - [Step 3: configurations](#step-3-configurations)
   - [Step 4: run the RFT process](#step-4-run-the-rft-process)
-- [Contribution guide](#contribution-guide)
+- [Contribution Guide](#contribution-guide)
 - [Acknowledgements](#acknowledgements)
 - [Citation](#citation)
@@ -129,6 +129,7 @@ Trinity-RFT provides corresponding features for users with different backgrounds and goals:
 | AsymRE [[Paper](https://arxiv.org/pdf/2506.20520)] | [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/asymre_gsm8k)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/advantage_fn/asymre_advantage.py)] | `algorithm_type: asymre` |
 | CISPO [[Paper](https://arxiv.org/pdf/2506.13585)] | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/cispo_policy_loss.py)] | `algorithm_type: cispo` |
 | SAPO [[Paper](https://arxiv.org/pdf/2511.20347)] | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/sapo_policy_loss.py)] | `algorithm_type: sapo` |
+| On-Policy Distillation [[Blog](https://qwenlm.github.io/blog/tinker/)] [[Paper](https://arxiv.org/pdf/2402.10038)] | [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/on_policy_distill)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/common/workflows/on_policy_distill_workflow.py)] | `algorithm_type: on_policy_distill` |
@@ -86,6 +86,8 @@ We list some algorithms supported by Trinity-RFT in the following table. For mor
 | AsymRE [[Paper](https://arxiv.org/pdf/2506.20520)] | [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/asymre_gsm8k)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/advantage_fn/asymre_advantage.py)] | `algorithm_type: asymre` |
 | CISPO [[Paper](https://arxiv.org/pdf/2506.13585)] | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/cispo_policy_loss.py)] | `algorithm_type: cispo` |
 | SAPO [[Paper](https://arxiv.org/pdf/2511.20347)] | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/sapo_policy_loss.py)] | `algorithm_type: sapo` |
+| On-Policy Distillation [[Blog](https://qwenlm.github.io/blog/tinker/)] [[Paper](https://arxiv.org/pdf/2402.10038)] | [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/on_policy_distill)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/common/workflows/on_policy_distill_workflow.py)] | `algorithm_type: on_policy_distill` |
+
@@ -82,6 +82,7 @@ Trinity-RFT provides corresponding features for users with different backgrounds and goals:
 | AsymRE [[Paper](https://arxiv.org/pdf/2506.20520)] | [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/asymre_gsm8k)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/advantage_fn/asymre_advantage.py)] | `algorithm_type: asymre` |
 | CISPO [[Paper](https://arxiv.org/pdf/2506.13585)] | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/cispo_policy_loss.py)] | `algorithm_type: cispo` |
 | SAPO [[Paper](https://arxiv.org/pdf/2511.20347)] | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/sapo_policy_loss.py)] | `algorithm_type: sapo` |
+| On-Policy Distillation [[Blog](https://qwenlm.github.io/blog/tinker/)] [[Paper](https://arxiv.org/pdf/2402.10038)] | [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/on_policy_distill)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/common/workflows/on_policy_distill_workflow.py)] | `algorithm_type: on_policy_distill` |