Commit 12adce8

Update main readme with arxiv v2 (#121)
1 parent 675ff5b commit 12adce8

File tree

10 files changed (+300 −179 lines)

README.md

Lines changed: 138 additions & 83 deletions
7 binary image files updated (1.01 MB, 59 KB, 205 KB, 761 KB, 134 KB, 379 KB, 668 KB); previews not rendered.

docs/sphinx_doc/source/main.md

Lines changed: 157 additions & 96 deletions

docs/sphinx_doc/source/tutorial/example_mix_algo.md

Lines changed: 5 additions & 0 deletions
@@ -20,6 +20,11 @@ $$

The first term corresponds to the standard GRPO objective, which aims to maximize the expected reward. The last term is an auxiliary objective defined on expert data, encouraging the policy to imitate expert behavior. $\mu$ is a weighting factor that controls the relative importance of the two terms.

Added in this commit:

A visualization of this pipeline is as follows:

![](../../assets/trinity-mix.png)
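The combined objective described above can be sketched in a few lines. The following is a minimal, framework-agnostic illustration in plain Python with hypothetical names (`mixed_objective_loss`, the default `mu`), not the project's actual implementation:

```python
def mixed_objective_loss(grpo_loss, expert_logprobs, mu=0.1):
    """Hypothetical sketch: GRPO loss plus a mu-weighted imitation term.

    grpo_loss: scalar loss from the standard GRPO objective.
    expert_logprobs: log-probabilities the current policy assigns to
        expert response tokens.
    mu: weighting factor controlling the relative importance of the two
        terms (default chosen arbitrarily for illustration).
    """
    # Auxiliary objective on expert data: negative mean log-likelihood,
    # so minimizing it encourages the policy to imitate expert behavior.
    sft_loss = -sum(expert_logprobs) / len(expert_logprobs)
    return grpo_loss + mu * sft_loss
```

In a real trainer both terms would be differentiable tensors produced by the policy network; the point here is only the `grpo + mu * imitation` composition.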
## Step 0: Prepare the Expert Data
We prompt a powerful LLM to generate responses with a chain-of-thought (CoT) process for a set of pre-defined questions. The collected data are treated as experiences from an expert. We store them in a `jsonl` file, `expert_data.jsonl`, with the following format:
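As a rough illustration, one way to write and read back such a JSON Lines file in Python is sketched below. The `question` and `response` keys are hypothetical placeholders, since the tutorial's actual schema is not shown in this excerpt:

```python
import json

# Hypothetical expert records; the real expert_data.jsonl schema may differ.
records = [
    {"question": "What is 2 + 3?",
     "response": "First add the numbers: 2 + 3 = 5. The answer is 5."},
]

# One JSON object per line is the jsonl convention.
with open("expert_data.jsonl", "w", encoding="utf-8") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")

# Reading it back yields one dict per line.
with open("expert_data.jsonl", encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f]
```

Each line is an independent JSON object, so the file can be streamed record by record during training without loading everything into memory.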
