agentscope-ai · pan-x-c · Jul 15, 2025 · Jul 14, 2025 · Jul 14, 2025 · Jul 14, 2025
diff --git a/README.md b/README.md
diff --git a/docs/sphinx_doc/assets/config-manager.png b/docs/sphinx_doc/assets/config-manager.png
diff --git a/docs/sphinx_doc/assets/trinity-architecture.pdf b/docs/sphinx_doc/assets/trinity-architecture.pdf
diff --git a/docs/sphinx_doc/assets/trinity-architecture.png b/docs/sphinx_doc/assets/trinity-architecture.png
diff --git a/docs/sphinx_doc/assets/trinity-data-pipeline-buffer.png b/docs/sphinx_doc/assets/trinity-data-pipeline-buffer.png
diff --git a/docs/sphinx_doc/assets/trinity-data-pipelines.png b/docs/sphinx_doc/assets/trinity-data-pipelines.png
diff --git a/docs/sphinx_doc/assets/trinity-mix.png b/docs/sphinx_doc/assets/trinity-mix.png
diff --git a/docs/sphinx_doc/assets/trinity-mode.png b/docs/sphinx_doc/assets/trinity-mode.png
diff --git a/docs/sphinx_doc/source/main.md b/docs/sphinx_doc/source/main.md
diff --git a/docs/sphinx_doc/source/tutorial/example_mix_algo.md b/docs/sphinx_doc/source/tutorial/example_mix_algo.md
@@ -20,6 +20,11 @@ $$
 The first term corresponds to the standard GRPO objective, which aims to maximize the expected reward. The last term is an auxiliary objective defined on expert data, encouraging the policy to imitate expert behavior. $\mu$ is a weighting factor that controls the relative importance of the two terms.
 
 
+
+A visualization of this pipeline is as follows:
+
+![](../../assets/trinity-mix.png)
+
 ## Step 0: Prepare the Expert Data
 
 We prompt a powerful LLM to generate responses with the CoT process for some pre-defined questions. The collected data are viewed as some experiences from an expert. We store them in a `jsonl` file `expert_data.jsonl` with the following format: