Merge pull request #319 from behroozazarkhalili/add-grpo-advanced-reward-notebook

sergiopaniego · web-flow · commit 152ea5cba9a9 · 2025-08-27T16:57:59.000+02:00
Add TRL GRPO Reasoning with Advanced Reward notebook
diff --git a/notebooks/en/_toctree.yml b/notebooks/en/_toctree.yml
@@ -78,6 +78,8 @@
           title: Scaling Test-Time Compute for Longer Thinking in LLMs
         - local: fine_tuning_llm_grpo_trl
           title: Post training an LLM for reasoning with GRPO in TRL
+        - local: trl_grpo_reasoning_advanced_reward
+          title: TRL GRPO Reasoning with Advanced Reward
         - local: medical_rag_and_reasoning
           title: HuatuoGPT-o1 Medical RAG and Reasoning
         - local: fine_tune_chatbot_docs_synthetic
diff --git a/notebooks/en/index.md b/notebooks/en/index.md
@@ -8,9 +8,9 @@ applications and solving various machine learning tasks using open-source tools
 Check out the recently added notebooks:
 
 - [Post training an VLM for reasoning with GRPO using TRL](fine_tuning_vlm_grpo_trl)
+- [TRL GRPO Reasoning with Advanced Reward](trl_grpo_reasoning_advanced_reward)
 - [Fine-Tuning a Vision Language Model with TRL using MPO](fine_tuning_vlm_mpo)
 - [Fine tuning a VLM for Object Detection Grounding using TRL](fine_tuning_vlm_object_detection_grounding)
-- [Hyperparameter Optimization with Optuna and Transformers](optuna_hpo_with_transformers)
 - [Fine-tuning T5 for Automatic GitHub Tag Generation with PEFT](finetune_t5_for_search_tag_generation)
 
 You can also check out the notebooks in the cookbook's [GitHub repo](https://github.com/huggingface/cookbook).
diff --git a/notebooks/en/trl_grpo_reasoning_advanced_reward.ipynb b/notebooks/en/trl_grpo_reasoning_advanced_reward.ipynb