1 parent e1098ce commit 883b428
docs/training/rlhf.md
@@ -12,4 +12,5 @@ See the following basic examples to get started if you don't want to use an exis
See the following notebooks showing how to use vLLM for GRPO:
+- [Efficient Online Training with GRPO and vLLM in TRL](https://huggingface.co/learn/cookbook/grpo_vllm_online_training)
- [Qwen-3 4B GRPO using Unsloth + vLLM](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen3_(4B)-GRPO.ipynb)