1 parent 67e83ae commit 7a78320
trl/trainer/rloo_trainer.py
@@ -97,8 +97,8 @@
 class RLOOTrainer(BaseTrainer):
     """
     Trainer for the Reinforce Leave One Out (RLOO) method. This algorithm was initially proposed in the paper [Back to
-    Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs]
-    (https://huggingface.co/papers/2402.14740).
+    Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in
+    LLMs](https://huggingface.co/papers/2402.14740).

     Example: