generated from fastai/nbdev_template
Pull requests: huggingface/trl
Introduce backend rollout-completions interface and decouple OpenEnv helper from vLLM internals
#5256
opened Mar 10, 2026 by
rycerzes
Change default `vllm_mode` to "colocate" and add v0→v1 migration guide
#5255
opened Mar 10, 2026 by
qgallouedec
batch params together in weight sync and async update the weights
#5249
opened Mar 9, 2026 by
winglian
5 tasks
Introduce minimal generation backend interface for GRPO and RLOO trainers
#5244
opened Mar 8, 2026 by
rycerzes
feat: log raw importance ratios and fraction of truncation/masking in vLLM importance sampling correction
#5243
opened Mar 8, 2026 by
muupan
1 of 5 tasks
[GRPO] Fix re-tokenization bug in tool-calling loop by concatenating token IDs
#5242
opened Mar 7, 2026 by
qgallouedec
Update openenv examples to use `environment_factory`
#5235
opened Mar 6, 2026 by
sergiopaniego
Draft
8 tasks
Allow reward functions to log extra columns and scalar metrics
#5233
opened Mar 6, 2026 by
manueldeprada
fix(vllm): handle `logprobs=None` and align logprob docs
#5203
opened Mar 2, 2026 by
LeonEricsson
feat(`grpo_trainer.py`): Variational Sequence-Level Soft Policy Optimization (VESPO)
#5199
opened Feb 27, 2026 by
casinca
4 of 5 tasks
vLLM Server Sync via LoRA Adapter Reload (avoid merge + full weight sync) for GRPO
#5188
opened Feb 26, 2026 by
lfranceschetti
Fix title consistency from "Transformer Reinforcement Learning" to "Transformers Reinforcement Learning"
#5183
opened Feb 26, 2026 by
qgallouedec
5 tasks
Fix GRPO tool mask alignment after tool-call retokenization
#5145
opened Feb 21, 2026 by
MichalMraz
[GKD] Buffer Implementation for Distillation Trainer
#5137
opened Feb 20, 2026 by
cmpatino
3 tasks done
feat(experimental): Divergence Proximal Policy Optimization
#5117
opened Feb 17, 2026 by
LeonEricsson
5 tasks