feat(grpo_trainer.py): Variational Sequence-Level Soft Policy Optimization (VESPO) #15505
