I’d like to share an issue I encountered and resolved while training the Gemma-3 Instruction model with vLLM serve and GRPO.
When running training with the Gemma-3 Instruction model through a vLLM-based server, I ran into an issue where the KL loss diverged. To investigate, I looked at the part of the code where the KL is computed from the log probabilities.
It turned out that when calling generate on the vLLM serve endpoint, the responses came back without an EOS token; because of this, the completion mask was not applied correctly, which led to an incorrect KL computation.
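To make the failure mode concrete, here is a simplified sketch (not the exact TRL code) of a completion mask that keeps everything up to and including the first EOS; when no EOS appears in a row, the whole row stays unmasked, so padding leaks into the per-token KL term. The tensor name `completion_ids` and the `(batch, seq_len)` right-padded layout are assumptions for illustration:

```python
import torch

def completion_mask_from_eos(completion_ids: torch.Tensor, eos_token_id: int) -> torch.Tensor:
    """Keep tokens up to and including the first EOS of each row.

    If a row contains no EOS at all, every position (padding included)
    stays unmasked -- which is what triggers the diverging KL here.
    """
    batch, seq_len = completion_ids.shape
    is_eos = completion_ids == eos_token_id
    # Default: "keep everything" for rows that never emit an EOS.
    eos_idx = torch.full((batch,), seq_len, dtype=torch.long, device=completion_ids.device)
    has_eos = is_eos.any(dim=1)
    eos_idx[has_eos] = is_eos.int().argmax(dim=1)[has_eos]
    positions = torch.arange(seq_len, device=completion_ids.device).expand(batch, -1)
    return (positions <= eos_idx.unsqueeze(1)).int()
```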
To fix this, I added logic to the _prepare_inputs function of GRPOTrainer.py that replaces the first occurrence of a pad token in each completion with an EOS token, roughly as in the sketch below.
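A minimal sketch of that patch, assuming the completions arrive in `_prepare_inputs` as a right-padded `(batch, seq_len)` tensor; the helper name `replace_first_pad_with_eos` is hypothetical and just illustrates the idea:

```python
import torch

def replace_first_pad_with_eos(completion_ids: torch.Tensor,
                               pad_token_id: int,
                               eos_token_id: int) -> torch.Tensor:
    """Replace the first pad token of every row with the EOS token.

    Rows without any padding (the generation filled the full sequence)
    are left untouched.
    """
    completion_ids = completion_ids.clone()
    is_pad = completion_ids == pad_token_id
    # True only at the first pad position of each row.
    first_pad = is_pad & (is_pad.int().cumsum(dim=1) == 1)
    completion_ids[first_pad] = eos_token_id
    return completion_ids
```

With the EOS restored, the mask from the previous sketch ends at the right position and the KL is computed only over real completion tokens.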
This issue seems to stem from a characteristic of the Gemma-3 family of models: the EOS token is not emitted in the generated responses. Hopefully this helps others facing the same problem, so I'm sharing it here in the Discussion. :)