Question about the performance of GRPO

Hello, 
I ran verl-agent/examples/grpo_trainer/run_webshop.sh on WebShop to test the performance of the GRPO baseline and found that the final results are much higher than those reported in the paper, even higher than SPEAR. Could you please explain why this happens?

<img width="600" height="289" alt="Image" src="https://github.com/user-attachments/assets/b245ee31-0ac9-4b51-b754-bec18c5b1c2b" />

