Hi,
Thanks for your fantastic work on the LIBERO benchmark. While reproducing VLA-RL on the LIBERO-spatial suite using the fine-tuned checkpoint, I noticed that performance tends to degrade as training progresses. I was wondering whether the sparse-reward environment might make it difficult to train VLAs effectively.
To help us better reproduce the experimental results, would it be possible for you to share the weights of the Reward Model you used? Any additional guidance or insights would be greatly appreciated.