Hi,
Thanks for your fantastic work on the LIBERO benchmark. While reproducing VLA-RL on the LIBERO-spatial suite using the fine-tuned checkpoint, I noticed that performance tends to degrade as training progresses. I was wondering whether the sparse-reward environment might make it difficult to train VLAs effectively.
To help us better reproduce the experimental results, would it be possible for you to share the weights of the Reward Model you used? Any additional guidance or insights would be greatly appreciated.