Description
Hello, thank you very much for your excellent work! While reproducing your results, I ran into a few issues I would like your advice on:
- During reproduction, I found training progress quite slow: with 2×A100 GPUs and 10 spatial tasks, after 10 hours I had generated only about 330 episodes. According to the paper, fine-tuning for 48 hours yields 10,000 steps (or is it 100k episodes?).
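For reference, a quick back-of-the-envelope extrapolation using only the numbers quoted above (my reading of the paper's 10,000-steps / 100k-episodes figure may be wrong, so this is just to show the scale of the gap):

```python
# Rough throughput extrapolation from the numbers in this issue.
episodes_observed = 330   # episodes generated so far
hours_elapsed = 10        # wall-clock hours on 2x A100
paper_hours = 48          # fine-tuning budget reported in the paper

rate = episodes_observed / hours_elapsed   # episodes per hour
projected = rate * paper_hours             # expected episodes after 48 h

print(f"{rate:.1f} episodes/h -> ~{projected:.0f} episodes in {paper_hours} h")
# -> 33.0 episodes/h -> ~1584 episodes in 48 h
```

At this rate I would finish 48 hours with roughly 1,600 episodes, which is far below either figure I read from the paper, hence my question about expected throughput.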
- I noticed that the current code uses openvla-7b as the value model, but critic warmup does not seem to be applied. Is that correct?
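To clarify what I mean by critic warmup, here is a minimal, framework-agnostic sketch (the names `run_training`, `warmup_steps`, etc. are hypothetical, not from this repository): for the first N steps only the value model is updated while the policy stays frozen, so the critic's value estimates stabilize before they start driving policy updates.

```python
# Minimal sketch of a critic-warmup schedule (hypothetical names,
# not the actual training loop of this repository).
def run_training(total_steps: int, warmup_steps: int):
    actor_updates = 0
    critic_updates = 0
    for step in range(total_steps):
        critic_updates += 1       # the value model is trained every step
        if step >= warmup_steps:  # the policy only after the warmup phase
            actor_updates += 1
    return actor_updates, critic_updates

actor, critic = run_training(total_steps=100, warmup_steps=20)
print(actor, critic)  # -> 80 100
```

My question is whether a phase like this was intentionally omitted, or whether I missed where it happens in the code.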
- Looking at the code, the environment initialization procedure appears to be the same for fine-tuning and evaluation. Is there any distinction between the initialization seeds/vectors used for training versus evaluation? Or could evaluation end up using environment configurations already seen during fine-tuning?
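To make the question concrete, this is the kind of train/eval split I had in mind (a sketch with made-up seed ranges and helper names, not this repository's API): disjoint seed pools, so that evaluation never resets into an initialization configuration seen during training.

```python
# Sketch: disjoint seed pools for training vs. evaluation environments.
# The ranges and the helper below are illustrative, not from this repo.
TRAIN_SEEDS = set(range(0, 1000))
EVAL_SEEDS = set(range(1000, 1100))

# No evaluation episode can start from a training initialization:
assert TRAIN_SEEDS.isdisjoint(EVAL_SEEDS)

def make_env_seed(split: str, episode_idx: int) -> int:
    """Pick a deterministic seed for the given split and episode."""
    pool = sorted(TRAIN_SEEDS if split == "train" else EVAL_SEEDS)
    return pool[episode_idx % len(pool)]

print(make_env_seed("train", 0), make_env_seed("eval", 0))  # -> 0 1000
```

If the current code instead draws the same initialization vectors for both phases, I would expect the evaluation numbers to partly reflect memorized training configurations, which is why I am asking.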