Description
Hello, thank you very much for your excellent work! While reproducing your results, I ran into a few issues I would like your advice on:
- During reproduction, I found training progress quite slow: with 2×A100 GPUs and 10 spatial tasks, after 10 hours I had generated only about 330 episodes. According to the paper, fine-tuning for 48 hours yields 10,000 steps (or is it 100k episodes?).
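For reference, a quick back-of-the-envelope extrapolation using only the numbers quoted above (my reading of the paper's 10,000-steps / 100k-episodes figure may be wrong, so this is just to show the scale of the gap):

```python
# Rough throughput extrapolation from the numbers in this issue.
episodes_observed = 330   # episodes generated so far
hours_elapsed = 10        # wall-clock hours on 2x A100
paper_hours = 48          # fine-tuning budget reported in the paper

rate = episodes_observed / hours_elapsed   # episodes per hour
projected = rate * paper_hours             # expected episodes after 48 h

print(f"{rate:.1f} episodes/h -> ~{projected:.0f} episodes in {paper_hours} h")
# -> 33.0 episodes/h -> ~1584 episodes in 48 h
```

At this rate I would finish 48 hours with roughly 1,600 episodes, which is far below either figure I read from the paper, hence my question about expected throughput.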
- I noticed that the current code uses openvla-7b as the value model, but critic warmup does not seem to be applied. Is that correct?
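To clarify what I mean by critic warmup, here is a minimal, framework-agnostic sketch (the names `run_training`, `warmup_steps`, etc. are hypothetical, not from this repository): for the first N steps only the value model is updated while the policy stays frozen, so the critic's value estimates stabilize before they start driving policy updates.

```python
# Minimal sketch of a critic-warmup schedule (hypothetical names,
# not the actual training loop of this repository).
def run_training(total_steps: int, warmup_steps: int):
    actor_updates = 0
    critic_updates = 0
    for step in range(total_steps):
        critic_updates += 1       # the value model is trained every step
        if step >= warmup_steps:  # the policy only after the warmup phase
            actor_updates += 1
    return actor_updates, critic_updates

actor, critic = run_training(total_steps=100, warmup_steps=20)
print(actor, critic)  # -> 80 100
```

My question is whether a phase like this was intentionally omitted, or whether I missed where it happens in the code.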
- Looking at the code, the environment initialization procedure appears to be the same for fine-tuning and evaluation. Is there any distinction between the initialization seeds/vectors used for training versus evaluation? Or could evaluation end up using environment configurations already seen during fine-tuning?
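To make the question concrete, this is the kind of train/eval split I had in mind (a sketch with made-up seed ranges and helper names, not this repository's API): disjoint seed pools, so that evaluation never resets into an initialization configuration seen during training.

```python
# Sketch: disjoint seed pools for training vs. evaluation environments.
# The ranges and the helper below are illustrative, not from this repo.
TRAIN_SEEDS = set(range(0, 1000))
EVAL_SEEDS = set(range(1000, 1100))

# No evaluation episode can start from a training initialization:
assert TRAIN_SEEDS.isdisjoint(EVAL_SEEDS)

def make_env_seed(split: str, episode_idx: int) -> int:
    """Pick a deterministic seed for the given split and episode."""
    pool = sorted(TRAIN_SEEDS if split == "train" else EVAL_SEEDS)
    return pool[episode_idx % len(pool)]

print(make_env_seed("train", 0), make_env_seed("eval", 0))  # -> 0 1000
```

If the current code instead draws the same initialization vectors for both phases, I would expect the evaluation numbers to partly reflect memorized training configurations, which is why I am asking.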