Some issues when reproducing #14

@szl2001

Hello, thank you very much for your excellent work! While reproducing your results, I ran into a few issues and would appreciate your advice:

  1. Training is much slower than I expected. With 2×A100 GPUs and 10 spatial tasks, I generated only about 330 episodes in 10 hours (roughly 33 episodes per hour). According to the paper, 48 hours of fine-tuning yielded 10,000 steps (100k episodes?). Is a gap this large expected?

  2. In the current code, openvla-7b is used as the value model, but critic warmup does not seem to be applied. Is that correct?

  3. Looking at the code, the environment initialization procedure appears to be the same for fine-tuning and evaluation. Are the initialization vectors for training and evaluation different, or could evaluation also use environments already seen during fine-tuning?
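
To make the gap in point 1 concrete, here is a quick back-of-the-envelope calculation. It is only a sketch: the 100k-episode figure is my own guess at what 10,000 steps corresponds to (hence the question mark above), and the function and variable names are mine, not from the repository.

```python
def episodes_per_hour(episodes: float, hours: float) -> float:
    """Average rollout throughput over a run."""
    return episodes / hours


# My run: 2xA100, 10 spatial tasks, about 330 episodes after 10 hours.
observed_rate = episodes_per_hour(330, 10)

# Paper: 10,000 steps in 48 hours. ASSUMPTION: this equals 100k episodes,
# which I have not confirmed.
assumed_paper_episodes = 100_000
paper_rate = episodes_per_hour(assumed_paper_episodes, 48)

# How many times slower my run is, and how long 100k episodes would take me.
slowdown = paper_rate / observed_rate
hours_needed = assumed_paper_episodes / observed_rate

print(f"observed: {observed_rate:.1f} episodes/hour")
print(f"paper (assumed): {paper_rate:.1f} episodes/hour")
print(f"slowdown: {slowdown:.1f}x")
print(f"time for 100k episodes at my rate: {hours_needed / 24:.0f} days")
```

If the 100k-episode assumption holds, my throughput is more than 60 times lower than the paper's, so the difference is unlikely to come from the GPU count alone.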
