Detailed Training Configuration and Epoch Setting Inquiry #9

@YangJC112

Issue Description

I am trying to reproduce the model training process using the following two training commands:

torchrun --standalone --nnodes 1 --nproc-per-node 8 vla-scripts/train_generalist_calvin.py \
                                 --dataset_name "calvin" \
                                 --run_root_dir "run_log"

# Flags used below (comments are kept out of the command itself,
# because text after a trailing backslash breaks the line continuation):
#   --num_inference_steps  sampling steps for DiT
#   --cond_drop_chance     condition drop chance for classifier-free guidance
#   --with_depth           use depth input
#   --with_gripper         use gripper-view inputs (both RGB and depth)
#   --with_tactile         use visuo-tactile input
#   --batch_size           fine-tuning batch size
#   --learning_rate        fine-tuning learning rate
torchrun --standalone --nnodes 1 --nproc-per-node 8 vla-scripts/train_spacialist_calvin.py \
                                 --num_inference_steps 5 \
                                 --cond_drop_chance 0.1 \
                                 --with_depth True \
                                 --with_gripper True \
                                 --with_tactile True \
                                 --batch_size 8 \
                                 --learning_rate 1e-4 \
                                 --dataset_name "calvin" \
                                 --run_root_dir "run_log"

I have the following questions:

1. Do the parameters in these two commands make up the complete training configuration? In other words, does every parameter that is not listed explicitly use the default value defined in the code?
2. To reproduce similar training results, how many epochs should I train for?
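
To make the epoch question concrete, here is a minimal sketch of how I am converting between epochs and optimizer steps. It assumes that --batch_size is a per-GPU value (I have not confirmed this in the code) and that no gradient accumulation is used. The dataset size is a made-up placeholder, not the real CALVIN sample count, and steps_per_epoch is my own helper, not a function from the repository.

```python
import math


def steps_per_epoch(num_samples: int, per_device_batch_size: int,
                    world_size: int, grad_accum_steps: int = 1) -> int:
    """Optimizer steps needed to see every sample once (partial last batch kept)."""
    # Assumption: the global batch is the per-device batch times the number of processes.
    global_batch = per_device_batch_size * world_size * grad_accum_steps
    return math.ceil(num_samples / global_batch)


# Values taken from the second command: --batch_size 8, --nproc-per-node 8.
# num_samples is a hypothetical placeholder, NOT the real dataset size.
num_samples = 100_000
per_epoch = steps_per_epoch(num_samples, per_device_batch_size=8, world_size=8)
print(f"global batch size: {8 * 8}, steps per epoch: {per_epoch}")
```

If --batch_size is actually the global batch size, the step count per epoch would be 8 times larger, which is part of why I would like the intended setting confirmed.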

Steps Already Taken

I have run training with the commands above. However, I am not sure whether the unlisted parameters really use their default values or how many epochs to train for, and the results I get differ significantly from what I expected.

Expected Response

I would appreciate clarification on whether the two commands contain the complete training configuration, and guidance on how to choose an appropriate number of epochs.
If there are other important parameters I should set, please let me know as well.
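
To show the kind of answer I am looking for, below is a rough sketch of how the effective configuration could be split into explicitly overridden values and values left at their defaults. TrainConfig is purely hypothetical: the field names mirror the flags in my commands plus an assumed epochs field, and every default shown is a placeholder, not a value from the repository.

```python
from dataclasses import dataclass, fields


@dataclass
class TrainConfig:
    # Hypothetical config; defaults are placeholders, not the repository's values.
    dataset_name: str = "placeholder"
    run_root_dir: str = "runs"
    batch_size: int = 32
    learning_rate: float = 2e-5
    epochs: int = 100  # assumed field name; the real option may be named differently


def split_overrides(cfg):
    """Return (overridden, defaulted) dicts by comparing each field with its default."""
    overridden, defaulted = {}, {}
    for f in fields(cfg):
        value = getattr(cfg, f.name)
        target = defaulted if value == f.default else overridden
        target[f.name] = value
    return overridden, defaulted


# The values I pass on the command line; everything else stays at its default.
cfg = TrainConfig(dataset_name="calvin", run_root_dir="run_log",
                  batch_size=8, learning_rate=1e-4)
overridden, defaulted = split_overrides(cfg)
print("overridden:", overridden)
print("left at default:", defaulted)
```

A list like the "left at default" output for the real training scripts, with the values used in the reported experiments, would fully answer my first question.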
