Question about the training time #37

@yanchi-3dv

Description

Hi, thank you for the excellent work and open-sourcing this project!

I’m trying to reproduce the training of VLM-3R, but I noticed that the training time seems much longer than expected.
Specifically, using 8×H200 GPUs, the training takes more than 70 hours to complete.

Could you please confirm whether this training time is normal for your reported configuration?
If not, could you share any tips for speeding up the training?

Here are some details of my setup:
GPUs: 8×H200
PyTorch: 2.1.1 + CUDA 12.1
FlashAttention: 2.3.3
Training script: (default config)
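As a rough sanity check on whether 70+ hours is plausible, the expected wall-clock time can be back-of-the-envelope estimated from dataset size, epochs, global batch size, and per-step latency. The sketch below is illustrative only; all the numbers in the usage example are hypothetical placeholders, not the actual VLM-3R configuration.

```python
def estimate_training_hours(num_samples: int, epochs: int,
                            global_batch_size: int, secs_per_step: float) -> float:
    """Estimate total training wall-clock time in hours.

    Assumes drop_last-style batching (partial final batch discarded)
    and a steady per-step latency; real runs also pay dataloader and
    checkpointing overhead, so this is a lower bound.
    """
    steps_per_epoch = num_samples // global_batch_size
    total_steps = steps_per_epoch * epochs
    return total_steps * secs_per_step / 3600

# Hypothetical numbers: 1M samples, 1 epoch, global batch 128, 4 s/step
print(f"{estimate_training_hours(1_000_000, 1, 128, 4.0):.2f} h")
```

Comparing the observed seconds-per-step in the training logs against this estimate helps separate "the run is genuinely this large" from "something (e.g. FlashAttention falling back to slower kernels, or dataloading) is the bottleneck."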
