Hi, thank you for the excellent work and open-sourcing this project!
I’m trying to reproduce the training of VLM-3R, but I noticed that the training time seems much longer than expected.
Specifically, using 8×H200 GPUs, the training takes more than 70 hours to complete.
Could you please confirm if this training time is normal for your reported configuration?
If not, could you share any tips for speeding up the training?
Here are some details of my setup (environment-check snippet below):
- GPUs: 8×H200
- PyTorch: 2.1.1 + CUDA 12.1
- FlashAttention: 2.3.3
- Training script: (default config)
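
For completeness, here is a minimal Python snippet (my own check, not part of the VLM-3R codebase) that prints the versions listed above, in case it helps compare environments:

```python
# Minimal environment check: prints the versions listed above.
import torch

print(f"PyTorch:        {torch.__version__}")          # expect 2.1.1
print(f"CUDA (torch):   {torch.version.cuda}")         # expect 12.1
print(f"GPUs visible:   {torch.cuda.device_count()}")  # expect 8
if torch.cuda.is_available():
    print(f"GPU model:      {torch.cuda.get_device_name(0)}")  # expect H200

try:
    import flash_attn
    print(f"FlashAttention: {flash_attn.__version__}")  # expect 2.3.3
except ImportError:
    print("FlashAttention: not installed")
```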