Hi, thank you for the excellent work and open-sourcing this project!
I’m trying to reproduce the training of VLM-3R, but I noticed that the training time seems much longer than expected.
Specifically, using 8×H200 GPUs, the training takes more than 70 hours to complete.
Could you please confirm if this training time is normal for your reported configuration?
If not, could you share any tips for speeding up the training?
Here are some details of my setup (environment-check snippet below):
- GPUs: 8×H200
- PyTorch: 2.1.1 + CUDA 12.1
- FlashAttention: 2.3.3
- Training script: (default config)
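
For completeness, here is a minimal Python snippet (my own check, not part of the VLM-3R codebase) that prints the versions listed above, in case it helps compare environments:

```python
# Minimal environment check: prints the versions listed above.
import torch

print(f"PyTorch:        {torch.__version__}")          # expect 2.1.1
print(f"CUDA (torch):   {torch.version.cuda}")         # expect 12.1
print(f"GPUs visible:   {torch.cuda.device_count()}")  # expect 8
if torch.cuda.is_available():
    print(f"GPU model:      {torch.cuda.get_device_name(0)}")  # expect H200

try:
    import flash_attn
    print(f"FlashAttention: {flash_attn.__version__}")  # expect 2.3.3
except ImportError:
    print("FlashAttention: not installed")
```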