Qwen3-VL 8B training running excessively slow #9646
Hello everyone, I am trying to perform distributed LoRA training of the Qwen3-VL 8B model on my custom data. Here is my training config; I used a similar config for my Qwen2.5-VL 7B experiments and it ran perfectly fine. However, with Qwen3-VL the training is exorbitantly slow. Moreover, I have to set FORCE_TORCH_RUN=1 in order for the training to begin correctly. The issue I am observing is that VRAM utilisation is above 85% but GPU utilisation stays below 40% on both GPUs.

What I have tried

Happy to share more details if required :)
Replies: 1 comment
Check your torch version; make sure it is not 2.9.0 or newer.
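A minimal sketch of the suggested check, assuming only that the reply's ceiling of 2.9.0 applies; the helper names (`version_tuple`, `torch_too_new`) are hypothetical, and with torch installed you would pass `torch.__version__` instead of the literal strings:

```python
def version_tuple(v: str) -> tuple:
    """Parse a version string like '2.9.0+cu121' into a comparable tuple."""
    core = v.split("+")[0]  # drop a local build suffix such as '+cu121'
    return tuple(int(p) for p in core.split(".")[:3])

def torch_too_new(installed: str, ceiling: str = "2.9.0") -> bool:
    """Return True if the installed version is at or above the ceiling."""
    return version_tuple(installed) >= version_tuple(ceiling)

# With torch installed: torch_too_new(torch.__version__)
print(torch_too_new("2.9.0+cu121"))  # 2.9.0 or newer -> downgrade suggested
print(torch_too_new("2.8.1"))        # below 2.9.0 -> within suggested range
```

If this returns True, the reply's suggestion is to downgrade (e.g. `pip install "torch<2.9.0"`, exact spec depending on your CUDA build).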