
[Bug] QWEN3 VL 30B Multi GPU Train #4066

@alien087

Description


Did you update (`pip install --upgrade unsloth unsloth_zoo`)? Yes
Colab or Kaggle or local / cloud? Local
Number of GPUs used (`nvidia-smi`): 2x L40S
Which notebook? Please link! https://pastebin.com/ZG0vf0P9
Which Unsloth, TRL, transformers, PyTorch versions? Unsloth 2026.2.1, TRL 0.24.0, transformers 4.57.6, PyTorch 2.10.0+cu126
Which trainer? SFTTrainer

Environment:

- OS: Ubuntu
- Python: 3.11.14
- Trainer: SFTTrainer
- Model: FastVisionModel
- GPU: 2x L40S
- Env: Conda

Issue:

I'm trying to fine-tune Qwen/Qwen3-VL-30B-A3B-Instruct on 2x L40S using multi-GPU sharding via `device_map = "balanced"`:

```python
model, tokenizer = FastVisionModel.from_pretrained(
    model_name,
    load_in_4bit=load_in_4bit,
    full_finetuning=False,
    use_gradient_checkpointing="unsloth",
    device_map="balanced",
)
```

full script: https://pastebin.com/ZG0vf0P9
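For context, `device_map = "balanced"` asks Accelerate to spread the model's layers roughly evenly across the available GPUs. Here is a minimal pure-Python sketch of that partitioning idea (this is not Accelerate's actual algorithm, and the layer names and sizes are made up for illustration):

```python
# Rough sketch of a "balanced" layer-to-GPU assignment (illustration only;
# Accelerate's real device_map="balanced" also accounts for memory limits,
# tied weights, and modules that must not be split across devices).
def balanced_device_map(layer_sizes, num_gpus=2):
    """Greedily assign each layer to the GPU with the smallest total load so far."""
    loads = [0] * num_gpus                 # size currently assigned per GPU
    device_map = {}
    for name, size in layer_sizes.items():
        gpu = loads.index(min(loads))      # least-loaded GPU wins
        device_map[name] = f"cuda:{gpu}"
        loads[gpu] += size
    return device_map

# Hypothetical layer sizes (MB), for illustration only
layers = {"embed": 500, "layer.0": 300, "layer.1": 300, "layer.2": 300, "lm_head": 500}
print(balanced_device_map(layers, num_gpus=2))
```

Once layers live on different GPUs this way, any op that mixes tensors from two devices must move its inputs onto one device first; the `q_idx + q_offset` add in the traceback below is exactly such a mixed-device op.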

but I get the following error:

```
torch._dynamo.exc.TorchRuntimeError: Dynamo failed to run FX node with fake tensors: call_function <built-in function add>(*(FakeTensor(..., device='cuda:1', size=(), dtype=torch.int32), FakeTensor(..., device='cuda:0', size=(), dtype=torch.int64)), **{}): got RuntimeError('Unhandled FakeTensor Device Propagation for aten.add.Tensor, found two different devices cuda:1, cuda:0')

from user code:
   File "/home/pitai/.conda/envs/ft-io/lib/python3.11/site-packages/torch/nn/attention/flex_attention.py", line 1645, in _flex_attention_hop_wrapper
    return flex_attention_hop(*args, **kwargs)
  File "/home/pitai/.conda/envs/ft-io/lib/python3.11/site-packages/transformers/masking_utils.py", line 165, in inner_mask
    return mask_function(batch_idx, head_idx, q_idx + q_offset, kv_idx + kv_offset)

Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
```
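If it helps triage: as an untested workaround idea, disabling TorchDynamo compilation should make the flex-attention mask function run eagerly instead of being traced with FakeTensors (which is where the cuda:0 / cuda:1 add fails). `TORCHDYNAMO_DISABLE` is a standard PyTorch environment variable; whether Unsloth still runs acceptably with compilation off is an assumption.

```python
import os

# Untested workaround idea: turn off TorchDynamo compilation so the
# flex-attention mask is evaluated eagerly rather than traced with
# FakeTensors across devices. Must be set before importing torch/unsloth.
os.environ["TORCHDYNAMO_DISABLE"] = "1"
```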
